Facebook researchers use maths for better translations

Updated 13 October 2019

  • Facebook researchers say rendering words into figures and exploiting mathematical similarities between languages is a promising avenue
  • Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business

PARIS: Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. But now there is a new way: numbers.

Facebook researchers say rendering words into figures and exploiting mathematical similarities between languages is a promising avenue, even if a universal communicator à la Star Trek remains a distant dream.

Powerful automatic translation is a big priority for Internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.

Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu and others are constantly seeking to improve their translation tools.

Facebook has artificial intelligence experts on the job at one of its research labs in Paris. Up to 200 languages are currently used on Facebook, said Antoine Bordes, European co-director of fundamental AI research for the social network.

Automatic translation is currently based on having large databases of identical texts in both languages to work from. But for many language pairs there just aren’t enough such parallel texts.

That’s why researchers have been looking for another method, like the system developed by Facebook which creates a mathematical representation for words.

Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.

“For example, if you take the words ‘cat’ and ‘dog’, semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers.

“If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
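
To make the idea of closeness concrete, here is a minimal sketch in Python. The three-dimensional vectors and the names TOY_VECTORS, cosine_similarity and nearest are invented for illustration; the vectors Lample describes have several hundred dimensions and are learned from large bodies of text, and cosine similarity is only one common way of measuring how close two vectors are.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real word vectors have several hundred dimensions and are learned from text.
TOY_VECTORS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "Paris": [0.1, 0.2, 0.9],
    "London": [0.2, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors=TOY_VECTORS):
    """Return the other word whose vector points most nearly the same way."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    print(nearest("cat"))
    print(nearest("Paris"))
```

With these made-up numbers, the animals sit near each other and the capital cities sit near each other, which is the pattern described above.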

These language maps can then be linked to one another using algorithms — at first roughly, but eventually becoming more refined, until entire phrases can be matched without too many errors.
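
The rough first stage of linking two language maps can be sketched as follows. This is a simplified illustration under stated assumptions, not Facebook's system: the two-dimensional vectors are invented, the French map is constructed as an exact rotation of the English one, and two known word pairs (ANCHORS) are assumed in order to estimate that rotation. The article does not describe the algorithm actually used, and the later refinement stage is not shown.

```python
import math

# Invented 2-dimensional "language maps"; the French map is the English
# map rotated by 90 degrees, so a single rotation links them exactly.
ENGLISH = {"cat": (1.0, 0.1), "dog": (0.9, 0.2), "house": (0.1, 1.0), "city": (-0.8, 0.5)}
FRENCH = {"chat": (-0.1, 1.0), "chien": (-0.2, 0.9), "maison": (-1.0, 0.1), "ville": (-0.5, -0.8)}

# Assumption: a few known word pairs are available as anchors.
ANCHORS = [("cat", "chat"), ("house", "maison")]


def estimate_rotation(pairs):
    """Return the angle that best rotates the English anchors onto the French ones."""
    dot_sum = cross_sum = 0.0
    for en, fr in pairs:
        (x1, x2), (y1, y2) = ENGLISH[en], FRENCH[fr]
        dot_sum += x1 * y1 + x2 * y2
        cross_sum += x1 * y2 - x2 * y1
    return math.atan2(cross_sum, dot_sum)


def rotate(vec, angle):
    """Rotate a 2-dimensional vector counter-clockwise by the given angle."""
    x, y = vec
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


def cosine(a, b):
    """Return the cosine similarity of two 2-dimensional vectors."""
    return (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))


def translate(word, angle):
    """Map an English word into the French space and return its nearest neighbour."""
    mapped = rotate(ENGLISH[word], angle)
    return max(FRENCH, key=lambda w: cosine(mapped, FRENCH[w]))


if __name__ == "__main__":
    angle = estimate_rotation(ANCHORS)
    print(translate("dog", angle), translate("city", angle))
```

Neither "dog" nor "city" is among the anchor pairs, yet both land on their French counterparts once the rotation is applied. Real language maps do not line up this neatly, which is why the linking has to be refined step by step.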

Lample said results are already promising. For the English-Romanian language pair, Facebook’s current machine translation system is “equal or maybe a bit worse” than the word vector system, he said.

But for the rarer language pair of English-Urdu, where Facebook’s traditional system doesn’t have many bilingual texts to reference, the word vector system is already superior, he said.

But could the method allow translation from, say, Basque into the language of an Amazonian tribe? In theory, yes, said Lample, but in practice a large body of written texts is needed to map the language, something Amazonian tribal languages lack.

“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.

Experts at France’s CNRS national scientific center said the approach Lample has taken for Facebook could produce useful results, even if it doesn’t result in perfect translations.

Thierry Poibeau of CNRS’s Lattice laboratory, which also does research into machine translation, called the word vector approach “a conceptual revolution.”

He said “translating without parallel data” — dictionaries or versions of the same documents in both languages — “is something of the Holy Grail” of machine translation.

“But the question is what level of performance can be expected” from the word vector method, said Poibeau. The method “can give an idea of the original text” but the capability for a good translation every time remains unproven.

Francois Yvon, a researcher at CNRS’s Computer Science Laboratory for Mechanics and Engineering Sciences, said “the linking of languages is much more difficult” when they are far removed from one another.

“The manner of denoting concepts in Chinese is completely different from French,” he added.

However, even imperfect translations can be useful, said Yvon, and could prove sufficient to track hate speech, a major priority for Facebook.


Twitter sets out plans for banning political ads

Updated 15 November 2019

  • Rival Facebook Inc, saying it did not want to stifle political speech, has steadfastly refused calls from some politicians and others to follow Twitter’s lead
  • Twitter said it will use a combination of automated technology and human teams to enforce the new ad policies

WASHINGTON: Twitter Inc. on Friday laid out its plan for banning political ads, as well as ads that advocate for a certain outcome on social and political causes, just as campaigns for the 2020 US presidential election heat up.

Twitter said last month that it would ban political advertising, as social media companies have faced growing calls to stop accepting ads that spread false information and could sway elections.

Twitter said it will define political content under its policy as anything that references “a candidate, political party, elected or appointed government official, election, referendum, ballot measure, legislation, regulation, directive, or judicial outcome.”

“We believe political message reach should be earned, not bought,” Twitter Chief Executive Jack Dorsey said in announcing the ban.

Rival Facebook Inc, saying it did not want to stifle political speech, has steadfastly refused calls from some politicians and others to follow Twitter’s lead, and said it would not vet political ads for misleading claims on its site.

The ban, which is expected to take effect on Nov. 22 and includes ads from political candidates, political parties or government officials themselves, was initially derided by US President Donald Trump’s reelection campaign.

The popular social media platform will allow companies and advocacy groups to run ads that promote awareness and discussion about social causes, such as environmental protection. But they will not be allowed to push for a certain political or legislative change on the issue, especially if they are advocating for something that benefits their business, Del Harvey, vice president of trust and safety, said in a conference call on Friday.

Under the new policy, for example, the Sierra Club or gun rights advocates could still promote their causes, but they would not be able to single out politicians they support, target those they would like to see defeated in elections, or lobby for political outcomes.

Advertisers who wish to run ads that promote awareness about a cause will be able to target users at the state level or higher, but not by ZIP code. And those advertisers will not be able to target people based on their political leanings, Twitter said.

Twitter said it will use a combination of automated technology and human teams to enforce the new ad policies.

It said it sought to make the new rules as clear as possible. But other major tech companies, including Facebook and Alphabet Inc’s Google, have had widely publicized struggles to moderate the vast amount of content uploaded to their sites.

News publishers that meet certain criteria will continue to be able to run ads on Twitter that reference political content, but they cannot advocate for or against a political topic.