In the digital age, when we face a language barrier, there are a host of internet resources to overcome it. We often rely on translation platforms, dictionary apps, Wikipedia in other languages, or the simple “click to translate” option on many websites.
This holds true even when facing an emergency, say a flood or wildfire in a French- or Spanish-speaking part of the world. In such areas, you can rely on machine translation to get information in English about the areas affected or how to stay safe. You would be able to find the information you need, when you need it.
But if you were a Kanuri speaker in northeast Nigeria or a Rohingya speaker in Bangladesh, you would have almost no such option. In 2019, more people than ever in low- and middle-income countries own a mobile phone and have access to the internet. Yet life-saving information is often not available in a language they understand.
Language technology offers solutions. Tools such as machine translation, natural language processing, and speech recognition technology can bridge the digital divide, while also presenting opportunities for better and more accountable aid to those who need it most.
Language Translation Technology Today
Language technology has made huge strides in recent years, but really only in the languages of international politics and business. And while language technology encompasses a wide breadth of tools and processes, there is one advancement in particular that can help cross the digital divide: machine translation.
This technology is available for major international languages. But it is basically non-existent for marginalized or “low-resource” languages. These are languages that are widely spoken but have inconsistent language technology support mostly due to low levels of existing digital content (if any).
Speakers of these languages often belong to less prosperous and powerful geographical regions or ethnic groups. In turn, this makes them more vulnerable when a crisis hits.
Two Examples of Translation Challenges
For example, Hausa, a lingua franca across much of Western Africa, is spoken as a first language by 44 million people. It has some support in Google Translate, Wikipedia, Facebook, and so on.
But Kanuri, a language spoken by almost 8 million people, has no presence in Google Translate or Wikipedia. Yet many Hausa and Kanuri speakers are experiencing one of the world’s worst humanitarian crises in northeast Nigeria, and need information in their own language.
Similarly, Rohingya, an oral language spoken by around 2 million people and without a standardized script, lacks any language technology support. Yet, Rohingya is the only spoken language that all Rohingya people living in the world’s largest refugee camp in Bangladesh understand and prefer.
Despite the significant numbers of speakers and their critical communication needs, these languages are underrepresented online as well as in the language industry.
Gamayun, The Language Equality Initiative
For language technology to function in the mother tongues of people most affected by poverty or humanitarian emergencies around the world, it needs a substantial boost. Translators without Borders is spearheading Gamayun, the language equality initiative, to make it happen.
We are working with technologists, native speaker communities, aid organizations, and donors to develop scalable language technology for marginalized languages.
In order to be effective, language technology requires huge amounts of translated and aligned content in the languages of interest. This is known as parallel data. One can easily find publicly available parallel data sets for languages like French and Spanish. But this is not the case for languages like Kanuri or Rohingya.
Yet, the overall state of language technology and previous efforts such as developing machine translation from scratch for Haitian Creole tell us that even small datasets can have a large impact. This is why Gamayun focuses on building voice and text datasets for machine learning in marginalized languages. Multilingual glossaries and other parallel text data provide the input we need to train a machine translation engine.
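To make "translated and aligned content" concrete, here is a minimal sketch of how parallel data might be represented and tidied before it is used for training. It pairs each source sentence with its translation line by line, then drops pairs that are empty, duplicated, or very different in length. The names `SentencePair`, `align`, and `clean`, the length-ratio threshold, and the English-French sample sentences are all assumptions made for illustration; they do not describe Gamayun's actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    source: str  # sentence in the source language (English in this sketch)
    target: str  # its translation in the target language


def align(source_lines, target_lines):
    """Pair line i of the source with line i of the target."""
    if len(source_lines) != len(target_lines):
        raise ValueError("parallel files must have the same number of lines")
    return [
        SentencePair(s.strip(), t.strip())
        for s, t in zip(source_lines, target_lines)
    ]


def clean(pairs, max_length_ratio=3.0):
    """Drop pairs that are empty, duplicated, or very different in length.

    max_length_ratio is an illustrative threshold, not a recommended value.
    """
    seen = set()
    kept = []
    for pair in pairs:
        if not pair.source or not pair.target:
            continue  # one side is missing, so the pair is not usable
        src_len = len(pair.source.split())
        tgt_len = len(pair.target.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue  # likely misaligned
        if pair in seen:
            continue  # exact duplicate
        seen.add(pair)
        kept.append(pair)
    return kept


# Hypothetical sample data: four aligned lines, one empty and one repeated.
english = [
    "Boil water before drinking.",
    "Where is the nearest clinic?",
    "",
    "Boil water before drinking.",
]
french = [
    "Faites bouillir l'eau avant de la boire.",
    "Où est la clinique la plus proche ?",
    "",
    "Faites bouillir l'eau avant de la boire.",
]

corpus = clean(align(english, french))
for pair in corpus:
    print(f"{pair.source}\t{pair.target}")
```

Even a simple structure like this shows why glossaries and translated humanitarian content are valuable: every clean, aligned pair is one more training example for a language that has few or none.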
Humanitarians already collect a lot of potentially useful parallel text data, such as through needs assessments. Gamayun can use that text data to better analyze information from speakers of marginalized languages. Radio dramas and audio recordings, such as those on disease prevention, can be the foundation of text-to-speech models to reach people with low literacy.
At Translators without Borders, we are tackling the problem by building a core dataset of content in English and using it to proactively build parallel data. Engaging native speakers of low-resource languages and broadening our community of translators is key. Their involvement will help us to develop machine translation integrations that aid organizations can easily scale in times of need.
We need to be clear about the goals. It might be too hard to build a machine translation engine for low-resource languages comparable to Google Translate for French or Spanish. Instead, it might be more efficient to adapt existing technology to address language barriers in specific contexts. For example, we could use technology to conduct social media monitoring in low-resource languages to know which rivers are flooding or which villages are affected by a disease outbreak.
Following this approach, Gamayun focuses on fit-for-purpose language technology that can make a significant difference in the reach, effectiveness, and accountability of aid efforts in multilingual contexts.
Gamayun Initiative Benefits and Impact
Where innovative approaches to aid and language technology meet, we are starting to see some critical advancements for crossing the digital language divide.
For example, to support the World Food Programme’s food security monitoring system in the Middle East, Translators without Borders built a machine translation prototype in Levantine Arabic. The World Food Programme tested it to analyze unstructured data and feedback from the Syrian population who speak this dialect of Arabic. Initial testing indicates that this engine will provide an efficient method for humanitarian organizations to access local media sources and better understand the needs of their target populations.
Taking a step back, the actual reason we work on language technology is to break down barriers, both in the aid sector and beyond.
- We want to build translation engines that enable people to read news that was not written in their language.
- We want to enable them to ask questions about their health when they don’t have access to a doctor who speaks a language they understand.
- We want to enable them to inform humanitarians and other service providers about their needs in a crisis situation in the language they are most comfortable with.
Above all, we want everyone to have the same technological opportunities to have their voices heard and understood.
Get Involved with the Gamayun Initiative
Most people around the world speak a marginalized language. We need to make digital technology work for them. That’s not to say that it can happen overnight, but we cannot wait for the perfect tool or the next crisis to get started.
Whether you’re a nonprofit organization, business, or government, you can join us in closing the digital language divide:
- Let us know if you have a use case for innovative language technology that you would like to collaborate with us on.
- Help us grow our voice and text datasets by giving us access to content in marginalized languages.
- And if you’re attending the NetHope Global Summit later this year, make sure to join the session on language technology.
Written by Grace Tang, Gamayun Project Manager, and Alp Öktem, Computational Linguist, at Translators without Borders
Good points you raise. Unfortunately it’s another case of trying to make an omelette without breaking the egg.
What value does language translation bring to a farmer in the village? Google Translate is a good project, but beyond being a pet or hobby project, it’s simply another demonstration of how dumb machines are.
By the way, even if a language has 5 people, there’s a need. Adam Smith and others already solved this problem with demand and supply, at least that’s what I learnt in high school economics.
To be clear, I am not against what you are advocating for. The next cure for cancer might come from language translation.
Artificial intelligence has been around for more than 50 years and has failed, or at least failed to make any breakthrough till recently.
Language translation will never be solved by computers. Okay, that is a bold statement. In my language a head has at least 5 names, depending on which part of the head. In English the closest can be forehead, face, and that’s it.
I see a need for language translation, but the examples you gave are very bad. Google, Facebook and others can splash billions due to their large resource envelopes, but they will never solve the language conundrum.
For many countries, tourism is one of the biggest sources of revenue. To attract more visitors, websites need to be translated. However, using a language translator tool is one of the worst ways you can go about this. You need a human translator.
The next thing we shall complain about is not having minorities catered for in emojis.
Automation is good, without a shadow of a doubt. Households used to have about 5 helpers in the 1920s. Now, with a microwave the size of a small box, these jobs have vanished.
The jobs of human translators will never vanish, at least for the next 100 years, unless some genius kid solves this problem. The golden child who will solve this is probably in some refugee camp in Syria, or was abducted by Boko Haram.
Google and Facebook won’t solve this. How do I know? Well, does anyone remember Google Circles and how quickly it died?
Many organizations talk about innovation today, but do they have a clue how innovation works? Innovation comes from solving a need; it’s not manufactured from thin air.
Thanks for raising this topic. The smart people will have better ideas than mine, servus.
There’s no mention of work done by Christian missionaries, who were pioneers of transliteration and translation of many smaller languages in Africa. Your examples are of languages spoken by primarily Muslim populations, so these potentially rich sources of insight may not have come onto the radar in the Gamayun initiative. Many early missionaries were highly educated, some having honors degrees from Cambridge or Oxford, and they documented their work in materials now held in mission and some university archives. More recently, missionaries have pioneered computers in translation work: as early as the 1980s, Bible translators were using portable laptops with solar chargers to document languages in remote parts of Kenya.
Thanks for the interesting post. How do we get in touch with you, Grace? I have some translations that might be of interest to you.
Thanks, Marga! I can be reached at [email protected]