This article was originally published on INDIAai on Feb 26, 2021.
Highlights
Breaking the misnomers – how automation and CAT tools can empower translators to deliver high volume translation projects like the National Language Translation Mission (NLTM)
“2021 is the year of many important milestones for our history. I mention a few of these: It is the 75th year of Independence; 60 years of Goa’s accession to India; 50 years of the 1971 India-Pakistan War; it will be the year of the 8th Census of Independent India; it will also be India’s turn at the BRICS Presidency; the year for our Chandrayaan-3 Mission; and the Haridwar Maha Kumbh.”
This is an excerpt from the 2021 budget speech by Finance Minister Nirmala Sitharaman. Apart from this milestone-defining paragraph, the speech had another that grabbed my attention, tucked between measures to boost digital payments and boasts of our reach in space.
“We will undertake a new initiative – National Language Translation Mission (NLTM). This will enable the wealth of governance-and-policy related knowledge on the Internet being made available in major Indian languages.”
Internet users in India have grown from 7.5% of the population in 2010 to 34.4% in 2020. Every third person now has access to the Internet, most likely has a WhatsApp account, watches YouTube, and checks the score of a cricket match not by standing in front of a television store but on a handheld device.
Ours is a country of diversity, and we take pride in this fact. Despite the stark differences, we somehow find a way to work together and showcase brilliance. Take, for example, the recent series win against Australia in the fourth and final Test match. The winning lineup of 11 had players from 8 different states, and with that 8 different mother tongues, and yet they came together and won.
The Internet has been one of the greatest innovations in human history. It levels the playing field: rich or poor, man or woman, everyone has the same access to resources and information. But does it really? Of what use is the Internet to someone who does not know English? Having access to the Internet but not knowing English is like standing in front of a vault of gold without knowing the combination that unlocks it. For us Indians, this challenge is manifold. We are a country of multiple languages and dialects, and our Constitution recognizes 22 languages in its Eighth Schedule. Hence, we need to create 22 key combinations to make sure that access to the Internet in India truly becomes a level playing field.
The National Language Translation Mission (NLTM) is therefore a welcome and much-needed initiative, with the potential to be a game changer in information dissemination. The task now is to make sure the massive efforts that follow this announcement are coordinated and not in vain.
Technical Blueprint to deliver NLTM
NLTM talks about governance-and-policy related knowledge. We have 54 ministries, and each has its own set of policies, notices and reports to be published in major Indian languages as part of this mission, apart from the numerous circulars the government issues on a day-to-day basis. This is a massive effort, and the traditional way of translating, which is to manually re-create the document in another language, will make this herculean task more difficult than it should be.
Translation requires one to read, comprehend, translate and write. While reading is the easiest of these tasks, comprehending the text and translating it into another language requires more than one skill:
- One, to know enough vocabulary of the target language so as to do justice to the articulation.
- Two, to have the tools and ability to write in the target language (keeping in mind that this happens in the digital medium, as writing on paper and then typing it out is out of the question).
- Three, to ensure the spellings in the target language are correct.
- Four, to keep the format of the target document the same as the source.
Translating a standard 10-page document with simple vocabulary takes a person roughly 48 hours to do manually.
This time can be brought down significantly with the use of technology. The following are some scenarios where technology amplifies the strengths of a translator and fills in gaps where there are deficiencies.
Breaking the misnomers – how automation and CAT tools can empower translators
We humans are, biologically, neither the strongest nor the fastest of species. It is our capacity to innovate that has made impossible milestones achievable. The invention of the personal computer proved to be the “bicycle for the mind” moment.
Computer Assisted Translation (CAT) tools are one such boon for translators. They make the job at hand easier by assisting with repetitive tasks so that the translator can focus on the quality of the output. CAT tools have been around since the mid-1980s, and in the roughly 40 years since, numerous additions to their feature sets have helped translators increase their productivity. Machine translation itself has improved productivity by more than 74%.
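One repetitive task that CAT tools commonly take over is re-translating sentences that have already been translated before, usually through what is called a translation memory. The following is a minimal sketch of that idea, not a description of any particular product: the sample segments, the `suggest` function and the 0.75 similarity threshold are all assumptions made for illustration, and the matching uses Python's standard `difflib` rather than the more refined fuzzy matching a real tool would use.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: previously approved English -> Hindi segments.
TRANSLATION_MEMORY = {
    "Applications are invited from eligible candidates.":
        "पात्र उम्मीदवारों से आवेदन आमंत्रित किए जाते हैं।",
    "The last date for submission is 31 March.":
        "जमा करने की अंतिम तिथि 31 मार्च है।",
}


def suggest(segment, memory, threshold=0.75):
    """Return (score, source, translation) for the closest stored segment.

    Returns None when no stored segment is at least `threshold` similar.
    The threshold is an illustrative assumption, not an industry standard.
    """
    best = None
    for source, translation in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, translation)
    return best


# A near-repeat of an earlier sentence: the translator gets a draft to edit
# (only the date changes) instead of starting from a blank page.
match = suggest("The last date for submission is 30 April.", TRANSLATION_MEMORY)
if match is not None:
    score, source, translation = match
    print(f"{score:.0%} match with: {source}")
    print(f"Suggested draft: {translation}")
```

The suggestion is only a draft. The translator still reviews and corrects it, which is where the human judgement stays in the loop.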
The sheer volume of existing policy-related documents, articles and so on is massive, and much of the content is sensitive. The translation has to be accurate, as mistakes can cause miscommunication and lead to unwanted legal issues. Machine translation in its current state cannot be trusted on its own; it needs human intervention.
Apart from the massive volume of translation needed for the content that exists today, no website or department functions without updating its content almost daily. Handling such volumes and such fluidity manually is practically impossible, and previous initiatives have had to be dropped. The use of technology not only makes this possible but also regularly identifies updated content that may have been left without translations. Thus it actually grows the demand. Take the US translation industry, for example: according to the U.S. Bureau of Labor Statistics, employment in the sector is projected to grow 20% from 2019 to 2029, against an average growth rate of 4% for all occupations. This suggests that CAT tools and automation in fact help grow the industry, with faster turnaround times and high quality.
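To illustrate how software might flag updated content that has been left without a translation, here is a minimal sketch that compares a fingerprint of each page's current source text with the fingerprint recorded when the page was last translated. The function names, the page identifiers and the data layout are hypothetical; a real deployment would sit on top of a content management system and would likely track changes per sentence rather than per page.

```python
import hashlib


def fingerprint(text):
    """SHA-256 fingerprint of a page, ignoring whitespace-only differences."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pages_needing_translation(source_pages, translated_fingerprints):
    """Return the ids of pages whose translation is missing or out of date.

    source_pages: {page_id: current source text}
    translated_fingerprints: {page_id: fingerprint of the source version
                              that was last translated}
    """
    return sorted(
        page_id
        for page_id, text in source_pages.items()
        if translated_fingerprints.get(page_id) != fingerprint(text)
    )


# Hypothetical example: one page unchanged, one edited, one brand new.
last_translated = {
    "circular-01": fingerprint("Offices will remain closed on Friday."),
    "notice-07": fingerprint("Forms are available at the counter."),
}
current = {
    "circular-01": "Offices will remain closed on Friday.",
    "notice-07": "Forms are available online and at the counter.",
    "report-12": "Annual report for the year.",
}
print(pages_needing_translation(current, last_translated))
```

Only the edited and the new page are reported, so translators are pointed at exactly the content that changed.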
Making this pool of information available in Indian languages covers availability. For a successful citizen services deployment, there are other hurdles to cross.
Discoverability is one big challenge. The ability to search websites and documents in English is taken for granted, but searching in Indian languages is nowhere near as easy. For starters, most of us on the Internet do not have access to standard typing methods, because our workstations are designed with English as the medium of communication in mind. One has to put in additional effort to type in one's own mother tongue. Even after doing so, there are inherent gaps in the Unicode standard and in the rendering technologies of today's computers that allow mistakes when typing Indian languages, and these mistakes carry over into search.
The average Indian citizen does not know English and is not tech-savvy enough to look things up on the Internet. The form factor on which the next wave of Internet users will arrive is not well suited to easy typing either. For citizen services to succeed, this calls for intuitive voice bots in Indian languages.
The thought behind NLTM is a well-intentioned one: it aims to bridge the language divide that exists in information dissemination from the Government's point of view. Translating governance-and-policy related knowledge into major Indian languages and making it available on the Internet is a welcome step, and if done correctly it will have a significant impact.
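One concrete instance of this search problem is that the same visible word can be stored as different sequences of Unicode code points, depending on the keyboard or input method used. The sketch below, using Python's standard `unicodedata` module, shows the Hindi word क़िला typed in two ways and how normalizing both the query and the indexed text lets them match. The `search_key` helper is a hypothetical name, and normalization addresses only canonically equivalent sequences; look-alike but non-equivalent typing mistakes would need additional language-specific rules.

```python
import unicodedata


def search_key(text):
    """Normalize text so canonically equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


# The same word, क़िला, entered by two different input methods.
precomposed = "\u0958\u093f\u0932\u093e"        # single code point U+0958 for क़
decomposed = "\u0915\u093c\u093f\u0932\u093e"   # U+0915 (क) followed by nukta U+093C

# A naive comparison treats them as different words, so a search misses.
print(precomposed == decomposed)

# After normalization, both forms produce the same search key.
print(search_key(precomposed) == search_key(decomposed))
```

Applying the same normalization at indexing time and at query time is the design choice that matters here; which normalization form is used matters less than using it consistently.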