2017 In Language Technology – The Indian Internet’s Second Big Turning Point

Even as 2017 draws to a close, the steady march of technology continues at a brisk pace, ever gaining momentum. Language technology is very much part of that march.

 

A few years ago, the landscape of India’s internet was fundamentally changed by the explosion of mobile users coming online, and we’re now seeing a similar phase of rapid growth – this time powered by Indian languages.

 

Language technology has taken several large strides this year as well, with advances and milestones of its own marking how far it has come.

 

Here’s a summary of some of the more important leaps forward that have happened this year in language technology, and their implications for the average Indian.

Indian Languages & Language Technology Grow Together

 

Quantifying Indic Language Reach Online


In April 2017, Google and KPMG released a report, Indian Languages – Defining India’s Internet, on the presence and reach of Indian languages online.

The key takeaways included the fact that Indic language internet users (234 million) have already surpassed English users (175 million), and that this trend will only accelerate as time goes by. 90% of Indians coming online for the first time over the next 5 years will do so in their own language, bringing those numbers to a projected 536 million Indian language users vs 199 million English users.
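To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it uses the four user counts quoted above, assumes the projection covers the same 5-year window mentioned in the text, and treats the two groups as non-overlapping, which the report summary here does not confirm. The function names are ours, chosen for illustration.

```python
# User counts quoted in the article, in millions.
indic_now, english_now = 234, 175
indic_projected, english_projected = 536, 199

# Assumption: the projection horizon is the "next 5 years" from the text.
YEARS = 5


def share(part: float, other: float) -> float:
    """Fraction of the combined user base that `part` represents.

    Assumes the two groups do not overlap.
    """
    return part / (part + other)


def annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by going from start to end."""
    return (end / start) ** (1 / years) - 1


print(f"Indic share today:     {share(indic_now, english_now):.0%}")
print(f"Indic share projected: {share(indic_projected, english_projected):.0%}")
print(f"Indic yearly growth:   {annual_growth(indic_now, indic_projected, YEARS):.1%}")
print(f"English yearly growth: {annual_growth(english_now, english_projected, YEARS):.1%}")
```

Under those assumptions, Indian language users go from a little over half of the combined base to nearly three quarters of it, growing at roughly 18% a year while English users grow at under 3% a year.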

 

Some of these statistics are surprising. According to the Digital Indian Language Report by Reverie Language Technologies, Hindi, Marathi, and Gujarati are the three most used Indian languages online, even though Gujarati is not among the top three Indian languages by number of native speakers.

 

Caged in by the lack of localisation and services in Indian languages, these users have so far stuck to low-friction verticals (Reverie’s report lists social media, messaging, browsing, and entertainment as the ones they use most). That will change as companies start building solutions that target this user base.

 

Which brings us to the next development.

 

India’s Mobile Language Mandate

 

In a significant step forward, the Government of India mandated Indic language support on all mobile devices in India, a move decisively in favor of a larger digital presence for these languages. Once this mandate goes into effect (Feb 1st, 2018), all new phones in India will have to support all 22 official Indian languages, as well as text input in at least two Indian languages.

 

One of the larger implications of this move is that catering to India’s non-English-speaking population (over 1 billion people) will become a prerequisite for digital devices, a new default setting.


With the proliferation of cheap data plans and affordable handsets, one can only imagine how this move could impact the country’s internet in the longer run.

 

Digital Government Services

 

In addition, the Government of India has been pushing for more government services to be available online. As internet penetration increases, the internet increasingly becomes both an outreach platform for and a facilitator of government services.


State governments are also pushing for language localisation across digital platforms, both on smartphones and on websites.

 

The Government of India’s UMANG (Unified Mobile Application For New-Age Governance) app was unveiled in November this year, and it came with support for 12 Indian languages. Because UMANG is an all-in-one app that lets citizens find and access other government services, its language support will make government services in general easier to reach.

 

BHIM, with its mission of bringing digital payments to the masses, was also built with accessibility in mind. It was released with support for multiple Indian languages, ensuring that the average Indian citizen would have as much access to digital payments as their English-speaking, upper-middle-class fellow citizens.



Machine Learning & Voice Search

 

One of the biggest developments that marked this year in language technology was the advent of machine learning and voice search.

 

Machine learning helps power more accurate, precise translation, something that’s essential for localising content at scale. It allows translation systems to learn from millions of examples and patterns and continuously improve the naturalness of their translations. Indian languages have certain linguistic quirks that can otherwise confuse translation systems, like stark differences between formal and colloquial vocabulary. Water, for example, can be jal or pānī depending on formality, and the wrong variant would sound horribly out of place.
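The formal/colloquial problem can be made concrete with a tiny Python sketch. To be clear, this is not machine learning: it is a hand-written lookup table, and the names LEXICON and pick_word are hypothetical. Only the water example comes from the text above. It simply shows the choice a translation system has to get right, which real systems learn from data rather than from a table like this.

```python
# Hypothetical hand-written lexicon: (English word, register) -> Hindi word.
# Only the "water" entries come from the article; a real translation system
# would learn such preferences from examples instead of a fixed table.
LEXICON = {
    ("water", "formal"): "jal",
    ("water", "colloquial"): "pānī",
}


def pick_word(english: str, register: str) -> str:
    """Return the Hindi variant that matches the requested register."""
    try:
        return LEXICON[(english, register)]
    except KeyError:
        raise ValueError(f"no {register} entry for {english!r}") from None


print(pick_word("water", "formal"))
print(pick_word("water", "colloquial"))
```

A system that ignores register has only one entry per word and will pick the same variant everywhere, which is exactly how a formal word ends up in a casual chat message.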

 

Voice search, of course, lets users find content by speaking to their devices. Indians who are coming online for the first time may be more comfortable searching by voice than by typing, since typing in an Indic language would be completely new to them. Speaking, on the other hand, isn’t. According to Google’s own data, 28% of Google searches in India are made by voice.

 

Building Solutions – Challenges Involved

 

Tech companies are finally waking up to the fact that Indian languages need digital support too, and that involves creating a user experience that is completely optimized for Indian languages – merely providing a suboptimal, patchwork user experience won’t do.


Developing language tech, however, comes with numerous challenges of its own.

There’s a very real scarcity of resources for building digital support for Indian languages.
European languages, for example, have Europarl, a parallel corpus (the same text aligned across multiple languages) drawn from the proceedings of the European Parliament. Indian languages have nothing comparable.

 

This means these resources for Indian languages have to be built from the ground up, something that will pose a challenge for language technology.

 

It’s an exciting space to be in, as there is a whole host of problems to be solved, and whatever solutions are built will end up becoming an essential part of daily life for hundreds of millions of Indians. As the reach of technology grows and impacts everyday life more and more, language technology will also continue to grow.

 

If there’s one thing that 2017 has taught us, it’s that if you’re building solutions for India, you ignore Indian languages at your own risk.