Dec 20, 2018
Making #AIForAll A Reality
Recently, Team Reverie was invited to take part in a NITI Aayog-led workshop titled Bhārat NLP. The workshop was part of the Government's wider #AIForAll initiative, which is intended to help build and scale up AI capability within the Indian startup ecosystem. That capability can then be deployed at scale across government and enterprise services, helping hundreds of millions of people access those services more easily.
In a document setting out its vision for #AIForAll, NITI Aayog identifies five focus areas for AI solution development: healthcare, education, agriculture, smart cities, and smart mobility.
For years now, artificial intelligence has powered solutions to a wide range of problems and pain points. NLP, or natural language processing, applies AI to language. It supports use cases such as analysing the intent behind a piece of text or automating translation from one language to another. As the Indian internet keeps growing, so does the scope for NLP.
The first phase of the #AIForAll initiative will involve building and scaling an Indic NLP toolkit.
This brings us to our next point. We have consistently stressed how inherently Indian the Indian internet is: more than half of Indian internet users go online in their own language rather than in English.
Bhārat NLP Stack & The Indian Internet
NLP has great potential to help us understand and work with an increasingly digital, content-heavy world, but it has mostly been applied to English-language data. As with much of India's tech landscape, language remains a major barrier, and NLP has yet to make significant progress in understanding and serving Indian language internet users.
Recognising the need for change, the Government of India launched the Bhārat NLP initiative, led by NITI Aayog. The initiative aims to build a tech stack for NLP in Indian languages, bringing the power of AI to Indian language internet users.
The workshop was primarily a forum for leaders in the field to share insights on how to build an NLP language stack collaboratively. Companies bring years of expertise in building language technology solutions for India. NITI Aayog, for its part, will pool resources from various government bodies into a common platform, one that companies will be able to contribute to in the interest of the wider Indian language community.
NITI Aayog and Microsoft organized the Bhārat NLP workshop to exchange ideas on building and maintaining this Indic NLP stack, and Team Reverie was invited to deliver a keynote speech.
Building An Indic NLP Stack
The vision includes a Bhārat NLP language technology stack that companies and individuals alike can use to power their existing solutions. The stack will add Indian language capability to these tools and extend their reach. Companies will be able to focus on building solutions for Indian language users without first having to build an NLP stack from scratch.
It will also make more digital services available to Indian language users, since existing services can be extended to Indian languages by building on the Bhārat NLP stack.
Creating NLP-based solutions for Indian languages comes with its own set of challenges. Researchers have been working with English-language data for decades, but NLP work in Indian languages is still at a fairly nascent stage. The Bhārat NLP project could act as an accelerator for the growth of Indian language NLP.
Collaborative Database Building
Building NLP for Indian languages must go hand in hand with collecting and indexing data in those languages, along with equivalents across languages. For example, Kannada data should be paired with its English equivalents. Data collection is a massive, time-intensive task, so a common, government-led platform will let companies pool their efforts into one shared database, almost like a crowdfunding exercise for Indian language data. The resulting data serves the larger community, rather than each company working on these projects alone. The community needs all the help it can get.
We are optimistic that Team Reverie, with our years of experience building language technology solutions for India, can play a role in building this language stack for Bhārat NLP and, by extension, help the larger Indian language community. We're excited to be part of the Bhārat NLP project and eagerly await the next step in the process!