Dec 27, 2022
At Reverie Language Technologies, irrespective of who we are, where we are from, and the job positions we hold, we all work towards achieving the same goal with singular focus.
That goal is the mission on which Reverie was founded: achieving language equality in the digital medium, especially on the Indian Internet, in order to benefit the vastly ignored local-language customer base.
We might work as a single unit trying to solve this hard problem with language technologies and multilingual solutions. But, it takes more to achieve this mission.
To achieve language equality on the Indian Internet, there are three key factors to be considered:
- Robust language ecosystem
- Affordable, language-friendly devices
- Availability and reach of local-language content on the Internet
Robust language ecosystem
Localisation of content on the Internet is heavily dependent on the accessibility of a robust infrastructure that enables available language technologies to aid in content transformation.
Globalisation, localisation, internationalisation, translation, and transliteration are some of the language services that current language industry leaders provide. Collectively, these services make up a well-rounded language market infrastructure for catering to the local-language audience. Such services require high-precision sub-specialties, expertise, and advanced software to support the delivery of accurate and relevant language solutions.
Language industry stakeholders
Language service providers (LSPs) help make businesses’ products and services available to the multilingual audience. They offer expertise on language, customs, and culture of target markets. These service providers adapt both written text and spoken information to a wide variety of local-language consumers.
Evolving language technologies enable software developers to create tools for multilingual content mining, speech recognition, translation memories, machine translation, analytics, and more, which are crucial to a shorter time-to-market for organisations aiming to go multilingual.
Although the above-mentioned language technologies exist, manual intervention is imperative to maintaining the accuracy and nuances of languages. In-house localisation teams comprising linguists, interpreters, and translators work in tandem with LSPs to achieve accurate localisation of content.
Language publications, research analysts and training institutes are a fundamental part of the language industry. Researchers, academic programs, specialized publications, and training companies largely contribute to the language market infrastructure.
The Indian language market infrastructure is still evolving. Several Indian language service providers and language technology companies have found solid ground in the market. However, the lack of extensive corpora in Indian languages poses a serious problem.
The Indian government also plays a crucial role in on-boarding millions of local-language literates in India digitally. Under the Digital India initiative, the government of India has launched a significant variety of policies and resources related to localisation and language solutions.
Adequate skill training
The advent of language technologies necessitates the skill training required to effectively operate and consume them.
Providing such training in local languages would bring in local-language users whose knowledge and understanding of those languages would help in building accurate language solutions.
The Indian government has already begun to take steps to train a targeted segment of the Indian population for availing employment opportunities in the IT/ITES sector.
Affordable, language-friendly devices
The abundance of cost-effective smartphones and generous demand for mobile computing have contributed to the proliferation of smartphone usage in India.
In order to accommodate the growing base of local-language smartphone users and to on-board millions of unconnected users, smartphones with multilingual support would need to be made more affordable in tier-II and tier-III cities.
Projection of smartphone usage in India until 2019
By 2017, the number of smartphone users in India, projected to be around 244 million, is expected to surpass that of the United States, which is projected to be around 220 million. The smartphone penetration rate in India is projected to reach more than 20% by 2018.
The number of smartphone users worldwide is projected to amount to nearly 2.7 billion by 2019. Over a third of the total global population is expected to own a smartphone by 2017.
Localisation of mobile applications
Since the launch of smartphones, the mobile phone user base in India has grown explosively. Because this user base is estimated to grow by millions in the coming years, localised mobile apps will become imperative to cater to local-language users at scale.
The Indian government recently mandated all mobile phones to facilitate local-language support.
Content on device-based apps and mobile web browsers, such as mobile search, system information, user interfaces, web applications, calendars, e-mailers, notifications, alerts and messages, GPS, and so on, will be first on the list to be localised for consumption.
However, mobile app localisation is not without challenges. Indian language scripts have roots in the ancient Brahmi and Perso-Arabic script families. Owing to the large set of vowels, consonants and symbols, the number of possible syllable formations is very large. Conjuncts can also form in more than one direction (for example, stacked vertically or joined horizontally), which complicates display and font rendering.
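To make the rendering problem concrete, here is a minimal Python sketch that lists the code points behind a single Devanagari conjunct. The choice of conjunct and the helper name `describe` are illustrative assumptions, not something taken from a particular product. The point is only that what the reader sees as one shape is stored as several code points, which the font and rendering engine must combine correctly.

```python
import unicodedata

def describe(text):
    """Return 'U+XXXX NAME' for every code point in text (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# One visible conjunct: KA + VIRAMA + SSA. The virama suppresses the
# inherent vowel of KA so that the two consonants join into one shape.
conjunct = "\u0915\u094D\u0937"

for entry in describe(conjunct):
    print(entry)

# The string holds three code points even though it displays as one unit.
print(len(conjunct))
```

A device whose font lacks the joined form would typically fall back to showing the consonants with a visible virama, which is one way display output can drift from the expected shape.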
Some areas that present said challenges are:
- Display: The context and overview of local-language content may be lost on the limited screen size of a mobile device. Mobile devices often support only a limited range of font sizes and only a few fonts. In addition, illegible fonts hinder readability.
- User input: Because the available local-language keypads mimic the layout of the English-language keypads, intuitive user input poses a challenge, owing to the numerous possibilities of syllable and conjunct formations.
- User evolution: Local-language mobile users in India are largely a part of first-generation digital users. Whereas the English users have had around 20 years to get accustomed to the digital landscape, Indian local-language users still require training to adapt.
- Lack of standardisation: Indian local-language scripts have not been unanimously standardised. Inconsistencies arise when Unicode encodings are assigned to incorrect permutations and combinations of syllable formations.
This gap in language computing is still being addressed.
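One kind of encoding inconsistency can be sketched in a few lines of Python: the same visible letter may be stored as different code point sequences, so a naive comparison fails. The example below uses the Devanagari letter QA and is only an illustration of the general problem, not necessarily the specific inconsistencies referred to above. The helper name `same_text` is hypothetical.

```python
import unicodedata

precomposed = "\u0958"       # DEVANAGARI LETTER QA stored as a single code point
decomposed = "\u0915\u093C"  # DEVANAGARI LETTER KA followed by SIGN NUKTA

def same_text(a, b):
    """Compare two strings after Unicode NFC normalisation (illustrative helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The raw strings differ although they display as the same letter.
print(precomposed == decomposed)

# After normalisation the two spellings compare as equal.
print(same_text(precomposed, decomposed))
```

Normalising text before storing or searching it is a common way to keep such variants from breaking search and matching, although it does not address every script-level inconsistency.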
Availability and reach of local-language content on the Internet
World Bank studies suggest a 1.3% increment in GDP when Internet connectivity is increased by 10%.
The availability and accessibility of local-language content on the Internet through content creation, publishing and distribution is key to sustaining the digital language ecosystem.
“The preference of the Indian consumers towards regional language content is constantly on the upswing, with 93% of the time spent on videos in Hindi and other regional languages. With an increasing number of users having their own unique needs waiting to be served by technology, support for regional languages becomes imminent for smartphone makers,” says Google.
In order to create a local-language parallel of existing app or Internet content, there can be three potential approaches for converting English content to local languages:
- Manual translation: Because of the sheer volume of content, manual translation is not a feasible solution given the high costs and time required to complete the task. Moreover, keeping multiple copies of the same content is inefficient in the long term.
- Generic machine translation: Translation engines from Google, Bing, etc. are generic and translate content without analysing it for context. In languages, intent and context are very important. For example, the word ‘Play’ could mean different things in a sports context and in a musical context. Though these engines address almost all domains and content types, they are still largely context-unaware, and as a result they provide 40-50% accuracy at best. This can do more harm than good, as a 50% accurate translation gives a negative experience to an end user who is trying to use and understand the nuances of mobile phones and the Internet.
- Context-aware/domain-specific machine translation: An engine that is context-aware and provides domain-specific translations aided by specialised dictionaries can be highly accurate and can offer a holistic experience to local-language users.
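The difference between the generic and the domain-specific approach can be sketched with a toy glossary lookup in Python. This is a deliberately simplified illustration, not a description of how any real translation engine works: the glossaries, the function name `translate_term` and the Hindi renderings are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical domain glossaries. The Hindi renderings are illustrative only.
DOMAIN_GLOSSARY = {
    "sports": {"play": "खेलें"},  # play a game
    "music": {"play": "बजाएं"},   # play music or an instrument
}

# Hypothetical generic glossary with a single, context-free rendering.
GENERIC_GLOSSARY = {"play": "खेलें"}

def translate_term(term: str, domain: Optional[str] = None) -> str:
    """Prefer the domain glossary, then the generic one, else return the term."""
    key = term.lower()
    if domain in DOMAIN_GLOSSARY and key in DOMAIN_GLOSSARY[domain]:
        return DOMAIN_GLOSSARY[domain][key]
    return GENERIC_GLOSSARY.get(key, term)

print(translate_term("Play", "sports"))
print(translate_term("Play", "music"))
print(translate_term("Play"))  # no domain given: falls back to the generic entry
```

Without a domain, the lookup returns the same rendering for every use of ‘Play’, which is the failure mode described for generic engines. Supplying the domain lets the specialised dictionary pick the rendering that fits.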
Language-first user experience
In order for localisation to be effective, it has to be incorporated right from the start by adopting a language-first user experience.
There is a rising demand to create websites and apps with completely localised content.
A vast majority of websites and apps currently provide only partially localised content. The most accessed pages of a website are in the local language; however, deeper content is still largely available only in English.
Localisation features should be matured with the aim of providing a better user experience.
Current local-language deployments are still only a part of the value chain, not the complete value chain. For instance, advertisements in local language are incomplete unless the services are completely accessible in local language.
Search, discoverability, and distribution of local-language content
Digital input behaviour for Indian languages differs from that for Latin-based scripts such as English.
The InScript keyboard developed for Indian scripts was built on a QWERTY-based keyboard, which is designed for typing with all 10 fingers at once.
In mobile environments, typing with one or, at most, two fingers is more comfortable, owing to the smaller screen size. It is therefore more challenging to design a keypad that works on mobile phones and allows easy and intuitive typing for all the Indian scripts on the same screen.
Present-day devices come with Unicode encoding. However, Unicode includes archaic characters that are no longer in use today. This calls for the standardisation of digital Indian scripts.
Since Indian scripts are complex in nature, the display output should be 100% accurate. Therefore, the fonts used for each script should be legible and clear, and should preserve the authentic shapes associated with the script.
Search and display of local-language content should be made intuitive and easily accessible to local language users. Distribution of local-language content is imperative to reach the users at scale for easy discoverability.
What’s next to achieve language equality on the Indian Internet?
Many businesses are only just waking up to the fact that the English-literate population makes up only 10-12% of India. Because their online services are predominantly in English, their user base covers a very small portion of a huge, largely untapped market.
Will they wake up in time?