Leading the way for Indian language standardisation
Since 1983, when the first integrated Devanagari computer was developed, there has been negligible progress in building the infrastructure for Indian languages on the internet, and this has held back the Indian language landscape.
At Reverie, our vision is to build an ecosystem that promotes language equality on the internet and the standardisation of Indian languages in the digital space.
Indian languages have always thrived in the physical realm, but their journey into the digital world hasn’t been smooth. Although the internet promises to be a mass medium, Indian language users face the lack of a standardised typing tool or keypad, clunky interfaces, limited fonts and editing nightmares. The scale of the problem is large: 63.2% (8,79,83,455) of Indian language literates face difficulties using the digital space.
Standardisation will enable Indian language users to exercise their Fundamental Rights on the Internet
Large Language Models (LLMs) for English benefited from the vast amount of English text available on the internet, but this is not the case for native Indian languages: there is negligible data, and what exists is not easily searchable. Engagement with Indian languages in the digital medium is limited, which makes standardisation all the more necessary.
Delving deeper into the history of Indian language computing
The beginning
India’s first Devanagari computer was created and demonstrated in 1983
Indian language computing has been around for more than 50 years. The Indian Script Code for Information Interchange (ISCII) encoding standard and the INSCRIPT keyboard layout standard were officially released by the Bureau of Indian Standards (BIS) in 1988. The ISCII standard document was not just a list of characters; it covered every aspect of script properties. It set out the principles and rules that govern script behaviour in computing, so that implementations would be unambiguous, intuitive and efficient.
The thriving ecosystem
A system to ensure a thriving ecosystem of Indian language publishing and communication on computers
The scale of the ISCII effort becomes clear when compared with ASCII, which took seven years of study to settle the encoding of a single linear script with just 96 characters. For Indian languages, the task covered not only many scripts but also many languages. Mr. RMK Sinha and his team set out to create a base code for Indian scripts and languages so that computing users would not have to reinvent the wheel and could work with Indic text freely. The standard document itself contained recommendations for the arrangement and use of characters.
Early adopters
Widely adopted by Indian companies even before the first version of Windows
Several Indian companies developed software following the ISCII standard, and a font standard called ISFOC was also widely adopted, even before the first version of Windows. Mr. Mohan Tambe founded the GIST group to productise these technologies. In the early 1990s, the Indian government issued multilingual digital voter ID cards, and Indian Railways began printing reservation charts in Indian languages. Vivekananda Pani worked on an ISCII-based DOS word processor called APEX language processor, which supported all Indian languages and included a spellchecker.
Challenges with the current set of Standards
The major challenges stem from the three standards currently followed to enable the use of Indian languages on digital devices:
Unicode
Impacts the encoding of characters for various Indian scripts
In 1991, Unicode was released, bringing together character standards used in different parts of the world, including the Indian scripts encoded in ISCII. However, Unicode made erroneous adoptions: for example, the universal encoding listed in ISCII’s representative column was adopted as the Devanagari character set. Additionally, the Unicode standard document omitted Indian language properties, resulting in noisy data.
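Unicode’s Indic blocks did inherit ISCII’s shared layout: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada and Malayalam each occupy a 128-code-point block, with corresponding letters at the same offset. The Python sketch below illustrates this by mapping Devanagari text to another script using only the block offset. It is an illustration under stated assumptions, not a transliterator: the function name `shift_script` and the `BLOCK_START` table are our own, and because not every position is assigned in every script, unassigned targets are simply left as the original character.

```python
import unicodedata

# Start of each Unicode block that follows the ISCII-derived parallel layout.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}

def shift_script(text: str, target: str) -> str:
    """Map Devanagari characters to `target` by block offset (illustrative only)."""
    offset = BLOCK_START[target] - BLOCK_START["devanagari"]
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp < 0x0980:  # inside the Devanagari block
            candidate = chr(cp + offset)
            # Keep the original character if the target position is unassigned.
            out.append(candidate if unicodedata.name(candidate, None) else ch)
        else:
            out.append(ch)  # leave non-Devanagari text untouched
    return "".join(out)

print(shift_script("नमस्ते", "bengali"))  # → নমস্তে
```

The fact that a plain offset works for a word like नमस्ते shows how much of the ISCII design survived in Unicode; the cases where it fails are a reminder that script-specific properties were not carried over.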
OpenType - Limiting India’s Font Industry
The font format that contains the glyphs (shapes) and the rules for selecting and placing glyphs to display characters and their combinations.
OpenType is a scalable font format jointly developed by Microsoft and Adobe and widely used in Windows, Linux and Android. It was the only font format suitable for implementing Unicode support for Indian scripts. However, OpenType has a long list of issues, including the inability to define rules unambiguously for different languages, limitations in implementing appropriate text-rendering behaviour, and the inability to handle anything beyond those rules. Designing and rendering OpenType fonts requires expensive software and advanced hardware, which makes them cost-prohibitive for the industry. Furthermore, rendering software that does not support OpenType, such as the Unity game engine, cannot be used to build games in Indian languages.
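The gap that OpenType rules must bridge is the gap between stored characters and displayed glyphs. As a small illustration using only Python’s standard `unicodedata` module, the snippet below lists the code points behind two Devanagari syllables. The conjunct क्ष is stored as three characters (KA, VIRAMA, SSA) that a font typically draws as one glyph, and in कि the vowel sign I is stored after the consonant even though it is drawn to its left. The helper name `code_points` is our own.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text, in storage order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for syllable in ["क्ष", "कि"]:
    print(syllable, code_points(syllable))
```

Nothing in the stored sequence says how the result should look; the ligature for क्ष and the reordering of the I sign are decided entirely by the font’s rules and the rendering engine.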
INSCRIPT – The keyboard layout
It was designed for the 101-key keyboard prevalent at the time of its development.
Today’s devices (mostly mobile phones, tablets and other devices with touch keypads) are very different and do not have the limitations of hardware keyboards. Moreover, the requirement for a single common layout across all scripts can be dropped: efficient layouts tailored to each language would make typing faster and benefit users greatly. Tamil 99 is a great example; it has been very widely adopted because of its typing efficiency for Tamil.
Reverie's wishlist for Indic language tech growth
Our CEO and Co-founder, Vivekananda Pani, talks about his wishlist for Indic languages and the challenges in developing Indian language technology.
The need of the hour
A growing movement is pushing for change. We at Reverie advocate for:
Revised ISCII
Update ISCII to 16 bits to accommodate more characters for Indian scripts. This update should include separate code pages for each script and the character sets taught in schools. Unicode, and any other character-encoding initiative, should then adopt this Indian standard for the Indian scripts used in India.
New font format
Indian scripts are complex and non-linear, so a well-defined font format is needed to prevent illegible rendering. The font standard should be based on a set of character combinations that is finite and unambiguous.
Tailored keyboard layouts
The current INSCRIPT keyboard layout was developed for the 101-key keyboard, but the devices used today are different. Layouts designed specifically for each language would make typing in that language faster and benefit users greatly.
De-bundling of Operating Systems (OS) with language packs
This is necessary to enable wider proliferation of Indian language content on the internet and to help India move beyond the 0.01% share that Indian language content currently holds on the internet.
Starting with young learners
It’s important to teach students the three steps in learning the writing system for Indian languages: the Varnamala (alphabet), the barahkhadi (matras with consonants) and the yuktakshars (conjuncts). Students should also learn how to use their languages in document writing, editing, typesetting and alignment, along with the text features specific to Indian languages, so that they can write prose, poetry, accounts and material for core subjects.
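The three steps map neatly onto how Unicode composes Devanagari text. In the hedged Python sketch below, a barahkhadi row is built by appending each dependent vowel sign (matra), plus anusvara and visarga, to a consonant, and a yuktakshar is built by joining consonants with the virama (halant). The function names are our own, and the twelve-sign list follows the common Hindi barahkhadi (क, का, कि … कं, कः); other languages or schools may teach additional forms.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

# Signs added to a consonant for the common twelve-form Hindi barahkhadi.
# The inherent vowel "a" needs no sign, hence the empty string first.
BARAHKHADI_SIGNS = [
    "",        # क  (inherent a)
    "\u093E",  # का  VOWEL SIGN AA
    "\u093F",  # कि  VOWEL SIGN I
    "\u0940",  # की  VOWEL SIGN II
    "\u0941",  # कु  VOWEL SIGN U
    "\u0942",  # कू  VOWEL SIGN UU
    "\u0947",  # के  VOWEL SIGN E
    "\u0948",  # कै  VOWEL SIGN AI
    "\u094B",  # को  VOWEL SIGN O
    "\u094C",  # कौ  VOWEL SIGN AU
    "\u0902",  # कं  SIGN ANUSVARA
    "\u0903",  # कः  SIGN VISARGA
]

def barahkhadi(consonant: str) -> list[str]:
    """Return the twelve barahkhadi forms of a single consonant."""
    return [consonant + sign for sign in BARAHKHADI_SIGNS]

def yuktakshar(*consonants: str) -> str:
    """Join consonants with the virama to form a conjunct, e.g. प + र -> प्र."""
    return VIRAMA.join(consonants)

print(" ".join(barahkhadi("क")))
print(yuktakshar("प", "र"), yuktakshar("स", "त", "र"))
```

The stored sequence is the same however a font chooses to draw it; whether a conjunct appears as a single ligature or as half-forms is a rendering decision, which is why these encoding rules and font rules need to be taught and standardised together.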
This is not just about technology; it's about:
Preserving cultural heritage
Promoting inclusivity in the digital world
Empowering millions to express themselves and connect in their own tongues
By embracing the unique needs of Indian languages, we can write a new chapter in the story of Indian language computing. Let’s create a digital world that truly welcomes the rich tapestry of India’s voices.