Leading the way for Indian language standardisation

From 1983, when the first integrated Devanagari computer was developed, until now, there has been negligible progress in building the infrastructure for Indian languages on the internet, and this has hampered the Indian language landscape.

At Reverie, our vision is to build an ecosystem that facilitates language equality on the Internet and the standardisation of Indian languages in the digital space.

Indian languages have always thrived in the physical realm, but their journey into the digital world hasn’t been smooth. While the internet promises to be a mass medium, Indian language users face the lack of a standardized typing tool or keypad, clunky interfaces, limited fonts, and editing nightmares. The scale of the problem is large: 63.2% (8,79,83,455) of Indian language literates face issues in using the digital space.

Standardisation will enable Indian language users to fulfill their Fundamental Rights on the Internet

The creation of Large Language Models (LLMs) for English benefited from the large amount of data available on the internet, but this is not the case for Indian native languages. The content is not easily searchable, and there is negligible data. Engagement with Indian languages on the digital medium is limited, which makes standardisation all the more necessary.

Delving deeper into the history of Indian language computing

India’s first Devanagari computer was created and demonstrated in 1983

Indian language computing has been around for more than 50 years. The Indian Script Code for Information Interchange (ISCII) encoding standard and keyboard layout standard INSCRIPT were officially released by the Bureau of Indian Standards (BIS) in 1988. The ISCII standard document was not just a list of characters but covered every aspect of script properties. The principles and rules that govern the script behavior in computing were outlined to make the implementation unambiguous, intuitive and efficient. 

System to ensure a thriving ecosystem of Indian language publishing and communication within computers

The comparison between ISCII and ASCII is instructive: it took 7 years of study to decide the encoding of a linear script that had all of 96 characters. With Indian languages, the task involved not just scripts but also a large number of languages. The purpose of Mr. RMK Sinha and his team was to create a base code for Indian scripts and languages so that computing users would not have to reinvent the wheel and could work with text and Indic computing freely. The standard document itself contained recommendations for the arrangement and use of characters.

Widely adopted by Indian companies even before the first version of Windows

Several Indian companies developed software following the ISCII standard, and a font standard called ISFOC was also widely adopted, even before the first version of Windows. The GIST group was founded by Mr. Mohan Tambe to productize these technologies. The Indian government issued multilingual digital voter ID cards in the early 1990s, and Indian Railways started printing reservation charts in Indian languages. Vivekananda Pani worked on a DOS-based word processor called APEX language processor, based on ISCII, that supported all Indian languages and included a spellchecker.

Challenges with the current set of Standards

The major challenges stem from the three standards that are followed to enable the use of Indian languages on digital devices:

Unicode: impacts the encoding of characters for various Indian scripts.

In 1991, Unicode was released, incorporating character sets already in use as standards in different parts of the world, including the Indian scripts encoded in ISCII. However, Unicode made erroneous adoptions: for instance, the universal encoding, which was listed in a representative column, was adopted as the Devanagari character set. Additionally, the standards document omitted Indian language properties, resulting in noisy data.
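As a hedged illustration of what "noisy data" can look like in practice (this specific case is an assumption chosen for the example, not one named above): in Unicode, the Devanagari letter QA can be stored either as a single code point or as KA followed by a nukta sign. Both render the same, but a naive string comparison treats them as different text unless the data is normalised first. The helper names `code_points` and `same_text` are hypothetical.

```python
import unicodedata

# Two encodings of the same visible Devanagari letter (QA).
precomposed = "\u0958"        # single code point
decomposed = "\u0915\u093C"   # KA followed by the NUKTA sign


def code_points(text):
    """List each code point in the text with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def same_text(a, b):
    """Compare two strings after canonical (NFC) normalisation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


raw_equal = precomposed == decomposed                  # naive comparison
normalized_equal = same_text(precomposed, decomposed)  # after normalisation

print(code_points(precomposed))
print(code_points(decomposed))
print(raw_equal, normalized_equal)
```

Search, deduplication and spellchecking over Indian language text typically need a normalisation step like this before strings are compared.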

OpenType: the font format that holds the glyphs (shapes) and the rules that apply for the selection and placement of glyphs to display characters and combinations.

OpenType is a scalable font format jointly developed by Microsoft and Adobe that is widely used in Windows, Linux, and Android OS. It was the only font format suitable for implementing Unicode support for Indian scripts. However, OpenType has a long list of issues including the inability to make an unambiguous definition of rules for different languages, limitations in implementing appropriate text rendering behavior, and inability to handle anything beyond the rules. Designing and rendering OpenType fonts require expensive software and advanced hardware, which makes them prohibitive for the industry. Furthermore, rendering software that does not support OpenType, such as the Unity game development library, cannot be used for games in Indian languages.

INSCRIPT: the keyboard layout, designed for the 101-key keyboard prevalent at the time of its development.

The devices of today (mostly mobile phones, tablets, etc., with touch keypads) are very different and do not pose the limitations of hardware keyboards. Moreover, the requirement of a common layout for all scripts can be dropped: efficient layouts designed for a specific language make typing in that language faster and can benefit users a lot. Tamil 99 is a great example and has been very widely adopted because of its efficiency in typing specifically for the Tamil language.

Reverie's wishlist for Indic language tech growth

Our CEO & Co-founder Vivekananda Pani talks about his wishlist for Indic languages and the challenges in developing Indian language technology.

The need of the hour

A growing movement is pushing for change. We at Reverie advocate for:

Revised ISCII

Update ISCII to 16 bits to accommodate more characters for Indian scripts. This update should include separate code pages for each script and the character sets taught in schools. Additionally, Unicode or any other character encoding initiative should adopt this Indian standard for Indian scripts used in India.

New font format

Indian scripts are complex and nonlinear, so a defined font format is necessary to avoid illegible display possibilities. A font standard should be defined based on the set of combinations that are finite and unambiguous.

Tailored keyboard layouts

The current INSCRIPT keyboard layout was developed for the 101-key keyboard, but the devices used today are different. Efficient layouts designed for a specific language make typing in that language faster and can benefit users a lot.

De-bundling of Operating Systems (OS) with language packs

This is necessary to facilitate wider proliferation of Indian language content on the internet and to help India move beyond the 0.01% share that Indian language content currently holds on the internet.

Starting with young learners

It’s important to teach students about the three steps in learning the writing system for Indian languages: the Varnamala (alphabet), the barahkhadi (matras with consonants) and the yuktakshars (conjuncts). Students should also learn the necessary ways to use languages in document writing, editing, typesetting, aligning, and text features used specifically for Indian languages to write prose, poetry, accounting, and core subjects. 
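A minimal sketch of the second and third steps for Devanagari, assuming the common twelve-form barahkhadi (bare consonant, ten vowel signs, anusvara and visarga) and conjuncts formed with the virama sign. The names `MATRAS`, `barahkhadi` and `conjunct` are hypothetical, and other scripts or teaching traditions may use a different set of forms.

```python
# Vowel signs (matras) and marks for the twelve barahkhadi forms.
# The empty string stands for the bare consonant with its inherent vowel.
MATRAS = [
    "",        # a (inherent vowel)
    "\u093E",  # aa
    "\u093F",  # i
    "\u0940",  # ii
    "\u0941",  # u
    "\u0942",  # uu
    "\u0947",  # e
    "\u0948",  # ai
    "\u094B",  # o
    "\u094C",  # au
    "\u0902",  # anusvara
    "\u0903",  # visarga
]

VIRAMA = "\u094D"  # suppresses the inherent vowel to join consonants


def barahkhadi(consonant):
    """Return the twelve consonant-plus-matra forms for one consonant."""
    return [consonant + matra for matra in MATRAS]


def conjunct(first, second):
    """Build a two-consonant yuktakshar as first + virama + second."""
    return first + VIRAMA + second


ka_row = barahkhadi("\u0915")        # forms of the consonant KA
ksha = conjunct("\u0915", "\u0937")  # KA + virama + SSA

print(" ".join(ka_row))
print(ksha)
```

The point of the sketch is that each step is a finite, rule-based combination of a small set of units, which is what makes it teachable as a sequence.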

This is not just about technology; it's about:

Preserving cultural heritage
Promoting inclusivity in the digital world
Empowering millions to express themselves and connect in their own tongues

By embracing the unique needs of Indian languages, we can write a new chapter in the story of Indian language computing. Let’s create a digital world that truly welcomes the rich tapestry of India’s voices.

Reverie Language Technologies Limited, a leader in Indian language localisation and user engagement technology solutions for over a decade, is working towards a vision to create Language Equality on the Internet.

Reverie’s language practice is dedicated to helping clients future-proof their rapidly expanding content by combining cutting-edge technologies like Artificial Intelligence and Neural Machine Translation (NMT) with best-practice approaches for optimizing content and business processes.

Copyright © 2024 Reverie Language Technologies Limited All Rights Reserved. 
