
Last updated on: September 6, 2024

My Journey in Indic Computing


In nearly thirty years, Indian language digital technologies and their adoption have evolved in strange ways. Have we progressed enough to be ready for the GenAI race?

The Fascination for CDAC

May 9, 1997.

This was the day I joined CDAC (Centre for Development of Advanced Computing) in Pune and was introduced to Indian language computing. I had no knowledge of or interest in the subject. I was indifferent to it, but very excited about being a part of CDAC, a scientific society of India, a place where some of the greatest minds were developing supercomputers for India because the US had imposed sanctions on us.

The Maverick - To be or not to be

During my school days, I was deeply interested in a variety of things. Space, automobiles and energy topped the list, and I may have devoured every possible book or magazine on these subjects I could get in those days. I designed an electric car in my secondary school diary and tried to run an old Standard Gazel on biogas. My diary was also filled with designs of wind turbines. For my engineering project, I simulated a digitised dashboard for cars of the period. The dashboard introduced a fuel flow meter and could compute runtime fuel efficiency using an 8085 microprocessor. I mention all this to show how far away I was from “languages”.

How a nervous wreck got into CDAC

It was February 1997 when I first walked into the CDAC headquarters on the Pune University campus to appear for the interview. The air was different there. The people I met were so polite, helpful and nice that I wondered if I deserved it. I wondered how an institution with such honour and fame had not bred arrogance in its people. The corridors of CDAC made me feel as if I desperately wanted to be part of the action, in the interest of our nation’s digital being. I wanted to be the squirrel helping build the Ram Setu.

My first interview was with a three-member panel – Shashank Bhat, Anupam Saurabh and Tarun Malviya. I remember that they were very impressed by my certificates in mathematics and physics (which were dated, from my school days). I don’t remember anything else about the interview except that when it was over, Anupam asked me if I had anything to ask. I said I was nervous. They all got curious and said it could be because of the interview. I replied that I was usually not nervous, but that this particular one was certainly very different. They asked why. I said I felt I should be a part of this institution and this effort, and I was nervous thinking about what would happen if I didn’t make it. Years later, Shashank told me that they decided to give me a chance because of that statement of nervousness.

ISCII, ALP, LEAP, ISM – meet my first mentors

During the first few weeks after joining the GIST (Graphics and Intelligence based Script Technology) group, one had to go through the ISCII document thoroughly and use the ALP (Apex Language Processor), LEAP and ISM (ISFOC Script Manager) software. ALP and LEAP were independent word-processing applications for DOS and Windows respectively. ISM let a user use Indian language fonts and type in any other Windows application (there was no native support for any Indian language in any OS or other software until 2001). Understanding the fundamentals was the primary focus, beginning with the character encoding and how it is implemented and eventually used by a user.

Every aspect of editing, searching, sorting and implementing a text processing algorithm was covered. Indian scripts are complex and do not render linearly on screen. So, what a user types and what they should see becomes the most important aspect of the fundamentals. The script grammar and unambiguous display while editing played a major part in the implementation of what was called “Script Technology”. It was impossible to miss its importance, and hence the name GIST. The English language (until then, the default on computers for me) didn’t need a script technology. But there was so much about our languages, so much just in the basics.

The entire computer hardware and standards ecosystem was centred around English. For example, the keyboard had one key for each English letter. The screen was divided into 80 columns and 25 rows based on fixed letter sizes for English. Printers were designed with 9-pin hammers based on the fixed height of a text row for English letters. Even the 1-byte ASCII character standard reserved the values outside the English letters for a variety of functions within different software, networks, protocols and hardware. Everything that had to be done to make Indian languages “usable” on a computer had to work with this hardware by developing new software. For example, no Indian script could work with fixed-width or fixed-height characters. Characters join and get wider, or stack and get taller. The cursor moves over conjuncts, or clusters.

A backspace can remove a consonant and create new clusters that might still be in the midst of being edited and hence must not “become” clusters, while the same logic cannot apply to “delete” because it may end up forming illegal sequences. This meant the technology had to be defined for all such cases. This new software had to be developed around the hardware limitations, and the standards had to factor in all of them. It taught me a lot. Many of those limitations would eventually disappear, but none of the learning about “Script Technology” could have been avoided, no matter where and when it may be applied. And this holistic knowledge, I can confidently say, was common knowledge for every GISTian. It made for an unparalleled team with outstanding values, passion and intellect. The impact that GIST had already made was unprecedented and, so far, unsurpassed.
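The cluster-editing behaviour described above can still be seen today with plain Unicode code points (the GIST work predates Unicode, but the principle is identical). A minimal Python sketch, using the Devanagari conjunct KSHA as an example:

```python
# The conjunct KSHA is stored as three code points:
# KA (U+0915) + VIRAMA (U+094D) + SSA (U+0937), yet renders as ONE cluster.
ksha = "\u0915\u094D\u0937"         # क्ष
print(len(ksha))                     # 3 code points, not 1

# A naive "backspace" that drops one code point leaves a dangling virama,
# i.e. an incomplete cluster still "in the midst of being edited":
print(ksha[:-1] == "\u0915\u094D")   # True: क् (KA + VIRAMA)
```

This is why editors for Indic scripts cannot treat the cursor, backspace and delete as single-code-point operations the way English editors can.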

The Building Blocks

I got the chance to work with Raymond Doctor on developing the transliteration engine, and then on spell checkers for several languages including Odia, my mother tongue, which I developed with Dr. Prafulla Tripathy. My involvement kept getting deeper, and I got to work on almost all the language tools. That deepening involvement may be attributed to my keen interest in understanding how languages worked and in improvising algorithms to get the best possible outcomes. This, I must say, was the DNA at GIST. ALP, a word processor for 11 Indian languages developed years before I joined, supported typing, display, spell checking, text attributes, printing and more, all while running in less than 600KB of memory in the DOS environment. The more I learned, the more I admired the people who had laid the foundations and developed this incredible set of technologies, which now appeared frictionless to use. I became committed to the cause. Indians deserved great tools to use their languages digitally, and I resolved to give my best.

Technology barriers and a lack of standardisation are limiting India’s Internet

In these 27 years, the only break I took from language technologies was the year before we founded Reverie. But in the past two decades, Indian language software has evolved in strange ways. The transition from ISCII to Unicode may be one of the most disconnected ones in technical history. Corpora and work done in ISCII in the past are now called non-standard legacy data and are mostly lost. Compatibility with ISCII-based software and document formats was not considered during the migration. So the software developed on Unicode no longer focused on the aspects specific to Indian languages; or if it did, it was in bits and pieces, depending on the level of alarm that influential users could raise.
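To make the migration problem concrete, here is a minimal Python sketch of what an ISCII-to-Unicode converter involves. The byte values shown are a tiny excerpt of the ISCII-91 Devanagari table quoted from memory and should be verified against the standard (IS 13194:1991); a real converter needs the full table plus handling of ISCII’s ATR/EXT attribute codes and nukta-composed letters, which is exactly the work that was skipped when legacy data was abandoned.

```python
# Hypothetical excerpt of an ISCII-91 (Devanagari) -> Unicode mapping.
# Byte values are illustrative and must be checked against IS 13194:1991.
ISCII_TO_UNICODE = {
    0xB3: "\u0915",  # KA      क
    0xE8: "\u094D",  # HALANT  ्  (virama)
}

def iscii_to_unicode(data: bytes) -> str:
    """Convert ISCII bytes to a Unicode string (illustrative subset only)."""
    out = []
    for b in data:
        if b < 0x80:                          # ASCII range passes through
            out.append(chr(b))
        else:
            out.append(ISCII_TO_UNICODE[b])   # KeyError on unmapped bytes
    return "".join(out)

# KA + HALANT + KA forms the conjunct cluster क्क
print(iscii_to_unicode(bytes([0xB3, 0xE8, 0xB3])))
```

Nothing in this sketch is hard; the loss happened because no one shipped and maintained such converters as part of the migration.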

It is the year 2024 now. In contrast to the adoption we saw almost three decades ago, when computers were prohibitively expensive for many, native language users today struggle to type easily and accurately in their own languages. Those who do type struggle to find good Unicode fonts to publish in. Searches do not find Indian language text efficiently. The corpora collected by researchers are filled with ambiguous text created erroneously by incorrect software standards. Most technocrats have announced that typing in Indian languages is difficult and that we must develop speech interfaces for Indians. The same people who would happily write their diaries or letters home on paper in their own language and script would rather struggle with English letters to send WhatsApp messages in their native languages.

All of this, when almost all the limitations of the early days are gone.

We have made, and will continue to make, progress with technology. But we have distanced ourselves from developing technology the right way. As I write this, Indian language technology and solution developers are racing to build large language models for the languages nearly 1.5 billion people speak and a billion people read and write. With practically negligible useful digital data available, a lot of innovation goes into trying to build such data “artificially”. But data is supposed to be built by people when they engage, and quality data in digital engagements is generated through ease of use. Isn’t that the reason why English is so digitally rich?

I explained this briefly at an event six years ago which was recorded by MediaNama.

– The video was first broadcast by MediaNama

It’s been a long journey. An honest and sincere one. Unfinished still.

I await the day when our country’s internet fulfils the sense of democracy and equitability expressed in these immortal lines penned by Gurudev Rabindranath Tagore:

Where the mind is led forward by thee into ever-widening thought and action, into that heaven of freedom, my Father, let my country awake.
