Internationalization & Unicode® Conference (IUC)
Date: November 1-3, 2016
Venue: Santa Clara, CA, USA
Hear Vivekananda Pani, Co-Founder & CTO, Reverie Language Technologies, speak on:
Why India needs cross-language computing. Most Indians are polyglots and are comfortable using more than one local language at a time. However, only 10% of them are conversant in English. With millions of local-language users willing to be a part of the Indian Internet, developing cross-language computing tools makes way for local-language digital content discovery and consumption.
Challenges in cross-language computing for Indic languages:
At a fundamental level, the existing input methodology followed for Indic scripts allows room for ambiguity and undesired characters. This in itself complicates cross-language information discovery and retrieval. Further, the lack of basic NLP tools such as stemmers, lemmatizers, spelling correctors, and PoS taggers weakens the ability to index and retrieve content in multiple languages. Language detection across multiple scripts and transliteration schemes is an impending challenge. Moreover, less than 0.1% of digital content on the Internet is in languages other than English, which makes it extremely difficult for researchers to build machine learning models.
In this session you will learn:
· Approaches for correcting spelling and transliteration errors in a multilingual environment
· Building NLP tools for Indic languages: Challenges in PoS tagging and Named Entity Recognition
· Building contexts and addressing semantics: Synonymy, Polysemy, and Word Sense Disambiguation
· Towards multilingual content discovery: New approaches to multilingual indexing and relevance ranking
· Language independent NLP: Character Encoding and Deep Neural Networks
· Machine Translation at scale: Can we increase content availability and accessibility?