Thursday, July 06, 2006

Urdu, Thai, Hungarian, Bengali, Punjabi, Tamil and Yoruba.

Less commonly taught languages.
The goal of this project is to create and share resources to support additional basic research and initial technology development in what have been called Less Commonly Taught Languages. These languages have also been called Low Density, not for the population of native speakers but rather for the scarcity of resources. A typology that distinguishes both population of native speakers and availability of resources might label them High Density/Sparse Resource languages, since the languages of current focus have more than a million speakers but inadequate resources for building human language technologies.
The Linguistic Data Consortium (LDC) was founded in 1992 to provide a new mechanism for large-scale development and widespread sharing of resources for research in linguistic technologies. Based at the University of Pennsylvania, the LDC is a broadly based consortium that now includes more than 100 companies, universities, and government agencies.