Innovative AI Solutions for Preserving Endangered Languages

Endangered Languages and the Role of AI in Preservation

Language is a fundamental pillar of human culture. It is more than a medium for communication; it carries history, culture, and legacy. However, as Ross Perlin, co-director of the Endangered Language Alliance, highlights, nearly half of the world's 7,000 languages are now endangered and could vanish within the next century. Such a loss would erase not only the languages themselves but also the traditions and stories tied to them. Artificial intelligence (AI) now offers new ways to address this situation.

Language bans once enforced by governments in countries like Canada and the U.S. have contributed to this crisis. Perlin argues in his book, "Language City," that this history is a significant motivation for the modern linguists and Indigenous communities working to preserve ancestral tongues. According to him, there is a moral imperative to act, because the extinction of languages is often imposed rather than the result of natural evolution.

Tech giants like Google and IBM have joined forces with endangered language organizations to breathe new life into these tongues using AI and advanced language models. Yet, the task is far more daunting for languages with minimal digital footprints, known as low-resource languages.

A Different Type of Bot: Revitalizing Owens Valley Paiute

The story of Jared Coleman, a member of the Big Pine Paiute Tribe in California, offers an encouraging example of overcoming these challenges. Coleman came across an archive of the Owens Valley Paiute language containing dictionaries and audio recordings, including a recording of his ancestor. He felt a profound personal connection to the material but understood little of it, and that inspired him to learn the language.

Combining a passion for his heritage with skills in computer science, Coleman developed an online dictionary. Later, he saw potential in using large language models like ChatGPT to help people learn Paiute. However, the large datasets that such models typically require did not exist for the language.

Undeterred, Coleman took a different approach with collaborators at the University of Southern California. Rather than relying on extensive databases of example sentences, they used a new machine translation strategy built on rule-based instructions that teach the AI the grammar and vocabulary of the language.
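To make the idea of rule-based translation concrete, here is a minimal sketch in Python. It is not Coleman's system: the lexicon entries are invented placeholder tokens rather than real Owens Valley Paiute words, and the subject-object-verb word order, the suffixes, and the names `SimpleSentence` and `translate` are all assumptions made for illustration. It shows only the general principle that explicit rules and a dictionary can produce output without a large corpus of example sentences.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder lexicon: these forms are invented for illustration and are
# NOT real Owens Valley Paiute vocabulary.
LEXICON = {
    "dog": "N-dog",
    "water": "N-water",
    "drink": "V-drink",
    "see": "V-see",
}

# Hypothetical grammar rules: a tense suffix on the verb and an object marker.
TENSE_SUFFIX = {"present": "-PRS", "past": "-PST"}
OBJECT_SUFFIX = "-OBJ"


@dataclass(frozen=True)
class SimpleSentence:
    """A structured sentence the rules know how to handle."""
    subject: str
    verb: str
    obj: Optional[str] = None
    tense: str = "present"


def translate(sentence: SimpleSentence) -> str:
    """Build a target-language string from explicit rules only."""
    words = [sentence.subject, sentence.verb]
    if sentence.obj is not None:
        words.append(sentence.obj)

    # Refuse to guess: unknown vocabulary is an error, not a hallucination.
    missing = [w for w in words if w not in LEXICON]
    if missing:
        raise KeyError(f"no lexicon entry for: {', '.join(missing)}")
    if sentence.tense not in TENSE_SUFFIX:
        raise ValueError(f"unsupported tense: {sentence.tense}")

    # Assumed word order for this sketch: subject, object, verb.
    parts = [LEXICON[sentence.subject]]
    if sentence.obj is not None:
        parts.append(LEXICON[sentence.obj] + OBJECT_SUFFIX)
    parts.append(LEXICON[sentence.verb] + TENSE_SUFFIX[sentence.tense])
    return " ".join(parts)


if __name__ == "__main__":
    print(translate(SimpleSentence("dog", "drink", "water", tense="past")))
```

The design choice worth noting is that the function raises an error when a word or tense falls outside its rules instead of producing a plausible-looking guess, which is one way a rule-based tool can favor accuracy over coverage.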

By bringing native speakers into the process, Coleman found a powerful teaching tool. The model, guided by native speakers rather than bulk data, is designed to avoid generating erroneous translations and to prioritize linguistic accuracy. The tool is still at a foundational stage and aims to help future generations embrace their native language, while Coleman's online dictionary, Kubishi, remains available as a resource.

Setting the Benchmark: The Māori Journey

Parallel to efforts in the California desert, Māori communities in New Zealand have also turned to AI to preserve Te Reo Māori, their ancestral language. Historical policies such as the Native Schools Act of 1867 severely restricted its use. The Māori Language Act of 1987, however, finally provided legal recognition, invigorating preservation efforts.

Te Hiku Media, a pioneering Māori broadcast company, spearheaded an initiative in 2016 to develop an automatic speech recognition model. Drawing on an audio archive spanning three decades, the company aimed to digitize the language. Still, it had to confront the difficulty of adapting tech tools to a low-resource language.

The solution required native speakers to annotate speech data manually, drawing on their cultural and contextual knowledge. This meticulous process resulted in a speech recognition model with 92% accuracy. Such precision helps ensure that Te Reo is represented faithfully in digital form and avoids the transcription errors seen in other models.
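The article does not say how the 92% figure was measured. One common way to score a speech recognition model is word error rate (WER), the word-level edit distance between a human reference transcript and the model's output divided by the length of the reference, with word accuracy taken as one minus WER. The sketch below assumes that metric; the function name `word_error_rate` and the sample sentences are illustrative and do not come from Te Hiku Media.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the archive holds many hours of recorded speech"
    hypothesis = "the archive holds many hours of recorded peach"
    wer = word_error_rate(reference, hypothesis)
    print(f"WER: {wer:.3f}, word accuracy: {1 - wer:.3f}")
```

Under this metric, the annotated transcripts prepared by native speakers would serve as the references against which the model's output is scored.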

Owning the Future: The Significance of Data Ownership

Beyond developing tools, ownership of linguistic data has emerged as a pivotal theme among Indigenous communities. Controlling this data is a form of self-determination, a view strongly voiced by Peter-Lucas Jones, CEO of Te Hiku Media. Ownership allows these communities to ensure that their language archives serve beneficial, empowering purposes and do not repeat the exploitative, academia-focused projects of the past.

This emphasis on data agency is intended to protect communities and to ensure that the resulting tools continue to serve educational and cultural enrichment. As technology evolves, the hope is that this shift will help sustain the wealth of human languages and the cultural stories interwoven with them.

Source: Original Source
