Many of the world’s languages are disappearing. David Chiang wants to do something about that.
Chiang, a research assistant professor at USC Viterbi and a computational linguist at the USC Information Sciences Institute, worries that losing these languages would diminish both scientific knowledge and cultural understanding.
That’s why he spent two weeks in May 2012 leading a language preservation workshop, funded by the National Science Foundation, focused on endangered languages in Papua New Guinea, a South Pacific country of rich cultural and linguistic diversity. Papua New Guinea arguably has the highest language density and diversity on Earth: more than 800 languages, or roughly one for every 215 square miles.
“Each of these languages is a window into a culture as well as into cognition,” Chiang said. “Each of them is worth preserving.”
Joined by his collaborator, Steven Bird of the University of Melbourne, Chiang and his team gathered data on 15 endangered languages in Papua New Guinea.
People from throughout the country, from college students to elders to local workers, took part in the project to document their dying languages. Participants transcribed stories told in their languages into a computer database that catalogued the material.
This data collection contributed to Bird’s ambitious goal of documenting all of the world’s languages.
Chiang’s field is natural language processing, a branch of computer science that studies how computers can process and understand human language and applies that knowledge to practical problems. He often draws on linguistics, the scientific study of language and its structure.
By combining computer science and linguistics on the Papua New Guinea trip, Chiang brought a new approach to documenting these languages. Traditional linguists work with native speakers one at a time, often recording their interviews. Chiang’s method lets multiple speakers document their language at once, each entering words and expressions on a separate computer.
Preserving languages the traditional way takes considerable effort. Linguists must travel extensively, often internationally, and work with native speakers over long periods of time. Only then can they collect the words they judge most important to preserving a language.
The process is costly and time-consuming. Linguists are also highly specialized and few in number, which makes it difficult to record the many languages now disappearing around the world.
By taking an unorthodox approach, Chiang hopes to overcome a challenge linguists have long faced: gathering extensive information with limited time and resources.
The goal is to make language documentation cheaper and faster. In Papua New Guinea, using computers allowed the team to collect more data in less time than would otherwise have been possible.
After three flights totaling 18 hours, Chiang, Bird, and their international team of computational and traditional linguists received a warm welcome in the country.
The town buzzed with excitement as the visitors arrived, especially because they had come to help preserve the local languages.
Chiang hopes the widespread enthusiasm will spark pride in the local languages, helping to preserve and even revitalize them. Giving people a way to document their language with technology makes them more likely to pass it on, he said.
“Documentation is one way to encourage a language to stay preserved,” Chiang said. “A language will only stay alive if its community wants it to stay alive.”
Roughly half of the world’s 7,000 languages are endangered, according to Ethnologue, an online database of the world’s languages. Languages become endangered when 100,000 or fewer people speak them. They disappear when more dominant languages take over or when native speakers die without passing them on.
This could happen to the languages Chiang’s team studied in Papua New Guinea, including Alekano, a language spoken in the Eastern Highlands. Only 25,000 people speak it, fewer than the combined undergraduate and graduate enrollment at USC. Ethnologue classifies Alekano as shifting, which means it is not being successfully passed on to children.
These endangered languages are also scientifically significant. Studying them gives linguists a deeper, fuller understanding of how humans learn and communicate.
“Part of the interest of modern science, including social science, is to understand as much as possible about the human mind,” said Khalil Iskarous, a USC linguist who works with endangered languages in Taiwan. “When languages are dying or already dead, we lose an enormous amount of our capability to understand the human mind for which language is one of the principal tools.”
By the end of the two weeks, Chiang’s team had collected about 20,000 words in a database. That data allowed them to begin building machine translation systems that translate the indigenous languages into English.
Despite this success, Chiang is already focused on how to improve.
The Papua New Guinea trip was mainly exploratory, but Chiang says it is just the beginning. Once it secures funding, his team will start planning its next venture. Possible destinations include Taiwan, Africa, or a return to Papua New Guinea to continue cataloguing the country’s endangered languages.
Chiang wants to focus on fixing some of the problems his team encountered on its recent trip, such as increasing the speed of data collection.
“We learned about what we need to do to take a deeper step into the problem,” Chiang said.
His interest in the intersection of language and computers started at a young age. Growing up before the Internet era, he enjoyed playing “text adventure” computer games. The games had no graphics; players controlled the action by typing text commands, and the game responded by advancing the story.
Another program Chiang enjoyed was a virtual persona named Eliza. Users could type messages to her and she would reply, simulating a conversation. The interaction intrigued him.
As an undergraduate computer science major at Harvard University, Chiang became hooked on linguistics after a single course on the subject. He went on to earn his Ph.D. in computational linguistics and natural language processing at the University of Pennsylvania, where he met Bird, his collaborator on the Papua New Guinea project.
Although the team collected a substantial amount of data, it still lacks enough words and resources to fully decipher these languages. Chiang’s method will require more data collection, but he hopes it will make these languages more accessible to future linguists.
“It’s not as good as talking to live speakers, but a million words of a language that’s gone is far better than nothing,” said Chiang.