by Jianwen Xu
On Friday, September 7, Kevin Scannell (Professor of Computer Science at Saint Louis University) gave a presentation for an audience of 20 on how the development of new technology furnishes speakers of minority languages with greater opportunities to use their language on various online platforms.
About 7100 languages are spoken around the world, but almost half of them are “endangered” according to UNESCO. Between 2500 and 3000 have some sort of online presence, and fewer than 1000 are still used by their online language communities. Scannell gave some examples of new language technologies, such as optical character recognition. Turning to Irish and other Celtic languages, the focus of his current research, Scannell listed four requirements for making Irish a “Google-able” language: 1. A bigger, better dataset; 2. Machine learning standards tailored to linguistics; 3. Technical capacities within communities; 4. Collaboration with Google, Facebook, Twitter, and other social media platforms. The lecture concentrated on the first two: building bigger datasets for Irish and adapting machine learning standards to Irish linguistics. To build a better dataset for a minority language, Scannell has collected many different forms of minority language presence online, and on Twitter in particular through a project known as “Indigenous Tweets.”
To conclude the lecture, Scannell discussed the standardization of minority languages online. The landscape is shifting toward crowdsourced translation, as with Google Translate. During the Q&A, Scannell mentioned that one trend in the field has been a move to neural network models that do not depend on language-specific details, and he shared success stories of language modeling.
Professor Kevin Scannell presents his research on language technology