'Porjai' speaks your language

AI project is first to recognise dialects across Thailand

"We see Porjai as a ... project where [many] can participate in training the AI model." -- Orathai Sangpetch, VP of research and strategy, CMKL University

The Thai Dialect Artificial Intelligence (AI) model "Porjai" marks a milestone by being the first virtual assistant created for Thai people that uses dialects from across the country.

In addition, a research team drawn from six leading universities has established a joint programme for further AI research in the upcoming academic year.

Orathai Sangpetch, vice president of research and strategy at CMKL University -- a collaboration between Carnegie Mellon University and King Mongkut's Institute of Technology Ladkrabang -- said the project acknowledges difficulties Thais have encountered during the pandemic, especially in regard to online shopping platforms.

Initially, the "Friends in Need of PA" volunteer foundation helped local sellers in remote areas to offer their products on online platforms, as Covid restrictions had limited close contact between vendors and buyers.

Ms Orathai said that over time she was reminded of how young children who cannot read still manage to use virtual assistants like Alexa or Siri to guide them in using applications on smartphones. The research team then set their sights on developing an AI model that understands the Thai language and its dialects. To that end, Porjai was developed and now helps those who make a living through small businesses to take advantage of e-commerce and use their own dialects to converse with the AI model.

Ms Orathai explained that Porjai shares features with Siri, Alexa and Google Assistant, but what sets it apart is that it recognises dialects spoken in every region of Thailand.

She said that there has never been an AI model that recognises Thai dialects.

"We started our research using the Automatic Speech Recognition system to pave the way for product promotion. Locals who speak dialects can input their product details on online platforms without much difficulty," she said.

The AI development programme is meant to build inclusive technology so that Thai people who do not speak with the central-Thai accent can benefit from e-commerce, whereas before either a standardised Thai accent or English was required.

The project began in September 2020. Research teams from provincial universities continue to collect dialect data via the Porjai website, where more than 300,000 dialect voice samples and more than 400,000 common voice samples are now available for everyone to access.

"Porjai's ability to understand dialects is still at a beginner's level. We think of her as a two-year-old baby who has just started learning simple vocabulary and sentences. She still needs a lot of training to recognise more complicated dialect syntax," said Ms Orathai.

She added that dialects are apparent mostly in spoken rather than written forms. So it is truly a challenge for the AI model to generate comprehensive transliterations of dialects.

"The Thai language does not have a punctuation system as in English and it may confuse our AI model. However, we have a professor from Chulalongkorn University whose expertise helps us train Porjai the way people around the world have trained Alexa and Siri," she said.

The developer team has selected four major regional Thai dialects: northern, northeastern, southern and central. Still, lack of adequate data impedes accurate generation of helpful information with AI.

She said fieldwork conducted across Thailand elicited enthusiastic participation by locals of all ages. Elders and university students alike were keen to give their voice samples to improve the dialect corpus, and they were glad to be part of such a useful and promising AI development project.

"We see Porjai as a pioneer project where people in local communities and younger generations can participate in training the AI model. Our corpus has free access. Students, researchers or companies can make use of our data."

Ms Orathai encouraged everyone to visit Porjai's website and input voice samples to expand the AI's speech recognition ability. She hoped that educational institutions and students would participate in the data collection or otherwise help promote the Porjai project to locals in their communities so that one day Porjai will benefit all Thai people regardless of their location or age.

The developers comprise researchers, faculty members and students from CMKL University as well as Chulalongkorn University, Khon Kaen University, Prince of Songkla University and Chiang Mai Rajabhat University. CMKL offers a joint programme in Artificial Intelligence and Computer Engineering (AiCE) starting from the first semester of the 2022 academic year.

Prof Supachai Pathumnakul, the Deputy Permanent Secretary at the Ministry of Higher Education, Science, Research and Innovation (MHESI), said the ministry supports the study of AI technology as the field is in high demand in the workforce.

He said Thailand's Higher Education Sandbox acknowledges the importance of AI technology, with the country now facing increasing digital challenges, and that the AiCE programme will play a big role in reinventing the Thai university system.
