She stated that the lack of high-quality resources

Posted: Sat Feb 08, 2025 7:02 am
by Rina7RS
“[T]his unavailability of data is a challenge that we tackled through our scientific work, improving on baseline work by a bigger margin. Another thing is the requirement of high processing power, which, though readily available, comes at a higher price and is not suitable for research most of the time, making it time-consuming (cloud-based servers might be a short-term solution),” Utsa explained.

It’s a similar issue for Vassilina Nikoulina, a researcher at NAVER Labs Europe, who has published a study entitled “SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages.” She stated that the lack of high-quality resources, both for training models and for evaluating them, was a major challenge. She further explained that when resources are available, they are often specialized content, such as translations of religious texts, and can be of quite low quality.

“[The] recent trend for multilingual models partially addresses this problem because low-resource languages benefit from knowledge transfer from high-resource languages when trained jointly. However, in practice, very little attention is paid to the real low-resource languages in such models: most of them are evaluated on high-resource datasets, and low-resource languages represent just a tiny fraction of the training data. We believe that one could benefit much more from the knowledge transfer if various training factors are considered with caution: e.g., the proportion of high-resource languages vs. low-resource languages in the training dataset, model size, and the training procedure. If our final target is low-resource language translation, we should pay attention to the choice of these factors,” Vassilina said.
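
As a rough illustration of the data-balance point in the quote (this is not from the article or the SMaLL-100 paper): temperature-based sampling is one common way to control the proportion of high-resource vs. low-resource languages when building a joint multilingual training set. The language pairs, corpus sizes, and temperature value below are invented for the example.

# Hypothetical sketch: temperature-based sampling over made-up corpus sizes.
corpus_sizes = {"en-fr": 40_000_000, "en-sw": 200_000, "en-gu": 50_000}

def sampling_probs(sizes, temperature=5.0):
    # p_i is proportional to (n_i / N) ** (1 / T): a higher temperature T
    # flattens the distribution, giving low-resource pairs a larger share
    # of each training batch than their raw corpus size would.
    total = sum(sizes.values())
    weights = {pair: (n / total) ** (1.0 / temperature) for pair, n in sizes.items()}
    norm = sum(weights.values())
    return {pair: w / norm for pair, w in weights.items()}

print(sampling_probs(corpus_sizes, temperature=1.0))  # raw corpus proportions
print(sampling_probs(corpus_sizes, temperature=5.0))  # flattened toward low-resource pairs

With a temperature of 1.0 the high-resource pair dominates almost completely; at 5.0 the low-resource pairs receive a much larger share, which is one concrete lever behind the “proportion of high-resource languages vs. low-resource languages in the training dataset” factor Vassilina mentions.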