Natural Language Processing and Speech Technologies for Central Asian Turkic Languages: A Review of Current Methods, Resources, and Challenges

Authors: Palidan Muhetaer

Publication: Actual Problems of the Present

Published: Dec 29, 2025

Source: Crossref

Abstract

This article provides a comprehensive review of current research on natural language processing (NLP) and speech technologies for Central Asian Turkic languages, including Kazakh, Kyrgyz, and Uzbek. Although a number of theoretical and applied studies have appeared in recent years, these languages are still classified as low-resource. This status stems mainly from the limited availability of annotated text corpora, insufficient speech data, the parallel use of Cyrillic and Latin scripts, and the absence of unified annotation and evaluation standards. The article systematically examines current approaches to morphological segmentation, named entity recognition, sentiment analysis, and automatic speech recognition. Agglutinative morphology and vowel harmony are discussed as key typological features of Turkic languages that strongly shape computational processing strategies, and the paper highlights the effectiveness of both rule-based and neural morphological analyzers. It also describes how computational models originally developed for Turkish, English, and Russian are adapted through subword modeling, character-level embeddings, and multilingual transformer architectures. In addition, cross-lingual transfer learning is evaluated as a promising way to mitigate data scarcity. The study identifies corpus fragmentation, inconsistent annotation schemes, and the lack of standardized speech resources as major challenges. The author argues for the development of open-access datasets, the introduction of shared evaluation tasks, and stronger institutional collaboration between linguists and specialists in computational language technology. The findings are of both theoretical and practical importance for building sustainable and effective language technologies for low-resource languages.
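As a minimal illustration of why agglutinative suffixation and vowel harmony matter for rule-based morphological processing, the sketch below handles a single Kazakh morpheme: the plural suffix. That suffix surfaces in six forms (-лар/-лер, -дар/-дер, -тар/-тер), selected by the backness of the stem's last vowel and by the stem's final sound. This is not the method reviewed in the article. The function names (`harmony`, `kazakh_plural`, `segment_plural`) and the letter sets are hypothetical, and the rules are simplified to native vocabulary in Cyrillic orthography. Loanword exceptions and the Latin-script orthography are not handled.

```python
# Hypothetical, simplified rule-based sketch of Kazakh plural suffixation.
# Assumptions: Cyrillic orthography, native vocabulary only; loanword exceptions ignored.

BACK_VOWELS = set("аоұы")    # select the back-harmony suffix vowel "а"
FRONT_VOWELS = set("әеөүі")  # select the front-harmony suffix vowel "е"
VOWEL_LETTERS = set("аәеөүіоұыиуэюяё")  # a vowel-like final letter (or р, й) selects "л-"

D_FINALS = set("лмнңзж")            # sonorants and voiced fricatives select "д-"
T_FINALS = set("кқпстфхцчшщбвгд")   # voiceless consonants (and б, в, г, д) select "т-"


def harmony(stem: str) -> str:
    """Return 'back' or 'front' from the last harmony-determining vowel of the stem.

    Ambiguous letters (и, у, ю, я, ё, э) are skipped, a simplifying assumption.
    """
    for ch in reversed(stem.lower()):
        if ch in BACK_VOWELS:
            return "back"
        if ch in FRONT_VOWELS:
            return "front"
    raise ValueError(f"no harmony-determining vowel in {stem!r}")


def kazakh_plural(stem: str) -> str:
    """Attach the plural allomorph chosen by vowel harmony and the stem's final letter."""
    if not stem:
        raise ValueError("empty stem")
    last = stem.lower()[-1]
    if last in VOWEL_LETTERS or last in "рй":
        onset = "л"
    elif last in D_FINALS:
        onset = "д"
    elif last in T_FINALS:
        onset = "т"
    else:
        raise ValueError(f"unhandled final letter {last!r} in {stem!r}")
    vowel = "а" if harmony(stem) == "back" else "е"
    return stem + onset + vowel + "р"


# The six surface forms: лар, лер, дар, дер, тар, тер.
PLURAL_SUFFIXES = [c + v + "р" for c in "лдт" for v in "ае"]


def segment_plural(word: str) -> tuple[str, str]:
    """Split off a plural suffix only if regenerating the plural from the stem reproduces the word."""
    for suffix in PLURAL_SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= 2:
            try:
                if kazakh_plural(stem) == word:
                    return stem, suffix
            except ValueError:
                continue  # stem has no usable vowel or final letter
    return word, ""
```

Here `kazakh_plural("кітап")` gives `"кітаптар"` and `kazakh_plural("күн")` gives `"күндер"`. Correspondingly, `segment_plural("кітаптар")` returns `("кітап", "тар")`. Checking each candidate split by regenerating the surface form (analysis by synthesis) keeps the segmenter from splitting a stem such as `"қатар"` that merely ends in one of the six strings. Covering the full suffix inventory of an agglutinative language is a far larger task.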
