Abstract
<jats:p>This doctoral dissertation investigates methods for improving the automatic assessment of second language (L2) speech intelligibility and pronunciation within Computer-Assisted Language Learning (CALL) systems. To address the variability of non-native speech and the scarcity of data, the research explores three main avenues: leveraging linguistically grounded features, refining and predicting multi-dimensional speech intelligibility measures, and applying advanced end-to-end architectures. First, a data-driven classification study demonstrates that standardized acoustic-phonetic features effectively distinguish non-native from native speech. Second, the thesis validates speech intelligibility measures, revealing that visual analogue scale ratings and transcription-based accuracy capture distinct communicative dimensions, both of which can be predicted by automated acoustic models. Third, focusing on pluricentric languages, the research shows that pooling cross-variety speech resources improves automatic speech recognition performance for non-dominant varieties but degrades pronunciation error detection. Finally, the dissertation introduces novel end-to-end frameworks that integrate articulatory features, significantly improving mispronunciation detection accuracy and lowering diagnostic error rates. Overall, this work integrates phonetic knowledge into deep learning architectures to support next-generation automated tutoring systems that provide detailed, subsegmental feedback.</jats:p>