<jats:title>Abstract</jats:title> <jats:p>The creation of specialized spoken corpora is essential for linguistic research and numerous practical applications, yet it faces significant technical, methodological, and legal challenges. This paper explores three primary approaches to corpus development: <jats:italic>in vivo</jats:italic> (real‐life recordings), <jats:italic>in vitro</jats:italic> (scripted recordings), and <jats:italic>in silicio</jats:italic> (data generated by large language models). While the <jats:italic>in vivo</jats:italic> approach offers unmatched authenticity and naturalness, legal and ethical constraints often render it impractical. The <jats:italic>in vitro</jats:italic> approach provides a controlled and legally compliant alternative that can achieve high ecological validity if the data collection procedure is carefully designed. Meanwhile, <jats:italic>in silicio</jats:italic> corpora offer a scalable and cost‐effective solution, though concerns about their naturalness and linguistic variability persist. Each method carries distinct advantages and limitations, and the choice among them should be guided by the specific goals and constraints of the research project.</jats:p>