Building Specialized Corpora of Spoken Business Language

Authors: Piotr Pęzik, Anna Cichosz, Mikołaj Deckert

Publication: The Encyclopedia of Applied Linguistics

Published: Mar 26, 2026

Source: Crossref

Abstract

The creation of specialized spoken corpora is essential for linguistic research and numerous practical applications, yet it faces significant technical, methodological, and legal challenges. This paper explores three primary approaches to corpus development: in vivo (real-life recordings), in vitro (scripted recordings), and in silicio (data generated by large language models). While in vivo offers unmatched authenticity and naturalness, legal and ethical constraints often render this method impractical. The in vitro approach provides a controlled and legally compliant alternative, with potential for high ecological validity if the data collection procedure is carefully designed. Meanwhile, in silicio corpora present a scalable and cost-effective solution, though concerns about naturalness and linguistic variability persist. Each method carries distinct advantages and limitations, and the choice among them should be guided by the specific goals and constraints of the research project.

Keywords

corpora, linguistic research, legal, in vivo
