Abstract
<jats:p>This textbook presents the basic concepts and results of the mathematical theory of reinforcement learning. The author defines the Markov decision process (MDP), the central concept of reinforcement learning theory, formulates and proves its basic properties, introduces the concept of a policy for an MDP, and provides an algorithm for constructing an optimal policy for a given MDP. The textbook covers the material on reinforcement learning presented in the interfaculty course “Mathematical Foundations of Machine Learning and Forecasting”, given by the author at Moscow State University in the 2024-2025 academic year. It will be of interest to students and specialists in the field of artificial intelligence.</jats:p>