Abstract
<jats:p>This paper investigates how characteristics of the software development process affect the predictive performance of machine learning models that estimate issue resolution time. The study uses anonymized open datasets from the Hyperledger, JFrog, and Mojang projects, derived from the issue tracking systems used in software development. Random Forest, Gradient Boosting, and CatBoost models were used for prediction. The machine learning approaches consistently outperformed a naive baseline that predicts the mean value of the target variable. Mean Absolute Error (MAE) was reduced by 37.7–75.5% depending on the dataset, with the best result on the JFrog dataset, where MAE decreased from 14,948 to 3,665 seconds. Feature importance analysis showed that process-related characteristics contribute most to prediction quality, in particular the number of status changes, the number of participants involved in task execution, and the time to first progress. For the most influential process features, permutation importance values reached 257–540, substantially exceeding the contribution of static task attributes such as issue type and priority. The datasets differ in their degree of process formalization: for the highly structured JFrog records, the coefficient of determination reached 0.76, while for Mojang it did not exceed 0.32. This variability indicates that the prediction accuracy and explanatory power of ML models depend directly on how completely the tracking system logs events throughout the task lifecycle. The most significant features and their principal distinctions from other attributes are identified and discussed.</jats:p>
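<jats:p>As an illustration of how the reported MAE reduction can be read, the following minimal sketch computes MAE, the MAE of a mean-value baseline, and the relative reduction between the two. The function names are hypothetical and are not taken from the study's code. Only the two JFrog MAE values (14,948 and 3,665 seconds) come from the text above.</jats:p>

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mean_baseline_mae(y_train, y_test):
    """MAE of a naive model that always predicts the training-set mean."""
    baseline = fmean(y_train)
    return mean_absolute_error(y_test, [baseline] * len(y_test))


def relative_reduction(baseline_mae, model_mae):
    """Percentage by which the model lowers MAE relative to the baseline."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# JFrog figures quoted in the abstract (seconds).
jfrog_reduction = relative_reduction(14948, 3665)
print(f"{jfrog_reduction:.1f}%")  # → 75.5%
```

<jats:p>Under these assumptions, the JFrog figures reproduce the upper end of the reported 37.7–75.5% range.</jats:p>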