Abstract
<jats:p>Diabetes is one of the most common chronic diseases, affecting a large population worldwide. Early identification is important because delayed diagnosis can lead to severe health complications such as heart disease, kidney failure, nerve damage, and vision-related disorders. In recent years, machine learning techniques have been widely used in the healthcare sector for disease prediction and medical data analysis. These techniques help identify hidden patterns in historical patient data and support faster decision-making. This project presents a machine learning-based diabetes prediction system built on patient medical attributes: glucose level, blood pressure, insulin level, body mass index (BMI), age, number of pregnancies, skin thickness, and diabetes pedigree function. The system uses the Pima Indians Diabetes Dataset for training and testing. Exploratory Data Analysis (EDA) was performed to understand the dataset's characteristics and analyze feature distributions before model training. For prediction, a Random Forest Classifier was implemented because of its effectiveness on classification problems and structured healthcare datasets. The dataset was divided into training and testing sets, and feature scaling was applied during preprocessing. The trained model was evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. The model achieved satisfactory prediction performance for diabetes classification. To provide an interactive user experience, the trained model was integrated into a Streamlit-based web application. The application allows users to enter patient health information and receive an instant prediction of diabetes risk. Additional visualization features were included to improve understanding of model performance and prediction behavior. 
The proposed system demonstrates the practical application of machine learning in healthcare prediction systems. The project highlights how data-driven approaches can support preliminary disease risk assessment and contribute toward intelligent healthcare solutions.</jats:p>
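<jats:p>As an illustration of the evaluation step named above, the following minimal sketch shows how accuracy, precision, recall, and F1-score follow from a binary confusion matrix. It is not the project's code: the names ConfusionMatrix, confusion_matrix, and classification_metrics are hypothetical, the labels are made-up placeholder values rather than results from the Pima Indians Diabetes Dataset, and the encoding 1 = diabetic, 0 = non-diabetic is an assumption.</jats:p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (assumed: 1 = diabetic, 0 = non-diabetic)."""
    tp: int  # predicted 1, actual 1
    fp: int  # predicted 1, actual 0
    fn: int  # predicted 0, actual 1
    tn: int  # predicted 0, actual 0


def confusion_matrix(y_true, y_pred):
    """Tally the four outcome counts from parallel lists of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual not in (0, 1) or predicted not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


def classification_metrics(cm):
    """Derive accuracy, precision, recall, and F1 from the counts.

    A ratio with a zero denominator is reported as 0.0 (a convention chosen
    for this sketch).
    """
    total = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / total if total else 0.0
    precision = cm.tp / (cm.tp + cm.fp) if (cm.tp + cm.fp) else 0.0
    recall = cm.tp / (cm.tp + cm.fn) if (cm.tp + cm.fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Placeholder labels for illustration only; not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
metrics = classification_metrics(cm)
print(cm)
print(metrics)
```

<jats:p>In a screening setting, recall is often examined alongside accuracy, because a false negative corresponds to a diabetic patient the model fails to flag.</jats:p>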