Abstract
<jats:title>Summary</jats:title> <jats:sec> <jats:title>Background</jats:title> <jats:p>Artificial intelligence (AI) algorithms have recently achieved high accuracy in diagnosing skin cancer from dermoscopic images. This study compared the diagnostic performance of the large language model ChatGPT‐4 with that of specialized convolutional neural network (CNN)‐based models in analyzing melanocytic lesions.</jats:p> </jats:sec> <jats:sec> <jats:title>Patients and Methods</jats:title> <jats:p>A cross‐sectional comparative study was conducted on 117 dermoscopic images. ChatGPT‐4 was assessed under two conditions: diagnosing lesions directly without annotations and diagnosing after annotating dermoscopic features. Its results were compared with those of CNN‐based models (YPSONO and ResNet) and human expert evaluations. Confusion matrices, diagnostic accuracy, sensitivity, specificity, and interobserver agreement (Cohen's Kappa) were calculated for all models.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>In direct diagnosis, ChatGPT‐4 achieved 92% sensitivity, 89% specificity, and 89.7% accuracy. When annotations were required, sensitivity and specificity dropped to 68% and 64%, respectively. Agreement with experts on dermoscopic patterns was minimal (Cohen's Kappa = 0.13). ChatGPT‐4 outperformed the CNN models in direct diagnosis but showed notable limitations in describing dermoscopic features.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusions</jats:title> <jats:p>ChatGPT‐4 demonstrated promising potential for accurate melanoma‐versus‐nevus classification without annotations, surpassing the CNN‐based models. However, its limited ability to describe dermoscopic features accurately highlights the need for further research and training.</jats:p> </jats:sec>
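The metrics reported above (sensitivity, specificity, accuracy, and Cohen's Kappa) all derive from simple count-based formulas. The sketch below shows how they are computed from a binary melanoma-versus-nevus confusion matrix; the counts used are hypothetical placeholders, since the abstract does not report the study's per-class breakdown of the 117 images.

```python
# Sketch of the metrics named in the abstract, computed from a binary
# (melanoma vs. nevus) confusion matrix. All counts here are hypothetical
# placeholders, not the study's actual data.

def metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # overall correct fraction
    return sensitivity, specificity, accuracy

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n)
                   for l in labels)
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    # Hypothetical counts summing to 117 images, for illustration only.
    sens, spec, acc = metrics(tp=46, fn=4, fp=7, tn=60)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.3f}")
```

A Kappa near 0 (such as the reported 0.13) means agreement barely exceeds what random labeling with the same marginal frequencies would produce, which is why it is described as minimal despite possibly high raw percent agreement.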