Abstract
<jats:p>This article investigates how convolutional neural networks (CNNs) can improve the accuracy and robustness of small unmanned aerial vehicle (UAV) detection in images and video streams, focusing on one-stage real-time detectors from the YOLO family and their lightweight adaptations for edge deployment. It outlines how preserving fine-grained spatial cues during downsampling, strengthening multi-scale feature fusion in the neck and head, and adding selective (cost-aware) attention modules can improve the detection of tiny targets while reducing false alarms caused by birds, clouds, compression artifacts, and cluttered backgrounds. The article examines the core design mechanisms of CNN-based small-object detectors, emphasizing high-resolution branches, efficient feature pyramid topologies, and stable bounding-box regression when objects occupy only a few pixels. To better handle extremely small targets, it highlights low-overhead choices: a P2 scale branch, anti-aliased downsampling, IoU-aware losses that stabilize regression on tiny boxes, calibrated confidence scoring, hard-negative mining, and deployment-oriented quantization and pruning with NPU-friendly operators. The article also discusses evaluation, in particular small-object metrics and the speed–accuracy trade-off that governs practical anti-UAV systems operating under strict latency, memory, and power constraints. It further considers operational challenges: domain shift across landscapes and weather conditions, low-light and infrared scenarios, and the need for temporal consistency in video, where post-processing and tracking can improve stability beyond frame-level performance. By synthesizing recent research trends and practical constraints, the article underscores the need for continued development of CNN-centric design strategies and benchmarking protocols to support reliable real-time UAV detection on resource-limited platforms.</jats:p>
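<jats:p>The point about bounding-box regression on targets that occupy only a few pixels can be illustrated with a short numerical sketch. It is not the article's implementation: the function names are hypothetical, and Generalized IoU (GIoU) is used here only as one familiar example of an IoU-aware measure, since the abstract does not name a specific loss. The sketch shows that a one-pixel diagonal offset costs a 4x4 box far more IoU than a 64x64 box, and that plain IoU is flat at zero for disjoint boxes while GIoU still distinguishes near misses from far ones.</jats:p>

```python
def iou(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def giou(a, b):
    """Generalized IoU: IoU minus the share of the enclosing box not covered by the union."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Smallest axis-aligned box that encloses both inputs.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    enclosing = cw * ch
    return inter / union - (enclosing - union) / enclosing


def shifted_iou(size, shift):
    """IoU between a size x size box and a copy moved diagonally by `shift` pixels."""
    gt = (0.0, 0.0, float(size), float(size))
    pred = (float(shift), float(shift), float(size + shift), float(size + shift))
    return iou(gt, pred)


if __name__ == "__main__":
    # The same one-pixel localization error, on a tiny and on a large target.
    print(f"4x4 box,   1 px shift: IoU = {shifted_iou(4, 1):.3f}")
    print(f"64x64 box, 1 px shift: IoU = {shifted_iou(64, 1):.3f}")

    # Disjoint predictions: IoU cannot rank them, GIoU can.
    gt = (0.0, 0.0, 4.0, 4.0)
    near = (6.0, 0.0, 10.0, 4.0)
    far = (8.0, 0.0, 12.0, 4.0)
    print(f"near miss: IoU = {iou(gt, near):.3f}, GIoU = {giou(gt, near):.3f}")
    print(f"far miss:  IoU = {iou(gt, far):.3f}, GIoU = {giou(gt, far):.3f}")
```

<jats:p>In a training loop such a measure would typically enter the loss as one minus its value. The broader observation is that tiny boxes make overlap-based targets very sensitive to small offsets, which is the motivation for the stabilized regression losses mentioned above.</jats:p>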