
Abstract

Signature-based detection and lightweight heuristics increasingly struggle with rapidly evolving malware, while running every file in a sandbox is too costly; practical malware triage therefore requires automated decisions under a strict false-alarm budget. This study aims to improve threat detection efficiency by developing a transferable machine-learning classifier that preserves high malware recall while explicitly controlling false positives and keeping the sandbox workload manageable. We hypothesize that decision thresholds should be selected not by optimizing an average metric on a held-out test split, but via an explicit error budget: out-of-fold predictions on benign files are used to set a blocking threshold such that the number of false positives does not exceed K, and a three-zone policy ("block / send for review / allow") is then deployed. Experiments were conducted on the UCI dataset "Malware static and dynamic features VxHeaven and Virus Total" (6,248 files; 1,084 features; reduced to 244 after removing constant features), with evaluation performed not only under a standard random split but also under two cross-source transfer scenarios (train on VxHeaven, test on VirusTotal, and vice versa), which emulate real-world domain shifts. We compared linear models and tree-based ensembles and additionally examined score calibration (mapping raw model scores to better-behaved probabilities) to support robust thresholding. To provide a conservative, evidence-based assessment of false positives under small benign test samples, we reported exact binomial confidence intervals for the false-positive rate.
The main gain was achieved by the proposed Stage12 policy (K-based thresholding from out-of-fold benign predictions): in the VxHeaven–VirusTotal scenario, recall reached 0.8227 with a sandbox review rate of 0.2092 and zero observed false positives; compared to the baseline gray-zone policy, recall increased by 0.2816 while the review load decreased by a factor of 2.29. In the VirusTotal–VxHeaven scenario, recall reached 0.9670 with a review rate of 0.0735 and an observed false-positive rate of 0.0084; relative to the gray-zone baseline, recall increased by 0.1234 and the review load decreased by a factor of 2.61 at the same observed false-positive level. These results demonstrate that K-budgeted, out-of-fold threshold selection enables an operationally controlled detection regime under domain shift: it improves recall and reduces the need for expensive sandboxing while maintaining a defensible false-alarm control. The scientific novelty is an evidence-backed integration of transfer evaluation, explicit false-positive budgeting, and a three-zone decision policy, where the operating point is determined by a formal error constraint rather than by optimizing a single average score.
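The K-budgeted thresholding, the three-zone policy, and the exact binomial interval described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the strict `>` comparison, and the lower `t_allow` threshold are assumptions made for illustration (the abstract does not say how the allow boundary is chosen), and the interval shown is the standard Clopper-Pearson construction solved by bisection.

```python
from math import comb

def block_threshold(benign_oof_scores, k):
    """Return a threshold t such that at most k benign out-of-fold
    scores are strictly greater than t (hypothetical helper)."""
    ranked = sorted(benign_oof_scores, reverse=True)
    if k >= len(ranked):
        return float("-inf")  # budget covers every benign file
    # Only scores strictly above the (k+1)-th largest get blocked.
    return ranked[k]

def triage(score, t_block, t_allow):
    """Three-zone decision; t_allow is an assumed, separately chosen bound."""
    if score > t_block:
        return "block"
    if score > t_allow:
        return "review"
    return "allow"

def _binom_cdf(x, n, p):
    # P(X <= x) for X ~ Binomial(n, p); intended for small x.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))

def _solve_p(x, n, target):
    # The CDF decreases in p, so bisect for the p where it equals target.
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if _binom_cdf(x, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x false positives in n benign files."""
    lower = 0.0 if x == 0 else _solve_p(x - 1, n, 1 - alpha / 2)
    upper = 1.0 if x == n else _solve_p(x, n, alpha / 2)
    return lower, upper
```

A caller would compute `block_threshold` once from benign out-of-fold scores on the training source, apply `triage` to scores from the target source, and then pass the observed false-positive count and the benign test-set size to `clopper_pearson`. With zero observed false positives the upper bound is still well above zero for small benign samples, which is why the abstract reports the interval rather than the point estimate alone.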


Keywords

recall; review; false positives
