Guided Malware Sample Analysis Based on Graph Neural Networks

Yi Hsien Chen; Si Chen Lin; Szu Chun Huang; Chin Laung Lei; Chun Ying Huang*

*Corresponding author for this work

doi:10.1109/TIFS.2023.3283913

Organisation & Governance

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Malicious binaries cause people data and monetary losses, and these binaries continue to evolve rapidly. With large numbers of new, unknown malicious binaries appearing, one essential daily task for security analysts and researchers is to analyze them, effectively identify their malicious parts, and report their critical behaviors. Because manual analysis is slow and ineffective, automated malware report generation is a long-term goal for malware analysts and researchers. This study moves one step toward that goal by identifying essential functions in malicious binaries to accelerate, and potentially automate, the analysis process. We design and implement an expert system, MalwareExpert, based on our proposed graph neural network. The system pinpoints the essential functions of an analyzed sample and visualizes the relationships between the involved parts. We evaluate our approach using executable binaries for the Windows operating system. The evaluation results show that our approach achieves detection performance (97.3% accuracy and 96.5% recall) competitive with existing malware detection models. Moreover, it provides an intuitive, easy-to-understand explanation of the model's predictions by visualizing and correlating essential functions. We compare the essential functions identified by our system against several expert-written malware analysis reports from multiple sources. Our qualitative and quantitative analyses show that the pinpointed functions point analysts in accurate directions. In the best case, the top 2% of functions reported by the system can cover all expert-annotated functions within three steps. We believe that the MalwareExpert system sheds light on automated program behavior analysis.
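The best-case result above (the top 2% of reported functions covering all expert-annotated functions within three steps) can be read as a hop-limited coverage measure over a program's call graph. The Python sketch below is one possible reading under stated assumptions, not the authors' published metric: it assumes that one "step" follows a call edge in either direction, that the system assigns each function an importance score, and that the top percentage is rounded up to at least one function. The function name `hop_coverage`, its parameters, and the toy call graph are hypothetical.

```python
from collections import deque


def hop_coverage(calls, scores, annotated, top_percent=2, max_hops=3):
    """Fraction of expert-annotated functions reachable within `max_hops`
    call-graph steps from the top `top_percent`% (an integer) of functions.

    Hypothetical reading of the coverage result, not the authors' metric.
    """
    if not annotated:
        return 1.0  # nothing to cover
    # Assumption: one "step" follows a call edge in either direction
    # (caller -> callee or callee -> caller).
    adj = {}
    for caller, callee in calls:
        adj.setdefault(caller, set()).add(callee)
        adj.setdefault(callee, set()).add(caller)
    # Rank functions by the assumed per-function importance score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Take ceil(n * top_percent / 100) seed functions, but at least one.
    k = max(1, (len(ranked) * top_percent + 99) // 100)
    seeds = ranked[:k]
    # Multi-source BFS: dist[f] is the hop count from f to the nearest seed.
    dist = {f: 0 for f in seeds}
    queue = deque(seeds)
    while queue:
        f = queue.popleft()
        if dist[f] == max_hops:
            continue  # do not expand beyond the hop budget
        for g in adj.get(f, ()):
            if g not in dist:
                dist[g] = dist[f] + 1
                queue.append(g)
    covered = sum(1 for f in annotated if f in dist)
    return covered / len(annotated)


# Toy example with hypothetical data: a call chain f0 -> f1 -> ... -> f9.
calls = [(f"f{i}", f"f{i + 1}") for i in range(9)]
scores = {f"f{i}": 1.0 / (i + 1) for i in range(10)}  # f0 ranks highest
print(hop_coverage(calls, scores, {"f2", "f3"}))              # 1.0
print(hop_coverage(calls, scores, {"f2", "f3"}, max_hops=2))  # 0.5
```

Running one breadth-first search from all seed functions at once gives every function its shortest hop distance to the nearest seed in a single pass, so the cost stays linear in the size of the call graph.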

Original language	English
Pages (from-to)	4128-4143
Number of pages	16
Journal	IEEE Transactions on Information Forensics and Security
Volume	18
DOIs	https://doi.org/10.1109/TIFS.2023.3283913
Publication status	Published - 2023

Keywords

Graph neural network
machine learning for security
malware analysis
reverse engineering

Cite this

@article{ae1da5ba167349fb89934330eed009fb,
  title = "Guided Malware Sample Analysis Based on Graph Neural Networks",
  abstract = "Malicious binaries have caused data and monetary loss to people, and these binaries keep evolving rapidly nowadays. With tons of new unknown attack binaries, one essential daily task for security analysts and researchers is to analyze and effectively identify malicious parts and report the critical behaviors within the binaries. While manual analysis is slow and ineffective, automated malware report generation is a long-term goal for malware analysts and researchers. This study moves one step toward the goal by identifying essential functions in malicious binaries to accelerate and even automate the analyzing process. We design and implement an expert system based on our proposed graph neural network called MalwareExpert. The system pinpoints the essential functions of an analyzed sample and visualizes the relationships between involved parts. We evaluate our proposed approach using executable binaries in the Windows operating system. The evaluation results show that our approach has a competitive detection performance (97.3% accuracy and 96.5% recall rate) compared to existing malware detection models. Moreover, it gives an intuitive and easy-to-understand explanation of the model predictions by visualizing and correlating essential functions. We compare the identified essential functions reported by our system against several expert-made malware analysis reports from multiple sources. Our qualitative and quantitative analyses show that the pinpointed functions indicate accurate directions. In the best case, the top 2% of functions reported from the system can cover all expert-annotated functions in three steps. We believe that the MalwareExpert system has shed light on automated program behavior analysis.",
  keywords = "Graph neural network, machine learning for security, malware analysis, reverse engineering",
  author = "Chen, {Yi Hsien} and Lin, {Si Chen} and Huang, {Szu Chun} and Lei, {Chin Laung} and Huang, {Chun Ying}",
  year = "2023",
  doi = "10.1109/TIFS.2023.3283913",
  language = "English",
  volume = "18",
  pages = "4128--4143",
  journal = "IEEE Transactions on Information Forensics and Security",
  issn = "1556-6013",
  publisher = "IEEE",
}

TY  - JOUR
T1  - Guided Malware Sample Analysis Based on Graph Neural Networks
AU  - Chen, Yi Hsien
AU  - Lin, Si Chen
AU  - Huang, Szu Chun
AU  - Lei, Chin Laung
AU  - Huang, Chun Ying
PY  - 2023
Y1  - 2023
N2  - Malicious binaries have caused data and monetary loss to people, and these binaries keep evolving rapidly nowadays. With tons of new unknown attack binaries, one essential daily task for security analysts and researchers is to analyze and effectively identify malicious parts and report the critical behaviors within the binaries. While manual analysis is slow and ineffective, automated malware report generation is a long-term goal for malware analysts and researchers. This study moves one step toward the goal by identifying essential functions in malicious binaries to accelerate and even automate the analyzing process. We design and implement an expert system based on our proposed graph neural network called MalwareExpert. The system pinpoints the essential functions of an analyzed sample and visualizes the relationships between involved parts. We evaluate our proposed approach using executable binaries in the Windows operating system. The evaluation results show that our approach has a competitive detection performance (97.3% accuracy and 96.5% recall rate) compared to existing malware detection models. Moreover, it gives an intuitive and easy-to-understand explanation of the model predictions by visualizing and correlating essential functions. We compare the identified essential functions reported by our system against several expert-made malware analysis reports from multiple sources. Our qualitative and quantitative analyses show that the pinpointed functions indicate accurate directions. In the best case, the top 2% of functions reported from the system can cover all expert-annotated functions in three steps. We believe that the MalwareExpert system has shed light on automated program behavior analysis.
AB  - Malicious binaries have caused data and monetary loss to people, and these binaries keep evolving rapidly nowadays. With tons of new unknown attack binaries, one essential daily task for security analysts and researchers is to analyze and effectively identify malicious parts and report the critical behaviors within the binaries. While manual analysis is slow and ineffective, automated malware report generation is a long-term goal for malware analysts and researchers. This study moves one step toward the goal by identifying essential functions in malicious binaries to accelerate and even automate the analyzing process. We design and implement an expert system based on our proposed graph neural network called MalwareExpert. The system pinpoints the essential functions of an analyzed sample and visualizes the relationships between involved parts. We evaluate our proposed approach using executable binaries in the Windows operating system. The evaluation results show that our approach has a competitive detection performance (97.3% accuracy and 96.5% recall rate) compared to existing malware detection models. Moreover, it gives an intuitive and easy-to-understand explanation of the model predictions by visualizing and correlating essential functions. We compare the identified essential functions reported by our system against several expert-made malware analysis reports from multiple sources. Our qualitative and quantitative analyses show that the pinpointed functions indicate accurate directions. In the best case, the top 2% of functions reported from the system can cover all expert-annotated functions in three steps. We believe that the MalwareExpert system has shed light on automated program behavior analysis.
KW  - Graph neural network
KW  - machine learning for security
KW  - malware analysis
KW  - reverse engineering
UR  - http://www.scopus.com/inward/record.url?scp=85161621413&partnerID=8YFLogxK
U2  - 10.1109/TIFS.2023.3283913
DO  - 10.1109/TIFS.2023.3283913
M3  - Article
AN  - SCOPUS:85161621413
SN  - 1556-6013
VL  - 18
SP  - 4128
EP  - 4143
JO  - IEEE Transactions on Information Forensics and Security
JF  - IEEE Transactions on Information Forensics and Security
ER  - 
