Bonan Ruan

阮博男

Ph.D. Candidate, School of Computing

National University of Singapore

Advised by Prof. Zhenkai Liang

I am a third-year Ph.D. candidate at NUS. My research sits at the intersection of agentic system security, software supply chains, and program analysis. Recent work studies LLM-induced risks in GitHub CI workflows, anomaly detection for LLM-based agents, malicious MCP servers, vulnerability propagation, and practical security tooling for complex software systems.

Education

National University of Singapore

Ph.D. Candidate, School of Computing

Jan. 2024 - present

National University of Singapore

Master of Computing

2022 - 2023

Tongji University

B.E. in Information Security, with one year of German language study

2014 - 2019

Honors & Awards

  • Research Achievement Award, School of Computing, NUS, 2025
  • Distinguished Paper Award, USENIX Security, 2025
  • Best Practical Paper Award, RAID, 2024

Selected Publications

Representative papers with abstracts, BibTeX, and project materials.

Full publication list

[Preprint] Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows

Bonan Ruan, Yeqi Fu, Chuqi Zhang, Jiahao Liu, Jun Zeng, Zhenkai Liang

arXiv 2026

@article{ruan2026heimdallr,
  title={Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows},
  author={Ruan, Bonan and Fu, Yeqi and Zhang, Chuqi and Liu, Jiahao and Zeng, Jun and Liang, Zhenkai},
  journal={arXiv preprint arXiv:2605.05969},
  year={2026}
}

GitHub Continuous Integration (CI) workflows increasingly integrate Large Language Models (LLMs) to automate review, triage, content generation, and repository maintenance. This creates a new attack surface: externally controllable workflow inputs can shape LLM prompts and outputs, which may in turn affect security decisions, repository state, or privileged execution. Although LLM security and CI security have each been studied extensively, their intersection remains underexplored. In this paper, we present the first study of LLM-induced security risks in GitHub CI workflows. We characterize the problem along the full execution chain and develop a taxonomy of high-level risk classes and concrete threat vectors. To detect such risks in practice, we design Heimdallr, a hybrid analysis framework that normalizes workflows into an LLM-Workflow Property Graph (L-WPG) and combines triggerability analysis, LLM-assisted dataflow summarization, and deterministic propagation to synthesize concrete threat-vector findings. Evaluated on 300 manually annotated unique workflows, Heimdallr achieves high accuracy on LLM-node identification (F1 = 0.994), triggerability classification (99.8%), and threat-vector detection (micro-average F1 = 0.917). As part of an ongoing detection and disclosure effort, we have so far responsibly disclosed 802 vulnerable workflow instances across 759 repositories and received 71 acknowledgments.
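Heimdallr's L-WPG and its analyses are not reproduced here. As a rough illustration of the general idea behind the propagation step only, the sketch below treats a workflow as a tiny hypothetical dataflow graph and reports paths that start at an externally controllable input, pass through an LLM node, and reach a privileged sink. Apart from `github.event.issue.body` (a real GitHub Actions context), every node name and edge is an assumption, and triggerability analysis and LLM-assisted summarization are not modeled.

```python
from collections import deque

# Hypothetical toy workflow graph: an edge A -> B means "data from A flows into B".
EDGES = {
    "github.event.issue.body": ["llm_step.prompt"],
    "llm_step.prompt": ["llm_step.output"],
    "llm_step.output": ["run_step.script"],
    "github.event.repository.name": ["log_step.message"],
}
SOURCES = {"github.event.issue.body", "github.event.repository.name"}  # attacker-influenced
LLM_NODES = {"llm_step.prompt"}  # hypothetical LLM invocation
SINKS = {"run_step.script"}      # hypothetical privileged execution


def find_llm_mediated_flows(edges, sources, llm_nodes, sinks):
    """Return sorted (source, sink) pairs connected by a path that crosses an LLM node."""
    findings = set()
    for src in sources:
        # BFS state: (node, whether the path so far has crossed an LLM node).
        start = (src, src in llm_nodes)
        queue, seen = deque([start]), {start}
        while queue:
            node, via_llm = queue.popleft()
            if node in sinks and via_llm:
                findings.add((src, node))
            for nxt in edges.get(node, []):
                state = (nxt, via_llm or nxt in llm_nodes)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return sorted(findings)


print(find_llm_mediated_flows(EDGES, SOURCES, LLM_NODES, SINKS))
# [('github.event.issue.body', 'run_step.script')]
```

Tracking "crossed an LLM node" as part of the search state keeps the propagation deterministic: the repository-name flow is ignored because it never passes through the LLM step.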

[ICLR 2026] DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle

Yuheng Tang*, Kaijie Zhu*, Bonan Ruan, Chuqi Zhang, Michael Yang, Hongwei Li, Suyue Guo, Tianneng Shi, Zekun Li, Christopher Kruegel, Giovanni Vigna, Dawn Song, William Yang Wang, Lun Wang, Yangruibo Ding, Zhenkai Liang, Wenbo Guo

14th International Conference on Learning Representations

@inproceedings{tang2026devopsgym,
  title={DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle},
  author={Yuheng Tang and Kaijie Zhu and Bonan Ruan and Chuqi Zhang and Michael Yang and Hongwei Li and Suyue Guo and Tianneng Shi and Zekun Li and Christopher Kruegel and Giovanni Vigna and Dawn Song and William Yang Wang and Lun Wang and Yangruibo Ding and Zhenkai Liang and Wenbo Guo},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=bP48r4dt7Z}
}

Although AI agents have demonstrated extraordinary capabilities in code generation and software issue resolution, their capabilities across the full software DevOps cycle remain unknown. Unlike pure code generation, handling the DevOps cycle of real-world software, including development, deployment, and management, requires analyzing large-scale projects, understanding dynamic program behaviors, leveraging domain-specific tools, and making sequential decisions. However, existing benchmarks focus on isolated problems and lack environments and tool interfaces for DevOps. We introduce DevOps-Gym, the first end-to-end benchmark for evaluating AI agents across core DevOps workflows: build and configuration, monitoring, issue resolving, and test generation. DevOps-Gym includes 700+ real-world tasks collected from 30+ projects in Java and Go. We develop a semi-automated data collection mechanism, backed by rigorous and non-trivial expert effort, to ensure task coverage and quality. Our evaluation of state-of-the-art models and agents reveals fundamental limitations: they struggle with issue resolving and test generation in Java and Go, and remain unable to handle new tasks such as monitoring and build and configuration. These results highlight an essential need for research on automating the full DevOps cycle with AI agents.

[Preprint] TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection

Jiahao Liu, Bonan Ruan, Xianglin Yang, Zhiwei Lin, Yan Liu, Yang Wang, Tao Wei, Zhenkai Liang

arXiv 2025

@article{liu2025traceaegis,
  title={TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection},
  author={Liu, Jiahao and Ruan, Bonan and Yang, Xianglin and Lin, Zhiwei and Liu, Yan and Wang, Yang and Wei, Tao and Liang, Zhenkai},
  journal={arXiv preprint arXiv:2510.11203},
  year={2025}
}

LLM-based agents have demonstrated promising adaptability in real-world applications. However, these agents remain vulnerable to a wide range of attacks, such as tool poisoning and malicious instructions, that compromise their execution flow and can lead to serious consequences like data breaches and financial loss. Existing studies typically attempt to mitigate such anomalies by predefining specific rules and enforcing them at runtime to enhance safety. Yet, designing comprehensive rules is difficult, requiring extensive manual effort and still leaving gaps that result in false negatives. As agent systems evolve into complex software systems, we take inspiration from software system security and propose TraceAegis, a provenance-based analysis framework that leverages agent execution traces to detect potential anomalies. In particular, TraceAegis constructs a hierarchical structure to abstract stable execution units that characterize normal agent behaviors. These units are then summarized into constrained behavioral rules that specify the conditions necessary to complete a task. By validating execution traces against both hierarchical and behavioral constraints, TraceAegis is able to effectively detect abnormal behaviors. To evaluate the effectiveness of TraceAegis, we introduce TraceAegis-Bench, a dataset covering two representative scenarios: healthcare and corporate procurement. Each scenario includes 1,300 benign behaviors and 300 abnormal behaviors, where the anomalies either violate the agent’s execution order or break the semantic consistency of its execution sequence. Experimental results demonstrate that TraceAegis achieves strong performance on TraceAegis-Bench, successfully identifying the majority of abnormal behaviors. We further validate TraceAegis’ practicality through an internal red-teaming process conducted within a technology company, where it effectively detects abnormal traces generated by red-team attacks.
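To make "validating traces against order and behavioral constraints" concrete, here is a deliberately simplified stand-in, not TraceAegis itself: order constraints are approximated by the set of step-to-step transitions seen in benign traces, and behavioral rules by hand-written preconditions. The procurement tool names and the rule are hypothetical, and semantic-consistency anomalies are not captured by this sketch.

```python
from typing import Dict, List, Sequence, Set, Tuple

Transition = Tuple[str, str]


def learn_transitions(benign_traces: List[List[str]]) -> Set[Transition]:
    """Collect every consecutive (step, next_step) pair seen in benign traces."""
    allowed: Set[Transition] = set()
    for trace in benign_traces:
        steps = ["<start>"] + trace + ["<end>"]
        allowed.update(zip(steps, steps[1:]))
    return allowed


def check_trace(trace: List[str], allowed: Set[Transition],
                preconditions: Dict[str, Sequence[str]]) -> List[str]:
    """Return violations; an empty list means the trace matches the learned behavior."""
    violations = []
    # Order check: every transition must have appeared in some benign trace.
    steps = ["<start>"] + trace + ["<end>"]
    for a, b in zip(steps, steps[1:]):
        if (a, b) not in allowed:
            violations.append(f"unseen transition {a} -> {b}")
    # Rule check: a step may only run after the steps it requires have run.
    done = set()
    for step in trace:
        for required in preconditions.get(step, ()):
            if required not in done:
                violations.append(f"{step} ran before required {required}")
        done.add(step)
    return violations


# Hypothetical procurement-agent tool calls (names invented for illustration).
BENIGN = [
    ["search_vendor", "get_quote", "approve_budget", "place_order"],
    ["search_vendor", "compare_quotes", "get_quote", "approve_budget", "place_order"],
]
RULES = {"place_order": ["approve_budget"]}
ALLOWED = learn_transitions(BENIGN)

print(check_trace(["search_vendor", "get_quote", "place_order"], ALLOWED, RULES))
# ['unseen transition get_quote -> place_order', 'place_order ran before required approve_budget']
```

The skipped budget approval is caught twice: once as an ordering anomaly and once as a broken precondition, which mirrors why combining both kinds of constraint helps.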

[ASE 2025] Propagation-Based Vulnerability Impact Assessment for Software Supply Chains

Bonan Ruan, Zhiwei Lin, Jiahao Liu, Chuqi Zhang, Kaihang Ji, Zhenkai Liang

40th IEEE/ACM International Conference on Automated Software Engineering

@inproceedings{ruan2025vpss,
  title={Propagation-Based Vulnerability Impact Assessment for Software Supply Chains},
  author={Ruan, Bonan and Lin, Zhiwei and Liu, Jiahao and Zhang, Chuqi and Ji, Kaihang and Liang, Zhenkai},
  booktitle={Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering},
  pages={65--77},
  year={2025}
}

Identifying the impact scope and scale is critical for software supply chain vulnerability assessment. However, existing studies face substantial limitations. First, prior studies either work at coarse package-level granularity, producing many false positives, or fail to perform whole-ecosystem vulnerability propagation analysis. Second, although vulnerability assessment indicators like CVSS characterize individual vulnerabilities, no metric exists to specifically quantify the dynamic impact of vulnerability propagation across software supply chains. To address these limitations and enable accurate and comprehensive vulnerability impact assessment, we propose a novel approach: (i) a hierarchical worklist-based algorithm for whole-ecosystem and call-graph-level vulnerability propagation analysis and (ii) the Vulnerability Propagation Scoring System (VPSS), a dynamic metric to quantify the scope and evolution of vulnerability impacts in software supply chains. We implement a prototype of our approach in the Java Maven ecosystem and evaluate it on 100 real-world vulnerabilities. Experimental results demonstrate that our approach enables effective ecosystem-wide vulnerability propagation analysis, and provides a practical, quantitative measure of vulnerability impact through VPSS.
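A toy sketch of a two-level worklist, assuming a hypothetical four-package ecosystem: package-level dependency edges decide which packages to inspect next, and method-level call edges decide whether the vulnerability actually propagates. This is an illustration of the general technique, not the paper's algorithm; it ignores call chains inside a package and does not compute VPSS.

```python
from collections import deque

# Hypothetical package-level graph: package -> packages that depend on it.
DEPENDENTS = {
    "lib-a": ["lib-b", "lib-c"],
    "lib-b": ["app-d"],
    "lib-c": ["app-d"],
    "app-d": [],
}

# Hypothetical method-level call graph: "package:method" -> callees.
CALLS = {
    "lib-b:parse": ["lib-a:decode"],   # reaches the vulnerable method
    "lib-b:helper": [],
    "lib-c:render": ["lib-a:escape"],  # uses lib-a, but not the vulnerable method
    "app-d:main": ["lib-b:parse", "lib-c:render"],
}


def affected_methods(vuln_method):
    """Propagate a vulnerable method through dependents, confirming each hop by call edges."""
    pkg = vuln_method.split(":")[0]
    affected = {pkg: {vuln_method}}
    worklist = deque([pkg])
    while worklist:
        current = worklist.popleft()
        for dep in DEPENDENTS.get(current, []):
            known = affected.setdefault(dep, set())
            # Methods of `dep` that directly call an affected method of `current`.
            hits = {m for m, callees in CALLS.items()
                    if m.startswith(dep + ":") and affected[current] & set(callees)}
            if hits - known:
                known |= hits           # updates the set stored in `affected`
                worklist.append(dep)    # re-examine dep's own dependents
    return {p: sorted(ms) for p, ms in affected.items() if ms}


print(affected_methods("lib-a:decode"))
# {'lib-a': ['lib-a:decode'], 'lib-b': ['lib-b:parse'], 'app-d': ['app-d:main']}
```

Note that `lib-c` depends on `lib-a` but never calls `lib-a:decode`, so it is not reported; a purely package-level analysis would flag it, which is the kind of false positive the call-graph level is meant to remove.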

[USENIX Security 2025] Fuzzing the PHP Interpreter via Dataflow Fusion

Yuancheng Jiang, Chuqi Zhang, Bonan Ruan, Jiahao Liu, Manuel Rigger, Roland Yap, Zhenkai Liang

34th USENIX Security Symposium

Distinguished Paper Award

@inproceedings{jiang2025fuzzing,
  title={Fuzzing the PHP Interpreter via Dataflow Fusion},
  author={Jiang, Yuancheng and Zhang, Chuqi and Ruan, Bonan and Liu, Jiahao and Rigger, Manuel and Yap, Roland HC and Liang, Zhenkai},
  booktitle={34th USENIX Security Symposium (USENIX Security 25)},
  pages={6143--6158},
  year={2025}
}

PHP, a dominant scripting language in web development, powers a vast range of websites, from personal blogs to major platforms. While existing research primarily focuses on PHP application-level security issues like code injection, memory errors within the PHP interpreter have been largely overlooked. These memory errors, prevalent due to the PHP interpreter's extensive C codebase, pose significant risks to the confidentiality, integrity, and availability of PHP servers. This paper introduces FlowFusion, the first automatic fuzzing framework to detect memory errors in the PHP interpreter. FlowFusion leverages dataflow as an efficient representation of test cases maintained by PHP developers, merging two or more test cases to produce fused test cases with more complex code semantics. Moreover, FlowFusion employs strategies such as test mutation, interface fuzzing, and environment crossover to find more bugs. In our evaluation, FlowFusion found 158 unknown bugs in the PHP interpreter, with 125 fixed and 11 confirmed. Compared with the official test suite and a naive test concatenation approach, FlowFusion detects new bugs that these methods miss, while also achieving greater code coverage. FlowFusion also outperformed state-of-the-art fuzzers AFL++ and Polyglot, covering 24% more lines of code after 24 hours of fuzzing. FlowFusion has gained wide recognition among PHP developers and is now integrated into the official PHP toolchain.
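A naive sketch of fusing two PHP tests by connecting their dataflows, assuming a simple regex view of PHP variables: test_b's variables are renamed to avoid clashes, then every use of one test_b variable after its first occurrence is redirected to a value produced by test_a. FlowFusion's actual dataflow extraction and fusion strategy are more involved and are not reproduced here; the `fuse` helper and the `_b` suffix are hypothetical.

```python
import random
import re

# Matches PHP variable names such as $x or $arr_1 (simplified: ASCII only).
VAR = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")


def fuse(test_a: str, test_b: str, rng: random.Random) -> str:
    """Run test_a, then test_b with one of its variables redirected to test_a's data."""
    # Suffix test_b's variables so the two tests cannot clash by accident.
    renamed_b = VAR.sub(lambda m: m.group(0) + "_b", test_b)
    vars_a = sorted(set(VAR.findall(test_a)))
    vars_b = sorted(set(VAR.findall(renamed_b)))
    if not vars_a or not vars_b:
        return test_a + "\n" + renamed_b  # nothing to connect: plain concatenation
    src, dst = rng.choice(vars_a), rng.choice(vars_b)
    occurrences = 0

    def redirect(match):
        nonlocal occurrences
        occurrences += 1
        # Keep the first occurrence (usually dst's definition); later uses read src.
        return match.group(0) if occurrences == 1 else src

    fused_b = re.sub(re.escape(dst) + r"\b", redirect, renamed_b)
    return test_a + "\n" + fused_b


A = '$s = str_repeat("a", 10);\necho strlen($s);'
B = '$arr = [3, 1, 2];\nsort($arr);\nvar_dump($arr);'
print(fuse(A, B, random.Random(0)))
```

With these two inputs, the fused test ends with `sort($s);` and `var_dump($s);`, so the string built by the first test flows into calls that the second test originally exercised with an array, which is the kind of unusual semantics fusion aims to produce.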

[RAID 2024] KernJC: Automated Vulnerable Environment Generation for Linux Kernel Vulnerabilities

Bonan Ruan, Jiahao Liu, Chuqi Zhang, Zhenkai Liang

27th International Symposium on Research in Attacks, Intrusions and Defenses

Best Practical Paper Award · Black Hat Asia 2025 Briefings

@inproceedings{ruan2024kernjc,
  title={{KernJC}: Automated Vulnerable Environment Generation for {Linux} Kernel Vulnerabilities},
  author={Ruan, Bonan and Liu, Jiahao and Zhang, Chuqi and Liang, Zhenkai},
  booktitle={Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses},
  pages={384--402},
  year={2024}
}

Linux kernel vulnerability reproduction is a critical task in system security. To reproduce a kernel vulnerability, both the vulnerable environment and the Proof of Concept (PoC) program are needed. Most existing research focuses on PoC generation, while construction of the environment is overlooked. However, establishing an effective vulnerable environment to trigger a vulnerability is challenging. Firstly, it is hard to guarantee that the kernel version selected for reproduction is vulnerable, as the vulnerable-version claims in online databases are occasionally incorrect. Secondly, many vulnerabilities cannot be reproduced in kernels built with default configurations. Intricate non-default kernel configurations must be set to include and trigger a kernel vulnerability, but little information is available on how to recognize these configurations.

To solve these challenges, we propose a patch-based approach to identify truly vulnerable kernel versions and a graph-based approach to identify the configs necessary to activate a specific vulnerability. We implement these approaches in a tool, KernJC, which automates the generation of vulnerable environments for kernel vulnerabilities. To evaluate the efficacy of KernJC, we build a dataset containing 66 representative real-world vulnerabilities with PoCs from kernel vulnerability research in the past five years. The evaluation shows that KernJC builds vulnerable environments for all these vulnerabilities, 32 (48.5%) of which require non-default configs, and 4 of which have incorrect version claims in the National Vulnerability Database (NVD). Furthermore, we conduct large-scale spurious version detection on kernel vulnerabilities and identify 128 vulnerabilities that have spurious version claims in NVD. To foster future research, we release KernJC and the dataset to the community.
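As an illustration of why non-default configs matter, the sketch below resolves which options must be switched on so that the option guarding the vulnerable code gets built, by walking a "depends on" graph. The graph and option names are hypothetical, and real Kconfig dependencies are boolean expressions with `select`, `choice`, and more; KernJC's graph-based approach and its patch-based version identification are not reproduced here.

```python
from typing import Dict, List, Set

# Hypothetical toy dependency graph: option -> options it "depends on".
DEPENDS_ON: Dict[str, List[str]] = {
    "DEMO_VULN_FEATURE": ["DEMO_SUBSYSTEM", "DEMO_HELPER"],
    "DEMO_SUBSYSTEM": ["DEMO_BUS"],
    "DEMO_HELPER": ["DEMO_BUS"],
    "DEMO_BUS": [],
}


def required_configs(target: str, default_enabled: Set[str]) -> List[str]:
    """Return options that must be enabled beyond the defaults for `target` to be built."""
    needed: Set[str] = set()
    stack, seen = [target], set()
    while stack:  # iterative DFS over the dependency closure of `target`
        opt = stack.pop()
        if opt in seen:
            continue
        seen.add(opt)
        if opt not in default_enabled:
            needed.add(opt)
        stack.extend(DEPENDS_ON.get(opt, []))
    return sorted(needed)


print(required_configs("DEMO_VULN_FEATURE", default_enabled={"DEMO_BUS"}))
# ['DEMO_HELPER', 'DEMO_SUBSYSTEM', 'DEMO_VULN_FEATURE']
```

Only the options missing from the default set are reported; a kernel built with defaults alone would omit the vulnerable code entirely, which is the failure mode described above.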

Book Chapter

Cloud Native Security: Practice and Architecture

Wenmao Liu, Guolong Jiang, Ming Pu, Bonan Ruan, Xiaohu Ye

Beijing: China Machine Press. ISBN: 9787111691839. 2021.

Contributed chapters: 3, 4, 14, and 16.

The book has helped many cloud security practitioners get started and advance.

Douban | Code


Talks

Selected talks and briefings related to my research and security practice.

[Black Hat Asia 2025] KernJC: Automated Vulnerable Environment Generation for Linux Kernel Vulnerabilities

Abstract | Video | Slides | WP | Code


[KCon 2022] Dilemma: runC's Achilles' Heel

Abstract | Video | Slides | WP

This talk explores the exploitation of vulnerabilities in container runtimes, focusing on two critical issues: CVE-2019-5736 in runC and CVE-2022-0847 (Dirty Pipe) in the Linux kernel. These vulnerabilities highlight the risks inherent in containerized environments, such as privilege escalation and host compromise, which pose significant threats to modern infrastructure security. The presentation begins with an analysis of Dirty Pipe, a Linux kernel vulnerability that allows unprivileged processes to overwrite data in read-only files, enabling code injection into privileged processes. We demonstrate how this exploit facilitates container escape when combined with the runC vulnerability, which allows attackers to overwrite the runC binary on the host system, achieving root access. Through live demonstrations, we showcase advanced exploitation techniques, including ELF manipulation, memory injection via Dirty Pipe, and stealthy post-exploitation persistence. We also examine practical mitigations, such as hardening runC and kernel-level defenses, to secure containerized environments. This comprehensive analysis provides valuable insights for security researchers and practitioners into detecting, mitigating, and understanding vulnerabilities in container infrastructures.

[OID Asia 2021] Metarget: Auto-construction of Vulnerable Cloud Native Infrastructure

1.4k GitHub stars; listed in the CNCF Cloud Native Landscape; widely used in industry.

Abstract | Slides | WP | Code

This talk introduces Metarget, an innovative framework designed for the automatic construction of vulnerable cloud-native environments. By facilitating the deployment of multi-layered, vulnerable infrastructures, Metarget enables researchers and ethical hackers to efficiently simulate complex attack scenarios ranging from container exploitation to cluster-level persistence. The presentation highlights offensive methodologies in cloud-native security, including real-world case studies such as post-penetration attacks against Kubernetes clusters. Using Metarget, we explore vulnerabilities like CVE-2020-15257 and CVE-2020-8559, demonstrating how attackers can achieve lateral movement and full cluster compromise. Additionally, we showcase k0otkit, a post-penetration persistence technique for Kubernetes, emphasizing its role in automating and advancing offensive security research. Through a detailed analysis of offensive strategies, this talk illustrates how tools like Metarget accelerate defensive innovations, paving the way for more robust cloud-native security practices.

[CIS 2020] k0otkit: A Universal Manipulation Technique in Post-Penetration against Kubernetes

Abstract | Slides | WP | Code

This presentation introduces k0otkit, a universal post-penetration control technique for Kubernetes (K8s) clusters. By leveraging Kubernetes-native features such as DaemonSets, Secrets, and container injection, k0otkit provides attackers with rapid, covert, and persistent control over large-scale clusters. This talk explores the evolution of k0otkit through its various iterations, highlighting advancements in stealth, persistence, and efficiency, including the adoption of fileless attack techniques and encrypted communication. The discussion outlines a typical Kubernetes penetration process, emphasizing container escape, privilege escalation, and lateral movement, leading to full cluster control. Through live demonstrations, we showcase how k0otkit exploits Kubernetes vulnerabilities, automates reverse shell deployment, and achieves seamless cluster-wide compromise. Finally, the talk concludes with key defensive strategies to mitigate these risks, including implementing Pod Security Policies, detecting anomalous container behavior, and protecting against fileless attacks. This comprehensive analysis offers valuable insights for both offensive and defensive Kubernetes security research.

Academic Service

  • Artifact Evaluation reviewer for USENIX Security 2026
  • External reviewer for ACM ASIACCS 2026
  • External reviewer for ACM CCS 2025

Work Experience

National Cybersecurity R&D Lab

Research Intern, Singapore

Sep. 2022 - Apr. 2023

NSFOCUS, Inc.

Security Researcher, China

Jul. 2019 - Jun. 2022

Huawei, Ltd.

Software Development Engineer Intern, China

Jul. 2017 - Aug. 2017