2024 Q4 | Science of Security Virtual Organization

2024 Q4

Leveraging Machine Learning for Binary Software Understanding

Research Team Status

Names of researchers and position
(e.g. Research Scientist, PostDoc, Student (Undergrad/Masters/PhD))
- Yan Shoshitaishvili - Lead PI, Associate Professor
- Adam Doupé - Co-I, Associate Professor
- Chitta Baral - Co-I, Professor
- Divij Handa - PhD Student
- William Gibbs - PhD Student
- Michael Tompkins - PhD Student
Any new collaborations with other universities/researchers?
- None

Project Goals

What is the current project goal?
- Task 2 (Option Year 1): Higher-level decompilation abstraction. The focus here is to abstract the binary software beyond the decompiled code into human-level representations.
  - Task 2.1: Code to Human Description
  - Task 2.2: Translating Decompiled Code
  - Task 2.3: Code to High Level Structural Representations
How does the current goal factor into the long-term goal of the project?
- Long-Term Goal: Achieving binary software understanding, in order to make identifying security issues much easier and cheaper.
- Task 2 builds upon the foundations created by Task 1 by working towards describing code in natural language, translating it into a variety of programming languages, and lifting it to more abstract structural representations such as flow graphs or state transition diagrams.

Accomplishments

Address whether project milestones were met. If milestones were not met, explain why, and what are the next steps.
What is the contribution to foundational cybersecurity research? Was there something discovered or confirmed?
Impact of research
- Internal to the university (coursework/curriculum)
- External to the university (transition to industry/government (local/federal); patents, start-ups, software, etc.)
- Any acknowledgements, awards, or references in media?

Recompilable Decompilation

Oct-Dec 2024: The goal of this project is to make angr's decompiled code recompilable: the decompiled code should compile successfully, and the recompiled binary should faithfully reproduce the behavior of the original. A key focus is verifying that the recompiled binaries behave correctly. We validate this by trying to achieve byte equivalence.
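As an illustration of what a byte-equivalence check can look like, here is a minimal sketch. It assumes the machine-code bytes of one function have already been extracted from both the original and the recompiled binary; the names `ByteComparison` and `compare_function_bytes` are hypothetical and are not part of angr or of the project's pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ByteComparison:
    equivalent: bool
    first_mismatch: Optional[int]  # offset of the first differing byte, if any


def compare_function_bytes(original: bytes, recompiled: bytes) -> ByteComparison:
    """Compare the machine code of one function taken from two binaries.

    Hypothetical helper: extracting the bytes from each binary is assumed
    to have happened elsewhere.
    """
    for offset, (a, b) in enumerate(zip(original, recompiled)):
        if a != b:
            return ByteComparison(False, offset)
    if len(original) != len(recompiled):
        # One body is a strict prefix of the other; report where it ends.
        return ByteComparison(False, min(len(original), len(recompiled)))
    return ByteComparison(True, None)
```

Reporting the first mismatching offset, rather than only a yes/no answer, makes it easier to locate which instruction diverged when the two bodies differ.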

Decompiled code typically does not recompile out of the box because it does not conform to the C syntax rules expected by compilers like GCC. We have developed a preliminary pipeline that attempts to recompile the decompiled code and verify the functionality of the recompiled binary.

As described in our last update, our pipeline recompiles at the function level: we first recompile each function separately, then recompile groups of functions into a single object file, and finally link the object files together. With this approach, the functions that were recompiled together (in the presence of source code) exposed mismatches between the prototypes of external and binary (internal) functions, incorrect variable data types, and similar issues. Resolving them essentially brings a decompiled function closer to its source code. However, recovering the exact prototype of an external function is difficult because information is lost during compilation, and the same holds for the other cases. We fixed some of these issues in angr. As a rough estimate, we can currently recompile approximately 25% of the functions from coreutils. The next step is to try to recompile all the decompiled functions together (without the source code).
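The function-level stage described above can be sketched roughly as follows. This is an illustrative outline only, not the project's actual pipeline: it assumes each decompiled function is available as a string of C text, that `gcc` is on the PATH, and that concatenating the individually compilable functions into one translation unit is an acceptable stand-in for the grouping step. The names `gcc_compiles` and `recompile_report` are hypothetical, and linking is omitted.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Dict


def gcc_compiles(c_source: str) -> bool:
    """Return True if gcc compiles the given C text into an object file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "unit.c"
        obj = Path(tmp) / "unit.o"
        src.write_text(c_source)
        result = subprocess.run(
            ["gcc", "-c", str(src), "-o", str(obj)],
            capture_output=True,
        )
        return result.returncode == 0


def recompile_report(
    functions: Dict[str, str],
    compiles: Callable[[str], bool] = gcc_compiles,
) -> Dict[str, object]:
    """Compile each function alone, then the successful ones as one group."""
    compiled = [name for name, text in functions.items() if compiles(text)]
    failed = [name for name in functions if name not in compiled]
    # Grouping step: functions that compile alone may still clash together,
    # for example through conflicting prototypes.
    group_source = "\n\n".join(functions[name] for name in compiled)
    group_compiles = bool(compiled) and compiles(group_source)
    success_rate = len(compiled) / len(functions) if functions else 0.0
    return {
        "compiled": compiled,
        "failed": failed,
        "group_compiles": group_compiles,
        "success_rate": success_rate,
    }
```

Passing the compile step in as a parameter keeps the bookkeeping testable without a compiler installed; for example, `recompile_report(funcs, compiles=lambda text: True)` exercises the reporting logic alone.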

Software Reconstruction and Collaborative Reverse Engineering

Oct-Dec 2024: The research investigates collaborative dynamics in software reconstruction within reverse engineering (RE), focusing on human factors in the recovery and recompilation phases. Unlike traditional RE, which is often an individual effort, this study explores reconstruction as a team-driven process, particularly in large-scale projects like video game recovery. The research analyzes the methodologies and workflows used by the video game community, a highly active and diverse group that engages in cross-platform and multi-language software reconstruction.

By studying these projects, the research aims to uncover the technical and social aspects of collaboration in RE, including knowledge sharing, role distribution, and decision-making. Additionally, it examines challenges in recompilation, such as preserving software functionality, handling missing dependencies, and improving tool support for team-based workflows. Insights from this study will contribute to understanding RE as a collaborative effort, inform the development of better reconstruction tools, and support the broader goal of software preservation. Ultimately, this research redefines RE beyond solo efforts, emphasizing teamwork in tackling complex software recovery challenges.

AI Assisted Reverse Engineering and Decompilers

Oct-Dec 2024: We have continued to make progress on our recent work REaLLM, which studies how humans, decompilers, and LLMs interact when reverse engineering software. Our findings and tooling continue to inch reverse engineering toward a more automated and AI-backed approach. As of today, over 39 participants have taken part in our study, resulting in over 62 hours of recorded reverse engineering time. Our analysis of the data is still preliminary, but we have some interesting first findings. Below is a graph showing the average time LLM users saved relative to non-LLM users, on a per-function basis, while reverse engineering two programs.

[Image Link: Time Saved in Functions Over Non-LLM Users]

Orange points are functions that implement common algorithms, such as Base64 Decode. First, this graph shows that LLM users often save time over non-LLM users while reverse engineering, though the gains are small (around 60 seconds on average). Second, the time gains are not dominated by common functions, which may indicate that LLMs rely less on recall than anticipated.

What is not shown in this graph is that although LLMs seem to be helpful on average, we find that when they are harmful, they are significantly harmful. For instance, of the 51 functions shown in this graph, only 4 had statistically significant differences in performance time between LLM and non-LLM users. In 3 cases, LLM usage more than doubled the average time spent looking at a function. This data indicates that LLMs can be marginally helpful, but when they are harmful, they have much more serious consequences on understanding.
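To make the per-function comparison concrete, the sketch below computes the average time gain for one function and a two-sided permutation-test p-value for the difference. The report does not state which statistical test was used, so the permutation test here is an assumption chosen because it needs only the standard library; the function names and the input format (lists of seconds spent per participant) are also hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def time_gain(non_llm: Sequence[float], llm: Sequence[float]) -> float:
    """Average seconds saved by LLM users; negative means they were slower."""
    return mean(non_llm) - mean(llm)


def permutation_p_value(
    non_llm: Sequence[float],
    llm: Sequence[float],
    rounds: int = 10000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(time_gain(non_llm, llm))
    pooled = list(non_llm) + list(llm)
    n = len(non_llm)
    hits = 0
    for _ in range(rounds):
        # Reassign participants to the two groups at random.
        rng.shuffle(pooled)
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (rounds + 1)
```

Under this sketch, a function would count as showing a significant difference when its p-value falls below a chosen threshold such as 0.05, and a negative `time_gain` marks a function where LLM users were slower.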

We continue to analyze this data for findings and plan to have a paper draft ready for review by mid-April. A research prototype that implements the LLM findings discussed above is already available: https://github.com/mahaloz/DAILA.

Rust Decompilation

Oct-Dec 2024: Our research on Rust decompilation aims to develop a Rust decompiler, built on top of the C/C++ decompiler angr, that generates semantically equivalent Rust pseudocode. We have completed a prototype Rust decompiler called Oxidizer. Its contributions so far are (i) better type recovery for Rust decompilation, which can now recover the struct return types and struct argument types of a function; and (ii) better control-flow and data-flow simplification, which significantly reduces lines of code, number of variables, and other metrics.

This research project is still in progress. We are working on incremental improvements, including adding more malware samples to our evaluation and improving Rust type recovery.

Publications and presentations

Add publication reference in the publications section below. An author's copy or final should be added in the report file(s) section. This is for NSA's review only.
Optionally, upload technical presentation slides that may go into greater detail. For NSA's review only.

No new published papers since last quarterly report.

Lead PI:

Yan Shoshitaishvili

Co-PI(s):

Adam Doupé