Research Team Status

  • Names of researchers and position 
    (e.g. Research Scientist, PostDoc, Student (Undergrad/Masters/PhD))
  • Skyler Grandel, PhD student
  • Dung Thuy "Judy" Nguyen, PhD student
  • Kailani "Cai" Lemieux-Mack, PhD student
  • Yifan Zhang, PhD student
  • Preston Robinette, PhD student
  • Eli Jiang, undergraduate student
  • Evelyn Guo, undergraduate student

 

  • Any new collaborations with other universities/researchers?
    • Worked with CMU on a recent reverse engineering/decompilation project.  They provided raw data on human comprehension of decompilation enhanced with AI models.

Project Goals

  • What is the current project goal?
    • This quarter, we have been addressing the robustness and generalizability of machine learning models, which will ultimately contribute to the enhancement of neural-network-based malware classifiers.  We previously reported advancements in machine unlearning and purification techniques.  This quarter, we investigated a new approach to domain generalization: an interpolative style transfer technique that enables clients in a federated learning scenario to improve model performance while retaining data privacy.  In the context of malware classification, this can enable federated learning scenarios in which different clients own different subsets of malware and transfer between those subsets, in turn providing a basis for synthesizing plausible novel samples to improve classification defenses.
       
  • How does the current goal factor into the long-term goal of the project?
    • The overall goal of the project is to improve neural malware classifiers by accounting for new malware classes and families.  Domain shift is a key issue: accurate detection as adversaries advance depends on the ability to foresee what "tomorrow's malware" will look like.  This quarter's developments contribute another approach to improving the generalizability and robustness of machine learning classifiers, specifically by improving transfer between domains, which in turn can be applied to shifts in the properties of malware samples.

Accomplishments

  • Address whether project milestones were met. If milestones were not met, explain why, and what are the next steps.

    • We are on track to meet the Year 2 milestones.  We previously reported PBP and MalMixer as part of Year 1's effort to develop malware augmentation techniques.  We also previously reported machine unlearning advancements.  The current quarter's work entails important advancements in model robustness and generalizability, which are critical for advancing malware classification.

     

  • What is the contribution to foundational cybersecurity research? Was there something discovered or confirmed?
    • Domain generalization is an important problem in machine learning because of domain shift: new, unseen samples may have properties that do not match anything seen during the training of a model.  While it is possible to retrain a model on new data, doing so requires substantial labeling effort and large amounts of computational resources.  Further, the time spent waiting to retrain a model may mean missing important in-the-wild samples that do not match an existing domain.  Previous techniques make critical assumptions about the distribution of training samples among clients and about the number of clients that participate in a federated learning scenario.  Our approach creates a straightforward vector of statistics for each domain within each client.  These vectors can be shared among clients to enable transfer between clients without sharing client data, while retaining reasonable training performance.  This enhances the robustness of the global model while better maintaining privacy among clients.
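
      The sharing of per-domain statistics described above can be illustrated with a minimal sketch.  The report does not specify the form of the statistics vector or the transfer rule, so everything here is an assumption made for illustration: the statistics are taken to be per-feature means and standard deviations, and the transfer is a simple mean/standard-deviation matching (in the spirit of AdaIN-style normalization) toward a blend controlled by a hypothetical alpha parameter.  The function names domain_stats and interpolate_style are hypothetical, and this is not the project's implementation.

```python
import random
from statistics import fmean, pstdev

EPS = 1e-6  # assumption: small constant so constant features do not divide by zero


def domain_stats(rows):
    """Summarize a domain as (means, stds), one entry per feature column."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) + EPS for col in columns]
    return means, stds


def interpolate_style(rows, own, peer, alpha):
    """Re-style rows toward a blend of own and peer statistics.

    alpha=0 leaves the rows unchanged; alpha=1 fully adopts the peer statistics.
    """
    own_mean, own_std = own
    peer_mean, peer_std = peer
    styled = []
    for row in rows:
        new_row = []
        for j, x in enumerate(row):
            mix_mean = (1 - alpha) * own_mean[j] + alpha * peer_mean[j]
            mix_std = (1 - alpha) * own_std[j] + alpha * peer_std[j]
            # Normalize with the local statistics, then rescale with the blend.
            new_row.append((x - own_mean[j]) / own_std[j] * mix_std + mix_mean)
        styled.append(new_row)
    return styled


# Synthetic stand-ins for two clients' feature matrices (not real malware data).
random.seed(0)
client_a = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
client_b = [[random.gauss(3.0, 2.0) for _ in range(4)] for _ in range(200)]

stats_a = domain_stats(client_a)
stats_b = domain_stats(client_b)  # the only thing client B would share

# Client A synthesizes samples halfway between its own domain and client B's.
halfway = interpolate_style(client_a, stats_a, stats_b, alpha=0.5)
```

      In this sketch only stats_b crosses the client boundary; client B's raw rows never leave client B.  Intermediate values of alpha yield samples between the two domains, which is one plausible reading of "interpolative" in the description above.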

       

  • Impact of research
    • Internal to the university (coursework/curriculum)
      • None new to report.
    • External to the university (transition to industry/government (local/federal); patents, start-ups, software, etc.)
      • None to report.
    • Any acknowledgements, awards, or references in media?
      • None to report.

 

Publications and presentations

  • Add publication reference in the publications section below. An author's copy or final version should be added in the report file(s) section. This is for NSA's review only.
  • Optionally, upload technical presentation slides that may go into greater detail. For NSA's review only.
Report Materials