autocombo

Introduction

This repository contains the demo code and dataset used in our paper AutoCombo.

To use the code and/or dataset, please kindly cite our paper:

Min Du, Wenjun Hu, and William Hewlett. "AutoCombo: Automatic Malware Signature Generation Through Combination Rule Mining." In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 3777-3786. 2021.

Prerequisites

Python 3

Dataset

./dataset/property_lists_raw folder contains a couple of example files.

To get the complete dataset, please email data-sets@paloaltonetworks.com

Dataset Preparation

Input: property lists within time range

[dataset_preparer]
start_date = 2020-07-01
end_date = 2020-07-31

  • Step 1: generate property-index mapping
python generate_property_index_mapping.py

Output: ./dataset/property_index_mapping.json
  • Step 2: convert property lists into integers
python run_integer_converter.py
Output: ./dataset/property_lists_integer

Signature Generation

Input: property lists within time range

[combo_generation]
start_date = 2020-07-01
end_date = 2020-07-15

Output: all intermediate results in this step are stored into folder ./dataset/combo_generation_results

Signature creation

choose different generation mode in config file ./config/common_config.ini:

[combo_generation]
; choose from:
; ablation_study
; multi_processing
; store_parent_hits
; prestore (hashes-per-property)
generation_mode = prestore
; for ablation study
use_integer_subset = True
do_property_sorting = True
; for multi_processing
num_cores = 20
  • Step 1: sort properties with MF-IBF or other heuristic
python run_property_sorter.py
  • Step 2: enumerate all candidates and generate signatures
python run_combo_generator.py

Signature refining

choose between different methods

[combo_selection]
; choose from best-remaining or threshold
combo_selection_approach = best-remaining
  • Step 1 (for threshold-based method): sort signatures based on their own contributions
python run_combo_sorter.py
  • Step 2: select signatures using best-remaining or threshold based method
python run_combo_selection.py

Signature Evaluation

Input: property lists within time range

[combo_evaluation]
start_date = 2020-07-16
end_date = 2020-07-31

Output: check new files in folder ./dataset/combo_generation_results

  • Evaluate the generated signatures
python run_combo_evaluator.py

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Developer Sites

Social


Copyright © 2024 Palo Alto Networks, Inc. All rights reserved.