Looking for insight on labelling portable executable (PE) malware files using a VirusTotal API response report.

This page summarizes the projects mentioned and recommended in the original post on /r/Malware.

  • avclass

    AVClass malware labeling tool

  • AVClass is a great tool that takes the various AV labels from some source, like VT, and comes up with a deterministic name that makes sense. https://github.com/malicialab/avclass

  • MalConv2

    Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection

  • My former colleagues and I did some work with CNNs on raw malware bytes: https://arxiv.org/abs/2012.09390, https://github.com/NeuromorphicComputationResearchProgram/MalConv2. Maybe that can help if you're interested in other options to replace the images.

  • DeepMalwareDetector

    A deep learning framework that analyses Windows PE files to detect malicious software.

  • What brought me to this research were these studies [1] [2], which demonstrate how image-based malware classification can be done using a CNN (convolutional neural network). Since I have a bit of a background with malware and recently completed a CNN model, I figured I would try something similar. It was only after investigating different materials that I hit a bit of a roadblock. I found one dataset, malimg [3], which is made up of PE files that have already been converted into images. I didn't want to just use the images; I wanted to demonstrate how to produce them. However, the method used to classify them, discussed in Section 4.2 of this paper [4], turned out to be a bit out of my depth, kind of like this whole project. There's also this set [5], which contains the pixel content for each file record. As for the static disassembly you mention, I think you are right: the training data might not exist. The best I could find during my investigation was this study [6].
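
For anyone who wants to produce the images rather than reuse a ready-made set, here is a minimal sketch of one common way to do it: treat every byte of the file as one grayscale pixel and wrap the bytes into rows of a fixed width. The fixed width of 256, the zero padding of the last row, the PGM output format, and the function names are all assumptions made for illustration; this is not claimed to be the procedure used to build malimg or any of the cited datasets.

```python
from pathlib import Path


def bytes_to_rows(data: bytes, width: int = 256) -> list[bytes]:
    """Split raw bytes into fixed-width rows, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = [data[i:i + width] for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] = rows[-1].ljust(width, b"\x00")
    return rows


def write_pgm(rows: list[bytes], out_path: str) -> None:
    """Write rows as a binary PGM (P5) image, one byte per grayscale pixel."""
    if not rows:
        raise ValueError("no pixel data")
    width, height = len(rows[0]), len(rows)
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    Path(out_path).write_bytes(header + b"".join(rows))


def pe_to_image(pe_path: str, out_path: str, width: int = 256) -> tuple[int, int]:
    """Convert a file's raw bytes to a grayscale image; returns (width, height)."""
    rows = bytes_to_rows(Path(pe_path).read_bytes(), width)
    write_pgm(rows, out_path)
    return width, len(rows)
```

PGM is used here only because it needs no third-party library; most image tools can open it or convert it to PNG. Only read the sample's bytes, never execute it, and handle live malware inside an isolated environment.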
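
To make the AVClass suggestion more concrete, the sketch below shows the general idea of turning many AV labels into one family name by plurality vote. It is a heavily simplified illustration and not AVClass's actual algorithm, which also handles aliases and a much larger list of generic terms. It assumes a VirusTotal API v3 file report, where per-engine verdicts sit under `data.attributes.last_analysis_results`; the `GENERIC_TOKENS` set, the minimum token length of 4, and the function names are hypothetical choices for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately small list of tokens that say nothing about the family.
GENERIC_TOKENS = {
    "trojan", "win32", "malware", "generic", "agent", "virus", "worm",
    "backdoor", "variant", "heur", "malicious", "riskware", "unsafe",
}


def extract_labels(vt_report: dict) -> dict[str, str]:
    """Return {engine: label} for engines that reported a non-empty label."""
    results = (
        vt_report.get("data", {})
        .get("attributes", {})
        .get("last_analysis_results", {})
    )
    return {engine: r["result"] for engine, r in results.items() if r.get("result")}


def family_vote(labels: dict[str, str], min_engines: int = 2):
    """Pick the token named by the most engines, or None if support is too low."""
    counts = Counter()
    for label in labels.values():
        tokens = set()  # each engine votes at most once per token
        for tok in re.split(r"[^a-z0-9]+", label.lower()):
            if len(tok) >= 4 and not tok.isdigit() and tok not in GENERIC_TOKENS:
                tokens.add(tok)
        counts.update(tokens)
    if not counts:
        return None
    # Highest count wins; ties break alphabetically so the result is deterministic.
    family, votes = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return family if votes >= min_engines else None
```

With a report whose engines return labels such as `Trojan.Win32.Zbot.abc`, `Win32/Zbot.gen`, and `Trojan-Spy.Zbot`, `family_vote(extract_labels(report))` returns `"zbot"`. For real labelling work, AVClass itself is the better choice; this only shows why a vote over normalized tokens gives a stable name.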

