Looking for insight on labelling portable executable (PE) malware files using a VirusTotal API response report.

avclass

1 445 5.7 Python

AVClass malware labeling tool

AVClass is a great tool that takes the various AV labels from some source, like VT, and comes up with a deterministic name that makes sense. https://github.com/malicialab/avclass
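To make the idea concrete, here is a minimal sketch of pulling per-engine labels out of a VirusTotal report and taking a plurality vote over the tokens. It is not AVClass's algorithm, only a rough illustration of the same idea. It assumes the VT API v3 file-report layout (`data.attributes.last_analysis_results`); the function names, the `GENERIC_TOKENS` list, and the `min_votes` threshold are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Illustrative stop list of tokens that carry no family information.
# This is an assumption for the sketch, not the list AVClass uses.
GENERIC_TOKENS = {
    "trojan", "win32", "win64", "malware", "generic", "agent", "variant",
    "heur", "virus", "worm", "backdoor", "riskware", "malicious",
    "application", "unsafe", "suspicious", "downloader", "dropper",
}


def extract_av_labels(report):
    """Return {engine_name: label} for engines that flagged the file.

    Assumes a VT API v3 file report, where each engine entry has a
    'category' and a 'result' field.
    """
    results = (
        report.get("data", {})
        .get("attributes", {})
        .get("last_analysis_results", {})
    )
    return {
        engine: entry["result"]
        for engine, entry in results.items()
        if entry.get("category") == "malicious" and entry.get("result")
    }


def plurality_family(labels, min_votes=2):
    """Pick the token that the most engines agree on, or None.

    Each engine votes at most once per token. Ties are broken
    alphabetically so the result is deterministic.
    """
    votes = Counter()
    for label in labels.values():
        tokens = set(re.split(r"[^a-z0-9]+", label.lower()))
        for token in tokens:
            # Skip short fragments, pure numbers, and generic words.
            if len(token) < 4 or token.isdigit() or token in GENERIC_TOKENS:
                continue
            votes[token] += 1
    if not votes:
        return None
    family, count = max(sorted(votes.items()), key=lambda kv: kv[1])
    return family if count >= min_votes else None
```

Calling `plurality_family(extract_av_labels(report))` on a parsed JSON report returns a single lowercase token, or `None` when too few engines agree. For real labelling work, AVClass handles aliases and generic tokens far more carefully than this sketch does.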

MalConv2

1 55 10.0 Python

Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection

My former colleagues and I did some work with CNNs on raw malware bytes: https://arxiv.org/abs/2012.09390, https://github.com/NeuromorphicComputationResearchProgram/MalConv2. Maybe that can help, if you’re interested in other options to replace the images.

DeepMalwareDetector

1 65 0.0 Python

A deep learning framework that analyses Windows PE files to detect malicious software.

What brought me to this research was these studies [1] [2], which demonstrate how image-based malware classification can be done using a CNN (convolutional neural network). Since I had a bit of a background with malware and had recently completed a CNN model, I figured I would try something similar. It was only after investigating different materials that I hit a bit of a roadblock. I found one dataset, malimg [3], which is made up of PE files that have already been converted into images. I didn't want to just use the images; I wanted to demonstrate how to get them. However, the method used to classify them, discussed in Section 4.2 of this paper [4], turned out to be a bit out of my depth, kind of like this whole project. There's also this set [5], which contains the pixel content for each file record. As for the static disassembly you mention, I think you are right: the training data might not exist. During my investigation, the best I could find was this study [6].
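For the "how to get the images" step, a common approach is to treat each byte of the file as one 8-bit grayscale pixel and wrap the byte stream into rows of a fixed width. The sketch below does that with only the standard library and writes a binary PGM file. It is not necessarily the exact procedure used to build malimg; the function names and the default `width=256` are assumptions for illustration, and published work often picks the width based on file size.

```python
import math


def bytes_to_rows(data, width=256):
    """Wrap a byte string into rows of `width` grayscale pixels.

    Each byte (0-255) becomes one pixel. The last row is zero-padded
    so every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [padded[i * width:(i + 1) * width] for i in range(height)]


def write_pgm(rows, path):
    """Write rows of pixel bytes as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    with open(path, "wb") as f:
        # PGM header: magic number, dimensions, maximum gray value.
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        for row in rows:
            f.write(row)


def pe_file_to_pgm(pe_path, pgm_path, width=256):
    """Read a file as raw bytes and save it as a grayscale image."""
    with open(pe_path, "rb") as f:
        data = f.read()
    write_pgm(bytes_to_rows(data, width), pgm_path)
```

The file is only read as raw bytes and never executed, but malware samples should still be handled in an isolated environment. PGM was chosen here because it needs no third-party library; most image tools can convert it to PNG for use with a CNN pipeline.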
