sharpened-cosine-similarity vs DeepMalwareDetector


Metric | sharpened-cosine-similarity | DeepMalwareDetector |
---|---|---|
Mentions | 2 | 1 |
Stars | 254 | 73 |
Growth | 0.0% | - |
Activity | 0.0 | 0.0 |
Last commit | 11 months ago | almost 2 years ago |
Language | Python | Python |
License | MIT License | - |

Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sharpened-cosine-similarity
- Alternatives to Cosine Similarity
Don't forget Sharpened Cosine Similarity [0], which arose from this really interesting Twitter thread [1].
[0] https://github.com/brohrer/sharpened-cosine-similarity
[1] https://twitter.com/_brohrer_/status/1232063619657093120
- Sharpened Cosine Distance as an Alternative for Convolutions
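
For readers unfamiliar with the idea behind the posts above, here is a minimal sketch of sharpened cosine similarity between a signal patch and a kernel, both flattened to plain vectors. It follows the commonly cited form, sign(s·k) · (|s·k| / ((‖s‖ + q)(‖k‖ + q)))^p. The function name, the argument names, and the default values of `p` and `q` are illustrative assumptions, not the repository's API.

```python
import math
from typing import Sequence


def sharpened_cosine_similarity(
    signal: Sequence[float],
    kernel: Sequence[float],
    p: float = 2.0,   # sharpening exponent (assumed default)
    q: float = 1e-3,  # small floor added to each norm (assumed default)
) -> float:
    """Cosine similarity raised to the power p, keeping the original sign."""
    dot = sum(s * k for s, k in zip(signal, kernel))
    norm_s = math.sqrt(sum(s * s for s in signal))
    norm_k = math.sqrt(sum(k * k for k in kernel))
    # q keeps the denominator away from zero for near-empty patches.
    cosine = dot / ((norm_s + q) * (norm_k + q))
    # Sharpen the magnitude, then restore the sign of the dot product.
    return math.copysign(abs(cosine) ** p, dot)
```

With `q=0.0`, identical vectors score 1.0 and opposite vectors score -1.0; a larger `p` pushes intermediate similarities toward zero, which is the "sharpening".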
DeepMalwareDetector
- Looking for insight on labelling portable executable (PE) malware files using a VirusTotal API response report.
What brought me to this research were these studies [1] [2], which demonstrate how image-based malware classification can be done using a CNN (convolutional neural network). Since I had a bit of a background with malware and had recently completed a CNN model, I figured I would try something similar. It was only after investigating different materials that I hit a bit of a roadblock. I found one dataset, malimg [3], which is made up of PE files that have already been converted into images. I didn't want to just use the images; I wanted to demonstrate how to get them. However, the method used to classify them, discussed in Section 4.2 of this paper [4], turned out to be a bit out of my depth, kind of like this whole project. There's also this set [5], which contains the pixel content for each file record. As for the static disassembly you mention, I think you are right: the training data might not exist. During my investigation, the best I could find was this study [6].
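
The "PE files converted into images" step mentioned in the post can be sketched roughly as follows: each byte of the file becomes one grayscale pixel (0 to 255), and the byte stream is wrapped into rows of a fixed width. This is only an illustration of the general idea. The function name, the default width of 256, and the zero padding of the last row are assumptions, not the procedure used by malimg or the cited papers.

```python
from typing import List


def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> List[List[int]]:
    """Wrap raw bytes into rows of `width` pixels, one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the tail with zero bytes so the last row is full (an assumption).
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Iterating over a bytes slice yields integers in the range 0..255.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Hypothetical usage: the first bytes of a PE file start with the "MZ" magic.
rows = bytes_to_grayscale_rows(b"MZ\x90\x00\x03", width=4)
```

The resulting rows can be handed to an imaging library to save an actual picture; that step is left out here to keep the sketch dependency-free.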
What are some alternatives?
convolution-vision-transformers - PyTorch Implementation of CvT: Introducing Convolutions to Vision Transformers
easyesn - Python library for Reservoir Computing using Echo State Networks
hub - A library for transfer learning by reusing parts of TensorFlow models.
SparK - [ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
Convolution-From-Scratch - Implementation of the generalized 2D convolution with dilation from scratch in Python and NumPy
strelka - Real-time, container-based file scanning at enterprise scale
albumentations - Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
Unredactor - In this project we are trying to create an unredactor. The unredactor takes a redacted document and the redacted flag as input and returns the most likely candidates to fill in each redacted location. The project is concerned with unredacting names only. The data is the IMDb data set, which contains many review files. These files are used to build corpora for computing TF-IDF scores. A few files are used for training, and in these files names are redacted and written into the redacted folder. These redacted files are used for testing, and different classification models are built to predict the probability of each class. The top 5 classes, i.e. the names most similar to the test features, are written at the end of the text in the unredacted folder.
SCS-CCT - CCT but using Sharpened Cosine Similarity
MDML - Malware Detection using Machine Learning (MDML)
tinysleepnet - TinySleepNet: An Efficient Deep Learning Model for Sleep Stage Scoring based on Raw Single-Channel EEG by Akara Supratak and Yike Guo from The Faculty of ICT, Mahidol University and Imperial College London respectively
avclass - AVClass malware labeling tool

