data-centric-ai vs DikeDataset

data-centric-ai

Resources for Data Centric AI (by HazyResearch)

DikeDataset

Dataset with labeled benign and malicious files 🗃️ (by iosifache)

Dataset malware-samples Artificial intelligence

Project	data-centric-ai	DikeDataset
Mentions	1	2
Stars	1,068	77
Growth	1.5%	-
Activity	0.0	0.0
Latest Commit	5 months ago	9 months ago
Language	TeX	TeX
License	Apache License 2.0	MIT License

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
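
The Growth and Activity definitions can be illustrated with a small sketch. The page does not publish its exact formulas, so everything here is an assumption: the function names are hypothetical, growth is taken to be a plain percentage change between two monthly star counts, and activity is taken to be a percentile rank scaled to 0-10 (which is consistent with "9.0 means top 10%" but may not be how the site computes it). The recency weighting of commits is not modeled.

```python
def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars between two monthly snapshots (assumed formula)."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive to compute growth")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def activity_from_rank(fraction_less_active: float) -> float:
    """Map the share of tracked projects that are less active onto a 0-10 scale.

    Assumption: a project that is more active than 90% of tracked projects
    (i.e. in the top 10%) gets a score of 9.0.
    """
    if not 0.0 <= fraction_less_active <= 1.0:
        raise ValueError("fraction_less_active must be between 0 and 1")
    return round(fraction_less_active * 10.0, 1)


if __name__ == "__main__":
    # Hypothetical star counts, chosen only to illustrate the arithmetic.
    print(month_over_month_growth(1000, 1015))
    print(activity_from_rank(0.9))
```

Under these assumptions, a project with no star history for the previous month has no defined growth, which would be one plausible reason for a "-" in the Growth row.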

data-centric-ai

Posts with mentions or reviews of data-centric-ai. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-09-13.

[P] Rubrix: Open-source Python framework for NLP data annotation, exploration, and monitoring
2 projects | /r/MachineLearning | 13 Sep 2021

In line with initiatives like Data-centric AI (https://https-deeplearning-ai.github.io/data-centric-comp/, https://github.com/HazyResearch/data-centric-ai), we firmly believe that iterating on datasets (finding label errors, dataset slicing, QA, etc.) will become more and more important, and tools for making this easier and involving different roles are needed.

DikeDataset

Posts with mentions or reviews of DikeDataset. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-03-31.

Dataset with labeled benign and malicious files
5 projects | /r/Malware | 31 Mar 2022

[2] DikeDataset

2 projects | /r/opensource | 31 Mar 2022

Hi, Reddit. During the project implementation for my bachelor's thesis [1], a piece of software (named dike, after the Greek goddess of justice) capable of analyzing malicious programs using artificial intelligence techniques, I was unable to locate an open source dataset with labeled malware samples in the public domain. As a result, I created DikeDataset, a dataset with labeled PE and OLE samples [2]. Because it was not the main focus of my thesis, the samples' attributes are not evenly distributed (the benign-malicious and OLE-PE ratios are quite low), but the dataset aided greatly in the research process.

This week, I was surprised to see that the public GitHub repository (which was used only for storage, without any promotion in communities like this) gained some organic reach (views, clones and stars). Furthermore, I was thrilled to learn that it was used in a research article published in 2021 [3]! As a result, I'd like to share this project in the hope that it will be useful to some members of the community.

[1] dike
[2] DikeDataset
[3] Toward Identifying APT Malware through API System Calls

What are some alternatives?

When comparing data-centric-ai and DikeDataset you can also consider the following projects:

argilla - Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.

dike - Platform for automatic analysis of malicious applications using artificial intelligence algorithms ⚖️

pytorch-lightning - The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. [Moved to: https://github.com/PyTorchLightning/pytorch-lightning]

CPPE-Dataset - Code for our paper CPPE - 5 (Medical Personal Protective Equipment), a new challenging object detection dataset

prometheus-spec - Cryptoeconomically-safe trustless high-load computing on top of Bitcoin

public-apis - A collective list of free APIs

pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]

theZoo - A repository of LIVE malwares for your own joy and pleasure. theZoo is a project created to make the possibility of malware analysis open and available to the public.

data-centric-AI - A curated, but incomplete, list of data-centric AI resources.

spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python

autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python