Top 23 fraud-detection Open-Source Projects

fingerprintjs

346 20,938 7.8 TypeScript

Browser fingerprinting library. The accuracy of this open-source version is 40-60%; the accuracy of the commercial Fingerprint Identification is 99.5%. V4 of this library is BSL-licensed.

Project mention: Should I Open Source my Company? | news.ycombinator.com | 2024-01-22

pyod

7 7,941 7.7 Python

A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)

Project mention: A Comprehensive Guide for Building Rag-Based LLM Applications | news.ycombinator.com | 2023-09-13

This is a feature in many commercial products already, as well as open source libraries like PyOD. https://github.com/yzhao062/pyod
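To make the idea of outlier detection concrete, here is a minimal standard-library sketch of a median/MAD ("modified z-score") detector. It is a deliberately simple stand-in technique, not PyOD's API or any of its detectors; the function names, the sample amounts, and the 3.5 threshold are assumptions chosen for illustration.

```python
import statistics


def modified_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD.

    Hypothetical helper for illustration; not part of PyOD.
    """
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Simplification: with zero spread, report no deviation at all.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data are roughly normal.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value; True marks a suspected outlier."""
    return [abs(z) > threshold for z in modified_z_scores(values)]


# Made-up transaction amounts with one obviously unusual value.
amounts = [12.0, 15.5, 11.2, 14.8, 13.1, 950.0, 12.9]
flags = flag_outliers(amounts)
```

Using the median and MAD instead of the mean and standard deviation keeps a single extreme value from inflating the spread estimate and hiding itself.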

anomaly-detection-resources

98 7,871 4.6 Python

Anomaly detection related books, papers, videos, and toolboxes

Project mention: anomaly-detection-resources: NEW Extended Research - star count:7507.0 | /r/algoprojects | 2023-10-24

MISP

28 4,986 9.9 PHP

MISP (core software) - Open Source Threat Intelligence and Sharing Platform

Project mention: A recent abrupt change in Internet SSH brute force attacks against us | news.ycombinator.com | 2024-02-24

awesome-fraud-detection-papers

51 1,545 3.4 Python

A curated list of data mining papers about fraud detection.

Project mention: awesome-fraud-detection-papers: NEW Extended Research - star count:1346.0 | /r/algoprojects | 2023-05-13

graph-fraud-detection-papers

22 1,263 6.0

A curated list of graph-based fraud, anomaly, and outlier detection papers & resources
pygod

3 1,208 8.6 Python

A Python Library for Graph Outlier Detection (Anomaly Detection)

Project mention: RAG Using Structured Data: Overview and Important Questions | news.ycombinator.com | 2024-01-10

Ok, using ChatGPT and Bard (the irony lol) I learned a bit more about GNNs:
GNNs are probabilistic and can be trained to learn representations in graph-structured data and to handle complex relationships, while classical graph algorithms are specialized for specific graph analysis tasks and operate based on predefined rules/steps.
* Why is PyG called "Geometric" and not "Topologic"?
Properties like connectivity, neighborhoods, and even geodesic distances can all be considered topological features of a graph. These features remain unchanged under continuous deformations like stretching or bending, which is the defining characteristic of topological equivalence. In this sense, "PyTorch Topologic" might be a more accurate reflection of the library's focus on analyzing the intrinsic structure and connections within graphs.
However, the term "geometric" still has some merit in the context of PyG. While most GNN operations rely on topological principles, some do incorporate notions of Euclidean geometry, such as:
- Node embeddings: Many GNNs learn low-dimensional vectors for each node, which can be interpreted as points in a vector space, allowing geometric operations like distances and angles to be applied.
- Spectral GNNs: These models leverage the eigenvalues and eigenvectors of the graph Laplacian, which encodes information about the geometric structure and distances between nodes.
- Manifold learning: Certain types of graphs can be seen as low-dimensional representations of high-dimensional manifolds. Applying GNNs in this context involves learning geometric properties on the manifold itself.
Therefore, although topology plays a primary role in understanding and analyzing graphs, geometry can still be relevant in certain contexts and GNN operations.
* Real world applications:
- HuggingFace has a few models [0] around things like computational chemistry [1] or weather forecasting.
- PyGod [2] can be used for Outlier Detection (Anomaly Detection).
- Apparently ULTRA [3] can "infer" (in the knowledge graph sense), that Michael Jackson released some disco music :-p (see the paper).
- RGCN [4] can be used for knowledge graph link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes).
- GreatX [5] tackles removing inherent noise, "Distribution Shift" and "Adversarial Attacks" (ex: noise purposely introduced to hide a node's presence) from networks. Apparently this is a thing and the field is called "Graph Reliability" or "Reliable Deep Graph Learning". The author even has a bunch of "awesome" style lists of links! [6]
- Finally this repo has a nice explanation of how/why to run machine learning algorithms "outside of the DB":
"Pytorch Geometric (PyG) has a whole arsenal of neural network layers and techniques to approach machine learning on graphs (aka graph representation learning, graph machine learning, deep graph learning) and has been used in this repo [7] to learn link patterns, also known as link or edge predictions."
--
0: https://huggingface.co/models?pipeline_tag=graph-ml&sort=tre...
1: https://github.com/Microsoft/Graphormer
2: https://github.com/pygod-team/pygod
3: https://github.com/DeepGraphLearning/ULTRA
4: https://huggingface.co/riship-nv/RGCN
5: https://github.com/EdisonLeeeee/GreatX
6: https://edisonleeeee.github.io/projects.html
7: https://github.com/Orbifold/pyg-link-prediction

ThreatIngestor

1 786 7.6 Python

Extract and aggregate threat intelligence.
DGFraud

16 655 2.6 Python

A Deep Graph-based Toolbox for Fraud Detection
fraud-detection-handbook

1 429 0.0 Jupyter Notebook

Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook
getIPIntel

29 306 5.0 PHP

IP Intelligence is a free proxy, VPN, Tor, and bad-IP detection tool that helps prevent fraud, stolen content, and malicious users. Block proxies, VPN connections, web host IPs, Tor IPs, and compromised systems with a simple API. GeoIP lookup available.

Project mention: getIPIntel: NEW Extended Research - star count:243.0 | /r/algoprojects | 2023-04-29

TabFormer

10 295 0.0 Python

Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)

Project mention: Time-based splitting performing significantly worse than random splitting | /r/learnmachinelearning | 2023-05-20

Hi, I am currently working on a basic binary classifier for a transaction dataset, to predict which transaction is fraudulent (Dataset: https://github.com/IBM/TabFormer). The following is a quick summary of the dataset:
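The two splitting strategies named in the mention's title can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the `timestamp` and `is_fraud` field names and the toy rows are hypothetical and are not taken from the TabFormer dataset.

```python
import random


def time_based_split(rows, test_fraction=0.2, key=lambda r: r["timestamp"]):
    """Hold out the most recent rows as the test set."""
    ordered = sorted(rows, key=key)
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


def random_split(rows, test_fraction=0.2, seed=0):
    """Hold out a random subset of rows, ignoring time order."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Toy transactions (hypothetical fields), deliberately out of time order.
rows = [
    {"timestamp": t, "is_fraud": t % 4 == 0}
    for t in [5, 1, 9, 3, 7, 2, 10, 4, 8, 6]
]

train_t, test_t = time_based_split(rows)
train_r, test_r = random_split(rows)
```

With a time-based split every training row precedes every test row, so the model is evaluated on transactions from a later period than it was trained on. A random split mixes periods, which can let information from later transactions reach the training set and produce more optimistic scores.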

fraud-detection-using-machine-learning

7 248 1.8 Jupyter Notebook

Set up an end-to-end demo architecture for predicting fraud events with machine learning using Amazon SageMaker
Free-RASP-Community

7 241 5.9

SDK providing app protection and threat monitoring for mobile devices, available for Flutter, Cordova, Android and iOS.

Project mention: Attempt#2 - HELP! I'm looking for beta testers for my app, and would be great if this post doesn't get deleted | /r/algeria | 2023-06-10

if i may ask, how did you test the app? would you recommend this?

MLSys-NYU-2022

9 238 10.0 Jupyter Notebook

Slides, scripts and materials for the Machine Learning in Finance Course at NYU Tandon, 2022

Project mention: Where to start | /r/mlops | 2023-09-13

There are 3 courses that I usually recommend to folks looking to get into MLE/MLOps that already have a technical background. The first is a higher-level look at the MLOps processes, common challenges and solutions, and other important project considerations. It's one of Andrew Ng's courses from Deep Learning AI but you can audit it for free if you don't need the certificate:
- Machine Learning in Production
For a more hands-on, in-depth tutorial, I'd recommend this course from NYU (free on GitHub), including slides, scripts, full-code homework:
- Machine Learning Systems
And the title basically says it all, but this is also a really good one:
- Hands-on Train and Deploy ML
Pau Labarta, who made that last course, actually has a series of good (free) hands-on courses on GitHub. If you're interested in getting started with LLMs (since every company in the world seems to be clamoring for them right now), this course just came out from Pau and Paul Iusztin:
- Hands-on LLMs
For LLMs I also like this DLAI course (that includes Prompt Engineering too):
- Generative AI with LLMs
It can also be helpful to start learning how to use MLOps tools and platforms. I'll suggest Comet because I work there and am most familiar with it (and also because it's a great tool). Cloud and DevOps skills are also helpful. Make sure you're comfortable with git. Make sure you're learning how to actually deploy your projects. Good luck! :)

cryptowallet_risk_scoring

2 220 4.3 Python

A free cryptowallet risk scoring tool with fully explainable scoring.
realtime-fraud-detection-with-gnn-on-dgl

7 202 4.3 TypeScript

An end-to-end blueprint architecture for real-time fraud detection (leveraging the graph database Amazon Neptune) using Amazon SageMaker and Deep Graph Library (DGL) to construct a heterogeneous graph from tabular data and train a Graph Neural Network (GNN) model to detect fraudulent transactions in the IEEE-CIS dataset.

Project mention: realtime-fraud-detection-with-gnn-on-dgl: NEW Extended Research - star count:165.0 | /r/algoprojects | 2023-05-20

benford_py

1 148 0.0 Jupyter Notebook

Python implementation of Benford's Law tests.
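Benford's Law predicts that the leading digit d of many naturally occurring quantities appears with probability log10(1 + 1/d). The sketch below shows one way a first-digit chi-square test could look using only the standard library. It is not benford_py's API; the function names are hypothetical, and inputs are assumed to be finite numbers.

```python
import math
from collections import Counter


def first_digit(value):
    """Return the leading non-zero digit of a finite number, or None for zero."""
    # Scientific notation puts the leading digit first, e.g. '5.6e-03'.
    text = f"{abs(value):.15e}"
    digit = int(text[0])
    return digit if digit != 0 else None


def benford_expected():
    """Expected first-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_chi_square(values):
    """Chi-square statistic comparing observed first digits with Benford's Law."""
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


# 15.507 is the 5% critical value of the chi-square distribution
# with 8 degrees of freedom (nine digits minus one).
CRITICAL_5_PERCENT = 15.507

powers_of_two = [2 ** k for k in range(1, 201)]   # known to follow Benford closely
evenly_spread = list(range(100, 1000))            # each leading digit equally often

looks_benford = benford_chi_square(powers_of_two) < CRITICAL_5_PERCENT
looks_suspicious = benford_chi_square(evenly_spread) > CRITICAL_5_PERCENT
```

A statistic above the critical value only indicates that the digits deviate from the Benford distribution; it does not by itself establish fraud, since many legitimate datasets (for example, values confined to a narrow range) do not follow the law.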
UGFraud

7 123 1.8 Python

An Unsupervised Graph-based Toolbox for Fraud Detection
Marble

1 124 8.2 HCL

Marble - the real time decision engine for fraud and AML (by checkmarble)

Project mention: Marble – Open-Source real-time fraud and AML monitoring | news.ycombinator.com | 2024-01-31

FraudDetection

11 111 0.0 MATLAB

Accounting Fraud Detection Using Machine Learning
threatbite

1 86 0.0 Go

ThreatBite is a real-time service that detects unwanted web users.
MemStream

1 81 3.5 Python

MemStream: Memory-Based Streaming Anomaly Detection

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

fraud-detection related posts

Marble – Open-Source real-time fraud and AML monitoring
1 project | news.ycombinator.com | 31 Jan 2024
fingerprintjs: NEW Extended Research - star count:20102.0
1 project | /r/algoprojects | 9 Dec 2023
fingerprintjs: NEW Extended Research - star count:20102.0
1 project | /r/algoprojects | 8 Dec 2023
fingerprintjs: NEW Extended Research - star count:20102.0
1 project | /r/algoprojects | 7 Dec 2023
fingerprintjs: NEW Extended Research - star count:20102.0
1 project | /r/algoprojects | 6 Dec 2023
fingerprintjs: NEW Extended Research - star count:20102.0
1 project | /r/algoprojects | 5 Dec 2023

Index

What are some of the best open-source fraud-detection projects? This list will help you:

#	Project	Stars
1	fingerprintjs	20,938
2	pyod	7,941
3	anomaly-detection-resources	7,871
4	MISP	4,986
5	awesome-fraud-detection-papers	1,545
6	graph-fraud-detection-papers	1,263
7	pygod	1,208
8	ThreatIngestor	786
9	DGFraud	655
10	fraud-detection-handbook	429
11	getIPIntel	306
12	TabFormer	295
13	fraud-detection-using-machine-learning	248
14	Free-RASP-Community	241
15	MLSys-NYU-2022	238
16	cryptowallet_risk_scoring	220
17	realtime-fraud-detection-with-gnn-on-dgl	202
18	benford_py	148
19	UGFraud	123
20	Marble	124
21	FraudDetection	111
22	threatbite	86
23	MemStream	81