Python outlier-detection

Open-source Python projects categorized as outlier-detection

Top 13 Python outlier-detection Projects

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-minute code tutorial. You can also read more in the blog if you'd like.
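
cleanlab's approach to finding label errors is based on confident learning. The sketch below is a simplified, plain-Python illustration of that idea, not cleanlab's API. It assumes you already have out-of-sample predicted probabilities for every example, computes a per-class confidence threshold (the average probability the model assigns to class j among examples labeled j), and flags an example when the most likely class among those clearing their thresholds differs from the given label. The function name `flag_label_issues` and the toy data are hypothetical.

```python
from typing import List, Sequence


def flag_label_issues(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> List[int]:
    """Return indices whose given label disagrees with a confident prediction.

    Hypothetical, simplified confident-learning sketch (not cleanlab's code).
    pred_probs[i][j] is assumed to be an out-of-sample probability that
    example i belongs to class j.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: average self-confidence of examples labeled j.
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        # If no example carries label j, make that class impossible to clear.
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else float("inf"))

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability clears that class's own threshold.
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # not confident about any class: no verdict
        likely = max(confident, key=lambda j: p[j])
        if likely != y:
            issues.append(i)
    return issues


labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
    [0.2, 0.8], [0.3, 0.7], [0.85, 0.15],
]
print(flag_label_issues(labels, pred_probs))  # [2, 5]
```

Examples 2 and 5 are flagged because the model confidently predicts the other class for them. Real tools add much more on top (estimating the joint distribution of noisy and true labels, ranking issues by severity), which this sketch leaves out.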

  • pyod

    A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)

  • Project mention: A Comprehensive Guide for Building Rag-Based LLM Applications | news.ycombinator.com | 2023-09-13

    This is a feature in many commercial products already, as well as open source libraries like PyOD. https://github.com/yzhao062/pyod
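
PyOD detectors share a fit-then-score workflow: after fitting, every training point has an outlier score, and a `contamination` fraction decides where the outlier threshold falls. The class below imitates that workflow in plain Python with a k-nearest-neighbor distance score. It is a hypothetical sketch that does not import or reproduce PyOD; only the attribute names (`decision_scores_`, `threshold_`, `labels_`) follow PyOD's naming, and the class name `KNNOutlierSketch` is made up.

```python
import math
from typing import List, Optional, Sequence


class KNNOutlierSketch:
    """Hypothetical kNN outlier detector imitating a fit-then-score API."""

    def __init__(self, n_neighbors: int = 5, contamination: float = 0.1):
        self.n_neighbors = n_neighbors
        self.contamination = contamination

    def _score(self, x: Sequence[float], exclude: Optional[int] = None) -> float:
        # Outlier score = distance to the k-th nearest training point.
        dists = sorted(
            math.dist(x, other) for j, other in enumerate(self.X_) if j != exclude
        )
        k = min(self.n_neighbors, len(dists))
        return dists[k - 1]

    def fit(self, X: Sequence[Sequence[float]]) -> "KNNOutlierSketch":
        self.X_ = [list(x) for x in X]
        # Score each training point against all *other* training points.
        self.decision_scores_ = [
            self._score(x, exclude=i) for i, x in enumerate(self.X_)
        ]
        # Threshold so that roughly `contamination` of the points lie above it.
        ranked = sorted(self.decision_scores_)
        cut = math.ceil((1 - self.contamination) * len(ranked)) - 1
        self.threshold_ = ranked[min(max(cut, 0), len(ranked) - 1)]
        self.labels_ = [int(s > self.threshold_) for s in self.decision_scores_]
        return self

    def decision_function(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self._score(x) for x in X]

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        # 1 = outlier, 0 = inlier, same convention as labels_.
        return [int(s > self.threshold_) for s in self.decision_function(X)]


# A 3x3 grid of inliers plus one far-away point.
X = [[0, 0], [0, 0.5], [0, 1], [0.5, 0], [0.5, 0.5], [0.5, 1],
     [1, 0], [1, 0.5], [1, 1], [10, 10]]
det = KNNOutlierSketch(n_neighbors=2, contamination=0.1).fit(X)
print(det.labels_)                          # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(det.predict([[0.4, 0.6], [20, -5]]))  # [0, 1]
```

Brute-force distances keep the sketch short but cost O(n^2); real libraries use spatial indexes or approximate neighbors for larger data.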

  • anomaly-detection-resources

    Anomaly detection related books, papers, videos, and toolboxes

  • Project mention: anomaly-detection-resources: NEW Extended Research - star count:7507.0 | /r/algoprojects | 2023-10-24

  • fastdup

    fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data-operations costs at an unparalleled scale.

  • Project mention: Visualize your dataset using DINOv2 embedding | news.ycombinator.com | 2023-05-02

    Visualizing your dataset (especially large ones) in a low-dimensional embedding space can tell you a lot about the patterns and clusters in your dataset.

    We recently released a notebook showing how you can visualize your dataset using DINOv2 models by running them on your CPU.

    Yes! No GPUs needed.

    We used it to find clusters of similar images, duplicates, and outliers in a subset of the LAION dataset.

    Try it on your own dataset:

    Colab notebook - https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/dinov2_notebook.ipynb

    GitHub repo - https://github.com/visual-layer/fastdup
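
The idea behind the post, that neighborhoods in an embedding space expose duplicates and outliers, can be sketched in a few lines. This is not fastdup's API: the toy 3-dimensional vectors stand in for real image embeddings (such as DINOv2 features), and both similarity thresholds are arbitrary assumptions. Pairs at or above `dup_threshold` cosine similarity are reported as near-duplicates, and items whose nearest neighbor is still below `outlier_threshold` are reported as outliers.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def duplicates_and_outliers(
    embeddings: Dict[str, Sequence[float]],
    dup_threshold: float = 0.98,
    outlier_threshold: float = 0.5,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    names = list(embeddings)
    nearest = {name: -1.0 for name in names}  # best similarity to any other item
    duplicates = []
    # O(n^2) pairwise loop: fine for a sketch, far too slow at LAION scale.
    for a, b in combinations(names, 2):
        s = cosine(embeddings[a], embeddings[b])
        nearest[a] = max(nearest[a], s)
        nearest[b] = max(nearest[b], s)
        if s >= dup_threshold:
            duplicates.append((a, b))
    outliers = [name for name in names if nearest[name] < outlier_threshold]
    return duplicates, outliers


embeddings = {
    "cat1": [1.0, 0.0, 0.1],
    "cat1_copy": [1.0, 0.0, 0.1],
    "cat2": [0.7, 0.4, 0.2],
    "dog": [0.3, 0.9, 0.1],
    "noise": [0.0, 0.0, 1.0],
}
print(duplicates_and_outliers(embeddings))
# ([('cat1', 'cat1_copy')], ['noise'])
```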

  • minisom

    MiniSom is a minimalistic implementation of Self-Organizing Maps.
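
A self-organizing map fits a small grid of prototype vectors to the data: each training step finds the best-matching unit (BMU) for a sample and pulls it, and more weakly its grid neighbors, toward that sample. For outlier detection, one common use is to score a point by its quantization error, the distance to its BMU. The sketch below is a from-scratch illustration in plain Python, not MiniSom's code; the grid size, the linear decay schedule, and all function names are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]


def best_matching_unit(weights: Dict[Node, List[float]], x: Sequence[float]) -> Node:
    # The grid node whose prototype vector is closest to x.
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows=3, cols=3, iterations=500, lr0=0.5, sigma0=1.0, seed=0):
    rng = random.Random(seed)
    # Initialise every grid node with a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 1e-3  # neighborhood radius shrinks too
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            # Gaussian neighborhood measured on the grid, not in data space.
            h = math.exp(-grid_d2 / (2 * sigma ** 2))
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])
    return weights


def quantization_error(weights: Dict[Node, List[float]], x: Sequence[float]) -> float:
    # Distance from x to its BMU: large values suggest x is unlike the training data.
    return math.dist(weights[best_matching_unit(weights, x)], x)


data = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.2, 4.9), (4.8, 5.1)]
som = train_som(data)
print(quantization_error(som, (0.1, 0.1)), quantization_error(som, (20, -20)))
```

Because each update moves a prototype only part of the way toward a sample, the prototypes stay within the region spanned by the training data, so a point far outside it, such as (20, -20), always gets a large quantization error.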

  • tods

    TODS: An Automated Time-series Outlier Detection System
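
TODS is an automated system for building time-series outlier-detection pipelines. As a much smaller illustration of what point-wise time-series outlier detection means, the sketch below flags a value whose z-score against a trailing window exceeds a threshold. It is not TODS code; the window length, the threshold, and the function name are assumptions.

```python
import statistics
from typing import List, Sequence


def rolling_zscore_outliers(
    series: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Indices whose value deviates strongly from the preceding `window` values."""
    outliers = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mean = statistics.fmean(past)
        std = statistics.pstdev(past)
        if std == 0:
            # A perfectly flat window: treat any change as anomalous.
            if series[i] != mean:
                outliers.append(i)
        elif abs(series[i] - mean) / std > threshold:
            outliers.append(i)
    return outliers


series = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10, 12]
print(rolling_zscore_outliers(series))  # [7]
```

A spike inflates the standard deviation of the next few windows, which can mask anomalies right after it; robust statistics such as the median and median absolute deviation are a common remedy.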

  • pygod

    A Python Library for Graph Outlier Detection (Anomaly Detection)
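
PyGOD's detectors are mostly built on graph neural networks. A much simpler, non-learned baseline helps show what a "contextual" graph outlier is: a node whose attributes disagree with those of its neighbors. The scoring rule below (distance between a node's feature vector and the mean of its neighbors' vectors) is an illustrative assumption, not a PyGOD model, and the graph is a made-up toy example.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def neighbor_deviation_scores(
    features: Dict[str, Sequence[float]], edges: Iterable[Tuple[str, str]]
) -> Dict[str, float]:
    # Build an undirected adjacency list.
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    scores = {}
    for node, x in features.items():
        nbrs = neighbors[node]
        if not nbrs:
            scores[node] = 0.0  # isolated node: no neighborhood to compare against
            continue
        # Mean feature vector of the node's neighbors.
        mean = [sum(features[m][k] for m in nbrs) / len(nbrs) for k in range(len(x))]
        scores[node] = math.dist(x, mean)
    return scores


features = {"a": [1.0, 1.0], "b": [1.2, 0.9], "c": [0.8, 1.1], "d": [9.0, 9.0]}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("b", "d")]
scores = neighbor_deviation_scores(features, edges)
print(max(scores, key=scores.get))  # d
```

Node "d" scores highest because its features are far from the average of its neighbors "a" and "b". Note that "d" also raises the scores of "a" and "b" a little, since it sits in their neighborhoods; learned detectors try to separate such effects.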

  • Project mention: RAG Using Structured Data: Overview and Important Questions | news.ycombinator.com | 2024-01-10

    Ok, using ChatGPT and Bard (the irony lol) I learned a bit more about GNNs:

    GNNs are learned models: they can be trained to build representations of graph-structured data and capture complex relationships, while classical graph algorithms are specialized for specific graph analysis tasks and operate based on predefined rules/steps.

    * Why is PyG called "Geometric" and not "Topologic"?

    Properties like connectivity, neighborhoods, and even geodesic distances can all be considered topological features of a graph. These features remain unchanged under continuous deformations like stretching or bending, which is the defining characteristic of topological equivalence. In this sense, "PyTorch Topologic" might be a more accurate reflection of the library's focus on analyzing the intrinsic structure and connections within graphs.

    However, the term "geometric" still has some merit in the context of PyG. While most GNN operations rely on topological principles, some do incorporate notions of Euclidean geometry, such as:

    - Node embeddings: Many GNNs learn low-dimensional vectors for each node, which can be interpreted as points in a vector space, allowing geometric operations like distances and angles to be applied.

    - Spectral GNNs: These models leverage the eigenvalues and eigenvectors of the graph Laplacian, which encodes information about the geometric structure and distances between nodes.

    - Manifold learning: Certain types of graphs can be seen as low-dimensional representations of high-dimensional manifolds. Applying GNNs in this context involves learning geometric properties on the manifold itself.

    Therefore, although topology plays a primary role in understanding and analyzing graphs, geometry can still be relevant in certain contexts and GNN operations.

    * Real-world applications:

    - Hugging Face hosts a few models [0] for things like computational chemistry [1] or weather forecasting.

    - PyGOD [2] can be used for outlier detection (anomaly detection).

    - Apparently ULTRA [3] can "infer" (in the knowledge graph sense), that Michael Jackson released some disco music :-p (see the paper).

    - RGCN [4] can be used for knowledge graph link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes).

    - GreatX [5] tackles removing inherent noise, "Distribution Shift" and "Adversarial Attacks" (e.g. noise deliberately introduced to hide a node's presence) from networks. Apparently this is a thing and the field is called "Graph Reliability" or "Reliable Deep Graph Learning". The author even has a bunch of "awesome" style lists of links! [6]

    - Finally this repo has a nice explanation of how/why to run machine learning algorithms "outside of the DB":

    "Pytorch Geometric (PyG) has a whole arsenal of neural network layers and techniques to approach machine learning on graphs (aka graph representation learning, graph machine learning, deep graph learning) and has been used in this repo [7] to learn link patterns, also known as link or edge predictions."

    --

    0: https://huggingface.co/models?pipeline_tag=graph-ml&sort=tre...

    1: https://github.com/Microsoft/Graphormer

    2: https://github.com/pygod-team/pygod

    3: https://github.com/DeepGraphLearning/ULTRA

    4: https://huggingface.co/riship-nv/RGCN

    5: https://github.com/EdisonLeeeee/GreatX

    6: https://edisonleeeee.github.io/projects.html

    7: https://github.com/Orbifold/pyg-link-prediction
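
The post notes that spectral GNNs build on the graph Laplacian. For a simple undirected graph, L = D - A (degree matrix minus adjacency matrix), and its quadratic form sums squared differences across edges: x^T L x = sum over edges (i, j) of (x_i - x_j)^2. This is why eigenvectors with small Laplacian eigenvalues correspond to signals that vary smoothly over the graph. Below is a small plain-Python check of that identity, assuming a graph without self-loops or repeated edges; the path graph and signal are arbitrary examples.

```python
from typing import List, Sequence, Tuple


def laplacian(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    # L = D - A for a simple undirected graph on nodes 0..n-1.
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1
        L[v][u] -= 1
        L[u][u] += 1
        L[v][v] += 1
    return L


def quadratic_form(L: List[List[int]], x: Sequence[float]) -> float:
    # Computes x^T L x.
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


edges = [(0, 1), (1, 2), (2, 3)]  # a path graph 0-1-2-3
L = laplacian(4, edges)
x = [1, 2, 4, 7]
print(quadratic_form(L, x))                       # 14
print(sum((x[u] - x[v]) ** 2 for u, v in edges))  # 14
print([sum(row) for row in L])                    # [0, 0, 0, 0]
```

The zero row sums show that the constant vector is always an eigenvector with eigenvalue 0, the "smoothest" possible signal on the graph.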

  • ADBench

    Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.
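
Anomaly-detection benchmarks such as ADBench compare detectors by how well their scores rank true anomalies above inliers, a quality commonly summarized as ROC AUC. ROC AUC equals the probability that a randomly chosen anomaly receives a higher score than a randomly chosen inlier, with ties counting half, which gives a direct (if quadratic-time) way to compute it. The sketch below is a generic illustration, not ADBench's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for binary labels (1 = anomaly) via pairwise comparison, O(P*N)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one inlier")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # anomaly ranked above inlier
            elif p == q:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Three of the four anomaly-inlier pairs are ordered correctly (only 0.35 < 0.4 is wrong), giving 0.75. Sorting-based implementations compute the same value in O(n log n).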

  • luminaire

    Luminaire is a Python package that provides ML-driven solutions for monitoring time series data.

  • OpenOOD

    Benchmarking Generalized Out-of-Distribution Detection

  • Project mention: [Online Leaderboard | Easy Evaluation] OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection | /r/DeepLearningPapers | 2023-06-28

    Open-sourced implementations of 40+ advanced methods (see our repo).
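
A classic baseline for out-of-distribution detection is the maximum softmax probability (MSP): inputs on which a classifier's top class probability is low are treated as likely out-of-distribution. The sketch below computes that score from raw logits. The threshold of 0.7 is an arbitrary assumption that would normally be tuned on held-out data, and the code illustrates the idea only; it is not OpenOOD code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    # Higher = more confident = more likely in-distribution.
    return max(softmax(logits))


def flag_ood(batch_logits: Sequence[Sequence[float]], threshold: float = 0.7) -> List[bool]:
    # True = flagged as out-of-distribution.
    return [msp_score(logits) < threshold for logits in batch_logits]


print(flag_ood([[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]]))  # [False, True]
```

The first input has one dominant logit (MSP about 0.99); the second is spread evenly across three classes (MSP of 1/3), so only the second is flagged.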

  • DGFraud

    A Deep Graph-based Toolbox for Fraud Detection

  • UGFraud

    An Unsupervised Graph-based Toolbox for Fraud Detection

  • deep-iforest

    Official implementation of the TKDE paper "Deep isolation forest for anomaly detection"

  • Project mention: Commercial Opensource LLM | /r/LocalLLaMA | 2023-06-01

    For use-case 1, you really should use a statistical solution like Isolation Forest. It's going to be more reliable and less computationally expensive than an LLM.
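
The comment above recommends Isolation Forest. Its core idea is that random axis-aligned splits isolate anomalies in fewer steps than normal points, so a short average path length across many random trees signals an outlier. The sketch below follows the classic (non-deep) algorithm in plain Python; deep-iforest extends the idea using neural-network representations of the data, which this sketch does not attempt. The tree count, subsample size, and names such as `TinyIsolationForest` are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def avg_path_length(n: int) -> float:
    # c(n): expected path length of an unsuccessful search in a random binary
    # search tree with n points; normalises depths and credits unsplit leaves.
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    # A node is either ("leaf", size) or ("split", dim, value, left, right).
    if depth >= max_depth or len(X) <= 1:
        return ("leaf", len(X))
    dim = rng.randrange(len(X[0]))
    lo = min(x[dim] for x in X)
    hi = max(x[dim] for x in X)
    if lo == hi:
        return ("leaf", len(X))
    value = rng.uniform(lo, hi)  # random split inside the observed range
    left = [x for x in X if x[dim] < value]
    right = [x for x in X if x[dim] >= value]
    return ("split", dim, value,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(node, x, depth=0) -> float:
    if node[0] == "leaf":
        return depth + avg_path_length(node[1])
    _, dim, value, left, right = node
    return path_length(left if x[dim] < value else right, x, depth + 1)


class TinyIsolationForest:
    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, X):
        if len(X) < 2:
            raise ValueError("need at least two points")
        rng = random.Random(self.seed)
        self.psi_ = min(self.sample_size, len(X))
        max_depth = math.ceil(math.log2(self.psi_))  # height limit
        self.trees_ = [build_tree(rng.sample(X, self.psi_), 0, max_depth, rng)
                       for _ in range(self.n_trees)]
        return self

    def score(self, x) -> float:
        # s = 2^(-E[h(x)] / c(psi)); values close to 1 suggest anomalies.
        mean_depth = sum(path_length(t, x) for t in self.trees_) / len(self.trees_)
        return 2.0 ** (-mean_depth / avg_path_length(self.psi_))


grid = [[i * 0.1, j * 0.1] for i in range(10) for j in range(10)]
forest = TinyIsolationForest().fit(grid + [[5.0, 5.0]])
print(forest.score([5.0, 5.0]) > forest.score([0.45, 0.45]))  # True
```

Training each tree on a small random subsample, rather than the full dataset, is part of the original design: it keeps trees cheap and reduces the masking effect of many similar points.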

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-01-10.

What are some of the best open-source outlier-detection projects in Python? This list will help you:

Rank  Project                        Stars
1     cleanlab                       8,592
2     pyod                           7,928
3     anomaly-detection-resources    7,858
4     fastdup                        1,398
5     minisom                        1,379
6     tods                           1,289
7     pygod                          1,207
8     ADBench                          770
9     luminaire                        750
10    OpenOOD                          741
11    DGFraud                          655
12    UGFraud                          123
13    deep-iforest                      69