[P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions.

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  1. cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    How about cleanlab? It works for any data you can train a classifier on or get embeddings for (text, tabular, image, audio, etc.). We just released some new features as well. Currently, cleanlab can automatically:
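
    As an illustration of why a trained classifier is the only prerequisite, the sketch below flags likely label errors from predicted class probabilities alone. It is a simplified stand-in written in plain Python, not cleanlab's API or its actual algorithm; the function name `rank_label_issues`, the fixed `threshold`, and the toy data are all assumptions made for this example.

```python
# Illustrative only: a simplified "self-confidence" check, not cleanlab's API.
def rank_label_issues(labels, pred_probs, threshold=0.5):
    """Return indices of suspicious labels, most suspicious first.

    labels: given (possibly noisy) class index for each example.
    pred_probs: out-of-sample predicted class probabilities, one row per example.
    threshold: hypothetical cutoff; a label is flagged when the model assigns
               it less probability than this.
    """
    scored = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        self_confidence = probs[label]  # probability the model gives the stated label
        if self_confidence < threshold:
            scored.append((self_confidence, i))
    # Lowest self-confidence first.
    return [i for _, i in sorted(scored)]


labels = [0, 1, 1, 0]
pred_probs = [
    [0.90, 0.10],  # model agrees with label 0
    [0.20, 0.80],  # model agrees with label 1
    [0.95, 0.05],  # model strongly prefers class 0, but the label is 1
    [0.40, 0.60],  # mild disagreement with label 0
]
issues = rank_label_issues(labels, pred_probs)
print(issues)
```

    Only `labels` and `pred_probs` are needed, which is why the same check applies to text, tabular, image, or audio data. A single global threshold is the crudest possible choice and is used here only to keep the sketch short.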

  3. refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

    You definitely forgot https://www.kern.ai/ :)

  4. grape

    🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations (by AnacletoLAB)

    For graph embeddings, there's quite a few. I'd recommend this one, but there's also this one (disclaimer: I'm the author) or this one, more of a DGL library.

  5. nodevectors

    Fastest network node embeddings in the west

    For graph embeddings, there's quite a few. I'd recommend this one, but there's also this one (disclaimer: I'm the author) or this one, more of a DGL library.

  6. dgl

    Python package built to ease deep learning on graph, on top of existing DL frameworks.

    For graph embeddings, there's quite a few. I'd recommend this one, but there's also this one (disclaimer: I'm the author) or this one, more of a DGL library.

  7. optuna

    A hyperparameter optimization framework

    Keras Tuner, Optuna (https://github.com/optuna/optuna)?
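
    Hyperparameter optimization frameworks automate a search loop over a user-defined objective. The sketch below shows the simplest version of that loop, plain random search, using only the Python standard library. It is a stand-in for the general idea, not Optuna's API or its sampling strategy; `objective`, `random_search`, and the search bounds are hypothetical names and values chosen for this example.

```python
import random


def objective(x):
    # Hypothetical objective to minimize; its minimum is at x = 2.
    return (x - 2.0) ** 2


def random_search(objective, low, high, n_trials=200, seed=0):
    """Sample x uniformly from [low, high] and keep the best value seen."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    best_x, best_value = None, float("inf")
    for _ in range(n_trials):
        x = rng.uniform(low, high)
        value = objective(x)
        if value < best_value:
            best_x, best_value = x, value
    return best_x, best_value


best_x, best_value = random_search(objective, -10.0, 10.0)
print(best_x, best_value)
```

    A dedicated framework replaces the uniform sampling with smarter strategies and adds bookkeeping around trials, but the shape of the loop (propose parameters, evaluate, keep the best) is the same.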

  8. awesome-production-machine-learning

    A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

    There is a cool, gigantic list for MLOps that I can recommend: https://github.com/EthicalML/awesome-production-machine-learning

  10. dcai-lab

    Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

    Thanks for the kind words! Make sure to check out the current open MIT course if you are just starting out: https://dcai.csail.mit.edu/

  11. snorkel

    A system for quickly generating training data with weak supervision

    The paid product came out of an open source tool: https://github.com/snorkel-team/snorkel

  12. deodel

    A mixed attributes predictive algorithm implemented in Python.

    The deodel classifier can act as a quick dataset evaluation tool. If your data is available in table format, you can check its potential for prediction/classification. Just feed it to deodel. It accepts mixed attributes without any preliminary curation. It simply considers attribute values expressed as floats (dot decimal) as being continuous. It even accepts a mix of continuous and categorical values for the same attribute column.
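
    The rule described above, where values written as dot-decimal floats count as continuous and everything else as categorical, can be sketched in a few lines. This is an illustration of the stated rule, not deodel's code: the helper `split_mixed_column` is hypothetical, and it assumes that anything Python's `float()` accepts counts as continuous, which may differ from deodel's actual behavior.

```python
def split_mixed_column(values):
    """Partition one attribute column into continuous and categorical entries.

    Returns two lists of (row_index, value) pairs. Assumption: a value is
    continuous if float() can parse it; otherwise it is categorical.
    """
    continuous, categorical = [], []
    for row, value in enumerate(values):
        try:
            continuous.append((row, float(value)))
        except (TypeError, ValueError):
            # Not parseable as a number: keep it as a category label.
            categorical.append((row, str(value)))
    return continuous, categorical


# One column mixing numeric readings and category labels.
column = ["3.5", "red", "12.0", "blue", "0.25"]
continuous, categorical = split_mixed_column(column)
print(continuous)
print(categorical)
```

    Keeping the row index with each value lets both kinds of entries be matched back to the same target column afterwards.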

  13. BotLibre

    An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
