[P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions.

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • How about cleanlab? It works for any data you can train a classifier or get embeddings on (text, tabular, image, audio, etc). We just released some new features as well. Currently, cleanlab can automatically:

  • refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

  • You definitely forgot https://www.kern.ai/ :)

  • grape

    🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations (by AnacletoLAB)

  • For graph embeddings, there's quite a few. I'd recommend this one, but there's also this one (disclaimer: I'm the author) or this one, more of a DGL library.

  • nodevectors

    Fastest network node embeddings in the west

  • For graph embeddings, there's quite a few. I'd recommend this one, but there's also this one (disclaimer: I'm the author) or this one, more of a DGL library.

  • dgl

    Python package built to ease deep learning on graph, on top of existing DL frameworks.

  • For graph embeddings, there's quite a few. I'd recommend this one, but there's also this one (disclaimer: I'm the author) or this one, more of a DGL library.

  • optuna

    A hyperparameter optimization framework

  • Keras Tuner, Optuna : https://github.com/optuna/optuna ?

  • awesome-production-machine-learning

    A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

  • There is a cool, gigantic list for MLOps that I can recommend: https://github.com/EthicalML/awesome-production-machine-learning

  • dcai-lab

    Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

  • Thanks for the kind words! Make sure to check out the current open MIT course if you are just starting out: https://dcai.csail.mit.edu/

  • snorkel

    A system for quickly generating training data with weak supervision

  • The paid product came out of an open source tool: https://github.com/snorkel-team/snorkel

  • deodel

    A mixed attributes predictive algorithm implemented in Python.

  • The deodel classifier can act as a quick dataset evaluation tool. If your data is available in table format, you can check its potential for prediction/classification. Just feed it to deodel. It accepts mixed attributes without any preliminary curation. It simply considers attribute values expressed as floats (dot decimal) as being continuous. It accepts even a mix of continuous and categorical values for the same attribute column.

  • BotLibre

    An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
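
The deodel entry above says that values written as dot-decimal floats are treated as continuous, and that a single column may mix continuous and categorical values. The following is a minimal sketch of that idea only, not deodel's actual code or API. The helper names `parse_cell` and `profile_column` are hypothetical, and the rule "anything Python's `float()` parses to a finite number counts as continuous" is an assumption made for illustration.

```python
import math
from collections import Counter


def parse_cell(cell):
    """Return a float if the cell reads as a finite dot-decimal number, else None.

    Assumption: Python's float() stands in for "expressed as a float".
    """
    try:
        number = float(cell)
    except (TypeError, ValueError):
        return None
    # Treat "nan" and "inf" as non-numeric labels rather than measurements.
    return number if math.isfinite(number) else None


def profile_column(cells):
    """Split one mixed column into continuous values and categorical counts."""
    continuous = []
    categorical = Counter()
    for cell in cells:
        number = parse_cell(cell)
        if number is None:
            categorical[str(cell)] += 1
        else:
            continuous.append(number)
    return continuous, categorical


# A single attribute column mixing both kinds of values.
# "38,5" uses a comma decimal, so under this rule it stays categorical.
column = ["36.6", "high", "37.2", "low", "high", "38,5"]
continuous, categorical = profile_column(column)
print(continuous)
print(dict(categorical))
```

Profiling each column this way before training gives a quick view of how much of a table would be read as numeric versus categorical under such a rule.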

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
