[Discussion] - "data sourcing will be more important than model building in the era of foundational model fine-tuning"

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  1. snorkel

    A system for quickly generating training data with weak supervision

  2. cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  3. ydata-profiling

    One line of code for data quality profiling and exploratory data analysis of Pandas and Spark DataFrames.

  4. OpenRefine

    OpenRefine is a free, open source power tool for working with messy data and improving it.

NOTE: The number of mentions for each project counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.

