[Discussion] - "data sourcing will be more important than model building in the era of foundational model fine-tuning"

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/MachineLearning

  • snorkel

    A system for quickly generating training data with weak supervision

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • ydata-profiling

    Create HTML profiling reports from pandas DataFrame objects

  • OpenRefine

    OpenRefine is a free, open source power tool for working with messy data and improving it
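
To make the "weak supervision" idea behind snorkel concrete, here is a minimal sketch in plain Python. It does not use snorkel's API: the labeling functions, the label constants, and the `weak_label` helper are all hypothetical names invented for illustration, and a simple majority vote stands in for the learned label model that snorkel itself provides. The point is only to show how several noisy heuristics can be combined into training labels without hand annotation.

```python
from collections import Counter

# Hypothetical label constants for a toy spam task (not snorkel's own).
ABSTAIN = -1
HAM = 0
SPAM = 1


def lf_contains_link(text):
    """Heuristic: messages containing a URL are likely spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Heuristic: messages mentioning a prize are likely spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    """Heuristic: very short messages are likely ordinary replies."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine noisy heuristic votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    # A tie between the top two labels means the heuristics disagree evenly.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


if __name__ == "__main__":
    for message in [
        "Claim your prize at http://example.com now friends",
        "thanks a lot",
        "http://example.com",
    ]:
        print(repr(message), "->", weak_label(message))
```

Examples left as `ABSTAIN` would simply be dropped from the generated training set. Weighting the heuristics by their estimated accuracy, rather than counting each vote equally, is the kind of refinement a dedicated tool is meant to handle.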

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.
