data-centric

Open-source projects categorized as data-centric

Top 7 data-centric Open-Source Projects

  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

  • Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

    This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing and ablation capabilities.

    Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this area of the project? Do you offer any guarantees, or is it considered an open-ended problem?

    Would love to see more progress toward this area!

  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.

  • Project mention: The Nimble File Format by Meta | news.ycombinator.com | 2024-04-25
  • data-centric-AI

    A curated, but incomplete, list of data-centric AI resources. (by daochenzha)

  • Encord Active

    Open source active learning toolkit to find failure modes in your computer vision models, prioritize data to label next, and drive data curation to improve model performance.

  • Project mention: Launch HN: Encord (YC W21) – Unit testing for computer vision models | news.ycombinator.com | 2024-01-31

    We base our pricing on your user and consumption scale and would be happy to discuss this with you directly. Please feel free to explore the open-source (OS) version of Active at https://github.com/encord-team/encord-active. Note that some features, such as natural language search using GPU-accelerated APIs, are not included in the cloud version.

  • DataCLUE

    DataCLUE: A Data-Centric NLP Benchmark and Toolkit

  • vuejs-form

    Vue Form with Laravel-Inspired Validation and Simply Enjoyable Error Messages API (Form API, Validator API, Rules API, Error Messages API).

  • pypely

    From local functions to cloud deployed pipelines

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

data-centric related posts

  • Python Open Closed Design Pattern (Python SOLID Principles)

    1 project | dev.to | 5 Feb 2023
  • The Proficient, Efficient, and Consistent Solution (Skillful, Minimal, & Disciplined Practitioners)

    1 project | dev.to | 23 Jul 2022
  • [R] DataCLUE: A Benchmark Suite for Data-centric NLP

    2 projects | /r/MachineLearning | 17 Nov 2021

Index

What are some of the best open-source data-centric projects? This list will help you:

Rank  Project          Stars
1     ludwig           10,845
2     lance            3,296
3     data-centric-AI  990
4     Encord Active    420
5     DataCLUE         145
6     vuejs-form       41
7     pypely           16
