Examples Alternatives
Similar projects and alternatives to examples
-
FLiPStackWeekly
FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
-
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
-
token-label-error-benchmarks
Benchmarking methods for label error detection in token classification tasks
-
multiannotator-benchmarks
Benchmarking algorithms for assessing quality of data labeled by multiple annotators
-
awesome-artificial-intelligence
A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.
examples reviews and mentions
- FLaNK AI - 15 April 2024
-
[R] Detecting Dataset Drift and Non-IID Sampling: A k-Nearest Neighbors approach that works for Image/Text/Audio/Numeric Data
I just published a paper detailing this non-IID check and open-sourced its code in the cleanlab package — just one line of code will check for this and many other types of issues in your dataset.
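As a rough illustration of the idea behind a k-nearest-neighbors non-IID check (this is not cleanlab's implementation), the sketch below tests whether points that are close in feature space also tend to sit close together in dataset order. The function names, the default `k`, and the use of a simple permutation test are assumptions made for this example; the statistic used in the paper and in the package may differ.

```python
import math
import random


def knn_index_gap(features, k=2):
    """Mean |i - j| over every point i and each of its k nearest neighbors j."""
    n = len(features)
    gaps = []
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        gaps.extend(abs(i - j) for j in others[:k])
    return sum(gaps) / len(gaps)


def non_iid_p_value(features, k=2, num_permutations=200, seed=0):
    """Permutation test on the dataset ordering.

    Returns the fraction of shuffled orderings whose neighbor index gap is
    at least as small as the observed one. A small value suggests that the
    order of the data carries information (drift or other non-IID sampling).
    """
    rng = random.Random(seed)
    observed = knn_index_gap(features, k)
    shuffled = list(features)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        if knn_index_gap(shuffled, k) <= observed:
            hits += 1
    return (hits + 1) / (num_permutations + 1)


if __name__ == "__main__":
    # Hypothetical data: one feature that drifts steadily with collection order.
    drifting = [(float(i),) for i in range(30)]
    print("p-value for drifting data:", non_iid_p_value(drifting))
```

The brute-force neighbor search is quadratic in the number of points, so this form only suits small datasets; a real check would use an approximate nearest-neighbor index.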
-
Datalab: A Linter for ML Datasets
I recently published a blog post introducing Datalab and an open-source Python implementation that is easy to use for all data types (image, text, tabular, audio, etc.). For data scientists, I’ve made a quick Jupyter tutorial to run Datalab on your own data.
-
Finetuning Large Language Models -- An introduction to the core ideas and approaches
Cool read! I just finished up a notebook where I show how noisy labels can drastically impact the performance of OpenAI LLMs. I first fine-tune the well-known Davinci model (the backbone of ChatGPT) on the original data and report an accuracy of 63%. I then use the open-source package cleanlab to find examples that are incorrectly labeled and drop them from the training data. This step increases the fine-tuning accuracy to 66% (better accuracy with less data). Finally, I correct the mislabeled examples and fine-tuning accuracy jumps to 77%!
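The find-and-drop step described above can be sketched in plain Python. This is a simplified heuristic in the spirit of confident learning, not cleanlab's actual algorithm: `flag_suspect_labels`, the per-class thresholds, and the toy data are all assumptions made for illustration. It expects predicted class probabilities that were computed out-of-sample (for example, via cross-validation).

```python
def flag_suspect_labels(labels, pred_probs):
    """Return indices of examples whose given label disagrees with a
    confident model prediction (hypothetical helper, simplified heuristic)."""
    num_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class c
    # over the examples that were labeled c.
    thresholds = []
    for c in range(num_classes):
        probs_c = [p[c] for p, y in zip(pred_probs, labels) if y == c]
        thresholds.append(sum(probs_c) / len(probs_c) if probs_c else 1.0)

    suspects = []
    for i, (probs, given) in enumerate(zip(pred_probs, labels)):
        # Classes the model is confident about for this example.
        confident = [c for c in range(num_classes) if probs[c] >= thresholds[c]]
        if confident:
            likely = max(confident, key=lambda c: probs[c])
            if likely != given:
                suspects.append(i)
    return suspects


# Toy data: example 2 is labeled 0, but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

suspects = flag_suspect_labels(labels, pred_probs)
print(suspects)  # [2]

# Drop the flagged examples before fine-tuning.
kept = [i for i in range(len(labels)) if i not in set(suspects)]
```

Dropping flagged examples corresponds to the first improvement reported in the post; correcting their labels instead of dropping them corresponds to the second.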
-
What are some active research areas in Machine Learning Systems?
Data-centric AI is a fairly new and active field. It focuses on the data side of ML rather than only on model optimization. Our company is building cleanlab, an open-source package that is becoming the DCAI standard.
-
[Research] ActiveLab: Active Learning with Data Re-Labeling
I recently published a paper introducing this novel method and an open-source Python implementation that is easy to use for all data types (image, text, tabular, audio, etc.). For data scientists, I’ve made a quick Jupyter tutorial to run ActiveLab on your own data. For ML researchers, I’ve made all of our benchmarking code available for reproducibility so you can see for yourself how effective ActiveLab is in practice.
-
cleanlab open-source --- expanded support for Active Learning and other data-centric AI tasks
Suggest which data is most informative to (re)label next (active learning)
- Strategies for selecting what data to annotate?
- [D] Can someone point to research on determining usefulness of samples/datasets for training ML models?
-
cleanlab: an open-source Python framework for data-centric AI
In one line of Python, cleanlab can automatically: 1) find mislabeled data and train robust models, 2) detect outliers, 3) estimate consensus and annotator quality for datasets labeled by multiple annotators, and 4) suggest which data is best to label or re-label next (active learning).
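To make the multi-annotator item concrete, here is a minimal baseline sketch: majority-vote consensus plus each annotator's agreement rate with that consensus. It is an assumption-laden illustration, not cleanlab's estimator, which may also weigh annotators and model predictions; the function name and the data layout are hypothetical.

```python
from collections import Counter


def consensus_and_annotator_quality(annotations):
    """annotations maps example id -> {annotator id: label}.

    Returns (consensus, quality), where consensus maps each example to its
    majority-vote label (ties go to the label seen first) and quality maps
    each annotator to the fraction of their labels matching the consensus.
    """
    consensus = {
        example: Counter(votes.values()).most_common(1)[0][0]
        for example, votes in annotations.items()
    }

    agreed = Counter()
    total = Counter()
    for example, votes in annotations.items():
        for annotator, label in votes.items():
            total[annotator] += 1
            if label == consensus[example]:
                agreed[annotator] += 1

    quality = {annotator: agreed[annotator] / total[annotator] for annotator in total}
    return consensus, quality


# Hypothetical annotations from three annotators on three images.
annotations = {
    "img1": {"ann_a": "cat", "ann_b": "cat", "ann_c": "dog"},
    "img2": {"ann_a": "dog", "ann_b": "dog", "ann_c": "dog"},
    "img3": {"ann_a": "cat", "ann_b": "cat", "ann_c": "dog"},
}

consensus, quality = consensus_and_annotator_quality(annotations)
print(consensus)  # {'img1': 'cat', 'img2': 'dog', 'img3': 'cat'}
```

Majority vote treats every annotator as equally reliable, which is the main weakness a model-aware method is meant to address.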
Stats
cleanlab/examples is an open-source project licensed under the GNU Affero General Public License v3.0, which is an OSI-approved license.
The primary programming language of examples is Jupyter Notebook.