Yep! I totally understand the concerns around not being able to share data externally - the library currently supports open source, self-hosted LLMs through Hugging Face pipelines (https://github.com/refuel-ai/autolabel/blob/main/src/autolab...), and we plan to add more support here for models like llama.cpp that can be run without many constraints on hardware
You can self-host an open-source model. llama.cpp is a very popular project with great docs.
https://github.com/ggerganov/llama.cpp
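As a rough sketch of what self-hosting looks like in practice, here's a minimal example using the community llama-cpp-python bindings. The model path and prompt are placeholders (assumptions, not from the thread) - you'd need to `pip install llama-cpp-python` and download a GGUF-format model file first:

```python
# Sketch only: assumes llama-cpp-python is installed and a GGUF model
# has been downloaded -- the path below is a placeholder, not a real file.
from llama_cpp import Llama

# Load a quantized model; everything runs on local hardware,
# so no data ever leaves your machine.
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")

out = llm(
    "Q: Label this support ticket as bug/feature/question: "
    "'App crashes on login.' A:",
    max_tokens=16,
    stop=["\n"],
)
print(out["choices"][0]["text"])
```

The same pattern works for batch labeling jobs: loop over your records and collect the completions locally.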
You need to be careful about licensing - for some of these models it's a legal grey area whether you can use them for commercial projects.
A popular compression technique at the moment is 'quantization': using lower-precision model weights for inference to reduce memory requirements. I find it a bit hard to evaluate which open source models are best, and how they are impacted by quantization.
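To make the idea concrete, here's a toy sketch of symmetric int8 quantization (not any particular library's implementation): float32 weights are mapped onto the int8 range via a single scale factor, cutting storage 4x at the cost of bounded rounding error.

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float32 weights for use at inference time.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()

# int8 storage is 4x smaller than float32; rounding error is at most scale/2.
print(q.nbytes, w.nbytes, err)
```

Real schemes (e.g. the 4-bit formats llama.cpp uses) quantize per-block rather than per-tensor, which is why quality loss varies so much between models and settings.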
You can also use the OpenAI API. They don't train on API data, and only retain it for up to 30 days for abuse monitoring. It doesn't seem hugely different to using something like Slack/Google Docs/AWS.
https://openai.com/policies/api-data-usage-policies