Show HN: Autolabel, a Python library to label and enrich text data with LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • autolabel

    Label, clean and enrich text datasets with LLMs.

  • Yep! I totally understand the concerns about not being able to share data externally. The library currently supports open-source, self-hosted LLMs through Hugging Face pipelines (https://github.com/refuel-ai/autolabel/blob/main/src/autolab...), and we plan to add support for running models through llama.cpp, which can run without many hardware constraints.

  • llama.cpp

    LLM inference in C/C++

  • You can self-host an open-source model. llama.cpp is a very popular project with great docs.

    https://github.com/ggerganov/llama.cpp

    You need to be careful about licensing: for some of these models, it's a legal grey area whether you can use them in commercial projects.

    A popular compression method at the moment is quantization: using lower-precision model weights for inference to reduce memory requirements. I find it a bit hard to evaluate which open-source models are best, and how much quantization affects them.

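    You can also use the OpenAI API. According to their API data usage policy, they don't use API data to train their models, and they don't store it beyond 30 days, a retention period used for fraud and abuse monitoring. It doesn't seem hugely different from using something like Slack, Google Docs, or AWS.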

    https://openai.com/policies/api-data-usage-policies
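The quantization idea mentioned above can be illustrated with a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The weight values and the helper names `quantize_int8` / `dequantize_int8` are hypothetical. Real formats, such as those used by llama.cpp, typically quantize in small blocks with per-block scales and often use 4-bit or other bit widths, so this shows the general principle rather than any project's actual format.

```python
import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: each w is approximated as scale * q, q in [-127, 127]."""
    # Use 1.0 as the reference magnitude for an all-zero tensor to avoid dividing by zero.
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    # Round to the nearest integer step and clip to the symmetric int8 range.
    q = array.array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the shared scale."""
    return [v * scale for v in q]


# Hypothetical toy weights stored as 4-byte float32 values.
weights = array.array("f", [0.12, -0.5, 0.33, 0.9, -0.07])
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32 bytes: {weights.itemsize * len(weights)}, int8 bytes: {q.itemsize * len(q)}")
print(f"max reconstruction error: {max_err:.4f} (bound: scale / 2 = {scale / 2:.4f})")
```

The memory saving comes from storing 1-byte integers plus a single float scale instead of 4-byte floats. Rounding to the nearest step bounds the per-weight error by half the scale, which is one reason lower bit widths (larger steps) can hurt model quality.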


Related posts

  • NLP Research in the Era of LLMs

    3 projects | news.ycombinator.com | 21 Dec 2023
  • [P] Autolabel: data labeling with LLMs

    1 project | /r/MachineLearning | 2 Sep 2023
  • Label clean and enrich text datasets with LLMs

    1 project | /r/learnmachinelearning | 22 Jun 2023
  • Show HN: Autolabel, a Python library to label and enrich text data with LLMs

    1 project | /r/patient_hackernews | 21 Jun 2023
  • Show HN: Autolabel, a Python library to label and enrich text data with LLMs

    1 project | /r/hackernews | 21 Jun 2023