spaCy [0] is a state-of-the-art, easy-to-use NLP library from the pre-LLM era. This post gives the spaCy founder's thoughts on how to integrate LLMs with the kinds of problems that "traditional" NLP is used for right now. It's an advertisement for Prodigy [1], their paid tool for using LLMs to assist data labeling. That said, I largely agree with the premise, and it's worth reading the entire post.
The steps described in "LLM pragmatism" are basically what I see my data science friends doing. It's hard to justify the cost (money and latency) of using LLMs directly for all tasks, and even if you want to, you'll need a baseline model to compare against. So why not use LLMs for dataset creation or augmentation in order to train a classic supervised model?
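To make that workflow concrete, here is a rough sketch of the idea, not anything taken from the post or from Prodigy. The `llm_label` function is a hypothetical stand-in for a real LLM call (a keyword rule, so the example runs offline), and the tiny Naive Bayes class is just one possible "classic supervised model", written with the standard library only.

```python
import math
from collections import Counter, defaultdict


def llm_label(text):
    """Hypothetical stand-in for an LLM labeling call (assumption, not a real API).

    A keyword rule is used so the sketch runs without network access.
    """
    positive_words = {"great", "love", "excellent"}
    return "pos" if any(w in positive_words for w in text.lower().split()) else "neg"


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1: have the "LLM" label raw text once, offline.
unlabeled = [
    "great library love it",
    "excellent docs and great api",
    "slow and buggy release",
    "terrible latency and high cost",
]
silver_labels = [llm_label(t) for t in unlabeled]

# Step 2: train a small, cheap model on those labels and serve that instead.
model = NaiveBayes().fit(unlabeled, silver_labels)
print(model.predict("great api"))
print(model.predict("buggy latency"))
```

The point of the split is that the expensive call happens once per training example, while inference runs on the small model. In practice you would also hand-check a sample of the LLM's labels before trusting them as training data.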
[0] https://spacy.io/
[1] https://prodi.gy/