HanLP VS spacy-llm

Compare HanLP vs spacy-llm and see what their differences are.

HanLP

Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic textual similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and Simplified/Traditional Chinese conversion, natural language processing (by hankcs)
                HanLP                 spacy-llm
Mentions        3                     4
Stars           32,388                945
Growth          -                     5.6%
Activity        5.1                   8.8
Latest commit   16 days ago           3 days ago
Language        Python                Python
License         Apache License 2.0    MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

HanLP

Posts with mentions or reviews of HanLP. We have used some of these posts to build our list of alternatives and similar projects.

spacy-llm

Posts with mentions or reviews of spacy-llm. We have used some of these posts to build our list of alternatives and similar projects.
  • Integrating LLMs into structured NLP pipelines
    1 project | news.ycombinator.com | 10 Sep 2023
  • Advanced NLP with SpaCy
    1 project | news.ycombinator.com | 9 Sep 2023
    (Original author of spaCy here)

    Okay so, first some terminology. LLMs can mean a bunch of different things; people sometimes call models the size of BERT LLMs. So let's talk specifically about in-context learning (ICL) with either zero or a few examples. We'll say LLM ICL, and contrast that with techniques where you annotate enough data to train with, which might only be something like 10-40 hours of annotation. What you then do with that data is probably train a task-specific classification model initialised with weights from a language modelling objective. This is sometimes called "fine-tuning", but fine-tuning can also mean taking an LLM and adapting its ICL. So we'll just call it "training", and the fact that you use transfer learning like BERT or even word vectors is just tactics.
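    To make the distinction concrete, here is a minimal sketch of "training" in that sense: a task-specific classification head on top of pretrained language-model weights. It assumes the Hugging Face transformers and datasets libraries are available, and the dataset contents are placeholders:

        # A task-specific classifier initialised from pretrained LM weights
        # ("training" in the sense above, as opposed to LLM in-context learning).
        # Assumes Hugging Face transformers + datasets; the data is a placeholder.
        from datasets import Dataset
        from transformers import (AutoModelForSequenceClassification,
                                  AutoTokenizer, Trainer, TrainingArguments)

        tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
        model = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-cased", num_labels=20)

        # A few hundred to a few thousand annotated texts is often enough.
        data = Dataset.from_dict({"text": ["example document ..."], "label": [0]})
        data = data.map(
            lambda batch: tokenizer(batch["text"], truncation=True,
                                    padding="max_length"),
            batched=True)

        trainer = Trainer(
            model=model,
            args=TrainingArguments(output_dir="clf", num_train_epochs=3),
            train_dataset=data)
        trainer.train()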

    Okay. So, here's something that might surprise you: ICL actually sucks at most predictive tasks currently. Let's take NER. Performance on NER on some datasets is _below what it was in 2003_. Here's some recent discussion: https://twitter.com/mayhewsw/status/1700139745769046409

    The discussion focusses on how bad the CoNLL 2003 dataset is, and indeed it's a crap dataset. But experiments have also been done on other datasets, e.g. check out the comparison of ICL and training in this paper from Microsoft: https://universal-ner.github.io/ . When GPT4 is used, this one paper reports it slightly better on some tasks: https://arxiv.org/abs/2308.10092 . Frustratingly, they don't do enough GPT4 experiments. This other paper also does a huge number of experiments, but not with GPT4: https://arxiv.org/abs/2308.10092

    The findings across the literature are really clear. ICL is generally much worse than training a model in accuracy, and you generally don't need much training data to get there.

    For tasks like text classification, ICL sometimes does okay. But you need to pick the problem characteristics carefully. Most text classification tasks people actually want to do have something like 20 labels, the texts are kind of long, and the labels don't capture the category especially well. Applying ICL to such tasks is very non-trivial. Your prompt balloons up if you have lots of classes to predict between, and providing the examples is hard if your texts are even a few hundred words.

    Let's say you want to do something ultra simple: classify articles into categories for some news site or blog. This is the type of problem text classifiers have been eating for breakfast for 20 years. This is not a difficult problem -- a unigram bag of words does fine, and the method of estimating the weights can be almost anything -- even just an averaged perceptron will be totally okay.
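    For illustration, a minimal sketch of that kind of baseline, assuming scikit-learn (a plain perceptron stands in for the averaged variant, and the training texts are toy placeholders):

        # Unigram bag-of-words features + a simple linear classifier:
        # the sort of baseline that has handled topic classification for decades.
        # Assumes scikit-learn; texts/labels here are toy placeholders.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import Perceptron
        from sklearn.pipeline import make_pipeline

        texts = ["shares fell sharply after the earnings report",
                 "the team clinched the title in extra time",
                 "the new handset ships with a faster chip"]
        labels = ["business", "sports", "tech"]

        clf = make_pipeline(CountVectorizer(ngram_range=(1, 1)), Perceptron())
        clf.fit(texts, labels)
        print(clf.predict(["the striker scored twice in the final"]))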

    But how will an LLM be able to do this? Probably each of your topic categories includes several different types of article. If you know what those types of article are, you can separate them out and make sure they're represented in the prompt. But now we're back at needing a tonne of domain knowledge about your problem -- that's like having to write custom features to make your model work. We all switched to deep learning so we wouldn't have to do that.

    LLMs build a much more sophisticated representation of the meaning of the data to classify. But then you give them very few examples of the problem. So they can only build a shallow function from this deep representation. If you have lots of examples, you can learn a complex function from shallower features. For a great many classification tasks, this is better. The rest of your system usually needs the classification module to have some sort of consistent behaviour anyway. To do that, you basically have to make an annotation manual, and then you want to annotate evaluation documents. Once you're there, the effort to make training data and train a model is minimal.

    The other elephant in the room is the expense of the LLM solutions. The papers are missing results on GPT4 not because they're lazy, but because it's so expensive to use GPT4 as a classification solution that they want to get the rest of their results out the door.

    The world cannot migrate all its current NLP models for text classification and NER to ICL. There are nowhere near enough GPUs in the world for that to happen. And I don't know about you, but I expect the number of text classification and NER models to grow, not shrink. So, the idea that we'll stop training small models for these tasks is just implausible. The OpenAI models that support batching are almost viable for prediction, but models like GPT4 don't support it (perhaps due to the mixture of experts?), so it's super slow.

    The other thing is, many aspects of language that are useful as annotations are consistent linguistic features. The English language codes for proper names and numeric entities. They behave differently in the grammar. So some sort of named entity annotation can be done once, and then the model trained and reused. This is what spaCy does. We do this for a variety of other useful annotations across languages. We actually need to do much more: we need to collect new annotations for these models to keep them up to date, and we need to do this for more tasks, such as semantic role labelling. But it's definitely a good way to reuse work. We can do this annotation once, train the models, and users can reuse the models.
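    For instance, once such annotations exist and a model is trained, reusing that work is a one-liner with a pretrained spaCy pipeline (this sketch assumes the en_core_web_sm model has already been downloaded):

        # Reusing annotation work: a pipeline trained once on named entity data.
        # Assumes: python -m spacy download en_core_web_sm
        import spacy

        nlp = spacy.load("en_core_web_sm")
        doc = nlp("Apple opened a new office in Berlin in 2023.")
        for ent in doc.ents:
            print(ent.text, ent.label_)  # e.g. Apple ORG, Berlin GPE, 2023 DATE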

    The strength of ICL is that you can get started very easily, without doing the work of annotation and training. There's lots of research on making ICL few-shot learning less bad on arbitrary text classification, NER and other tasks. We're working hard to take these results from the literature and build best-practice prompts and parsers you can use as a drop-in annotation module in spaCy: https://github.com/explosion/spacy-llm . Our annotation tool Prodigy also supports initializing the annotations from an LLM, and just correcting the output: https://prodigy.ai . The idea is to let you start with an LLM, and then transition to a model you train yourself, which can be run much faster.
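    As a rough sketch of how that drop-in module looks in practice: the registered task and model names below follow the spacy-llm README but change between versions, so treat them as illustrative rather than exact, and an OpenAI API key is assumed to be set in the environment.

        # Hedged sketch of spacy-llm as a drop-in annotation component.
        # Registered names like "spacy.NER.v2" / "spacy.GPT-3-5.v1" vary by
        # spacy-llm version; requires OPENAI_API_KEY in the environment.
        import spacy

        nlp = spacy.blank("en")
        nlp.add_pipe(
            "llm",
            config={
                "task": {
                    "@llm_tasks": "spacy.NER.v2",
                    "labels": ["PERSON", "ORGANISATION", "LOCATION"],
                },
                "model": {"@llm_models": "spacy.GPT-3-5.v1"},
            },
        )
        nlp.initialize()

        doc = nlp("Jack and Jill rode up the hill in Les Deux Alpes")
        print([(ent.text, ent.label_) for ent in doc.ents])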

  • Spacy-LLM: Integrating LLMs into structured NLP pipelines
    1 project | news.ycombinator.com | 27 Jul 2023
    1 project | news.ycombinator.com | 16 May 2023

What are some alternatives?

When comparing HanLP and spacy-llm you can also consider the following projects:

spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python

chatgpt-comparison-detection - Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

healthsea - Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Qwen - The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

embedders - With embedders, you can easily convert your texts into sentence- or token-level embeddings within a few lines of code. Use cases for this include similarity search between texts, information extraction such as named entity recognition, or basic text classification.

banks - LLM prompt language based on Jinja

huspacy - HuSpaCy: industrial-strength Hungarian natural language processing

ChainFury - 🦋 Production grade chaining engine behind TuneChat. Self host today!

Stanza - Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages

prompttools - Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

flair - A very simple framework for state-of-the-art Natural Language Processing (NLP)

gmail-assist - Get control of your overflowing inbox using GPT-3 to classify your emails by importance.