polyglot
Multilingual text (NLP) processing toolkit (by aboSamoor)
langid.py
Stand-alone language identification system (by saffsd)
Metric | polyglot | langid.py
---|---|---
Mentions | 1 | 2
Stars | 2,261 | 2,242
Growth | - | -
Activity | 0.0 | 0.0
Last commit | 6 months ago | over 4 years ago
Language | Python | Python
License | GNU General Public License v3.0 or later | BSD 3-clause "New" or "Revised" License
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
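As a rough illustration of how such numbers could be derived, here is a minimal Python sketch. The page does not publish its formulas, so everything here is an assumption: the function names, the exponential decay, and the 90-day half-life are hypothetical. It shows only the general idea of month-over-month growth, recency-weighted commits, and a percentile rank scaled to 0-10.

```python
import bisect
from datetime import date


def month_over_month_growth(stars_prev: int, stars_now: int) -> float:
    """Relative change in stars between two consecutive monthly snapshots."""
    if stars_prev <= 0:
        raise ValueError("previous star count must be positive")
    return (stars_now - stars_prev) / stars_prev


def recency_weighted_commits(commit_dates, today, half_life_days=90.0):
    """Sum per-commit weights so that recent commits count more.

    Assumption: exponential decay, where a commit that is
    half_life_days old contributes 0.5 instead of 1.0.
    """
    total = 0.0
    for committed in commit_dates:
        age_days = (today - committed).days
        if age_days < 0:
            continue  # ignore commits dated in the future
        total += 0.5 ** (age_days / half_life_days)
    return total


def activity_score(raw_value, all_raw_values):
    """Map a raw value to a 0-10 scale by percentile rank.

    A score of 9.0 means 90% of tracked projects have a strictly
    lower raw value, i.e. the project is in the top 10%.
    """
    ranked = sorted(all_raw_values)
    strictly_below = bisect.bisect_left(ranked, raw_value)
    return round(10.0 * strictly_below / len(ranked), 1)


if __name__ == "__main__":
    today = date(2024, 3, 31)
    commits = [date(2024, 1, 1), date(2024, 3, 31)]
    print(recency_weighted_commits(commits, today))
    print(activity_score(9, range(10)))
    print(month_over_month_growth(2000, 2100))
```

Under this sketch, a project with no recent commits gets a raw value near zero and ranks at the bottom, which is consistent with an activity of 0.0 for a repository untouched for years.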
polyglot
Posts with mentions or reviews of polyglot.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-09-25.
langid.py
Posts with mentions or reviews of langid.py.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-01-22.
- Curator v0.1.0: Auto-organize large movie collections (AI language detection+sync)
  Right now it's in early stages: It can detect languages from audio and subtitles (Whisper+LangID) with good results so far tried with 52 movies here (failed with just 1 which was silent). I'm currently working on synchronization: Hopefully subtitle timestamps and audio sound effects can suffice for cross-correlation. After that, I'll work on the TUI (maybe add a proper GUI too) to improve UX.
- Announcing Lingua 1.0.0: The most accurate natural language detection library for Python, suitable for long and short text alike
  Python is widely used in natural language processing, so there are a couple of comprehensive open source libraries for this task, such as Google's CLD 2 and CLD 3, langid and langdetect. Unfortunately, except for the last one they have two major drawbacks:
What are some alternatives?
When comparing polyglot and langid.py you can also consider the following projects:
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
NLTK - NLTK Source
py3langid - Faster, modernized fork of the language identification tool langid.py
Stanza - Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Jieba - Chinese word segmentation (结巴中文分词)
stanfordnlp - [Deprecated] This library has been renamed to "Stanza". Latest development at: https://github.com/stanfordnlp/stanza
Pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.