| | langid.py | cld3 |
|---|---|---|
| Mentions | 2 | 6 |
| Stars | 2,242 | 749 |
| Growth | - | 0.9% |
| Activity | 0.0 | 0.0 |
| Latest commit | over 4 years ago | 12 months ago |
| Language | Python | C++ |
| License | BSD 3-clause "New" or "Revised" License | Apache License 2.0 |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
langid.py
-
Curator v0.1.0: Auto-organize large movie collections (AI language detection+sync)
It's at an early stage right now: it can detect languages from audio and subtitles (Whisper + LangID), with good results so far on the 52 movies I've tried (it failed on only one, which was silent). I'm currently working on synchronization; hopefully subtitle timestamps and audio sound effects will be enough for cross-correlation. After that, I'll work on the TUI (and maybe add a proper GUI too) to improve the UX.
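The synchronization step above is described only as a plan. As a rough illustration of the cross-correlation idea (not Curator's code), the sketch below assumes that both subtitle cues and detected audio events can be reduced to `(start, end)` intervals in seconds. It rasterizes each into a 0/1 activity signal and picks the lag with the highest correlation. The helper names `to_activity` and `best_offset` and the 0.1 s sample step are hypothetical choices.

```python
def to_activity(intervals, duration, step=0.1):
    """Rasterize (start, end) intervals in seconds into a 0/1 list, one sample per `step` seconds."""
    n = int(round(duration / step))
    signal = [0] * n
    for start, end in intervals:
        # round() rather than int() so values like 5.5 / 0.1 don't truncate to 54
        i0 = max(0, int(round(start / step)))
        i1 = min(n, int(round(end / step)))
        for i in range(i0, i1):
            signal[i] = 1
    return signal


def best_offset(reference, other, max_lag):
    """Lag (in samples) maximizing the cross-correlation sum(reference[i] * other[i + lag]).

    A positive result means events in `other` occur later than in `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            r * other[i + lag]
            for i, r in enumerate(reference)
            if 0 <= i + lag < len(other)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical data: speech is detected 0.5 s after each subtitle cue.
subtitles = to_activity([(1.0, 2.0), (4.0, 5.5)], duration=10.0)
speech = to_activity([(1.5, 2.5), (4.5, 6.0)], duration=10.0)
lag = best_offset(subtitles, speech, max_lag=20)  # search +/- 2 s
print(f"shift subtitles by {lag * 0.1:+.1f} s")
```

Here the best lag is 5 samples, i.e. the subtitles should be delayed by about 0.5 s. This brute-force search costs O(samples × lags); for feature-length films, an FFT-based correlation (for example via NumPy or SciPy) would scale much better.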
-
Announcing Lingua 1.0.0: The most accurate natural language detection library for Python, suitable for long and short text alike
Python is widely used in natural language processing, so there are a couple of comprehensive open source libraries for this task, such as Google's CLD 2 and CLD 3, langid and langdetect. Unfortunately, except for the last one, they have two major drawbacks:
cld3
-
cld3: Rust binding for Compact Language Detector v3 (CLD3), a neural network model for language identification.
The C++ code comes from https://github.com/google/cld3.
- Lingua-Go, the most accurate language detection for Go
-
Announcing Lingua 1.0.0: The most accurate natural language detection library for Python, suitable for long and short text alike
Python is widely used in natural language processing, so there are a couple of comprehensive open source libraries for this task, such as Google's CLD 2 and CLD 3, langid and langdetect. Unfortunately, except for the last one, they have two major drawbacks:
-
Best C# library to detect the language of user input strings without calling external APIs such as Google Translate?
I was looking for something like that for a .NET app and ended up using this: https://github.com/google/cld3
-
Using character n-grams as input to a neural network
I've used them a lot in the models I work on. For example, we used them in CLD3 (https://github.com/google/cld3), a widely used project, and the performance is great.
- Language Detection - Pre Trained Models
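The CLD3 comment about character n-grams as network input can be made concrete. CLD3's README describes the idea roughly as follows: extract character n-grams, compute the fraction of times each appears (for "banana", the trigram "ana" gets 2/4), hash the n-grams into a small id range, and average the ids' embedding vectors weighted by those fractions. The sketch below follows that outline under stated assumptions: CRC32 as the hash, 1000 buckets, and a random embedding table standing in for learned weights are all illustrative choices, not CLD3's actual implementation.

```python
import random
import zlib
from collections import Counter


def char_ngrams(text, n):
    """Overlapping character n-grams, e.g. trigrams of "banana": ban, ana, nan, ana."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_fractions(text, n, num_buckets=1000):
    """Hash each n-gram to a bucket id and return {bucket_id: fraction of all n-grams}."""
    grams = char_ngrams(text, n)
    if not grams:
        return {}
    # CRC32 is used here only because it is deterministic across runs
    # (Python's built-in hash() is salted per process).
    counts = Counter(zlib.crc32(g.encode("utf-8")) % num_buckets for g in grams)
    return {bucket: c / len(grams) for bucket, c in counts.items()}


def averaged_embedding(fractions, num_buckets=1000, dim=8, seed=0):
    """Fraction-weighted average of per-bucket embedding vectors.

    The table is random here; in a trained model it would be learned.
    """
    rng = random.Random(seed)
    table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_buckets)]
    vec = [0.0] * dim
    for bucket, frac in fractions.items():
        for k in range(dim):
            vec[k] += frac * table[bucket][k]
    return vec


features = averaged_embedding(ngram_fractions("banana", 3))
```

A classifier would concatenate such vectors (for example, one per n-gram order) and feed them to a hidden layer and a softmax over languages. Hashing keeps the input size fixed regardless of vocabulary, at the cost of occasional collisions between unrelated n-grams.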
What are some alternatives?
polyglot - Multilingual text (NLP) processing toolkit
ntextcat
TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
lingua - The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
py3langid - Faster, modernized fork of the language identification tool langid.py
langdetect - Port of Google's language-detection library to Python.
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
lingua-py - The most accurate natural language detection library for Python, suitable for short text and mixed-language text
NLTK - NLTK Source
cld2 - Compact Language Detector 2
stanfordnlp - [Deprecated] This library has been renamed to "Stanza". Latest development at: https://github.com/stanfordnlp/stanza
lingua-go - The most accurate natural language detection library for Go, suitable for short text and mixed-language text