efficient-language-detector-js
lingua
Metric | efficient-language-detector-js | lingua
---|---|---
Mentions | 1 | 9
Stars | 30 | 688
Growth | - | -
Activity | 5.9 | 6.0
Last commit | 11 months ago | 5 months ago
Language | JavaScript | Kotlin
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
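The page does not publish the formulas behind these numbers, so the following is only a sketch of how such metrics could be computed. It assumes that each commit's weight decays exponentially with age, and that the activity score is ten times the share of tracked projects with a lower weighted total (which is consistent with 9.0 meaning "top 10%"). The half-life constant, the function names, and the sample data are all hypothetical.

```python
from datetime import date

# Hypothetical tuning constant: a commit loses half of its weight every 90 days.
HALF_LIFE_DAYS = 90


def weighted_commit_score(commit_dates, today):
    """Sum per-commit weights so that recent commits count more than old ones."""
    return sum(0.5 ** ((today - d).days / HALF_LIFE_DAYS) for d in commit_dates)


def activity(project, raw_scores):
    """Map a raw score to 0-10: ten times the share of tracked projects scoring lower."""
    mine = raw_scores[project]
    lower = sum(1 for score in raw_scores.values() if score < mine)
    return round(10 * lower / len(raw_scores), 1)


def monthly_growth(stars_now, stars_month_ago):
    """Month-over-month growth in stars as a fraction; None when there is no baseline."""
    if stars_month_ago == 0:
        return None
    return (stars_now - stars_month_ago) / stars_month_ago


if __name__ == "__main__":
    today = date(2024, 3, 31)
    # Made-up commit histories for three made-up projects.
    histories = {
        "project-a": [date(2024, 3, 30), date(2024, 3, 1)],
        "project-b": [date(2023, 6, 1)],
        "project-c": [date(2024, 1, 1), date(2023, 12, 1)],
    }
    raw = {name: weighted_commit_score(dates, today) for name, dates in histories.items()}
    for name in histories:
        print(name, activity(name, raw))
    print(monthly_growth(110, 100))
```

Under these assumptions, a project with a few recent commits outranks one with more commits made long ago, which matches the description of recent commits carrying higher weight.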
lingua
- Comparing Language Detection Libraries (& API) Using Java/ColdFusion/CFML
I evaluated the Lingua Java library. It claims to be "the most accurate natural language detection library for Java and the JVM, suitable for long and short text alike" and also appears to be actively updated and supported. In my small unit test, Lingua seemed to be slightly slower and couldn't correctly identify Malay text.
- Announcing Lingua 1.2.0 - The most accurate natural language detection library for the JVM, suitable for long and short text alike
- r/argentina is the most popular Spanish-speaking subreddit on the site
    select 'r/'||subreddit sub
      , initcap(lang) language
      , count(*) c
      , ratio_to_report(c) over(partition by sub) ratio
      , sum(iff(language!='English', c, 0)) over(partition by sub) total_not_english
      , sum(c) over(partition by sub) total
    from reddit_sample_languages_udtf
    group by 1, 2
    qualify ratio > .02
    order by total_not_english desc, c desc, 1, ratio desc

  - Jason Baumgartner for collecting and sharing Reddit’s comments.
  - Peter M. Stahl for the Lingua project to detect languages in Java.
  - Snowflake for making it easy to run Java code in a UDF.
- The most popular languages on Reddit, after analyzing 1M comments: English, German, Spanish, Portuguese, French, Italian, Romanian, Dutch... [OC]
I don't speak most of these languages, so I wasn't able to verify -- instead I just used the results of this library: https://github.com/pemistahl/lingua
- Hazelcast + Kibana: best buddies for exploring and visualizing data
A linguist can infer the language of the field. It's also possible to use an automated process in the pipeline. A couple of NLP libraries are available in the JVM ecosystem, but I set my eyes on Lingua, one focused on language recognition.
- Using the Lingua Library for Kotlin
- Language Detection - Pre Trained Models
- Lingua 1.1.0 released - The most accurate natural language detection library for the JVM
- Free and easy to use Java language detection library
I've used this one previously, and found it pretty easy to use, relatively fast, and accurate: https://github.com/pemistahl/lingua
What are some alternatives?
chatgpt-sentiment-analysis-in-excel - Discover NLP features powered by ChatGPT you can use in Excel
language-detection-cld2 - Natural language detection, Java bindings for CLD2
efficient-language-detector - Fast and accurate natural language detection. Detector written in PHP. Nito-ELD, ELD.
Beagle - Beagle helps you identify keywords, phrases, regexes, and complex search queries of interest in streams of text documents.
detectlanguage-node - Detect Language API Node.js Client
kotlin-logging - Lightweight Multiplatform logging framework for Kotlin. A convenient and performant logging facade.
tinyld - Simple and Performant Language detection library for NodeJS
cld3-kotlin - Bindings to Google's Compact Language Detector 3 to JVM Based Languages
n2words - Convert numerical numbers to written numbers, in 25+ languages.
kovenant - Kovenant. Promises for Kotlin.
retext - Natural language processor powered by plugins; part of the @unifiedjs collective
KtUnits - Simple unit conversion library for Kotlin