lingua
language-detection-cld2
Metric | lingua | language-detection-cld2
---|---|---
Mentions | 8 | 1
Stars | 657 | 13
Growth | - | -
Activity | 6.3 | 1.1
Last commit | 10 days ago | about 1 year ago
Language | Kotlin | Java
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
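The page does not publish the formula behind the activity number, so the following is only a sketch of how such a score could be built. It assumes a hypothetical half-life weighting for commit age and a percentile mapping onto a 0 to 10 scale; the class name, method names, and the `halfLifeDays` parameter are all illustrative and not taken from the site.

```java
import java.util.Arrays;

class ActivitySketch {
    // Assumption: a commit's weight halves every `halfLifeDays` days,
    // so recent commits count more than older ones.
    static double weightedCommits(int[] commitAgesInDays, double halfLifeDays) {
        double sum = 0.0;
        for (int age : commitAgesInDays) {
            sum += Math.pow(0.5, age / halfLifeDays);
        }
        return sum;
    }

    // Assumption: the relative score is the share of tracked projects with a
    // strictly lower raw score, scaled to 0-10. A result of 9.0 then means the
    // project is ahead of 90% of tracked projects, i.e. in the top 10%.
    static double activity(double raw, double[] allRaw) {
        long lower = Arrays.stream(allRaw).filter(s -> s < raw).count();
        return 10.0 * lower / allRaw.length;
    }

    public static void main(String[] args) {
        // Hypothetical raw scores for ten tracked projects.
        double[] tracked = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        double raw = weightedCommits(new int[] {0, 30}, 30.0);
        System.out.println("raw score: " + raw);
        System.out.println("activity: " + activity(raw, tracked));
    }
}
```

Under these assumptions the score is purely relative: the same commit history can yield a different activity value whenever the set of tracked projects changes.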
lingua
- Announcing Lingua 1.2.0 - The most accurate natural language detection library for the JVM, suitable for long and short text alike
- r/argentina is the most popular Spanish-speaking subreddit on the site
  select 'r/'||subreddit sub, initcap(lang) language, count(*) c, ratio_to_report(c) over(partition by sub) ratio, sum(iff(language!='English', c, 0)) over(partition by sub) total_not_english, sum(c) over(partition by sub) total from reddit_sample_languages_udtf group by 1, 2 qualify ratio > .02 order by total_not_english desc, c desc, 1, ratio desc
  - Jason Baumgartner for collecting and sharing Reddit’s comments.
  - Peter M. Stahl for the Lingua project to detect languages in Java.
  - Snowflake for making it easy to run Java code in a UDF.
- The most popular languages on Reddit, after analyzing 1M comments: English, German, Spanish, Portuguese, French, Italian, Romanian, Dutch... [OC]
  I don't speak most of these languages, so I wasn't able to verify -- instead I just used the results of this library: https://github.com/pemistahl/lingua
- Hazelcast + Kibana: best buddies for exploring and visualizing data
  A linguist can infer the language of the field. It's also possible to use an automated process in the pipeline. A couple of NLP libraries are available in the JVM ecosystem, but I set my eyes on Lingua, one focused on language recognition.
- Using the Lingua Library for Kotlin
- Language Detection - Pre Trained Models
- Lingua 1.1.0 released - The most accurate natural language detection library for the JVM
- Free and easy to use Java language detection library
  I've used this one previously, and found it pretty easy to use, relatively fast, and accurate: https://github.com/pemistahl/lingua
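The Snowflake query quoted in the r/argentina post above computes, for each subreddit, every language's share of comments (`ratio_to_report`), keeps only shares above 2% (`qualify ratio > .02`), and sorts subreddits by their count of non-English comments. As a rough illustration of that same aggregation outside SQL, here is a minimal Java sketch for a single subreddit. The class and method names are hypothetical, and the input map stands in for language labels that a detector such as Lingua would have produced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LanguageShareSketch {
    // Mirrors ratio_to_report plus the `qualify ratio > minRatio` filter:
    // returns each language's share of comments, dropping shares at or below minRatio.
    static Map<String, Double> shares(Map<String, Long> countsByLanguage, double minRatio) {
        long total = 0;
        for (long c : countsByLanguage.values()) {
            total += c;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : countsByLanguage.entrySet()) {
            double ratio = (double) e.getValue() / total;
            if (ratio > minRatio) {
                result.put(e.getKey(), ratio);
            }
        }
        return result;
    }

    // Mirrors sum(iff(language != 'English', c, 0)): comments not labeled English.
    static long totalNotEnglish(Map<String, Long> countsByLanguage) {
        long sum = 0;
        for (Map.Entry<String, Long> e : countsByLanguage.entrySet()) {
            if (!e.getKey().equals("English")) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up counts for one subreddit, for illustration only.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("Spanish", 90L);
        counts.put("English", 9L);
        counts.put("German", 1L);
        System.out.println(shares(counts, 0.02));
        System.out.println(totalNotEnglish(counts));
    }
}
```

The 2% cutoff plays the same role as in the query: it suppresses languages that appear only through a handful of comments, which on short texts are often misdetections.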
language-detection-cld2
- Free and easy to use Java language detection library
  CommonCrawl.org has Java bindings for CLD2 here: https://github.com/commoncrawl/language-detection-cld2
What are some alternatives?
Beagle - Beagle helps you identify keywords, phrases, regexes, and complex search queries of interest in streams of text documents.
lingua-go - The most accurate natural language detection library for Go, suitable for short text and mixed-language text
kotlin-logging - Lightweight Multiplatform logging framework for Kotlin. A convenient and performant logging facade.
mlkit - A collection of sample apps to demonstrate how to use Google's ML Kit APIs on Android and iOS
cld3-kotlin - Bindings to Google's Compact Language Detector 3 to JVM Based Languages
hms-ml-demo - HMS ML Demo provides an example of integrating Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, asr, and tts.
kovenant - Kovenant. Promises for Kotlin.
simplenlg - Java API for Natural Language Generation. Originally developed by Ehud Reiter at the University of Aberdeen’s Department of Computing Science and co-founder of Arria NLG. This git repo is the official SimpleNLG version.
KtUnits - Simple unit conversion library for Kotlin
efficient-language-detector - Fast and accurate natural language detection. Detector written in PHP. Nito-ELD, ELD.
CakeParse - Simple parser combinator library for Kotlin
cld3