language-detection-cld2 vs lingua
| | language-detection-cld2 | lingua |
|---|---|---|
| Mentions | 1 | 9 |
| Stars | 14 | 688 |
| Growth | - | - |
| Activity | 1.1 | 6.0 |
| Latest commit | over 1 year ago | 5 months ago |
| Language | Java | Kotlin |
| License | Apache License 2.0 | Apache License 2.0 |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
language-detection-cld2
- Free and easy to use Java language detection library
CommonCrawl.org has Java bindings for CLD2 here: https://github.com/commoncrawl/language-detection-cld2
lingua
- Comparing Language Detection Libraries (& API) Using Java/ColdFusion/CFML
I evaluated the Lingua Java library. It claims to be "the most accurate natural language detection library for Java and the JVM, suitable for long and short text alike" and also appears to be actively updated and supported. In my small unit test, Lingua seemed slightly slower and couldn't correctly identify Malay text.
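A comparison like the one above comes down to running each detector over the same labeled samples and recording hits, misses, and elapsed time. The following is a minimal Java sketch under that assumption; the class `DetectorEval`, the `Function<String, String>` wrapper, the sample sentences, and the always-English stub detector are hypothetical placeholders, not part of Lingua's or CLD2's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical harness: any detector (Lingua, a CLD2 binding, an API) is wrapped as a Function. */
public class DetectorEval {

    /** Outcome of one run: hits, sample count, misclassified samples, elapsed time. */
    record Result(int correct, int total, List<String> misses, long elapsedNanos) {
        double accuracy() {
            return total == 0 ? 0.0 : (double) correct / total;
        }
    }

    /** Runs the detector over every labeled sample (text -> expected language). */
    static Result evaluate(Map<String, String> labeled, Function<String, String> detector) {
        int correct = 0;
        List<String> misses = new ArrayList<>();
        long start = System.nanoTime();
        for (Map.Entry<String, String> e : labeled.entrySet()) {
            String predicted = detector.apply(e.getKey());
            if (e.getValue().equals(predicted)) {
                correct++;
            } else {
                // Record "expected -> predicted: text" for later inspection
                misses.add(e.getValue() + " -> " + predicted + ": " + e.getKey());
            }
        }
        long elapsed = System.nanoTime() - start;
        return new Result(correct, labeled.size(), misses, elapsed);
    }

    public static void main(String[] args) {
        // Tiny labeled set; a real comparison needs many samples per language.
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("The weather is nice today", "English");
        samples.put("Il fait beau aujourd'hui", "French");
        samples.put("Saya suka makan nasi goreng", "Malay");

        // Hypothetical stub that always answers "English"; swap in the library under test.
        Function<String, String> alwaysEnglish = text -> "English";

        Result r = evaluate(samples, alwaysEnglish);
        System.out.printf("accuracy=%.2f (%d/%d), %d ms%n",
                r.accuracy(), r.correct(), r.total(), r.elapsedNanos() / 1_000_000);
        r.misses().forEach(m -> System.out.println("miss: " + m));
    }
}
```

With a sample this small, a single miss (such as the Malay case) swings accuracy noticeably, and a single timed pass on the JVM includes warm-up effects, so numbers from a handful of sentences are only indicative.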
- Announcing Lingua 1.2.0 - The most accurate natural language detection library for the JVM, suitable for long and short text alike
- r/argentina is the most popular Spanish-speaking subreddit on the site
```sql
select 'r/' || subreddit sub
  , initcap(lang) language
  , count(*) c
  , ratio_to_report(c) over (partition by sub) ratio  -- share of the subreddit's comments
  , sum(iff(language != 'English', c, 0)) over (partition by sub) total_not_english
  , sum(c) over (partition by sub) total
from reddit_sample_languages_udtf
group by 1, 2
qualify ratio > .02  -- drop languages under 2% of a subreddit
order by total_not_english desc, c desc, 1, ratio desc
```

Thanks to Jason Baumgartner for collecting and sharing Reddit's comments, Peter M. Stahl for the Lingua project to detect languages in Java, and Snowflake for making it easy to run Java code in a UDF.
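The query counts comments per (subreddit, language) pair, divides each count by the subreddit's total (`ratio_to_report`), and keeps only languages above 2%. As a sketch of the same aggregation outside Snowflake, here is a plain-Java version; the input shape (one `{subreddit, LANG}` pair per comment, with uppercase names like `SPANISH` assumed because the query applies `initcap`) and the names `SubredditLanguages`, `Row`, and `summarize` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plain-Java equivalent of the per-subreddit language query. */
public class SubredditLanguages {

    /** One output row, mirroring the query's select list. */
    record Row(String sub, String language, long c, double ratio, long totalNotEnglish, long total) {}

    /** Like initcap() for single-word names: "SPANISH" -> "Spanish". */
    static String initcap(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1).toLowerCase();
    }

    /** comments: one {subreddit, lang} pair per comment; minRatio: the qualify threshold. */
    static List<Row> summarize(List<String[]> comments, double minRatio) {
        // group by sub, language -> count(*)
        Map<String, Map<String, Long>> counts = new TreeMap<>();
        for (String[] pair : comments) {
            counts.computeIfAbsent("r/" + pair[0], k -> new TreeMap<>())
                  .merge(initcap(pair[1]), 1L, Long::sum);
        }
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> subEntry : counts.entrySet()) {
            Map<String, Long> byLang = subEntry.getValue();
            // Window aggregates "over (partition by sub)"
            long total = byLang.values().stream().mapToLong(Long::longValue).sum();
            long notEnglish = byLang.entrySet().stream()
                    .filter(e -> !e.getKey().equals("English"))
                    .mapToLong(Map.Entry::getValue)
                    .sum();
            for (Map.Entry<String, Long> langEntry : byLang.entrySet()) {
                long c = langEntry.getValue();
                double ratio = (double) c / total; // ratio_to_report(c)
                if (ratio > minRatio) { // qualify ratio > minRatio
                    rows.add(new Row(subEntry.getKey(), langEntry.getKey(), c, ratio, notEnglish, total));
                }
            }
        }
        // order by total_not_english desc, c desc, sub, ratio desc
        rows.sort(Comparator.comparingLong(Row::totalNotEnglish).reversed()
                .thenComparing(Comparator.comparingLong(Row::c).reversed())
                .thenComparing(Row::sub)
                .thenComparing(Comparator.comparingDouble(Row::ratio).reversed()));
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> comments = List.of(
                new String[] {"argentina", "SPANISH"},
                new String[] {"argentina", "SPANISH"},
                new String[] {"argentina", "ENGLISH"},
                new String[] {"de", "GERMAN"},
                new String[] {"de", "ENGLISH"});
        for (Row r : summarize(comments, 0.02)) {
            System.out.printf("%s %s c=%d ratio=%.2f not_english=%d total=%d%n",
                    r.sub(), r.language(), r.c(), r.ratio(), r.totalNotEnglish(), r.total());
        }
    }
}
```

For example, the three r/argentina comments above (two Spanish, one English) yield ratios of about 0.67 and 0.33, with `total_not_english` = 2 and `total` = 3, so that subreddit sorts ahead of r/de.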
- The most popular languages on Reddit, after analyzing 1M comments: English, German, Spanish, Portuguese, French, Italian, Romanian, Dutch... [OC]
I don't speak most of these languages, so I couldn't verify the output myself; instead, I relied on this library: https://github.com/pemistahl/lingua
- Hazelcast + Kibana: best buddies for exploring and visualizing data
A linguist can infer the language of the field. It's also possible to use an automated process in the pipeline. A couple of NLP libraries are available in the JVM ecosystem, but I set my eyes on Lingua, one focused on language recognition.
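Automated language recognition of this kind is commonly built on character n-gram statistics. To illustrate only the general idea, here is a toy character-trigram classifier in Java (naive Bayes with add-one smoothing). It is not Lingua's algorithm, which is considerably more elaborate, and the class name `TrigramDetector`, its methods, and the tiny training sentences are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy trigram detector; not how Lingua or CLD2 work internally. */
public class TrigramDetector {

    private final Map<String, Map<String, Integer>> counts = new HashMap<>(); // language -> trigram -> count
    private final Map<String, Integer> totals = new HashMap<>();             // language -> trigram total
    private final Set<String> vocabulary = new HashSet<>();                   // all trigrams seen in training

    /** Lowercases, collapses whitespace, and pads with spaces so word edges form trigrams. */
    private static String normalize(String text) {
        return " " + text.toLowerCase().trim().replaceAll("\\s+", " ") + " ";
    }

    /** Adds one training text to the given language's trigram counts. */
    public void train(String language, String text) {
        String s = normalize(text);
        Map<String, Integer> langCounts = counts.computeIfAbsent(language, k -> new HashMap<>());
        for (int i = 0; i + 3 <= s.length(); i++) {
            String gram = s.substring(i, i + 3);
            langCounts.merge(gram, 1, Integer::sum);
            totals.merge(language, 1, Integer::sum);
            vocabulary.add(gram);
        }
    }

    /** Sum of log-probabilities of the text's trigrams under one language model. */
    double score(String language, String text) {
        Map<String, Integer> langCounts = counts.getOrDefault(language, Map.of());
        int total = totals.getOrDefault(language, 0);
        int v = vocabulary.size() + 1; // +1 leaves probability mass for unseen trigrams
        String s = normalize(text);
        double logProb = 0.0;
        for (int i = 0; i + 3 <= s.length(); i++) {
            int c = langCounts.getOrDefault(s.substring(i, i + 3), 0);
            logProb += Math.log((c + 1.0) / (total + v));
        }
        return logProb;
    }

    /** Returns the trained language with the highest score, or null if nothing was trained. */
    public String detect(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String language : counts.keySet()) {
            double s = score(language, text);
            if (s > bestScore) {
                bestScore = s;
                best = language;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TrigramDetector d = new TrigramDetector();
        // Tiny hand-written training texts; real models need large corpora.
        d.train("English", "the quick brown fox jumps over the lazy dog and the cat");
        d.train("Spanish", "el zorro salta sobre el perro perezoso y el gato");
        d.train("German", "der schnelle braune fuchs springt auf den faulen hund und die katze");
        System.out.println(d.detect("the dog and the fox"));
        System.out.println(d.detect("el perro y el gato"));
    }
}
```

In a pipeline, a detector like this (or a real library) would be called once per record to tag the text field with its language. With training text this small, results on real data would be unreliable, which is why production detectors ship models built from large corpora.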
- Using the Lingua Library for Kotlin (Portuguese: "Usando a Biblioteca Lingua para Kotlin")
- Language Detection - Pre Trained Models
- Lingua 1.1.0 released - The most accurate natural language detection library for the JVM
- Free and easy to use Java language detection library
I've used this one previously, and found it pretty easy to use, relatively fast, and accurate: https://github.com/pemistahl/lingua
What are some alternatives?
lingua-go - The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Beagle - Beagle helps you identify keywords, phrases, regexes, and complex search queries of interest in streams of text documents.
mlkit - A collection of sample apps to demonstrate how to use Google's ML Kit APIs on Android and iOS
kotlin-logging - Lightweight Multiplatform logging framework for Kotlin. A convenient and performant logging facade.
hms-ml-demo - HMS ML Demo provides an example of integrating Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, ASR, and TTS.
cld3-kotlin - Bindings to Google's Compact Language Detector 3 for JVM-based languages
simplenlg - Java API for Natural Language Generation. Originally developed by Ehud Reiter at the University of Aberdeen’s Department of Computing Science and co-founder of Arria NLG. This git repo is the official SimpleNLG version.
kovenant - Kovenant. Promises for Kotlin.
efficient-language-detector - Fast and accurate natural language detection. Detector written in PHP. Nito-ELD, ELD.
KtUnits - Simple unit conversion library for Kotlin
kotlin-futures - A collection of extension functions to make the JVM Future, CompletableFuture, and ListenableFuture APIs more functional and Kotlin-like.
khronos - Intuitive Date extensions in Kotlin.