lingua
The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
select 'r/'||subreddit sub
  , initcap(lang) language
  , count(*) c
  , ratio_to_report(c) over(partition by sub) ratio
  , sum(iff(language!='English', c, 0)) over(partition by sub) total_not_english
  , sum(c) over(partition by sub) total
from reddit_sample_languages_udtf
group by 1, 2
qualify ratio > .02
order by total_not_english desc, c desc, 1, ratio desc

Thanks to:
- Jason Baumgartner for collecting and sharing Reddit’s comments.
- Peter M. Stahl for the Lingua project to detect languages in Java.
- Snowflake for making it easy to run Java code in a UDF.
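For readers less familiar with Snowflake's window functions and `qualify`, the aggregation in the query can be approximated in plain Python. This is only a sketch of the query's logic, not the author's code. It assumes the input is a list of `(subreddit, language)` pairs, one per comment; the function name `language_ratios` and the `min_ratio` parameter are hypothetical names introduced for illustration.

```python
from collections import Counter, defaultdict


def language_ratios(rows, min_ratio=0.02):
    """Approximate the SQL query on (subreddit, language) pairs.

    `rows` is an assumed input shape: one pair per comment.
    """
    # group by 1, 2 with count(*): comments per (subreddit, language)
    counts = Counter((f"r/{sub}", lang.title()) for sub, lang in rows)

    # Window sums partitioned by subreddit. They are computed over all
    # groups, before the ratio filter, as SQL does before `qualify`.
    totals = defaultdict(int)
    not_english = defaultdict(int)
    for (sub, lang), c in counts.items():
        totals[sub] += c
        if lang != "English":
            not_english[sub] += c

    # qualify ratio > .02: keep languages above the share threshold
    result = [
        {
            "sub": sub,
            "language": lang,
            "c": c,
            "ratio": c / totals[sub],
            "total_not_english": not_english[sub],
            "total": totals[sub],
        }
        for (sub, lang), c in counts.items()
        if c / totals[sub] > min_ratio
    ]

    # order by total_not_english desc, c desc, sub, ratio desc
    result.sort(
        key=lambda r: (-r["total_not_english"], -r["c"], r["sub"], -r["ratio"])
    )
    return result
```

Calling `language_ratios` on a small list of pairs returns one dictionary per surviving (subreddit, language) group, with the subreddits that have the most non-English comments first.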
Related posts
- Announcing Lingua 1.2.0 - The most accurate natural language detection library for the JVM, suitable for long and short text alike
- The most popular languages on Reddit, after analyzing 1M comments: English, German, Spanish, Portuguese, French, Italian, Romanian, Dutch... [OC]
- Usando a Biblioteca Lingua para Kotlin (Using the Lingua Library for Kotlin)
- Language Detection - Pre Trained Models
- Lingua 1.1.0 released - The most accurate natural language detection library for the JVM