It's worth noting that this is on Lingua-Go's issue list for the 1.1.0 release: https://github.com/pemistahl/lingua-go/issues/9
There is also a comparison with CLD 2 in the repo of the sister Python library, lingua-py:
https://github.com/pemistahl/lingua-py#4-how-good-is-it
CLD 2 seems to be slightly less accurate than CLD 3 on average.
In general, language detection is surprisingly hard. There is an LSTM-based implementation, https://github.com/AU-DIS/LSTM_langid, which should in principle do better than n-gram models.
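For context on the n-gram approach mentioned above, here is a toy sketch of character-trigram language identification with add-one smoothing. It is not the implementation used by lingua-go, CLD 2, CLD 3 or LSTM_langid; the class name `TrigramDetector` and the tiny training sentences are hypothetical, chosen only for illustration. Because every score is just a sum over the input's trigrams, a very short input contributes only a handful of trigrams, which is one reason short text is hard to classify reliably.

```python
import math
from collections import Counter


def trigrams(text: str) -> list[str]:
    # Pad with spaces so word boundaries become part of the n-grams.
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramDetector:
    """Hypothetical toy character-trigram language identifier."""

    def __init__(self, samples: dict[str, str]) -> None:
        # One trigram frequency profile per language.
        self.counts = {lang: Counter(trigrams(text)) for lang, text in samples.items()}
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        # Shared vocabulary size, used for add-one (Laplace) smoothing.
        self.vocab = len(set().union(*self.counts.values()))

    def score(self, text: str, lang: str) -> float:
        # Log-probability of the text's trigrams under one language's profile.
        counts, total = self.counts[lang], self.totals[lang]
        return sum(
            math.log((counts[g] + 1) / (total + self.vocab))
            for g in trigrams(text)
        )

    def detect(self, text: str) -> str:
        # Pick the language whose profile gives the highest log-probability.
        return max(self.counts, key=lambda lang: self.score(text, lang))


# Assumed toy training data; real detectors use large corpora.
detector = TrigramDetector({
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat with the other cats",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte",
})
print(detector.detect("the cat sat on the mat"))
print(detector.detect("die katze sitzt auf der matte"))
```

The add-one smoothing keeps unseen trigrams from producing `log(0)`; with such small profiles, most trigrams of an arbitrary input are unseen, so results on real text would be unreliable.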