nlpo3 vs whatlang-rs
Metric | nlpo3 | whatlang-rs
---|---|---
Mentions | 1 | 7
Stars | 30 | 952
Growth | - | -
Activity | 1.6 | 5.1
Last commit | 5 months ago | about 2 months ago
Language | Rust | Rust
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
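The exact formula behind the activity number is not stated here. Purely as an illustration, the sketch below assumes a recency-weighted commit score with exponential decay and then maps a project's percentile among all tracked projects to a 0-10 scale, so that 9.0 corresponds to the top 10%. The function names, the decay shape, and the `half_life_days` parameter are all hypothetical and are not taken from the site.

```rust
/// Hypothetical recency-weighted score: each commit contributes
/// 0.5^(age / half_life), so newer commits count more than older ones.
/// The decay shape and the half-life are assumptions, not the site's formula.
fn weighted_commit_score(commit_ages_days: &[f64], half_life_days: f64) -> f64 {
    commit_ages_days
        .iter()
        .map(|age| 0.5_f64.powf(age / half_life_days))
        .sum()
}

/// Maps a score to a 0-10 "activity" value based on its percentile:
/// the fraction of tracked projects with a strictly lower score, times 10.
fn activity(score: f64, all_scores: &[f64]) -> f64 {
    if all_scores.is_empty() {
        return 0.0;
    }
    let lower = all_scores.iter().filter(|&&s| s < score).count();
    10.0 * lower as f64 / all_scores.len() as f64
}
```

Under these assumptions, a project that outscores nine of ten tracked projects gets an activity of 9.0, which matches the "top 10%" reading above.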
nlpo3
- Thai word tokenizers benchmark: nlpo3 vs newmm
Thanathip Suntorntip Gorlph ported Korakot Chaovavanich's Thai word tokenizer, Newmm, from Python to Rust under the name nlpo3. The nlpo3 website claims that nlpo3 is 2x faster than Newmm. I felt that nlpo3 must be faster than that because, in contrast to Python's regex engine, Rust's regex runs in linear time, since it deliberately does not support look-behind or look-ahead. Moreover, "2x faster" is ambiguous.
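One way to make a claim such as "2x faster" less ambiguous is to state it as a ratio of measured running times and say how the times were measured. The sketch below is a minimal, standard-library-only illustration; `time_it`, `speedup`, and `time_saved_percent` are hypothetical helper names, and reading "2x faster" as "half the running time" is an assumption, not something the nlpo3 site states.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `iterations` times and returns the total elapsed wall-clock time.
fn time_it<F: FnMut()>(iterations: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        f();
    }
    start.elapsed()
}

/// Speedup as a ratio of running times: baseline / candidate.
/// A value of 2.0 means the candidate needs half the baseline's time.
fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}

/// Percentage of the baseline's running time that the candidate saves.
fn time_saved_percent(baseline: Duration, candidate: Duration) -> f64 {
    (1.0 - candidate.as_secs_f64() / baseline.as_secs_f64()) * 100.0
}
```

With these definitions, a baseline of 10 seconds and a candidate of 5 seconds give a speedup of 2.0 and a time saving of 50%. A benchmark report would also need to state the input text, the hardware, and the number of iterations passed to `time_it`.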
whatlang-rs
- Lingua 1.5.0 - The most accurate natural language detection library for Rust, now with support for detecting multiple languages in mixed-language text
How does it compare to whatlang?
- Python Binding for WhatLang (Detect languages) - Blazing Fast ⚡
WhatLang is a Python library for detecting the language of a text. It is based on the WhatLang Rust library.
- To people with real Rusty jobs: How did you land it? What exactly do you do at your job? How proficient are you? What skills besides Rust? How long did it take?
I started working on the whatlang project (https://github.com/greyblake/whatlang-rs). In 2017 I started going to Rust interviews. At that time there were only 3 companies in Berlin offering Rust jobs (as far as I know): Parity, Mozilla, and 1aim. I interviewed with all of them and did not pass. I had a classical Ruby/web background, and at that time Rust was seen as an alternative to C++, so many expected me to know C++ well (which was not really the case). I continued working on my open source projects and writing blog posts from time to time. The year 2020 was very different. It was like Rust turned from underdog to mainstream. I felt like Rust job openings tripled. Headhunters started writing to me on LinkedIn, wow! I got contacted by a big crypto exchange because they wanted to use my library for technical analysis. Sounds like a dream! Eventually, I found a job at Impero.com, thanks to this subreddit. They posted a job description and I sent them my CV. Soon I got hired. It's a remote job, but at that time it did not make a difference because of the pandemic.
- Whatlang 0.15.0 released (lightweight lib for language recognition)
CHANGELOG: https://github.com/greyblake/whatlang-rs/blob/master/CHANGELOG.md
- Whatlang: A Natural language detection library for Rust
- Whatlang strikes back
Regarding Chinese / Japanese: if I understand correctly, Japanese text may include Katakana, Hiragana, and Han (Chinese) characters, while Chinese text includes only Han characters (again, I may be wrong here).
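The script-based distinction described in this comment can be sketched with Unicode block ranges alone. The code below is a minimal illustration and not whatlang's actual implementation; the `Guess` enum and the function names are hypothetical. It treats any Hiragana or Katakana character (U+3040 to U+30FF) as evidence of Japanese and falls back to Chinese when only characters from the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) are present.

```rust
#[derive(Debug, PartialEq)]
enum Guess {
    Japanese,
    Chinese,
    Unknown,
}

/// True for characters in the Hiragana (U+3040..=U+309F)
/// or Katakana (U+30A0..=U+30FF) Unicode blocks.
fn is_kana(c: char) -> bool {
    ('\u{3040}'..='\u{30FF}').contains(&c)
}

/// True for characters in the basic CJK Unified Ideographs block.
/// Extension blocks are ignored to keep the sketch short.
fn is_han(c: char) -> bool {
    ('\u{4E00}'..='\u{9FFF}').contains(&c)
}

/// Guesses the language from the scripts present in `text`.
fn guess_cjk(text: &str) -> Guess {
    if text.chars().any(is_kana) {
        // Kana appears in Japanese but not in Chinese text.
        Guess::Japanese
    } else if text.chars().any(is_han) {
        // Only Han characters: guessed as Chinese.
        Guess::Chinese
    } else {
        Guess::Unknown
    }
}
```

A known weakness of this heuristic is that a short Japanese string written only in kanji, with no kana, is guessed as Chinese, which is one reason a real detector needs more than a script check.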
What are some alternatives?
hck - A sharp cut(1) clone.
regex - An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
sd - Intuitive find & replace CLI (sed alternative)
Fluent - Rust implementation of Project Fluent
pythainlp - Thai Natural Language Processing in Python.
textwrap - An efficient and powerful Rust library for word wrapping text.
lingua-rs - The most accurate natural language detection library for Rust, suitable for short text and mixed-language text
oso - Oso is a batteries-included framework for building authorization in your application.
suffix - Fast suffix arrays for Rust (with Unicode support).
tiktoken-rs - Ready-made tokenizer library for working with GPT and tiktoken
ngrams - (Read-only) Generate n-grams