Metric | pythainlp | toiro
---|---|---
Mentions | 2 | 1
Stars | 926 | 112
Growth | 0.1% | -
Activity | 9.5 | 5.2
Last commit | 4 days ago | 9 months ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pythainlp
- PyThaiNLP 2.4.0-dev0
Read more about PyThaiNLP v2.4.0-dev0: https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.4.0-dev0
- Thai word tokenizers benchmark: nlpo3 vs newmm
Thanathip Suntorntip Gorlph ported Newmm, Korakot Chaovavanich's Thai word tokenizer written in Python, to Rust under the name nlpo3. The nlpo3 website claims that nlpo3 is 2x faster than Newmm. I felt that nlpo3 must be faster than that, because unlike Python's regex engine, Rust's regex engine runs in linear time: it deliberately does not support look-behind or look-ahead. Moreover, "2x faster" is ambiguous.
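One way to make a claim like "2x faster" less ambiguous is to state exactly what was timed and how the ratio was computed. The sketch below is a minimal, hypothetical harness, not the benchmark from the post: `benchmark` and `speedups` are names invented here, the stand-in tokenizers (`str.split`, `list`) only exist so that it runs without extra packages, and the sample text is placeholder data rather than a real corpus.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def benchmark(tokenize: Callable[[str], List[str]], texts: List[str], repeats: int = 5) -> float:
    """Return the median wall-clock seconds needed to tokenize every text once."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            tokenize(text)
        timings.append(time.perf_counter() - start)
    return median(timings)


def speedups(results: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Express each timing as baseline time / engine time (2.0 means half the time)."""
    base = results[baseline]
    return {name: base / seconds for name, seconds in results.items()}


if __name__ == "__main__":
    # Stand-in tokenizers (assumption: used only to keep the sketch dependency-free).
    # To time Newmm, a callable such as the following could be swapped in:
    #   from pythainlp.tokenize import word_tokenize
    #   lambda text: word_tokenize(text, engine="newmm")
    engines = {
        "whitespace": str.split,
        "characters": list,
    }
    texts = ["สวัสดี ครับ ยินดี ที่ ได้ รู้จัก"] * 1000  # placeholder data, not a real corpus
    results = {name: benchmark(fn, texts) for name, fn in engines.items()}
    for name, ratio in speedups(results, baseline="whitespace").items():
        print(f"{name}: {results[name]:.4f} s, {ratio:.2f}x relative to whitespace")
```

Reporting the median of several runs together with an explicit "baseline time / engine time" ratio makes it clear whether "2x" refers to elapsed time or to throughput, and on which input.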
toiro
- Any recommendations for a good Japanese NLP engine?
Thank you! I have also been looking at Toiro, which is not an NLP engine but a comparison tool, and it includes MeCab. You can use it to install all the Japanese language parsers that it knows about and then run tests on your data set. Right now I'm running each one on the game script I have to see which one is best.
What are some alternatives?
nlpo3 - Thai Natural Language Processing library in Rust, with Python and Node bindings.
pykakasi - Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman.
PyProjects - Beginner Friendly Python-Projects
jProcessing - Japanese Natural Language Processing Libraries
google-play-scraper - Google play scraper for Python inspired by <facundoolano/google-play-scraper>
abydos - Abydos NLP/IR library for Python
uni2db - The Unified University Database (uni2db). Tools to get information about courses offered at various colleges/universities.
transformers - 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Manoonchai - "มนูญชัย" Modern, productive, and data-driven Thai keyboard layout generated with CarpalxTH
regex - An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.