SudachiPy
momepy
Metric | SudachiPy | momepy
---|---|---
Mentions | 3 | 1
Stars | 348 | 451
Growth | - | 3.1%
Activity | 1.6 | 8.1
Last commit | over 1 year ago | 6 days ago
Language | Python | Python
License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
SudachiPy
- Sakubun - a tool I made to help you practice kanji, with customized quiz questions and sentences
  The current readings were generated with SudachiPy, with a little processing. UniDic seems pretty interesting, I'll check it out. Do you know how good its accuracy is compared to Sudachi?
- Software which turns hiragana and katakana into kanji
  There are free tools for both of these things. I made game2text to do OCR and script matching. It includes Sudachi, a segmentation and normalization library, but I have not used its normalization feature in the app. I'm not sure anyone else even wants this feature, but it would be pretty straightforward to add if you're familiar with Python and vanilla JavaScript.
- Tokenizing / picking words out of non-English languages
  spaCy uses SudachiPy internally (see the doc comment about that), so if you don't need any of spaCy's extra features or want more control over the tokenization, you could use SudachiPy directly.
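As a rough illustration of using SudachiPy directly, here is a minimal sketch. It assumes both `sudachipy` and a dictionary package (here assumed to be `sudachidict_core`) are installed; the helper name `split_words` is made up for this example and is not part of the library.

```python
def split_words(text, mode="C"):
    """Return (surface, normalized_form) pairs for each token in `text`.

    `split_words` is a hypothetical helper written for this example.
    """
    # Imported lazily so this file still loads when SudachiPy is absent.
    from sudachipy import dictionary, tokenizer

    # SplitMode.A yields the shortest units, C the longest; B sits in between.
    split_mode = {
        "A": tokenizer.Tokenizer.SplitMode.A,
        "B": tokenizer.Tokenizer.SplitMode.B,
        "C": tokenizer.Tokenizer.SplitMode.C,
    }[mode]

    # Dictionary() loads the installed default dictionary
    # (assumed here to be sudachidict_core).
    sudachi = dictionary.Dictionary().create()
    return [
        (m.surface(), m.normalized_form())
        for m in sudachi.tokenize(text, split_mode)
    ]
```

Calling `split_words("国家公務員", mode="A")` should return one pair per short unit, while `mode="C"` keeps longer compounds together. The exact segmentation depends on the dictionary version installed.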
momepy
- What are some very useful (but maybe not that popular) Python libraries that you've used?
  I love momepy for its road network analysis capabilities - given a GeoDataFrame of roads/ways as linestrings, you get a nice graph to look for paths.
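The workflow described in that post might look roughly like the sketch below. It assumes `momepy`, `geopandas`, `shapely`, and `networkx` are installed; the helper names `toy_roads` and `shortest_road_path` and the four toy segments are invented for illustration and use plain planar coordinates with no CRS.

```python
def toy_roads():
    """Build a tiny, made-up road layer with two routes from (0, 0) to (1, 1)."""
    # Imported lazily so this file still loads without the geo stack.
    import geopandas as gpd
    from shapely.geometry import LineString

    return gpd.GeoDataFrame(
        geometry=[
            LineString([(0, 0), (1, 0)]),  # short route, first leg
            LineString([(1, 0), (1, 1)]),  # short route, second leg
            LineString([(0, 0), (0, 3)]),  # detour, first leg
            LineString([(0, 3), (1, 1)]),  # detour, second leg
        ]
    )


def shortest_road_path(roads, start, end):
    """Return the node sequence of the shortest path between two endpoints."""
    import momepy
    import networkx as nx

    # In the primal approach, nodes are the (x, y) endpoints of each line
    # and every edge carries its geometric length under the key "mm_len".
    graph = momepy.gdf_to_nx(roads, approach="primal", multigraph=False)
    return nx.shortest_path(graph, source=start, target=end, weight="mm_len")
```

With the toy data, `shortest_road_path(toy_roads(), (0, 0), (1, 1))` follows the two short segments through `(1, 0)` rather than the detour through `(0, 3)`. Real road data would normally carry a projected CRS so that `mm_len` is in metres.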
What are some alternatives?
Sudachi - A Japanese Tokenizer for Business
camel_tools - A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
quanfima - Quanfima (Quantitative Analysis of Fibrous Materials)
osmnx - OSMnx is a Python package to easily download, model, analyze, and visualize street networks and other geospatial features from OpenStreetMap.
mecab - Yet another Japanese morphological analyzer
NeuroTS - Topological Neuron Synthesis
simplemma - Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
kagome - Self-contained Japanese Morphological Analyzer written in pure Go
TwitchUrbanDictionary - Twitch Bot to look up urban dictionary definitions and examples.