pykakasi vs uniunihan-db

pykakasi

Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman. (by miurahr)

DISCONTINUED

uniunihan-db

Chinese character dictionary for learning Sino-xenic languages (by garfieldnate)

Python mandarin cantonese Japanese vietnamese Korean

garfieldnate.github.io

Project	pykakasi	uniunihan-db
Mentions	1	1
Stars	350	4
Growth	-	-
Activity	5.1	4.7
Latest Commit	almost 2 years ago	2 months ago
Language	Python	Python
License	GNU General Public License v3.0 only	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
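
The description above gives only the shape of the activity metric, not its formula. As a rough illustration, the sketch below assumes an exponential decay for commit weights and a percentile rank scaled to 0-10. The function names, the 90-day half-life, and the decay form are all hypothetical choices made for this example, not the site's actual method.

```python
from bisect import bisect_left


def weighted_commit_score(commit_ages_days, half_life_days=90.0):
    """Sum commit weights so that recent commits count more than old ones.

    Assumption: a commit loses half its weight every `half_life_days` days.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_scores(raw_scores):
    """Map raw scores to a relative 0-10 scale by percentile rank.

    A project scores 9.0 when 90% of tracked projects have a strictly
    lower raw score, i.e. it sits in the top 10%.
    """
    ordered = sorted(raw_scores.values())
    total = len(ordered)
    return {
        name: round(10.0 * bisect_left(ordered, score) / total, 1)
        for name, score in raw_scores.items()
    }


# Hypothetical commit histories: ages of commits in days.
raw = {
    "recently-active": weighted_commit_score([1, 5, 30]),
    "long-dormant": weighted_commit_score([700, 720, 750]),
}
relative = activity_scores(raw)
```

Because the result is a rank rather than a raw count, a project's score in this sketch depends on how every other tracked project is doing, which matches the "relative number" wording above.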

pykakasi

Posts with mentions or reviews of pykakasi. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-08-04.

Any recommendations for a good Japanese NLP engine?
4 projects | /r/LearnJapanese | 4 Aug 2021

I have built a prototype application for helping me learn Japanese which does the following using kakasi.

uniunihan-db

Posts with mentions or reviews of uniunihan-db. We have used some of these posts to build our list of alternatives and similar projects.

Office of the President of Mongolia (top to bottom text on the web)
1 project | news.ycombinator.com | 24 Apr 2023

I loved learning to read Japanese through the second volume of Heisig's _Remembering the Kanji_. Volume 1, which teaches only meanings, is a slog, but volume 2, which teaches the Sino-Japanese readings, is a beautiful example of organizing material to minimize entropy and maximize benefit for memorization as soon as possible. Unfortunately he never put together a volume 2 for a Chinese language. I haven't worked on it in a while, but I have a project where I attempt to re-create the book for Japanese as well as Mandarin, Korean, and Vietnamese: https://nateglenn.com/uniunihan-db/ (repo: https://github.com/garfieldnate/uniunihan-db).
The "pure groups" are the ones where the presence of a specific radical guarantees you a specific pronunciation (within the list of character/pronunciation pairs you're trying to learn). Of the 4800 characters I used for the volume, only 290 are in the chapter on pure groups. The rest are either in semi-regular groups with varying numbers of exceptions, or in completely irregular groups with no discernible patterns.
The characters were designed continuously over a period of time starting thousands of years ago, and the phonetic parts were sometimes exact and sometimes just clues, similar sounds or rhymes to give the reader a hint. Ancient Chinese pronunciation has changed beyond recognition, so it makes perfect sense that the pronunciations wouldn't be regular anymore.
Mainland China uses a "simplified" character set, which did not affect literacy but in my opinion is a bit more difficult to read; they reduced the number of lines so that more characters look samey and they combined many (Mandarin) homonyms (https://en.wikipedia.org/wiki/Simplified_Chinese_characters#...), removing the meaning portion of characters that would have distinguished them. The simplification did not apply to all characters, so to achieve a high level of literacy you need to know traditional forms, anyway.
It would be interesting to see someone try to actually remodel hanzi from scratch for a specific dialect of Chinese, using 100% regular phonetic components and no variants; multiple pronunciations of a character in the current system would be required to be written differently. An interesting example of this would be certain Korean gukja, where they've combined a Chinese character with a phonetic hangeul (example: https://en.wiktionary.org/wiki/%E3%AB%87). This would be a truly simplified Chinese character set... but all of the culture's history that gets built into spelling over time would be completely lost, which is why I always prefer conservative spelling systems.
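
The "pure group" idea from the post above can be sketched in a few lines: group characters by a shared component and call a group pure when every member has the same reading. This is only an illustration of the concept, not code from uniunihan-db. The function name, the input format, and the small sample of Japanese on'yomi pairs are assumptions chosen for the example.

```python
from collections import defaultdict


def classify_groups(entries):
    """Split component groups into pure and mixed.

    `entries` is an iterable of (character, component, reading) tuples.
    A group is pure when all of its characters share one reading within
    the given list; otherwise it is mixed (semi-regular or irregular).
    """
    readings = defaultdict(set)
    members = defaultdict(list)
    for char, component, reading in entries:
        readings[component].add(reading)
        members[component].append(char)

    pure, mixed = {}, {}
    for component, chars in members.items():
        target = pure if len(readings[component]) == 1 else mixed
        target[component] = chars
    return pure, mixed


# Illustrative sample only: one on'yomi per character, romanized.
SAMPLE = [
    ("中", "中", "chuu"),
    ("仲", "中", "chuu"),
    ("忠", "中", "chuu"),
    ("沖", "中", "chuu"),
    ("寺", "寺", "ji"),
    ("時", "寺", "ji"),
    ("詩", "寺", "shi"),
    ("待", "寺", "tai"),
]

pure, mixed = classify_groups(SAMPLE)
```

With this sample, the 中 group comes out pure and the 寺 group comes out mixed. Note that purity here is relative to the list being studied, as the post says: adding a character with a different reading can turn a pure group into a mixed one.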

What are some alternatives?

When comparing pykakasi and uniunihan-db you can also consider the following projects:

spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python

kengdic - Joe Speigle's Korean/English dictionary database

jiten - Japanese Android/CLI/web dictionary based on JMdict/KANJIDIC (日本語　辞典　和英辞典　漢英字典　和独辞典　和蘭辞典)

buondua-downloader - NSFW. Album downloader for https://buondua.com.

jProcessing - Japanese Natural Language Processing Libraries

unihan-etl - Export UNIHAN's database to csv, json or yaml

jmdict-kindle - Japanese - English dictionary for Kindle based on the JMdict / EDICT database

python-jamo - Hangul syllable decomposition and synthesis using jamo.

toiro - A comparison tool of Japanese tokenizers

ark-pixel-font - Open source Pan-CJK pixel font / 开源的泛中日韩像素字体

mahjong - Implementation of riichi mahjong related stuff (hand cost, shanten, agari end, etc.)
