Top 23 Python Japanese Projects
-
-
Project mention: Any way to extract characters from images, or are there any apps/ tools that allow you to handwrite the characters? | /r/LearnJapanese | 2023-05-19
I use manga-ocr on pc
-
Project mention: I made a site to help you practice scoring in Riichi Mahjong :) | /r/Mahjong | 2022-10-01
Part credit should go to Nihisil on GitHub as I'm just using a function he created to spit out the details of the fu scoring.
-
konoha
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
-
Mostly cause I got a good workflow going and I buy most of my ebooks from amazon. I use stuff like this anki add-on to mine sentences (although I have kinda stopped doing this lately) and got this jmdict dictionary for kindle which is pretty nice. I could probably set up something with the Onyx Boox reader and yomichan like others have mentioned but at this point I'm too lazy. Also the kindle is smaller and more portable, the Boox I have is slightly larger and better for manga though.
-
languagepod101-scraper
Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
-
-
-
jiten
jiten - japanese android/cli/web dictionary based on jmdict/kanjidic — 日本語 辞典 和英辞典 漢英字典 和独辞典 和蘭辞典
-
japanese-words-to-vectors
Word2vec (word to vectors) approach for Japanese language using Gensim and Mecab.
Project mention: Abstract-Concreteness Value Lexical Data for Japanese | /r/linguistics | 2022-11-19
I'm looking for data on how concrete or abstract different lexical items are in Japanese, similar to this data for English. I'm not very well versed in computational linguistics, so even though I've found this word-to-vector model that can create vectors for Japanese words, I'm not sure how to extrapolate abstractness values from the resulting vectors, or whether that's even possible without using a predefined abstract-concrete vector like the one shown here.
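One common heuristic for the question raised in this mention is to build an axis from a few hand-picked seed words and score other words by their cosine similarity to it. The sketch below is only an illustration under stated assumptions: the three-dimensional vectors are made-up toy values (a real setup would load vectors from a trained model), the seed words are arbitrary, and the helper names are hypothetical. Nothing here is taken from the japanese-words-to-vectors project, and the scores are not guaranteed to track human concreteness ratings.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real vectors would come from a trained word2vec model.
TOY_VECTORS = {
    "石": [0.9, 0.1, 0.0],    # stone
    "机": [0.8, 0.2, 0.1],    # desk
    "自由": [0.1, 0.9, 0.0],  # freedom
    "愛": [0.0, 0.8, 0.2],    # love
    "犬": [0.7, 0.2, 0.3],    # dog
    "希望": [0.2, 0.7, 0.1],  # hope
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concreteness_axis(vectors, concrete_seeds, abstract_seeds):
    """Direction pointing from the abstract seed mean to the concrete seed mean."""
    concrete = mean_vector([vectors[w] for w in concrete_seeds])
    abstract = mean_vector([vectors[w] for w in abstract_seeds])
    return [c - a for c, a in zip(concrete, abstract)]


def concreteness_score(vectors, word, axis):
    """Positive means closer to the concrete seeds, negative closer to the abstract ones."""
    return cosine(vectors[word], axis)


axis = concreteness_axis(TOY_VECTORS, ["石", "机"], ["自由", "愛"])
for word in ("犬", "希望"):
    print(word, round(concreteness_score(TOY_VECTORS, word, axis), 3))
```

This approach still relies on a predefined set of seed words, so it does not answer whether abstractness can be recovered without any such reference.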
-
-
Project mention: is doing 50 new words/day pushing it too far with Anki? | /r/LearnJapanese | 2023-03-24
Early immersion is going to be painful, especially since you only started this month. If you use the pre-built deck for kuma bear 1, sorted by local frequency, to clear out the few hundred most frequent words in kuma bear 1, it might be doable. But the number of words is not the only factor: without at least N5-N4 grammar, it would be really difficult to jump into reading native material. I would recommend watching the ToKini Andy Genki grammar videos and installing grammar dictionaries in your Yomichan to look up grammar as you read.
-
anki-jrp
Anki add-on for generating furigana and pitch accent coloring & graphs, including optional flexible card styling
Project mention: Completed 6000 Japanese Words by Frequency Today! (+My Card Look) | /r/Anki | 2023-03-11
-
Project mention: Here is a list of reasons I stayed away from Kanji. Its from misunderstanding Kanji learning. I hope someone who isn't still learning Kanji or new to Kanji may read this and learn from my mistakes. | /r/LearnJapanese | 2022-12-28
I've just recently finished it and there is a whole suite of scripts to export it to Anki when you are done. Here
-
-
Project mention: What Japanese learning tools do you use on a regular basis? | /r/LearnJapanese | 2023-02-10
Kindle2Anki
-
-
Project mention: KanjiVG – SVGs of Kanji character strokes including order, shape and direction | news.ycombinator.com | 2023-02-21
-
Project mention: Office of the President of Mongolia (top to bottom text on the web) | news.ycombinator.com | 2023-04-24
I loved learning to read Japanese through the second volume of Heisig's _Learning the Kanji_. Volume 1, which teaches only meanings, is a slog, but volume 2, which teaches the Sino-Japanese readings, is a beautiful example of organizing material to minimize entropy and maximize benefit for memorization as soon as possible. Unfortunately he never put together a volume 2 for a Chinese language. I haven't worked on it in a while, but I have a project where I attempt to re-create the book for Japanese as well as Mandarin, Korean, and Vietnamese: https://nateglenn.com/uniunihan-db/ (repo: https://github.com/garfieldnate/uniunihan-db).
The "pure groups" are the ones where the presence of a specific radical guarantees a specific pronunciation (within the list of character/pronunciation pairs you're trying to learn). Of the 4800 characters I used for the volume, only 290 are in the chapter on pure groups. The rest are either in semi-regular groups with varying numbers of exceptions, or in completely irregular groups with no discernible patterns.
The characters were designed continuously over a period of time starting thousands of years ago, and the phonetic parts were sometimes exact and sometimes just clues, similar sounds or rhymes to give the reader a hint. Ancient Chinese pronunciation has changed beyond recognition, so it makes perfect sense that the pronunciations wouldn't be regular anymore.
Mainland China uses a "simplified" character set, which did not affect literacy but in my opinion is a bit more difficult to read; they reduced the number of lines so that more characters look samey and they combined many (Mandarin) homonyms (https://en.wikipedia.org/wiki/Simplified_Chinese_characters#...), removing the meaning portion of characters that would have distinguished them. The simplification did not apply to all characters, so to achieve a high level of literacy you need to know traditional forms, anyway.
It would be interesting to see someone try to actually remodel hanzi from scratch for a specific dialect of Chinese, using 100% regular phonetic components and no variants; multiple pronunciations of a character in the current system would be required to be written differently. An interesting example of this would be certain Korean gukja, where they've combined a Chinese character with a phonetic hangeul (example: https://en.wiktionary.org/wiki/%E3%AB%87). This would be a truly simplified Chinese character set... but all of the culture's history that gets built into spelling over time would be completely lost, which is why I always prefer conservative spelling systems.
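The "pure group" idea described in this comment can be sketched as a grouping check: collect characters by shared component and call a group pure when every member has the same reading. This is a minimal illustration, not the uniunihan-db implementation. The entries are a small hand-written sample simplified to one on'yomi per character, and the function names are hypothetical.

```python
from collections import defaultdict

# (character, shared component, one on'yomi), simplified sample data.
# Real characters often have several readings; only one is listed here.
TOY_ENTRIES = [
    ("清", "青", "セイ"),
    ("晴", "青", "セイ"),
    ("精", "青", "セイ"),
    ("時", "寺", "ジ"),
    ("持", "寺", "ジ"),
    ("待", "寺", "タイ"),
    ("特", "寺", "トク"),
]


def group_by_component(entries):
    """Map each component to the (character, reading) pairs that contain it."""
    groups = defaultdict(list)
    for char, component, reading in entries:
        groups[component].append((char, reading))
    return dict(groups)


def is_pure(members):
    """A group is pure when all of its members share a single reading."""
    return len({reading for _, reading in members}) == 1


def classify(entries):
    """Label each component group as 'pure' or 'not pure' within the given list."""
    return {
        component: "pure" if is_pure(members) else "not pure"
        for component, members in group_by_component(entries).items()
    }


for component, label in classify(TOY_ENTRIES).items():
    print(component, label)
```

As the comment notes, purity only holds relative to the list of character/pronunciation pairs being studied, so adding entries can turn a pure group into a non-pure one. Separating semi-regular from irregular groups would need an exception threshold that the comment does not specify.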
-
sakuraParisPythonAPI
(more than just) A Python wrapper for the Sakura Paris (Japanese) Dictionary API. All definitions are monolingual.
-
-
bulk_generate_japanese_vocab_frequency
An Anki add-on for adding the word frequency to the Japanese words in a specific deck.
-
asian-comprehension-worksheet-generator
Create worksheets to learn an Asian language (e.g. Chinese) and practice reading and writing in grid format. A perfect tool for kids and beginners.
Python Japanese related posts
- Any way to extract characters from images, or are there any apps/ tools that allow you to handwrite the characters?
- Do you guys know where I can read the translated version of Isekai Joshi Kangoku?
- How do you read Japanese?
- Looking for a program for quick word extraction WITHOUT leaving the screen?
- easy manga that writes left to right (horizontal) and uses kanji with furigana
- What is the most accurate Windows OCR
- is doing 50 new words/day pushing it too far with Anki?
Index
What are some of the best open-source Japanese projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | ark-pixel-font | 2,443 |
2 | manga-ocr | 1,057 |
3 | mahjong | 312 |
4 | konoha | 191 |
5 | jmdict-kindle | 181 |
6 | languagepod101-scraper | 137 |
7 | toiro | 110 |
8 | kanji-data | 91 |
9 | jiten | 83 |
10 | japanese-words-to-vectors | 81 |
11 | unihan-etl | 46 |
12 | Grammar-Dictionaries | 41 |
13 | anki-jrp | 24 |
14 | Koohii2Anki | 13 |
15 | yomichad | 12 |
16 | Kindle2Anki | 10 |
17 | kanji-flashcard-generator | 8 |
18 | anki-kunren | 5 |
19 | uniunihan-db | 3 |
20 | sakuraParisPythonAPI | 3 |
21 | PhantomBrigade-Translation | 2 |
22 | bulk_generate_japanese_vocab_frequency | 1 |
23 | asian-comprehension-worksheet-generator | 0 |