Python Japanese

Open-source Python projects categorized as Japanese

Top 23 Python Japanese Projects

  • ark-pixel-font

    Open-source Pan-CJK pixel font

  • manga-ocr

    Optical character recognition for Japanese text, with the main focus being Japanese manga

    Project mention: Any way to extract characters from images, or are there any apps/ tools that allow you to handwrite the characters? | /r/LearnJapanese | 2023-05-19

    I use manga-ocr on PC.

  • mahjong

    Implementation of riichi mahjong related stuff (hand cost, shanten, agari end, etc.)

    Project mention: I made a site to help you practice scoring in Riichi Mahjong :) | /r/Mahjong | 2022-10-01

    Part credit should go to Nihisil on GitHub as I'm just using a function he created to spit out the details of the fu scoring.

  • konoha

    🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.

  • jmdict-kindle

    Japanese - English dictionary for Kindle based on the JMdict / EDICT database

    Project mention: Onyx Boox e-reader | /r/LearnJapanese | 2023-02-17

    Mostly because I got a good workflow going and I buy most of my ebooks from Amazon. I use stuff like this Anki add-on to mine sentences (although I have kind of stopped doing this lately) and got this JMdict dictionary for Kindle, which is pretty nice. I could probably set up something with the Onyx Boox reader and Yomichan like others have mentioned, but at this point I'm too lazy. Also, the Kindle is smaller and more portable; the Boox I have is slightly larger and better for manga, though.

  • languagepod101-scraper

    Python scraper for Language Pods such as Japanesepod101.com. Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨

  • toiro

    A comparison tool for Japanese tokenizers

  • kanji-data

    A JSON kanji dataset with updated JLPT levels and WaniKani information

  • jiten

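    jiten - Japanese Android/CLI/web dictionary based on JMdict/KANJIDIC (Japanese dictionary, Japanese-English dictionary, kanji-English dictionary, Japanese-German dictionary, Japanese-Dutch dictionary)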

  • japanese-words-to-vectors

    Word2vec (word to vectors) approach for the Japanese language using Gensim and MeCab.

    Project mention: Abstract-Concreteness Value Lexical Data for Japanese | /r/linguistics | 2022-11-19

    I'm looking for data on how concrete or abstract different lexical items are in Japanese, similar to this data for English. I'm not very well versed in computational linguistics, so even though I've found this word-to-vector model that can create vectors for Japanese words, I'm not sure how to extrapolate abstractness values from the resulting vectors, or if that's even possible without using a predefined abstract-concrete vector like the one shown here.

  • unihan-etl

    Export UNIHAN's database to CSV, JSON, or YAML

  • Grammar-Dictionaries

    Project mention: is doing 50 new words/day pushing it too far with Anki? | /r/LearnJapanese | 2023-03-24

    Early immersion is going to be painful, especially when you just started this month. I think if you do the pre-built deck for kuma bear 1, sorted by local frequency, to clear out the few hundred most frequent words in kuma bear 1, it might be doable. But the number of words is not the only factor: without at least N5-N4 grammar it would be really difficult to jump into reading native material. I would recommend watching the ToKini Andy Genki grammar videos and installing grammar dictionaries on your Yomichan to look up grammar as you read.

  • anki-jrp

    Anki add-on for generating furigana and pitch accent coloring & graphs, including optional flexible card styling

    Project mention: Completed 6000 Japanese Words by Frequency Today! (+My Card Look) | /r/Anki | 2023-03-11

  • Koohii2Anki

    Full Kanji Koohii to Anki Migration

    Project mention: Here is a list of reasons I stayed away from Kanji. Its from misunderstanding Kanji learning. I hope someone who isn't still learning Kanji or new to Kanji may read this and learn from my mistakes. | /r/LearnJapanese | 2022-12-28

    I've just recently finished it and there is a whole suite of scripts to export it to Anki when you are done. Here

  • yomichad

    Japanese pop-up dictionary for qutebrowser

  • Kindle2Anki

    Create Anki cards from Kindle's Vocab-Builder and Yomichan dictionaries (by Kartoffel0)

    Project mention: What Japanese learning tools do you use on a regular basis? | /r/LearnJapanese | 2023-02-10

    Kindle2Anki

  • kanji-flashcard-generator

    Simple script to generate flashcards for studying kanji

  • anki-kunren

    Interactive Japanese kanji writing drill practice for Anki with stroke order

    Project mention: KanjiVG – SVGs of Kanji character strokes including order, shape and direction | news.ycombinator.com | 2023-02-21

  • uniunihan-db

    Chinese character dictionary for learning Sino-xenic languages

    Project mention: Office of the President of Mongolia (top to bottom text on the web) | news.ycombinator.com | 2023-04-24

    I loved learning to read Japanese through the second volume of Heisig's _Learning the Kanji_. Volume 1, which teaches only meanings, is a slog, but volume 2, which teaches the Sino-Japanese readings, is a beautiful example of organizing material to minimize entropy and maximize benefit for memorization as soon as possible. Unfortunately he never put together a volume 2 for a Chinese language. I haven't worked on it in a while, but I have a project where I attempt to re-create the book for Japanese as well as Mandarin, Korean, and Vietnamese: https://nateglenn.com/uniunihan-db/ (repo: https://github.com/garfieldnate/uniunihan-db).

    The "pure groups" are the ones where the presence of a specific radical guarantees you a specific pronunciation (within the list of character/pronunciation pairs you're trying to learn). Of the 4800 characters I used for the volume, only 290 are in the chapter on pure groups. The rest are either in semi-regular groups with varying numbers of exceptions, or in completely irregular groups with no discernible patterns.

    The characters were designed continuously over a period of time starting thousands of years ago, and the phonetic parts were sometimes exact and sometimes just clues, similar sounds or rhymes to give the reader a hint. Ancient Chinese pronunciation has changed beyond recognition, so it makes perfect sense that the pronunciations wouldn't be regular anymore.

    Mainland China uses a "simplified" character set, which did not affect literacy but in my opinion is a bit more difficult to read; they reduced the number of lines so that more characters look samey and they combined many (Mandarin) homonyms (https://en.wikipedia.org/wiki/Simplified_Chinese_characters#...), removing the meaning portion of characters that would have distinguished them. The simplification did not apply to all characters, so to achieve a high level of literacy you need to know traditional forms, anyway.

    It would be interesting to see someone try to actually remodel hanzi from scratch for a specific dialect of Chinese, using 100% regular phonetic components and no variants; multiple pronunciations of a character in the current system would be required to be written differently. An interesting example of this would be certain Korean gukja, where they've combined a Chinese character with a phonetic hangeul (example: https://en.wiktionary.org/wiki/%E3%AB%87). This would be a truly simplified Chinese character set... but all of the culture's history that gets built into spelling over time would be completely lost, which is why I always prefer conservative spelling systems.
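
The grouping idea in the comment above (pure, semi-regular, and irregular groups) can be illustrated with a minimal sketch. This is not the uniunihan-db implementation: the function name `classify_groups`, the toy character list with simplified romanized readings, and the "more than half share one reading" cutoff for semi-regular groups are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy data (illustrative only, not taken from the project):
# (character, shared component, simplified romanized Sino-Japanese reading)
PAIRS = [
    ("仲", "中", "chuu"),
    ("忠", "中", "chuu"),
    ("沖", "中", "chuu"),
    ("時", "寺", "ji"),
    ("持", "寺", "ji"),
    ("侍", "寺", "ji"),
    ("詩", "寺", "shi"),
    ("待", "寺", "tai"),
]


def classify_groups(pairs):
    """Group characters by component and label each group.

    pure         : every character in the group has the same reading
    semi-regular : more than half share one reading (assumed cutoff)
    irregular    : no reading covers more than half of the group
    """
    readings_by_component = defaultdict(list)
    for _char, component, reading in pairs:
        readings_by_component[component].append(reading)

    labels = {}
    for component, readings in readings_by_component.items():
        # Count of the single most common reading in this group.
        top_count = Counter(readings).most_common(1)[0][1]
        if top_count == len(readings):
            labels[component] = "pure"
        elif top_count * 2 > len(readings):
            labels[component] = "semi-regular"
        else:
            labels[component] = "irregular"
    return labels


if __name__ == "__main__":
    for component, label in classify_groups(PAIRS).items():
        print(component, label)
```

With this toy list, the 中 group comes out as pure and the 寺 group as semi-regular; a real dataset would need full reading lists and a more careful definition of "exception".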

  • sakuraParisPythonAPI

    (more than just) A Python wrapper for the Sakura Paris (Japanese) Dictionary API. All definitions are monolingual.

  • PhantomBrigade-Translation

    Translation project of "Phantom Brigade". Check "Releases".

  • bulk_generate_japanese_vocab_frequency

    An Anki add-on for adding the word frequency to the Japanese words in a specific deck.

  • asian-comprehension-worksheet-generator

    Create worksheets to learn Asian languages (e.g., Chinese) and practice reading and writing in grid format. A perfect tool for kids and beginners.

    Project mention: I made this tool to help my kids learn Chinese | /r/Python | 2023-06-05

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-06-05.

Index

What are some of the best open-source Japanese projects in Python? This list will help you:

Rank Project Stars
1 ark-pixel-font 2,443
2 manga-ocr 1,057
3 mahjong 312
4 konoha 191
5 jmdict-kindle 181
6 languagepod101-scraper 137
7 toiro 110
8 kanji-data 91
9 jiten 83
10 japanese-words-to-vectors 81
11 unihan-etl 46
12 Grammar-Dictionaries 41
13 anki-jrp 24
14 Koohii2Anki 13
15 yomichad 12
16 Kindle2Anki 10
17 kanji-flashcard-generator 8
18 anki-kunren 5
19 uniunihan-db 3
20 sakuraParisPythonAPI 3
21 PhantomBrigade-Translation 2
22 bulk_generate_japanese_vocab_frequency 1
23 asian-comprehension-worksheet-generator 0