hangul-jamo
A library to compose and decompose Hangul syllables using Hangul jamo characters (by jonghwanhyeon)
wikiextractor
A tool for extracting plain text from Wikipedia dumps (by attardi)
| | hangul-jamo | wikiextractor |
|---|---|---|
| Mentions | 1 | 3 |
| Stars | 25 | 3,630 |
| Growth | - | - |
| Activity | 0.0 | 0.0 |
| Last commit | almost 2 years ago | 3 months ago |
| Language | Python | Python |
| License | MIT License | GNU Affero General Public License v3.0 |
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hangul-jamo
Posts with mentions or reviews of hangul-jamo. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-06-13.
- Letter and next letter frequencies for 24 languages (see comments for non-English plots) [OC]

  Korean (40 jamo, decomposed using hangul-jamo)
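The decomposition the post refers to splits each precomposed Hangul syllable into its constituent jamo. As a minimal, self-contained sketch, the standard Unicode arithmetic for this looks as follows (the hangul-jamo library wraps equivalent logic behind its own `compose`/`decompose` helpers; this sketch is not its API):

```python
# Decompose a precomposed Hangul syllable (U+AC00..U+D7A3) into its
# lead consonant, vowel, and optional tail consonant jamo, using the
# standard Unicode Hangul decomposition arithmetic.

SYLLABLE_BASE = 0xAC00
LEAD_BASE, VOWEL_BASE, TAIL_BASE = 0x1100, 0x1161, 0x11A7
VOWEL_COUNT, TAIL_COUNT = 21, 28

def decompose_syllable(ch: str) -> str:
    code = ord(ch)
    if not SYLLABLE_BASE <= code <= 0xD7A3:
        return ch  # not a precomposed syllable; pass through unchanged
    index = code - SYLLABLE_BASE
    lead = index // (VOWEL_COUNT * TAIL_COUNT)
    vowel = (index % (VOWEL_COUNT * TAIL_COUNT)) // TAIL_COUNT
    tail = index % TAIL_COUNT
    jamo = chr(LEAD_BASE + lead) + chr(VOWEL_BASE + vowel)
    if tail:  # tail index 0 means "no final consonant"
        jamo += chr(TAIL_BASE + tail)
    return jamo

def decompose(text: str) -> str:
    return "".join(decompose_syllable(ch) for ch in text)
```

For example, `decompose("한")` yields the three-jamo string U+1112 U+1161 U+11AB (ㅎ, ㅏ, ㄴ), which is why a frequency count over decomposed text sees jamo rather than whole syllables.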
wikiextractor
Posts with mentions or reviews of wikiextractor. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-06-13.
- Letter and next letter frequencies for 24 languages (see comments for non-English plots) [OC]

  Larger text corpora: each plot is generated from around 450MB of Wikipedia article text (or as much as is available), extracted using wikiextractor.
- Most similar language to each European language, based purely on letter distribution [OC]

  Methodology: extracted 100MB of article text from each of the different Wikipedias using https://github.com/attardi/wikiextractor, and counted the character prevalences using Python. The similarity measure is just the sum of the absolute differences in character prevalences (so a lower score means more similar): e.g. if language A has distribution {A: 0.5, B: 0.3, C: 0.2} and language B has distribution {A: 0.8, B: 0.2}, then their similarity is |0.5-0.8|+|0.3-0.2|+|0.2-0.0|=0.6. The final chart was generated using graphviz and pillar.
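The similarity measure described in that post is an L1 distance over letter-frequency dictionaries, with missing characters counted as prevalence 0.0. A minimal sketch reproducing the worked example (the function name is illustrative, not from the post):

```python
def l1_distance(dist_a: dict, dist_b: dict) -> float:
    """Sum of absolute differences in character prevalence.

    A character missing from one distribution counts as 0.0, so the
    measure is symmetric; a lower score means more similar languages.
    """
    letters = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(c, 0.0) - dist_b.get(c, 0.0)) for c in letters)

# The worked example from the post:
lang_a = {"A": 0.5, "B": 0.3, "C": 0.2}
lang_b = {"A": 0.8, "B": 0.2}
score = l1_distance(lang_a, lang_b)  # |0.5-0.8| + |0.3-0.2| + |0.2-0.0| = 0.6
```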
- Finding an English Wikipedia dump

  With the help of wikiextractor, I was able to query it and process the dump. However, when I started inspecting it, some articles were empty. For example, AccessibleComputing should not be empty, but the dump gave:

  ```
  AccessibleComputing 0 10 854851586 2021-01-23T15:15:01Z Elli shel wikitext text/x-wiki #REDIRECT [[Computer accessibility]]
  ```
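The "empty" article in that question is most likely a redirect: AccessibleComputing's wikitext is just `#REDIRECT [[Computer accessibility]]`, and redirect pages carry no body text of their own. A quick way to check for this in raw wikitext (a sketch, not part of wikiextractor):

```python
import re

# MediaWiki redirect pages start with "#REDIRECT [[Target]]" (case-insensitive).
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[(.+?)\]\]", re.IGNORECASE)

def redirect_target(wikitext):
    """Return the redirect target if the page is a redirect, else None."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1) if match else None
```

Filtering pages whose wikitext matches this pattern before (or after) extraction explains which articles come out empty.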
What are some alternatives?
When comparing hangul-jamo and wikiextractor you can also consider the following projects:
ark-pixel-font - Open source Pan-CJK pixel font / 开源的泛中日韩像素字体
kogpt - KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)
python-jamo - Hangul syllable decomposition and synthesis using jamo.