Open-source projects categorized as Linguistics

Top 23 Linguistic Open-Source Projects

  • rime-cantonese

    Rime Cantonese input schema | 粵語拼音輸入方案

    Project mention: Looking for a Cantonese pinyin keyboard for Windows 10 | 2022-05-25

    The GitHub page is here if you'd prefer to look at that. It has more information and links to some patch files for other romanization schemes (Yale, etc.) if you'd prefer those instead.

  • ipa-dict

    Monolingual wordlists with pronunciation information in IPA

    Project mention: TunicScript - Write in Tunic! | 2022-03-26

    Credit to the open-dict-data project for their extensive dictionary of IPA spellings of English words.

  • awesome-linguistics

    A curated list of anything remotely related to linguistics

  • prosodic

    Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.

    Project mention: (Spoilers Main) Poetic Meters of the Songs in ASOIAF | 2022-02-17

    While studying poetic meters, I noticed that most songs in ASOIAF have regular meters. I also found a useful automatic poetic meter analyzer, Prosodic, with 90% accuracy, so I used it to try to analyze all the songs.

  • wikipron

    Massively multilingual pronunciation mining

    Project mention: How to extract Russian IPA transcriptions from Wiktionary? | 2022-06-19

    <- may be what you're looking for

  • ichiran

    Linguistic tools for texts in Japanese language

  • dev

    PHOIBLE data and development. (by phoible)

    Project mention: Are there databases of “standardized” phonetic frequencies/harmonics? For example the vowel sound “a”? (with an API) so an IPA API :)? | 2022-04-28

    I guess I’m looking for something like this: PHOIBLE but with audio

  • TextAnnotationGraphs

    A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.

  • OpenGNT

    Open Greek New Testament Project; NA28 / NA27 Equivalent Text & Resources

    Project mention: Help finding Interlinear texts | 2021-11-11

    and do this as far as I can tell.

  • zeroshot_topics

    Topic Inference with Zeroshot models

    Project mention: Label your text data automatically with zeroshot_labels | 2021-11-22

  • mlconjug3

    A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.

  • tone

    A Cross-Cultural Writing System (by teamdrumwork)

    Project mention: What are the differences between these language sounds and IPA orthographies? | 2022-06-11

    I am working on refining a fantasy language script, which is like a simplified down IPA for a game world. I have made room for the distinction between aspiration vs. h, or palatalization vs. y, etc., (or labialization vs. w), but TBH I can't see when you would ever treat them differently. So I'm not entirely sure this distinction is necessary to have. Why is it absolutely necessary?

  • treebender

    A HDPSG-inspired symbolic natural language parser written in Rust

    Project mention: Ask HN: Which personal projects got you hired? | 2022-05-15

  • WonderfulPolishLanguage

    This is a repository created for the list of resources for learning and exploring Wonderful Polish language.

    Project mention: Can you recommend me some resource for learning polish? (read desc.) | 2021-08-06

    You can find many resources at this link. The author did a good job and sorted all the materials very well :)

  • langua

    A suite of language tools

  • maxent-learner-hw

    A tool for automatically inferring phonotactic grammars from a lexicon and using those grammars to generate random text

  • NaiPosTagger

    A part of speech tagger written in PHP.

    Project mention: N-ai: a part of speech tagger for chatbots, keywords extraction, text analysis and more. Now is open source | 2021-08-15

  • sca

    Apply sound changes automatically to a set of words.

    Project mention: Sound change appliers | 2022-06-28

  • iso639

    ISO 639 language codes in Python (by jacksonllee)

    Project mention: New Python Package for ISO 639 Language Codes | 2022-05-17

  • sniglet

    Generate sniglets with machine learning!

    Project mention: Just published a major feature update to my sniglet generator, written in SwiftUI/Catalyst! | 2022-02-19
  • tune

    An Intermediate Constructed Language (by teamdrumwork)

    Project mention: How many words/concepts do you need to be able to understand and communicate about reality at a deep level? | 2022-05-26

    But if you are working on a conlang, how many words would you need to define that people should memorize to have a rich understanding of the world? If you want to try and break this problem down into smaller pieces that is fine with me. But it seems in my initial attempt at a conlang, you can cover most abstract concepts with about 2,000 words. Then for the common objects on earth (rocks, trees, etc.) or highly specific named entities (star constellations, or atoms/materials for example, or foods or daily objects like kitchen supplies), you can add another 2,000 or so words to the lexicon. I was quite surprised when I listed out every possible tool I could think of, and the list was only about 700 individual words (then you can combine words like "Circular saw" to get more tools). Or for foods, Wikipedia has less than about 1,000 named foods which cover every possible thing you've ever eaten.

  • treeBuilderJS

    A web interface to quickly and easily draw syntactic trees. Created for linguists.

  • scraper

    Declarative web scraper in JavaScript primarily designed to extract linguistics data (by sergeyt)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-06-28.

What are some of the best open-source Linguistic projects? This list will help you:

Rank Project Stars
1 rime-cantonese 306
2 ipa-dict 266
3 awesome-linguistics 263
4 prosodic 206
5 wikipron 190
6 ichiran 171
7 dev 89
8 TextAnnotationGraphs 81
9 OpenGNT 56
10 zeroshot_topics 50
11 mlconjug3 44
12 tone 37
13 treebender 28
14 WonderfulPolishLanguage 23
15 langua 19
16 maxent-learner-hw 8
17 NaiPosTagger 6
18 sca 4
19 iso639 4
20 sniglet 4
21 tune 3
22 treeBuilderJS 2
23 scraper 2