Linguistics

Open-source projects categorized as Linguistics

Top 23 Linguistic Open-Source Projects

  • tatoeba2

    Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.

  • Project mention: Where can I find reliable example sentences? | /r/LearnJapanese | 2023-05-27

    Maybe on tatoeba.org with filters

  • ipa-dict

    Monolingual wordlists with pronunciation information in IPA

  • rime-cantonese

    Rime Cantonese input schema | 粵語拼音輸入方案

  • Project mention: How to type Jyutcitzi? 【RIME keyboard installation manual】? | /r/CantoneseScriptReform | 2023-12-07

    Please follow the instructions at https://github.com/rime/rime-cantonese/wiki and https://github.com/rime/rime-cantonese/wiki/新手安裝教程. In a nutshell, download and install using the following files. Mac: mac-2021.05.16-installer.pkg. Windows: windows-sfx-2021.05.16-installer.exe. Linux: download and run ibus-install.sh. Please check that RIME Cantonese is properly installed before proceeding to Step 3.

  • awesome-linguistics

    A curated list of anything remotely related to linguistics

  • Project mention: Untranslatable | news.ycombinator.com | 2024-01-26

    Very interesting they were funded from Kickstarter! 292 backers at 10k€. I assumed you needed quite the following for Kickstarter to work...

    And it looks like they do. 49k followers on Facebook and 16k on Instagram. Not sure how far back these go, but it looks like very "shareable" content, where they would take untranslatable words and make little funny pictures or memes or other intriguing things and post them. Lots of interaction comments/reaction-wise

    Timeline-wise this was backed on Kickstarter in 2020. Site launched in summer 2020. The creator was very active on Kickstarter working on communicating and updating the community with what was going on (until the end there).

    Also seems to have a Patreon, and worked itself into other places like https://github.com/theimpossibleastronaut/awesome-linguistic...

  • wikipron

    Massively multilingual pronunciation mining

  • Project mention: does anyone know of any phonological transcription APIs for portuguese? | /r/asklinguistics | 2023-04-20

    There's wikipron, which scrapes data from Wiktionary.

  • ichiran

    Linguistic tools for texts in Japanese language

  • prosodic

    Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.

  • dev

    PHOIBLE data and development. (by phoible)

  • Project mention: Does someone have a phonemic inventory of all the romance languages, a list of all the phonemes in all the romance languages ? | /r/linguistics | 2023-05-11

    Does the language you’re thinking of have an inventory on https://phoible.org/?

  • TextAnnotationGraphs

    A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.

  • ambuda

    Main application code for Ambuda, a breakthrough Sanskrit library (ambuda.org)

  • Project mention: The Theorist Who Sees Math in Art, Music and Writing | news.ycombinator.com | 2024-03-04

    >"Thousands of years ago in India, poets were trying to think about the possible meters. In Sanskrit poetry, you have long and short syllables. Long is twice as long as short. If you want to work out how many there are that take a length of time of three, you can have short, short, short, or long, short, or short, long. There are three ways to make three. There are five ways to make a length-four phrase. And there are eight ways to make a length-five phrase. This sequence you’re getting is one where every term is the sum of the previous two. You exactly reproduce what we nowadays call the Fibonacci sequence. But this was centuries before Fibonacci."

    Related:

    Ambuda: "Building the world's largest Sanskrit library":

    https://ambuda.org/
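
The counting argument in the quote above can be checked with a short sketch. It assumes only what the quote states: a short syllable lasts one unit and a long syllable lasts two. The function names `count_meters` and `list_meters` are hypothetical, chosen for illustration, and are not part of Ambuda or any other project listed here.

```python
from functools import lru_cache

# Assumption taken from the quote: a long syllable lasts twice as long as a short one.
SHORT, LONG = 1, 2


@lru_cache(maxsize=None)
def count_meters(total: int) -> int:
    """Count sequences of short/long syllables whose durations sum to `total`."""
    if total < 0:
        return 0
    if total == 0:
        return 1  # exactly one way to fill zero time: the empty sequence
    # The first syllable is either short or long; count the remainders.
    return count_meters(total - SHORT) + count_meters(total - LONG)


def list_meters(total: int) -> list[str]:
    """Enumerate the same sequences, writing S for short and L for long."""
    if total == 0:
        return [""]
    patterns = []
    if total >= SHORT:
        patterns += ["S" + rest for rest in list_meters(total - SHORT)]
    if total >= LONG:
        patterns += ["L" + rest for rest in list_meters(total - LONG)]
    return patterns


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, count_meters(n), list_meters(n))
```

Because the first syllable is either short or long, the count for a given length is the sum of the counts for the two shorter lengths, which is the recurrence the quote describes.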

  • odict

    A blazingly-fast, offline-first format and toolchain for lexical data 📖

  • OpenGNT

    Open Greek New Testament Project; NA28 / NA27 Equivalent Text & Resources

  • langstats

    A visual color bar of the programming languages in your directory, with percentages and labels

  • zeroshot_topics

    Topic Inference with Zeroshot models

  • tone

    A Cross-Cultural Writing System (by termsurf)

  • treebender

    A HDPSG-inspired symbolic natural language parser written in Rust

  • langua

    A suite of language tools

  • WonderfulPolishLanguage

    This is a repository created for the list of resources for learning and exploring Wonderful Polish language.

  • Project mention: Are there any websites gathering graded readers in different languages; either making it themselves or simply sharing sources. I’m specifically looking for fiction books in Polish A1/A2. | /r/languagelearning | 2023-04-26

    - Big list of resources (for learners) here, maybe you'll find something you like - https://github.com/TheDomcio/WonderfulPolishLanguage

  • proiel-treebank

    Official releases of the PROIEL treebank of ancient Indo-European languages

  • Project mention: Greek New Testament lemmatized with morphosyntactic annotation | /r/AcademicBiblical | 2023-04-19
  • syn

    🌾 Get synonyms and antonyms of words from Thesaurus.com and other sources in your terminal, with rich output. (by agmmnn)

  • google-books-ngram-frequency

    Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code

  • iso639

    ISO 639 language codes (by jacksonllee)

  • NaiPosTagger

    A part of speech tagger written in PHP.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-04.

Index

What are some of the best open-source Linguistic projects? This list will help you:

Rank Project Stars
1 tatoeba2 663
2 ipa-dict 493
3 rime-cantonese 488
4 awesome-linguistics 353
5 wikipron 288
6 ichiran 273
7 prosodic 267
8 dev 108
9 TextAnnotationGraphs 89
10 ambuda 78
11 odict 78
12 OpenGNT 73
13 langstats 61
14 zeroshot_topics 60
15 tone 52
16 treebender 39
17 langua 35
18 WonderfulPolishLanguage 34
19 proiel-treebank 33
20 syn 26
21 google-books-ngram-frequency 25
22 iso639 23
23 NaiPosTagger 14