Top 23 Linguistic Open-Source Projects
-
tatoeba2
Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
-
prosodic
Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
-
TextAnnotationGraphs
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
-
langstats
A visual color bar of the programming languages in your directory, with percentages and labels
-
WonderfulPolishLanguage
A list of resources for learning and exploring the wonderful Polish language.
-
syn
🌾 Get synonyms and antonyms of words from Thesaurus.com and other sources in your terminal, with rich output. (by agmmnn)
-
google-books-ngram-frequency
Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
Maybe on tatoeba.org with filters
Project mention: How to type Jyutcitzi? 【RIME keyboard installation manual】? | /r/CantoneseScriptReform | 2023-12-07
Please follow instructions at https://github.com/rime/rime-cantonese/wiki and https://github.com/rime/rime-cantonese/wiki/新手安裝教程 In a nutshell, download and install using the following files:
- Mac: mac-2021.05.16-installer.pkg
- Windows: windows-sfx-2021.05.16-installer.exe
- Linux: Download and run ibus-install.sh
Please check to ensure that RIME Cantonese is properly installed before proceeding to Step 3.
Very interesting they were funded from Kickstarter! 292 backers at 10k€. I assumed you needed quite the following for Kickstarter to work...
And it looks like they do: 49k followers on Facebook and 16k on Instagram. Not sure how far back these go, but it looks like very "shareable" content, where they would take untranslatable words and make little funny pictures or memes or other intriguing things and post them. Lots of interaction, comment- and reaction-wise.
Timeline-wise this was backed on Kickstarter in 2020. Site launched in summer 2020. The creator was very active on Kickstarter working on communicating and updating the community with what was going on (until the end there).
Also seems to have a Patreon, and worked itself into other places like https://github.com/theimpossibleastronaut/awesome-linguistic...
Project mention: does anyone know of any phonological transcription APIs for portuguese? | /r/asklinguistics | 2023-04-20
There's wikipron, which scrapes data from Wiktionary.
Project mention: Does someone have a phonemic inventory of all the romance languages, a list of all the phonemes in all the romance languages ? | /r/linguistics | 2023-05-11
Does the language you’re thinking of have an inventory on https://phoible.org/?
Project mention: The Theorist Who Sees Math in Art, Music and Writing | news.ycombinator.com | 2024-03-04
>"Thousands of years ago in India, poets were trying to think about the possible meters. In Sanskrit poetry, you have long and short syllables. Long is twice as long as short. If you want to work out how many there are that take a length of time of three, you can have short, short, short, or long, short, or short, long. There are three ways to make three. There are five ways to make a length-four phrase. And there are eight ways to make a length-five phrase. This sequence you’re getting is one where every term is the sum of the previous two. You exactly reproduce what we nowadays call the Fibonacci sequence. But this was centuries before Fibonacci."
Related:
Ambuda: "Building the world's largest Sanskrit library":
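The counting argument in the quote above can be checked with a small sketch. It assumes a short syllable takes one beat and a long syllable takes two, as the quote describes. The function names `meters` and `count_meters` are made up for illustration and do not come from any of the listed projects.

```python
def meters(total):
    """Return every pattern of short (S, 1 beat) and long (L, 2 beats)
    syllables that fills exactly `total` beats."""
    if total == 0:
        return [""]  # one way to fill zero beats: the empty pattern
    # A pattern starts either with a short syllable...
    patterns = ["S" + rest for rest in meters(total - 1)]
    # ...or, if at least two beats remain, with a long syllable.
    if total >= 2:
        patterns += ["L" + rest for rest in meters(total - 2)]
    return patterns


def count_meters(total):
    """Count the patterns without listing them: each term is the sum
    of the previous two, which is the Fibonacci recurrence."""
    a, b = 1, 1  # counts for 0 beats and 1 beat
    for _ in range(total):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    print(meters(3))
    print([count_meters(n) for n in (3, 4, 5)])
```

For three beats, `meters` lists the same three patterns named in the quote, and `count_meters` gives 3, 5 and 8 for lengths three, four and five.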
Project mention: Are there any websites gathering graded readers in different languages; either making it themselves or simply sharing sources. I’m specifically looking for fiction books in Polish A1/A2. | /r/languagelearning | 2023-04-26
- Big list of resources (for learners) here, maybe you'll find something you like - https://github.com/TheDomcio/WonderfulPolishLanguage
Project mention: Greek New Testament lemmatized with morphosyntactic annotation | /r/AcademicBiblical | 2023-04-19
Linguistics related posts
- Untranslatable
- A colloquial (عامیانه) frequency list! Our prayers have been answered.
- Draw Syntactic Trees!
- Seeking your insights on "Loquax": A tool for phonological analysis
- Does someone have a phonemic inventory of all the romance languages, a list of all the phonemes in all the romance languages ?
- Are there any websites gathering graded readers in different languages; either making it themselves or simply sharing sources. I’m specifically looking for fiction books in Polish A1/A2.
- Design considerations for digitalized ancient texts on the web, what would be ideal to have?
Index
What are some of the best open-source Linguistic projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | tatoeba2 | 663 |
2 | ipa-dict | 493 |
3 | rime-cantonese | 488 |
4 | awesome-linguistics | 353 |
5 | wikipron | 288 |
6 | ichiran | 273 |
7 | prosodic | 267 |
8 | dev | 108 |
9 | TextAnnotationGraphs | 89 |
10 | ambuda | 78 |
11 | odict | 78 |
12 | OpenGNT | 73 |
13 | langstats | 61 |
14 | zeroshot_topics | 60 |
15 | tone | 52 |
16 | treebender | 39 |
17 | langua | 35 |
18 | WonderfulPolishLanguage | 34 |
19 | proiel-treebank | 33 |
20 | syn | 26 |
21 | google-books-ngram-frequency | 25 |
22 | iso639 | 23 |
23 | NaiPosTagger | 14 |