Top 23 Linguistic Open-Source Projects
-
tatoeba2
Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
-
prosodic
Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
-
TextAnnotationGraphs
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
-
langstats
A visual color bar of the programming languages in your directory, with percentages and labels
-
WonderfulPolishLanguage
This is a repository created for the list of resources for learning and exploring Wonderful Polish language.
-
syn
🌾 Get synonyms and antonyms of words from Thesaurus.com and other sources in your terminal, with rich output. (by agmmnn)
-
google-books-ngram-frequency
Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
-
top-open-subtitles-sentences
Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code
Project mention: The AI Revolution Is Crushing Thousands of Languages | news.ycombinator.com | 2024-04-25
Alternate take, it can also help people learn niche languages if native speakers contribute to data sets. For example, I've been using Clozemaster for the past few months as a way to work on vocabulary on some languages, and they pull their dataset from Tatoeba [1]. I was very surprised to see that my father's native language, Kabylie, which is admittedly a somewhat niche language, is one of the top languages by sentence contribution in the dataset (over 700k entries, more than French or Spanish or German). I showed him the sentences once and he confirmed that yes, they all seem like what a native speaker would say. Not all of them have translations into other languages of course, and a lot of them are slight variations on each other, but some native speakers are there contributing. It's not currently an option to use in Clozemaster -- I'm guessing the TTS isn't really there -- but I totally could see these as gaps that are easily filled.
Same with my wife's native language (Bengali). There are surprisingly few language learning resources for Bangla, even though it's the 7th most spoken language in the world. But there it is in the data set with TTS and the ability for Clozemaster to have ChatGPT "explain" what's going on in the sentence (a very useful feature for new speakers).
Anyway, I don't view AI as good or bad, just another tool that we should be intentional about when we cultivate the data sets underlying the tool.
[1] https://tatoeba.org
Project mention: How to type Jyutcitzi? 【RIME keyboard installation manual】? | /r/CantoneseScriptReform | 2023-12-07
Please follow instructions at https://github.com/rime/rime-cantonese/wiki and https://github.com/rime/rime-cantonese/wiki/新手安裝教程
In a nutshell, download and install using the following files:
Mac: mac-2021.05.16-installer.pkg
Windows: windows-sfx-2021.05.16-installer.exe
Linux: Download and run ibus-install.sh
Please check to ensure that RIME Cantonese is properly installed before proceeding to Step 3.
Very interesting they were funded from Kickstarter! 292 backers at 10k€. I assumed you needed quite the following for Kickstarter to work...
And it looks like they do. 49k followers on Facebook and 16k on Instagram. Not sure how far back these go, but it looks like very "shareable" content, where they would take untranslatable words and make little funny pictures or memes or other intriguing things and post them. Lots of interaction, comment- and reaction-wise.
Timeline-wise this was backed on Kickstarter in 2020. Site launched in summer 2020. The creator was very active on Kickstarter working on communicating and updating the community with what was going on (until the end there).
Also seems to have a Patreon, and worked itself into other places like https://github.com/theimpossibleastronaut/awesome-linguistic...
Project mention: Does someone have a phonemic inventory of all the romance languages, a list of all the phonemes in all the romance languages ? | /r/linguistics | 2023-05-11
Does the language you’re thinking of have an inventory on https://phoible.org/?
Project mention: The Theorist Who Sees Math in Art, Music and Writing | news.ycombinator.com | 2024-03-04
> "Thousands of years ago in India, poets were trying to think about the possible meters. In Sanskrit poetry, you have long and short syllables. Long is twice as long as short. If you want to work out how many there are that take a length of time of three, you can have short, short, short, or long, short, or short, long. There are three ways to make three. There are five ways to make a length-four phrase. And there are eight ways to make a length-five phrase. This sequence you’re getting is one where every term is the sum of the previous two. You exactly reproduce what we nowadays call the Fibonacci sequence. But this was centuries before Fibonacci."
Related:
Ambuda: "Building the world's largest Sanskrit library":
https://ambuda.org/
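The counting argument in the quoted passage can be checked with a few lines of Python. This is only an illustrative sketch, not part of any project listed here: the function names `count_meters` and `list_meters` are made up for this example, and it assumes a short syllable takes one unit of time and a long syllable takes two, as the quote describes.

```python
def count_meters(total: int) -> int:
    """Count sequences of short (1 unit) and long (2 units) syllables
    whose durations add up to `total` units."""
    # ways[n] holds the number of sequences of total length n.
    # ways[0] = 1 (the empty sequence), ways[1] = 1 (a single short).
    ways = [1, 1]
    for n in range(2, total + 1):
        # A sequence of length n starts with a short (leaving n - 1)
        # or with a long (leaving n - 2).
        ways.append(ways[n - 1] + ways[n - 2])
    return ways[total]


def list_meters(total: int) -> list[str]:
    """Enumerate the sequences themselves, using S for short and L for long."""
    if total == 0:
        return [""]
    result = ["S" + rest for rest in list_meters(total - 1)]
    if total >= 2:
        result += ["L" + rest for rest in list_meters(total - 2)]
    return result


if __name__ == "__main__":
    for length in (3, 4, 5):
        print(length, count_meters(length), list_meters(length))
```

For lengths three, four, and five this gives counts of 3, 5, and 8, matching the numbers in the quote, because each count is the sum of the two counts before it.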
Project mention: A colloquial (عامیانه) frequency list! Our prayers have been answered. | /r/farsi | 2023-08-03
Linguistics related posts
-
The AI Revolution Is Crushing Thousands of Languages
-
Untranslatable
-
A colloquial (عامیانه) frequency list! Our prayers have been answered.
-
Draw Syntactic Trees!
-
Seeking your insights on "Loquax": A tool for phonological analysis
-
Does someone have a phonemic inventory of all the romance languages, a list of all the phonemes in all the romance languages ?
-
Are there any websites gathering graded readers in different languages; either making it themselves or simply sharing sources. I’m specifically looking for fiction books in Polish A1/A2.
-
Index
What are some of the best open-source Linguistic projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | tatoeba2 | 668 |
2 | ipa-dict | 498 |
3 | rime-cantonese | 494 |
4 | awesome-linguistics | 352 |
5 | wikipron | 289 |
6 | ichiran | 278 |
7 | prosodic | 268 |
8 | dev | 108 |
9 | TextAnnotationGraphs | 89 |
10 | odict | 80 |
11 | OpenGNT | 79 |
12 | ambuda | 79 |
13 | langstats | 61 |
14 | zeroshot_topics | 60 |
15 | tone | 52 |
16 | treebender | 39 |
17 | langua | 35 |
18 | WonderfulPolishLanguage | 34 |
19 | proiel-treebank | 33 |
20 | iso639 | 27 |
21 | syn | 26 |
22 | google-books-ngram-frequency | 28 |
23 | top-open-subtitles-sentences | 17 |