The AI Revolution Is Crushing Thousands of Languages

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • tatoeba2

    Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.

  • Alternate take, it can also help people learn niche languages if native speakers contribute to data sets. For example, I've been using Clozemaster for the past few months as a way to work on vocabulary on some languages, and they pull their dataset from Tatoeba [1]. I was very surprised to see that my father's native language, Kabylie, which is admittedly a somewhat niche language, is one of the top languages by sentence contribution in the dataset (over 700k entries, more than French or Spanish or German). I showed him the sentences once and he confirmed that yes, they all seem like what a native speaker would say. Not all of them have translations into other languages of course, and a lot of them are slight variations on each other, but some native speakers are there contributing. It's not currently an option to use in Clozemaster -- I'm guessing the TTS isn't really there -- but I totally could see these as gaps that are easily filled.

    Same with my wife's native language (Bengali). There are surprisingly few language learning resources for Bangla, even though it's the 7th most spoken language in the world. But there it is in the data set with TTS and the ability for Clozemaster to have ChatGPT "explain" what's going on in the sentence (a very useful feature for new speakers).

    Anyway, I don't view AI as good or bad, just another tool that we should be intentional about when we cultivate the data sets underlying the tool.

    [1] https://tatoeba.org

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • Lobe Chat

    LobeChat is a open-source, extensible (Function Calling), high-performance chatbot framework.It supports one-click free deployment of your private ChatGPT/LLM web application.

  • Get your OpenAI API key and then use it on one of the hundreds of open source frontends available, such as: https://github.com/lobehub/lobe-chat

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts