tatoeba2 VS gutensearch

Compare tatoeba2 and gutensearch to see how they differ.

tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations. (by Tatoeba)

gutensearch

Search engine for Project Gutenberg books (by cordb)
               tatoeba2                                 gutensearch
Mentions       47                                       1
Stars          667                                      6
Growth         2.5%                                     -
Activity       0.0                                      0.0
Last commit    4 days ago                               about 3 years ago
Language       PHP                                      Python
License        GNU Affero General Public License v3.0   -
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
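
As a rough illustration of the growth figure described above, month-over-month growth can be read as the change in stars relative to the previous month's count. The sketch below assumes that reading; the function name and the prior-month star count of 651 are hypothetical and not taken from this page.

```python
def month_over_month_growth(stars_now: int, stars_last_month: int) -> float:
    """Return the change in stars as a percentage of last month's count.

    Assumption: "growth" means (now - previous) / previous. The page does
    not spell out the exact formula it uses.
    """
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


# Hypothetical example: 651 stars a month ago, 667 stars today.
growth = month_over_month_growth(667, 651)
print(f"{growth:.1f}%")
```

A project with no stars in the previous month has no defined growth under this reading, which may be why a "-" appears for some projects.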

tatoeba2

Posts with mentions or reviews of tatoeba2. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-25.
  • The AI Revolution Is Crushing Thousands of Languages
    2 projects | news.ycombinator.com | 25 Apr 2024
    Alternate take, it can also help people learn niche languages if native speakers contribute to data sets. For example, I've been using Clozemaster for the past few months as a way to work on vocabulary on some languages, and they pull their dataset from Tatoeba [1]. I was very surprised to see that my father's native language, Kabylie, which is admittedly a somewhat niche language, is one of the top languages by sentence contribution in the dataset (over 700k entries, more than French or Spanish or German). I showed him the sentences once and he confirmed that yes, they all seem like what a native speaker would say. Not all of them have translations into other languages of course, and a lot of them are slight variations on each other, but some native speakers are there contributing. It's not currently an option to use in Clozemaster -- I'm guessing the TTS isn't really there -- but I totally could see these as gaps that are easily filled.

    Same with my wife's native language (Bengali). There are surprisingly few language learning resources for Bangla, even though it's the 7th most spoken language in the world. But there it is in the data set with TTS and the ability for Clozemaster to have ChatGPT "explain" what's going on in the sentence (a very useful feature for new speakers).

    Anyway, I don't view AI as good or bad, just another tool that we should be intentional about when we cultivate the data sets underlying the tool.

    [1] https://tatoeba.org

  • Where can I find reliable example sentences?
    1 project | /r/LearnJapanese | 27 May 2023
    Maybe on tatoeba.org with filters
  • Best vocab (not writing) app
    2 projects | /r/learnchinese | 10 May 2023
    I use both. I make a lot of my own cards so I get to focus on the vocab I want. Generally I find a word I want to learn, use https://forvo.com/ to find native audio for it, then use https://tatoeba.org/ to find sentences that use that word. Once you get a bit of practise it's pretty quick to make a word note, then make 2 or 3 sentence notes for it. However, I do use some pre-made decks, like this set of sentence decks for each HSK level with native audio: https://ankiweb.net/shared/byauthor/933449107
  • Anyone else spend heaps of time searching for sentences for Anki?
    1 project | /r/languagelearning | 21 Mar 2023
    You can try Tatoeba https://tatoeba.org but I don't know if it's good with Arabic ...
  • GPT-4's toki pona capabilities
    1 project | /r/tokipona | 20 Mar 2023
    tatoeba if anything because that has sentences so at least a modicum of context
  • Is there an app or website where I can paste a word/phrase and get examples of how it’s used in a sentence?
    1 project | /r/languagelearning | 23 Feb 2023
    I use Tatoeba https://tatoeba.org : it's a collection of phrases with sometimes translations and audio recordings. You could use Forvo but it's only audio recordings.
  • Ways to say "no pasa nada / it's okay/all right" in BR-PT?
    1 project | /r/Portuguese | 21 Jan 2023
  • How do I get audio data from native speakers for Anki?
    2 projects | /r/languagelearning | 16 Jan 2023
  • anyone know a site like Reverso but for simpler sentences?
    2 projects | /r/languagelearning | 10 Jan 2023
    As someone else suggested, Tatoeba is also a good option. Nowadays, I use it less and less because I prefer the more didactic sentences found on online dictionaries. Nonetheless, it's still very good, especially due to the sheer quantity of sentences you can find there.
  • Nihongo Lessons has launched on the App Store
    1 project | /r/nihongoapp | 1 Dec 2022
    Appearances in the Tatoeba example sentence database.

gutensearch

Posts with mentions or reviews of gutensearch. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-01-24.
  • Show HN: Full text search Project Gutenberg (60m paragraphs)
    5 projects | news.ycombinator.com | 24 Jan 2021
    Thanks! I had the exact same problem and eventually it got me to do something about it. It is particularly bad with writers from antiquity or with a lot of popular appeal.

    I've begun adding to this repository, it'll come in piece by piece as I clean up the code: https://github.com/cordb/gutensearch

What are some alternatives?

When comparing tatoeba2 and gutensearch you can also consider the following projects:

river-runner - Uses USGS/MERIT Basin data to visualize the path of a rain droplet to its endpoint.

recoll - recoll with webui in a docker container

FrequencyWords - Repository for Frequency Word List Generator and processed files

react-virtualized - React components for efficiently rendering large lists and tabular data

rum - Simple, decomplected, isomorphic HTML UI library for Clojure and ClojureScript