speech-recognition

Open-source projects categorized as speech-recognition

Top 23 speech-recognition Open-Source Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

  • Project mention: Maxtext: A simple, performant and scalable Jax LLM | news.ycombinator.com | 2024-04-23

    Is t5x an encoder/decoder architecture?

    Some more general options.

    The Flax ecosystem

    https://github.com/google/flax?tab=readme-ov-file

    or dm-haiku

    https://github.com/google-deepmind/dm-haiku

    were some of the best developed communities in the Jax AI field

    Perhaps the “trax” repo? https://github.com/google/trax

    Some HF examples https://github.com/huggingface/transformers/tree/main/exampl...

    Sadly it seems much of the work is proprietary these days, but one example could be Grok-1, if you customize the details. https://github.com/xai-org/grok-1/blob/main/run.py

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • Project mention: Show HN: I created automatic subtitling app to boost short videos | news.ycombinator.com | 2024-04-09

    whisper.cpp [1] has a karaoke example that uses ffmpeg's drawtext filter to display rudimentary karaoke-like captions. It also supports diarisation. Perhaps it could be a starting point to create a better script that does what you need.

    --

    1: https://github.com/ggerganov/whisper.cpp/blob/master/README....

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • DeepSpeech

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

  • Project mention: Common Voice | news.ycombinator.com | 2023-12-05
  • Leon

    🧠 Leon is your open-source personal assistant.

  • Project mention: Rabbit R1, Designed by Teenage Engineering | news.ycombinator.com | 2024-01-09

    It's indeed suspicious. You're sending your voice samples, your various services accounts, your location and more private data to some proprietary black box in some public cloud. Sorry, but this is a privacy nightmare. It should be open source and self-hosted like Mycroft (https://mycroft.ai) or Leon (https://getleon.ai) to be trustworthy.

  • Kaldi Speech Recognition Toolkit

    kaldi-asr/kaldi is the official location of the Kaldi project.

  • Project mention: Amazon plans to charge for Alexa in June–unless internal conflict delays revamp | news.ycombinator.com | 2024-01-20

    Yeah, whisper is the closest thing we have, but even it requires more processing power than is present in most of these edge devices in order to feel smooth. I've started a voice interface project on a Raspberry Pi 4, and it takes about 3 seconds to produce a result. That's impressive, but not fast enough for Alexa.

    From what I gather a Pi 5 can do it in 1.5 seconds, which is closer, so I suspect it's only a matter of time before we do have fully local STT running directly on speakers.

    > Probably anathema to the space, but if the devices leaned into the ~five tasks people use them for (timers, weather, todo list?) could probably tighten up the AI models to be more accurate and/or resource efficient.

    Yes, this is the approach taken by a lot of streaming STT systems, like Kaldi [0]. Rather than use a fully capable model, you train a specialized one that knows what kinds of things people are likely to say to it.

    [0] http://kaldi-asr.org/

  • DeepLearningExamples

    State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

  • deep-learning-drizzle

    Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

  • Project mention: Open Source Libraries | /r/AudioAI | 2023-10-02

    PaddlePaddle/PaddleSpeech

  • whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

  • Project mention: Easy video transcription and subtitling with Whisper, FFmpeg, and Python | news.ycombinator.com | 2024-04-06

    It uses this, which does support diarization: https://github.com/m-bain/whisperX

  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • Project mention: Using Groq to Build a Real-Time Language Translation App | dev.to | 2024-04-05

    For our real-time STT needs, we'll employ a fantastic library called faster-whisper.

  • SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

  • Project mention: help with script (beginner) | /r/learnpython | 2023-12-07

    Start and Stop Listening Example

  • espnet

    End-to-End Speech Processing Toolkit

  • Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17

    You might check out this list from espnet. They list the different corpuses they use to train their models sorted by language and task (ASR, TTS etc):

    https://github.com/espnet/espnet/blob/master/egs2/README.md

  • speechbrain

    A PyTorch-based Speech Toolkit

  • Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
  • vosk-api

    Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

  • Project mention: VOSK Offline Speech Recognition API | news.ycombinator.com | 2024-04-13
  • annyang

    :speech_balloon: Speech recognition for your site

  • wav2letter

    Facebook AI Research's Automatic Speech Recognition Toolkit

  • openvino

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

  • Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05
  • silero-models

    Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

  • Project mention: Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning | news.ycombinator.com | 2023-10-02

    I doubt it's currently actually "the best open source text to speech", but the answer I came up with when throwing a couple of hours at the problem some months ago was "Silero" [0, 1].

    Following the "standalone" guide [2], it was pretty trivial to make the model render my sample text in about 100 English "voices" (many of which were similar to each other, and in varying quality). Sampling those, I got about 10 that were pretty "good". And maybe 6 that were the "best ones" (pretty natural, not annoying to listen to).

    IIRC the license was free for noncommercial use only. I'm not sure exactly "how open source" they are, but it was simple to install the dependencies and write the basic Python to try it out; I had to write a for loop to try all the voices like I wanted. I ended using something else for the project for other reasons, but this could still be fairly good backup option for some use cases IMO.

      [0] https://github.com/snakers4/silero-models#text-to-speech

  • whisper-jax

    JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

  • Project mention: [D] What is the most efficient version of OpenAI Whisper? | /r/MachineLearning | 2023-07-12

    Whisper JAX: https://github.com/sanchit-gandhi/whisper-jax. Builds on the Hugging Face implementation. Written in JAX (instead of PyTorch), where you get a 10x or more speed-up if you run it on TPU v4 hardware (I've gotten up to 15x with large batch sizes for super long audio files). Overall, 70-100x faster than OpenAI if you run it on TPU v4

  • pocketsphinx

    A small speech recognizer

  • Project mention: [Discussion] Looking for an Open-Source Speech to Text model (english) that captures filler words, pauses and also records timestamps for each word. | /r/LocalLLaMA | 2023-07-07
  • wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

  • Project mention: Open Source Libraries | /r/AudioAI | 2023-10-02

    wenet-e2e/wenet

  • Porcupine  

    On-device wake word detection powered by deep learning

  • FunASR

    A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. |语音识别工具包,包含丰富的性能优越的开源预训练模型,支持语音识别、语音端点检测、文本后处理等,具备服务部署能力。

  • Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

speech-recognition related posts

Index

What are some of the best open-source speech-recognition projects? This list will help you:

Project Stars
1 transformers 125,021
2 whisper.cpp 30,942
3 DeepSpeech 24,278
4 Leon 14,539
5 Kaldi Speech Recognition Toolkit 13,706
6 DeepLearningExamples 12,607
7 deep-learning-drizzle 11,749
8 PaddleSpeech 10,120
9 whisperX 8,965
10 faster-whisper 8,723
11 SpeechRecognition 8,040
12 espnet 7,872
13 speechbrain 7,869
14 vosk-api 7,025
15 annyang 6,547
16 wav2letter 6,331
17 openvino 5,911
18 silero-models 4,534
19 whisper-jax 4,077
20 pocketsphinx 3,736
21 wenet 3,691
22 Porcupine   3,424
23 FunASR 3,299

Sponsored
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com