Top 23 ASR Open-Source Projects
- PaddleSpeech: Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
- NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
- vosk-api: Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
- silero-models: Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
- youtube-transcript-api: This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other selenium based solutions do!
- STT: 🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
- whisper-timestamped: Multilingual Automatic Speech Recognition with word-level timestamps and confidence
- athena: an open-source implementation of sequence-to-sequence based speech processing engine (by athena-team)
- whisper.api: This project provides an API with user level access support to transcribe speech to text using a finetuned and processed Whisper ASR model.
- whisper-standalone-win: Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
- AutoSub: A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui (by abhirooptalasila)
PaddlePaddle/PaddleSpeech
Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06
I haven't tested WaveRNN, but of the ones I know, the best open-source one is FastPitch. It's also easy to use; here is the tutorial for voice cloning.
Project mention: Easy video transcription and subtitling with Whisper, FFmpeg, and Python | news.ycombinator.com | 2024-04-06
It uses this, which does support diarization: https://github.com/m-bain/whisperX
Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
Project mention: Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning | news.ycombinator.com | 2023-10-02
I doubt it's currently actually "the best open source text to speech", but the answer I came up with when throwing a couple of hours at the problem some months ago was "Silero" [0, 1].
Following the "standalone" guide [2], it was pretty trivial to make the model render my sample text in about 100 English "voices" (many of which were similar to each other, and in varying quality). Sampling those, I got about 10 that were pretty "good". And maybe 6 that were the "best ones" (pretty natural, not annoying to listen to).
IIRC the license was free for noncommercial use only. I'm not sure exactly "how open source" they are, but it was simple to install the dependencies and write the basic Python to try it out; I had to write a for loop to try all the voices like I wanted. I ended up using something else for the project for other reasons, but this could still be a fairly good backup option for some use cases IMO.
[0] https://github.com/snakers4/silero-models#text-to-speech
wenet-e2e/wenet
https://github.com/MahmoudAshraf97/whisper-diarization
This project has been alright for transcribing audio with speaker diarization. A bit finicky. The OpenAI model is better than other paid products (Descript, Riverside), so I'm looking forward to trying MacWhisper.
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
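To illustrate why word-level timings matter for subtitling, here is a minimal sketch that groups timed words into SRT cues. It does not call whisper-timestamped or any other library listed here; the input shape (a list of dicts with `text`, `start`, and `end` keys), the helper names, and the sample timings are all assumptions made up for illustration.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Render seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group word-level timings into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(chunk[0]["start"])  # cue starts with its first word
        end = format_srt_time(chunk[-1]["end"])     # and ends with its last word
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word timings, invented for this example.
SAMPLE_WORDS = [
    {"text": "Hello", "start": 0.0, "end": 0.42},
    {"text": "world", "start": 0.5, "end": 1.0},
    {"text": "again", "start": 3661.5, "end": 3662.25},
]

if __name__ == "__main__":
    print(words_to_srt(SAMPLE_WORDS, max_words=2))
```

Because each cue takes its start and end from the first and last word it contains, any inaccuracy in the word timings shows up directly as subtitles that appear too early or too late.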
On the other hand, if you need subtitles for a movie that doesn't have any, there are automated solutions like Whisper that can do a very decent job in most cases: https://github.com/Purfview/whisper-standalone-win
These will be 3-5 hour recordings of 4-5 people. I plan to use https://github.com/yinruiqing/pyannote-whisper to generate the transcript from the recording.
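For a multi-speaker recording like this, a transcript usually has to be combined with speaker turns from a diarization step. The sketch below shows one generic way to do that, labelling each transcript segment with the speaker whose turn overlaps it most. It is not taken from pyannote-whisper or any project listed here; the tuple layouts, function names, and sample data are hypothetical.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]     # (start, end, speaker label), assumed layout
Segment = Tuple[float, float, str]  # (start, end, transcribed text), assumed layout


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], unknown: str = "UNKNOWN"
) -> List[Tuple[str, str]]:
    """Label each segment with the speaker whose turn overlaps it the most."""
    labelled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labelled.append((best_speaker, text))
    return labelled


# Hypothetical diarization turns and transcript segments.
SAMPLE_TURNS = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
SAMPLE_SEGMENTS = [
    (0.5, 3.5, "Shall we start?"),
    (3.8, 8.0, "Yes, go ahead."),
    (10.0, 11.0, "Thanks."),
]

if __name__ == "__main__":
    for speaker, text in assign_speakers(SAMPLE_SEGMENTS, SAMPLE_TURNS):
        print(f"{speaker}: {text}")
```

Segments that overlap no turn keep the `unknown` label, which makes gaps in the diarization easy to spot when reviewing a long recording.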
ASR-related posts
- VOSK Offline Speech Recognition API
- Easy video transcription and subtitling with Whisper, FFmpeg, and Python
- SOTA ASR Tooling: Long-Form Transcription
- Deploying whisperX on AWS SageMaker as Asynchronous Endpoint
- Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old
- Do you know any quality FastAPI starter projects?
- Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning
Index
What are some of the best open-source ASR projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | PaddleSpeech | 10,161 |
2 | NeMo | 10,128 |
3 | whisperX | 9,064 |
4 | speechbrain | 7,892 |
5 | vosk-api | 7,057 |
6 | silero-models | 4,569 |
7 | wenet | 3,699 |
8 | lingvo | 2,778 |
9 | youtube-transcript-api | 2,325 |
10 | STT | 2,144 |
11 | whisper-diarization | 2,019 |
12 | whisper-timestamped | 1,513 |
13 | SincNet | 1,097 |
14 | pykaldi | 978 |
15 | athena | 930 |
16 | vosk-server | 843 |
17 | whisper.api | 840 |
18 | whisper-standalone-win | 781 |
19 | vosk-android-demo | 677 |
20 | AutoSub | 556 |
21 | cheetah | 555 |
22 | pyannote-whisper | 421 |
23 | leopard | 408 |