Top 19 speech-processing Open-Source Projects

speechbrain

26 7,914 9.8 Python

A PyTorch-based Speech Toolkit

Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28

pyannote-audio

15 5,077 8.6 Jupyter Notebook

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Project mention: Open Source Libraries | /r/AudioAI | 2023-10-02


torchscale

2 2,927 7.2 Python

Foundation Architecture for (M)LLMs

Project mention: Retentive Network: A Successor to Transformer Implemented in PyTorch | news.ycombinator.com | 2023-07-24

A retnet commit has now appeared in Microsoft's torchscale repo:
https://github.com/microsoft/torchscale/commit/bf65397b26469...

whisper-timestamped

2 1,513 8.1 Python

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped

SincNet

3 1,097 0.0 Python

SincNet is a neural architecture for efficiently processing raw audio samples.
resemble-enhance

3 921 6.3 Python

AI powered speech denoising and enhancement

Project mention: Ask HN: Who is hiring? (February 2024) | news.ycombinator.com | 2024-02-01

Resemble AI | San Francisco Bay Area (office in Santa Clara, CA) | Full-Time | Full-Stack Engineer, Frontend Engineer, Product Designer
Resemble AI creates high-quality synthetic voices that capture human emotion. We're a venture-backed high-growth startup that's looking to shake up an entire industry with state of the art AI. Our product changes the way that thousands of brands, media companies, creative agencies, and game studios create speech content. We believe that the way to build an enticing product and a solid team is by encouraging innovation and enabling continuous education. That's why every Friday is a day that you can use to work on anything you want, Resemble-related or not.
Recently, we open sourced a state of the art speech enhancement model: https://github.com/resemble-ai/resemble-enhance
We're hiring for three roles:
Full Stack Engineer - Can you break the entire stack? You're the right person for this job. Work on our Rails app, with sprinkles of React, and Python for the deep learning. Everything is dockerized, and we use Kubernetes to deploy.
Frontend Engineer - We're hiring a Frontend Engineer proficient in React, TypeScript, and Ruby on Rails to shape our user experience. Join our team to develop user-friendly interfaces and collaborate on building exceptional web experiences.
Product Designer - As a Product Designer, you will lead the end-to-end design process, from concept to implementation, ensuring a seamless and delightful user experience. You will collaborate with cross-functional teams to define product vision, conduct user research, create visually compelling interfaces, and develop interactive prototypes.
If interested, reach out directly to me: zohaib [at] resemble.ai

voicefixer

2 913 5.4 Python

General Speech Restoration

Project mention: Linux Audio Noise suppression using deep filtering in Rust | news.ycombinator.com | 2023-06-06

Speech-Backbones

1 526 0.0 Jupyter Notebook

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
UniSpeech

1 389 4.5 Python

UniSpeech - Large Scale Self-Supervised Learning for Speech
Wave-U-Net-for-Speech-Enhancement

1 302 0.0 Python

A PyTorch implementation of Wave-U-Net, adapted for speech enhancement.
SPTK

1 210 8.0 C++

A suite of speech signal processing tools
whisper-auto-transcribe

8 195 6.1 Python

Auto transcribe tool based on whisper

Project mention: Using Whisper to transcribe the entire Forensic Files series | /r/DataHoarder | 2023-06-04

hifigan-denoiser

1 188 0.0 Python

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
NLP-Guide

2 66 3.5 Python

Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.
DiscordEarsBot

5 64 4.1 JavaScript

A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. Can also be useful for hearing impaired and deaf people. (by inevolin)

Project mention: Creating a live transcript bot using Vosk Ai | /r/Discordjs | 2023-05-27

I have been looking everywhere and having a lot of difficulty finding a solution, so sorry if I am coming to the wrong place. I am trying to create a Discord bot that can transcribe conversations live. I chose Vosk because it's an offline tool, but I am unsure how to implement it in a live setting. I've seen it done in Python and disc.js, but I dunno... so to cover all bases, here is what I have so far.

speech-emotion-recognition

1 17 1.8 Python

A program that uses neural networks to detect emotions from pre-recorded and real-time speech
speech-rate-meter

1 17 3.2 QML

The Speech Rate Meter (hereinafter SRM) software module is designed to measure a set of characteristics of the tempo (rate) of oral speech.
speech-kit

1 6 0.0 JavaScript

Simplifying the Speech Synthesis and Speech Recognition engines for JavaScript. Listen for commands and perform callback actions, make the browser speak and transcribe your speech!
awesome-self-supervised-speech-representation-learning

1 4 0.0

A comprehensive list of awesome self-supervised speech representation learning papers.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

speech-processing related posts

AI Transcribing tool for video with two voices?

1 project | /r/ChatGPT | 22 Jun 2023
Recording or transcripts? How are you doing it?

1 project | /r/VTT | 30 Mar 2023
Show HN: PodText.ai – Search anything said on a podcast, Highlight text to play

4 projects | news.ycombinator.com | 9 Feb 2023
I wanted to use OpenAI's Whisper speech-to-text on my Mac without installing stuff in the Terminal so I made MacWhisper, a free Mac app to transcribe audio and video files for easy transcription and subtitle generation. Would love to hear some feedback on it!

2 projects | /r/apple | 1 Feb 2023
I won several speaker diarization challenges with pyannote.audio

1 project | news.ycombinator.com | 2 Dec 2022
Can Whisper differentiate between different voices?

1 project | /r/OpenAI | 16 Nov 2022
[D] Is there a way to distinguish different human voices from 1 audio file?

2 projects | /r/MachineLearning | 3 Oct 2022

Index

What are some of the best open-source speech-processing projects? This list will help you:

#	Project	Stars
1	speechbrain	7,914
2	pyannote-audio	5,077
3	torchscale	2,927
4	whisper-timestamped	1,513
5	SincNet	1,097
6	resemble-enhance	921
7	voicefixer	913
8	Speech-Backbones	526
9	UniSpeech	389
10	Wave-U-Net-for-Speech-Enhancement	302
11	SPTK	210
12	whisper-auto-transcribe	195
13	hifigan-denoiser	188
14	NLP-Guide	66
15	DiscordEarsBot	64
16	speech-emotion-recognition	17
17	speech-rate-meter	17
18	speech-kit	6
19	awesome-self-supervised-speech-representation-learning	4