Top 11 voice-activity-detection Open-Source Projects
-
NoiseTorch
Project mention: Ask HN: What are some unpopular technologies you wish people knew more about? | news.ycombinator.com | 2023-12-02
Noisetorch. https://github.com/noisetorch/NoiseTorch
-
ffsubsync
Project mention: The GitHub Black Market That Helps Coders Cheat the Popularity Contest | news.ycombinator.com | 2023-10-23
> Another giveaway is the ratio of stars to watchers / forks. I remember one project with thousands of stars but only 10 users "watching" it. They went on to raise a sizable seed round too.
Not necessarily indicative of foul play. I have two projects like this (https://github.com/smacke/ffsubsync and https://github.com/ipyflow/ipyflow) and I attribute it to not having great developer documentation.
-
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
pyannote/pyannote-audio
-
FunASR
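A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. A speech recognition toolkit with a rich set of high-performing open-source pretrained models; it supports speech recognition, voice endpoint detection, and text post-processing, and can be deployed as a service.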
Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
-
silero-vad
Project mention: New models and developer products announced at OpenAI DevDay | news.ycombinator.com | 2023-11-06
> How do you detect speech starting and stopping?
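One simple way to approach the question above is frame-energy thresholding. The sketch below is not how the projects listed here work (they rely on trained models). The function name `detect_speech`, the threshold value, and the hangover length are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def detect_speech(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.02,      # assumed RMS level separating speech from silence
    hangover_frames: int = 5,     # quiet frames tolerated before closing a segment
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) intervals whose frame RMS is at or above threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too small for this sample rate")

    n_frames = len(samples) // frame_len
    intervals: List[Tuple[float, float]] = []
    start = None        # index of the first loud frame of the open segment
    silent_run = 0      # consecutive quiet frames seen inside the open segment

    def to_sec(frame_index: int) -> float:
        return frame_index * frame_len / sample_rate

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i           # speech starts here
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > hangover_frames:
                # Speech stopped at the first quiet frame of this run.
                intervals.append((to_sec(start), to_sec(i - silent_run + 1)))
                start = None
                silent_run = 0

    if start is not None:
        # Close a segment that is still open when the audio ends.
        intervals.append((to_sec(start), to_sec(n_frames - silent_run)))
    return intervals
```

The hangover keeps short pauses between words from splitting one utterance into many segments. A fixed energy threshold tends to break down in noisy rooms, which is one reason model-based detectors exist.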
-
voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
-
Python-ai-assistant
Project mention: Jarvis: A Voice Virtual Assistant in Python (OpenAI, ElevenLabs, Deepgram) | news.ycombinator.com | 2023-12-18
There is another one (Also Jarvis) that's been around for a while and is more useful, wonder if they can combine forces? https://github.com/ggeop/Python-ai-assistant
Not sure if anyone has noticed but OpenAI now has a mobile app (I've been using the PWA all this time) and the voice assistant on there is really strong. Sounds good, fast, and seems to even run a pass on my voice before it submits the query.
-
inaSpeechSegmenter
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
I have a little hobby project where I record an FM radio music station using an SDR and then remove all the non-music portions for offline listening. I like the music selections the DJs pick, but I prefer not to listen to the DJ commentary and the advertisements.
I evaluated three methods of recording: analog capture from a standalone FM receiver, using this nrsc5 library to record the "HD" radio stream, and using an AirSpy SDR with this library: https://github.com/jj1bdx/airspy-fmradion
Recording the "HD" (what a misnomer) radio was nice in that there was no hiss or multipath effects, but in comparison to the other methods the digital compression artifacts became impossible to un-hear. It seems to top out at about 96 kbps.
The airspy-fmradion library has some nice stuff in it to address multipath, resulting in the best audio quality of the three methods I tested.
I use https://github.com/ina-foss/inaSpeechSegmenter to identify which segments of the recordings are speech vs. music.
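The "keep only the music" step described above can be sketched as a small post-processing function. This is an illustration and not the commenter's actual code. It assumes the segmenter output is available as `(label, start_sec, end_sec)` tuples with a `"music"` label; the function name `music_intervals` and the `max_gap` parameter are hypothetical.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[str, float, float]  # (label, start_sec, end_sec), an assumed shape


def music_intervals(
    segments: Iterable[Segment],
    keep_label: str = "music",   # assumed label name for music segments
    max_gap: float = 0.5,        # bridge gaps up to this many seconds
) -> List[Tuple[float, float]]:
    """Return merged (start_sec, end_sec) intervals labeled as music."""
    merged: List[Tuple[float, float]] = []
    for label, start, end in sorted(segments, key=lambda s: s[1]):
        if label != keep_label:
            continue
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous music interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Made-up segments for illustration only.
    example = [
        ("speech", 0.0, 12.5),
        ("music", 12.5, 200.0),
        ("noise", 200.0, 200.3),
        ("music", 200.3, 410.0),
        ("speech", 410.0, 450.0),
        ("music", 450.0, 600.0),
    ]
    for start, end in music_intervals(example):
        print(f"keep {start:.1f}s to {end:.1f}s")
```

Bridging short gaps avoids cutting a song in two when a brief stretch is mislabeled as noise. The resulting intervals could then be passed to any audio cutter to export the kept ranges.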
-
subaligner
Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/
-
whisper-auto-transcribe
Project mention: Using Whisper to transcribe the entire Forensic Files series | /r/DataHoarder | 2023-06-04
-
android-vad
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
voice-activity-detection related posts
- Audio crackling woes on Pop_OS 22.04
- Steam Deck's fan noises interfere with a built in mic
- Mic problems in game (Apex legends)
- FOSS open source version of adobe enhance - Enhance voice recordings
- Noise cancellation for linux
- Noisetorch becoming glitchy
- PSA, Discord GPU acceleration doesn't work correctly on Linux, here's how to properly enable it
Index
What are some of the best open-source voice-activity-detection projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | NoiseTorch | 8,948 |
2 | ffsubsync | 6,478 |
3 | pyannote-audio | 4,930 |
4 | FunASR | 3,023 |
5 | silero-vad | 2,780 |
6 | voice_datasets | 1,525 |
7 | Python-ai-assistant | 852 |
8 | inaSpeechSegmenter | 692 |
9 | subaligner | 411 |
10 | whisper-auto-transcribe | 192 |
11 | android-vad | 185 |