Top 8 Python voice-activity-detection Projects
-
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12
Apparently not. See https://github.com/lifeiteng/OmniSenseVoice/blob/main/src/om.... See also FunASR running SenseVoice but using Kaldi for speaker identification https://github.com/modelscope/FunASR/blob/cd684580991661b9a0...
-
Project mention: Ten years after the last release, Aegisub 3.4.0 released | news.ycombinator.com | 2024-12-21
Aegis is great for authoring new subtitles but if you're just looking to sync then take a look at https://github.com/smacke/ffsubsync
Plex also recently added auto-sync subtitles to the Plex Pass
https://support.plex.tv/articles/auto-sync-subtitles/
-
Project mention: AI Voice Agents: Opensource, Pre-Trained Voice Activity Detector | news.ycombinator.com | 2024-07-28
-
Project mention: Ask HN: What is the current state of the art for transcribing with diarization? | news.ycombinator.com | 2024-10-18
Why do you think the space is stalled? There are quite a few apps in that space. https://github.com/juanmc2005/diart
-
inaSpeechSegmenter
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
-
subaligner
Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/
-
I've used https://github.com/tomchang25/whisper-auto-transcribe to generate subtitles and then translate them to English and it worked fairly well. It's not professional-level, but it was good enough to understand what they were saying and enjoy foreign TV.
Index
What are some of the best open-source voice-activity-detection projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | FunASR | 11,452 |
| 2 | ffsubsync | 7,239 |
| 3 | silero-vad | 6,295 |
| 4 | diart | 1,361 |
| 5 | Python-ai-assistant | 984 |
| 6 | inaSpeechSegmenter | 818 |
| 7 | subaligner | 479 |
| 8 | whisper-auto-transcribe | 225 |