Python Speech

Open-source Python projects categorized as Speech

Top 23 Python Speech Projects

  • MockingBird

    🚀 AI voice imitation: clone a voice in 5 seconds to generate arbitrary speech in real time

  • TTS

    🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

    Project mention: OpenAI deems its voice cloning tool too risky for general release | news.ycombinator.com | 2024-03-31

    lol this marketing technique is getting very old. https://github.com/coqui-ai/TTS is already amazing and open source.

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19

  • whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: Easy video transcription and subtitling with Whisper, FFmpeg, and Python | news.ycombinator.com | 2024-04-06

    It uses this, which does support diarization: https://github.com/m-bain/whisperX

  • EmotiVoice

    EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

    Project mention: FLaNK Stack Weekly 12 February 2024 | dev.to | 2024-02-12

  • modelscope

    ModelScope: bring the notion of Model-as-a-Service to life.

    Project mention: FLaNK Stack Weekly for 20 June 2023 | dev.to | 2023-06-20

    Model as a Service https://github.com/modelscope/modelscope

  • lingvo

    Lingvo

  • aeneas

    aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

  • gTTS

    Python library and CLI tool to interface with Google Translate's text-to-speech API

    Project mention: Using Groq to Build a Real-Time Language Translation App | dev.to | 2024-04-05

    For our real-time TTS needs, we'll employ the fantastic library called gTTS.

  • DeepFilterNet

    Noise suppression using deep filtering

    Project mention: Anyone know of a good TTS pipeline for raw speech data? | /r/AudioAI | 2023-10-03

    You mean remove background noise and transcribe? Then you can use DeepFilterNet to remove noise, and Whisper to transcribe.

  • whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

    Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]

    [0] https://github.com/linto-ai/whisper-timestamped

  • dc_tts

    A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

  • pykaldi

    A Python wrapper for Kaldi

  • NATSpeech

    A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

  • voicefixer

    General Speech Restoration

    Project mention: Linux Audio Noise suppression using deep filtering in Rust | news.ycombinator.com | 2023-06-06

  • lhotse

    Tools for handling speech data in machine learning projects.

    Project mention: Does anyone else find lhotse a pain to use | /r/speechtech | 2023-06-14

  • SALMONN

    SALMONN: Speech Audio Language Music Open Neural Network

    Project mention: Comparing Humans, GPT-4, and GPT-4V on Abstraction and Reasoning Tasks | news.ycombinator.com | 2023-11-19

    > In other words, if you express a problem in a more complicated space (e.g. a visual problem, or an abstract algebra problem), you will not be able to solve it in the smaller token space, there's not enough information

    You're aware multimodal transformers do exactly this?

    https://github.com/bytedance/SALMONN

  • diffwave

    DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

  • inaSpeechSegmenter

    CNN-based audio segmentation toolkit. It detects speech, music, noise, and speaker gender, and was designed for large-scale gender equality studies based on speech time per gender.

    Project mention: Listen to HD radio with a $30 RTL SDR dongle | news.ycombinator.com | 2023-11-05

    I have a little hobby project where I record an FM radio music station using an SDR and then remove all the non-music portions for offline listening. I like the music selections the DJs pick, but I prefer not to listen to the DJ commentary and the advertisements.

    I evaluated three methods of recording: analog capture from a standalone FM receiver, using this nrsc5 library to record the "HD" radio stream, and using an AirSpy SDR with this library: https://github.com/jj1bdx/airspy-fmradion

    Recording the "HD" (what a misnomer) radio was nice in that there was no hiss or multipath effects, but in comparison to the other methods the digital compression artifacts became impossible to un-hear. It seems to top out at about 96 kbps.

    The airspy-fmradion library has some nice stuff in it to address multipath, resulting in the best audio quality of the three methods I tested.

    I use https://github.com/ina-foss/inaSpeechSegmenter to identify which segments of the recordings are speech vs. music.

  • Speech-enhancement

    Deep learning for audio denoising

  • allosaurus

    Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

  • StarGANv2-VC

    StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

  • UniSpeech

    UniSpeech - Large Scale Self-Supervised Learning for Speech

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-06.

What are some of the best open-source Speech projects in Python? This list will help you:

Rank  Project               Stars
1     MockingBird          33,736
2     TTS                  28,959
3     datasets             18,345
4     whisperX              8,869
5     EmotiVoice            6,234
6     modelscope            5,984
7     lingvo                2,781
8     aeneas                2,379
9     gTTS                  2,133
10    DeepFilterNet         1,886
11    whisper-timestamped   1,481
12    dc_tts                1,150
13    pykaldi                 977
14    NATSpeech               944
15    voicefixer              896
16    lhotse                  861
17    SALMONN                 786
18    diffwave                720
19    inaSpeechSegmenter      692
20    Speech-enhancement      583
21    allosaurus              502
22    StarGANv2-VC            454
23    UniSpeech               387
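The whisperX entry above notes that it supports diarization. One common way to combine a diarizer's speaker turns with word-level timestamps is to label each word with the speaker whose turn overlaps it the most. The sketch below shows that idea in plain Python. The `Word` and `Turn` types and the `SPEAKER_00` label format are hypothetical stand-ins, not whisperX's data structures or API, and whisperX's own assignment logic may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical timed word (not a whisperX type)."""
    text: str
    start: float  # seconds
    end: float  # seconds


@dataclass
class Turn:
    """Hypothetical diarization turn (not a whisperX type)."""
    speaker: str  # e.g. "SPEAKER_00" (assumed label format)
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[Word], turns: list[Turn]) -> list[tuple[str, str | None]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all are labelled None.
    """
    labelled = []
    for w in words:
        best_speaker, best_overlap = None, 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_speaker, best_overlap = t.speaker, o
        labelled.append((w.text, best_speaker))
    return labelled


if __name__ == "__main__":
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
    turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.1, 2.0)]
    print(assign_speakers(words, turns))
    # [('hello', 'SPEAKER_00'), ('there', 'SPEAKER_00'), ('hi', 'SPEAKER_01')]
```

Maximum overlap is a simple rule. Words that straddle a speaker change go to whichever turn covers more of them.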
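The aeneas entry above describes forced alignment of audio and text. Our understanding is that aeneas synthesizes the text with a TTS engine and aligns that rendering to the real audio using dynamic time warping (DTW) over audio features. Treat that as an assumption and check the aeneas documentation. The sketch below is a generic DTW over toy 1-D sequences, not aeneas code. Real alignment would compare per-frame feature vectors such as MFCCs.

```python
from __future__ import annotations

import math


def dtw(x: list[float], y: list[float]) -> tuple[float, list[tuple[int, int]]]:
    """Generic dynamic time warping between two 1-D feature sequences.

    Returns the total alignment cost and the warping path as (i, j) pairs,
    meaning frame i of `x` is aligned with frame j of `y`.
    Illustrative sketch only; this is not aeneas's implementation.
    """
    n, m = len(x), len(y)
    # cost[i][j] = cheapest cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local distance between two frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
                cost[i - 1][j - 1],  # advance in both
            )
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Toy "features": the second sequence is a time-stretched copy of the first.
    total, path = dtw([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])
    print(total)  # 0.0
    print(path)   # [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (3, 5)]
```

Each (i, j) pair maps a frame of one sequence to a frame of the other. In forced alignment, those frame indices are what get converted back into per-fragment start and end times.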
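The DeepFilterNet entry above names deep filtering. As we understand the approach, a plain mask multiplies each STFT time-frequency bin by a single gain. Deep filtering instead uses a network to predict a short complex-valued filter per bin, which combines the current frame with neighbouring frames. Roughly, Y(k, f) = Σ_i C(k, i, f) · X(k - i + l, f), with N taps and l frames of lookahead. The sketch below only applies given coefficients to one frequency bin and omits the coefficient-predicting network. The exact indexing convention is an assumption, not DeepFilterNet's code.

```python
from __future__ import annotations


def deep_filter_bin(
    spec: list[complex],
    coefs: list[list[complex]],
    lookahead: int = 0,
) -> list[complex]:
    """Apply a multi-frame complex filter along time for ONE frequency bin.

    spec[k]     -- STFT value X(k, f) at frame k for a fixed bin f
    coefs[k][i] -- filter tap C(k, i, f); the order N is len(coefs[k])
    lookahead   -- l, how many future frames the filter may read

    Computes Y(k, f) = sum_i C(k, i, f) * X(k - i + l, f), treating frames
    outside the signal as zero. (Indexing convention is an assumption.)
    """
    n_frames = len(spec)
    out = []
    for k in range(n_frames):
        acc = 0j
        for i, c in enumerate(coefs[k]):
            t = k - i + lookahead  # frame this tap reads from
            if 0 <= t < n_frames:
                acc += c * spec[t]
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [1 + 1j, 2 + 0j, 0 + 3j]
    # A single unit tap with no lookahead leaves the bin unchanged.
    identity = [[1 + 0j] for _ in x]
    print(deep_filter_bin(x, identity) == x)  # True
    # Two equal taps average each frame with the previous one.
    avg = [[0.5, 0.5] for _ in x]
    print(deep_filter_bin(x, avg))  # [(0.5+0.5j), (1.5+0.5j), (1+1.5j)]
```

With one tap and no lookahead, the filter collapses to an ordinary complex mask. The extra taps let the filter draw on neighbouring frames, which is the property that distinguishes deep filtering from a plain mask.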
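The whisper-timestamped entry above, and the subtitling post under whisperX, both center on word-level timestamps. A typical next step is grouping timed words into SRT subtitle cues. The sketch below assumes each word is a dict with `"text"`, `"start"`, and `"end"` keys in seconds. We believe whisper-timestamped's JSON output uses similar per-word fields, but that shape is an assumption. The grouping rule, a fixed maximum number of words per cue, is just one simple choice.

```python
from __future__ import annotations


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most `max_words` words.

    Each word is assumed to be a dict with "text", "start" and "end" (seconds).
    """
    cues = []
    for n, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        start, end = srt_time(chunk[0]["start"]), srt_time(chunk[-1]["end"])
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{n}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    words = [
        {"text": "Hello", "start": 0.5, "end": 1.2},
        {"text": "world", "start": 1.3, "end": 1.9},
        {"text": "again", "start": 2.4, "end": 3.0},
    ]
    print(words_to_srt(words, max_words=2))
```

Splitting on pauses or punctuation instead of a fixed word count would usually read better, at the cost of a slightly more involved grouping loop.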
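The inaSpeechSegmenter entry above describes two uses: measuring speech time per gender, and, in the radio-recording comment, keeping only the music. Both reduce to interval bookkeeping over the segmenter's output. The sketch assumes that output is a list of `(label, start_seconds, stop_seconds)` tuples with labels such as `"music"`, `"male"`, `"female"`, and `"noEnergy"`. That is our understanding of inaSpeechSegmenter's result format, so check its README before relying on it. The example segmentation is made up.

```python
from __future__ import annotations

from collections import defaultdict


def time_per_label(segments: list[tuple[str, float, float]]) -> dict[str, float]:
    """Total duration in seconds for each label (e.g. speech time per gender)."""
    totals: dict[str, float] = defaultdict(float)
    for label, start, stop in segments:
        totals[label] += stop - start
    return dict(totals)


def keep_intervals(
    segments: list[tuple[str, float, float]],
    keep: tuple[str, ...] = ("music",),
    max_gap: float = 0.0,
) -> list[tuple[float, float]]:
    """Merge segments whose label is in `keep` into (start, stop) intervals.

    Kept segments separated by at most `max_gap` seconds are joined, so short
    dropouts (e.g. a brief silence between songs) do not split an interval.
    """
    intervals: list[tuple[float, float]] = []
    for label, start, stop in sorted(segments, key=lambda seg: seg[1]):
        if label not in keep:
            continue
        if intervals and start - intervals[-1][1] <= max_gap:
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], stop))
        else:
            intervals.append((start, stop))
    return intervals


if __name__ == "__main__":
    # Made-up segmentation of a radio recording (assumed tuple format).
    seg = [
        ("music", 0.0, 180.0),
        ("male", 180.0, 195.5),
        ("noEnergy", 195.5, 196.0),
        ("music", 196.0, 400.0),
        ("female", 400.0, 410.0),
    ]
    print(time_per_label(seg))
    # {'music': 384.0, 'male': 15.5, 'noEnergy': 0.5, 'female': 10.0}
    print(keep_intervals(seg))                # [(0.0, 180.0), (196.0, 400.0)]
    print(keep_intervals(seg, max_gap=20.0))  # [(0.0, 400.0)]
```

The resulting intervals could then be passed to an audio tool to cut the recording. Setting `max_gap` trades a little stray speech for fewer breaks in the music.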