Open Source Libraries

This page summarizes the projects mentioned and recommended in the original post on /r/AudioAI.

  • TTS

    🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

    coqui-ai/TTS

  • tortoise-tts

    A multi-voice TTS system trained with an emphasis on quality

    neonbjb/tortoise-tts

  • bark

    🔊 Text-Prompted Generative Audio Model

    suno-ai/bark

  • piper

    A fast, local neural text to speech system (by rhasspy)

    rhasspy/piper

  • Matcha-TTS

    [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

    shivammehta25/Matcha-TTS

  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    openai/whisper

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    ggerganov/whisper.cpp

  • faster-whisper

    Faster Whisper transcription with CTranslate2

    guillaumekln/faster-whisper

  • wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

    wenet-e2e/wenet

  • seamless_communication

    Foundational Models for State-of-the-Art Speech and Text Translation

    facebookresearch/seamless_communication: Speech translation

  • pyannote-audio

    Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

    pyannote/pyannote-audio

  • PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

    PaddlePaddle/PaddleSpeech

  • audio-webui

    A web UI for different audio-related neural networks

    gitmylo/audio-webui

  • audiocraft

    Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

    facebookresearch/audiocraft/MUSICGEN: Music Generation

  • jukebox

    Code for the paper "Jukebox: A Generative Model for Music"

    openai/jukebox: Music Generation

  • Retrieval-based-Voice-Conversion-WebUI

    Voice data <= 10 mins can also be used to train a good VC model!

    RVC-Project/Retrieval-based-Voice-Conversion-WebUI: Singing Voice Conversion

  • fish-diffusion

    An easy-to-understand TTS / SVS / SVC framework

    fishaudio/fish-diffusion: Singing Voice Conversion

  • demucs

    Code for the paper "Hybrid Spectrogram and Waveform Source Separation"

    facebookresearch/demucs: Stem separation

  • ultimatevocalremovergui

    GUI for a Vocal Remover that uses Deep Neural Networks.

    Anjok07/UltimateVocalRemoverGUI: Vocal isolation

  • DeepFilterNet

    Noise suppression using deep filtering

    Rikorose/DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering

  • PiDTLN

    Apply machine learning model DTLN for noise suppression and acoustic echo cancellation on Raspberry Pi

    SaneBow/PiDTLN: DTLN model for noise suppression and acoustic echo cancellation on Raspberry Pi

  • versatile_audio_super_resolution

    Versatile audio super resolution (any -> 48kHz) with AudioSR.

    haoheliu/versatile_audio_super_resolution: any -> 48kHz high fidelity Enhancer

  • basic-pitch

    A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

    spotify/basic-pitch: Audio-to-MIDI converter

  • pedalboard

    🎛 🔊 A Python library for working with audio.

    spotify/pedalboard: Audio effects for Python and TensorFlow

  • librosa

    Python library for audio and music analysis

    librosa/librosa: Python library for audio and music analysis

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.