Open Source Libraries

This page summarizes the projects mentioned and recommended in the original post on /r/AudioAI.

  • TTS

    🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

  • coqui-ai/TTS

  • tortoise-tts

    A multi-voice TTS system trained with an emphasis on quality

  • neonbjb/tortoise-tts

  • bark

    🔊 Text-Prompted Generative Audio Model

  • suno-ai/bark

  • piper

    A fast, local neural text to speech system (by rhasspy)

  • rhasspy/piper

  • Matcha-TTS

    [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

  • shivammehta25/Matcha-TTS

  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

  • openai/whisper

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • ggerganov/whisper.cpp

  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • guillaumekln/faster-whisper

  • wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

  • wenet-e2e/wenet

  • seamless_communication

    Foundational Models for State-of-the-Art Speech and Text Translation

  • facebookresearch/seamless_communication: Speech translation

  • pyannote-audio

    Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

  • pyannote/pyannote-audio

  • PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

  • PaddlePaddle/PaddleSpeech

  • audio-webui

    A webui for different audio related Neural Networks

  • gitmylo/audio-webui

  • audiocraft

    Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

  • facebookresearch/audiocraft/MUSICGEN: Music Generation

  • jukebox

    Code for the paper "Jukebox: A Generative Model for Music"

  • openai/jukebox: Music Generation

  • Retrieval-based-Voice-Conversion-WebUI

    Voice data <= 10 mins can also be used to train a good VC model!

  • RVC-Project/Retrieval-based-Voice-Conversion-WebUI: Singing Voice Conversion

  • fish-diffusion

    An easy to understand TTS / SVS / SVC framework

  • fishaudio/fish-diffusion: Singing Voice Conversion

  • demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

  • facebookresearch/demucs: Stem separation

  • ultimatevocalremovergui

    GUI for a Vocal Remover that uses Deep Neural Networks.

  • Anjok07/UltimateVocalRemoverGUI: Vocal isolation

  • DeepFilterNet

    Noise suppression using deep filtering

  • Rikorose/DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering

  • PiDTLN

    Apply machine learning model DTLN for noise suppression and acoustic echo cancellation on Raspberry Pi

  • SaneBow/PiDTLN: DTLN model for noise suppression and acoustic echo cancellation on Raspberry Pi

  • versatile_audio_super_resolution

    Versatile audio super resolution (any -> 48kHz) with AudioSR.

  • haoheliu/versatile_audio_super_resolution: any -> 48kHz high fidelity Enhancer

  • basic-pitch

    A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

  • spotify/basic-pitch: Audio to midi converter

  • pedalboard

    🎛 🔊 A Python library for audio.

  • spotify/pedalboard: audio effects for Python and TensorFlow

  • librosa

    Python library for audio and music analysis

  • librosa/librosa: Python library for audio and music analysis

