audio-generation

Top 12 audio-generation Open-Source Projects

  • LocalAI

    🤖 The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many more model architectures. It can generate text, audio, video, and images, and also offers voice cloning.

    Project mention: Drop-In Replacement for ChatGPT API | news.ycombinator.com | 2024-01-24
  • Amphion

    Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

    Project mention: FLaNK Stack Weekly 11 Dec 2023 | dev.to | 2023-12-11
  • AudioLDM

    AudioLDM: Generate speech, sound effects, music and beyond, with text.

    Project mention: Want to know if there's an ai for text (prompt) to sound effects like stable diffusion | /r/StableDiffusion | 2023-05-19
  • audio-ai-timeline

    A timeline of the latest AI models for audio generation, starting in 2023!

  • audio-diffusion-pytorch

    Audio generation using diffusion models, in PyTorch.

  • tts-generation-webui

    TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)

    Project mention: [D] Open-source SOTA Audio-to-Audio: how do I sound like a famous actor? | /r/MachineLearning | 2023-10-27

    I'd use the TTS web UI with RVC. I'll link to the UI. If you are looking for an individual project, you should check in the Readme for RVC. https://github.com/rsxdalv/tts-generation-webui

  • soundstorm-pytorch

    Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

    Project mention: Meta introduces Voicebox: state-of-the-art generative AI model for speech | news.ycombinator.com | 2023-06-19

    got a response here https://github.com/lucidrains/soundstorm-pytorch/discussions...

  • tango

    Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model" (by declare-lab)

    Project mention: [Research] [Project] Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | /r/MachineLearning | 2023-05-04

    Found relevant code at https://github.com/declare-lab/tango + all code implementations here

  • SpecVQGAN

    Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

    Project mention: Text-to-Audio Generation Using Instruction Tuned LLM and Latent Diffusion Model | news.ycombinator.com | 2023-04-28

    Excellent. Some of the theory here goes back to Oct/2021 and beyond [1].

    The riffusion.com [2] guys made this practical. Also, my video of high-level overview and examples [3].

    1. SpecVQGAN: https://github.com/v-iashin/SpecVQGAN

    2. Riffusion: https://www.riffusion.com/

    3. Riffusion high-level overview: https://youtu.be/olkLVGcvib8

  • modular-diffusion

    Python library for designing and training your own Diffusion Models with PyTorch.

    Project mention: I Built a Modular Python Library for Designing and Training Diffusion Models from Scratch | /r/SideProject | 2023-09-06

    Last week, I released a project I've been working on for months: Modular Diffusion. It's a modular Python library for designing and training your own Diffusion Models in just a few lines of code. I also wrote a documentation page. The project has already gotten some great community feedback and I'm hoping you guys like it too!

  • word2wave

    Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

  • bark-speaker-directory

    Site for sharing Bark voices

    Project mention: 😋 AGI (bark 🐶) Smart waitress 🎙️ | dev.to | 2023-06-01

    🎙️ rsxdalv/bark-speaker-directory

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-01-24.

Index

What are some of the best open-source audio-generation projects? This list will help you:

 #  Project                  Stars
 1  LocalAI                  18,205
 2  Amphion                   3,715
 3  AudioLDM                  2,179
 4  audio-ai-timeline         1,852
 5  audio-diffusion-pytorch   1,748
 6  tts-generation-webui      1,165
 7  soundstorm-pytorch        1,099
 8  tango                       864
 9  SpecVQGAN                   313
10  modular-diffusion           251
11  word2wave                   116
12  bark-speaker-directory       45