Python text-to-audio

Open-source Python projects categorized as text-to-audio

Top 10 Python text-to-audio Projects

  1. Amphion

    Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

    Project mention: AIM Weekly for 04Nov2024 | dev.to | 2024-11-04

    Composed Image Retrieval; Intro to Multimodal Llama 3.2; Multi Agent Concierge; RAG with LangChain Granite, Milvus; Download content; Transformer Replacement?; vLLM for running models; Amphion; AutoGluon; NotebookLlama, like Google's NotebookLM; Monocle2ai for tracing GenAI app code LFA&D Project; Bee Agent Framework; Llama RFP Response; GenAI Script; Simular AI Agent S; DrawDB with AI; Ollama with Llama 3.2 Vision Preview; Powerful RAG Checker; SQL Generator; Role of LLMs; Document Extraction; Open Source Vector DB Reddit; The Practical Guide to Self Hosting LLM; Stagehand Controller; Understanding HNSWLIB; Best practices in RAG; Enigma Agent; LangChain, Ollama, Phi3 for Function Calling; Compass Judger; Princeton NLP SimPO; Princeton NLP ProLong; Princeton NLP HELMET; Ollama Cheatsheet; Princeton NLP CopyCat; Princeton NLP Shp; Can LLM Solve Hard GitHub Issues; Enabling Large Language Models to Generate Text with Citations; Princeton NLP CharXiv; Awesome AI Agents List; Nomic's Matryoshka text embedding model

  2. tango

    A family of diffusion models for text-to-audio generation. (by declare-lab)

  3. audio-webui

    A web UI for various audio-related neural networks.

  4. StreamSpeech

    StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

    Project mention: Ask HN: Real-time speech-to-speech translation | news.ycombinator.com | 2024-10-24

    Has anyone had any luck with an offline, free, open-source real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)?

    * https://github.com/ictnlp/StreamSpeech

    * https://github.com/k2-fsa/sherpa-onnx

    * https://github.com/openai/whisper

    I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Basically, a Babelfish that doesn't stick in the ear. Although real-time would be great, a 3- to 5-second delay is manageable.

    RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.

    Any suggestions?

  5. nuwa-pytorch

    Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch

  6. OpenMusic

    OpenMusic: SOTA Text-to-music (TTM) Generation

    Project mention: QA-MDT: Quality-Aware Masked Diffusion Transformer for Enhanced Music Generation | news.ycombinator.com | 2024-09-29

  7. EzAudio

    High-quality Text-to-Audio Generation with Efficient Diffusion Transformer (by haidog-yaqub)

    Project mention: AIM Weekly for 07 Oct 2024 | dev.to | 2024-10-07

    Building Resilient AI Infrastructure: Deep Dive Zilliz Cloud's New Production-Ready Features; Contributing to Open Source; Upcoming Data Engineering Best Practices for AI; Building Scalable Image Retrieval; NASA and IBM Weather Model; Improve RAG with Knowledge Graphs; Leader; Evaluating RAG; Solid Data Curation; Sparse and Dense Embeddings; Cohere LLM University; DataFormer for Synthetic Data; PDF2Audio; Screenpipe; Vector DB Benchmarks; Extreme Quantization; AI Powered Question & Answering; Building LLMs Stanford Class; New Python Web UI; Visualize RAG; Free Map Hosting; Pipefunc; The Pipe to extract; New Audio Model; Easy Milvus Schema Generation; Multimodal Models 72B; Fivetran + Milvus; JSON Viewer; ONNX Runtime GenAI; LLM Explorer; Interesting Computer Vision Techniques; Build a model from embedding; Superchunk; LLM Eval - Salesforce; Small AMD Model; ComfyUI; Molmo is a family of open vision-language models developed by the Allen Institute for AI. Molmo models are trained on PixMo

  8. word2wave

    Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

  9. ai-text-to-audio-latent-diffusion

    text-to-audio-latent-diffusion

  10. soundstorm

    Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusiasts. From sample pack creation and algorithmic composition to AI text-to-audio and onscreen ChatGPT, Soundstorm is a sonic powerhouse.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source text-to-audio projects in Python? This list ranks them by GitHub stars:

#   Project                            Stars
1   Amphion                            8,054
2   tango                              1,119
3   audio-webui                        1,110
4   StreamSpeech                         991
5   nuwa-pytorch                         546
6   OpenMusic                            521
7   EzAudio                              252
8   word2wave                            119
9   ai-text-to-audio-latent-diffusion     34
10  soundstorm                            30
