Top 10 Python text-to-audio Projects
-
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
-
-
-
StreamSpeech
StreamSpeech is an "All in One" seamless model for offline and simultaneous speech recognition, speech translation, and speech synthesis.
Has anyone had any luck with an offline, free, open-source real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)?
* https://github.com/ictnlp/StreamSpeech
* https://github.com/k2-fsa/sherpa-onnx
* https://github.com/openai/whisper
I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Basically, a Babelfish that doesn't stick in the ear. Although real-time would be great, a 3- to 5-second delay is manageable.
RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.
Any suggestions?
-
nuwa-pytorch
Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch
-
OpenMusic
Project mention: QA-MDT: Quality-Aware Masked Diffusion Transformer for Enhanced Music Generation | news.ycombinator.com | 2024-09-29
-
EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer (by haidog-yaqub)
-
word2wave
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
-
-
soundstorm
Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusiasts. From sample pack creation and algorithmic composition to AI text-to-audio and onscreen ChatGPT, Soundstorm is a sonic powerhouse.
Index
What are some of the best open-source text-to-audio projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Amphion | 8,054 |
2 | tango | 1,119 |
3 | audio-webui | 1,110 |
4 | StreamSpeech | 991 |
5 | nuwa-pytorch | 546 |
6 | OpenMusic | 521 |
7 | EzAudio | 252 |
8 | word2wave | 119 |
9 | ai-text-to-audio-latent-diffusion | 34 |
10 | soundstorm | 30 |