tortoise-tts-modal-api vs unilm
Metric | tortoise-tts-modal-api | unilm
---|---|---
Mentions | 6 | 44
Stars | 111 | 18,915
Growth | 0.9% | 3.2%
Activity | 0.0 | 9.0
Latest commit | 9 months ago | 10 days ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
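The exact formulas behind Growth and Activity are not given here. The following is only an illustrative sketch under stated assumptions. Growth is computed as plain month-over-month percentage change. Activity is modeled with a hypothetical exponential decay (`half_life_days` is an assumed parameter), so recent commits weigh more. The raw score is then mapped to 0-10 by percentile rank, which matches the "9.0 means top 10%" reading above.

```python
from datetime import date


def month_over_month_growth(stars_now: int, stars_month_ago: int) -> float:
    """Percentage change in stars over one month."""
    if stars_month_ago == 0:
        raise ValueError("growth is undefined when the previous count is zero")
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_commits(commit_dates: list[date], today: date,
                             half_life_days: float = 30.0) -> float:
    """Sum of per-commit weights under an ASSUMED exponential decay:
    a commit half_life_days old counts half as much as one made today."""
    total = 0.0
    for d in commit_dates:
        age_days = (today - d).days
        total += 0.5 ** (age_days / half_life_days)
    return total


def activity_score(project_raw: float, all_raw: list[float]) -> float:
    """ASSUMED mapping: score = 10 x (fraction of tracked projects with a lower
    raw score), so a project above 90% of the others scores 9.0 (top 10%)."""
    below = sum(1 for r in all_raw if r < project_raw)
    return round(10.0 * below / len(all_raw), 1)
```

For example, a repository that went from 100 to 110 stars shows 10.0% growth. A commit made 30 days ago contributes half the weight of one made today under the assumed 30-day half-life.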
tortoise-tts-modal-api
- [D] Any model like VALL-E available currently?
We recently put up a free, open-source API for Tortoise and will be adding improvements to it over time; we'd appreciate contributions: https://github.com/metavoicexyz/tortoise-tts-modal-api
- Open-source, serverless text-to-speech
- Open-source serverless text-to-speech
- [P] Built an at-cost, pay per second, open-source API for Tortoise text-to-speech (best I've heard!)
It can be used via a UI at: https://tts.themetavoice.xyz
- Show HN: Tortoise TTS as an at-cost open-source pay-per-second API
Tortoise TTS is the best TTS available today. We built an open-source, at-cost, pay-per-second API for it. The quality of intonation it generates is unparalleled, and we hope our at-cost API will make it easier for people to build on top of it!
This lets folks run Tortoise with a single API call; it costs $0.03 per query. The WAV file is downloadable, and we apply no restrictions.
We're open-sourcing all our work: we made Tortoise run 30% faster, and have more improvements coming. If you're keen to contribute, we can help with ideas, pointers, compute and data; just DM us. Our fork with the improvements can be found at https://github.com/metavoicexyz/tortoise-tts. The deployment code can be found at https://github.com/metavoicexyz/tortoise-tts-modal-api.
There are already great alternatives for using Tortoise: i) @mdnest_r's awesome Hugging Face Spaces, ii) the original Google Colab, iii) hosting it yourself. Our work should accelerate those who need an API, don't want to spend time or money on hosting, and need scalable infrastructure backing them.
We're especially excited about combining text-to-speech with content generated from LLMs, and about how it fits into video creation tools.
Tortoise in its current form is also inaccessible to non-technical users, which is why we are also providing a simple UI on top (also "at-cost"): https://tts.themetavoice.xyz
To use it, generate an API key on https://tts.themetavoice.xyz and call the API via a POST request. Or use the web UI. Or run your own deployment.
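The post says the hosted API is called with an API key via a POST request but does not spell out the request format. The sketch below shows the general shape of such a call using only the standard library. The endpoint URL, the `Bearer` authorization header, the JSON field names (`text`, `voice`), and the assumption that the response body is raw WAV bytes are all hypothetical placeholders, not the documented MetaVoice API. Check the repository for the real schema.

```python
import json
import urllib.request


def build_tts_request(endpoint_url: str, api_key: str, text: str,
                      voice: str) -> urllib.request.Request:
    """Build a JSON POST request for a text-to-speech endpoint.

    HYPOTHETICAL: endpoint_url, the Bearer auth scheme and the payload
    field names are placeholders, not the documented API.
    """
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def synthesize_to_wav(req: urllib.request.Request, out_path: str,
                      timeout: float = 300.0) -> None:
    """Send the request and save the response body, ASSUMED to be WAV bytes."""
    # Generation can be slow, hence the generous default timeout.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

Usage would look like `synthesize_to_wav(build_tts_request(url, key, "Hello there", voice), "out.wav")`, with `url`, `key`, and `voice` supplied from your own account and deployment. Building the request separately from sending it makes the payload easy to inspect and test without network access.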
unilm
- A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images?
Has anyone tried Kosmos [0]? I came across it the other day and it looked shiny and interesting, but I haven't had a chance to put it to the test much yet.
[0] - https://github.com/microsoft/unilm/tree/master/kosmos-2.5
- Kosmos-2.5: A Multimodal Literate Model
- 1-Bit LLMs Could Solve AI's Energy Demands
- GPUs Go Brrr
- The Era of 1-Bit LLMs: Training Tips, Code and FAQ [pdf]
- The Era of 1-Bit LLMs: Training Tips, Code and FAQ
- The Era of 1-bit LLMs: ternary parameters for cost-effective computing
+1 on this; the real proof would have been testing both models side by side.
According to Hugging Face [2], the code may be published on GitHub [1].
[1] https://github.com/microsoft/unilm/tree/master/bitnet
[2] https://huggingface.co/papers/2402.17764
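The "ternary parameters" in the title refer to weights restricted to {-1, 0, +1}. The paper linked in [2] describes an absmean quantizer: divide each weight by the mean absolute weight γ, round, and clip to [-1, 1]. Below is a minimal pure-Python sketch of that weight quantization and of why it saves compute. It covers weights only, not the paper's activation quantization or training procedure. The function names are illustrative.

```python
def absmean_ternary(weights: list[list[float]],
                    eps: float = 1e-5) -> tuple[list[list[int]], float]:
    """Quantize a weight matrix to {-1, 0, +1} with absmean scaling.

    gamma = mean(|W|); each entry becomes RoundClip(W / (gamma + eps), -1, 1).
    Python's round() rounds halves to even, like torch.round.
    """
    n = sum(len(row) for row in weights)
    gamma = sum(abs(w) for row in weights for w in row) / n

    def round_clip(x: float) -> int:
        return max(-1, min(1, round(x)))

    q = [[round_clip(w / (gamma + eps)) for w in row] for row in weights]
    return q, gamma


def ternary_matvec(q: list[list[int]], gamma: float,
                   x: list[float]) -> list[float]:
    """Matrix-vector product with ternary weights: every 'multiply' becomes an
    add, a subtract, or a skip; a single rescale by gamma happens per output."""
    out = []
    for row in q:
        acc = 0.0
        for qi, xi in zip(row, x):
            if qi == 1:
                acc += xi
            elif qi == -1:
                acc -= xi
        out.append(acc * gamma)
    return out
```

For `[[0.4, -1.2], [0.05, 0.8]]`, γ = 0.6125 and the ternary matrix is `[[1, -1], [0, 1]]`. The removal of multiplications in the inner loop is where the claimed cost savings come from.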
- I'm an Old Fart and AI Makes Me Sad
- On building a semantic search engine
e5-mistral is essentially a distillation from GPT-4 to a smaller model. You can see here https://github.com/microsoft/unilm/blob/16da2f193b9c1dab0a69... that they actually have custom prompts for each dataset being tested.
The question is: if you haven't seen the task before, what is a good prompt to prepend for your task?
IMO, e5-mistral is overfit to MTEB.
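For context on what "prepending a prompt" means here: the e5-mistral-7b-instruct model card, as we understand it, formats each query as a one-sentence task instruction followed by the query, and embeds documents with no instruction. Below is a minimal sketch of that input formatting only. The embedding model call itself is out of scope, and the task description in the usage note is an illustrative example, not a recommended prompt for any particular dataset.

```python
def format_query(task_description: str, query: str) -> str:
    """Prefix a query with a task instruction (e5-mistral-style format)."""
    return f"Instruct: {task_description}\nQuery: {query}"


def format_document(doc: str) -> str:
    """Documents are embedded as-is, without an instruction."""
    return doc


def build_inputs(task_description: str, queries: list[str],
                 documents: list[str]) -> list[str]:
    """Queries first (instructed), then documents (plain), ready to embed."""
    return ([format_query(task_description, q) for q in queries]
            + [format_document(d) for d in documents])
```

For an unseen task, the only lever this format gives you is the one-sentence `task_description`, for example "Given a web search query, retrieve relevant passages that answer the query". That is exactly why the choice of prompt matters for tasks the model was not tuned on.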
- Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide
LayoutLM v1, v2 and v3 models [GitHub], DocBERT [GitHub]
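LayoutLM-family models take each word together with its bounding box, expressed on a 0-1000 grid relative to the page size rather than in raw pixels. Below is a minimal sketch of that preprocessing step. It assumes the words and their pixel boxes `(x0, y0, x1, y1)` already come from an OCR or PDF text-extraction step, which is not shown. `prepare_layoutlm_words` is a hypothetical helper name.

```python
def normalize_bbox(bbox: tuple[float, float, float, float],
                   width: float, height: float) -> list[int]:
    """Scale an (x0, y0, x1, y1) pixel box to the 0-1000 coordinate grid."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def prepare_layoutlm_words(words: list[str],
                           boxes: list[tuple[float, float, float, float]],
                           width: float, height: float) -> list[tuple[str, list[int]]]:
    """Pair each extracted word with its normalized box (hypothetical helper)."""
    if len(words) != len(boxes):
        raise ValueError("every word needs exactly one bounding box")
    return [(w, normalize_bbox(b, width, height)) for w, b in zip(words, boxes)]
```

On a 1000 x 500 pixel page, the box `(100, 50, 300, 150)` becomes `[100, 100, 300, 300]`. Normalizing makes layouts comparable across pages of different resolutions.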
What are some alternatives?
tortoise-tts - A multi-voice TTS system trained with an emphasis on quality
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
tortoise-tts-Windows - A multi-voice TTS system trained with an emphasis on quality
ERNIE - Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
MockingBird - 🚀 AI voice mimicry: Clone a voice in 5 seconds to generate arbitrary speech in real-time
involution - [CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator
gensim - Topic Modelling for Humans
maelstrom - A workbench for writing toy implementations of distributed systems.
rasa - 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
memprompt - A method to fix GPT-3 after deployment with user feedback, without re-training.
LongNet - Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
NExT-GPT - Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model