StreamDiffusion vs OpenVoice
| | StreamDiffusion | OpenVoice |
|---|---|---|
| Mentions | 4 | 14 |
| Stars | 8,969 | 24,948 |
| Growth | - | 34.1% |
| Activity | 9.6 | 8.8 |
| Latest commit | 18 days ago | 14 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
StreamDiffusion
- FLaNK Weekly 31 December 2023
- StreamDiffusion: Over 100fps Stable Diffusion on a 4090
Everyone does a warmup before measuring. But measuring isn't always done right, because what we actually want is GPU time only, and some people naively use CPU time, which is problematic because the process is asynchronous. They have a few timing scripts, though, and I'm away from my GPU. There are some interesting things, but they look like they know how to time. It can also get confusing, though, because it's unclear whether they're counting batches or not. Some works batch and some use single images. That's only a problem when it isn't communicated clearly or is left ambiguous.
Their paper is unfortunately ambiguous. The abstract, intro, and conclusion suggest single images by motivating the work with sequential generation (specifically mentioning the metaverse). The experiment section says:
> We note that we evaluate the throughput mainly via the average inference time per image through processing 100 images.
That implies batching, as does their name, Stream Batch.
Looking at the code, I'm a bit confused. I'm away from my GPU, so I can't run it; maybe someone can let me know? This block [0] measures correctly, but is it using a downloaded image? And then it just opens the image in the preprocessing? (The multi version looks identical.) This block [1] is using CPU time? But it's running on the CPU. (There's another one like this.)
So I'm quite confused, tbh.
[0] https://github.com/cumulo-autumn/StreamDiffusion/blob/03e2a7...
[1] https://github.com/cumulo-autumn/StreamDiffusion/blob/03e2a7...
- StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
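The timing concern in the first comment (GPU work is launched asynchronously, so reading the CPU clock right after a call may measure only the launch) can be illustrated without a GPU. The sketch below is a simulation, not StreamDiffusion's benchmark code: a single-worker `ThreadPoolExecutor` stands in for the GPU queue, `time.sleep` stands in for a kernel, and `fake_kernel`, `naive_time`, and `synced_time` are hypothetical names. In PyTorch, the counterpart of the `.result()` wait is calling `torch.cuda.synchronize()` before reading the clock, or recording `torch.cuda.Event(enable_timing=True)` markers and reading `elapsed_time()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration_s: float) -> float:
    """Hypothetical stand-in for GPU work: sleeps to emulate compute time."""
    time.sleep(duration_s)
    return duration_s


def naive_time(pool: ThreadPoolExecutor, duration_s: float) -> float:
    """Times only the launch: submit() returns as soon as the job is queued."""
    start = time.perf_counter()
    future = pool.submit(fake_kernel, duration_s)
    elapsed = time.perf_counter() - start
    future.result()  # drain the job so it can't leak into a later measurement
    return elapsed


def synced_time(pool: ThreadPoolExecutor, duration_s: float,
                warmup: int = 2, runs: int = 5) -> float:
    """Warm up, then wait for each job to finish before stopping the clock."""
    for _ in range(warmup):
        # Warmup runs absorb one-time start-up costs before timing begins
        pool.submit(fake_kernel, duration_s).result()
    start = time.perf_counter()
    for _ in range(runs):
        # .result() blocks until the work is done: the "synchronize" step
        pool.submit(fake_kernel, duration_s).result()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        work = 0.05  # 50 ms of simulated compute per call
        print(f"naive:  {naive_time(pool, work) * 1000:.2f} ms")
        print(f"synced: {synced_time(pool, work) * 1000:.2f} ms")
```

Run as a script, the naive figure comes out far below the 50 ms of simulated work, because the clock stops once the job is queued; the synchronized figure is at least 50 ms per call.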
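The batch-versus-single ambiguity raised about the paper's "average inference time per image through processing 100 images" comes down to two different numbers. The sketch below uses made-up timings and a hypothetical `BenchResult` class (nothing here comes from StreamDiffusion's code) to show how amortized per-image time, and the fps figure derived from it, can improve with batch size while per-frame latency gets worse.

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    """Hypothetical container for one benchmark configuration."""
    batch_size: int
    batch_time_ms: float  # wall time to process one whole batch

    @property
    def per_image_ms(self) -> float:
        # Amortized cost per image: batch time spread over the batch
        return self.batch_time_ms / self.batch_size

    @property
    def throughput_fps(self) -> float:
        # Images per second implied by the amortized cost
        return 1000.0 / self.per_image_ms

    @property
    def latency_ms(self) -> float:
        # A frame isn't ready until its whole batch finishes, so latency is
        # at least the batch time (waiting for the batch to fill is ignored)
        return self.batch_time_ms


if __name__ == "__main__":
    # Made-up numbers for illustration, not measurements from the paper
    for r in (BenchResult(1, 12.0), BenchResult(4, 40.0)):
        print(f"batch={r.batch_size}: {r.per_image_ms:.1f} ms/image, "
              f"{r.throughput_fps:.0f} fps, latency {r.latency_ms:.1f} ms")
```

With these made-up numbers, the batch of 4 reports 100 fps (10 ms per image) while each frame still waits 40 ms, versus about 83 fps and 12 ms for single images, which is why a paper needs to say which of the two it is reporting.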
OpenVoice
- OpenVoice: Instant Voice Cloning
- OpenVoice V2 Released
- Ask HN: Voice ID adoption at financial institutions
Given the inevitability of easy voice cloning[1], it seems irresponsible to be using voice as a positive authentication signal.
Unfortunately, major US financial institutions seem to be ramping up adoption of this technology[2].
Am I missing something?
[1] https://github.com/myshell-ai/OpenVoice
- OpenAI: Navigating the Challenges and Opportunities of Synthetic Voices
They might have been forced to give a signal after this rose on HN today:
https://research.myshell.ai/open-voice
https://news.ycombinator.com/item?id=39861578
- OpenVoice: Versatile Instant Voice Cloning
- FLaNK Weekly 31 December 2023
What are some alternatives?
generative-ai-python - The Gemini API Python SDK enables developers to use Google's state-of-the-art generative AI models to build AI-powered features and applications.
tortoise-tts - A multi-voice TTS system trained with an emphasis on quality
tbmk - A commands bookmark for terminal 🔖
piper - A fast, local neural text to speech system
qsv - CSVs sliced, diced & analyzed.
FLiPStackWeekly - FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
Stirling-PDF - #1 Locally hosted web application that allows you to perform various operations on PDF files
whisper-plus - WhisperPlus: Faster, Smarter, and More Capable 🚀
FLaNK-Ice - Apache Iceberg - Cloud Data Lakehouse
JavaOnRaspberryPi - Sources and scripts for the book "Getting started with Java on the Raspberry Pi"
temporian - Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖