Show HN: Voice-Pro – AI Voice Cloning Magic: Transform Any Voice in 15 Seconds

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. voice-pro

    Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

    > Windows Defender may give a warning about untrusted application and disallow further execution of Voice-Pro. If SmartScreen security level is set to "Warn", just click "More info" and then click "Run anyway". If SmartScreen is set to level "Block" there will be no button to run the installation. In this case, open the properties of the start.bat file, and check "Unblock", apply the change and run the start.bat again.

    https://github.com/abus-aikorea/voice-pro?tab=readme-ov-file...

    hard pass and anyone who reads this and continues is bonkers

  2. Judoscale

    Save 47% on cloud hosting with autoscaling that just works. Judoscale integrates with Django, FastAPI, Celery, and RQ to make autoscaling easy and reliable. Save big, and say goodbye to request timeouts and backed-up task queues.

  3. whisper-at

    Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

    Have you considered supporting whisper-at (https://github.com/YuanGongND/whisper-at)? Being able to identify sounds on a timeline can be useful, e.g. for a politician's speech and how the audience reacts to it (clapping, applauding).

  4. TTS

    🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

    It's really easy for a technical person to do as well.

    I use Coqui TTS[0] as part of my home automation. After getting the idea from HeyWillow[1], I wrote a small Python script that lets me upload a voice clip for it to clone, plus a small shim that lets me send the output to a Home Assistant media player instead of using their standard output device. I run the TTS container on a VM with a Tesla P4 (~£100 to buy) and get about 1x-2x (processing takes roughly as long as it would take to say the text) using the large model.

    Just for a giggle, I uploaded a few 3-5 second clips of myself speaking and cloned my voice, then executed a command to our living room media player to call my wife into the room; from another room, she was 100% convinced it was me speaking words I'd never spoken.

    I tried playing with a variety of sentences for a few hours and overall, it sounded almost exactly like me, to me, with the exception of some "attitude" and "intonation" I know I wouldn't use in my speech. I didn't notice much of an improvement using much longer clips; the short ones were "good enough".

    Tangentially, it really bugs me that most phone providers in the UK now insist you record a "personal greeting" before they'll let you check your voicemail box. I just record silence, because the last thing I want or need is a voicemail greeting in my voice confirming to some randomer I didn't want calling me who I am and that my number is active, even more so knowing how I can

    [0] https://github.com/coqui-ai/TTS

  5. willow

    Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative

  6. ytdl-patched

    yt-dlp fork with some more features

    As I made the comment, I can't really imagine anything that's not so absurd that has a more than zero chance of happening.

    Seriously, what can anybody do about random hacker Joe publishing under the name XoX? Even if they burn GitHub and friends to the ground, if something is useful it will be really really hard to get rid of it. Remember youtube-dl? It's now https://github.com/yt-dlp/yt-dlp

    If they make anything that cripples open source development they will feel it quite soon when they realize that it also cripples their world as much of the tooling and infrastructure also depends on it.

    Killing open source is like killing the internet itself.

  7. open-dubbing

    Open dubbing is an AI dubbing system which uses machine learning models to automatically translate and synchronize audio dialogue into different languages.

  8. stable-diffusion-webui

    Stable Diffusion web UI

  9. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.


