SaaSHub helps you find the best software and product alternatives
Top 23 Python Pytorch Projects
-
Project mention: Show HN: Voice-Pro - AI Voice Cloning Magic: Transform Any Voice in 15 Seconds | news.ycombinator.com | 2024-11-27
-
pip install git+https://github.com/huggingface/transformers
-
ComfyUI
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
Project mention: AI model for near-instant image creation on consumer-grade hardware | news.ycombinator.com | 2024-12-10
For runtime, I use ComfyUI [0], which is node-based and therefore a bit hard to learn. But you can just look at the examples on their GitHub. Fooocus [1] also seems to be popular and a bit more conventional perhaps, though I didn't try it.
For models, Flux [2] is pretty good and quite straightforward to use. (In general, you will have a runtime and then you have to get the model weights separately.) Which Flux variant to use depends on your graphics card; Flux.1 schnell should work for most decently modern ones. (And the website civitai.com is a repository for models and other associated tools.)
[0] https://github.com/comfyanonymous/ComfyUI
[1] https://github.com/lllyasviel/Fooocus
[2] https://civitai.com/models/618692?modelVersionId=699279
-
Project mention: Building a Sarcasm Detection System with LSTM and GloVe: A Complete Guide | dev.to | 2025-01-02
Keras API reference
-
nn
60+ Implementations/tutorials of deep learning papers with side-by-side notes; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), reinforcement learning (ppo, dqn), capsnet, distillation, ...
-
Project mention: ChatGPT unexpectedly began speaking in a user's cloned voice during testing | news.ycombinator.com | 2024-08-11
-
There are several implementations of the YOLO algorithm available, but for ease-of-use, we will use the Ultralytics implementation in this guide. We will implement and test the code locally and then deploy to Koyeb's GPUs for higher inference speed.
-
Project mention: Show HN: Voice-Pro - AI Voice Cloning Magic: Transform Any Voice in 15 Seconds | news.ycombinator.com | 2024-11-27
It's really easy for a technical person to do as well.
I use Coqui TTS[0] as part of my home automation. I wrote a small Python script that lets me upload a voice clip for it to clone, after I got the idea from HeyWillow[1], and a small shim that lets me send the output to a Home Assistant media player instead of using their standard output device. I run the TTS container on a VM with a Tesla P4 (~£100 to buy) and get about 1x-2x real time (processing takes roughly as long as it would take to say the text) using the large model.
Just for a giggle, I uploaded a few 3-5 second clips of myself speaking and cloned my voice, then executed a command to our living room media player to call my wife into the room; from another room, she was 100% convinced it was me speaking words I'd never spoken.
I tried playing with a variety of sentences for a few hours and overall, it sounded almost exactly like me, to me, with the exception of some "attitude" and "intonation" I know I wouldn't use in my speech. I didn't notice much of an improvement using much longer clips; the short ones were "good enough".
Tangentially, it really bugs me that most phone providers in the UK insist you record a "personal greeting" now before they'll let you check your voice mail box, I just record silence, because the last thing I want/need is a voicemail greeting in my voice confirming to some randomer I didn't want calling me, who I am and that my number is active, even more so knowing how I can
[0] https://github.com/coqui-ai/TTS
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Project mention: DeepSpeed-Domino: Communication-Free LLM Training Engine | news.ycombinator.com | 2024-11-26
-
Project mention: Deep Live Cam: Real-Time Face Swapping and One-Click Video Deepfake Tool | news.ycombinator.com | 2024-08-10
Interesting... This project is built upon "GFPGAN v1.4" (https://github.com/TencentARC/GFPGAN) and "FaceSwap Extension - Automatic 1111 - Proof of Concept" (https://github.com/revolverocelot1/-webui-faceswap-unlocked). The GFPGAN project is grounded on its own in the paper "GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior" by Wang et al.
-
MockingBird
AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
-
Project mention: Zizmor would have caught the Ultralytics workflow vulnerability | news.ycombinator.com | 2024-12-08
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...
-
vLLM stands for virtual large language model. It is an open-source library for fast LLM inference and serving. As the name suggests, "virtual" borrows the concepts of virtual memory and paging from operating systems; its PagedAttention mechanism applies them to improve resource utilization and speed up token generation. Traditional LLM serving stores large attention key and value tensors in GPU memory, leading to inefficient memory usage.
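To make the paging idea concrete, here is a minimal pure-Python sketch of a block table, in the spirit of how an operating system maps virtual pages to physical frames. It is not vLLM's implementation: the class name `PagedKVCache`, the `BLOCK_SIZE` value, and the method names are all hypothetical, and no real tensors are stored. It only shows how a sequence's key/value slots can live in non-contiguous fixed-size blocks that are allocated on demand and returned to a pool.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block; illustrative value, not taken from vLLM


@dataclass
class PagedKVCache:
    """Toy allocator mapping each sequence's logical blocks to physical blocks."""
    num_blocks: int
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> [physical ids]
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored

    def __post_init__(self):
        # Every physical block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id):
        """Reserve a slot for one token; return (physical_block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % BLOCK_SIZE == 0:
            # The last block is full (or none exists yet): grab a new one.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // BLOCK_SIZE], length % BLOCK_SIZE

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

With this layout a sequence wastes at most one partially filled block, rather than a contiguous region reserved up front for its maximum possible length.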
-
pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
Project mention: This PR content was generated automatically using cover-agent | news.ycombinator.com | 2024-11-19
Those are some pointless tests.
E.g. test_activation_stats_functions [1] that just checks that the returned value is a float, and that it can take random numbers as input.
test_get_state_dict_custom_unwrap [2] is probably supposed to check that custom_unwrap is invoked, but since it doesn't either record being called, or transform its input, the assertions can't actually check that it was called.
[1] https://github.com/huggingface/pytorch-image-models/pull/233...
[2] https://github.com/huggingface/pytorch-image-models/pull/233...
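As an illustration of the point about `custom_unwrap`, the sketch below shows one way a test can actually establish that a callback was invoked: the stand-in callback both records its call and transforms its input. The `get_state_dict` function here is a hypothetical stand-in written for this example, not the library's code, and the test name is likewise made up.

```python
from unittest import mock


def get_state_dict(model, unwrap_fn):
    """Hypothetical stand-in: unwrap the model, then return its state dict."""
    return unwrap_fn(model).state_dict()


def test_custom_unwrap_is_invoked():
    inner = mock.Mock()
    inner.state_dict.return_value = {"weight": 1.0}
    wrapper = mock.Mock()  # plays the role of the wrapped model

    # The spy records its calls AND returns a different object,
    # so both the call and its effect are observable.
    custom_unwrap = mock.Mock(return_value=inner)

    result = get_state_dict(wrapper, unwrap_fn=custom_unwrap)

    custom_unwrap.assert_called_once_with(wrapper)
    assert result == {"weight": 1.0}  # only reachable via the unwrapped model
    wrapper.state_dict.assert_not_called()


test_custom_unwrap_is_invoked()
```

An implementation that ignored `unwrap_fn` and called `model.state_dict()` directly would fail all three checks, which is what separates this from a test that only inspects the return type.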
-
Real-ESRGAN
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
Project mention: AI-Powered Nvidia RTX Video HDR Transforms Standard Video into HDR Video | news.ycombinator.com | 2024-01-24
It's not exactly what you're after, as it's anime-specific and you need to process the video yourself (e.g. disassemble to frames, run the upscaler, then assemble back into a movie file), but Real-ESRGAN is really good:
https://github.com/xinntao/Real-ESRGAN/
It's pretty brilliant for cleaning up very old, low resolution anime.
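The three-step workflow described in that comment (split into frames, upscale, reassemble) could be sketched as below. This only builds the command lines and does not run them. The `build_pipeline` helper, the directory names, and the `{src}`/`{dst}` placeholder convention are assumptions made for this example; the upscaler command is left to the caller because its exact flags depend on the tool and version. The `ffmpeg` options used (`-i`, `-framerate`, `-c:v`, `-pix_fmt`) are standard ones.

```python
from pathlib import Path


def build_pipeline(video, work_dir, upscale_cmd, fps=24):
    """Return three argv lists: extract frames, upscale them, reassemble.

    upscale_cmd is a caller-supplied argv template in which "{src}" and
    "{dst}" are replaced by the input and output frame directories.
    """
    work = Path(work_dir)
    frames_in = work / "frames_in"    # hypothetical directory names
    frames_out = work / "frames_out"
    pattern = "%08d.png"              # zero-padded, sequentially numbered frames

    # Step 1: disassemble the video into individual frames.
    extract = ["ffmpeg", "-i", str(video), str(frames_in / pattern)]

    # Step 2: run whatever upscaler the caller configured.
    upscale = [
        arg.format(src=str(frames_in), dst=str(frames_out))
        for arg in upscale_cmd
    ]

    # Step 3: reassemble the upscaled frames into a movie file.
    assemble = [
        "ffmpeg", "-framerate", str(fps),
        "-i", str(frames_out / pattern),
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        str(work / "upscaled.mp4"),
    ]
    return [extract, upscale, assemble]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the frame directories exist. Note that `fps` should match the source video, and that this sketch does not carry the audio track over.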
-
pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
Project mention: SB-1047 will stifle open-source AI and decrease safety | news.ycombinator.com | 2024-04-29
It's very easy to get started, right in your Terminal, no fees! No credit card at all.
And there are cloud providers like https://replicate.com/ and https://lightning.ai/ that will let you use your LLM via an API key just like you did with OpenAI if you need that.
You don't need OpenAI - nobody does.
-
diffusers
Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
Project mention: I Self-Hosted Llama 3.2 with Coolify on My Home Server: A Step-by-Step Guide | news.ycombinator.com | 2024-10-16
> All the self-hosted LLM and text-to-image models come with some restrictions trained into them
https://github.com/huggingface/diffusers/issues/3422
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://github.com/JaidedAI/EasyOCR
-
Python Pytorch related posts
-
I tried cline 3.0.0 and here is what happend
-
GPT-5 is behind schedule
-
How to Install Google PaliGemma 2 Locally?
-
First step and troubleshooting Docling - RAG with LlamaIndex on my CPU laptop
-
Ultralytics AI Pwn Request Supply Chain Attack
-
AI model for near-instant image creation on consumer-grade hardware
-
Intel Announces Arc B-Series "Battlemage" Discrete Graphics with Linux Support
-
A note from our sponsor - SaaSHub
www.saashub.com | 13 Jan 2025
Index
What are some of the best open-source Pytorch projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | stable-diffusion-webui | 145,457 |
2 | transformers | 137,387 |
3 | ComfyUI | 63,153 |
4 | Keras | 62,345 |
5 | nn | 57,762 |
6 | Real-Time-Voice-Cloning | 53,159 |
7 | yolov5 | 51,849 |
8 | TTS | 36,646 |
9 | DeepSpeed | 36,192 |
10 | GFPGAN | 36,133 |
11 | MockingBird | 35,634 |
12 | ultralytics | 35,068 |
13 | Ray | 34,825 |
14 | vllm | 33,579 |
15 | pytorch-image-models | 32,840 |
16 | fairseq | 30,792 |
17 | pytorch-tutorial | 30,222 |
18 | mmdetection | 30,023 |
19 | Real-ESRGAN | 29,216 |
20 | pytorch-lightning | 28,762 |
21 | diffusers | 27,049 |
22 | EasyOCR | 25,140 |
23 | supervision | 24,596 |