Python GPU

Open-source Python projects categorized as GPU

Top 23 Python GPU Projects

  1. Pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

    Project mention: How to Get Started with Scikit-Learn: A Beginner-Friendly Guide to Machine Learning in Python | dev.to | 2025-04-24

    PyTorch

  3. DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

    Project mention: DeepSpeed-Domino: Communication-Free LLM Training Engine | news.ycombinator.com | 2024-11-26
  4. ivy

    Convert Machine Learning Code Between Frameworks

  5. scalene

    Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

    Project mention: LLMs and Code Optimization | news.ycombinator.com | 2025-01-06

    This has been a feature of the Scalene Python profiler (https://github.com/plasma-umass/scalene) for some time (at this point, about 1.5 years): bring your own API key for OpenAI / Azure / Bedrock; it also works with Ollama. Optimizing Python code to use NumPy or other similar native libraries can easily yield improvements of multiple orders of magnitude in real-world settings. We tried it on several of the success stories of Scalene (before the integration with LLMs); see https://github.com/plasma-umass/scalene/issues/58 - and found that it often automatically yielded the same or better optimizations - see https://github.com/plasma-umass/scalene/issues/554. (Full disclosure: I am one of the principal designers of Scalene.)
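To make the comment above concrete, here is a minimal sketch of the kind of rewrite it describes: replacing an interpreted Python loop with a single NumPy call. It is not taken from Scalene or its issue tracker. The function names are hypothetical, NumPy is treated as an optional dependency, and the speedup you see (if any) depends on input size and hardware.

```python
import timeit

try:
    import numpy as np  # optional third-party dependency (assumption: installed separately)
except ImportError:
    np = None


def sum_of_squares_loop(values):
    """Pure-Python baseline: one interpreted multiply-add per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same result computed by one native NumPy call on a float64 array."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


if __name__ == "__main__":
    data = [float(i) for i in range(100_000)]
    loop_s = timeit.timeit(lambda: sum_of_squares_loop(data), number=10)
    print(f"loop:  {loop_s:.4f} s for 10 runs")
    if np is not None:
        arr = np.asarray(data, dtype=np.float64)  # convert once, outside the timed region
        numpy_s = timeit.timeit(lambda: sum_of_squares_numpy(arr), number=10)
        print(f"numpy: {numpy_s:.4f} s for 10 runs")
```

The array is converted before timing so the comparison measures the computation itself rather than the list-to-array copy.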

  6. tvm

    Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

    Project mention: Apache TVM | news.ycombinator.com | 2024-09-11
  7. cupy

    NumPy & SciPy for GPU

    Project mention: Nvidia adds native Python support to CUDA | news.ycombinator.com | 2025-04-04

    The plethora of packages, including DSLs for compute and MLIR.

    https://developer.nvidia.com/how-to-cuda-python

    https://cupy.dev/
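As a rough illustration of the "NumPy & SciPy for GPU" description, the sketch below writes one function against the shared array API and runs it on CuPy when a CUDA device is usable, falling back to NumPy otherwise. The helper names are hypothetical, and both libraries are assumed to be optional installs; this is only one way to structure such code.

```python
def load_array_module():
    """Return (module, name): CuPy if a CUDA device is usable, else NumPy, else (None, "none")."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp, "cupy"
    except Exception:
        # CuPy missing, or no usable CUDA driver/device: fall through to NumPy.
        pass
    try:
        import numpy as np
        return np, "numpy"
    except ImportError:
        return None, "none"


def normalize_rows(xp, rows):
    """Scale each row to unit Euclidean length using whichever module `xp` is."""
    a = xp.asarray(rows, dtype=xp.float64)
    norms = xp.linalg.norm(a, axis=1, keepdims=True)
    return a / norms


def to_host_list(xp, name, array):
    """Copy the result back to host memory (needed only for CuPy) and return nested lists."""
    if name == "cupy":
        array = xp.asnumpy(array)
    return array.tolist()


if __name__ == "__main__":
    xp, name = load_array_module()
    if xp is None:
        print("Neither CuPy nor NumPy is installed.")
    else:
        result = normalize_rows(xp, [[3.0, 4.0], [1.0, 0.0]])
        print(name, to_host_list(xp, name, result))
```

Passing the module in as `xp` keeps the numerical code identical on both backends; only the transfer back to the host differs.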

  8. server

    The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)

    Project mention: Scuda – Virtual GPU over IP | news.ycombinator.com | 2024-10-09

    This is very interesting but many of the motivations listed are far better served with alternate approaches.

    For "remote" model training there is NCCL + DeepSpeed/FSDP/etc. For remote inferencing there are solutions like Triton Inference Server[0] that can do very high-performance hosting of any model for inference. For LLMs specifically there are nearly countless implementations.

    That said, the ability to use this for testing is interesting, but I wonder about GPU contention and, as others have noted, the performance of such a solution will be terrible even with a relatively high-speed interconnect (100/400 Gb Ethernet, etc.).

    NCCL has been optimized to support DMA directly between network interfaces and GPUs which is of course considerably faster than solutions like this. Triton can also make use of shared memory, mmap, NCCL, MPI, etc which is one of the many tricks it uses for very performant inference - even across multiple chassis over another network layer.

    [0] - https://github.com/triton-inference-server/server

  10. ImageAI

    A Python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities

  11. AlphaPose

    Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System

  12. BigDL

    Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

    Project mention: DeepSeek R1 671B Q4_K_M with 1~2 Arc A770 on Xeon | news.ycombinator.com | 2025-03-05

    That's because the OP is linking to the quickstart guide. There are benchmark numbers on the GitHub repo's root page, but they do not appear to include the new DeepSeek yet:

    https://github.com/intel/ipex-llm/tree/main?tab=readme-ov-fi...

  13. skypilot

    SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

    Project mention: Service to auto route LLM/Model traffic | news.ycombinator.com | 2025-02-19
  14. chainer

    A flexible framework of neural networks for deep learning

  15. nvitop

    An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

    Project mention: nvitop VS nviwatch - a user suggested alternative | libhunt.com/r/nvitop | 2024-09-09
  16. tf-quant-finance

    High-performance TensorFlow library for quantitative finance.

  17. pytorch-forecasting

    Time series forecasting with PyTorch

  18. gpustat

    📊 A simple command-line utility for querying and monitoring GPU status

    Project mention: gpustat VS nviwatch - a user suggested alternative | libhunt.com/r/gpustat | 2024-09-09
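For readers curious what "querying GPU status" can look like in practice, the sketch below shells out to `nvidia-smi` with its CSV query flags and parses the result. This is not how gpustat or nvitop are implemented (that is not stated here); the function names, dictionary keys, and sample line in the usage note are hypothetical, and the code assumes an NVIDIA driver with `nvidia-smi` on the PATH, returning an empty list otherwise.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def _to_int(text):
    """Convert a numeric field; nvidia-smi may report values such as "[N/A]"."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != len(QUERY_FIELDS):
            continue  # skip rows that do not match the requested fields
        index, name, util, mem_used, mem_total = parts
        gpus.append({
            "index": _to_int(index),
            "name": name,
            "utilization_pct": _to_int(util),
            "memory_used_mib": _to_int(mem_used),
            "memory_total_mib": _to_int(mem_total),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi once; return [] if it is missing or exits with an error."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(out)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Keeping the parser separate from the subprocess call means it can be exercised without a GPU, for example with a made-up row such as `"0, Example GPU, 17, 1024, 8192"`.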
  19. asitop

    Perf monitoring CLI tool for Apple Silicon

  20. jittor

    Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

  21. leptonai

    A Pythonic framework to simplify AI service building

  22. TransformerEngine

    A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

    Project mention: TransformerEngine: A library for accelerating Transformer models on Nvidia GPUs | news.ycombinator.com | 2024-09-25
  23. jetson_stats

    📊 Simple package for monitoring and controlling your NVIDIA Jetson [Orin, Xavier, Nano, TX] series

  24. pygraphistry

    PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

    Project mention: Initial CUDA Performance Lessons | news.ycombinator.com | 2024-10-11

    Nice!

    It's interesting from the perspective of maintenance too. You can bet most constants like warp sizes will change, so you get into things like having profiles, autotuners, or not sweating the small stuff.

    We went more extreme, and nowadays focus on several layers up: By accepting the (high!) constant overheads of tools like RAPIDS cuDF, we get in exchange the ability to easily crank out code that has good saturation on the newest GPUs and that any data scientist can edit and extend. Likewise, they just need to understand basics like data movement and columnar analytics data reps to make GPU pipelines. We have ~1 CUDA kernel left and many years of higher-level.

    As an example, this is one of the core methods of our new graph query language (think cypher on pandas/spark), and it gets Graph500 level performance on cheapo GPUs just by being data parallel with high saturation per step: https://github.com/graphistry/pygraphistry/blob/master/graph... . Despite ping-ponging a ton because cuDF doesn't (yet) coalesce GPU kernel calls, it still places well, and is easy to maintain & extend.

  25. torchrec

    PyTorch domain library for recommendation systems

    Project mention: Advancements in Embedding-Based Retrieval at Pinterest Homefeed | news.ycombinator.com | 2025-02-14

    Nice, there are a ton of threads here to check out. For example I had not heard of

    https://pytorch.org/torchrec/

    Which seems to nicely package a lot of primitives I have worked with previously.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source GPU projects in Python? This list will help you:

# Project Stars
1 Pytorch 89,253
2 DeepSpeed 38,004
3 ivy 14,177
4 scalene 12,595
5 tvm 12,226
6 cupy 10,139
7 server 9,098
8 ImageAI 8,775
9 AlphaPose 8,234
10 BigDL 7,774
11 skypilot 7,721
12 chainer 5,910
13 nvitop 5,435
14 tf-quant-finance 4,790
15 pytorch-forecasting 4,252
16 gpustat 4,182
17 asitop 3,975
18 jittor 3,138
19 leptonai 2,747
20 TransformerEngine 2,379
21 jetson_stats 2,274
22 pygraphistry 2,238
23 torchrec 2,095
