Python Computer Vision

Open-source Python projects categorized as Computer Vision

Top 23 Python Computer Vision Projects

  • Face Recognition

    The world's simplest facial recognition API for Python and the command line

    Project mention: Security Image Recognition | /r/computervision | 2023-12-10

    Camera connected to a Pi? Something like this could run locally.

  • pytorch-CycleGAN-and-pix2pix

    Image-to-Image Translation in PyTorch

    Project mention: List of AI-Models | /r/GPT_do_dah | 2023-05-16

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, and more.

    Project mention: Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide | 2023-12-27

    PyTesseract Module [GitHub], EasyOCR Module [GitHub], PaddlePaddle OCR [GitHub]

  • d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | 2023-10-19

  • vit-pytorch

    Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

    Project mention: Is it easier to go from Pytorch to TF and Keras than the other way around? | /r/pytorch | 2023-05-13

    I also need to learn PySpark, so right now I am going to download the Fashion-MNIST dataset, use PySpark to downsize each image, and put them into separate folders according to their labels (just to show employers I can do some basic ETL with PySpark; not sure how I am going to load it for training in PyTorch yet, though). Then I am going to write the simplest LeNet to try to categorize the Fashion-MNIST dataset (results will most likely be bad, but that's okay). Next, I'll try to learn transfer learning in PyTorch for CNNs, or maybe skip ahead to ViT. Ideally, at this point I want to study the attention mechanism a bit more and try to implement SimpleViT, which I saw here.
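    The downsizing step the commenter describes doesn't need Spark to understand; conceptually it is just block average-pooling. A minimal NumPy sketch (the `downsample` helper is hypothetical, not part of any library):

```python
import numpy as np

def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Downsize a grayscale image by average-pooling factor x factor blocks."""
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0, "image size must be divisible by factor"
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# A 28x28 Fashion-MNIST-sized image downsized to 14x14.
img = np.arange(28 * 28, dtype=np.float64).reshape(28, 28)
small = downsample(img, 2)
print(small.shape)  # (14, 14)
```

    In Spark the same pooling would run inside a map over the image files; the arithmetic per image is identical.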

  • vision

    Datasets, Transforms and Models specific to Computer Vision

    Project mention: Transitioning From PyTorch to Burn | 2024-02-14

    Let's start by defining the ResNet module according to the Residual Network architecture, as replicated[1] by the torchvision implementation of the model we will import. Detailed architecture variants with a depth of 18, 34, 50, 101 and 152 layers can be found in the table below.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

  • supervision

    We write your reusable computer vision tools. 💜

    Project mention: Supervision: Reusable Computer Vision | 2024-03-24

    You can always slice the images into smaller tiles, run detection on each tile, and combine the results; you can get a much more accurate result this way. Supervision has a utility for this, but it only works with detections. Here is a side-by-side comparison:
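    The slice-and-merge idea is simple to sketch in plain Python: run the detector on each tile, then shift each tile's boxes back into full-image coordinates. Here `detect` is a stand-in for whatever per-tile model you run; Supervision's own slicing utility additionally handles tile overlap and merging of duplicate detections.

```python
import numpy as np

def tiled_detect(image: np.ndarray, detect, tile: int = 640):
    """Run `detect` on tile x tile crops and shift boxes back to image coordinates.

    `detect(crop)` is assumed to return a list of (x1, y1, x2, y2) boxes
    relative to the crop it was given.
    """
    h, w = image.shape[:2]
    boxes = []
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            crop = image[y0:y0 + tile, x0:x0 + tile]
            for (x1, y1, x2, y2) in detect(crop):
                boxes.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0))
    return boxes

# Fake detector that "finds" one box in the top-left of every tile:
fake = lambda crop: [(10, 10, 50, 50)]
out = tiled_detect(np.zeros((1280, 1280, 3)), fake, tile=640)
print(out)  # four boxes, one per tile, offset into full-image coordinates
```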

  • facenet

    Face recognition using Tensorflow

  • labelme

    Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).

    Project mention: labelme VS anylabeling - a user suggested alternative | 2023-04-15

  • fashion-mnist

    A MNIST-like fashion product database. Benchmark :point_down:

    Project mention: Logistic Regression for Image Classification Using OpenCV | 2023-12-31

    In this case there's no advantage to using logistic regression on an image other than the novelty. Logistic regression is excellent for feature explainability, but individual pixels aren't meaningful features, so you can't explain much from an image.

    Traditional, non-deep-learning classification algorithms such as SVMs and random forests perform a lot better on MNIST: up to 97% accuracy, compared to the 88% from logistic regression in this post. Check the original MNIST benchmarks here:
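    For intuition, logistic regression on an image is just one weight per flattened pixel followed by a sigmoid, which is why its accuracy tops out early. A minimal NumPy sketch on two toy "images" (the weights and data are made up; real benchmarks train on the full 28x28 MNIST set):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(images: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Logistic regression: one weight per flattened pixel, then a sigmoid."""
    return sigmoid(images.reshape(len(images), -1) @ w + b)

# Two toy 4x4 "images": one bright in the top half, one in the bottom half.
top = np.zeros((4, 4)); top[:2] = 1.0
bottom = np.zeros((4, 4)); bottom[2:] = 1.0
# A hand-set weight vector that scores top-half brightness positively:
w = np.concatenate([np.ones(8), -np.ones(8)])
p = predict(np.stack([top, bottom]), w, 0.0)
print(p[0] > 0.5, p[1] < 0.5)  # True True
```

    Because the model is a single linear map over pixels, any translation or rotation of the digit scrambles the features, which is the limitation the quote is pointing at.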

  • gaussian-splatting

    Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

    Project mention: Show HN: Gaussian Splat renderer in VR with Unity | 2024-01-24

    Chris' post doesn't really give much background info, so here's what's going on here and why it's awesome.

    Real-time 3D rendering has historically been based on rasterisation of polygons. This has brought us a long way and has a lot of advantages, but making photorealistic scenes takes a lot of work from the artist. You can scan real objects with photogrammetry and then convert the scans to high-poly meshes, but photogrammetry rigs are pro-level tools, and the resulting assets won't render at real-time speeds. Unreal 5 introduced Nanite, a very advanced LoD algorithm, which helps a lot, but again, we seem to be hitting the limits of what can be done with polygon-based rendering.

    3D Gaussian Splatting is a new AI-based technique that renders photorealistic 3D scenes in real time, captured from only a few photos taken with ordinary cameras. It replaces polygon-based rendering with radiance fields.

    3DGS uses several advanced techniques:

    1. A 3D point cloud is estimated using "structure from motion" techniques.

    2. The points are turned into "3D Gaussians": floating blobs of light, each with a position, an opacity, and a covariance matrix, plus view-dependent colour encoded with "spherical harmonics". They're ellipsoids, so they can be thought of as spheres that are stretched and rotated.

    3. Rendering projects the 3D Gaussians onto the 2D screen (as "splats"), sorts them by depth so transparency composites correctly, and rasterizes them on the fly using custom shaders.
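    The depth-sort-then-blend in step 3 is ordinary front-to-back alpha compositing: each splat contributes its colour weighted by its opacity and by how much light the splats in front of it let through. A toy one-pixel NumPy sketch (the colours and opacities are invented; the real renderer does this per pixel for millions of projected Gaussians inside a shader):

```python
import numpy as np

def composite(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Front-to-back alpha compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    out = np.zeros(3)
    transmittance = 1.0  # fraction of light still unblocked by nearer splats
    for c, a in zip(colors, alphas):
        out += c * a * transmittance
        transmittance *= (1.0 - a)
    return out

# Two splats covering one pixel, already sorted nearest-first:
colors = np.array([[1.0, 0.0, 0.0],   # red, nearest
                   [0.0, 0.0, 1.0]])  # blue, behind it
alphas = np.array([0.5, 1.0])
print(composite(colors, alphas))  # [0.5 0.  0.5]
```

    This is why the sort matters: swapping the two splats would give a different colour, because opacity is order-dependent.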

    The neural network isn't actually used at rendering time, so GPUs can render the scene nice and fast.

    In terms of what it can do, the technique is perhaps most comparable to Unreal's Nanite: both are designed for static scenes. While 3D Gaussians can be moved around on the fly, so in principle the scene can change, none of the existing animation, game-engine, or artwork packages know what to do without polygons. But this sort of thing could be used to rapidly create VR worlds from nothing more than videos taken from different angles, which seems useful.

  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

    Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | 2024-04-07

    This is a great project, a little similar to some existing toolkits, but it includes testing capabilities and ablation.

    A few questions regarding the LLM testing aspect: how extensive is the test coverage for LLM use cases, and what is the current state of this part of the project? Do you offer any guarantees, or is it considered an open-ended problem?

    Would love to see more progress in this area!

  • Meshroom

    3D Reconstruction Software

    Project mention: AI bots saying I made a fake post: NOT FAKE POST | /r/opensource | 2023-12-10

  • pytorch-grad-cam

    Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

    Project mention: Exploring GradCam and More with FiftyOne | 2024-02-13

    For the two examples we will be looking at, we will use pytorch_grad_cam, an incredible open-source package that makes working with Grad-CAM very easy. There are other excellent tutorials to check out on the repo as well.
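    The core of Grad-CAM itself is only a few lines: global-average-pool the gradients of the target score with respect to a convolutional feature map to get one weight per channel, then take a ReLU of the weighted sum of channels. A NumPy sketch of that computation (the feature maps and gradients below are random stand-ins; pytorch_grad_cam extracts the real ones from a network with hooks):

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from a (C, H, W) feature map and its gradients.

    weights_k = mean over (H, W) of dScore/dA_k;  cam = ReLU(sum_k w_k * A_k)
    """
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum of channel maps
    return np.maximum(cam, 0.0)                       # keep positive evidence only

rng = np.random.default_rng(0)
acts = rng.standard_normal((64, 7, 7))   # fake conv features
grads = rng.standard_normal((64, 7, 7))  # fake gradients of the class score
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (7, 7)
```

    The coarse heatmap is then upsampled to the input resolution and overlaid on the image, which is the visualization step the FiftyOne walkthrough covers.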

  • Kornia

    Geometric Computer Vision Library for Spatial AI

  • nerfstudio

    A collaboration-friendly studio for NeRFs

    Project mention: Smerf: Streamable Memory Efficient Radiance Fields | 2023-12-13

    You've found the right paper for doing this. Instead of one big model, they use several smaller ones for regions of the scene; this way rendering is fast even for large scenes.

    This is similar to Block-NeRF [0]; on their project page they show some videos of what you're asking about.

    As for an easy way of doing this, nothing out-of-the-box. You can keep an eye on nerfstudio [1], and if you feel brave you could implement this paper and make a PR!

  • RobustVideoMatting

    Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

    Project mention: lineart_coarse + openpose, batch img2img | /r/StableDiffusion | 2023-05-10

  • U-2-Net

    The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."

    Project mention: I used the ChatGPT API to create a proof-of-concept AI driven video game. Using generative AI for the images and dialogue and GPT-3.5 for narrative and game control. More info in comments. | /r/ChatGPT | 2023-06-17

    I use a finetuned custom Stable Diffusion model in combination with a style embedding for the characters for image generation and U²-Net for background removal.

  • deeplake

    Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow.

    Project mention: FLaNK AI Weekly 25 March 2025 | 2024-03-25

  • autogluon

    AutoGluon: AutoML for Image, Text, Time Series, and Tabular Data

    Project mention: pip install remyxai - easiest way to create custom vision models | /r/computervision | 2023-04-25

    This doesn't seem very convincing. There are other popular frameworks that provide AutoML with existing datasets (e.g.

  • BackgroundMattingV2

    Real-Time High-Resolution Background Matting

  • fiftyone

    The open-source tool for building high-quality datasets and computer vision models

    Project mention: How to Cluster Images | 2024-04-09

    With all that background out of the way, let’s turn theory into practice and learn how to use clustering to structure our unstructured data. We’ll be leveraging two open-source machine learning libraries: scikit-learn, which comes pre-packaged with implementations of most common clustering algorithms, and fiftyone, which streamlines the management and visualization of unstructured data:
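    The clustering step boils down to: embed each image as a vector, then group the vectors. scikit-learn's `KMeans` does this in one call; the algorithm it runs looks roughly like this NumPy sketch (the random "embeddings" here stand in for real model features):

```python
import numpy as np

def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain k-means: assign each vector to its nearest centroid, recompute, repeat."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every centroid, then nearest-centroid labels.
        d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(axis=0)
    return labels

# Two well-separated blobs of fake image embeddings:
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(5, 0.1, (20, 8))])
labels = kmeans(x, k=2)
print(len(set(labels[:20])), len(set(labels[20:])))  # 1 1: each blob is one cluster
```

    fiftyone's role in the walkthrough is the other half: attaching these labels back to the images so you can browse each cluster visually.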

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-09.

What are some of the best open-source Computer Vision projects in Python? This list will help you:

Project Stars
1 Face Recognition 51,634
2 pytorch-CycleGAN-and-pix2pix 21,904
3 EasyOCR 21,707
4 d2l-en 21,485
5 datasets 18,304
6 vit-pytorch 17,695
7 vision 15,324
8 supervision 13,740
9 facenet 13,475
10 labelme 12,216
11 fashion-mnist 11,439
12 gaussian-splatting 11,013
13 ludwig 10,753
14 Meshroom 10,518
15 pytorch-grad-cam 9,313
16 Kornia 9,288
17 nerfstudio 8,413
18 RobustVideoMatting 8,118
19 U-2-Net 8,045
20 deeplake 7,651
21 autogluon 7,028
22 BackgroundMattingV2 6,638
23 fiftyone 6,603