Python AI

Open-source Python projects categorized as AI

Top 23 Python AI Projects

  • MockingBird

    🚀 AI voice cloning: clone a voice in 5 seconds and generate arbitrary speech content in real time

    Project mention: TIL cyber criminals, with the help of A.I. voice cloning software, used a deepfaked voice of a company executive to fool an Emirati bank manager into transferring 35 million dollars into their personal accounts. The bank manager had recognized the executive's voice from having worked with him before. | reddit.com/r/todayilearned | 2022-09-12

    Actually, there are already open source implementations available, for example, the MockingBird project on GitHub. It supports English and Mandarin Chinese. If you have enough computing power and are willing to try, you can even make your own voice dataset and train the model to generate "your" voice by simply following the project docs.

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Looking for open source projects in Machine Learning and Data Science | reddit.com/r/ArtificialInteligence | 2023-02-06

    You could try spaCy. This is the brains of the operation: an open-source library for advanced NLP in Python. Another is DocArray: it's built on top of NumPy and Dask, and it's good for preprocessing, modeling, and analysis of text data.

  • lightning

    Deep learning framework to train, deploy, and ship AI products Lightning fast.

    Project mention: PyTorch Lightning – DL framework to train, deploy, and ship AI fast | news.ycombinator.com | 2023-01-29

  • MLflow

    Open source platform for the machine learning lifecycle

    Project mention: ML experiment tracking with DagsHub, MLFlow, and DVC | dev.to | 2023-01-12

    Here, we’ll implement the experimentation workflow using DagsHub, Google Colab, MLflow, and data version control (DVC). We’ll focus on how to do this without diving deep into the technicalities of building or designing a workbench from scratch. Going that route might increase the complexity involved, especially if you are in the early stages of understanding ML workflows, just working on a small project, or trying to implement a proof of concept.
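
    At its core, the experimentation workflow above is about recording each run's parameters and metrics so that runs can be compared later. As a rough, dependency-free sketch of that idea (this is not MLflow's API; the `ExperimentLog` class, its methods, and the JSON-lines file format are hypothetical), a tracker could look like this:

```python
import json
import tempfile
import uuid
from pathlib import Path


class ExperimentLog:
    """Toy experiment tracker (hypothetical, for illustration only).

    Each run is appended as one JSON object per line, holding the run's
    parameters and metrics, so runs can later be compared.
    """

    def __init__(self, path):
        self.path = Path(path)

    def log_run(self, params, metrics):
        # Append one record per run; the file is created on first use.
        record = {"run_id": uuid.uuid4().hex, "params": params, "metrics": metrics}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return record["run_id"]

    def runs(self):
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def best_run(self, metric, higher_is_better=True):
        # Pick the run with the best value of `metric`, ignoring runs without it.
        candidates = [r for r in self.runs() if metric in r["metrics"]]
        if not candidates:
            return None
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r["metrics"][metric])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = ExperimentLog(Path(tmp) / "runs.jsonl")
        log.log_run({"lr": 0.1}, {"accuracy": 0.81})
        log.log_run({"lr": 0.01}, {"accuracy": 0.87})
        print(log.best_run("accuracy")["params"])  # {'lr': 0.01}
```

    For comparison, MLflow's own tracking API records the same kind of information through calls such as `mlflow.log_param` and `mlflow.log_metric` inside an `mlflow.start_run()` context, and adds a UI for browsing and comparing runs.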

  • dvc

    🦉Data Version Control | Git for Data & Models | ML Experiments Management

    Project mention: [Discussion] Github like alternative for ML? | reddit.com/r/MachineLearning | 2023-01-26

    Have you checked https://dvc.org/ ?
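
    The "Git for data" idea rests on content addressing: a large file is stored in a cache under a hash of its contents, and only a small pointer record is tracked in Git. The sketch below illustrates that idea with the standard library only; `add_to_cache`, `checkout`, the pointer dictionary, and the two-level cache layout are illustrative assumptions, not DVC's code or file format.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def _cache_path(cache_dir, digest):
    # Split the hash into a short directory prefix and the remainder.
    return Path(cache_dir) / digest[:2] / digest[2:]


def add_to_cache(data_path, cache_dir):
    """Copy a file into a content-addressed cache and return a small pointer.

    The pointer (hash + file name) is what would be committed to Git;
    the bulky data itself lives in the cache, keyed by its hash.
    """
    digest = file_md5(data_path)
    target = _cache_path(cache_dir, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(data_path, target)
    return {"md5": digest, "path": Path(data_path).name}


def checkout(pointer, cache_dir, dest_dir):
    """Restore the file version a pointer refers to from the cache."""
    dest = Path(dest_dir) / pointer["path"]
    shutil.copyfile(_cache_path(cache_dir, pointer["md5"]), dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"
        data.write_text("x,y\n1,2\n")
        print(add_to_cache(data, Path(tmp) / "cache"))  # the record you would commit
```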

  • ColossalAI

    Colossal-AI: A Unified Deep Learning System for the Big Model Era

    Project mention: An Open-Source Version of ChatGPT is Coming [News] | reddit.com/r/MachineLearning | 2022-12-31

    Need to deploy the inference model with Colossal AI.

  • frigate

    NVR with realtime local object detection for IP cameras

    Project mention: Inexpensive and decent hardware/software to run AI security cams? | reddit.com/r/HomeServer | 2023-02-08

    What do you think would be the best hardware and software for this if we go this route? I've been reading up on Frigate and Google Coral, but I am unclear on how it all fits together, the hardware required, etc., if that makes sense. And then what about an Android app to send notifications and view footage? Or would I set up Postfix to send emails as notifications? I am unsure how this would all work.

  • haystack

    🔍 Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-3 and the like). Haystack offers production-ready tools to quickly build ChatGPT-like question answering, semantic search, text generation, and more.

    Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | reddit.com/r/LanguageTechnology | 2022-10-28

    There are some cool tools, like Haystack, that would be useful for putting some of these together.
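
    Question-answering pipelines of the kind Haystack builds typically pair a retriever, which shortlists relevant documents, with a reader model that extracts the answer from them. As a minimal, dependency-free sketch of the retriever half only, assuming plain TF-IDF scoring (Haystack provides its own retrievers, including dense ones; `TfidfRetriever` here is hypothetical):

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Toy TF-IDF retriever (illustrative; not Haystack's implementation)."""

    def __init__(self, docs):
        self.docs = docs
        self.doc_tokens = [Counter(tokenize(d)) for d in docs]
        n = len(docs)
        # Document frequency: iterating a Counter yields each distinct term once.
        df = Counter(term for toks in self.doc_tokens for term in toks)
        # Smoothed inverse document frequency: rarer terms weigh more.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def score(self, query, i):
        toks = self.doc_tokens[i]
        total = sum(toks.values()) or 1
        # Sum of term frequency * IDF over the query terms (missing terms add 0).
        return sum((toks[t] / total) * self.idf.get(t, 0.0) for t in tokenize(query))

    def retrieve(self, query, top_k=2):
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: self.score(query, i), reverse=True)
        return [self.docs[i] for i in ranked[:top_k]]
```

    In a full pipeline, the few documents returned by `retrieve` would then be handed to a Transformer reader, which is the expensive step the retriever exists to keep small.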

  • RobustVideoMatting

    Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

    Project mention: CatFileCreator in Nuke | reddit.com/r/vfx | 2022-10-10

    I have done a bit of coding, and I will use pretrained models only. I'm looking at things like depth and segmentation, like this as an example. I am using it on Colab now, but it's so cumbersome. https://github.com/PeterL1n/RobustVideoMatting
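
    A matting model produces a foreground estimate and an alpha matte for each frame; placing the subject onto a new background is then the standard compositing step, C = alpha * F + (1 - alpha) * B. A minimal pure-Python sketch, assuming images are nested lists of RGB tuples with channels in [0, 1] (real pipelines would use NumPy arrays or GPU tensors):

```python
def composite(fg, alpha, bg):
    """Blend a foreground over a new background using a per-pixel alpha matte.

    fg, bg: rows of (r, g, b) tuples with channels in [0, 1].
    alpha:  rows of floats in [0, 1]; 1 = fully foreground, 0 = fully background.
    Implements the matting equation C = alpha * F + (1 - alpha) * B.
    """
    if not (len(fg) == len(alpha) == len(bg)):
        raise ValueError("fg, alpha and bg must have the same height")
    out = []
    for f_row, a_row, b_row in zip(fg, alpha, bg):
        if not (len(f_row) == len(a_row) == len(b_row)):
            raise ValueError("fg, alpha and bg must have the same width")
        out.append([
            tuple(a * f + (1 - a) * b for f, b in zip(f_px, b_px))
            for f_px, a, b_px in zip(f_row, a_row, b_row)
        ])
    return out
```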

  • cookiecutter-data-science

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

    Project mention: Questions about Cookiecutter and Anaconda. | reddit.com/r/datascience | 2022-12-30

    I opened an Anaconda cmd window and ran `cookiecutter https://github.com/drivendata/cookiecutter-data-science`. I answered all prompted questions. After searching for a while, I found where the project folder was created. However, how do I get this on GitHub? The only thing I can figure out is to create a brand new repo on GitHub with the exact same name, open it in GitHub Desktop, click "show in explorer", and then drag and drop all files from the Cookiecutter folder into the GitHub Desktop folder. However, to me this does not sound like the intended way to create a new project and put it on GitHub.

  • metaflow

    🚀 Build and manage real-life data science projects with ease!

    Project mention: [OC] Gender diversity in Tech companies | reddit.com/r/dataisbeautiful | 2023-01-16

    They had to figure out video compression that worked at the volume they wanted to deliver. They had to build and maintain their own CDN to have an always-available and consistent viewing experience. Don’t even get me started on the resiliency tools like Hystrix that they were kind enough to open source. I mean, they have their own fucking data science framework, and they’re looking into using neural networks to downscale video. Sound familiar? That’s because it's practically the same thing as Nvidia’s DLSS (which upscales instead of downscaling).

  • mycroft-core

    Mycroft Core, the Mycroft Artificial Intelligence platform.

    Project mention: Sundar Pichai: An important next step on our AI journey | news.ycombinator.com | 2023-02-06

    The AI I'm interested in is of this sort https://mycroft.ai/ i.e. where I run and control it locally.

    I don't want to go on any more "journeys" with Google. The last one started with me rooting for and trusting them (circa the IPO, 2004?) and ended with a dystopian nightmare spy apparatus and abuses like AMP.

  • dream-textures

    Stable Diffusion built into Blender

    Project mention: Are there jobs that have intersections between graphics and machine learning? | reddit.com/r/GraphicsProgramming | 2023-02-03

    Texture/material generation

  • snorkel

    A system for quickly generating training data with weak supervision

    Project mention: [Discussion] - "data sourcing will be more important than model building in the era of foundational model fine-tuning" | reddit.com/r/MachineLearning | 2022-12-03
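
    Weak supervision replaces hand labels with many noisy heuristics ("labeling functions") that each vote on a label or abstain, and then combines those votes. Snorkel learns how to weight the votes with a label model; the sketch below swaps in plain majority voting, the simplest baseline, and its spam heuristics and function names are hypothetical:

```python
from collections import Counter

ABSTAIN = -1
HAM, SPAM = 0, 1


# Hypothetical labeling functions: cheap heuristics that may be wrong or abstain.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_says_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_short_reply(text):
    return HAM if len(text.split()) < 5 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_says_free, lf_short_reply]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie: leave the example unlabeled
    return counts[0][0]
```

    Examples that end up labeled become (noisy) training data for a downstream model; a learned label model improves on this by estimating how accurate each labeling function is instead of trusting all votes equally.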

  • deeplake

    Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data in real time to PyTorch/TensorFlow. https://activeloop.ai

    Project mention: Launch HN: Activeloop (YC S18) – Data lake for deep learning | news.ycombinator.com | 2022-11-15

    Re: HF - we know them and admire their work (primarily, until very recently, focused on NLP, while we focus mostly on CV). As mentioned in the post, a large part of Deep Lake, including the Python-based dataloader and dataset format, is open source as well - https://github.com/activeloopai/deeplake.

    Likewise, we curate a list of large open source datasets here -> https://datasets.activeloop.ai/docs/ml/, but our main thing isn't aggregating datasets (focus for HF datasets), but rather providing people with a way to manage their data efficiently. That being said, all of the 125+ public datasets we have are available in seconds with one line of code. :)

    We haven't benchmarked against HF datasets in a while, but Deep Lake's dataloader is much, much faster in third-party benchmarks (see https://arxiv.org/pdf/2209.13705, and, for an older version that was much slower than what we have now, https://pasteboard.co/la3DmCUR2iFb.png). Under the hood, HF uses Git-LFS (to the best of my knowledge) and is not opinionated on formats, so LAION just dumps Parquet files on their storage.

    While your setup would work for a few TBs, scaling to PBs would be tricky, including maintaining your own infrastructure. And yep, as you said, NAS/NFS would not be able to handle that scale either (especially writes with 1k workers). I am also slightly curious about your use of mmap files with compressed image/video data (zero-copy won’t happen unless you decompress inside the GPU ;)), but would love to learn more from you! Re: pricing, thanks for the feedback; storage is one component, and it is custom-priced for PB-scale workloads.

  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    Project mention: A Smart, Automatic, Fast and Lightweight Web Scraper for Python | reddit.com/r/webdev | 2022-12-02
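
    Autoscraper's pitch is that you give it a page plus a few example values you want, and it learns how to find similar values elsewhere. The following standard-library sketch illustrates that learn-by-example idea in a much cruder form, keying only on each text's enclosing tag and `class` attribute; it is not autoscraper's algorithm, and `learn_rules` and `scrape` are hypothetical names:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _TextCollector(HTMLParser):
    """Record the (tag, class) of the innermost element around each text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []  # list of ((tag, class), text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class", "")))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.items.append((self.stack[-1], text))


def _collect(html):
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def learn_rules(html, wanted):
    """Return the (tag, class) signatures of elements whose text is in `wanted`."""
    return {sig for sig, text in _collect(html) if text in wanted}


def scrape(html, rules):
    """Extract every text whose enclosing element matches a learned rule."""
    return [text for sig, text in _collect(html) if sig in rules]
```

    Learning from one example ("Python 101") yields the rule "text inside `li` elements with class `title`", which can then be applied to other pages with the same layout.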

  • BentoML

    Unified Model Serving Framework 🍱

    Project mention: Ask HN: Who is hiring? (November 2022) | news.ycombinator.com | 2022-11-01

  • clearml

    ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

    Project mention: Is there any workflow orchestrator that is Hydra friendly ? | reddit.com/r/mlops | 2022-06-16

  • adversarial-robustness-toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

    Project mention: [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? | reddit.com/r/MachineLearning | 2023-01-22
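
    Evasion attacks, one of the categories ART covers, perturb an input slightly so that a trained model misclassifies it. As a hedged illustration of the classic Fast Gradient Sign Method (FGSM), here is a from-scratch version against a hand-set logistic-regression model; it does not use ART (which ships its own implementation as `FastGradientMethod`), and the weights and inputs in the usage are made up:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def _sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method against a logistic-regression model.

    For binary cross-entropy loss with label y in {0, 1}, the gradient of the
    loss with respect to the input is (p - y) * w, so each feature is moved by
    eps in the direction that increases the loss.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]
```

    With `w = [2.0, -1.0]`, `b = 0.0`, and `x = [0.5, 0.2]`, the model predicts class 1, while `fgsm(w, b, x, 1, 0.5)` shifts each feature by 0.5 and flips the prediction to class 0. Defenses (the "blue team" side) aim to make such small perturbations ineffective.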

  • Automagica

    AI-powered Smart Robotic Process Automation 🤖

  • thinc

    🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

    Project mention: Tinygrad: A simple and powerful neural network framework | news.ycombinator.com | 2022-11-03

    I love those tiny DNN frameworks, some examples that I studied in the past (I still use PyTorch for work related projects) :

    thinc, by the creators of spaCy: https://github.com/explosion/thinc
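
    A "functional take" on deep learning means layers are plain functions: a forward pass returns its output together with a callback that performs the matching backward pass, and combinators compose layers. The toy below mimics that pattern on scalars with `linear`, `relu`, and `chain` helpers written from scratch here; it is a sketch of the idea under those assumptions, not Thinc's actual classes or API:

```python
def linear(w, b):
    """Scalar affine layer y = w * x + b, returned as a forward function.

    Each forward call returns the output and a callback that maps the
    gradient of the output back to the gradient of the input.
    """
    def forward(x):
        def backprop(dy):
            return dy * w
        return w * x + b, backprop
    return forward


def relu():
    def forward(x):
        def backprop(dy):
            return dy if x > 0 else 0.0
        return max(x, 0.0), backprop
    return forward


def chain(*layers):
    """Compose layers left to right; run their backprop callbacks in reverse."""
    def forward(x):
        callbacks = []
        for layer in layers:
            x, cb = layer(x)
            callbacks.append(cb)

        def backprop(dy):
            for cb in reversed(callbacks):
                dy = cb(dy)
            return dy
        return x, backprop
    return forward
```

    For example, `chain(linear(2.0, 1.0), relu(), linear(3.0, 0.0))` applied to `1.0` returns `9.0` and a callback whose value at `1.0` is the input gradient `6.0`. Because each callback closes over exactly the values it needs, no global tape or graph object is required.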

  • zenml

    ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.

    Project mention: [P] I reviewed 50+ open-source MLOps tools. Here’s the result | reddit.com/r/MachineLearning | 2022-05-29

    Currently, you can see the integrations we support here, and they include a lot of the tools in your list. I also agree with your categorization (it is pretty much exactly the categorization we use in our docs). One thing missing might be feature stores, but that is a minor thing in the bigger picture.

  • pytorch-forecasting

    Time series forecasting with PyTorch

    Project mention: LSTM/CNN architectures for time series forecasting[Discussion] | reddit.com/r/MachineLearning | 2022-05-06

    Pytorch-forecasting
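
    Whatever the architecture (LSTM, CNN, or the models pytorch-forecasting provides), forecasting is usually framed as supervised learning over sliding windows: a fixed-length history as input and the next few steps as the target. A minimal sketch of that windowing in plain Python; the parameter names `encoder_length` and `horizon` are illustrative choices (pytorch-forecasting's `TimeSeriesDataSet` exposes similar settings, such as `max_encoder_length` and `max_prediction_length`):

```python
def make_windows(series, encoder_length, horizon):
    """Split a series into (history, target) pairs for supervised forecasting.

    Each sample uses `encoder_length` consecutive past values as input and the
    following `horizon` values as the prediction target. Series that are too
    short for a single window yield an empty list.
    """
    if encoder_length < 1 or horizon < 1:
        raise ValueError("encoder_length and horizon must be positive")
    samples = []
    last_start = len(series) - encoder_length - horizon
    for start in range(last_start + 1):
        history = series[start:start + encoder_length]
        target = series[start + encoder_length:start + encoder_length + horizon]
        samples.append((history, target))
    return samples
```

    For instance, `make_windows([1, 2, 3, 4, 5, 6], 3, 2)` produces `([1, 2, 3], [4, 5])` and `([2, 3, 4], [5, 6])`; those pairs are what an LSTM or CNN would be trained on.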

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-02-08.

Index

What are some of the best open-source AI projects in Python? This list will help you:

 #  Project                          Stars
 1  MockingBird                     26,089
 2  spaCy                           25,158
 3  lightning                       21,465
 4  MLflow                          13,574
 5  dvc                             11,043
 6  ColossalAI                       8,254
 7  frigate                          6,963
 8  haystack                         6,698
 9  RobustVideoMatting               6,609
10  cookiecutter-data-science        6,474
11  metaflow                         6,367
12  mycroft-core                     6,124
13  dream-textures                   5,724
14  snorkel                          5,390
15  deeplake                         5,197
16  autoscraper                      4,922
17  BentoML                          4,512
18  clearml                          4,064
19  adversarial-robustness-toolbox   3,447
20  Automagica                       2,729
21  thinc                            2,662
22  zenml                            2,648
23  pytorch-forecasting              2,561