Python Machinelearning

Open-source Python projects categorized as Machinelearning

Top 23 Python Machinelearning Projects

  • horovod

    Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

    Project mention: [D] What is the recommended approach to training NN on big data set? | 2022-12-08

    And in case scaling is really important to you, may I suggest you look into Horovod?

  • ludwig

    Data-centric declarative deep learning framework

  • vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

    Project mention: preprocessing millions of records - how to speed up the processing | 2022-06-03

    Try vaex. It uses lazy evaluation and parallel calculations, so you should be fine.

  • clearml

    ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

    Project mention: Is there any workflow orchestrator that is Hydra friendly? | 2022-06-16

  • igel

    a delightful machine learning tool that allows you to train, test, and use models without writing code

  • tslearn

    A machine learning toolkit dedicated to time-series data

    Project mention: tslearn: NEW Data - star count:2325.0 | 2022-12-31

  • marqo

    Tensor search for humans.

    Project mention: From “iron manual” to “Iron Man” – augmenting GPT with a fast editable memory | 2023-02-08

  • nannyml

    Detecting silent model failure. NannyML estimates performance for regression and classification models using tabular data. It alerts you when and why it changed. It is the only open-source library capable of fully capturing the impact of data drift on performance.

    Project mention: [HIRING][Full Time, Part Time, Temporary, Internship, Freelance] Data Science Intern (Remote) | 2022-05-20

    Description: NannyML, the creators of an open-source Python library, are looking for multiple Data Science interns to help across research, prototyping, and product. Github: About Us NannyML is an Open Source Python lib …

  • deepsparse

    Inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application

    Project mention: [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? | 2022-10-28

    For 1), what is the easiest way to speed up inference (assume only PyTorch and primarily GPU but also some CPU)? I have been using ONNX and Torchscript but there is a bit of a learning curve and sometimes it can be tricky to get the model to actually work. Is there anything else worth trying? I am enthused by things like TorchDynamo (although I have not tested it extensively) due to its apparent ease of use. I also saw the post yesterday about Kernl using (OpenAI) Triton kernels to speed up transformer models which also looks interesting. Are things like SageMaker Neo or NeuralMagic worth trying? My only reservation with some of these is they still seem to be pretty model/architecture specific. I am a little reluctant to put much time into these unless I know others have had some success first.

  • nsfw_model

    Keras model of NSFW detector

    Project mention: Any suggestions for client side or API content moderation tools for image uploads | 2023-01-03

    I did some tests myself, and the results look very accurate. The model I use has 93% accuracy and has been trained for days on over 60 GB of data.

  • pytorch2keras

    PyTorch to Keras model converter

  • fal

    do more with dbt. fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.

    Project mention: Dbt-fal: a dbt Python adapter with local code execution | 2023-01-12

    We built a dbt adapter that helps you run local Python code with your dbt project with any other data warehouse. You can see it here:

    This new adapter helps you run your dbt Python models with isolated Python environments using our open source library:

  • retentioneering-tools

    Retentioneering: product analytics, data-driven customer journey map optimization, marketing analytics, web analytics, transaction analytics, graph visualization, and behavioral segmentation with customer segments in Python. Opensource analytics, predictive analytics over clickstream, sentiment analysis, AB tests, machine learning, and Monte Carlo Markov Chain simulations, extending Pandas, Networkx and sklearn.

    Project mention: My Favorite Off-the-Shelf Data Science Repos, What Are Yours? | 2022-06-22

    Here are my top off-the-shelf data science models for marketing. I would be interested to hear which other marketing data science tools you use.

    Product Recommendation on Your Website with Metarank

    Metarank is a tool that helps you easily build an advanced recommendation engine for your products or content on your website. To get started you only need historical performance data of your products (e.g. number of clicks) and additional metadata like product rating, genre, ingredients or price. In a YAML file, you define the features and the model parameters (e.g. number of iterations, modeling technique). The API service integrates with Apache Flink and can be easily integrated into Kubernetes clusters.

    User Journey Analysis on Your Website with Retentioneering

    Retentioneering helps you understand the user journey on your website. It is a Python library that lets you easily connect your Google Analytics data (in BigQuery). You define the user ID, event type, and timestamp. From this input, a comprehensive graph network is created that shows gains and losses, as you know them from a customer journey. In addition, customer segments are created that share a similar customer journey. This reduces the complexity of a purely descriptive view of the data.

    Marketing Mix Modeling with Robyn

    Fewer third-party cookies mean fewer attribution models. The answer to this is Marketing Mix Modeling. Marketing mix models are regression models that use statistical probability to calculate the effect size of marketing channels and other independent variables. The advantage is that business context can be modeled much more realistically. For example, Google searches for your own brand can be integrated to determine how much of the revenue is due to brand strength. Likewise, offline advertising measures can be modeled with other metrics in this context (e.g. offline advertising with GRPs). Robyn takes into account adstock effects, ROAS calculation, and multicollinearity in the marketing channels. In addition, with simple functionality, budgets can be optimized using the predictions, and results from marketing tests can be integrated into the model for calibration.
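
    The adstock effect mentioned above is the idea that advertising keeps working for some time after the money is spent. A minimal sketch of one common form, geometric adstock, is shown here. It is not Robyn's own implementation, and the function name, the `decay` parameter, and the sample spend figures are assumptions made for illustration.

```python
def geometric_adstock(spend, decay):
    """Return the carried-over advertising effect per period.

    Illustrative helper (not part of Robyn): each period keeps a
    fraction `decay` of the previous period's accumulated effect.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    carried = 0.0
    result = []
    for value in spend:
        # Current effect = new spend + decayed effect from earlier periods.
        carried = value + decay * carried
        result.append(carried)
    return result


# Hypothetical weekly spend for one channel.
weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
print(adstocked)
```

    In a marketing mix model, a transformed series like `adstocked` would typically replace the raw spend as the regression input for that channel.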

  • MetaSpore

    A unified end-to-end machine intelligence platform

    Project mention: Quickly develop risk control algorithms in business scenarios based on MetaSpore | 2022-06-15

    The evaluation problems related to financial loans are mainly based on tabular data, so the importance of feature engineering is self-evident. The common features in the dataset include ID, categorical, and continuous numeric types, which require common data handling such as EDA, missing value completion, outlier processing, normalization, feature binning, and importance assessment. For the process, see the tianchi_loan instructions in the GitHub codebase.
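
    Three of the handling steps listed above (missing value completion, normalization, and feature binning) can be sketched in plain Python. This is a generic illustration and not MetaSpore code. The helper names, the choice of median imputation, min-max scaling, and equal-width bins, and the sample `incomes` column are all assumptions.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the known ones."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous feature with one missing value.
incomes = [20.0, None, 40.0, 60.0, 100.0]
filled = impute_median(incomes)
scaled = min_max_normalize(filled)
binned = equal_width_bins(filled, n_bins=4)
print(filled, scaled, binned)
```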

  • LiuAlgoTrader

    Framework for algorithmic trading

  • deep-significance

    Enabling easy statistical significance testing for deep neural networks.

  • covalent

    Pythonic tool for running data-science/high performance/quantum-computing workflows in heterogeneous environments. (by AgnostiqHQ)

    Project mention: Show HN: Covalent – distributed computing for ML, HPC and Quantum (open source) | 2022-11-09

  • CodeRL

    This is the official code for the paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (NeurIPS22).

    Project mention: [D] Most important AI Paper´s this year so far in my opinion + Proto AGI speculation at the end | 2022-08-14

    CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning Paper: Github:

  • yolo-hand-detection

    A pre-trained YOLO based hand detection network.

  • zoofs

    zoofs is a Python library for performing feature selection using a variety of nature-inspired wrapper algorithms. The algorithms range from swarm-intelligence to physics-based to evolutionary. It is an easy-to-use, flexible, and powerful tool to reduce your feature size.

    Project mention: [D] Feature engineering automation? | 2022-04-27

    And as you described, you will end up with a lot of features. For feature selection, zoofs is wrapper-based, so you'll be able to select features purely on performance if you have a healthy test set or if you perform cross-validation.
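
    The wrapper idea described above (score each candidate feature subset by how well a model performs on it) can be sketched without any library. This is not the zoofs API: zoofs searches with nature-inspired algorithms, while this sketch tries every subset exhaustively for clarity and scores each one with leave-one-out cross-validation of a 1-nearest-neighbour classifier. All names and the toy data are hypothetical.

```python
from itertools import combinations


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given feature indices."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the row being predicted
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def select_features(rows, labels):
    """Try every non-empty feature subset and keep the best-scoring one.
    Ties go to the subset found first, which is the smallest."""
    n_features = len(rows[0])
    best_subset, best_score = None, -1.0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            score = loo_accuracy(rows, labels, subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data: feature 0 separates the classes, feature 1 is noise.
rows = [(0.0, 9.0), (0.1, 1.0), (0.2, 5.0), (1.0, 1.2), (1.1, 9.1), (1.2, 5.1)]
labels = [0, 0, 0, 1, 1, 1]
best_subset, best_score = select_features(rows, labels)
print(best_subset, best_score)
```

    Exhaustive search grows exponentially with the number of features, which is why wrapper libraries rely on heuristic search strategies instead.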

  • hydra-zen

    Pythonic functions for creating and enhancing Hydra applications

    Project mention: [Project] I built a minimal stateless ML project template built on my current favourite stack | 2023-02-02

    It provides mature configuration support via Hydra-Zen and automates configuration generation via decorators implemented in this repo.

  • sagemaker-explaining-credit-decisions

    Amazon SageMaker Solution for explaining credit decisions.

  • fastchess

    Predicts the best chess move with 27.5% accuracy by a single matrix multiplication

    Project mention: fastchess VS Synergy-Chess - a user suggested alternative | 2022-06-18

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-02-08.

What are some of the best open-source Machinelearning projects in Python? This list will help you:

 #  Project                                 Stars
 1  horovod                                12,981
 2  ludwig                                  8,744
 3  vaex                                    7,732
 4  clearml                                 4,064
 5  igel                                    3,024
 6  tslearn                                 2,375
 7  marqo                                   2,215
 8  nannyml                                 1,373
 9  deepsparse                              1,257
10  nsfw_model                              1,089
11  pytorch2keras                             831
12  fal                                       658
13  retentioneering-tools                     596
14  MetaSpore                                 585
15  LiuAlgoTrader                             476
16  deep-significance                         276
17  covalent                                  274
18  CodeRL                                    254
19  yolo-hand-detection                       210
20  zoofs                                     170
21  hydra-zen                                 155
22  sagemaker-explaining-credit-decisions      85
23  fastchess                                  73