Python Statistics

Open-source Python projects categorized as Statistics

Top 23 Python Statistics Projects

  • scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Polars | | 2024-01-08

    sklearn is adding support through the dataframe interchange protocol. scipy, as far as I know, doesn't explicitly support dataframes (it just happens to work when you wrap a Series in `np.array` or `np.asarray`). I don't know about PyTorch, but in general you can convert to numpy.

  • ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

    Project mention: FLaNK 25 December 2023 | | 2023-12-26
  • statsmodels

    Statsmodels: statistical modeling and econometrics in Python

    Project mention: statsmodels Release Candidate 0.14.0rc0 tagged | /r/Python | 2023-04-26
  • imbalanced-learn

    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

    Project mention: What’s your approach to highly imbalanced data sets? | /r/datascience | 2023-05-26

    There's a plethora of undersampling and oversampling models you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turned out!
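As a rough illustration of the oversampling idea mentioned above, here is a minimal standard-library sketch of naive random oversampling: minority-class rows are duplicated at random until every class is as large as the biggest one. The function name `random_oversample` and the toy data are hypothetical; this is not the imbalanced-learn API, which ships its own, more complete samplers.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Naive random oversampling (illustrative only).

    Duplicates randomly chosen rows of each smaller class until
    every class has as many rows as the largest class.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw (with replacement) as many extra rows as the class is missing.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (made up): four rows of class 0, one row of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = [0, 0, 0, 0, 1]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

No original rows are dropped, which matches the post's point about not removing information. The cost is exact duplicates of minority rows, which is one reason synthetic approaches such as SMOTE are often tried instead.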

  • boltons

    🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

    Project mention: Boltons is a set of over 250 BSD-licensed, pure-Python utilities | | 2023-12-11
  • Tautulli

    A Python based monitoring and tracking tool for Plex Media Server.

    Project mention: I'm fine with the basics of Plex - now what can I do to really use plex to its full potential? | /r/PleX | 2023-12-09

    With Tautulli you have a better monitoring system than what Plex offers. Streaming history split by user, you can add notifications to a lot of services like Slack, email and so on. You can even create newsletters being sent out to users based on what was added to your server.

  • statsforecast

    Lightning ⚡️ fast forecasting with statistical and econometric models.

    Project mention: TimeGPT-1 | | 2023-10-13

    I can't find the TimeGPT-1 model.

    LICENSE Apache-2

    Mentions ARIMA, ETS, CES, and Theta modeling

  • sweetviz

    Visualize and compare datasets, target values and associations, with one line of code.

  • eiten

    Statistical and Algorithmic Investing Strategies for Everyone

  • github-stats

    Better GitHub statistics images for your profile, with stats from private repos too

    Project mention: Ask HN: How to Do a GitHub Wrapped? | | 2023-12-19

    I have done similar work using the GitHub APIs before. I recommend using their GraphQL explorer to develop your queries interactively. You may need to fall back on the REST API instead of the GraphQL one for certain stats.

    You can also refer to my code here, which may already collect some of the statistics you're interested in.

    I predict the most annoying part of this project will be dealing with authentication. There are a handful of ways to do it, and the permissions can be finicky depending on what data you are fetching.

    Best of luck!
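To make the GraphQL suggestion in the post above more concrete, here is a minimal sketch that builds an authenticated request for GitHub's GraphQL endpoint using only the standard library. The query fields, the helper names `build_request` and `fetch_stats`, and the token value are assumptions chosen for illustration; this is not the code the post refers to, and the token scopes you need depend on the data you fetch.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Illustrative query: the viewer's login plus star counts for owned repos.
QUERY = """
query($count: Int!) {
  viewer {
    login
    repositories(first: $count, ownerAffiliations: OWNER) {
      totalCount
      nodes { name stargazerCount }
    }
  }
}
"""


def build_request(token, count=10):
    """Build (but do not send) a POST request for the GraphQL endpoint."""
    payload = json.dumps({"query": QUERY, "variables": {"count": count}})
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_stats(token, count=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(token, count)) as resp:
        return json.load(resp)
```

Calling `fetch_stats("<your token>")` would perform the network request. Keeping request construction separate from sending makes the authentication part, which the post singles out as the finicky bit, easy to inspect and test offline.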

  • pgmpy

    Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

  • lifetimes

    Lifetime value in Python

  • pycm

    Multi-class confusion matrix library in Python

    Project mention: PyCM 4.0 Released: Multilabel Confusion Matrix Support | /r/coolgithubprojects | 2023-06-07
  • uncertainty-baselines

    High-quality implementations of standard and SOTA methods on a variety of tasks.

  • maloja

    Self-hosted music scrobble database to create personal listening statistics and charts

    Project mention: Can I get charged for having FLAC content on my server hosted at Germany? | /r/selfhosted | 2023-07-07

    With that recommendation, I want to add Maloja to the mix (a scrobble database that works great with Navidrome).

  • hierarchicalforecast

    Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods.

    Project mention: [D] When less is more in the hierarchical forecasting case. | /r/MachineLearning | 2023-07-03
  • popmon

    Monitor the stability of a Pandas or Spark dataframe ⚙︎

  • sportsipy

    A free sports API written for python

    Project mention: I’ve been struggling with organizing projects and utilizing classes so I’ve been looking for public projects I can study | /r/learnpython | 2023-04-05
  • pypinfo

    Easily view PyPI download statistics via Google's BigQuery.

  • fitter

    Fit data to many distributions

    Project mention: I recently discovered the python package 'fitter', which is a really nifty package for fitting various data distributions. Has anyone discovered any other cool packages that the field would find useful? | /r/datascience | 2023-04-17

    Fitter: cokelaer/fitter: Fit data to many distributions

  • meteostat-python

    Access and analyze historical weather and climate data with Python.

    Project mention: Historical weather data | /r/croatia | 2023-06-15

    Try:

  • Contributions-Importer-For-Github

    This tool helps users to import contributions to GitHub from private git repositories, or from public repositories that are not hosted in GitHub.

  • github-repo-stats

    GitHub Action for advanced repository traffic analysis and reporting

    Project mention: How I Fixed GitHub's Repo Traffic Insights 🛠️ 📊 | | 2023-12-03

    Within the discussion, I came across a GitHub action tool that fetches traffic data and stores it in a CSV file, also generating a PDF report:

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-01-08.

What are some of the best open-source Statistics projects in Python? This list will help you:

Rank Project Stars
1 scikit-learn 57,481
2 ydata-profiling 11,837
3 statsmodels 9,331
4 imbalanced-learn 6,642
5 boltons 6,376
6 Tautulli 5,267
7 statsforecast 3,403
8 sweetviz 2,789
9 eiten 2,655
10 github-stats 2,640
11 pgmpy 2,581
12 lifetimes 1,418
13 pycm 1,417
14 uncertainty-baselines 1,337
15 maloja 892
16 hierarchicalforecast 490
17 popmon 481
18 sportsipy 462
19 pypinfo 390
20 fitter 347
21 meteostat-python 341
22 Contributions-Importer-For-Github 326
23 github-repo-stats 273