Top 23 Python Statistics Projects

scikit-learn

81 58,046 9.9 Python

scikit-learn: machine learning in Python

Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09

Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).

ydata-profiling

43 12,022 8.5 Python

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26

statsmodels

8 9,534 9.4 Python

Statsmodels: statistical modeling and econometrics in Python

Project mention: statsmodels Release Candidate 0.14.0rc0 tagged | /r/Python | 2023-04-26

imbalanced-learn

1 6,697 7.4 Python

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

Project mention: What’s your approach to highly imbalanced data sets? | /r/datascience | 2023-05-26

There's a plethora of undersampling and oversampling methods you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turned out!
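As a rough illustration of the oversampling idea mentioned above, here is a minimal pure-Python sketch of random oversampling: minority-class rows are duplicated at random until every class is as large as the biggest one. The function name `random_oversample` and the toy data are hypothetical and are not part of imbalanced-learn, which ships its own samplers (for example `RandomOverSampler` and `SMOTE`).

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest class.

    Hypothetical helper for illustration only; not the imbalanced-learn API.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows that originally belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Add as many random duplicates as needed to reach the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy dataset: four samples of class 0, one sample of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = [0, 0, 0, 0, 1]

new_rows, new_labels = random_oversample(rows, labels)
balanced_counts = Counter(new_labels)
print(balanced_counts)
```

Duplicating rows keeps all original information but can encourage overfitting to the repeated samples, which is one reason synthetic approaches such as SMOTE are often tried as well.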

boltons

1 6,415 6.6 Python

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Project mention: Boltons is a set of over 250 BSD-licensed, pure-Python utilities | news.ycombinator.com | 2023-12-11

Tautulli

419 5,361 8.3 Python

A Python based monitoring and tracking tool for Plex Media Server.

Project mention: I'm fine with the basics of Plex - now what can I do to really use plex to it's full potential? | /r/PleX | 2023-12-09

With Tautulli you get a better monitoring system than what Plex offers: streaming history split by user, plus notifications to many services such as Slack and email. You can even create newsletters that are sent out to users based on what was added to your server.

statsforecast

58 3,540 8.9 Python

Lightning ⚡️ fast forecasting with statistical and econometric models.

Project mention: TimeGPT-1 | news.ycombinator.com | 2023-10-13

I can't find the TimeGPT-1 model.
LICENSE Apache-2
https://github.com/Nixtla/statsforecast/blob/main/LICENSE
Mentions ARIMA, ETS, CES, and Theta modeling

sweetviz

1 2,833 6.7 Python

Visualize and compare datasets, target values and associations, with one line of code.

github-stats

5 2,713 9.5 Python

Better GitHub statistics images for your profile, with stats from private repos too

Project mention: Ask HN: How to Do a GitHub Wrapped? | news.ycombinator.com | 2023-12-19

I have done similar work using the GitHub APIs before. I recommend using their GraphQL explorer to develop your queries interactively. You may need to fall back on the REST API instead of the GraphQL one for certain stats.
https://docs.github.com/en/graphql/overview/explorer
You can also refer to my code here, which may already collect some of the statistics you're interested in.
https://github.com/jstrieb/github-stats/blob/master/github_s...
I predict the most annoying part of this project will be dealing with authentication. There are a handful of ways to do it, and the permissions can be finicky depending on what data you are fetching.
Best of luck!
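The GraphQL approach described in the comment above might look roughly like the following sketch, which only builds the HTTP request and does not send it. The login `octocat` and the token string are placeholders, and `build_request` is a hypothetical helper; the query asks for a user's total contributions through the `contributionsCollection` field.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Query for the total number of contributions in the user's contribution calendar.
QUERY = """
query($login: String!) {
  user(login: $login) {
    contributionsCollection {
      contributionCalendar { totalContributions }
    }
  }
}
"""


def build_request(login, token):
    """Build (but do not send) a POST request for the GitHub GraphQL API.

    Hypothetical helper for illustration; `token` is a personal access token.
    """
    payload = json.dumps({"query": QUERY, "variables": {"login": login}}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("octocat", "PLACEHOLDER_TOKEN")

# To actually run the query (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     data = json.load(response)
```

Which statistics are visible depends on the token's permissions, which matches the comment's warning that authentication tends to be the finicky part.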

eiten

4 2,655 0.0 Python

Statistical and Algorithmic Investing Strategies for Everyone

pgmpy

2 2,612 8.1 Python

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

lifetimes

1 1,433 0.0 Python

Lifetime value in Python

pycm

18 1,429 5.0 Python

Multi-class confusion matrix library in Python

Project mention: PyCM 4.0 Released: Multilabel Confusion Matrix Support | /r/coolgithubprojects | 2023-06-07
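For readers unfamiliar with the term, a multi-class confusion matrix counts how often each actual class was predicted as each class. The sketch below is a pure-Python illustration with hypothetical function names and toy labels; it does not use or reproduce the PyCM API.

```python
def confusion_matrix(actual, predicted):
    """Return (classes, matrix) where matrix[a][p] counts samples of class a predicted as p."""
    classes = sorted(set(actual) | set(predicted))
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return classes, matrix


def per_class_recall(classes, matrix):
    """Recall per class: correct predictions divided by all samples of that class."""
    recall = {}
    for c in classes:
        total = sum(matrix[c].values())
        recall[c] = matrix[c][c] / total if total else None
    return recall


# Toy labels, made up for illustration.
actual = ["cat", "dog", "bird", "cat", "dog", "cat"]
predicted = ["cat", "dog", "cat", "cat", "bird", "dog"]

classes, matrix = confusion_matrix(actual, predicted)
recall = per_class_recall(classes, matrix)

for c in classes:
    print(c, matrix[c], recall[c])
```

Rows here are actual classes and columns are predicted classes; other tools may use the transposed convention, so it is worth checking before comparing outputs.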

uncertainty-baselines

3 1,362 5.5 Python

High-quality implementations of standard and SOTA methods on a variety of tasks.

maloja

16 936 9.2 Python

Self-hosted music scrobble database to create personal listening statistics and charts

Project mention: Can I get charged for having FLAC content on my server hosted at Germany? | /r/selfhosted | 2023-07-07

With that recommendation, I want to add Maloja to the mix (a scrobble server that works great with Navidrome).

hierarchicalforecast

11 512 7.0 Python

Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods.

Project mention: [D] When less is more in the hierarchical forecasting case. | /r/MachineLearning | 2023-07-03

popmon

1 485 6.9 Python

Monitor the stability of a Pandas or Spark dataframe ⚙︎

sportsipy

3 472 0.0 Python

A free sports API written for python

pypinfo

1 394 5.4 Python

Easily view PyPI download statistics via Google's BigQuery.

fitter

2 353 5.2 Python

Fit data to many distributions

meteostat-python

2 352 3.3 Python

Access and analyze historical weather and climate data with Python.

Project mention: Povijesni vremenski podaci (Historical weather data) | /r/croatia | 2023-06-15

Try: https://github.com/meteostat/meteostat-python

Contributions-Importer-For-Github

2 337 6.5 Python

This tool helps users to import contributions to GitHub from private git repositories, or from public repositories that are not hosted in GitHub.

github-repo-stats

3 279 7.7 Python

GitHub Action for advanced repository traffic analysis and reporting

Project mention: How I Fixed GitHub's Repo Traffic Insights 🛠️ 📊 | dev.to | 2023-12-03

Within the discussion, I came across a GitHub action tool that fetches traffic data and stores it in a CSV file, also generating a PDF report:


NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Statistics related posts

Frouros: An open-source Python library for drift detection in machine learning
1 project | news.ycombinator.com | 6 Apr 2024
Ask HN: How to Do a GitHub Wrapped?
1 project | news.ycombinator.com | 19 Dec 2023
[D] Major bug in Scikit-Learn's implementation of F-1 score
2 projects | /r/MachineLearning | 8 Dec 2023
80% faster, 50% less memory, 0% loss of accuracy Llama finetuning
6 projects | news.ycombinator.com | 1 Dec 2023
Contraction Clustering (RASTER): A fast clustering algorithm
1 project | news.ycombinator.com | 27 Nov 2023
Cubic Spline Interpolation
1 project | news.ycombinator.com | 22 Oct 2023
TimeGPT-1
2 projects | news.ycombinator.com | 13 Oct 2023

Index

What are some of the best open-source statistics projects in Python? This list will help you:

#	Project	Stars
1	scikit-learn	58,046
2	ydata-profiling	12,022
3	statsmodels	9,534
4	imbalanced-learn	6,697
5	boltons	6,415
6	Tautulli	5,361
7	statsforecast	3,540
8	sweetviz	2,833
9	github-stats	2,713
10	eiten	2,655
11	pgmpy	2,612
12	lifetimes	1,433
13	pycm	1,429
14	uncertainty-baselines	1,362
15	maloja	936
16	hierarchicalforecast	512
17	popmon	485
18	sportsipy	472
19	pypinfo	394
20	fitter	353
21	meteostat-python	352
22	Contributions-Importer-For-Github	337
23	github-repo-stats	279