Statistics

Top 23 Statistics Open-Source Projects

  1. scikit-learn

    scikit-learn: machine learning in Python

    Project mention: 10 Useful Tools and Libraries for Python Developers | dev.to | 2025-03-29

    7. Scikit-learn - Machine Learning

  3. Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

  4. Umami

    Umami is a modern, privacy-focused alternative to Google Analytics.

    Project mention: Your Guide To Using Open Source Software as an Indie Developer | dev.to | 2025-05-25

    There was a time when open source software meant “functional, but clunky.” That’s changed. Tools like Plausible (analytics), N8N (automation), Umami (web stats), and Vaultwarden (password manager) are beautifully built, stable, and powerful. Many match or even beat their commercial alternatives.

  5. Plausible Analytics

    Simple, open source, lightweight and privacy-friendly web analytics alternative to Google Analytics.

    Project mention: Your Guide To Using Open Source Software as an Indie Developer | dev.to | 2025-05-25

    There was a time when open source software meant “functional, but clunky.” That’s changed. Tools like Plausible (analytics), N8N (automation), Umami (web stats), and Vaultwarden (password manager) are beautifully built, stable, and powerful. Many match or even beat their commercial alternatives.

  6. excelize

    Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

  7. ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

    Project mention: The DuckDB Local UI | news.ycombinator.com | 2025-03-12

    WhatTheDuck does SQL with duckdb-wasm IIRC

    Pygwalker does open-source descriptive statistics and charts from pandas dataframes: https://github.com/Kanaries/pygwalker

    ydata-profiling does Exploratory Data Analysis (EDA) with Pandas and Spark DataFrames and integrates with various apps: https://github.com/ydataai/ydata-profiling

  8. tokei

    Count your code, quickly.

    Project mention: Tokei: Count Your Code, Quickly | news.ycombinator.com | 2025-05-24
  10. statsmodels

    Statsmodels: statistical modeling and econometrics in Python

    Project mention: The Truth About Linear Regression | news.ycombinator.com | 2024-07-30

    statsmodels is the closest thing in python to R. statsmodels has mixed model support, but mgcv apparently requires more. It is well above my paygrade, but this seems relevant: https://github.com/statsmodels/statsmodels/issues/8029 (i.e. no out of the box support, you might be able to build an approximation on your own).

  11. miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

    Project mention: XAN: A Modern CSV-Centric Data Manipulation Toolkit for the Terminal | news.ycombinator.com | 2025-03-27

    I recently came across https://github.com/johnkerl/miller. I don't know how these tools compare.

  12. gonum

    Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

  13. scc

    Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go

    Project mention: Tokei: Count Your Code, Quickly | news.ycombinator.com | 2025-05-24

    Related: https://github.com/boyter/scc , which can also separately count code that is generated (based on keywords in the files).

    This is useful in cases where API layers (think protobuf -> target lang) are generated by a compiler, and you want to know how much code is manually created.

  14. imbalanced-learn

    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

  15. boltons

    🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

  16. growthbook

    Open Source Feature Flagging and A/B Testing Platform

    Project mention: Progressive frustration | dev.to | 2025-04-18

    I first tried to use growthbook. They had only react support. I thought - I could use the js sdk and work around it. Ok fine. It seemed a bit complicated to use in terms of their UI. Okay fine, I try to find an easier one maybe I can self-host. That way I could even put it behind cloudflare CDN and use caching on it and clever cache-busting when I change values could help propagate changes. Okay fine I have a plan. I ended up going with Flagsmith instead. It was even easier. Perfect.

  17. git-quick-stats

    ▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in git repository.

    Project mention: Show HN: Simplest Git Statistics in CLI | news.ycombinator.com | 2025-06-17
  18. Smile

    Statistical Machine Intelligence & Learning Engine

    Project mention: Kotlin for AI-Powered App Development | dev.to | 2025-05-23

    Kotlin can use any Java library, giving you access to powerful machine learning frameworks like DeepLearning4J, Smile, and Weka.

  19. Tautulli

    A Python based monitoring and tracking tool for Plex Media Server.

  20. evidence

    Business intelligence as code: build fast, interactive data visualizations in SQL and markdown

    Project mention: Data viz library built with Apache ECharts, Leaflet, and shadcn | news.ycombinator.com | 2025-04-12

    It would be better to link to the main page, https://evidence.dev/, which is titled "Evidence - Business intelligence as code".

  21. stdlib

    ✨ Standard library for JavaScript and Node.js. ✨

    Project mention: GSoC 2025 Projects Announced | dev.to | 2025-05-08

    We hope that you'll join us in our mission to advance cutting-edge scientific computation in JavaScript. Start by showing your support and starring the project on GitHub today: https://github.com/stdlib-js/stdlib.

  22. criterion.rs

    Statistics-driven benchmarking library for Rust

    Project mention: Parsing JSON in 500 lines of Rust | news.ycombinator.com | 2025-02-18

    Regarding the 'sudo' issue: Doing a benchmark by just running an example executable is not really recommended because there's a ton of reasons why you might get differing performance.

    It's probably better to set up an actual benchmark using a crate like Criterion instead [0].

    [0] https://github.com/bheisler/criterion.rs

  23. statsforecast

    Lightning ⚡️ fast forecasting with statistical and econometric models.

    Project mention: This Week In Python | dev.to | 2025-03-21

    statsforecast – Forecasting with statistical and econometric models

  24. datascience

    Curated list of Python resources for data science. (by r0f1)

  25. probability

    Probabilistic reasoning and statistical analysis in TensorFlow

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source statistics projects? This list will help you:

# Project Stars
1 scikit-learn 62,340
2 Probabilistic-Programming-and-Bayesian-Methods-for-Hackers 27,488
3 Umami 26,836
4 Plausible Analytics 22,693
5 excelize 19,279
6 ydata-profiling 12,975
7 tokei 12,623
8 statsmodels 10,732
9 miller 9,321
10 gonum 8,042
11 scc 7,414
12 imbalanced-learn 6,999
13 boltons 6,633
14 growthbook 6,638
15 git-quick-stats 6,556
16 Smile 6,194
17 Tautulli 5,992
18 evidence 5,293
19 stdlib 5,215
20 criterion.rs 5,076
21 statsforecast 4,408
22 datascience 4,403
23 probability 4,341
