Top 23 Statistics Open-Source Projects

  • scikit-learn

    scikit-learn: machine learning in Python

  • Project mention: How to Build a Logistic Regression Model: A Spam-filter Tutorial | 2024-05-05

    Online Courses:
      Coursera: "Machine Learning" by Andrew Ng
      edX: "Introduction to Machine Learning" by MIT
    Tutorials:
      Scikit-learn documentation:
      Kaggle Learn:
    Books:
      "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron
      "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

    By understanding the core concepts of logistic regression, its limitations, and exploring further resources, you'll be well-equipped to navigate the exciting world of machine learning!

  • Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

  • Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | 2024-02-10
  • Umami

    Umami is a simple, fast, privacy-focused alternative to Google Analytics.

  • Project mention: Any Google Analytics Alternatives? | 2024-05-01

    Another open source alternative similar to Plausible is

  • Plausible Analytics

    Simple, open source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.

  • Project mention: Time Series Analysis of Plausible Data | 2024-05-21

    import requests
    import pandas as pd
    from datetime import datetime, timedelta

    # Function to get Plausible Analytics timeseries data
    # (site_id and api_key are defined elsewhere in the original post)
    def get_plausible_timeseries_data():
        # Calculate the date range for the last 90 days
        # (datetime.now() is assumed here; the call was dropped from the excerpt)
        date_to = datetime.now().strftime('%Y-%m-%d')
        date_from = (datetime.now() - timedelta(days=90)).strftime('%Y-%m-%d')

        # Setting the metrics we want to look at
        metrics = 'visitors,pageviews'

        # Actually pulling the data we want
        # (the API endpoint prefix of this URL is missing from the excerpt)
        url = f"{site_id}&period=custom&date={date_from},{date_to}&metrics={metrics}"
        headers = {"Authorization": f"Bearer {api_key}"}
        response = requests.get(url, headers=headers)
        data = response.json()

        # Putting the data into a dataframe we can use for analysis
        results = data['results']
        df = pd.DataFrame(results)

        # Adjusting the date field so we can avoid future warnings and be more accurate
        df['date'] = pd.to_datetime(df['date'])
        return df

  • excelize

    Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

  • Project mention: Recommend a powerful excel processing library, @zurmokeeper/exceljs, which supports encryption and decryption of xlsx files and flexible setting of multiple table headers when exporting, etc. | /r/node | 2023-07-01

    Then I found out that WPS only supports ECMA-376 standard encryption for xlsx files. Then I referred to the official documentation and libraries in other languages, such as msoffcrypto-tool, written in Python, and Go's excelize. Since I don't know much about encryption and decryption, the implementation process was also a bit convoluted.

  • ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

  • Project mention: FLaNK 25 December 2023 | 2023-12-26
  • tokei

    Count your code, quickly.

  • Project mention: XAMPPRocky/tokei: Count your code, quickly | 2024-04-09
  • statsmodels

    Statsmodels: statistical modeling and econometrics in Python

  • miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

  • Project mention: Qsv: Efficient CSV CLI Toolkit | 2023-12-22
  • gonum

    Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

  • Project mention: How to set up interface to accept multi-dimension array? | /r/golang | 2023-07-13

    But if you want to see what can be done for numeric stuff, check out gonum. Personally, I still wouldn't use Go, and I rather suspect it's still pretty easy to reach for something like what you're trying to do and not find it because Go just can't write that type sensibly, but you can at least see what is available, written by people who disagree with me about Go not being a great language for this.

  • imbalanced-learn

    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

  • Project mention: What’s your approach to highly imbalanced data sets? | /r/datascience | 2023-05-26

    There's a plethora of undersampling and oversampling models you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turned out!

  • boltons

    🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

  • Project mention: Boltons is a set of over 250 BSD-licensed, pure-Python utilities | 2023-12-11
  • git-quick-stats

    ▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in git repository.

  • scc

    Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go

  • Project mention: Scc: A fast code counter with complexity calculations and COCOMO estimates | 2024-04-23
  • Smile

    Statistical Machine Intelligence & Learning Engine

  • Project mention: The Current State of Clojure's Machine Learning Ecosystem | 2024-04-07

    > I don't think it's right to recommend that new users move away from the package because of licensing issues

    I was going to chime in to agree but then I saw how this was done - a completely innocuous looking commit:

    And literally no mention in the release notes:

    I think if you are going to change license especially in a way that makes it less permissive you need to be super open and clear about both the fact you are doing it and your reasons for that. This is done so silently as to look like it is intentionally trying to mislead and trick people.

    So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.

  • growthbook

    Open Source Feature Flagging and A/B Testing Platform

  • Project mention: GrowthBook: Open-source feature flagging and A/B testing platform | /r/opensource | 2023-10-20
  • Tautulli

    A Python based monitoring and tracking tool for Plex Media Server.

  • Project mention: I'm fine with the basics of Plex - now what can I do to really use plex to it's full potential? | /r/PleX | 2023-12-09

    With Tautulli you have a better monitoring system than what Plex offers. Streaming history split by user, you can add notifications to a lot of services like Slack, email and so on. You can even create newsletters being sent out to users based on what was added to your server.


    Statistics-driven benchmarking library for Rust

  • Project mention: How to benchmark in Rust with libtest bench | /r/bencher | 2023-12-03

    The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.

  • probability

    Probabilistic reasoning and statistical analysis in TensorFlow

  • Project mention: How often do you see Bayesian Statistics or Stan in the DS world? Essential skill or a nice to have? | /r/datascience | 2023-06-17


  • datascience

    Curated list of Python resources for data science.

  • stdlib

    ✨ Standard library for JavaScript and Node.js. ✨

  • Project mention: Node still seems better than python after all this time for web server speed but.. | /r/node | 2023-06-20

    Numpy is a library - node.js has plenty of them, what is missing? There is stdlib package that offers optimized math functions, for example.

  • statsforecast

    Lightning ⚡️ fast forecasting with statistical and econometric models.

  • Project mention: TimeGPT-1 | 2023-10-13

    I can't find the TimeGPT-1 model.

    LICENSE Apache-2

    Mentions ARIMA, ETS, CES, and Theta modeling

  • Tablesaw

    Java dataframe and visualization library

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Statistics related posts

  • Time Series Analysis of Plausible Data

    1 project | 21 May 2024
  • How to Build a Logistic Regression Model: A Spam-filter Tutorial

    1 project | 5 May 2024
  • Any Google Analytics Alternatives?

    3 projects | 1 May 2024
  • We need to Speak about Google Code Quality

    2 projects | 24 Apr 2024
  • Show HN: Open-Source Ad-Free File Upload Service

    1 project | 22 Apr 2024
  • Plausible as an alternative to Google Analytics

    2 projects | 18 Apr 2024
  • Umami: Best free Go-To Google Analytics Alternative

    1 project | 11 Apr 2024


What are some of the best open-source Statistics projects? This list will help you:

# Project Stars
1 scikit-learn 58,344
2 Probabilistic-Programming-and-Bayesian-Methods-for-Hackers 26,406
3 Umami 19,912
4 Plausible Analytics 18,611
5 excelize 17,415
6 ydata-profiling 12,101
7 tokei 10,282
8 statsmodels 9,608
9 miller 8,598
10 gonum 7,307
11 imbalanced-learn 6,720
12 boltons 6,427
13 git-quick-stats 6,181
14 scc 6,183
15 Smile 5,934
16 growthbook 5,605
17 Tautulli 5,404
18 4,205
19 probability 4,140
20 datascience 4,122
21 stdlib 4,046
22 statsforecast 3,599
23 Tablesaw 3,449
