Top 23 Statistic Open-Source Projects

scikit-learn

81 58,130 9.9 Python

scikit-learn: machine learning in Python

Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09

Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

30 26,382 0.0 Jupyter Notebook

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | news.ycombinator.com | 2024-02-10

Umami

113 19,654 9.8 TypeScript

Umami is a simple, fast, privacy-focused alternative to Google Analytics.

Project mention: Any Google Analytics Alternatives? | news.ycombinator.com | 2024-05-01

Another open source alternative similar to Plausible is https://umami.is/

Plausible Analytics

305 18,415 9.8 Elixir

Simple, open source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.

Project mention: Any Google Analytics Alternatives? | news.ycombinator.com | 2024-05-01

I think a single Google Analytics alternative is pretty hard to pick, considering that GA can be used to widely varying extents.
For simple and "detailed enough" insights, I enjoyed using Plausible (https://plausible.io/) in the past.
For more in-depth analytics that give you a detailed view into your own product, PostHog.com seems to be by far the best and most popular option out there.

excelize

15 17,311 8.8 Go

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

Project mention: Recommend a powerful excel processing library, @zurmokeeper/exceljs, which supports encryption and decryption of xlsx files and flexible setting of multiple table headers when exporting, etc. | /r/node | 2023-07-01

Then I found out that WPS only supports ECMA-376 standard encryption for xlsx files. I then referred to the official documentation and to libraries in other languages, such as msoffcrypto-tool, written in Python, and Go's excelize. Since I don't know much about encryption and decryption, the implementation process was also somewhat convoluted.

ydata-profiling

43 12,053 8.5 Python

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26

tokei

30 10,006 5.7 Rust

Count your code, quickly.

Project mention: XAMPPRocky/tokei: Count your code, quickly | news.ycombinator.com | 2024-04-09

statsmodels

8 9,557 9.4 Python

Statsmodels: statistical modeling and econometrics in Python

miller

63 8,559 9.0 Go

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Project mention: Qsv: Efficient CSV CLI Toolkit | news.ycombinator.com | 2023-12-22

gonum

24 7,272 8.3 Go

Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

Project mention: How to set up interface to accept multi-dimension array? | /r/golang | 2023-07-13

But if you want to see what can be done for numeric stuff, check out gonum. Personally, I still wouldn't use Go, and I rather suspect it's still pretty easy to reach for something like what you're trying to do and not find it because Go just can't write that type sensibly, but you can at least see what is available, written by people who disagree with me about Go not being a great language for this.

imbalanced-learn

1 6,703 7.5 Python

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

Project mention: What’s your approach to highly imbalanced data sets? | /r/datascience | 2023-05-26

There's a plethora of undersampling and oversampling models you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turned out!
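
The oversampling idea mentioned in the comment above can be illustrated with a minimal pure-Python sketch. It shows plain random oversampling (duplicating minority-class rows until the classes are balanced) and is not the imbalanced-learn or smote-variants API; the function name `random_oversample` and the toy data are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw (with replacement) as many extra rows as the class is missing.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data: four rows of class 0 and a single row of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Because no rows are dropped, nothing is lost from the original dataset, but the duplicated rows carry no new information. Techniques such as SMOTE instead synthesize new minority samples.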

boltons

1 6,417 8.0 Python

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Project mention: Boltons is a set of over 250 BSD-licensed, pure-Python utilities | news.ycombinator.com | 2023-12-11

git-quick-stats

8 6,156 4.8 Shell

▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in a Git repository.

scc

19 6,103 8.2 Go

Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go

Project mention: Scc: A fast code counter with complexity calculations and COCOMO estimates | news.ycombinator.com | 2024-04-23

Smile

9 5,925 9.8 Java

Statistical Machine Intelligence & Learning Engine

Project mention: The Current State of Clojure's Machine Learning Ecosystem | news.ycombinator.com | 2024-04-07

> I don't think it's right to recommend that new users move away from the package because of licensing issues
I was going to chime in to agree but then I saw how this was done - a completely innocuous looking commit:
https://github.com/haifengl/smile/commit/6f22097b233a3436519...
And literally no mention in the release notes:
https://github.com/haifengl/smile/releases/tag/v3.0.0
I think if you are going to change license especially in a way that makes it less permissive you need to be super open and clear about both the fact you are doing it and your reasons for that. This is done so silently as to look like it is intentionally trying to mislead and trick people.
So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.

growthbook

30 5,549 9.8 TypeScript

Open Source Feature Flagging and A/B Testing Platform

Project mention: GrowthBook: Open-source feature flagging and A/B testing platform | /r/opensource | 2023-10-20

Tautulli

419 5,371 8.3 Python

A Python based monitoring and tracking tool for Plex Media Server.

Project mention: I'm fine with the basics of Plex - now what can I do to really use plex to it's full potential? | /r/PleX | 2023-12-09

With Tautulli you have a better monitoring system than what Plex offers. Streaming history split by user, you can add notifications to a lot of services like Slack, email and so on. You can even create newsletters being sent out to users based on what was added to your server.

criterion.rs

30 4,170 6.5 Rust

Statistics-driven benchmarking library for Rust

Project mention: How to benchmark in Rust with libtest bench | /r/bencher | 2023-12-03

The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.

probability

10 4,133 9.3 Jupyter Notebook

Probabilistic reasoning and statistical analysis in TensorFlow

Project mention: How often do you see Bayesian Statistics or Stan in the DS world? Essential skill or a nice to have? | /r/datascience | 2023-06-17

TensorFlow-Probability

datascience

4 4,071 8.3

Curated list of Python resources for data science.

stdlib

9 4,026 10.0 JavaScript

✨ Standard library for JavaScript and Node.js. ✨

Project mention: Node still seems better than python after all this time for web server speed but.. | /r/node | 2023-06-20

NumPy is a library - node.js has plenty of them, what is missing? There is the stdlib package that offers optimized math functions, for example.

statsforecast

58 3,565 8.9 Python

Lightning ⚡️ fast forecasting with statistical and econometric models.

Project mention: TimeGPT-1 | news.ycombinator.com | 2023-10-13

I can't find the TimeGPT-1 model.
LICENSE Apache-2
https://github.com/Nixtla/statsforecast/blob/main/LICENSE
Mentions ARIMA, ETS, CES, and Theta modeling

Tablesaw

4 3,442 4.3 Java

Java dataframe and visualization library

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Statistics related posts

Any Google Analytics Alternatives?

3 projects | news.ycombinator.com | 1 May 2024

We need to Speak about Google Code Quality

2 projects | dev.to | 24 Apr 2024

Show HN: Open-Source Ad-Free File Upload Service

1 project | news.ycombinator.com | 22 Apr 2024

Plausible as an alternative to Google Analytics

2 projects | dev.to | 18 Apr 2024

Umami: Best free Go-To Google Analytics Alternative

1 project | dev.to | 11 Apr 2024

Frouros: An open-source Python library for drift detection in machine learning

1 project | news.ycombinator.com | 6 Apr 2024

Simple no bs persistent notepad

2 projects | news.ycombinator.com | 16 Mar 2024

Index

What are some of the best open-source Statistic projects? This list will help you:

#	Project	Stars
1	scikit-learn	58,130
2	Probabilistic-Programming-and-Bayesian-Methods-for-Hackers	26,382
3	Umami	19,654
4	Plausible Analytics	18,415
5	excelize	17,311
6	ydata-profiling	12,053
7	tokei	10,006
8	statsmodels	9,557
9	miller	8,559
10	gonum	7,272
11	imbalanced-learn	6,703
12	boltons	6,417
13	git-quick-stats	6,156
14	scc	6,103
15	Smile	5,925
16	growthbook	5,549
17	Tautulli	5,371
18	criterion.rs	4,170
19	probability	4,133
20	datascience	4,071
21	stdlib	4,026
22	statsforecast	3,565
23	Tablesaw	3,442
