Top 23 Statistics Open-Source Projects
-
Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
-
Plausible Analytics
Simple, open source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.
-
excelize
Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets
-
ydata-profiling
One line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
-
miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
-
gonum
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
-
boltons
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
-
git-quick-stats
▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in a git repository.
-
scc
Sloc, Cloc and Code: scc is a very fast, accurate code counter with complexity calculations and COCOMO estimates, written in pure Go
Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09
Thank you for your interest. Here are some interesting examples from the SWE-bench lite benchmark that AutoCodeRover resolves:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- From scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified lines a few lines further down than the developer's patch did, and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).
Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | news.ycombinator.com | 2024-02-10
Another open-source alternative similar to Plausible is Umami (https://umami.is/).
I think it's pretty hard to pick a single Google Analytics alternative, considering that GA can be used to widely varying extents.
For simple and "detailed enough" insights, I enjoyed using Plausible (https://plausible.io/) in the past.
For more in-depth analytics that give you a detailed view into your own product, PostHog.com seems to be by far the best and most popular option out there.
Project mention: Recommend a powerful excel processing library, @zurmokeeper/exceljs, which supports encryption and decryption of xlsx files and flexible setting of multiple table headers when exporting, etc. | /r/node | 2023-07-01
Then I found out that WPS only supports ECMA-376 standard encryption for xlsx files. So I referred to the official documentation and to libraries in other languages, such as msoffcrypto-tool (written in Python) and Go's excelize. Since I don't know much about encryption and decryption, the implementation process took a few twists and turns.
But if you want to see what can be done for numeric stuff, check out gonum. Personally, I still wouldn't use Go, and I rather suspect it's still pretty easy to reach for something like what you're trying to do and not find it because Go just can't write that type sensibly, but you can at least see what is available, written by people who disagree with me about Go not being a great language for this.
There's a plethora of undersampling and oversampling methods you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check out ydata-synthetic for that. Let us know how it turned out!
Project mention: Boltons is a set of over 250 BSD-licensed, pure-Python utilities | news.ycombinator.com | 2023-12-11
Project mention: Scc: A fast code counter with complexity calculations and COCOMO estimates | news.ycombinator.com | 2024-04-23
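COCOMO estimates like the ones scc reports are derived from lines of code. As a rough illustration, the sketch below applies the textbook Basic COCOMO formulas with the "organic" coefficients (effort = 2.4 × KLOC^1.05 person-months, schedule = 2.5 × effort^0.38 months). This is an assumption for illustration only: scc's exact model, defaults, and cost parameters are not stated here, so its output may differ. The function `basic_cocomo_organic` is hypothetical.

```python
from dataclasses import dataclass

# Textbook Basic COCOMO coefficients for "organic" projects (assumed, not taken from scc).
A, B = 2.4, 1.05  # effort   = A * KLOC**B    (person-months)
C, D = 2.5, 0.38  # schedule = C * effort**D  (calendar months)


@dataclass
class CocomoEstimate:
    effort_pm: float        # estimated effort in person-months
    schedule_months: float  # estimated development time in months
    people: float           # average headcount = effort / schedule


def basic_cocomo_organic(sloc: int) -> CocomoEstimate:
    """Estimate effort and schedule from source lines of code (hypothetical helper)."""
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    kloc = sloc / 1000.0
    effort = A * kloc ** B
    schedule = C * effort ** D
    return CocomoEstimate(effort, schedule, effort / schedule)


if __name__ == "__main__":
    est = basic_cocomo_organic(10_000)
    print(f"effort   : {est.effort_pm:.1f} person-months")
    print(f"schedule : {est.schedule_months:.1f} months")
    print(f"people   : {est.people:.1f}")
```

Since the exponent B is slightly above 1, estimated effort grows a little faster than linearly with code size, which is why such estimates should be read as order-of-magnitude figures rather than plans.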
Project mention: The Current State of Clojure's Machine Learning Ecosystem | news.ycombinator.com | 2024-04-07
> I don't think it's right to recommend that new users move away from the package because of licensing issues
I was going to chime in to agree but then I saw how this was done - a completely innocuous looking commit:
https://github.com/haifengl/smile/commit/6f22097b233a3436519...
And literally no mention in the release notes:
https://github.com/haifengl/smile/releases/tag/v3.0.0
I think if you are going to change a license, especially in a way that makes it less permissive, you need to be super open and clear about both the fact that you are doing it and your reasons for it. This was done so silently that it looks like an intentional attempt to mislead and trick people.
So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.
Project mention: GrowthBook: Open-source feature flagging and A/B testing platform | /r/opensource | 2023-10-20
Project mention: I'm fine with the basics of Plex - now what can I do to really use plex to it's full potential? | /r/PleX | 2023-12-09
With Tautulli you get a better monitoring system than what Plex offers: streaming history split by user, plus notifications to many services such as Slack, email, and so on. You can even create newsletters that are sent to users based on what was added to your server.
The three popular options for benchmarking in Rust are libtest bench, Criterion, and Iai.
Project mention: How often do you see Bayesian Statistics or Stan in the DS world? Essential skill or a nice to have? | /r/datascience | 2023-06-17
TensorFlow-Probability
Project mention: Node still seems better than python after all this time for web server speed but.. | /r/node | 2023-06-20
NumPy is a library, and Node.js has plenty of them, so what is missing? There is the stdlib package, which offers optimized math functions, for example.
I can't find the TimeGPT-1 model.
License: Apache-2.0 (https://github.com/Nixtla/statsforecast/blob/main/LICENSE)
It mentions ARIMA, ETS, CES, and Theta modeling.
Statistics related posts
-
Any Google Analytics Alternatives?
-
We need to Speak about Google Code Quality
-
Show HN: Open-Source Ad-Free File Upload Service
-
Plausible as an alternative to Google Analytics
-
Umami: Best free Go-To Google Analytics Alternative
-
Frouros: An open-source Python library for drift detection in machine learning
-
Simple no bs persistent notepad
-
A note from our sponsor - InfluxDB
www.influxdata.com | 4 May 2024
Index
What are some of the best open-source statistics projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | scikit-learn | 58,130 |
2 | Probabilistic-Programming-and-Bayesian-Methods-for-Hackers | 26,382 |
3 | Umami | 19,654 |
4 | Plausible Analytics | 18,415 |
5 | excelize | 17,311 |
6 | ydata-profiling | 12,053 |
7 | tokei | 10,006 |
8 | statsmodels | 9,557 |
9 | miller | 8,559 |
10 | gonum | 7,272 |
11 | imbalanced-learn | 6,703 |
12 | boltons | 6,417 |
13 | git-quick-stats | 6,156 |
14 | scc | 6,103 |
15 | Smile | 5,925 |
16 | growthbook | 5,549 |
17 | Tautulli | 5,371 |
18 | criterion.rs | 4,170 |
19 | probability | 4,133 |
20 | datascience | 4,071 |
21 | stdlib | 4,026 |
22 | statsforecast | 3,565 |
23 | Tablesaw | 3,442 |