Top 23 Statistic Open-Source Projects
-
sklearn is adding support through the dataframe interchange protocol (https://github.com/scikit-learn/scikit-learn/issues/25896). scipy, as far as I know, doesn't explicitly support dataframes (it just happens to work when you wrap a Series in `np.array` or `np.asarray`). I don't know about PyTorch but in general you can convert to numpy.
-
Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | news.ycombinator.com | 2024-02-10
-
Project mention: 15 open-source tools to elevate your software design workflow | dev.to | 2024-01-22
-
Plausible Analytics
Simple, open source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.
No clue what you mean; the browser cache might even clear itself without you doing anything manually. This thing makes no sense.
Nowhere did it say "tech demo", not in the HN headline, not on the page itself. No, thanks. And even as a tech demo, there is nothing impressive going on. It stores shit to local storage, I guess. Lol, I just looked this up, and it was already in Firefox in 2009? WHAT? https://developer.mozilla.org/en-US/docs/Web/API/Window/loca... I never used it directly myself, but I remember reading about some API that is kind of the new version of cookies and can store more, and I think this is it. 2009, though; I would have sworn what I'm thinking of was newer. Maybe I am mixing something up, maybe not.
It has unnecessary tracking, per the comment above; not sure if it even sends all your notes to https://plausible.io, and I do not care. For me, this fails as a tech demo or whatever the fuck it's supposed to be. Sorry for not getting all excited about everything posted here. In 2009 it for sure would have ;)
-
excelize
Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets
Project mention: Recommend a powerful excel processing library, @zurmokeeper/exceljs, which supports encryption and decryption of xlsx files and flexible setting of multiple table headers when exporting, etc. | /r/node | 2023-07-01
Then I found out that WPS only supports ECMA-376 standard encryption for xlsx files. So I referred to the official documentation and to libraries in other languages, such as msoffcrypto-tool, written in Python, and Go's excelize. Since I don't know much about encryption and decryption, the implementation took a few twists and turns.
-
ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
-
Project mention: The Linux Kernel Prepares for Rust 1.77 Upgrade | news.ycombinator.com | 2024-02-18
So if we count only code and not comments, it is only 9489 LoC of Rust, which would be about 0.03%. If we take all lines and not only LoC, it would be around 0.05%.
[0] https://github.com/XAMPPRocky/tokei
[1] https://github.com/torvalds/linux/commit/b401b621758e46812da...
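The difference between counting only code lines and counting all lines can be sketched in a few lines of Python. This is a simplified illustration, not how tokei works: it assumes only `//` line comments (no block comments or strings containing `//`), and the sample source, `count_lines`, and `share_percent` are hypothetical names introduced here.

```python
def count_lines(source: str, comment_prefix: str = "//") -> dict:
    """Classify each line as code, comment, or blank (line comments only)."""
    counts = {"code": 0, "comment": 0, "blank": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            # A line with code plus a trailing comment counts as code.
            counts["code"] += 1
    return counts


def share_percent(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Hypothetical Rust-like sample, used only to exercise the counter.
sample = """// SPDX-License-Identifier: GPL-2.0

//! Hypothetical sample module.
fn add(a: i32, b: i32) -> i32 {
    a + b // trailing comments still count as code here
}
"""

counts = count_lines(sample)
all_lines = sum(counts.values())
print(counts)
print(f"code share of all lines: {share_percent(counts['code'], all_lines):.1f}%")
```

The same `share_percent` helper applies to the comparison in the comment: divide one language's count by the total for the whole tree, using either code-only counts or all-line counts consistently on both sides.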
-
-
miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
-
gonum
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
But if you want to see what can be done for numeric stuff, check out gonum. Personally, I still wouldn't use Go, and I rather suspect it's still pretty easy to reach for something like what you're trying to do and not find it because Go just can't write that type sensibly, but you can at least see what is available, written by people who disagree with me about Go not being a great language for this.
-
There's a plethora of undersampling and oversampling models you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for that. Let us know how it turns out!
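As a rough illustration of what oversampling does, here is a minimal sketch of plain random oversampling using only the Python standard library. It is not the API of imbalanced-learn or smote-variants, and it is simpler than SMOTE (it duplicates existing minority rows rather than synthesizing new ones); `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes
    have as many samples as the largest class. No rows are removed."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for label, n in counts.items():
        # Original rows belonging to this class, used as the sampling pool.
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced dataset: four samples of class 0, one of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.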
-
boltons
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
Project mention: Boltons is a set of over 250 BSD-licensed, pure-Python utilities | news.ycombinator.com | 2023-12-11
-
git-quick-stats
▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in git repository.
-
Check out Smile too.
-
scc
Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go
-
Project mention: GrowthBook: Open-source feature flagging and A/B testing platform | /r/opensource | 2023-10-20
-
Project mention: I'm fine with the basics of Plex - now what can I do to really use plex to it's full potential? | /r/PleX | 2023-12-09
With Tautulli you get a better monitoring system than what Plex offers: streaming history split by user, plus notifications to a lot of services like Slack, email and so on. You can even create newsletters that are sent out to users based on what was added to your server.
-
The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.
-
Project mention: How often do you see Bayesian Statistics or Stan in the DS world? Essential skill or a nice to have? | /r/datascience | 2023-06-17
TensorFlow-Probability
-
-
Project mention: Node still seems better than python after all this time for web server speed but.. | /r/node | 2023-06-20
NumPy is a library; Node.js has plenty of them, so what is missing? There is the stdlib package, which offers optimized math functions, for example.
-
I can't find the TimeGPT-1 model.
LICENSE Apache-2
https://github.com/Nixtla/statsforecast/blob/main/LICENSE
Mentions ARIMA, ETS, CES, and Theta modeling
-
Statistics related posts
- Simple no bs persistent notepad
- Analyzing Spotify Stream History
- One Worker to Track Them All: Injecting Analytics Scripts into Multiple Websites with Cloudflare Workers
- Ask HN: How to Do a GitHub Wrapped?
- Using Analytics on My Website
- [D] Major bug in Scikit-Learn's implementation of F-1 score
- Is there a downside to Vercel Analytics?
Index
What are some of the best open-source Statistic projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | scikit-learn | 57,747 |
2 | Probabilistic-Programming-and-Bayesian-Methods-for-Hackers | 26,288 |
3 | Umami | 19,188 |
4 | Plausible Analytics | 18,032 |
5 | excelize | 17,106 |
6 | ydata-profiling | 11,923 |
7 | tokei | 9,737 |
8 | statsmodels | 9,435 |
9 | miller | 8,510 |
10 | gonum | 7,204 |
11 | imbalanced-learn | 6,669 |
12 | boltons | 6,395 |
13 | git-quick-stats | 6,113 |
14 | Smile | 5,904 |
15 | scc | 5,826 |
16 | growthbook | 5,442 |
17 | Tautulli | 5,310 |
18 | criterion.rs | 4,094 |
19 | probability | 4,057 |
20 | datascience | 4,037 |
21 | stdlib | 3,941 |
22 | statsforecast | 3,468 |
23 | Tablesaw | 3,425 |