Statistics

Top 23 Statistics Open-Source Projects

  • scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Polars | news.ycombinator.com | 2024-01-08

    sklearn is adding support through the dataframe interchange protocol (https://github.com/scikit-learn/scikit-learn/issues/25896). scipy, as far as I know, doesn't explicitly support dataframes (it just happens to work when you wrap a Series in `np.array` or `np.asarray`). I don't know about PyTorch but in general you can convert to numpy.
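The conversion described in the comment above can be sketched as follows. This is a minimal illustration, assuming pandas, NumPy, and SciPy are installed; the sample values and the names `group_a` and `group_b` are made up for the example and do not come from the original discussion.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data, invented for illustration only.
a = pd.Series([5.1, 4.9, 5.6, 5.8, 6.0], name="group_a")
b = pd.Series([4.2, 4.4, 4.1, 4.8, 4.5], name="group_b")

# Convert each Series to a plain NumPy array explicitly,
# instead of relying on implicit coercion inside SciPy.
a_arr = np.asarray(a)
b_arr = np.asarray(b)

# Two-sample t-test on the converted arrays.
result = stats.ttest_ind(a_arr, b_arr)
print(type(a_arr).__name__, a_arr.shape, result.pvalue)
```

Converting explicitly makes the hand-off visible in the code, at the cost of dropping the Series index and name.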

  • Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

    Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | news.ycombinator.com | 2024-02-10

  • Umami

    Umami is a simple, fast, privacy-focused alternative to Google Analytics.

    Project mention: 15 open-source tools to elevate your software design workflow | dev.to | 2024-01-22

  • Plausible Analytics

    Simple, open source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.

    Project mention: Simple no bs persistent notepad | news.ycombinator.com | 2024-03-16

    No clue what you mean, browser cache might even clear itself without you doing anything manually. This thing makes no sense.

    Nowhere did it say "tech demo", not in the HN headline, not on the page itself. No, thanks. And even as a tech demo, there is nothing impressive going on. It stores shit in local storage, I guess. Lol, I just looked this up, and it was already in Firefox in 2009? WHAT? https://developer.mozilla.org/en-US/docs/Web/API/Window/loca... I never used it directly myself, but I remember reading about some API that is kind of a newer version of cookies and can store more, and I think that is it. 2009; I would have sworn the one I was thinking of was newer. Maybe I am mixing something up, maybe not.

    It has unnecessary tracking, according to the comment above. I am not sure whether it even sends all your notes to https://plausible.io, and I do not care. For me, this fails as a tech demo or whatever the fuck it's supposed to be. Sorry for not getting all excited about everything posted here. In 2009 it for sure would ;)

  • excelize

    Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

    Project mention: Recommend a powerful excel processing library, @zurmokeeper/exceljs, which supports encryption and decryption of xlsx files and flexible setting of multiple table headers when exporting, etc. | /r/node | 2023-07-01

    Then I found out that WPS only supports ECMA-376 standard encryption for xlsx files. I then referred to the official documentation and to libraries in other languages, such as msoffcrypto-tool (written in Python) and Go's excelize. Since I don't know much about encryption and decryption, the implementation process was also a bit convoluted.

  • ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

    Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26

  • tokei

    Count your code, quickly.

    Project mention: The Linux Kernel Prepares for Rust 1.77 Upgrade | news.ycombinator.com | 2024-02-18

    So if we count only code and not comments, it is only 9489 LoC of Rust, which would be about 0.03%. If we take all lines and not only LoC, it would be around 0.05%.

    [0] https://github.com/XAMPPRocky/tokei

    [1] https://github.com/torvalds/linux/commit/b401b621758e46812da...

  • statsmodels

    Statsmodels: statistical modeling and econometrics in Python

    Project mention: statsmodels Release Candidate 0.14.0rc0 tagged | /r/Python | 2023-04-26

  • miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

    Project mention: Qsv: Efficient CSV CLI Toolkit | news.ycombinator.com | 2023-12-22

  • gonum

    Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

    Project mention: How to set up interface to accept multi-dimension array? | /r/golang | 2023-07-13

    But if you want to see what can be done for numeric stuff, check out gonum. Personally, I still wouldn't use Go, and I rather suspect it's still pretty easy to reach for something like what you're trying to do and not find it because Go just can't write that type sensibly, but you can at least see what is available, written by people who disagree with me about Go not being a great language for this.

  • imbalanced-learn

    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

    Project mention: What’s your approach to highly imbalanced data sets? | /r/datascience | 2023-05-26

    There's a plethora of undersampling and oversampling models you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turned out!
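To make the oversampling idea in the comment above concrete, here is a minimal sketch of random oversampling using only the Python standard library. It is not the imbalanced-learn API (that library ships its own `RandomOverSampler`); the function name `random_oversample` and the toy data are hypothetical and exist only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class. No original row is removed."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data, invented for illustration: five rows of class 0, one of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [9.0]]
labels = [0, 0, 0, 0, 0, 1]

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Because rows are only duplicated, nothing from the original dataset is discarded. In practice the resampling should be applied to the training split only, so duplicated rows do not leak into evaluation data.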

  • boltons

    🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

    Project mention: Boltons is a set of over 250 BSD-licensed, pure-Python utilities | news.ycombinator.com | 2023-12-11

  • git-quick-stats

    ▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in git repository.

  • Smile

    Statistical Machine Intelligence & Learning Engine

    Project mention: Need statistic test library for Spark Scala | /r/scala | 2023-05-05

    Check out Smile too.

  • scc

    Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go

    Project mention: Essential Command Line Tools for Developers | dev.to | 2024-01-15

  • growthbook

    Open Source Feature Flagging and A/B Testing Platform

    Project mention: GrowthBook: Open-source feature flagging and A/B testing platform | /r/opensource | 2023-10-20

  • Tautulli

    A Python based monitoring and tracking tool for Plex Media Server.

    Project mention: I'm fine with the basics of Plex - now what can I do to really use plex to its full potential? | /r/PleX | 2023-12-09

    With Tautulli you have a better monitoring system than what Plex offers. You get streaming history split by user, and you can add notifications to many services such as Slack and email. You can even create newsletters that are sent out to users based on what was added to your server.

  • criterion.rs

    Statistics-driven benchmarking library for Rust

    Project mention: How to benchmark in Rust with libtest bench | /r/bencher | 2023-12-03

    The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.

  • probability

    Probabilistic reasoning and statistical analysis in TensorFlow

    Project mention: How often do you see Bayesian Statistics or Stan in the DS world? Essential skill or a nice to have? | /r/datascience | 2023-06-17

    TensorFlow-Probability

  • datascience

    Curated list of Python resources for data science.

    Project mention: Datasciene Libraries for Python | news.ycombinator.com | 2023-04-16

  • stdlib

    ✨ Standard library for JavaScript and Node.js. ✨

    Project mention: Node still seems better than python after all this time for web server speed but.. | /r/node | 2023-06-20

    NumPy is a library. Node.js has plenty of them, so what is missing? There is the stdlib package, which offers optimized math functions, for example.

  • statsforecast

    Lightning ⚡️ fast forecasting with statistical and econometric models.

    Project mention: TimeGPT-1 | news.ycombinator.com | 2023-10-13

    I can't find the TimeGPT-1 model.

    LICENSE Apache-2

    https://github.com/Nixtla/statsforecast/blob/main/LICENSE

    Mentions ARIMA, ETS, CES, and Theta modeling

  • Tablesaw

    Java dataframe and visualization library

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-16.

Index

What are some of the best open-source Statistics projects? This list will help you:

Rank Project Stars
1 scikit-learn 57,747
2 Probabilistic-Programming-and-Bayesian-Methods-for-Hackers 26,288
3 Umami 19,188
4 Plausible Analytics 18,032
5 excelize 17,106
6 ydata-profiling 11,923
7 tokei 9,737
8 statsmodels 9,435
9 miller 8,510
10 gonum 7,204
11 imbalanced-learn 6,669
12 boltons 6,395
13 git-quick-stats 6,113
14 Smile 5,904
15 scc 5,826
16 growthbook 5,442
17 Tautulli 5,310
18 criterion.rs 4,094
19 probability 4,057
20 datascience 4,037
21 stdlib 3,941
22 statsforecast 3,468
23 Tablesaw 3,425