Statistics

Top 23 Statistics Open-Source Projects

  • scikit-learn

    scikit-learn: machine learning in Python

  • Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09

    Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:

    - From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...

    - Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.

    There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).

  • Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

  • Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | news.ycombinator.com | 2024-02-10
  • Umami

    Umami is a simple, fast, privacy-focused alternative to Google Analytics.

  • Project mention: Any Google Analytics Alternatives? | news.ycombinator.com | 2024-05-01

    Another open source alternative similar to Plausible is https://umami.is/

  • Plausible Analytics

    Simple, open source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.

  • Project mention: Any Google Analytics Alternatives? | news.ycombinator.com | 2024-05-01

    I think a single Google Analytics alternative is pretty hard to pick, considering that GA can be used to widely varying extents.

    For simple and "detailed enough" insights, I enjoyed using Plausible (https://plausible.io/) in the past.

    For more in-depth analytics that give you a detailed view into your own product, PostHog.com seems to be by far the best and most popular option out there.

  • excelize

    Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

  • Project mention: Recommend a powerful excel processing library, @zurmokeeper/exceljs, which supports encryption and decryption of xlsx files and flexible setting of multiple table headers when exporting, etc. | /r/node | 2023-07-01

    Then I found out that WPS only supports ECMA-376 standard encryption for xlsx files. So I referred to the official documentation and to libraries in other languages, such as msoffcrypto-tool (written in Python) and Go's excelize. Since I don't know much about encryption and decryption, the implementation process was also a bit convoluted.

  • ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

  • Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26
  • tokei

    Count your code, quickly.

  • Project mention: XAMPPRocky/tokei: Count your code, quickly | news.ycombinator.com | 2024-04-09
  • statsmodels

    Statsmodels: statistical modeling and econometrics in Python

  • miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

  • Project mention: Qsv: Efficient CSV CLI Toolkit | news.ycombinator.com | 2023-12-22
  • gonum

    Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

  • Project mention: How to set up interface to accept multi-dimension array? | /r/golang | 2023-07-13

    But if you want to see what can be done for numeric stuff, check out gonum. Personally, I still wouldn't use Go, and I rather suspect it's still pretty easy to reach for something like what you're trying to do and not find it because Go just can't write that type sensibly, but you can at least see what is available, written by people who disagree with me about Go not being a great language for this.

  • imbalanced-learn

    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

  • Project mention: What’s your approach to highly imbalanced data sets? | /r/datascience | 2023-05-26

    There's a plethora of undersampling and oversampling models you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turned out!
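
The oversampling idea mentioned above can be illustrated with a minimal sketch of naive random oversampling, assuming plain Python lists as input. The function name `random_oversample` and its parameters are hypothetical and written only for illustration; this is not the imbalanced-learn API (that library ships its own samplers, such as `RandomOverSampler` and `SMOTE`), and duplicating rows is a simpler technique than SMOTE, which synthesizes new points.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data so no information is removed.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement; k is 0 for the majority class.
        extra = rng.choices(pool, k=target - n)
        out_rows.extend(extra)
        out_labels.extend([cls] * (target - n))
    return out_rows, out_labels


rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind should normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.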

  • boltons

    🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

  • Project mention: Boltons is a set of over 250 BSD-licensed, pure-Python utilities | news.ycombinator.com | 2023-12-11
  • git-quick-stats

    ▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in git repository.

  • scc

    Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go

  • Project mention: Scc: A fast code counter with complexity calculations and COCOMO estimates | news.ycombinator.com | 2024-04-23
  • Smile

    Statistical Machine Intelligence & Learning Engine

  • Project mention: The Current State of Clojure's Machine Learning Ecosystem | news.ycombinator.com | 2024-04-07

    > I don't think it's right to recommend that new users move away from the package because of licensing issues

    I was going to chime in to agree but then I saw how this was done - a completely innocuous looking commit:

    https://github.com/haifengl/smile/commit/6f22097b233a3436519...

    And literally no mention in the release notes:

    https://github.com/haifengl/smile/releases/tag/v3.0.0

    I think if you are going to change license especially in a way that makes it less permissive you need to be super open and clear about both the fact you are doing it and your reasons for that. This is done so silently as to look like it is intentionally trying to mislead and trick people.

    So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.

  • growthbook

    Open Source Feature Flagging and A/B Testing Platform

  • Project mention: GrowthBook: Open-source feature flagging and A/B testing platform | /r/opensource | 2023-10-20
  • Tautulli

    A Python based monitoring and tracking tool for Plex Media Server.

  • Project mention: I'm fine with the basics of Plex - now what can I do to really use plex to it's full potential? | /r/PleX | 2023-12-09

    With Tautulli you get a better monitoring system than what Plex offers: streaming history split by user, and notifications to a lot of services like Slack, email and so on. You can even create newsletters that are sent out to users based on what was added to your server.

  • criterion.rs

    Statistics-driven benchmarking library for Rust

  • Project mention: How to benchmark in Rust with libtest bench | /r/bencher | 2023-12-03

    The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.

  • probability

    Probabilistic reasoning and statistical analysis in TensorFlow

  • Project mention: How often do you see Bayesian Statistics or Stan in the DS world? Essential skill or a nice to have? | /r/datascience | 2023-06-17

    TensorFlow-Probability

  • datascience

    Curated list of Python resources for data science.

  • stdlib

    ✨ Standard library for JavaScript and Node.js. ✨

  • Project mention: Node still seems better than python after all this time for web server speed but.. | /r/node | 2023-06-20

    NumPy is a library - node.js has plenty of them, what is missing? There is the stdlib package that offers optimized math functions, for example.

  • statsforecast

    Lightning ⚡️ fast forecasting with statistical and econometric models.

  • Project mention: TimeGPT-1 | news.ycombinator.com | 2023-10-13

    I can't find the TimeGPT-1 model.

    LICENSE Apache-2

    https://github.com/Nixtla/statsforecast/blob/main/LICENSE

    Mentions ARIMA, ETS, CES, and Theta modeling

  • Tablesaw

    Java dataframe and visualization library

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Statistics related posts

  • Any Google Analytics Alternatives?

    3 projects | news.ycombinator.com | 1 May 2024
  • We need to Speak about Google Code Quality

    2 projects | dev.to | 24 Apr 2024
  • Show HN: Open-Source Ad-Free File Upload Service

    1 project | news.ycombinator.com | 22 Apr 2024
  • Plausible as an alternative to Google Analytics

    2 projects | dev.to | 18 Apr 2024
  • Umami: Best free Go-To Google Analytics Alternative

    1 project | dev.to | 11 Apr 2024
  • Frouros: An open-source Python library for drift detection in machine learning

    1 project | news.ycombinator.com | 6 Apr 2024
  • Simple no bs persistent notepad

    2 projects | news.ycombinator.com | 16 Mar 2024

Index

What are some of the best open-source Statistics projects? This list will help you:

Rank  Project                                                     Stars
1     scikit-learn                                                58,130
2     Probabilistic-Programming-and-Bayesian-Methods-for-Hackers  26,382
3     Umami                                                       19,654
4     Plausible Analytics                                         18,415
5     excelize                                                    17,311
6     ydata-profiling                                             12,053
7     tokei                                                       10,006
8     statsmodels                                                 9,557
9     miller                                                      8,559
10    gonum                                                       7,272
11    imbalanced-learn                                            6,703
12    boltons                                                     6,417
13    git-quick-stats                                             6,156
14    scc                                                         6,103
15    Smile                                                       5,925
16    growthbook                                                  5,549
17    Tautulli                                                    5,371
18    criterion.rs                                                4,170
19    probability                                                 4,133
20    datascience                                                 4,071
21    stdlib                                                      4,026
22    statsforecast                                               3,565
23    Tablesaw                                                    3,442
