#Statistics

Open-source projects categorized as Statistics

Top 23 Statistics Open-Source Projects

  • GitHub repo scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Beginner's Question: Naive Bayes Implementation for Spam Classification | reddit.com/r/MLQuestions | 2021-04-17

    Look at the Sklearn implementation and check out some of the differences in the fit method (https://github.com/scikit-learn/scikit-learn/blob/95119c13a/sklearn/naive_bayes.py#L593)
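For readers following the spam-classification question above, here is a minimal sketch of what a Naive Bayes `fit` step typically computes: class priors and smoothed per-class word likelihoods, both stored as logs. This is not scikit-learn's implementation; the class name `TinyMultinomialNB`, the whitespace tokenizer, and the sample messages are assumptions made for illustration, and scikit-learn's estimators differ in many details (for example, they operate on numeric feature matrices rather than raw strings).

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Toy multinomial Naive Bayes (hypothetical, not scikit-learn's code)."""

    def __init__(self, alpha=1.0):
        # alpha is the additive (Laplace) smoothing strength.
        self.alpha = alpha

    def fit(self, documents, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())

        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)

        # Log prior: log P(class).
        self.log_prior = {
            c: math.log(n / total_docs) for c, n in self.class_counts.items()
        }

        # Smoothed log likelihood: log P(word | class).
        self.log_likelihood = {}
        for c in self.class_counts:
            denom = sum(self.word_counts[c].values()) + self.alpha * vocab_size
            self.log_likelihood[c] = {
                w: math.log((self.word_counts[c][w] + self.alpha) / denom)
                for w in self.vocabulary
            }
        return self

    def predict(self, text):
        # Sum log probabilities; words never seen in training are ignored.
        scores = {}
        for c in self.class_counts:
            score = self.log_prior[c]
            for w in text.lower().split():
                if w in self.vocabulary:
                    score += self.log_likelihood[c][w]
            scores[c] = score
        return max(scores, key=scores.get)


# Made-up training data, for illustration only.
docs = [
    "win cash prize now",
    "cheap prize win win",
    "meeting agenda for monday",
    "lunch on monday",
]
labels = ["spam", "spam", "ham", "ham"]
model = TinyMultinomialNB().fit(docs, labels)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero.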

  • GitHub repo Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

    Project mention: Kelly's criterion for gamblers: one of the most important concepts for understanding how investment size impacts returns | reddit.com/r/options | 2021-04-08

    The reference you shared looks really interesting I'll check it out. I have a blog exploring monte carlo simulations of my markov decision process model of the wheel. I got into MCMC and python with a really great book/python notebook on it that you might be interested in. Cheers.

  • GitHub repo G2

    📊 A highly interactive data-driven visualization grammar for statistical charts.

  • GitHub repo Plausible Analytics

    Simple, open-source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.

    Project mention: Right in my dev shoes | dev.to | 2021-04-22

    For instance, if we need to get audience information about our websites, what about using tools that respect people's privacy? Plausible.io is one of them and it allows to get that information without the need for private data (there are others, it's just the one that I use, I don't have any bindings with the company).

  • GitHub repo pandas-profiling

    Create HTML profiling reports from pandas DataFrame objects

    Project mention: Data quality assessment tool | reddit.com/r/datascience | 2021-02-05

    pandas profiling https://github.com/pandas-profiling/pandas-profiling

  • GitHub repo Umami

    Umami is a simple, fast, website analytics alternative to Google Analytics.

    Project mention: Help with ProxyPass and reverse proxies | reddit.com/r/apache | 2021-03-24

  • GitHub repo statsmodels

    Statsmodels: statistical modeling and econometrics in Python

    Project mention: [C] I have an MS in Statistics - how can I get better at coding? | reddit.com/r/statistics | 2021-01-04

  • GitHub repo boltons

    🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

  • GitHub repo Smile

    Statistical Machine Intelligence & Learning Engine

    Project mention: Kotlin with Randon Forest Classifier | reddit.com/r/Kotlin | 2021-04-19

    I've heard good things about Smile, probably beats libs like Weka by far. I'm not sure if you can load a scikit-learn model though, so you might need to retrain the model in Kotlin.

  • GitHub repo gonum

    Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

    Project mention: Go+: Go designed for data science | news.ycombinator.com | 2021-03-27

    Apart from Gonum[1] numerical libraries, I haven't found specific data science related Go libraries in my search for it for some hobby projects when compared to Python ecosystem.

    Interestingly Prose[2] A Go library for text processing yielded better results for named-entity extraction when compared to NLTK in my tests in terms of accuracy and obviously performance.

    Perhaps Go is not being applied enough in the Data Science/ML and for fields where it's applied (Network) Math in the standard library seems to be sufficient.

    [1] https://github.com/gonum/gonum

    [2] https://github.com/jdkato/prose

  • GitHub repo tokei

    Count your code, quickly.

    Project mention: PlaintDB Serves - another milestone reached | dev.to | 2021-04-14

    Let's look at the stats of PliantDB as of tonight, using tokei:

  • GitHub repo Tautulli

    A Python based monitoring and tracking tool for Plex Media Server.

    Project mention: Launch a command-line function when any movie starts? | reddit.com/r/Tautulli | 2021-04-20

  • GitHub repo probability

    Probabilistic reasoning and statistical analysis in TensorFlow

    Project mention: Problem in installing tensorflow on Raspberry Pi 4b | reddit.com/r/engineering | 2021-03-24

    2021-03-20 20:09:56.451490: E tensorflow/core/platform/hadoop/hadoopfile_system.cc:132] HadoopFileSystem load error: libhdfs.so: cannot open shared object file: No such file or directory
    WARNING:tensorflow:From /home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow_core/python/ops/distributions/distribution.py:265: ReparameterizationType.init_ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use tfp.distributions instead of tf.distributions.
    WARNING:tensorflow:From /home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflowcore/python/ops/distributions/bernoulli.py:169: RegisterKL.init_ (from tensorflow.python.ops.distributions.kullbackleibler) is deprecated and will be removed after 2019-01-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use tfp.distributions instead of tf.distributions.
    ERROR thonny.backend: PROBLEM WITH THONNY'S BACK-END
    Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 1240, in wrapper result = method(self, args, *kwargs)
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 1227, in wrapper return method(self, args, *kwargs)
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 1297, in _execute_prepared_user_code exec(statements, global_vars)
    File "/home/pi/Desktop/security/security_system_v2.py", line 10, in import tensorflow as tf
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow/init.py", line 98, in from tensorflow_core import *
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow_core/init.py", line 28, in from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "", line 1019, in _handle_fromlist
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow/init.py", line 50, in __getattr_ module = self.load()
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow/init.py", line 44, in _load module = _importlib.import_module(self.name)
    File "/usr/lib/python3.7/importlib/init.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow_core/python/init.py", line 88, in from tensorflow.python import keras
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow_core/python/keras/init.py", line 26, in from tensorflow.python.keras import activations
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow_core/python/keras/init.py", line 26, in from tensorflow.python.keras import activations
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow_core/python/keras/activations.py", line 23, in from tensorflow.python.keras.utils.generic_utils import deserialize_keras_object
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow_core/python/keras/utils/init.py", line 34, in from tensorflow.python.keras.utils.io_utils import HDF5Matrix
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/tensorflow_core/python/keras/utils/io_utils.py", line 30, in import h5py
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "/home/pi/.virtualenvs/cv/lib/python3.7/site-packages/h5py/init_.py", line 25, in from . import _errors
    File "/usr/lib/python3/dist-packages/thonny/plugins/cpython/cpython_backend.py", line 314, in _custom_import module = self._original_import(args, *kw)
    File "h5py/_errors.pyx", line 1, in init h5py._errors
    ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 44 from C header, got 40 from PyObject

  • GitHub repo miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

    Project mention: Consultare un databate XML, JSON, CVS o RDF | reddit.com/r/ItalyInformatica | 2021-03-31

  • GitHub repo datascience

    Curated list of Python resources for data science. (by r0f1)

    Project mention: Opinionated List of Data Science Libraries for Python | news.ycombinator.com | 2021-03-30

  • GitHub repo Tablesaw

    Java dataframe and visualization library

  • GitHub repo MathNet

    Math.NET Numerics

  • GitHub repo scc

    Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go

    Project mention: Show HN: Simplenetes – I replaced Kubernetes with 17k lines of shell script | news.ycombinator.com | 2021-04-01

  • GitHub repo Stats

    A well tested and comprehensive Golang statistics library package with no dependencies. (by montanaflynn)

  • GitHub repo eiten

    Statistical and Algorithmic Investing Strategies for Everyone

    Project mention: Pair trading died - hello massive trading | reddit.com/r/algotrading | 2021-02-10

    My question is: is this a backtesting or is it a real trading system that you plotted there, because there's a valley of difference. What you seem to be doing is basically what a long-short portfolio allocation does, see this for example.

  • GitHub repo Math PHP

    Powerful modern math library for PHP: Features descriptive statistics and regressions; Continuous and discrete probability distributions; Linear algebra with matrices and vectors, Numerical analysis; special mathematical functions; Algebra

  • GitHub repo criterion.rs

    Statistics-driven benchmarking library for Rust

    Project mention: Announcing message-io 0.12 - an event-driven message library to build network applications easy and fast. Now with zero-copy write/read messages. Performance close to using native OS socket with all the facilities the library offers. | reddit.com/r/rust | 2021-04-08

    Currently, there are latency benchmarks (cargo bench inside the repo shows that). There is a pending issue to attack the throughput benchmarks. The idea is to use the criterion.rs but I do not get (or I do not be able to get) stable throughput tests due to the usage of criterion do not "fit" well with the message-io usage. (closure incompatibilities for measure the time correctly).

  • GitHub repo laravel-stats

    📈 Get insights about your Laravel or Lumen Project

    Project mention: What dev composer packages are a must have? | reddit.com/r/laravel | 2021-02-16

    wnx/laravel-stats: https://github.com/stefanzweifel/laravel-stats. My own package to see how big my app actually is.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-04-22.

Index

What are some of the best open-source Statistics projects? This list will help you:

Rank Project Stars
1 scikit-learn 45,396
2 Probabilistic-Programming-and-Bayesian-Methods-for-Hackers 22,839
3 G2 10,611
4 Plausible Analytics 7,367
5 pandas-profiling 7,111
6 Umami 6,644
7 statsmodels 6,234
8 boltons 5,429
9 Smile 5,233
10 gonum 4,793
11 tokei 4,772
12 Tautulli 3,842
13 probability 3,278
14 miller 2,710
15 datascience 2,607
16 Tablesaw 2,553
17 MathNet 2,469
18 scc 2,388
19 Stats 1,952
20 eiten 1,895
21 Math PHP 1,884
22 criterion.rs 1,761
23 laravel-stats 1,406