Top 23 Jupyter Notebook Statistic Projects

ProbabilisticProgrammingandBayesianMethodsforHackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Project mention: Predicting the distribution of a variable rather than a point estimate  reddit.com/r/datascience  2022-08-14
You’re welcome! I would recommend Bayesian Methods for Hackers

Project mention: [P] Any good resources which can help me with Multivariate Time Series Forecasting using Probabilistic Machine Learning?  reddit.com/r/MachineLearning  2022-08-14

Project mention: [Project] BFLOAT16 on ALL hardware (>= 2009), up to 2000x faster ML algos, 50% less RAM usage for all old/new hardware  Hyperlearn Reborn.  reddit.com/r/MachineLearning  2022-06-02
Hello everyone!! It's been a while!! Years back I released Hyperlearn https://github.com/danielhanchen/hyperlearn. It has 1.2K Github stars, where I made tonnes of algos faster:

It is available in the StatsForecast package.

imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Option 2) fit a model from https://github.com/csinva/imodels on the predicted values of the RF

Project mention: Package for Computations and Statistics on Manifolds  news.ycombinator.com  2022-01-10

Project mention: Is Edward2 still a part of Tensorflow/Tensorflow Probability or is it discontinued?  reddit.com/r/tensorflow  2022-06-23


DataScienceProjects
The code repository for projects and tutorials in R and Python that cover a variety of topics in data visualization, statistics, sports analytics, and general applications of probability theory.

datasciencelearning
Repository of code and resources related to different data science and machine learning topics. For learning, practice and teaching purposes.
Project mention: error: the following arguments are required: i/inputpath, o/outputpath How to define these?  reddit.com/r/learnpython  2022-01-14
https://github.com/5agado/datasciencelearning/tree/master/graphics/learn_to_paint

theelementsofstatisticallearning
My notes and code (Jupyter notebooks) for "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani and Jerome Friedman

covid19severityprediction
Extensive and accessible COVID-19 data + forecasting for counties and hospitals. 📈

Here or here for the Python versions of ISLR.

I've created a short course on linear programming; you can find the resources here: https://github.com/ADGEfficiency/teachingmonolith/tree/master/linearprogramming

Project mention: DCGAN (CIFAR10) Generating fake images is easy, but how to also output the class label (1 to 10) with the fake generated images?  reddit.com/r/learnmachinelearning  2022-03-13
I have this DCGAN model (https://github.com/csinva/ganvaepretrainedpytorch/tree/master/cifar10_dcgan) which generates fake Cifar10 images. However I also want to get the intended class label output with the fake generated images. How can I do this? This model which I found only generates fake images but doesn't know what class the generated images belong to.

conformal_classification
Wrapper for a PyTorch classifier which allows it to output prediction sets. The sets are theoretically guaranteed to contain the true class with high probability (via conformal prediction).
Project mention: [R] Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification  Link to a free online lecture by the author in comments  reddit.com/r/MachineLearning  2022-03-06
Uncertainty Sets for Image Classifiers using Conformal Prediction https://arxiv.org/abs/2009.14193 https://github.com/aangelopoulos/conformal_classification
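For readers new to the idea, the prediction sets described above can be illustrated with basic split conformal prediction. The sketch below is plain Python and is not the conformal_classification API or the method from the linked paper; the function names, the score (one minus the probability of the true class), and the calibration numbers are all assumptions made for illustration.

```python
import math


def conformal_quantile(cal_probs, cal_labels, alpha):
    """Compute the score threshold from a held-out calibration set."""
    # Nonconformity score: 1 - probability the model gave the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points: every class will be included.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, q_hat):
    """Return every class whose score 1 - p is at most the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= q_hat]


# Made-up calibration data: class probabilities and true labels.
cal_probs = [
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.3, 0.3, 0.4],
]
cal_labels = [0, 1, 2, 1, 0]

q_hat = conformal_quantile(cal_probs, cal_labels, alpha=0.2)
example_set = prediction_set([0.6, 0.35, 0.05], q_hat)
```

When the calibration and test examples are exchangeable, sets built this way contain the true class with probability at least 1 - alpha, averaged over the data. Uncertain inputs get larger sets, and a very small calibration set yields sets containing every class.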

Project mention: Time series forecasting model predicts increasing number for target variable when the actual values are zeroes  reddit.com/r/datascience  2022-08-01
You can try the HierarchicalForecast package to reconcile predictions.
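One simple form of reconciliation is the bottom-up approach: forecast the lowest-level series and sum them to get the aggregate, so the levels always agree. The sketch below is plain Python, not the HierarchicalForecast API; the function name and the store-level numbers are made up for illustration.

```python
def bottom_up_total(bottom_forecasts):
    """Sum bottom-level forecasts step by step to get a coherent total.

    bottom_forecasts maps a (hypothetical) series name to its list of
    forecasts, one value per horizon step; all lists must be equally long.
    """
    lengths = {len(values) for values in bottom_forecasts.values()}
    if len(lengths) != 1:
        raise ValueError("all series need the same forecast horizon")
    # zip(*...) groups the forecasts of all series for each horizon step.
    return [sum(step) for step in zip(*bottom_forecasts.values())]


# Made-up forecasts for two stores over a three-step horizon.
store_forecasts = {"store_a": [0.0, 1.0, 2.0], "store_b": [0.0, 0.0, 3.0]}
total_forecast = bottom_up_total(store_forecasts)
```

Other reconciliation schemes adjust forecasts at every level rather than trusting only the bottom one; this sketch covers only the simplest case.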

PyImpetus
PyImpetus is a Markov Blanket based feature subset selection algorithm that considers features both separately and together as a group in order to provide not just the best set of features but also the best combination of features

Project mention: [R] Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification  Link to a free online lecture by the author in comments  reddit.com/r/MachineLearning  2022-03-06
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control https://arxiv.org/abs/2110.01052 https://github.com/aangelopoulos/ltt

eip1559_analysis
Can we estimate the economic impact of EIP-1559 on miners? This repository tries to estimate the loss of miners' revenue from transaction fees, using Ethereum historical data.

humblebenchmarks
Benchmarking programming languages using statistics and machine learning algorithms

extremeheatexcessdeathsanalysis
A statistical analysis of excess deaths attributable to extreme heat in California's most populous counties

Project mention: Great thread on the importance of EDA and confounder adjustment prior to differential expression analysis (RNA-seq)  reddit.com/r/bioinformatics  2022-04-12
Fwiw, last week I released a new method for batch correction that conditions on confounders which are correlated with the batch variable. Preprint here: https://arxiv.org/abs/2203.12720 and Python code here: https://github.com/calvinmccarter/condoadapter.
Jupyter Notebook Statistics related posts
 [Q] Weekly time series forecasting
 Only lost once (HOMER) but my 99% just turned back into 100% after hitting 200 played. Anyone else?
 200 up this morning
 [D] What are some statistical packages you use in R that aren't available in Python?
 [P] Fastest and most accurate version of the Exponential Smoothing (ETS) Algorithm for Python
 Exponential Smoothing (ETS) for Python
 [P] It's settled: AutoArima is a lot(!) faster and more accurate than FBProphet. Now you can replace it with just two lines of code without making changes to your pipeline
Index
What are some of the best open-source Statistic projects in Jupyter Notebook? This list will help you:
#  Project  Stars
1  ProbabilisticProgrammingandBayesianMethodsforHackers  24,703 
2  probability  3,753 
3  hyperlearn  1,402 
4  statsforecast  880 
5  imodels  875 
6  geomstats  793 
7  edward2  586 
8  facet  403 
9  DataScienceProjects  384 
10  datasciencelearning  373 
11  theelementsofstatisticallearning  354 
12  covid19severityprediction  206 
13  ISLpython  135 
14  teachingmonolith  134 
15  ganvaepretrainedpytorch  131 
16  conformal_classification  127 
17  hierarchicalforecast  124 
18  PyImpetus  92 
19  ltt  26 
20  eip1559_analysis  11 
21  humblebenchmarks  5 
22  extremeheatexcessdeathsanalysis  2 
23  condoadapter  1 