Top 23 Jupyter Notebook Data Analysis Projects
-
Get started with Data Science in the Data Science for Beginners curricula.
-
machine_learning_complete
A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
-
Data-science
Collection of useful data science topics along with articles, videos, and code (by khuyentran1401)
-
Linear-Algebra-With-Python
Lecture notes for linear algebra featuring Python. This series of lecture notes walks you through the must-know concepts that form the foundation of data science and advanced quantitative skill sets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, etc., to quickly refresh linear algebra with the assistance of Python computation and visualization.
Project mention: Python for Econometrics for Practitioners [Free Online Courses] | /r/CompSocial | 2023-08-24
Linear Algebra with Python: This training will walk you through all the must-know concepts that set the foundation of data science or advanced quantitative skill sets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, etc. to quickly refresh linear algebra with the assistance of Python computation and visualization. Core concepts covered are: linear combination, vector space, linear transformation, eigenvalues and eigenvectors, diagonalization, singular value decomposition, etc.
-
100-pandas-puzzles
100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
-
As it happens, there's a PyMC implementation of the 1st and 2nd editions of Statistical Rethinking here:
https://github.com/pymc-devs/pymc-resources
(I think the author of the book discussed above, Osvaldo Martin, is the primary or sole contributor for the Rethinking implementations, in fact -- he had a full implementation in his own repo [here](https://github.com/aloctavodia/Statistical-Rethinking-with-P...) before deprecating it in favor of the above-linked one.)
-
Project mention: 80% faster, 50% less memory, 0% loss of accuracy Llama finetuning | news.ycombinator.com | 2023-12-01
Good point - the main issue is we encountered this exact issue with our old package Hyperlearn (https://github.com/danielhanchen/hyperlearn).
I OSSed all the code to the community - I'm actually an extremely open person and I love contributing to the OSS community.
The issue was the package got gobbled up by other startups and big tech companies with no credit - I didn't want any cash from it, but it stung and hurt really bad hearing other startups and companies claim it was them who made it faster, whilst it was actually my work. It hurt really bad - as an OSS person, I don't want money, but just some recognition for the work.
I also used to accept and help everyone with writing their startup's software, but I never got paid or even any thanks - sadly I didn't expect the world to be such a hostile place.
So after a sad awakening, I decided with my brother instead of OSSing everything, we would first OSS something which is still very good - 5X faster training is already very reasonable.
I'm all open to other suggestions on how we should approach this though! There are no evil intentions - in fact I insisted we OSS EVERYTHING even the 30x faster algos, but after a level headed discussion with my brother - we still have to pay life expenses no?
If you have other ways we can go about this - I'm all ears!! We're literally making stuff up as we go along!
-
hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Project mention: Using IPython Jupyter Magic commands to improve the notebook experience | dev.to | 2024-03-03
In this post, we’ll show how your team can turn any utility function(s) into reusable IPython Jupyter magics for a better notebook experience. As an example, we’ll use Hamilton, my open source library, to motivate the creation of a magic that facilitates better development ergonomics for using it. You needn’t know what Hamilton is to understand this post.
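To give a rough feel for what a "modular, self-documenting dataflow" can look like, here is a minimal sketch using only the standard library. It is not Hamilton's API: the `execute` helper, the node names, and the sample numbers are all hypothetical. It assumes the convention, which Hamilton is understood to follow, that each plain function is a node and its parameter names refer to the upstream nodes it depends on.

```python
import inspect


# Each function is one node in the dataflow. Its parameter names say which
# upstream nodes (or raw inputs) it needs. All names here are hypothetical.
def spend_per_signup(spend, signups):
    return [s / n for s, n in zip(spend, signups)]


def avg_spend_per_signup(spend_per_signup):
    return sum(spend_per_signup) / len(spend_per_signup)


def execute(functions, inputs, target):
    """Compute `target` by recursively resolving the nodes its parameters name.

    Hypothetical helper for illustration only; this is not Hamilton's driver.
    """
    nodes = {func.__name__: func for func in functions}
    cache = dict(inputs)  # raw inputs count as already-computed nodes

    def resolve(name):
        if name not in cache:
            func = nodes[name]
            kwargs = {p: resolve(p) for p in inspect.signature(func).parameters}
            cache[name] = func(**kwargs)
        return cache[name]

    return resolve(target)


result = execute(
    [spend_per_signup, avg_spend_per_signup],
    {"spend": [10.0, 20.0, 30.0], "signups": [1, 2, 3]},
    "avg_spend_per_signup",
)
```

Because every node is an ordinary function, each one can be unit tested on its own, and the dependency graph (the lineage) can be read straight off the signatures.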
-
tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation (by databrickslabs)
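For readers unfamiliar with the term, an AS OF join attaches to each row of one time series the most recent row of another series at or before that row's timestamp. The sketch below illustrates the idea in plain Python with `bisect`; it is not tempo's API and does not use Spark. The `asof_join` function and the sample `trades` and `quotes` data are hypothetical.

```python
from bisect import bisect_right


def asof_join(left, right):
    """For each (ts, value) in `left`, attach the latest `right` value with ts_right <= ts.

    Both inputs are lists of (timestamp, value) tuples sorted by timestamp.
    Returns a list of (timestamp, left_value, right_value_or_None).
    """
    right_timestamps = [ts for ts, _ in right]
    joined = []
    for ts, left_value in left:
        # Number of right-hand rows whose timestamp is <= ts.
        idx = bisect_right(right_timestamps, ts)
        right_value = right[idx - 1][1] if idx > 0 else None
        joined.append((ts, left_value, right_value))
    return joined


# Hypothetical sample data: integer timestamps for readability.
trades = [(1, "buy"), (5, "sell"), (9, "buy")]
quotes = [(2, 100.0), (4, 101.5), (8, 99.0)]

result = asof_join(trades, quotes)
```

The first trade has no earlier quote, so it is paired with `None`; the other two pick up the latest quote seen before them.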
-
Rust still has some key pieces missing, but looks promising, see: https://github.com/wiseaidev/rust-data-analysis
F# has a very decent data community: https://datascienceinfsharp.com
And obviously Julia is also something to consider.
-
covid19-severity-prediction
Extensive and accessible COVID-19 data + forecasting for counties and hospitals. 📈
-
Econometrics-With-Python
Econometrics tutorials featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics. The theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.
Project mention: Python for Econometrics for Practitioners [Free Online Courses] | /r/CompSocial | 2023-08-24
Econometrics with Python: This is a crash course for reviewing the most important concepts and techniques of econometrics. The theory is presented lightly, without the hassle of mathematical derivation, and the Python code is mostly procedural and straightforward. Core concepts covered: multiple linear regression, logistic model, dummy variable, simultaneous equations model, panel data model and time series.
-
DataScienceWithPython
Learn Data Science with focus on adding value with the most efficient tech stack.
-
All the code used as part of this article (and more!) is available on my GitHub profile.
-
daru-view
daru-view provides easy, interactive plotting in web applications and IRuby notebooks. It is a plugin gem for the existing daru gem.
Jupyter Notebook Data Analysis related posts
- Welcome to 14 days of Data Science!
- Data Science for Beginners - A Curriculum
- Assessing the Quality of Synthetic Data with Data-centric AI
- Is anyone willing to work with us on a Synthetic Data Project?
- Where can I find data science projects to gain more experience.
- Have a lot of free time in my DS work, feel guilty about it, is it normal?
- How can a correlation coefficient be "invalid"?
Index
What are some of the best open-source Data Analysis projects in Jupyter Notebook? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | Data-Science-For-Beginners | 26,031 |
2 | pandas_exercises | 10,036 |
3 | machine_learning_complete | 4,462 |
4 | Data-science | 3,938 |
5 | ML-Workspace | 3,310 |
6 | Linear-Algebra-With-Python | 2,098 |
7 | 100-pandas-puzzles | 2,097 |
8 | pymc-resources | 1,870 |
9 | hyperlearn | 1,510 |
10 | hamilton | 1,272 |
11 | kangas | 1,023 |
12 | qs_ledger | 948 |
13 | machine-learning | 658 |
14 | datacamp | 300 |
15 | tempo | 294 |
16 | RasgoQL | 266 |
17 | rust-data-analysis | 263 |
18 | covid19-severity-prediction | 226 |
19 | Econometrics-With-Python | 213 |
20 | DataScienceWithPython | 170 |
21 | PANDAS-TUTORIAL | 152 |
22 | Data-Visualization | 151 |
23 | daru-view | 90 |