Jupyter Notebook AI

Open-source Jupyter Notebook projects categorized as AI

Top 23 Jupyter Notebook AI Projects

  • generative-ai-for-beginners

    18 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

  • Project mention: Build a serverless ChatGPT with RAG using LangChain.js | dev.to | 2024-04-10

    Generative AI For Beginners: a collection of resources to learn about Generative AI, including tutorials, code samples, and more.

  • google-research

    Google Research

  • Project mention: Show HN: Next-token prediction in JavaScript – build fast LLMs from scratch | news.ycombinator.com | 2024-04-10

    People on here will be happy to say that I do a similar thing; however, my sequence length is dynamic because I also use a second data structure. To use pretentious academic speak: I use a simple bigram LM (2-gram) for single next-word likelihood, and separately a trie that models all words and phrases (so, an n-gram). I'm not sure how many total nodes there are because sentence lengths vary in the training data, but there are about 200,000 entry points (keys), so probably about 2-10 million total nodes in the default setup.

    "Constructing 7-gram LM": They likely started with bigrams (what I use) which only tells you the next word based on 1 word given, and thought to increase accuracy by modeling out more words in a sequence, and eventually let the user (developer) pass in any amount they want to model (https://github.com/google-research/google-research/blob/5c87...). I thought of this too at first, but I actually got more accuracy (and speed) out of just keeping them as bigrams and making a totally separate structure that models out an n-gram of all phrases (e.g. could be a 24-token long sequence or 100+ tokens etc. I model it all) and if that phrase is found, then I just get the bigram assumption of the last token of the phrase. This works better when the training data is more diverse (for a very generic model), but theirs would probably outperform mine on accuracy when the training data has a lot of nearly identical sentences that only change wildly toward the end - I don't find this pattern in typical data though, maybe for certain coding and other tasks there are those patterns though. But because it's not dynamic and they make you provide that number, even a low number (any phrase longer than 2 words) - theirs will always have to do more lookup work than with simple bigrams and they're also limited by that fixed number as far as accuracy. I wonder how scalable that is - if I need to train on occasional ~100-word long sentences but also (and mostly) just ~3-word long sentences, I guess I set this to 100 and have a mostly "undefined" trie.

    I also thought of the name "LMJS", theirs is "jslm" :) but I went with simply "next-token-prediction" because that's what it ultimately does as a library. I don't know what theirs is really designed for other than proving a concept. Most of their code files are actually comments and hypothetical scenarios.

    I recently added a browser example showing simple autocomplete using my library: https://github.com/bennyschmidt/next-token-prediction/tree/m... (video)

    And next I'm implementing 8-dimensional embeddings that are converted to normalized vectors between 0 and 1, to see if doing math on them does anything useful beyond similarity. Right now they look like this:

      [nextFrequency, prevalence, specificity, length, firstLetter, lastLetter, firstVowel, lastVowel]
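    A hedged guess at how such a feature vector might be packed and min-max normalised into [0, 1]: only the eight field names come from the comment above; the formulas, bounds, and fallbacks are assumptions for illustration.

      def normalize(raw, lo, hi):
          # Scale each raw feature into [0, 1] given per-feature min/max.
          return [(x - l) / (h - l) if h > l else 0.0
                  for x, l, h in zip(raw, lo, hi)]

      def embed(token, next_frequency, prevalence, specificity):
          vowels = "aeiou"
          raw = [
              next_frequency,                                     # nextFrequency
              prevalence,                                         # prevalence
              specificity,                                        # specificity
              len(token),                                         # length
              ord(token[0]),                                      # firstLetter
              ord(token[-1]),                                     # lastLetter
              ord(next((c for c in token if c in vowels), "a")),  # firstVowel
              ord(next((c for c in reversed(token) if c in vowels), "a")),  # lastVowel
          ]
          # Illustrative bounds; real code would compute these over the corpus.
          lo = [0, 0, 0.0, 1, 97, 97, 97, 97]
          hi = [1000, 1000, 1.0, 20, 122, 122, 122, 122]
          return normalize(raw, lo, hi)

      print(embed("token", next_frequency=120, prevalence=40, specificity=0.7))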

  • AI-For-Beginners

    12 Weeks, 24 Lessons, AI for All!

  • Project mention: FREE AI Course By Microsoft: ZERO to HERO! 🔥 | dev.to | 2024-03-18

    🔗 https://github.com/microsoft/AI-For-Beginners 🔗 https://microsoft.github.io/AI-For-Beginners/

  • learnopencv

    Learn OpenCV: C++ and Python Examples

  • Project mention: YOLO-NAS Pose | /r/pytorch | 2023-11-16

    Deci's YOLO-NAS Pose: Redefining Pose Estimation! Elevating healthcare, sports, tech, and robotics with precision and speed. GitHub link and blog link down below! Repo: https://github.com/spmallick/learnopencv/tree/master/YOLO-NAS-Pose

  • h4cker

    This repository is primarily maintained by Omar Santos (@santosomar) and includes thousands of resources related to ethical hacking, bug bounties, digital forensics and incident response (DFIR), artificial intelligence security, vulnerability research, exploit development, reverse engineering, and more.

  • StableLM

    StableLM: Stability AI Language Models

  • Project mention: The Era of 1-bit LLMs: ternary parameters for cost-effective computing | news.ycombinator.com | 2024-02-28

    https://github.com/Stability-AI/StableLM?tab=readme-ov-file#...

  • stable-diffusion-webui-colab

    stable diffusion webui colab

  • Project mention: Stable-Diffusion-Webui-Colab | news.ycombinator.com | 2023-07-24
  • dopamine

    Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

  • ML-Papers-of-the-Week

    🔥Highlighting the top ML papers every week.

  • Project mention: [D] Where can I find a list of the foundational academic papers in RL/ML/DL and what are your go-to places to find new academic papers in RL/ML/DL? | /r/MachineLearning | 2023-07-07

    Labml.ai stopped working in May. I like https://github.com/dair-ai/ML-Papers-of-the-Week

  • generative-ai

    Sample code and notebooks for Generative AI on Google Cloud (by GoogleCloudPlatform)

  • Project mention: Google Imagen 2 | news.ycombinator.com | 2023-12-13

    I've used the code based on similar examples from GitHub [1]. According to docs [2], imagegeneration@005 was released on the 11th, so I guessed it's Imagen 2, though there are no confirmations.

    [1] https://github.com/GoogleCloudPlatform/generative-ai/blob/ma...

    [2] https://console.cloud.google.com/vertex-ai/publishers/google...

  • nlpaug

    Data augmentation for NLP

  • ArtLine

    A Deep Learning based project for creating line art portraits.

  • Dreambooth-Stable-Diffusion

    Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) by way of Textual Inversion (https://arxiv.org/abs/2208.01618) for Stable Diffusion (https://arxiv.org/abs/2112.10752). Tweaks focused on training faces, objects, and styles. (by JoePenna)

  • Project mention: Will there be comprehensive tutorials for fine-tuning SD XL when it comes out? | /r/StableDiffusion | 2023-07-01

    Tons of stuff here, no? https://github.com/JoePenna/Dreambooth-Stable-Diffusion/

  • examples

    Jupyter Notebooks to help you get hands-on with Pinecone vector databases (by pinecone-io)

  • clip-retrieval

    Easily compute clip embeddings and build a clip retrieval system with them

  • Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11
  • machine-learning-experiments

    🤖 Interactive Machine Learning experiments: 🏋️ model training + 🎨 model demos

  • vertex-ai-samples

    Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud

  • Project mention: Gemini 1.5 outshines GPT-4-Turbo-128K on long code prompts, HVM author | news.ycombinator.com | 2024-02-18
  • imodels

    Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).

  • tensor-house

    A collection of reference Jupyter notebooks and demo AI/ML applications for enterprise use cases: marketing, pricing, supply chain, smart manufacturing, and more.

  • Deep-Learning-In-Production

    Build, train, deploy, scale and maintain deep learning models. Understand ML infrastructure and MLOps using hands-on examples.

  • chameleon-llm

    Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".

  • Project mention: Giving GPT “Infinite” Knowledge | news.ycombinator.com | 2023-05-08

    > Do you know any active research in this area? I briefly considered playing with this, but my back-of-the-envelope semi-educated feeling for now is that it won't scale.

    I am aware of a couple of potentially promising research directions. One is a formally academic effort called Chameleon [0], and one is more of a grassroots organic effort that aims to build an actually functional Auto-GPT-like system, called Agent-LLM [1]. I have read the Chameleon paper, and I must say I'm quite impressed with their architecture. It adds a few bits and pieces that most of the early GPT-based agents didn't have, and I have a strong intuition that these will contribute to these things actually working.

    Auto-GPT is another, relatively famous piece of work in this area. However, at least as of v0.2.2, I found it relatively underwhelming. For any online knowledge retrieval+synthesis and retrieval+usage tasks, it seemed to get stuck, but it did sort-of-kind-of OK on plain online knowledge retrieval. After having a look at the Auto-GPT source code, my intuition (yes, I know, "fuzzy feelings without a solid basis", but I believe that's simply because I lack the AI background to explain it with crystal-clear wording) is that the poor performance of the current version of Auto-GPT stems from insufficient skill in prompt-chain architecture and from the surprisingly low-quality, at times buggy, code.

    I think Auto-GPT has some potential. I think the implementation lets down the concept, but that's just a question of refactoring the prompts and the overall code, which it seems like the upstream GitHub repo has been quite busy with, so I might give it another go in a couple of weeks to see how far it's moved forward.

    > Specifically, as task complexity grows, the amount of results to combine will quickly exceed the context window size of the "combiner" GPT-4. Sure, you can stuff another layer on top, turning it into a tree/DAG, but eventually, I think the partial result itself will be larger than 8k, or even 32k tokens - and I feel this "eventually" will be hit rather quickly. But maybe my feelings are wrong and there is some mileage in this approach.

    Auto-GPT uses an approach based on summarisation and something I'd term 'micro-agents'. For example, when Auto-GPT is searching for an answer to a particular question online, then for each search result it finds, it spins up a sub-chain that gets asked a question like 'What does this page say about X?' or 'Based on the contents of this page, how can you do Y?'. Ultimately, intelligence is about lossy compression, and this is starkly exposed when it comes to LLMs because you have no choice but to lose some information.
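    A minimal sketch of that micro-agent pattern, assuming a generic chat-completion client: one narrow sub-chain per page, then a combiner pass over the short summaries. The call_llm stub and the prompt wording are stand-ins, not Auto-GPT's actual code.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError("plug in your LLM client here")

      def micro_agent(page_text: str, question: str) -> str:
          # Each sub-chain sees one page only, keeping it within the context window.
          return call_llm(
              f"Based only on the following page, {question}\n\n{page_text[:8000]}"
          )

      def research(question: str, pages: list[str]) -> str:
          # Lossy compression: N pages -> N short notes -> one synthesis.
          notes = [micro_agent(p, f"what does this page say about: {question}?")
                   for p in pages]
          return call_llm(
              f"Synthesise a single answer to '{question}' from these notes:\n\n"
              + "\n---\n".join(notes)
          )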

    > I think the partial result itself will be larger than 8k, or even 32k tokens - and I feel this "eventually" will be hit rather quickly. But maybe my feelings are wrong and there is some mileage in this approach.

    The solution to that would be to synthesize output section by section, or even as an "output stream" that can be captured and/or edited outside the LLM in whole or in chunks. I do think there's some mileage to be exploited in a recursive "store, summarise, synthesise" approach, but the problem will be that of signal loss. Every time you pass a subtask to a sub-agent, or summarise the outcome of that sub-agent into your current knowledge base, some noise is introduced. It might be that the signal-to-noise ratio will dissipate as higher- and higher-order LLM chains are used, analogously to how terrible it was to use electricity or radio waves before any amplification technology became available.
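    A sketch of that recursive "store, summarise, synthesise" idea: partial results are merged a few at a time in a tree, so no single combiner call has to hold everything in context. The fan-in, the prompt, and the call_llm stub are assumptions for illustration.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError("plug in your LLM client here")

      def synthesise_tree(chunks: list[str], fan_in: int = 4) -> str:
          # Base case: a single chunk is the final synthesis.
          if len(chunks) <= 1:
              return chunks[0] if chunks else ""
          merged = []
          for i in range(0, len(chunks), fan_in):
              batch = chunks[i:i + fan_in]
              # Every merge step is lossy -- the signal-loss concern above.
              merged.append(call_llm("Summarise, preserving key facts:\n\n"
                                     + "\n---\n".join(batch)))
          return synthesise_tree(merged, fan_in)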

    One possible avenue to explore to crack down on decreasing SNR (based on my own original research, but I can also see some people disclosing online that they are exploring the same path), is to have a second LLM in the loop, double-checking the result of the first one. This has some limitations, but I have successfully used this approach to verify that, for example, the LLM does not outright refuse to carry out a task. This is currently cost-prohibitive to do in a way that would make me personally satisfied and confident enough in the output to make it run full-auto, but I expect that increasing ability to run AI locally will make people more willing to experiment with massive layering of cooperating LLM chains that check each others' work, cooperate, and/or even repeat work using different prompts to pick the best output a la redundant avionics computers.
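    A sketch of the two-model loop described above: a second LLM checks whether the first one's draft actually carries out the task (e.g. is not a refusal) and triggers a retry if not. Both client stubs and the YES/NO protocol are assumptions, not any specific library's API.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError("primary model client goes here")

      def call_checker(prompt: str) -> str:
          raise NotImplementedError("second, verifying model goes here")

      def verified(task: str, max_retries: int = 3) -> str:
          for _ in range(max_retries):
              draft = call_llm(task)
              verdict = call_checker(
                  "Does the following response carry out the task rather than "
                  "refuse or deflect? Answer YES or NO.\n\n"
                  f"Task: {task}\n\nResponse: {draft}"
              )
              if verdict.strip().upper().startswith("YES"):
                  return draft  # checker accepted the draft
          raise RuntimeError("no verified answer within the retry budget")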

    [0]: https://github.com/lupantech/chameleon-llm

  • PConv-Keras

    Unofficial implementation of "Image Inpainting for Irregular Holes Using Partial Convolutions". Try at: www.fixmyphoto.ai

  • Basic-Mathematics-for-Machine-Learning

    The motive behind creating this repo is to overcome the fear of mathematics and do whatever you want to do in Machine Learning, Deep Learning, and other fields of AI.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source AI projects in Jupyter Notebook? This list will help you:

Rank  Project                                  Stars
   1  generative-ai-for-beginners            42,394
   2  google-research                        32,804
   3  AI-For-Beginners                       31,046
   4  learnopencv                            20,363
   5  h4cker                                 16,518
   6  StableLM                               15,853
   7  stable-diffusion-webui-colab           15,237
   8  dopamine                               10,371
   9  ML-Papers-of-the-Week                   8,692
  10  generative-ai                           5,396
  11  nlpaug                                  4,252
  12  ArtLine                                 3,531
  13  Dreambooth-Stable-Diffusion             3,162
  14  examples                                2,433
  15  clip-retrieval                          2,124
  16  machine-learning-experiments            1,602
  17  vertex-ai-samples                       1,342
  18  imodels                                 1,290
  19  tensor-house                            1,162
  20  Deep-Learning-In-Production             1,072
  21  chameleon-llm                           1,017
  22  PConv-Keras                               893
  23  Basic-Mathematics-for-Machine-Learning    567
