Jupyter Notebook AI

Open-source Jupyter Notebook projects categorized as AI

Top 23 Jupyter Notebook AI Projects

  • generative-ai-for-beginners

    18 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

  • Project mention: Build a serverless ChatGPT with RAG using LangChain.js | dev.to | 2024-04-10

    Generative AI For Beginners: a collection of resources to learn about Generative AI, including tutorials, code samples, and more.

  • google-research

    Google Research

  • Project mention: Show HN: Next-token prediction in JavaScript – build fast LLMs from scratch | news.ycombinator.com | 2024-04-10

    People on here will be happy to say that I do a similar thing; however, my sequence length is dynamic because I also use a second data structure. To use pretentious academic speak: I use a simple bigram LM (2-gram) for single next-word likelihood, and separately a trie that models all words and phrases (so, an n-gram). I'm not sure how many total nodes there are because sentence lengths vary in the training data, but there are about 200,000 entry points (keys), so probably about 2-10 million total nodes in the default setup.

    "Constructing 7-gram LM": They likely started with bigrams (what I use) which only tells you the next word based on 1 word given, and thought to increase accuracy by modeling out more words in a sequence, and eventually let the user (developer) pass in any amount they want to model (https://github.com/google-research/google-research/blob/5c87...). I thought of this too at first, but I actually got more accuracy (and speed) out of just keeping them as bigrams and making a totally separate structure that models out an n-gram of all phrases (e.g. could be a 24-token long sequence or 100+ tokens etc. I model it all) and if that phrase is found, then I just get the bigram assumption of the last token of the phrase. This works better when the training data is more diverse (for a very generic model), but theirs would probably outperform mine on accuracy when the training data has a lot of nearly identical sentences that only change wildly toward the end - I don't find this pattern in typical data though, maybe for certain coding and other tasks there are those patterns though. But because it's not dynamic and they make you provide that number, even a low number (any phrase longer than 2 words) - theirs will always have to do more lookup work than with simple bigrams and they're also limited by that fixed number as far as accuracy. I wonder how scalable that is - if I need to train on occasional ~100-word long sentences but also (and mostly) just ~3-word long sentences, I guess I set this to 100 and have a mostly "undefined" trie.

    I also thought of the name "LMJS", theirs is "jslm" :) but I went with simply "next-token-prediction" because that's what it ultimately does as a library. I don't know what theirs is really designed for other than proving a concept. Most of their code files are actually comments and hypothetical scenarios.

    I recently added a browser example showing simple autocomplete using my library: https://github.com/bennyschmidt/next-token-prediction/tree/m... (video)

    And next I'm implementing 8-dimensional embeddings that are converted to normalized vectors between 0 and 1, to see if doing math on them does anything useful beyond similarity. Right now they look like this:

      [nextFrequency, prevalence, specificity, length, firstLetter, lastLetter, firstVowel, lastVowel]
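    A hedged guess at how such a feature vector might be packed and min-max normalised into [0, 1]: only the eight field names come from the comment above; the formulas, bounds, and fallbacks are assumptions for illustration.

      def normalize(raw, lo, hi):
          # Scale each raw feature into [0, 1] given per-feature min/max.
          return [(x - l) / (h - l) if h > l else 0.0
                  for x, l, h in zip(raw, lo, hi)]

      def embed(token, next_frequency, prevalence, specificity):
          vowels = "aeiou"
          raw = [
              next_frequency,                                     # nextFrequency
              prevalence,                                         # prevalence
              specificity,                                        # specificity
              len(token),                                         # length
              ord(token[0]),                                      # firstLetter
              ord(token[-1]),                                     # lastLetter
              ord(next((c for c in token if c in vowels), "a")),  # firstVowel
              ord(next((c for c in reversed(token) if c in vowels), "a")),  # lastVowel
          ]
          # Illustrative bounds; real code would compute these over the corpus.
          lo = [0, 0, 0.0, 1, 97, 97, 97, 97]
          hi = [1000, 1000, 1.0, 20, 122, 122, 122, 122]
          return normalize(raw, lo, hi)

      print(embed("token", next_frequency=120, prevalence=40, specificity=0.7))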

  • AI-For-Beginners

    12 Weeks, 24 Lessons, AI for All!

  • Project mention: FREE AI Course By Microsoft: ZERO to HERO! 🔥 | dev.to | 2024-03-18

    🔗 https://github.com/microsoft/AI-For-Beginners 🔗 https://microsoft.github.io/AI-For-Beginners/

  • learnopencv

    Learn OpenCV: C++ and Python Examples

  • Project mention: YOLO-NAS Pose | /r/pytorch | 2023-11-16

    Deci's YOLO-NAS Pose: Redefining Pose Estimation! Elevating healthcare, sports, tech, and robotics with precision and speed. GitHub link and blog link down below! Repo: https://github.com/spmallick/learnopencv/tree/master/YOLO-NAS-Pose

  • h4cker

    This repository is primarily maintained by Omar Santos (@santosomar) and includes thousands of resources related to ethical hacking, bug bounties, digital forensics and incident response (DFIR), artificial intelligence security, vulnerability research, exploit development, reverse engineering, and more.

  • StableLM

    StableLM: Stability AI Language Models

  • Project mention: The Era of 1-bit LLMs: ternary parameters for cost-effective computing | news.ycombinator.com | 2024-02-28

    https://github.com/Stability-AI/StableLM?tab=readme-ov-file#...

  • stable-diffusion-webui-colab

    stable diffusion webui colab

  • Project mention: Stable-Diffusion-Webui-Colab | news.ycombinator.com | 2023-07-24
  • dopamine

    Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

  • ML-Papers-of-the-Week

    🔥Highlighting the top ML papers every week.

  • Project mention: [D] Where can I find a list of the foundational academic papers in RL/ML/DL and what are your go-to places to find new academic papers in RL/ML/DL? | /r/MachineLearning | 2023-07-07

    Labml.ai stopped working in May. I like https://github.com/dair-ai/ML-Papers-of-the-Week

  • generative-ai

    Sample code and notebooks for Generative AI on Google Cloud (by GoogleCloudPlatform)

  • Project mention: Google Imagen 2 | news.ycombinator.com | 2023-12-13

    I've used the code based on similar examples from GitHub [1]. According to docs [2], imagegeneration@005 was released on the 11th, so I guessed it's Imagen 2, though there are no confirmations.

    [1] https://github.com/GoogleCloudPlatform/generative-ai/blob/ma...

    [2] https://console.cloud.google.com/vertex-ai/publishers/google...

  • nlpaug

    Data augmentation for NLP

  • ArtLine

    A Deep Learning based project for creating line art portraits.

  • Dreambooth-Stable-Diffusion

    Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) by way of Textual Inversion (https://arxiv.org/abs/2208.01618) for Stable Diffusion (https://arxiv.org/abs/2112.10752). Tweaks focused on training faces, objects, and styles. (by JoePenna)

  • Project mention: Will there be comprehensive tutorials for fine-tuning SD XL when it comes out? | /r/StableDiffusion | 2023-07-01

    Tons of stuff here, no? https://github.com/JoePenna/Dreambooth-Stable-Diffusion/

  • examples

    Jupyter Notebooks to help you get hands-on with Pinecone vector databases (by pinecone-io)

  • clip-retrieval

    Easily compute clip embeddings and build a clip retrieval system with them

  • Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11
  • machine-learning-experiments

    🤖 Interactive Machine Learning experiments: 🏋️ model training + 🎨 model demos

  • vertex-ai-samples

    Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud

  • Project mention: Gemini 1.5 outshines GPT-4-Turbo-128K on long code prompts, HVM author | news.ycombinator.com | 2024-02-18
  • imodels

    Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).

  • tensor-house

    A collection of reference Jupyter notebooks and demo AI/ML applications for enterprise use cases: marketing, pricing, supply chain, smart manufacturing, and more.

  • Deep-Learning-In-Production

    Build, train, deploy, scale and maintain deep learning models. Understand ML infrastructure and MLOps using hands-on examples.

  • chameleon-llm

    Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".

  • Project mention: Giving GPT “Infinite” Knowledge | news.ycombinator.com | 2023-05-08

    > Do you know any active research in this area? I briefly considered playing with this, but my back-of-the-envelope semi-educated feeling for now is that it won't scale.

    I am aware of a couple of potentially promising research directions. One is a formally academic effort called Chameleon [0], and one is more of a grassroots organic effort that aims to build an actually functional Auto-GPT-like system, called Agent-LLM [1]. I have read the Chameleon paper, and I must say I'm quite impressed with their architecture. It adds a few bits and pieces that most of the early GPT-based agents didn't have, and I have a strong intuition that these will contribute to these things actually working.

    Auto-GPT is another, relatively famous piece of work in this area. However, at least as of v0.2.2, I found it relatively underwhelming. For any online knowledge retrieval+synthesis and retrieval+usage tasks, it seemed to get stuck, but it did sort-of-kind-of OK on plain online knowledge retrieval. After having a look at the Auto-GPT source code, my intuition (yes, I know, "fuzzy feelings without a solid basis", but I believe that's simply because I lack the AI background to explain it with crystal-clear wording) is that the poor performance of the current version of Auto-GPT stems from insufficient skill in prompt-chain architecture and from the surprisingly low-quality, at times buggy, code.

    I think Auto-GPT has some potential. I think the implementation lets down the concept, but that's just a question of refactoring the prompts and the overall code, which it seems like the upstream GitHub repo has been quite busy with, so I might give it another go in a couple of weeks to see how far it's moved forward.

    > Specifically, as task complexity grows, the amount of results to combine will quickly exceed the context window size of the "combiner" GPT-4. Sure, you can stuff another layer on top, turning it into a tree/DAG, but eventually, I think the partial result itself will be larger than 8k, or even 32k tokens - and I feel this "eventually" will be hit rather quickly. But maybe my feelings are wrong and there is some mileage in this approach.

    Auto-GPT uses an approach based on summarisation and something I'd term 'micro-agents'. For example, when Auto-GPT is searching for an answer to a particular question online, then for each search result it finds, it spins up a sub-chain that gets asked a question like 'What does this page say about X?' or 'Based on the contents of this page, how can you do Y?'. Ultimately, intelligence is about lossy compression, and this is starkly exposed when it comes to LLMs because you have no choice but to lose some information.
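    A minimal sketch of that micro-agent pattern, assuming a generic chat-completion client: one narrow sub-chain per page, then a combiner pass over the short summaries. The call_llm stub and the prompt wording are stand-ins, not Auto-GPT's actual code.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError("plug in your LLM client here")

      def micro_agent(page_text: str, question: str) -> str:
          # Each sub-chain sees one page only, keeping it within the context window.
          return call_llm(
              f"Based only on the following page, {question}\n\n{page_text[:8000]}"
          )

      def research(question: str, pages: list[str]) -> str:
          # Lossy compression: N pages -> N short notes -> one synthesis.
          notes = [micro_agent(p, f"what does this page say about: {question}?")
                   for p in pages]
          return call_llm(
              f"Synthesise a single answer to '{question}' from these notes:\n\n"
              + "\n---\n".join(notes)
          )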

    > I think the partial result itself will be larger than 8k, or even 32k tokens - and I feel this "eventually" will be hit rather quickly. But maybe my feelings are wrong and there is some mileage in this approach.

    The solution to that would be to synthesize output section by section, or even as an "output stream" that can be captured and/or edited outside the LLM in whole or in chunks. I do think there's some mileage to be exploited in a recursive "store, summarise, synthesise" approach, but the problem will be that of signal loss. Every time you pass a subtask to a sub-agent, or summarise the outcome of that sub-agent into your current knowledge base, some noise is introduced. It might be that the signal-to-noise ratio will dissipate as higher- and higher-order LLM chains are used, analogously to how terrible it was to use electricity or radio waves before any amplification technology became available.
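    A sketch of that recursive "store, summarise, synthesise" idea: partial results are merged a few at a time in a tree, so no single combiner call has to hold everything in context. The fan-in, the prompt, and the call_llm stub are assumptions for illustration.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError("plug in your LLM client here")

      def synthesise_tree(chunks: list[str], fan_in: int = 4) -> str:
          # Base case: a single chunk is the final synthesis.
          if len(chunks) <= 1:
              return chunks[0] if chunks else ""
          merged = []
          for i in range(0, len(chunks), fan_in):
              batch = chunks[i:i + fan_in]
              # Every merge step is lossy -- the signal-loss concern above.
              merged.append(call_llm("Summarise, preserving key facts:\n\n"
                                     + "\n---\n".join(batch)))
          return synthesise_tree(merged, fan_in)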

    One possible avenue to explore to crack down on decreasing SNR (based on my own original research, but I can also see some people disclosing online that they are exploring the same path), is to have a second LLM in the loop, double-checking the result of the first one. This has some limitations, but I have successfully used this approach to verify that, for example, the LLM does not outright refuse to carry out a task. This is currently cost-prohibitive to do in a way that would make me personally satisfied and confident enough in the output to make it run full-auto, but I expect that increasing ability to run AI locally will make people more willing to experiment with massive layering of cooperating LLM chains that check each others' work, cooperate, and/or even repeat work using different prompts to pick the best output a la redundant avionics computers.
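    A sketch of the two-model loop described above: a second LLM checks whether the first one's draft actually carries out the task (e.g. is not a refusal) and triggers a retry if not. Both client stubs and the YES/NO protocol are assumptions, not any specific library's API.

      def call_llm(prompt: str) -> str:
          raise NotImplementedError("primary model client goes here")

      def call_checker(prompt: str) -> str:
          raise NotImplementedError("second, verifying model goes here")

      def verified(task: str, max_retries: int = 3) -> str:
          for _ in range(max_retries):
              draft = call_llm(task)
              verdict = call_checker(
                  "Does the following response carry out the task rather than "
                  "refuse or deflect? Answer YES or NO.\n\n"
                  f"Task: {task}\n\nResponse: {draft}"
              )
              if verdict.strip().upper().startswith("YES"):
                  return draft  # checker accepted the draft
          raise RuntimeError("no verified answer within the retry budget")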

    [0]: https://github.com/lupantech/chameleon-llm

  • PConv-Keras

    Unofficial implementation of "Image Inpainting for Irregular Holes Using Partial Convolutions". Try at: www.fixmyphoto.ai

  • Basic-Mathematics-for-Machine-Learning

    The motive behind creating this repo is to overcome the fear of mathematics and do whatever you want to do in Machine Learning, Deep Learning, and other fields of AI.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source AI projects in Jupyter Notebook? This list will help you:

Rank  Project                                  Stars
   1  generative-ai-for-beginners            42,394
   2  google-research                        32,804
   3  AI-For-Beginners                       31,046
   4  learnopencv                            20,363
   5  h4cker                                 16,518
   6  StableLM                               15,853
   7  stable-diffusion-webui-colab           15,237
   8  dopamine                               10,371
   9  ML-Papers-of-the-Week                   8,692
  10  generative-ai                           5,396
  11  nlpaug                                  4,252
  12  ArtLine                                 3,531
  13  Dreambooth-Stable-Diffusion             3,162
  14  examples                                2,433
  15  clip-retrieval                          2,124
  16  machine-learning-experiments            1,602
  17  vertex-ai-samples                       1,342
  18  imodels                                 1,290
  19  tensor-house                            1,162
  20  Deep-Learning-In-Production             1,072
  21  chameleon-llm                           1,017
  22  PConv-Keras                               893
  23  Basic-Mathematics-for-Machine-Learning    567
