Top 23 Python generative-ai Projects
-
haystack
:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
-
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
-
BentoML
The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
-
krita-ai-diffusion
Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
-
h2o-llmstudio
H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://h2oai.github.io/h2o-llmstudio/
-
xTuring
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6
-
PyRIT
The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems. (by Azure)
-
cognita
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
-
quix-streams
A Python library for building containerized ML and Generative AI applications with Apache Kafka.
Depends what model you want to train, and how well you want your computer to keep working while you're doing it.
If you're interested in large language models, there's a table of VRAM requirements for fine-tuning at [1], which says you could do the most basic type of fine-tuning on a 7B-parameter model with 8GB of VRAM.
You'll find that training takes quite a long time, and because much of the GPU's power goes to training, your computer's responsiveness will suffer. Even basic things like scrolling in your web browser or changing tabs use the GPU, after all.
Spend a bit more and you'll probably have a better time.
[1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
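As a rough back-of-envelope illustration of why 8GB is tight for a 7B model, the weights alone take roughly the parameter count times the bytes per parameter. The sketch below is not taken from the linked table: the precision names and the `weight_memory_gb` helper are assumptions for illustration, and the estimate ignores gradients, optimizer state, activations and framework overhead, all of which add to the real requirement.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: these byte widths are the usual sizes for each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Compare a 7B-parameter model across precisions.
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gb(7e9, precision)
        print(f"7B weights at {precision}: {size:.1f} GB")
```

Under these assumptions, 16-bit weights for a 7B model already exceed 8GB, while 4-bit weights leave some headroom for everything else training needs.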
Project mention: Haystack DB – 10x faster than FAISS with binary embeddings by default | news.ycombinator.com | 2024-04-28
I was confused for a bit but there is no relation to https://haystack.deepset.ai/
Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06
I haven't tested WaveRNN, but of the ones I know, the best open-source option is FastPitch. It's also easy to use; here is the tutorial for voice cloning.
I have been playing a lot with Krita's SD plugin https://github.com/Acly/krita-ai-diffusion - that uses ComfyUI as its API source.
Project mention: Paid dev gig: develop a basic LLM PEFT finetuning utility | /r/LocalLLaMA | 2023-06-02
Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06
I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It's focusing on pretty much exactly this and chaining multiple local LLMs together.
A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect) depending on what the model is intended for. The LLMWare team did a good 2 part video on this here as well (https://www.youtube.com/watch?v=7oMTGhSKuNY)
I think dedicated miniature LLMs are the way forward.
Disclaimer - Not affiliated with them in any way, just think it's a really cool project.
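To make the "deterministic sampling" point concrete, here is a minimal sketch contrasting greedy selection, which always returns the same token for the same scores, with temperature sampling, which is only repeatable when the random seed is fixed. It is a generic illustration and not LLMWare's implementation; the logits and function names are made up.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Deterministic: always return the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature, rng):
    """Stochastic: draw an index in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens
    print("greedy:", [greedy_pick(logits) for _ in range(5)])
    rng = random.Random(0)  # fixing the seed makes the draws repeatable
    print("sampled:", [sample_pick(logits, 1.0, rng) for _ in range(5)])
```

Greedy selection suits tasks such as classification or function calling where the same input should always give the same output; sampling suits open-ended generation.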
This notebook is dedicated to a (not so) short jupyterlab/jupyter-ai unboxing so anyone can enjoy this kind of magic (and much much more):
Project mention: I'm developing an open-source AI tool called xTuring, enabling anyone to construct a Language Model with just 5 lines of code. I'd love to hear your thoughts! | /r/machinelearningnews | 2023-09-07
Explore the project on GitHub here.
Project mention: YiVal——Unlocking Your Data's Power to Create Customized GenAI Apps | /r/u_YiVal | 2023-11-16
- 🤖 GitHub: https://github.com/YiVal/YiVal/pull/189
One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open weight models are actually using encumbered proprietary licenses rather than standard open source licenses that are OSI approved (https://opensource.org/licenses). As an example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means as they change their AUP, you may be further restricted in the future. Meta’s Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE ). I’m not sure who can depend on these models given this flaw.
Can someone help me understand the licensing of this?
https://github.com/sdv-dev/SDV/blob/main/LICENSE
It was MIT licensed up until 2022, when it was changed to what it is now. They say that it will become MIT again 4 years after release... but is that counted from when the license was changed or from the first release of the software on GitHub?
Coframe
I’m also aware of other OSS initiatives doing similar things, so I wouldn’t say no one has ever done what you’re doing.
[1] https://github.com/traceloop/openllmetry
We have recently added support to query data from SingleStore to our agent framework, LLMStack (https://github.com/trypromptly/LLMStack). Out-of-the-box performance when prompting with just the table schemas is pretty good with GPT-4.
The more domain-specific knowledge the queries need, the harder it has gotten in general. We've had good success "teaching" the model different concepts related to the dataset, and giving it example questions and queries greatly improved performance.
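The approach described above (table schemas plus domain notes plus example question/query pairs) can be sketched as a plain prompt builder. This is a hypothetical illustration, not LLMStack's API: `build_sql_prompt`, the schema, the note and the example pair are all invented for the sketch.

```python
def build_sql_prompt(schemas, examples, question, notes=()):
    """Assemble a few-shot text-to-SQL prompt as a single string."""
    parts = ["Translate each question into SQL for the tables below.", "", "Schemas:"]
    parts.extend(schemas)
    if notes:
        # Domain concepts the model would not infer from the schema alone.
        parts += ["", "Domain notes:"] + [f"- {note}" for note in notes]
    for example_question, example_sql in examples:
        parts += ["", f"Question: {example_question}", f"SQL: {example_sql}"]
    # Leave the final answer slot open for the model to complete.
    parts += ["", f"Question: {question}", "SQL:"]
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_sql_prompt(
        schemas=["CREATE TABLE orders (id INT, customer_id INT, total DECIMAL(10,2), created_at DATE);"],
        examples=[("How many orders are there?", "SELECT COUNT(*) FROM orders;")],
        question="What is the total revenue?",
        notes=["'Revenue' means the sum of orders.total."],
    )
    print(prompt)
```

The resulting string would be sent to whichever model is in use; keeping the examples in the same "Question/SQL" shape as the final slot is what nudges the model to answer with a query.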
Project mention: VS Code: Prompt Editor for LLMs (GPT4, Llama, Mistral, etc.) | news.ycombinator.com | 2024-03-08
doesn't collect prompts and there's a way to disable telemetry as well - https://github.com/lastmile-ai/aiconfig/blob/8a5a59d47cef474...
Project mention: Show HN: Streaming DataFrames–a Pandas-like syntax for real-time data | news.ycombinator.com | 2024-04-23
Python generative-ai related posts
-
Show HN: Cognita – open-source RAG framework for modular applications
-
Gemini API 102: Next steps beyond "Hello World!"
-
Show HN: Streaming DataFrames–a Pandas-like syntax for real-time data
-
Hello OLMo: A Open LLM
-
Are you looking for free open source alternative to Midjourney and Bing images?
-
100% free Midjourney alternative. Plant trees as you generated realistic images
-
Are you looking for a green yet free Chatgpt4 Alternative?
-
A note from our sponsor - SaaSHub
www.saashub.com | 1 May 2024
Index
What are some of the best open-source generative-ai projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | LLaMA-Factory | 20,248 |
2 | jina | 20,041 |
3 | haystack | 13,633 |
4 | NeMo | 10,084 |
5 | BentoML | 6,537 |
6 | krita-ai-diffusion | 4,586 |
7 | TaskingAI | 4,421 |
8 | h2o-llmstudio | 3,583 |
9 | llmware | 3,127 |
10 | jupyter-ai | 2,857 |
11 | xTuring | 2,523 |
12 | YiVal | 2,429 |
13 | dbrx | 2,397 |
14 | SDV | 2,141 |
15 | coffee | 1,341 |
16 | PyRIT | 1,263 |
17 | openllmetry | 1,271 |
18 | LLMStack | 1,112 |
19 | cognita | 887 |
20 | canopy | 873 |
21 | aiconfig | 840 |
22 | quix-streams | 570 |
23 | Copulas | 505 |