LLM now provides tools for working with embeddings

InfluxDB - Power Real-Time Data Analytics at Scale

Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

www.influxdata.com

featured

SaaSHub - Software Alternatives and Reviews

SaaSHub helps you find the best software and product alternatives

www.saashub.com

featured

llm-cluster

3 59 4.9 Python

LLM plugin for clustering embeddings

There's a lot of stuff in this release.
Don't miss the new llm-cluster plugin, which can both calculate clusters from embeddings and use another LLM call to generate a name for each cluster: https://github.com/simonw/llm-cluster
Example usage:
Fetch all issues, embed them and store the embeddings and content in SQLite:
    paginate-json 'https://api.github.com/repos/simonw/llm/issues?state=all&filter=all' \

llm-gpt4all

3 180 6.9 Python

Plugin for LLM adding support for the GPT4All collection of models

I'm still iterating on that. Plugins get complete control over the prompts, so they can handle the various weirdnesses of them. Here's some relevant code:
https://github.com/simonw/llm-gpt4all/blob/0046e2bf5d0a9c369...
https://github.com/simonw/llm-mlc/blob/b05eec9ba008e700ecc42...
https://github.com/simonw/llm-llama-cpp/blob/29ee8d239f5cfbf...
I'm not completely happy with this yet. Part of the problem is that different models on the same architecture may have completely different prompting styles.
I expect I'll eventually evolve the plugins to allow them to be configured in an easier and more flexible way. Ideally I'd like you to be able to run new models on existing architectures using an existing plugin.

InfluxDB

www.influxdata.com featured

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
llm-mlc

3 172 5.1 Python

LLM plugin for running models using MLC

I'm still iterating on that. Plugins get complete control over the prompts, so they can handle the various weirdnesses of them. Here's some relevant code:
https://github.com/simonw/llm-gpt4all/blob/0046e2bf5d0a9c369...
https://github.com/simonw/llm-mlc/blob/b05eec9ba008e700ecc42...
https://github.com/simonw/llm-llama-cpp/blob/29ee8d239f5cfbf...
I'm not completely happy with this yet. Part of the problem is that different models on the same architecture may have completely different prompting styles.
I expect I'll eventually evolve the plugins to allow them to be configured in an easier and more flexible way. Ideally I'd like you to be able to run new models on existing architectures using an existing plugin.

llm-llama-cpp

1 133 7.2 Python

LLM plugin for running models using llama.cpp

I'm still iterating on that. Plugins get complete control over the prompts, so they can handle the various weirdnesses of them. Here's some relevant code:
https://github.com/simonw/llm-gpt4all/blob/0046e2bf5d0a9c369...
https://github.com/simonw/llm-mlc/blob/b05eec9ba008e700ecc42...
https://github.com/simonw/llm-llama-cpp/blob/29ee8d239f5cfbf...
I'm not completely happy with this yet. Part of the problem is that different models on the same architecture may have completely different prompting styles.
I expect I'll eventually evolve the plugins to allow them to be configured in an easier and more flexible way. Ideally I'd like you to be able to run new models on existing architectures using an existing plugin.

datasette-faiss

1 32 10.0 Python

Maintain a FAISS index for specified Datasette tables

I experimented with that a few months ago. Building a fresh FAISS index for a few thousand matches is really quick, so o think it's often better to filter first, build a scratch index and then use that for similarity: https://github.com/simonw/datasette-faiss/issues/3

DP_means

1 45 1.7 C++

Dirichlet Process K-means

I found one implementation here: https://github.com/vsmolyakov/DP_means
Alternatively, there is a Bayesian GMM in sklearn. When you restrict it to diagonal Covariance matrices, you should be fine in high dimensions

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Show HN: An Open source platform for building voice first multimodal agents

1 project | news.ycombinator.com | 15 May 2024
Show HN: Tarsier – vision for text-only LLM web agents that beats GPT-4o

4 projects | news.ycombinator.com | 15 May 2024
Viking 7B: open LLM for the Nordic languages trained on AMD GPUs

1 project | news.ycombinator.com | 15 May 2024
Building a Tic-Tac-Toe Game in Python: A Step-by-Step Guide

1 project | dev.to | 15 May 2024
Show HN: Julep: A platform to manage memories, knowledge and tools for LLM apps

2 projects | news.ycombinator.com | 14 May 2024

LLM now provides tools for working with embeddings

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com Post date: 4 Sep 2023

llm-cluster

llm-gpt4all

InfluxDB

llm-mlc

llm-llama-cpp

datasette-faiss

DP_means

Related posts

Show HN: An Open source platform for building voice first multimodal agents

Show HN: Tarsier – vision for text-only LLM web agents that beats GPT-4o

Viking 7B: open LLM for the Nordic languages trained on AMD GPUs

Building a Tic-Tac-Toe Game in Python: A Step-by-Step Guide

Show HN: Julep: A platform to manage memories, knowledge and tools for LLM apps