LLM now provides tools for working with embeddings

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • llm-cluster

    LLM plugin for clustering embeddings

  • There's a lot of stuff in this release.

    Don't miss the new llm-cluster plugin, which can both calculate clusters from embeddings and use another LLM call to generate a name for each cluster: https://github.com/simonw/llm-cluster

    Example usage:

    Fetch all issues, embed them and store the embeddings and content in SQLite:

        paginate-json 'https://api.github.com/repos/simonw/llm/issues?state=all&filter=all' \

  • llm-gpt4all

    Plugin for LLM adding support for the GPT4All collection of models

  • I'm still iterating on that. Plugins get complete control over the prompts, so they can handle the various weirdnesses of them. Here's some relevant code:

    https://github.com/simonw/llm-gpt4all/blob/0046e2bf5d0a9c369...

    https://github.com/simonw/llm-mlc/blob/b05eec9ba008e700ecc42...

    https://github.com/simonw/llm-llama-cpp/blob/29ee8d239f5cfbf...

    I'm not completely happy with this yet. Part of the problem is that different models on the same architecture may have completely different prompting styles.

    I expect I'll eventually evolve the plugins to allow them to be configured in an easier and more flexible way. Ideally I'd like you to be able to run new models on existing architectures using an existing plugin.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • llm-mlc

    LLM plugin for running models using MLC

  • I'm still iterating on that. Plugins get complete control over the prompts, so they can handle the various weirdnesses of them. Here's some relevant code:

    https://github.com/simonw/llm-gpt4all/blob/0046e2bf5d0a9c369...

    https://github.com/simonw/llm-mlc/blob/b05eec9ba008e700ecc42...

    https://github.com/simonw/llm-llama-cpp/blob/29ee8d239f5cfbf...

    I'm not completely happy with this yet. Part of the problem is that different models on the same architecture may have completely different prompting styles.

    I expect I'll eventually evolve the plugins to allow them to be configured in an easier and more flexible way. Ideally I'd like you to be able to run new models on existing architectures using an existing plugin.

  • llm-llama-cpp

    LLM plugin for running models using llama.cpp

  • I'm still iterating on that. Plugins get complete control over the prompts, so they can handle the various weirdnesses of them. Here's some relevant code:

    https://github.com/simonw/llm-gpt4all/blob/0046e2bf5d0a9c369...

    https://github.com/simonw/llm-mlc/blob/b05eec9ba008e700ecc42...

    https://github.com/simonw/llm-llama-cpp/blob/29ee8d239f5cfbf...

    I'm not completely happy with this yet. Part of the problem is that different models on the same architecture may have completely different prompting styles.

    I expect I'll eventually evolve the plugins to allow them to be configured in an easier and more flexible way. Ideally I'd like you to be able to run new models on existing architectures using an existing plugin.

  • datasette-faiss

    Maintain a FAISS index for specified Datasette tables

  • I experimented with that a few months ago. Building a fresh FAISS index for a few thousand matches is really quick, so o think it's often better to filter first, build a scratch index and then use that for similarity: https://github.com/simonw/datasette-faiss/issues/3

  • DP_means

    Dirichlet Process K-means

  • I found one implementation here: https://github.com/vsmolyakov/DP_means

    Alternatively, there is a Bayesian GMM in sklearn. When you restrict it to diagonal Covariance matrices, you should be fine in high dimensions

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • Show HN: An Open source platform for building voice first multimodal agents

    1 project | news.ycombinator.com | 15 May 2024
  • Show HN: Tarsier – vision for text-only LLM web agents that beats GPT-4o

    4 projects | news.ycombinator.com | 15 May 2024
  • Viking 7B: open LLM for the Nordic languages trained on AMD GPUs

    1 project | news.ycombinator.com | 15 May 2024
  • Building a Tic-Tac-Toe Game in Python: A Step-by-Step Guide

    1 project | dev.to | 15 May 2024
  • Show HN: Julep: A platform to manage memories, knowledge and tools for LLM apps

    2 projects | news.ycombinator.com | 14 May 2024