[D] Hardest thing about building with LLMs?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • embedchain

    Personalizing LLM Responses

  • LangChain is a big wrapper in itself, and people can't be bothered to use even that to write 10 lines of code. Look at the traction this project is getting: https://github.com/embedchain/embedchain. At its heart, it just uses a few modules from LangChain. The whole thing (chunking + embedding + retrieval + prompting) can be done in 100 lines without LangChain or embedchain.

  • ripgrep

    ripgrep recursively searches directories for a regex pattern while respecting your gitignore

  • This is a bit of an oversimplification. The details matter a lot. For example, the LangChain project I linked considers only Markdown files; it doesn't even look at the code. Cody, by contrast, includes things like PRs and commit messages, and it also knows to ignore binary files and other generated code. It supports much more complex search syntax and can limit the search context by organization, repo, language, etc. (https://docs.sourcegraph.com/getting-started/github-vs-sourcegraph). Its retrieval uses more than just embeddings + ANN: it also uses keyword-based search, ripgrep, and a PageRank-based algorithm (mentioned in the article I linked; you have to parse the files to find the symbols, and they developed a new format to index source code). They also use some custom heuristics to rank the results (I think partly based on sourcerank).

  • trulens

    Evaluation and Tracking for LLM Experiments
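The embedchain comment claims that chunking + embedding + retrieval + prompting fits in about 100 lines without a framework. Below is a minimal sketch of that pipeline under loud assumptions: `embed` is a toy hashed bag-of-words vector standing in for a real embedding model, and every function name here (`chunk`, `embed`, `retrieve`, `build_prompt`) is hypothetical, not embedchain's or LangChain's API.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # size of the toy hashed embedding (assumption, not a real model)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token, count in Counter(tokenize(text)).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query: str, chunks: list[str], index: list[list[float]], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    # Vectors are unit-length, so the dot product equals cosine similarity.
    scores = [sum(a * b for a, b in zip(q, v)) for v in index]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into a prompt for whichever LLM you call next."""
    joined = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )
```

Usage: chunk your documents, build `index = [embed(c) for c in chunks]` once, then call `build_prompt(q, retrieve(q, chunks, index))` per question. Sending that prompt to a model is the one remaining step and depends on the provider; a real system would also replace the in-memory linear scan with an ANN index.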
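The Cody comment describes hybrid retrieval: skip binary and generated files, run a keyword (ripgrep-style) search alongside embedding search, then combine the rankings. The sketch below is not Sourcegraph's implementation. It assumes a NUL-byte check as the binary heuristic (a common approach), a hypothetical list of generated-file suffixes, a plain `re` scan in place of ripgrep, and reciprocal rank fusion (RRF, with the commonly used `k=60`) swapped in as the merge step.

```python
import re
from collections import defaultdict

# Hypothetical suffixes for generated files; real tools also honor .gitignore.
GENERATED_SUFFIXES = (".min.js", ".pb.go", ".lock")


def is_indexable(path: str, content: bytes) -> bool:
    """Skip likely-binary files (a NUL byte near the start) and known generated files."""
    if b"\x00" in content[:8192]:
        return False
    return not path.endswith(GENERATED_SUFFIXES)


def keyword_rank(pattern: str, files: dict[str, str]) -> list[str]:
    """Rank files by number of regex matches, a crude stand-in for a ripgrep pass."""
    regex = re.compile(pattern, re.IGNORECASE)
    counts = {path: len(regex.findall(text)) for path, text in files.items()}
    hits = [path for path, n in counts.items() if n > 0]
    return sorted(hits, key=lambda p: (-counts[p], p))


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; each item scores the sum of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF only needs ranks, not comparable scores, which is why it is a convenient way to merge a keyword list with an embedding list. In a fuller system, the vector ranking would come from an embedding index (assumed here), and the fused list could then be re-scored with further signals such as a symbol-graph PageRank, as the comment suggests Cody does.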

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.