tree-of-thought-llm
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
-
LocalAI
The free, open-source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. It can generate text, audio, video, and images, with voice-cloning capabilities.
-
guidance
Discontinued A guidance language for controlling large language models. [Moved to: https://github.com/guidance-ai/guidance] (by microsoft)
Probably too early in testing and development for there to be a 'standard'. A quick Google search will turn up some things to read, like https://github.com/dave1010/tree-of-thought-prompting, but your best bet is to read through what other people are doing and try things for yourself. You might end up discovering something new that nobody has thought of yet. Kaio Ken literally just changed the game overnight and figured out how to expand context to 8k for LLaMA-based models with 2 lines of code. Things are evolving fast, and the community desperately needs people willing to spend time reading papers on arXiv, digging through GitHub repos, and testing things out.
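For context, the context-extension trick mentioned above amounts to scaling the rotary position embedding (RoPE) indices so longer sequences map into the position range the model was trained on. A minimal illustrative sketch (the scale factor and function here are toy stand-ins, not the actual patch):

```python
import math

# Illustrative sketch of RoPE position interpolation: positions are
# scaled down by a constant factor so an 8k sequence is "seen" within
# the model's original 2k training range. Names and values here are
# for illustration only, not the actual patch.

SCALE = 2048 / 8192  # compress 8k positions into the trained 2k range

def rope_angle(position, dim_pair, head_dim, base=10000.0, scale=1.0):
    """Rotary embedding angle for one position/dimension pair."""
    inv_freq = base ** (-2.0 * dim_pair / head_dim)
    return position * scale * inv_freq

# Without scaling, position 8000 is far outside the trained range;
# with scaling it maps to the same angle as position 2000.
assert math.isclose(
    rope_angle(8000, 0, 128, scale=SCALE),
    rope_angle(2000, 0, 128),
)
```

The point is that the attention math itself is untouched; only the position encoding is rescaled, which is why the change is so small.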
I tried ChromaDB but had terrible performance and could not pin down the cause (likely a problem on my end). Weaviate was easy to set up and performed excellently; it is probably what I will use in the future. Next on my list is txtinstruct: finetuning a model on data that does not change and using a vector DB for everything else seems promising.
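To make concrete what the vector DB is doing in this setup, here is a dependency-free sketch of nearest-neighbour search over embeddings combined with a structured filter, which is the core of what Weaviate provides (a real deployment would use the weaviate-client library against a running server; the documents and vectors here are made up):

```python
import math

# Toy corpus: each record has text, a structured tag, and an embedding.
docs = [
    {"text": "llama finetuning notes", "tag": "llm", "vec": [1.0, 0.0]},
    {"text": "weaviate setup guide",   "tag": "db",  "vec": [0.0, 1.0]},
    {"text": "txtinstruct examples",   "tag": "llm", "vec": [0.9, 0.1]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def search(query_vec, tag=None, k=2):
    # structured filter first, then rank by vector similarity
    pool = [d for d in docs if tag is None or d["tag"] == tag]
    return sorted(pool, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]

hits = search([1.0, 0.0], tag="llm")
```

A vector DB does the same thing with proper indexes so it stays fast at millions of records instead of a linear scan.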
I have not tried this myself, but you can find the original paper here, and here is the implementation. But I have also read that you can simply give 8-10 examples to an instruction-following LLM in the prompt. As a side note, most inference engines have an option to emulate the OpenAI API; there is also LocalAI, built explicitly for that purpose.
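The few-shot approach above needs no framework at all: just paste worked examples into the prompt of an instruction-following model. A minimal sketch (the example pairs are invented):

```python
# Hedged sketch of few-shot prompting: worked Q/A examples are
# concatenated ahead of the real question. The examples are made up.

EXAMPLES = [
    ("Is 7 prime?", "Check divisors up to sqrt(7); none divide 7. Answer: yes."),
    ("Is 9 prime?", "Check divisors up to sqrt(9); 3 divides 9. Answer: no."),
]

def few_shot_prompt(question, examples=EXAMPLES):
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = few_shot_prompt("Is 11 prime?")
# With LocalAI (or any OpenAI-compatible server) you would then send
# `prompt` to the local /v1/completions endpoint.
```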
Most LLMs actually do a decent job out of the box if you ask them for step-by-step instructions. Tree of Thoughts is one way to improve the results; Reflexion is another that can be used separately or in addition. The downside is that most models will quickly run into their token limit (around 2k for most). However, the new SuperHOT models can handle up to 8k, and then there are the RWKV-Raven models, which are RNNs rather than transformers like all the other LLMs and can theoretically handle infinite context lengths (but they lose "focus" after a while).
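The Tree of Thoughts idea can be sketched as a breadth-first search: propose several candidate "thoughts" from each state, score them, and keep only the most promising few at each level. In this toy sketch `propose` and `score` stand in for LLM calls (here they just count toward a target number) so the control flow runs without a model:

```python
# Minimal breadth-first Tree-of-Thoughts sketch. `propose` and `score`
# are toy stand-ins for the LLM calls.

TARGET = 10

def propose(state):
    # an LLM would propose next partial solutions; here: add 1..3
    return [state + step for step in (1, 2, 3)]

def score(state):
    # an LLM would judge how promising a thought is; here: closeness to TARGET
    return -abs(TARGET - state)

def tree_of_thoughts(start=0, beam=2, depth=4):
    frontier = [start]
    for _ in range(depth):
        candidates = [nxt for s in frontier for nxt in propose(s)]
        # keep only the `beam` most promising thoughts at each level
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

best = tree_of_thoughts()
```

The beam width and depth are where the token limit bites: each level multiplies the number of prompts, which is why longer-context models help.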
Took a quick look; jsonformer uses the Hugging Face transformers library and is focused on research rather than out-of-the-box performance. I would take a look at guidance: it is inference-engine agnostic and can generate any structured output you want, not only JSON. This has the added benefit that you don't need additional software (or plugins) for every API you might want to use in the future. Here is how that would look for JSON.
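The core trick behind both tools can be shown without either library: the program emits the fixed JSON scaffold itself and only asks the model for the value slots, so the output is valid JSON by construction. In this sketch `gen` is a stub standing in for a constrained model call (the slot names and values are invented):

```python
import json

def gen(slot_name):
    # placeholder for an LLM call constrained to stop at the closing quote
    canned = {"name": "Ada", "occupation": "engineer"}
    return canned[slot_name]

def fill_template(slots):
    # the braces, keys, and quotes come from the template, never the model
    body = ", ".join(f'"{s}": "{gen(s)}"' for s in slots)
    return "{" + body + "}"

out = fill_template(["name", "occupation"])
parsed = json.loads(out)  # parseable by construction: the scaffold is ours
```

guidance generalises this to arbitrary templates, which is why it is not limited to JSON.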
Hmm, a viable alternative might be to set up a locally hosted search engine like SearX and then use an LLM in conjunction with the llama-retrieval-plugin; that also gives you a database that is human-readable, and the LLM can give direct links to the source.
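A SearX/SearXNG instance exposes a JSON search API when `format=json` is enabled in its settings, so the retrieval side reduces to building a URL and fetching it. A small sketch (the local base URL is an assumption about your deployment):

```python
from urllib.parse import urlencode

# Hedged sketch: building a query against a self-hosted SearX/SearXNG
# instance's JSON API. `format=json` must be enabled in the instance
# settings; the base URL here is an assumed local deployment.

def build_search_url(query, base="http://localhost:8888", engines=None):
    params = {"q": query, "format": "json"}
    if engines:
        params["engines"] = ",".join(engines)
    return f"{base}/search?{urlencode(params)}"

url = build_search_url("llama retrieval plugin")
# A retrieval pipeline would then fetch `url` (e.g. with requests),
# feed the result snippets to the LLM, and keep the result links so
# the model can cite its sources.
```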