Show HN: Route your prompts to the best LLM

This page summarizes the projects mentioned and recommended in the original post.

  • dspy

    DSPy: The framework for programming—not prompting—foundation models

    I agree this is an interesting direction, I think this is on the roadmap for DSPy [], but right now they mainly focus on optimizing the in-context examples.

  • llama_index

    LlamaIndex is a data framework for your LLM applications

  • openrouter-runner

    Inference engine powering open source models on OpenRouter

    I've bumped into a few of these. I use one as a model abstraction, but not as a router. Another does the same thing but with a more enterprise feel, though it's less clear how committed they are to that feature.

    That said, while I've really enjoyed the LLM abstraction (making it easy for me to test different models without changing my code), I haven't felt any desire for a router. I _do_ have some prompts that I send to gpt-3.5-turbo, and could potentially use other models, but it's kind of niche.

    In part this is because I try to do as much in a single prompt as I can, meaning I want a model that can handle the hardest parts of the prompt, and the easy parts come along for free. As a result there aren't many "easy" prompts. The easy prompts are usually text fixup and routing.

    My "routing" prompts are at a different level of abstraction, usually routing some input or activity to one of several prompts (each of which has its own context, and the sum of all contexts across those prompts is too large, hence the routing). I don't know if there's some meaningful crossover between these two routing concepts.
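    A minimal sketch of that prompt-level routing pattern: a cheap classification step (in practice often a small, fast model) picks which specialized prompt, each carrying its own context, handles the input. The prompt names and the keyword classifier below are hypothetical stand-ins for a real routing call.

```python
# Hypothetical specialized prompts, each with its own (large) context.
PROMPTS = {
    "billing": "You are a billing assistant. Context: ...",
    "tech": "You are a technical support assistant. Context: ...",
    "general": "You are a general helpdesk assistant. Context: ...",
}

def classify(user_input: str) -> str:
    """Stand-in for a cheap routing call (e.g. a short gpt-3.5-turbo prompt)."""
    text = user_input.lower()
    if any(w in text for w in ("invoice", "refund", "charged")):
        return "billing"
    if any(w in text for w in ("error", "crash", "bug")):
        return "tech"
    return "general"

def route(user_input: str) -> str:
    """Return the specialized system prompt that should handle this input."""
    return PROMPTS[classify(user_input)]
```

    The point of the pattern is that no single prompt has to carry the sum of all contexts; the router pays a small extra call to keep each downstream prompt focused.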

    Another issue I have with LLM portability is the use of tools/functions/structured output. Opus and Gemini Pro 1.5 have kind of implemented this OK, but until recently GPT was the only halfway decent implementation of this. This seems to be an "advanced" feature, yet it's also a feature I use even more with smaller prompts, as those small prompts are often inside some larger algorithm and I don't want the fuss of text parsing and exceptions from ad hoc output.
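    One way to paper over those provider differences is to ask for plain JSON and validate it yourself. The helper below is an illustrative sketch (not any particular library's API) that tolerates the markdown code fences some models wrap around JSON replies:

```python
import json

def parse_structured(raw: str, required_keys: set) -> dict:
    """Parse a model's JSON reply, tolerating ```json ... ``` fences."""
    text = raw.strip()
    if text.startswith("```"):
        # Take the content between the first pair of fences.
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[len("json"):]
    data = json.loads(text)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

    Raising on missing keys keeps the failure inside the calling algorithm rather than letting ad hoc text parsing silently produce bad values.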

    But in the end I'm not price sensitive in my work, so I always come back to the newest GPT model. If I make a switch to Opus it definitely won't be to save money! And I'm probably not going to want to fiddle, but instead make a thoughtful choice and switch the default model in my code.

  • semantic-router

    Superfast AI decision making and intelligent processing of multi-modal data.

    Thanks for sharing! These are useful tools, but they are a bit different: they're based more on similarity search in prompt space (a bit like semantic-router). Our router uses a neural network for the routing decisions, and it can be trained on your own prompts []. We're also working on adding support for on-prem deployment :)
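    For contrast, the similarity-search style of routing mentioned above amounts to a nearest-neighbor lookup in embedding space. The tiny 3-dimensional vectors below are stand-ins; a real setup would embed example utterances with an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical per-route reference embeddings.
ROUTES = {
    "chitchat": [0.9, 0.1, 0.0],
    "code_help": [0.1, 0.9, 0.2],
}

def route_by_similarity(query_embedding, threshold=0.5):
    """Pick the most similar route, or None if nothing clears the threshold."""
    name, score = max(
        ((n, cosine(query_embedding, v)) for n, v in ROUTES.items()),
        key=lambda t: t[1],
    )
    return name if score >= threshold else None
```

    A trained routing model can learn decision boundaries that raw similarity search can't express, which is the distinction the comment is drawing.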

  • RAG under "Usage" step 2.

    > "Input your Unify API"

    Your product looks good in my view, although I have only spent about 10 minutes on it thus far. The docs look pretty easy to follow.

    I'll probably give this a try soon!

  • gateway

    A Blazing Fast AI Gateway. Route to 200+ LLMs with 1 fast & friendly API.

    Great to know this!

    I have come across Portkey's Open-source AI Gateway which kind of does the same.

    It looks like with more LLM adoption, resiliency- and cost-related concerns are taking off sooner than they did in past technology trends.

    I'm also wondering whether something like this could help build a better RAG pipeline or evals for a GenAI app, because at the end of the day you want to reduce hallucinations while still getting good generative responses.
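    The resiliency idea gateways advertise, a fallback chain across providers, can be sketched in a few lines; the provider callables here are hypothetical placeholders for real SDK calls.

```python
def call_with_fallback(prompt, providers):
    """Try each provider in order, falling back on failure.

    providers: list of callables that take a prompt and return text.
    """
    errors = []
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # real code would catch provider-specific errors
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

    Gateways typically layer retries, timeouts, and load balancing on top of this basic chain, but the routing decision is the same shape.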

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Synthetic Data Benchmark [pdf]

    1 project | 21 Jun 2024
  • How to create LLM fallback from Gemini Flash to GPT-4o?

    2 projects | 13 Jun 2024
  • Show HN: Anthropic's Prompt Engineering Interactive Tutorial (Web Version)

    2 projects | 18 May 2024
  • Show HN: LLM-powered NPCs running on your hardware

    4 projects | 30 Apr 2024
  • Looking for cofounders to build open reliable LLM infra

    1 project | 29 Apr 2024
