https://github.com/vllm-project/vllm is probably better optimized for that use case.
Refact was made for this: https://github.com/smallcloudai/refact
Setting up a server for multiple users is very different from setting up an LLM for yourself. A safe bet would be to just use TGI, which supports continuous batching and is very easy to run via Docker on your server. https://github.com/huggingface/text-generation-inference
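A minimal sketch of the Docker route described above, based on the commands in the text-generation-inference README; the model id, port, and cache directory here are example choices, not part of the original comment:

```shell
# Example: serve a code model with TGI via Docker (flags per the TGI README;
# verify against the current README before relying on them).
model=bigcode/starcoder   # example model id, swap in your own
volume=$PWD/data          # persist downloaded weights between restarts

docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$volume:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id "$model"

# Once the server is up, clients send prompts to the /generate endpoint;
# continuous batching interleaves concurrent requests automatically:
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "def fib(n):", "parameters": {"max_new_tokens": 64}}'
```

Because batching happens server-side, multiple team members can point their editors at the same endpoint without any client-side coordination.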
I looked into how to deploy an open-source code LLM for a dev team a couple of months ago and identified five questions to figure out:
Related posts
- Hugging Face reverts the license back to Apache 2.0
- Deploying Llama2 with vLLM vs TGI. Need advice
- Continuous batch enables 23x throughput in LLM inference and reduce p50 latency
- HuggingFace Text Generation License No Longer Open-Source
- HuggingFace Text Generation Library License Changed from Apache 2 to Hfoil