It's true that other engines like vLLM are much faster and more optimized. I started with Ollama because its codebase is Go. In reality, Ollama doesn't even take full advantage of llama.cpp itself: it doesn't implement concurrency, and it adds latency by passing JSON through a CGO call. I discovered that while building the wasm plugin and was disappointed; solving it isn't on Ollama's priority list either, see https://github.com/ollama/ollama/issues/3170
Another advantage of Ollama is that it runs easily locally, and so does the wasm plugin, which accomplishes the goal of a local development environment using dreamland.
That's great feedback. I was thinking about fixing the concurrency issue myself, but creating a vLLM wasm plugin is a better idea. User code won't need to change as long as the plugin exports the same wasm host module.