Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA.

  • llama.cpp

    LLM inference in C/C++

  • After https://github.com/ggerganov/llama.cpp/pull/1459 was merged, I found CLBlast to be around the same speed as cuBLAS on my 3080.

  • basaran

    Discontinued. Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
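
    A minimal sketch of how a client could call a running Basaran instance through its OpenAI-compatible completion endpoint; the host, port, and model name below are assumptions, not values from the post:

```python
# Hedged sketch: query a local Basaran server via its OpenAI-compatible
# /v1/completions endpoint. Adjust BASE_URL and the model name to match
# your own deployment.
import requests

BASE_URL = "http://127.0.0.1:8000"  # assumed local Basaran address

response = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "model": "your-hf-model",  # placeholder for the Transformers model being served
        "prompt": "Once upon a time",
        "max_tokens": 64,
        "stream": False,           # set True to receive tokens as server-sent events
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```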

  • Dependencies

    A rewrite of the legacy "depends.exe" tool in C# for Windows developers to troubleshoot DLL load dependency issues.

  • I did that, basically. The problem is that there is a clblast.dll (on Windows) that llama.dll depends on, and llama-cpp-python always failed to resolve that dependency. I copied the DLL to the right folder, loading it manually via CDLL worked fine, and https://github.com/lucasg/Dependencies also confirmed the DLL was findable. When loading DLLs, Windows checks the same folder for dependency DLLs (and a few other places).
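
    A hedged sketch of the workaround described above, with illustrative paths rather than the poster's actual layout:

```python
# Make sure clblast.dll can be resolved before llama-cpp-python loads llama.dll.
# The directory below is a hypothetical example -- point it at wherever your
# DLLs actually live.
import ctypes
import os

DLL_DIR = r"C:\path\to\llama-cpp-dlls"  # hypothetical folder containing clblast.dll

# Option 1 (Python 3.8+ on Windows): add the folder to the DLL search path.
if hasattr(os, "add_dll_directory"):
    os.add_dll_directory(DLL_DIR)

# Option 2: pre-load the dependency manually so llama.dll finds it already mapped.
ctypes.CDLL(os.path.join(DLL_DIR, "clblast.dll"))

import llama_cpp  # should now resolve llama.dll and its CLBlast dependency
```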

  • GPTQ-for-LLaMa

    4 bits quantization of LLaMA using GPTQ

  • Thanks for the explanation. I think some repos, like text-generation-webui, used GPTQ-for-LLaMa (I don't know if it's this repo or another one); in any case, most repos I saw rely on external packages (like GPTQ-for-LLaMa).

  • gpt-llama.cpp

    A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI.

  • sounds like you’re asking for exactly this? https://github.com/keldenl/gpt-llama.cpp
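
    A minimal sketch of what "drop-in replacement" means in practice: point a standard OpenAI client at the local server instead of api.openai.com. The port, API key handling, and model path here are assumptions:

```python
# Hedged sketch: reuse the official OpenAI Python client against a local
# gpt-llama.cpp server. base_url, api_key, and the model path are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local gpt-llama.cpp address
    api_key="not-needed-locally",         # placeholder; a local server typically ignores it
)

completion = client.chat.completions.create(
    model="models/7B/ggml-model.bin",  # illustrative path to a local llama.cpp model
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(completion.choices[0].message.content)
```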

  • AutoGPTQ

    An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

  • Instead of integrating GPTQ-for-LLaMa, use AutoGPTQ.
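
    A hedged sketch of 4-bit quantization with AutoGPTQ's documented high-level API; the model name, calibration text, and output directory are placeholders:

```python
# Quantize a causal LM to 4 bits with AutoGPTQ and save the result.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "huggyllama/llama-7b"  # assumed base model
out_dir = "llama-7b-4bit-gptq"    # assumed output directory

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# A tiny calibration set just to illustrate the call; real runs use a few
# hundred representative samples.
examples = [tokenizer("AutoGPTQ quantizes large language models to 4 bits.")]

model.quantize(examples)
model.save_quantized(out_dir)
tokenizer.save_pretrained(out_dir)
```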

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • it does everything https://github.com/oobabooga/text-generation-webui
