[P] What are the latest "out of the box solutions" for deploying very large LLMs as API endpoints?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • transformers-bloom-inference

    Fast Inference Solutions for BLOOM

  • A repo with some demos: https://github.com/huggingface/transformers-bloom-inference

  • FlexGen

    Running large language models on a single GPU for throughput-oriented scenarios.

  • FlexGen: https://github.com/FMInference/FlexGen, but it only works for OPT models and is more of an academic proof of concept than a model-hosting solution

  • text-generation-inference

    Large Language Model Text Generation Inference
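
For readers wondering what "deploying as an API endpoint" looks like from the client side, here is a minimal sketch of calling a text-generation-inference server over HTTP. The thread itself does not show any client code, so treat everything here as an assumption: the URL is a placeholder for a server you have already started, the helper names build_generate_request and generate are hypothetical, and the request shape assumes the server's /generate route accepts an "inputs" string plus a "parameters" object and returns a "generated_text" field. Check the project's own documentation for your version.

```python
import json
import urllib.request

# Assumption: a text-generation-inference server is already running locally.
# This URL is a placeholder, not something given in the thread.
TGI_URL = "http://localhost:8080/generate"


def build_generate_request(prompt, max_new_tokens=20, url=TGI_URL):
    """Build (but do not send) a POST request for the /generate route."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=20, url=TGI_URL, timeout=60):
    """Send the request and return the text from the JSON response.

    Assumes the response body is a JSON object with a "generated_text" key.
    """
    request = build_generate_request(prompt, max_new_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["generated_text"]
```

With a server running, generate("What is deep learning?") would return the completion as a string. Building the request separately from sending it makes it easy to inspect the payload without a live server.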
