[P] Optimization of Hugging Face Transformer models to get inference latency below 1 millisecond, plus deployment on a production-ready inference server

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • triton_transformers

Discontinued: Deploy optimized transformer-based models in production. [Moved to: https://github.com/ELS-RD/transformer-deploy]

Will you also be adding OpenVINO for a CPU implementation to the repo?

  • optuna

    A hyperparameter optimization framework

There are plenty of options for doing that in open source, the best known being Optuna (https://github.com/optuna/optuna).


NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives; a higher number therefore means a more popular project.
