[R] LaVIN-lite: Training your own Multimodal Large Language Models on one single GPU with competitive performance! (Technical Details)

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • LaVIN

    [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

  • LaVIN code: https://github.com/luogen1996/LaVIN

  • qlora

    QLoRA: Efficient Finetuning of Quantized LLMs

  • 4-bit quantized training mainly follows qlora. In short, QLoRA stores the LLM's weights in 4-bit and dequantizes them to 16-bit during the forward and backward passes to preserve training precision. This significantly reduces GPU memory overhead during training (training speed should not change much), and it combines naturally with parameter-efficient methods. However, the original work targets single-modal LLMs, and its code is wrapped inside Hugging Face's libraries, so we extracted the core code and migrated it into LaVIN's codebase. The main idea is to replace every linear layer in the LLM with a 4-bit quantized layer, as sketched below. Those interested can refer to our implementation in quantization.py and mm_adaptation.py, which amounts to roughly a dozen lines of code.
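To make the replacement step concrete, here is a minimal sketch under the assumption that bitsandbytes (the library QLoRA itself builds on) provides the 4-bit layer. The function name and the `skip` parameter are illustrative, and the actual code in quantization.py may differ in such details:

```python
# Minimal sketch: recursively swap every nn.Linear in the LLM for a
# 4-bit quantized layer, assuming bitsandbytes as the 4-bit backend.
import torch
import torch.nn as nn
import bitsandbytes as bnb

def replace_linear_with_4bit(module: nn.Module,
                             compute_dtype: torch.dtype = torch.float16,
                             skip: tuple = ("lm_head",)) -> None:
    """Store weights in 4-bit (NF4) and dequantize to `compute_dtype`
    (16-bit here) on the fly during forward/backward, as in QLoRA.
    Submodule names in `skip` (model-dependent) are left untouched."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name not in skip:
            quant = bnb.nn.Linear4bit(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                compute_dtype=compute_dtype,  # dtype used for the actual matmul
                quant_type="nf4",             # NormalFloat4 from the QLoRA paper
            )
            # Wrap the existing 16/32-bit weights; the 4-bit quantization
            # itself is triggered when the parameter is moved to the GPU.
            quant.weight = bnb.nn.Params4bit(
                child.weight.data, requires_grad=False, quant_type="nf4"
            )
            if child.bias is not None:
                quant.bias = child.bias
            setattr(module, name, quant)
        else:
            replace_linear_with_4bit(child, compute_dtype, skip)
```

In this sketch the quantization of the stored weights happens when the model is moved to the GPU (e.g. `replace_linear_with_4bit(model); model.cuda()`): the frozen base weights then live in 4-bit, while any trainable adapter parameters stay in 16-bit.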

