Show HN: We got fine-tuning Mistral-7B to not suck

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • unsloth

    Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory

  • Unsloth’s Colab notebooks for fine-tuning Mistral-7B are super easy to use and run fine in just about any Colab instance:

    https://github.com/unslothai/unsloth

    It’s my default now for experimenting and basic training. If I want to get into the weeds with the training, I use axolotl, but 9/10, it’s not really necessary.
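
    For context, a minimal sketch of what those notebooks roughly boil down to, assuming the unsloth and trl packages (argument names vary across trl versions; the model name, LoRA hyperparameters, and toy dataset below are illustrative placeholders, not the notebooks' exact settings):

    ```python
    from unsloth import FastLanguageModel
    from datasets import Dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer

    # Load a 4-bit quantised Mistral-7B base (placeholder model name).
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/mistral-7b-bnb-4bit",
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Attach LoRA adapters so only a small fraction of weights are trained.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,
        lora_dropout=0,
        bias="none",
        use_gradient_checkpointing=True,
    )

    # Toy dataset: any HF Dataset with a formatted "text" column works.
    dataset = Dataset.from_dict({"text": [
        "### Instruction:\nSay hello.\n\n### Response:\nHello!",
    ]})

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            max_steps=60,
            learning_rate=2e-4,
            output_dir="outputs",
        ),
    )
    trainer.train()
    ```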

  • helix

    Multi-node production AI stack. Run the best of open source AI easily on your own servers. Create your own AI by fine-tuning open source models (by helixml)

  • If you look at the source [1] you can see how they solved their "what are the doctors going to do?" problem. It is literally included in one of the prompts now:

    Users tend to ask broad, vague questions of the document in order to test that the system is working. We want those queries to work well. For example, a user would ask "what are the doctors going to do?" of a document that is about a junior doctors' strike. Take this into account when generating the questions - in particular, refer to noun phrases by less specific descriptions, so for example instead of "junior doctors", say "doctors" in your questions.

    [1]: https://github.com/helixml/helix/blob/main/api/pkg/dataprep/...
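
    To make that concrete, here is a rough sketch (not Helix's actual code; the instruction text is paraphrased from the quote above) of how that guidance can be baked into a question-generation prompt when preparing fine-tuning data:

    ```python
    # Illustrative only: a question-generation prompt that embeds the
    # "broad, vague questions" guidance quoted above. Not the Helix source.
    QGEN_SYSTEM_PROMPT = (
        "You generate question/answer pairs from a document so a model can be "
        "fine-tuned on them. Users tend to ask broad, vague questions to test "
        "that the system is working, and those queries should still work well. "
        "Refer to noun phrases by less specific descriptions: if the document "
        "says 'junior doctors', ask about 'doctors'."
    )

    def build_qgen_messages(document_chunk: str, num_questions: int = 5) -> list[dict]:
        """Assemble OpenAI-style chat messages for a question-generation call."""
        return [
            {"role": "system", "content": QGEN_SYSTEM_PROMPT},
            {"role": "user", "content": (
                f"Generate {num_questions} question/answer pairs "
                f"for this document:\n\n{document_chunk}"
            )},
        ]
    ```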

  • lamini-sdk

  • https://github.com/lamini-ai/lamini-sdk/tree/main/03_RAG

    You can implement RAG in 80 lines of Python and 0 libraries.
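
    In that spirit, here is a standard-library-only sketch of the retrieval-and-prompt half of RAG (this is not the Lamini code; the final generation call is whatever model or endpoint you already use):

    ```python
    import math
    import re
    from collections import Counter

    def tokenize(text: str) -> Counter:
        """Lowercased bag-of-words counts; no external tokenizer needed."""
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def cosine(a: Counter, b: Counter) -> float:
        """Cosine similarity between two bag-of-words vectors."""
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
        """Return the k document chunks most similar to the query."""
        q = tokenize(query)
        ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
        return ranked[:k]

    def build_prompt(query: str, chunks: list[str]) -> str:
        """Stuff the retrieved context into a plain instruction prompt."""
        context = "\n---\n".join(retrieve(query, chunks))
        return (f"Answer using only the context below.\n\n"
                f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

    # Pass build_prompt(...) to whichever LLM you already have access to.
    ```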

  • vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

  • Great question! Scheduling workloads onto GPUs in a way that uses VRAM efficiently was quite the challenge.

    What we found was that the I/O latency of loading model weights into VRAM will kill responsiveness if you don't "re-use" sessions (i.e. keep the model weights loaded and run multiple inference sessions over the same loaded weights).

    Obviously projects like https://github.com/vllm-project/vllm exist but we needed to build out a scheduler that can run a fleet of GPUs for a matrix of text/image vs inference/finetune sessions.

    disclaimer: I work on Helix
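
    To illustrate the session re-use idea (a toy sketch, not Helix's scheduler): keep loaded models resident in VRAM keyed by model ID and evict the least recently used one only when a VRAM budget is exceeded, so repeat requests skip the weight-loading latency entirely.

    ```python
    from collections import OrderedDict

    class WarmModelCache:
        """Toy illustration of session re-use: keep loaded models resident in
        VRAM and evict the least recently used one when the budget is exceeded."""

        def __init__(self, vram_budget_gb: float, loader):
            self.budget = vram_budget_gb
            self.loader = loader         # loader(model_id) -> (runner, vram_gb)
            self.warm = OrderedDict()    # model_id -> (runner, vram_gb)
            self.used = 0.0

        def get(self, model_id: str):
            if model_id in self.warm:
                self.warm.move_to_end(model_id)   # mark as most recently used
                return self.warm[model_id][0]     # warm path: no weight-loading I/O
            runner, size = self.loader(model_id)  # cold path: load weights into VRAM
            while self.used + size > self.budget and self.warm:
                _, (_, freed) = self.warm.popitem(last=False)   # evict LRU model
                self.used -= freed
            self.warm[model_id] = (runner, size)
            self.used += size
            return runner

    # Demo with a fake loader; a real one would construct e.g. a vLLM engine.
    cache = WarmModelCache(24.0, loader=lambda mid: (f"<runner for {mid}>", 14.0))
    cache.get("mistral-7b")   # cold: loads weights
    cache.get("mistral-7b")   # warm: reuses the already-loaded weights
    ```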
