Researcher looking for help with how to prepare a finetuning dataset for models like Bloomz and Cerebras-GPT

This page summarizes the projects mentioned and recommended in the original post on /r/ArtificialInteligence

  • xTuring

    Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

  • I want to start with a totally freely available model, so again, that excludes things like LLaMA where the weights are only available through a wait list. The two models that most get my attention and (I think, and hope) fit my criteria of open availability are Cerebras-GPT (13b) and Bloomz (7b). The tools to process and fine-tune that seem most feasible to me, from my limited knowledge, are xturing and basaran.

  • basaran

    Discontinued. Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • 1-Jun-2023

    2 projects | /r/dailyainews | 2 Jun 2023
  • Basaran is an open-source alternative to the OpenAI text completion API

    1 project | news.ycombinator.com | 31 May 2023
  • Ask HN: What's the best self hosted/local alternative to GPT-4?

    12 projects | news.ycombinator.com | 31 May 2023
  • Are all the finetunes stupid?

    5 projects | /r/LocalLLaMA | 22 Apr 2023
  • Using the API in Node

    3 projects | /r/Oobabooga | 11 Apr 2023