RedPajama v2 Open Dataset with 30T Tokens for Training LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • RedPajama-Data

    The RedPajama-Data repository contains code for preparing large datasets for training large language models.

  • Thanks for the suggestion! We will add this to the pool of features for a future release. (We are currently running the existing 40+ annotations on the `tail` partitions.)

    If you are interested in contributing the code for these features, feel free to open a PR at https://github.com/togethercomputer/RedPajama-Data! Otherwise we will attempt a best-effort implementation :) but we hope that this can become a community effort.

    (Feel free to create more issues on GitHub for us to keep track of. I created one for this: https://github.com/togethercomputer/RedPajama-Data/issues/76)

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

