Ask HN: If we train an LLM with “data” instead of “language” tokens

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • gretel-synthetics

    Synthetic data generators for structured and unstructured text, featuring differentially private learning.

  • Hey there! Co-founder of Gretel.ai here, and I think I can provide some insights on this topic.

    Firstly, the concept you're hinting at is not purely traditional ML. In traditional machine learning, we often prioritize feature extraction and engineering specific to a given problem space before training.

    What you're describing, and what we've been working on at Gretel.ai, is using models like Large Language Models (LLMs) to understand and extrapolate from vast amounts of diverse data without time-consuming feature engineering. Here's a link to our open-source library for synthetic data generation, https://github.com/gretelai/gretel-synthetics (currently supporting GAN and RNN-based language models), and our recent announcement of a Tabular LLM we're training to help people build with data: https://gretel.ai/tabular-llm
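Editorial illustration of the idea in the comment above: one common way to feed tabular "data" to a language model is to serialize each record as a line of text, which the model can then tokenize like any other sentence. The sketch below is a minimal, hypothetical example of that step only. The helper names (`row_to_text`, `text_to_row`), the separators, and the sample rows are assumptions made for illustration; this is not the gretel-synthetics API and not necessarily how Gretel's Tabular LLM represents data.

```python
from typing import Dict, List

# Assumed separators for this sketch; a real pipeline would need escaping
# or a structured format such as JSON if values can contain these strings.
FIELD_SEP = ", "
KV_SEP = ": "


def row_to_text(row: Dict[str, object]) -> str:
    """Serialize one record as 'column: value' pairs on a single line."""
    return FIELD_SEP.join(f"{key}{KV_SEP}{value}" for key, value in row.items())


def text_to_row(line: str) -> Dict[str, str]:
    """Parse a serialized line back into a record (all values as strings)."""
    row: Dict[str, str] = {}
    for field in line.split(FIELD_SEP):
        key, _, value = field.partition(KV_SEP)
        row[key] = value
    return row


# Hypothetical sample records, invented for this example.
rows: List[Dict[str, object]] = [
    {"age": 34, "city": "Austin", "plan": "pro"},
    {"age": 51, "city": "Oslo", "plan": "free"},
]

# Each line is one training example for a text model; lines a model
# generates can be parsed back into rows with text_to_row.
corpus = [row_to_text(r) for r in rows]
```

No per-column feature engineering happens here: column names and values are passed through as text, and a model trained on such lines would have to learn their relationships itself.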

    A few areas where we've found tabular or Large Data Models to be really useful are:

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Which open source tool for generating synthetic data sets?

    1 project | /r/MLQuestions | 17 Oct 2022
  • Libraries for synthetic data?

    4 projects | /r/algotrading | 3 May 2023
  • Gretel-synthetics: open-source library to create synthetic datasets

    1 project | news.ycombinator.com | 22 Feb 2021
  • Show HN: Qrlew, simple SQL to SQL-with-privacy written in Rust

    2 projects | news.ycombinator.com | 27 Mar 2024
  • LLMs and Programming in the first days of 2024

    8 projects | news.ycombinator.com | 2 Jan 2024