How to create a broad/representative sample from millions of records?

This page summarizes the projects mentioned and recommended in the original post on /r/LanguageTechnology.

  • lit

    The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface. (by PAIR-code)

  • I'd also suggest looking at your data sample, and how your model handles it, with some kind of exploratory analysis tool. Google's Language Interpretability Tool might work for your scenario. This can give you a lot of ideas about how to prepare the data better.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • StreamingLLM: tiny tweak to KV LRU improves long conversations

    1 project | news.ycombinator.com | 13 Feb 2024
  • [D] Is there a tool that indicates which parts of the input prompt impact the LLM's output the most?

    1 project | /r/MachineLearning | 7 Dec 2023
  • Show HN: Fully client-side GPT2 prediction visualizer

    1 project | news.ycombinator.com | 5 Sep 2023
  • How to visualise LLMs ?

    1 project | /r/LocalLLaMA | 12 Jul 2023
  • Ask HN: Can someone ELI5 Transformers and the “Attention is all we need” paper

    2 projects | news.ycombinator.com | 17 May 2023