Performance of a 2TB comments database

This page summarizes the projects mentioned and recommended in the original post on /r/pushshift.

  • Pushshift-Importer

  • If you stick with SQLite, you could try creating your own sequencer: funnel all your writes into one thread in one process, and have that thread do the writing. That way there is only ever one possible writer on the DB at a time. That is what I did when I built a tool to import comments from Pushshift into SQLite. When I run it on an NVMe drive, I am CPU-bound on decompression and JSON parsing, so the DB isn't even a bottleneck.
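
The single-writer idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the tool mentioned in the comment. The table schema, the function names (`writer_loop`, `import_comments`), the batch size, and the queue size are all assumptions made for the example. Several threads stand in for the decompression and JSON parsing work, and only one thread ever touches the SQLite connection.

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel telling the writer thread to finish


def writer_loop(db_path, work_queue, batch_size=1000):
    """Sole owner of the SQLite connection: drains the queue, writes in batches."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema, chosen only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments "
        "(id TEXT PRIMARY KEY, author TEXT, body TEXT)"
    )
    batch = []
    while True:
        item = work_queue.get()
        if item is not _STOP:
            batch.append(item)
        # Flush when the batch is full or when we are shutting down.
        if batch and (item is _STOP or len(batch) >= batch_size):
            with conn:  # one transaction per batch, committed on exit
                conn.executemany(
                    "INSERT OR REPLACE INTO comments (id, author, body) "
                    "VALUES (?, ?, ?)",
                    batch,
                )
            batch.clear()
        if item is _STOP:
            break
    conn.close()


def import_comments(db_path, chunks):
    """Process each chunk in its own thread; funnel every row to one writer."""
    work_queue = queue.Queue(maxsize=10_000)  # bounded, so producers back off
    writer = threading.Thread(target=writer_loop, args=(db_path, work_queue))
    writer.start()

    def produce(chunk):
        # In a real importer this is where decompression and JSON parsing
        # would happen; here each chunk is already a list of row tuples.
        for row in chunk:
            work_queue.put(row)

    producers = [threading.Thread(target=produce, args=(c,)) for c in chunks]
    for t in producers:
        t.start()
    for t in producers:
        t.join()

    work_queue.put(_STOP)  # all producers are done, let the writer finish
    writer.join()
```

Calling `import_comments("comments.db", chunks)` with `chunks` as a list of lists of `(id, author, body)` tuples would write every row through the single writer thread. Batching the inserts into one transaction per batch is a separate design choice layered on top of the sequencer idea; it is included here because per-row commits tend to dominate SQLite write time.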

