Big Data Is Dead

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • duckdb-wasm

    WebAssembly version of DuckDB

  • For many years I have watched "big" data tools and pipelines being overengineered. In a lot of use cases, data warehouses and data lakes hold only gigabytes or single-digit terabytes, so their architecture could be much simpler, e.g. running DuckDB on a decent EC2 instance.

    In my experience, doing this yields query results faster than some other systems can even start executing the query (yes, I'm looking at you, Athena)...

    I even think that a lot of queries can be run from a browser nowadays. That's why I created https://sql-workbench.com/ with the help of DuckDB WASM (https://github.com/duckdb/duckdb-wasm) and perspective.js (https://github.com/finos/perspective).

  • sql-workbench

    Public issue-tracking and feature suggestion for sql-workbench.com

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.

Related posts

  • Polars: an alternative to Pandas

    2 projects | /r/datasciencebr | 13 Jun 2023
  • I used multiprocessing and multithreading at the same time to drop the execution time of my code from 155+ seconds to just over 2+ seconds

    1 project | /r/Python | 29 May 2023
  • Test On 4 Concurrent Jobs Using Python-Polars 0.17.11 to GroupBy Billion Rows

    3 projects | /r/Python | 7 May 2023
  • Welcome to InfluxDB IOx: InfluxData’s New Storage Engine

    5 projects | news.ycombinator.com | 26 Oct 2022
  • Working with more than 10gb csv

    3 projects | /r/datascience | 5 Oct 2022