-
For many years I have watched people overengineer "big" data tools and pipelines. In a lot of use cases, data warehouses and data lakes only hold gigabytes or single-digit terabytes of data, so their architecture could be much simpler, e.g. running DuckDB on a decent EC2 instance.
In my experience, this returns query results before some other systems have even started executing the query (yes, I'm looking at you, Athena).
I even think a lot of queries can be run in a browser nowadays, which is why I created https://sql-workbench.com/ with the help of DuckDB WASM (https://github.com/duckdb/duckdb-wasm) and perspective.js (https://github.com/finos/perspective).
-
Related posts
-
Building Databases over a Weekend
-
Polars: an alternative to Pandas
-
I used multiprocessing and multithreading at the same time to drop the execution time of my code from 155+ seconds to just over 2+ seconds
-
Test On 4 Concurrent Jobs Using Python-Polars 0.17.11 to GroupBy Billion Rows
-
Welcome to InfluxDB IOx: InfluxData’s New Storage Engine