As @dcalde mentioned before, apart from the timescaledb extension there is also timescaledb-parallel-copy (https://github.com/timescale/timescaledb-parallel-copy), which uses multiple threads/connections to bulk insert CSV files into PostgreSQL. It works with Timescale's hypertables, but also with vanilla Postgres tables.
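To illustrate the general idea of loading a CSV over several connections at once, here is a rough Python sketch. It is not how timescaledb-parallel-copy is implemented; it only shows the batching-plus-worker-pool pattern. The names `parallel_load`, `batches`, and `copy_batch` are made up for this example, and `copy_batch` is assumed to be a callable you supply that opens its own database connection, writes one batch of rows (for instance via a `COPY ... FROM STDIN`), and returns the number of rows written.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(reader, size):
    """Yield lists of up to `size` rows from a csv reader."""
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk


def parallel_load(csv_file, copy_batch, workers=4, batch_size=5000, skip_header=True):
    """Split a CSV into batches and hand each batch to a pool of workers.

    `copy_batch` is a hypothetical, user-supplied callable: it receives a
    list of rows, writes them to the database on its own connection, and
    returns how many rows it wrote.
    """
    reader = csv.reader(csv_file)
    if skip_header:
        next(reader, None)  # drop the header row if there is one
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: Executor.map submits every batch up front, so the whole file
        # is read into memory as pending batches. A bounded queue would be
        # needed for files that do not fit in RAM.
        counts = pool.map(copy_batch, batches(reader, batch_size))
        return sum(counts)
```

Because each batch is written independently, row order in the target table is not guaranteed, and a failed batch does not roll back the others unless `copy_batch` handles that itself.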
I've run DataFusion over a collection of Parquet files I had converted from CSV. It was neat, but on a single VM it wasn't hard, with my dataset, to write a GROUP BY query that blew through 64 GiB of RAM.
I have used the following Python utility: https://github.com/turicas/rows. It provides a CLI that can bulk load CSV files into Postgres tables.
If you are processing each row, I would suggest taking a look at this library: https://github.com/Claviz/bellboy. It leverages Node.js streams and provides quite a few of the essentials required for this kind of ETL processing.
You can take a look at the code if you want to, but it's heavily WIP right now: https://github.com/aregee/etlp
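For anyone unfamiliar with the stream-style, row-by-row processing mentioned above, here is a minimal Python sketch of the same pattern using generators. It does not use bellboy or etlp and is not based on their APIs; `read_rows`, `transform`, and `write_rows` are hypothetical helper names chosen for this example. The point is only that each row flows through the pipeline one at a time, so memory use stays flat regardless of file size.

```python
import csv


def read_rows(csv_file):
    """Lazily yield each CSV row as a dict keyed by the header."""
    yield from csv.DictReader(csv_file)


def transform(rows, fn):
    """Apply `fn` to every row; rows for which `fn` returns None are dropped."""
    for row in rows:
        out = fn(row)
        if out is not None:
            yield out


def write_rows(rows, out_file, fieldnames):
    """Write rows to `out_file` as CSV and return how many were written."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

A pipeline is then just a composition such as `write_rows(transform(read_rows(src), my_fn), dst, ["name", "qty"])`, where `my_fn` is whatever per-row logic you need.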