-
Why?
Most weather and climate datasets, including ERA5, are highly structured on regular latitude-longitude grids. Even if you were only doing timeseries analyses for specific locations plucked from this grid, the strength of this sort of dataset is its intrinsic spatiotemporal structure and context, so it makes very little sense to destroy that structure unless extracting point timeseries is the only thing you will ever do. And even then, you'd probably want to decimate the data pretty dramatically, since there are very few use cases for, say, a point timeseries of surface temperature in the middle of the ocean!
The vast majority of research and operational applications of datasets like ERA5 are probably better served by cloud-optimized replicas of the original dataset, such as ARCO-ERA5 published on the Google Public Datasets program [1]. These versions of the dataset preserve the original structure and chunk it in ways that are amenable to massively parallel access via cloud storage. In almost every case I've encountered in my career, a generically chunked Zarr-based archive of a dataset like this has been more than performant enough for most use cases one might care about.
[1]: https://cloud.google.com/storage/docs/public-datasets/era5
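To make the "point plucked from the grid" case concrete, here is a minimal sketch of mapping a location to its nearest grid cell. It assumes a regular 0.25-degree global grid with latitude running from 90 down to -90 and longitude from 0 up to 359.75; those grid parameters and the function name `nearest_grid_index` are illustrative assumptions, not something stated in the comment above.

```python
def nearest_grid_index(lat, lon, step=0.25):
    """Return (row, col) of the grid cell nearest to (lat, lon).

    Assumes a hypothetical regular grid: latitude runs from 90 down to -90
    and longitude from 0 up to 360 - step, both spaced `step` degrees apart.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    n_lon = round(360 / step)                   # number of longitude columns
    row = round((90.0 - lat) / step)            # row 0 is the north pole
    col = round((lon % 360.0) / step) % n_lon   # wrap negative/overflowing longitudes
    return row, col


# Usage: a location given with a negative (western) longitude
row, col = nearest_grid_index(40.7, -74.0)
```

With indices like these, a point timeseries is just a slice along the time axis at one `(row, col)`, which is why the grid itself does not need to be flattened into rows first.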
-
proton
High-performance, low-footprint SQL database written in C++. Process millions of rows per second from Kafka, Pulsar, or ClickHouse, and seamlessly write results back. Supports powerful features like JOIN, CDC, UPSERT, and LOOKUP, enabling real-time analytics and ETL at scale. (by timeplus-io)
What's the process for adding support for other databases to your tool qStudio?
I'm thinking perhaps you could add support for Timeplus [1]? Timeplus is a streaming-first database built on ClickHouse. The core DB engine Timeplus Proton is open source [2].
It seems that qStudio is open source [3] and written in Java, so I assume it needs a JDBC driver to support a new RDBMS? If so, Timeplus Proton has an open source JDBC driver [4], based on ClickHouse's driver but with modifications for streaming use cases.
1: https://www.timeplus.com/
2: https://github.com/timeplus-io/proton
3: https://github.com/timeseries/qstudio
4: https://github.com/timeplus-io/proton-java-driver
-
Creator of Open-Meteo here. There is a small tutorial on setting up ERA5 locally: https://github.com/open-meteo/open-data/tree/main/tutorial_d...
Under the hood, Open-Meteo uses a custom file format with time-series chunking and specialised compression for low-frequency weather data. General purpose time-series databases do not even get close to this setup.
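Why chunking along time matters for point queries can be shown with a bit of arithmetic. The sketch below counts how many chunks a read has to touch for one year of hourly values at a single grid point, under two layouts of a (time, lat, lon) array. The chunk shapes and the function name `chunks_touched` are hypothetical and chosen only for illustration; they are not Open-Meteo's actual format or any published archive's real chunking.

```python
def chunks_touched(selection, chunk_shape):
    """Count the chunks overlapped by a rectangular selection.

    selection:   one half-open (start, stop) index pair per dimension
    chunk_shape: chunk length along each dimension
    """
    total = 1
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size        # index of the first chunk hit
        last = (stop - 1) // size    # index of the last chunk hit
        total *= last - first + 1
    return total


# One year of hourly data (8760 steps) at a single grid point.
point_year = ((0, 8760), (197, 198), (1144, 1145))

# Hypothetical "one global map per time step" layout.
map_layout = chunks_touched(point_year, (1, 721, 1440))

# Hypothetical time-series layout: long in time, tiny in space.
series_layout = chunks_touched(point_year, (8760, 4, 4))
```

Under these assumed shapes the map-oriented layout touches one chunk per time step, while the time-series layout serves the same read from a single chunk. The trade-off reverses for a request that wants one full global map.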
-
timescaledb-insert-benchmarks
Benchmarking inserting a ~trillion rows of weather data into TimescaleDB
The full dataset is huge (~9 petabytes and growing), of which I'm using just ~8 terabytes. That's still quite big to upload.
The data is freely available from the [Climate Change Service](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysi...), which has a nice API, but download speeds can be a bit slow.
[NCAR's Research Data Archive](https://rda.ucar.edu/datasets/ds633-0/) provides some of the data (as pre-generated NetCDF files) at higher download speeds.
It's not super well documented, but I put the Python scripts I used to download the data in the accompanying GitHub repository: https://github.com/ali-ramadhan/timescaledb-insert-benchmark...