-
ome-zarr-py
Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
Looks like CZI is a competitor:
https://www.zeiss.com/microscopy/en/products/software/zeiss-...
https://github.com/cgohlke/czifile
http://www.physics.hmc.edu/~gerbode/wppriv/wp-content/upload...
From what I can tell, CZI is basically JPEG compression and probably lacks the multidimensional compressor, which is the secret sauce missing from the various formats. It also mentions licensing and legal terms, so it may be lost in the weeds and not open source.
Without more info, it feels like this is still an open problem.
My interest in this is for video games, since so many are multiple GB in size yet also lack this kind of multidimensional compressor. If we had one, I'd guess a 10:1 to 100:1 reduction in size over what we have now, with no perceptual loss in quality.
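The comment doesn't spell out what a "multidimensional compressor" would be. One simple reading, offered only as an illustration, is a codec that removes redundancy along an extra axis (time, z-depth, channel) before compressing, instead of compressing each 2-D slice on its own. The Python sketch below assumes synthetic data from a hypothetical `make_stack` helper, uses byte-wise delta coding along axis 0 as the "multidimensional" step, and uses the standard-library `zlib` as the generic compressor; real multidimensional codecs are far more sophisticated.

```python
import random
import zlib


def make_stack(depth, height, width, seed=0):
    """Hypothetical test data: one random 2-D slice repeated along axis 0,
    with the slice index added to every byte (wrapping modulo 256)."""
    rng = random.Random(seed)
    base = bytes(rng.randrange(256) for _ in range(height * width))
    return [bytes((b + t) % 256 for b in base) for t in range(depth)]


def delta_encode(stack):
    """Keep the first slice; store each later slice as its byte-wise
    difference (mod 256) from the previous slice."""
    out = [stack[0]]
    for prev, cur in zip(stack, stack[1:]):
        out.append(bytes((c - p) % 256 for p, c in zip(prev, cur)))
    return out


def delta_decode(deltas):
    """Invert delta_encode exactly, so the scheme is lossless."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(bytes((p + x) % 256 for p, x in zip(out[-1], d)))
    return out


def size_per_slice(stack):
    # "2-D" approach: each slice is compressed alone, so redundancy
    # between neighbouring slices is invisible to the compressor.
    return sum(len(zlib.compress(s, 9)) for s in stack)


def size_with_axis_delta(stack):
    # One simple "multidimensional" approach: remove redundancy along
    # axis 0 first, then hand the residuals to the same generic compressor.
    return len(zlib.compress(b"".join(delta_encode(stack)), 9))


if __name__ == "__main__":
    stack = make_stack(16, 64, 64)
    assert delta_decode(delta_encode(stack)) == stack
    print("per-slice zlib:", size_per_slice(stack), "bytes")
    print("axis-0 delta + zlib:", size_with_axis_delta(stack), "bytes")
```

On this contrived data the delta step turns 15 of the 16 slices into constant bytes, so the gap is extreme; real images or game assets would show much smaller gains, and the sketch says nothing about the 10:1 to 100:1 estimate above.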
This article misses one of the coolest things about the Zarr format: it's flexible enough that it's also becoming widely used in climate science.
In particular, the Pangeo project (https://pangeo.io/architecture.html) uses large Zarr stores in the cloud as a performant format, which we can analyse in parallel at scale using distributed computing frameworks like Dask.
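Pangeo's stack uses the zarr and dask libraries; the sketch below uses neither. It is a standard-library toy, assuming a plain `dict` as the "object store", meant only to show why a chunked layout parallelizes well: every chunk is an independently compressed object under its own key, so each worker can fetch and reduce its own chunks, and only small partial results need combining. The key scheme is loosely modelled on Zarr v2 (`.zarray` metadata, chunk keys like `0.1`), but the metadata is heavily simplified, and the function names (`write_chunked`, `chunk_partial`, `parallel_mean`) are hypothetical, not the zarr or dask API.

```python
import json
import zlib
from array import array
from concurrent.futures import ThreadPoolExecutor


def write_chunked(store, data, chunk):
    """Split a 2-D list of floats into tiles and compress each tile separately.

    Loosely modelled on Zarr v2: metadata under '.zarray' and one object per
    chunk under keys like '0.1'. Simplified: the metadata holds only shape and
    chunk shape, and edge chunks are stored unpadded.
    """
    rows, cols = len(data), len(data[0])
    store[".zarray"] = json.dumps({"shape": [rows, cols], "chunks": list(chunk)}).encode()
    for i in range(0, rows, chunk[0]):
        for j in range(0, cols, chunk[1]):
            tile = array("d", (data[r][c]
                               for r in range(i, min(i + chunk[0], rows))
                               for c in range(j, min(j + chunk[1], cols))))
            store[f"{i // chunk[0]}.{j // chunk[1]}"] = zlib.compress(tile.tobytes())


def chunk_partial(store, key):
    """One independent task: fetch one chunk, decompress it, reduce it."""
    tile = array("d")
    tile.frombytes(zlib.decompress(store[key]))
    return sum(tile), len(tile)


def parallel_mean(store, workers=4):
    """Map over chunks in parallel, then combine the small partial results."""
    keys = [k for k in store if not k.startswith(".")]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda k: chunk_partial(store, k), keys))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    store = {}  # stand-in for a cloud object-store bucket
    data = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
    write_chunked(store, data, (2, 3))
    print(sorted(store))  # the '.zarray' metadata key plus one key per chunk
    print(parallel_mean(store))
```

In a real deployment the dict would be an S3 or GCS bucket and the per-chunk tasks would be scheduled across a cluster by a framework such as Dask; the map-then-combine shape of the computation is the part this toy is meant to convey.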
More and more climate science data is being made publicly available as Zarr in the cloud, often through open data partnerships with cloud providers (e.g. on AWS (https://aws.amazon.com/blogs/publicsector/decrease-geospatia...) and ERA-5 on GCP (https://cloud.google.com/storage/docs/public-datasets/era5)).
I personally think that the more that common tooling can be shared between scientific disciplines the better.