-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Per-object costs can be tricky with S3 -- it's easy to mentally round costs less than 1/10th of a penny to zero, and then look up a few years later and realize you have hundreds of millions of things and can't afford to do anything with them.
When this bit us on a project I made a tool to solve our particular problem, which tars files, writes csv indexes, and can fetch individual files from the tars if need be.[1] Running on millions of files was janky enough that I also ended up scripting an orchestrator to repeatedly attempt each step of the pipeline.[2] Not tested on data other than ours but could be a useful starting point.
[1] https://github.com/harvard-lil/s3mothball
Yup. And uploading / downloading objects to S3 incurs tons of requests because S3 client does parallel chunking with a small number of other control requests.
Example from go sdk: https://github.com/aws/aws-sdk-go/blob/main/service/s3/s3man....