I'll happily recommend https://syncthing.net/: it's open source, end-to-end encrypted, and peer-to-peer (your machines send files directly to each other).
I imagine things like this are underway: https://github.com/jjuliano/aifiles
Honestly, I think it's actually a pretty fantastic development if that's the direction. I don't use Dropbox much, nor have I used aifiles (since it sends all of your data to OpenAI), but the idea of not having to manually and tediously sift through terabytes of files to reorganize everything into a better directory hierarchy, tag each file with meaningful labels, and so on sounds phenomenal.
Obviously there are some implementation details needed for this to not be awful, for example: 1) only local models for local data; 2) making the changes on e.g. ZFS (to allow rollback), or as some kind of optional 'overlay' view so you can switch back and forth between the original layout and the AI-organized one; and 3) having thresholds and logic for what counts as a 'duplicate' to be removed, and for how to better compress data.
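The 'overlay' idea in 2) doesn't even need filesystem support: one minimal sketch (my own illustration, not anything aifiles or Dropbox actually does) is to materialize the AI-proposed hierarchy as a directory of symlinks, so the originals are never touched and "rolling back" is just deleting the overlay. The `build_overlay` function and the mapping format are assumptions for the sake of the example:

```python
import os
import tempfile

def build_overlay(original_root, mapping, overlay_root):
    """Present files under new AI-proposed paths via symlinks,
    leaving the original tree untouched.

    mapping: {original relative path: reorganized relative path}
    (a hypothetical format -- in practice a model would emit this)
    """
    for src_rel, dst_rel in mapping.items():
        src = os.path.join(original_root, src_rel)
        dst = os.path.join(overlay_root, dst_rel)
        # create the reorganized directory structure on demand
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.symlink(src, dst)

# demo: one file gets a more meaningful home in the overlay
orig = tempfile.mkdtemp()
over = tempfile.mkdtemp()
with open(os.path.join(orig, "IMG_0001.jpg"), "w") as f:
    f.write("fake image bytes")

build_overlay(orig, {"IMG_0001.jpg": "photos/2021/beach/IMG_0001.jpg"}, over)
```

Switching "back" is just browsing `orig` instead of `over`; a copy-on-write filesystem like ZFS gives you the same safety for in-place changes via snapshots.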
As for the de-duplication and processing: this could be very good for Dropbox in that, e.g., if a person re-encodes all of their image or video files with AV1, the resulting data could shrink by half or more, which saves Dropbox storage space. After that, neural perceptual hashing could be run over all of the files, and a similarity threshold could drive de-duplication on a perceptual basis (for example, keep the larger file that is 99% similar to a 2x-downsized copy, and re-encode it). A user preference to keep things like TIFF files, or any other lossless encoding of their choosing, completely intact would be a good option as well.
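To make the perceptual-dedup step concrete, here's a minimal sketch using a classical difference hash (dHash) as a stand-in for the neural perceptual hash the comment describes: hash each image, then flag pairs whose hashes differ in at most a few bits as candidate duplicates. The pixel grids, filenames, and the 4-bit threshold are all illustrative assumptions:

```python
def dhash(pixels):
    # pixels: 2D grid of grayscale values, 9 columns x 8 rows.
    # Each bit records whether a pixel is brighter than its right
    # neighbor, capturing the image's gradient structure.
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits  # 64 bits

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def near_duplicates(hashes, threshold=4):
    # report pairs whose hashes differ in at most `threshold` bits
    names = list(hashes)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if hamming(hashes[a], hashes[b]) <= threshold]

# demo: a gradient, the "same" image with slight per-row noise
# (as a downsized re-encode might produce), and an unrelated image
img_a = [[10 * c for c in range(9)] for r in range(8)]
img_b = [[10 * c + (r % 2) for c in range(9)] for r in range(8)]
img_c = [[10 * (8 - c) for c in range(9)] for r in range(8)]

hashes = {
    "photo.jpg": dhash(img_a),
    "photo_small.jpg": dhash(img_b),
    "other.jpg": dhash(img_c),
}
print(near_duplicates(hashes))  # → [('photo.jpg', 'photo_small.jpg')]
```

A neural embedding would replace `dhash` with a model's feature vector and Hamming distance with cosine distance, but the pipeline shape (hash, compare, threshold, pick the keeper) stays the same.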
There's definitely a strange disparity between the compute cost of deploying a decent model for this and the storage cost it would save, but if a small (perhaps even non-LLM!) model could be built to plow through data at high rates, it might make sense.
Or perhaps the kind of semantic compression that LLMs do is of interest to Dropbox as the basis for a new type of lossy compression algorithm.