-
python-zipstream
Like Python's ZipFile module, except it works as a generator that provides the file in many small chunks. (by longaccess)
-
StreamingUnzip
Given the (end) chunk of a zip file (where you can find the zip file's "table of contents"), you can pipe in a zip file and have this utility pipe the files out unzipped.
I built a similar project in Go, along with an HTTP server. It comes in handy for streaming (as you point out): https://github.com/scosman/zipstreamer
This is really interesting.
My https://datasette.io/ application offers features to export relational data, using Python asyncio under the hood. It can currently stream an arbitrarily large table out as CSV, which is a great format for this because it can be generated without buffering the entire thing in memory.
I wrote a bit about that here: https://simonwillison.net/2021/Jun/25/streaming-large-api-re...
zipfly makes me think that maybe I could do things like "stream all of the tables from this database as a zip file full of CSVs" - and have it work for giant databases, again without using a great deal of memory.
Sounds like you need this: a previous submission, linked elsewhere, mentioned https://github.com/longaccess/python-zipstream/tree/streamin... which should do it.
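For anyone curious what "a zip file full of CSVs, streamed" could look like, here is a rough sketch using only the Python standard library. It is not how zipfly, python-zipstream, or Datasette do it. It relies on `zipfile` accepting a non-seekable output object, in which case it writes sizes and CRCs in data descriptors after each entry. The names `_ChunkSink` and `stream_tables_as_zip` are made up for illustration, the database is assumed to be SQLite, and the table names are assumed to be trusted because they are interpolated into the SQL.

```python
import csv
import io
import sqlite3
import zipfile


class _ChunkSink:
    """Hypothetical write-only sink: collects bytes written by zipfile.

    It has no tell()/seek(), so zipfile treats it as non-seekable and
    emits data descriptors instead of seeking back to patch headers.
    """

    def __init__(self):
        self._chunks = []

    def write(self, data):
        if data:
            self._chunks.append(bytes(data))
        return len(data)

    def flush(self):
        pass

    def drain(self):
        # Hand back everything written so far and start a fresh list.
        chunks, self._chunks = self._chunks, []
        return chunks


def stream_tables_as_zip(conn, tables, rows_per_batch=1000):
    """Yield the bytes of a zip archive holding one CSV per table.

    Only one batch of rows (plus compressor state) is held in memory.
    """
    sink = _ChunkSink()
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as archive:
        for table in tables:
            # Assumption: table names are trusted identifiers.
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            # force_zip64 lets a single entry grow past 2 GiB.
            with archive.open(f"{table}.csv", "w", force_zip64=True) as entry:
                text = io.TextIOWrapper(entry, encoding="utf-8", newline="")
                writer = csv.writer(text)
                writer.writerow([col[0] for col in cursor.description])
                while True:
                    rows = cursor.fetchmany(rows_per_batch)
                    if not rows:
                        break
                    writer.writerows(rows)
                    text.flush()
                    yield from sink.drain()
                text.flush()
                text.detach()  # leave closing the entry to the with block
            yield from sink.drain()
    # Closing the archive writes the central directory; emit it last.
    yield from sink.drain()
```

A web handler could iterate over `stream_tables_as_zip(conn, ["people", "places"])` and write each chunk to the response as it arrives. An asyncio application would need to run the blocking parts in a thread or adapt the generator, which this sketch leaves out.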
Interesting, for the read (decompression) case I wrote this a while back:
https://github.com/d136o/StreamingUnzip
Basically, if you have a big zip file with many files in it (csvs for example), you can pipe out the decompressed data…
It’s a bit awkward to use since it needs the end chunk of the zip archive up front (which may come from S3, for example).
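To make the "end chunk" requirement concrete: a zip archive finishes with an end-of-central-directory (EOCD) record that says where the table of contents starts and how many entries it has. The sketch below parses that record from a tail chunk. It is only an illustration of the format, not StreamingUnzip's actual code, and the function name `locate_central_directory` is made up. It assumes a single-disk archive without ZIP64 extensions and a tail chunk long enough to contain the whole record.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, disk number, disk with central directory, entries on this disk,
# total entries, central directory size, central directory offset,
# comment length
EOCD_FORMAT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def locate_central_directory(tail: bytes):
    """Return (entry_count, directory_size, directory_offset).

    `tail` is the last N bytes of the archive, for example the result of
    a ranged read. The offset is relative to the start of the archive.
    """
    # Search backwards because the record sits at the very end, followed
    # only by an optional archive comment.
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(tail):
        raise ValueError("end-of-central-directory record not found in tail")
    fields = struct.unpack_from(EOCD_FORMAT, tail, pos)
    _sig, _disk, _cd_disk, _entries_here, entries, cd_size, cd_offset, _clen = fields
    return entries, cd_size, cd_offset
```

With the offset and size in hand, a second ranged read can fetch the central directory itself, which lists each file's name, compression method, and local header offset.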
Looks good! I've been thinking about making a writable version of https://github.com/uktrade/stream-unzip, but it looks like you beat me to it!
(Full disclosure: I'm the main developer of stream-unzip)