Python – Writing large ZIP archives without memory inflation

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • zipfly

    Python Zip Stream

  • zipstreamer

    Zip File Streaming Microservice - stream zip files on the fly

  • I built a similar project in Go, along with an HTTP server. It comes in handy for streaming (as you point out): https://github.com/scosman/zipstreamer

  • datasette

    An open source multi-tool for exploring and publishing data

  • This is really interesting.

    My https://datasette.io/ application offers features to export relational data, using Python asyncio under the hood. It can currently stream an arbitrarily large table out as CSV, which is a great format for this because it can be generated without buffering the entire thing in memory.

    I wrote a bit about that here: https://simonwillison.net/2021/Jun/25/streaming-large-api-re...

    zipfly makes me think that maybe I could do things like "stream all of the tables from this database as a zip file full of CSVs" - and have it work for giant databases again without using a great deal of memory.
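
A minimal sketch of the "zip file full of CSVs" idea from the comment above, using only the Python standard library rather than zipfly or Datasette. The names `_ChunkSink` and `stream_zip_of_csvs` are hypothetical, and the `(name, header, rows)` input shape is an assumption made for illustration. It relies on `zipfile` being able to write to a non-seekable stream, so only a small amount of compressed output is held in memory at any time.

```python
import csv
import io
import itertools
import zipfile


class _ChunkSink(io.RawIOBase):
    """Hypothetical write-only, non-seekable sink that collects written bytes."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def writable(self):
        return True

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def drain(self):
        # Hand over everything written so far and start a fresh list.
        chunks, self._chunks = self._chunks, []
        return chunks


def _csv_line(row):
    # Encode a single row as CSV bytes.
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode("utf-8")


def stream_zip_of_csvs(tables):
    """Yield the bytes of a ZIP archive holding one CSV per table.

    `tables` is an iterable of (name, header, rows); `rows` may be a lazy
    iterable such as a database cursor, so a table is never fully in memory.
    """
    sink = _ChunkSink()
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, header, rows in tables:
            # Members larger than 2 GiB would need force_zip64=True here.
            with archive.open(f"{name}.csv", mode="w") as member:
                for row in itertools.chain([header], rows):
                    member.write(_csv_line(row))
                    yield from sink.drain()
            # Closing the member writes its data descriptor.
            yield from sink.drain()
    # Closing the archive writes the central directory.
    yield from sink.drain()
```

Each yielded chunk could be passed straight to an HTTP response body. Because the sink is not seekable, `zipfile` records sizes and checksums in a data descriptor after each member instead of seeking back to patch the header.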

  • python-zipstream

    Like Python's ZipFile module, except it works as a generator that provides the file in many small chunks. (by longaccess)

  • Sounds like you need this... a previous submission as linked elsewhere mentioned https://github.com/longaccess/python-zipstream/tree/streamin... which should do it.

  • StreamingUnzip

    Given the (end) chunk of a zip file (where you can find the zip file's "table of contents") you will be able to pipe in a zip file and have this utility pipe the files out unzipped.

  • Interesting, for the read (decompression) case I wrote this a while back:

    https://github.com/d136o/StreamingUnzip

    Basically, if you have a big zip file with many files in it (csvs for example), you can pipe out the decompressed data…

    It’s a bit awkward to use, since it calls for the end chunk of the zip archive (which may come from S3, for example).
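
To illustrate why the end chunk matters, here is a rough sketch (not StreamingUnzip's code) of reading the "table of contents" from the tail of an archive. The function names are hypothetical. It assumes a single-disk, non-ZIP64 archive, that the tail contains the whole central directory, and that the end-of-central-directory signature does not also appear inside the archive comment.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD = struct.Struct("<4s4H2LH")            # end of central directory, 22 bytes
CD_SIGNATURE = b"PK\x01\x02"
CD_ENTRY = struct.Struct("<4s4B4HL2L5H2L")  # central directory header, 46 bytes


def parse_end_of_central_directory(tail):
    """Locate and decode the end-of-central-directory record in `tail`."""
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD.size > len(tail):
        raise ValueError("end-of-central-directory record not found in tail")
    fields = EOCD.unpack_from(tail, pos)
    return {"entries": fields[4], "cd_size": fields[5], "cd_offset": fields[6]}


def list_entries(tail, file_size):
    """List archive members given the last bytes of a file of `file_size` bytes."""
    eocd = parse_end_of_central_directory(tail)
    # cd_offset is relative to the start of the file; convert to a tail index.
    pos = eocd["cd_offset"] - (file_size - len(tail))
    if pos < 0:
        raise ValueError("tail does not contain the whole central directory")
    entries = []
    for _ in range(eocd["entries"]):
        fields = CD_ENTRY.unpack_from(tail, pos)
        if fields[0] != CD_SIGNATURE:
            raise ValueError("bad central directory entry")
        flags, method = fields[5], fields[6]
        name_len, extra_len, comment_len = fields[12], fields[13], fields[14]
        raw_name = tail[pos + CD_ENTRY.size:pos + CD_ENTRY.size + name_len]
        entries.append({
            # Bit 11 of the flags marks UTF-8 names; otherwise CP437.
            "name": raw_name.decode("utf-8" if flags & 0x800 else "cp437"),
            "method": method,
            "compressed_size": fields[10],
            "uncompressed_size": fields[11],
            "header_offset": fields[18],
        })
        pos += CD_ENTRY.size + name_len + extra_len + comment_len
    return entries
```

With object storage, the tail could be fetched with a ranged read of the last few kilobytes; the resulting `header_offset` and `compressed_size` values say where each member's bytes live in the rest of the file.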

  • stream-unzip

    Python function to stream unzip all the files in a ZIP archive on the fly

  • Looks good! I've been thinking about making a writable version of https://github.com/uktrade/stream-unzip, but looks like you beat me to it!

    (Full disclosure: I'm the main developer of stream-unzip)
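
For the read side, the general technique of unzipping "on the fly" can be sketched with the standard library alone. This is not stream-unzip's implementation or API; `iter_unzipped` is a hypothetical name. The sketch reads local file headers front to back, inflates deflate data incrementally, and lets the deflate stream itself signal where a member ends. It assumes no ZIP64, no encryption, no CRC verification, and that stored (uncompressed) members carry their size in the header.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"
DESCRIPTOR_SIG = b"PK\x07\x08"
LOCAL_REST = struct.Struct("<5H3L2H")  # the 26 header bytes after the signature


def iter_unzipped(chunks):
    """Yield (member_name, data_piece) pairs from an iterable of archive bytes.

    Empty members produce no pairs in this sketch.
    """
    source = iter(chunks)
    buf = b""

    def fill():
        nonlocal buf
        try:
            buf += next(source)
        except StopIteration:
            raise ValueError("archive ended unexpectedly") from None

    def take(n):
        # Return exactly n bytes, pulling more input as needed.
        nonlocal buf
        while len(buf) < n:
            fill()
        out, buf = buf[:n], buf[n:]
        return out

    while True:
        if take(4) != LOCAL_SIG:
            return  # reached the central directory: no more members
        (_ver, flags, method, _time, _date, _crc,
         csize, _usize, name_len, extra_len) = LOCAL_REST.unpack(take(26))
        name = take(name_len).decode("utf-8" if flags & 0x800 else "cp437")
        take(extra_len)
        has_descriptor = bool(flags & 0x08)

        if method == 8:  # deflate
            inflater = zlib.decompressobj(-zlib.MAX_WBITS)
            while not inflater.eof:
                if not buf:
                    fill()
                data, buf = inflater.decompress(buf), b""
                if data:
                    yield name, data
            buf = inflater.unused_data  # bytes that follow this member
        elif method == 0 and not has_descriptor:  # stored, size known
            remaining = csize
            while remaining:
                if not buf:
                    fill()
                piece, buf = buf[:remaining], buf[remaining:]
                remaining -= len(piece)
                yield name, piece
        else:
            raise ValueError(f"unsupported member {name!r} (method {method})")

        if has_descriptor:
            # Optional signature, then CRC-32 and two 4-byte sizes.
            first = take(4)
            take(12 if first == DESCRIPTOR_SIG else 8)
```

Memory use is bounded by the size of the incoming chunks plus whatever each `decompress` call returns; a production version would also cap the output per call and check CRC-32 values.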

