Reads like Parquet.
cozip makes a ZIP behave like Parquet. Random access on every entry, queryable manifest, predicate pushdown, byte-range reads from S3 or GCS.
Still a ZIP. Now queryable. Filter the manifest first, then fetch only the bytes you need.
-- query the manifest before reading any file
SELECT name, "offset", size
FROM read_cozip('s3://bucket/dataset.zip')
WHERE name LIKE 'tile_%'
LIMIT 3;
Why cozip?
cozip adds a Parquet-powered table of contents to ordinary ZIP archives. Filter, query, and jump straight to the bytes you need with one byte-range request per entry.
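To make the query above concrete, here is a minimal standard-library Python sketch of what a manifest with `name`, `offset`, and `size` columns could hold and how a reader would use it. It is not cozip's implementation. The manifest is a plain list instead of Parquet, and `offset` is assumed to point at an entry's compressed data rather than its local header. The `method` column and the helpers `build_manifest`, `read_entry`, and `demo` are hypothetical. `fetch` slices an in-memory buffer where a real reader would issue a byte-range request.

```python
import io
import struct
import zipfile
import zlib

# ZIP local file header: 30 fixed bytes, then the file name and extra field.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def build_manifest(archive: bytes) -> list[dict]:
    """One row per entry: name, offset of its (compressed) data, size, method.

    HYPOTHETICAL schema; only name/offset/size appear in the cozip example query.
    """
    rows = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():  # entries come from the central directory
            fields = LOCAL_HEADER.unpack_from(archive, info.header_offset)
            name_len, extra_len = fields[9], fields[10]
            rows.append({
                "name": info.filename,
                "offset": info.header_offset + LOCAL_HEADER.size + name_len + extra_len,
                "size": info.compress_size,
                "method": info.compress_type,
            })
    return rows


def read_entry(fetch, row: dict) -> bytes:
    """A single range read covering exactly the entry's data, then inflate if needed."""
    raw = fetch(row["offset"], row["size"])
    if row["method"] == zipfile.ZIP_STORED:
        return raw
    if row["method"] == zipfile.ZIP_DEFLATED:
        return zlib.decompress(raw, -15)  # ZIP stores raw DEFLATE, no zlib header
    raise NotImplementedError(f"compression method {row['method']}")


def demo() -> dict:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.txt", "not a tile")
        for i in range(5):
            zf.writestr(f"tile_{i}.bin", bytes([i]) * 1000)
    archive = buf.getvalue()

    def fetch(offset: int, length: int) -> bytes:  # stand-in for an HTTP range GET
        return archive[offset:offset + length]

    manifest = build_manifest(archive)
    # Filter rows first (the WHERE + LIMIT step); no entry bytes are touched yet.
    hits = [r for r in manifest if r["name"].startswith("tile_")][:3]
    return {r["name"]: read_entry(fetch, r) for r in hits}
```

Only the three selected entries are fetched; `README.txt` and the tiles beyond the limit are never read. When mapping this back to SQL, note that `_` is itself a single-character wildcard in `LIKE`, so `'tile_%'` is slightly broader than the literal prefix test used here.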
cozip conforms to APPNOTE 6.3.10, so standard ZIP tools open it without changes: unzip, libarchive, your browser, your collaborators.
Writers run on vendored libzip behind a stable C ABI. Readers run on a DuckDB C++ extension that speaks SQL on your ZIP. Python, R, and Julia bindings share the same core.
How it works
A tiny binary index at byte zero points to a Parquet manifest. Load it, query it locally, then fetch just the bytes you need.
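The read path above can be sketched in Python as three range requests: index, manifest, entry. The on-disk layout here is entirely hypothetical. It uses a 16-byte header with an invented magic `CZX0`, a uint64 manifest offset, and a uint32 manifest length, prepended to a normal ZIP. A stored JSON entry stands in for the Parquet manifest. `write_archive`, `read_one`, and `RangeReader` are illustrative names, not cozip APIs. Because ZIP readers locate the central directory from the end of the file, Python's `zipfile` still opens the result despite the leading bytes.

```python
import io
import json
import struct
import zipfile

# HYPOTHETICAL index layout (not cozip's documented format):
# 4-byte magic, uint64 manifest offset, uint32 manifest length, little-endian.
INDEX = struct.Struct("<4sQI")
MAGIC = b"CZX0"
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # ZIP local file header, 30 bytes


def _data_offset(buf: bytes, header_offset: int) -> int:
    """Skip a local file header (30 bytes + name + extra) to the entry's data."""
    fields = LOCAL_HEADER.unpack_from(buf, header_offset)
    return header_offset + LOCAL_HEADER.size + fields[9] + fields[10]


def write_archive(files: dict[str, bytes]) -> bytes:
    """Stored entries, then a JSON manifest entry, all behind the index header."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        written = buf.getvalue()  # local headers + data written so far
        rows = [{"name": i.filename,
                 "offset": INDEX.size + _data_offset(written, i.header_offset),
                 "size": i.compress_size} for i in zf.infolist()]
        manifest = json.dumps(rows).encode()
        zf.writestr("_manifest.json", manifest)  # written last: earlier offsets unchanged
        manifest_info = zf.getinfo("_manifest.json")
    body = buf.getvalue()
    manifest_offset = INDEX.size + _data_offset(body, manifest_info.header_offset)
    return INDEX.pack(MAGIC, manifest_offset, len(manifest)) + body


class RangeReader:
    """Stand-in for object-storage range GETs; counts requests.

    Over HTTP each fetch would send `Range: bytes={offset}-{offset + length - 1}`.
    """

    def __init__(self, blob: bytes):
        self.blob, self.requests = blob, 0

    def fetch(self, offset: int, length: int) -> bytes:
        self.requests += 1
        return self.blob[offset:offset + length]


def read_one(reader: RangeReader, name: str) -> bytes:
    magic, m_off, m_len = INDEX.unpack(reader.fetch(0, INDEX.size))  # 1: index
    if magic != MAGIC:
        raise ValueError("not an indexed archive")
    rows = json.loads(reader.fetch(m_off, m_len))  # 2: manifest
    row = next((r for r in rows if r["name"] == name), None)  # query locally
    if row is None:
        raise KeyError(name)
    return reader.fetch(row["offset"], row["size"])  # 3: just the entry's bytes
```

Only the index, the manifest, and the selected entry are transferred. Storing the manifest uncompressed keeps its bytes directly addressable by offset and length.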
Install
Same archive on disk, four ways to write it. Each page is a focused quickstart.