Open a ZIP like a table.

Still a ZIP. Now queryable. Filter the manifest first, then fetch only the bytes you need.

dataset.zip
-- query the manifest before reading any file
SELECT name, offset, size
FROM read_cozip('s3://bucket/dataset.zip')
WHERE starts_with(name, 'tile_')  -- in LIKE, '_' would match any single character
LIMIT 3;
name               offset   size
tile_0001.parquet  128 KB   8.2 MB
tile_0002.parquet  8.5 MB   8.4 MB
tile_0003.parquet  16.9 MB  8.1 MB
O(1) reads per file
0 full ZIP downloads
100% ZIP compatible

Why cozip?

Ship archives that read like datasets.

cozip adds a Parquet-powered table of contents to ordinary ZIP archives. Query it in place, then fetch each file with a single byte-range request.

Reads like Parquet.

cozip makes a ZIP behave like Parquet: random access on every entry, a queryable manifest, predicate pushdown, and byte-range reads from S3 or GCS.
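
To make "byte-range reads" concrete, here is a minimal Python sketch that turns one manifest row into an HTTP Range header. It assumes the manifest's offset is the position of the entry's first byte and size is its length in bytes. The helper name range_request and the example URL are hypothetical and not part of cozip; cozip's own readers may do this differently.

```python
import urllib.request


def range_request(url, offset, size):
    """Build (but do not send) a GET request for bytes [offset, offset + size)."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    last = offset + size - 1
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})


# Hypothetical row: an entry that starts at byte 131072 and is 4096 bytes long.
req = range_request("https://example.com/dataset.zip", offset=131072, size=4096)
print(req.get_header("Range"))
```

A server that supports range requests answers with 206 Partial Content and only those bytes, so the rest of the archive is never downloaded.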

100% ZIP-spec compliant.

cozip conforms to APPNOTE 6.3.10. Every existing ZIP tool opens it without changes: unzip, libarchive, your browser, your collaborators.

Built on DuckDB and libzip.

Writers run on vendored libzip behind a stable C ABI. Readers run on a DuckDB extension for Python, R, Julia, and SQL — plus a standalone JavaScript package for the browser and Node.

How it works

Bootstrap, query, fetch.

A tiny binary index at byte zero points to a Parquet manifest. Query it in place, then fetch only the files you need.

How cozip reads work: first the cozip index at byte zero, then the Parquet manifest, then every file is a byte-range away.
  1. Bootstrap. Fetch the cozip index at byte zero. It points to the Parquet manifest.
  2. Query. Query the Parquet manifest in place. Only the chunks your query touches are fetched.
  3. Fetch. Pull only the files your query selected. One range request per file.
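
The query and fetch steps can be sketched with nothing but Python's standard library. This is an illustration of the idea, not cozip's implementation. It assumes entries are stored uncompressed, keeps a hypothetical manifest as a plain list of rows outside the archive, and uses a byte slice as a stand-in for a range request. The bootstrap index at byte zero is not modeled, because its layout is not described here.

```python
import io
import struct
import zipfile


def build_archive(files):
    """Write an ordinary ZIP with stored (uncompressed) entries into memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def build_manifest(archive):
    """Hypothetical manifest: one row per entry with the payload's offset and size."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            # A ZIP local file header is 30 fixed bytes followed by the file
            # name and extra field; their lengths sit at bytes 26 and 28.
            name_len, extra_len = struct.unpack_from(
                "<HH", archive, info.header_offset + 26
            )
            rows.append({
                "name": info.filename,
                "offset": info.header_offset + 30 + name_len + extra_len,
                "size": info.compress_size,
            })
    return rows


def fetch(archive, row):
    """Stand-in for one range request: slice exactly the bytes the row names."""
    return archive[row["offset"]:row["offset"] + row["size"]]


archive = build_archive({
    "tile_0001.bin": b"first tile",
    "tile_0002.bin": b"second tile",
    "README.txt": b"not a tile",
})

# Query: filter the manifest without touching any payload bytes.
manifest = build_manifest(archive)
selected = [row for row in manifest if row["name"].startswith("tile_")]

# Fetch: one slice (one range request, in the remote case) per selected file.
payloads = {row["name"]: fetch(archive, row) for row in selected}
print(payloads)
```

Because the archive is written by the standard zipfile module, any ZIP tool can still open it; the manifest only adds a faster path to the same bytes.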