Open a ZIP like a table.

Still a ZIP. Now queryable. Filter the manifest first, then fetch only the bytes you need.

dataset.zip
-- query the manifest before reading any file
SELECT name, offset, size
FROM read_cozip('s3://bucket/dataset.zip')
WHERE name LIKE 'tile_%'
LIMIT 3;
name               offset   size
tile_0001.parquet  128 KB   8.2 MB
tile_0002.parquet  8.5 MB   8.4 MB
tile_0003.parquet  16.9 MB  8.1 MB
2 reads to any file
0 full ZIP downloads
100% ZIP compatible
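Once the manifest rows are in hand, "fetch only the bytes you need" comes down to turning each row into an HTTP Range header. The sketch below is illustrative only: the `Entry` class, the `range_header` helper, and the byte values are hypothetical, and it assumes `offset` and `size` are plain byte counts pointing at the entry's data. The page does not specify the manifest's actual schema or units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One manifest row (hypothetical shape: name, byte offset, byte size)."""
    name: str
    offset: int
    size: int


def range_header(entry: Entry) -> dict:
    """Build an HTTP Range header covering exactly one entry.

    HTTP byte ranges are inclusive on both ends, so the last byte
    is offset + size - 1.
    """
    if entry.size <= 0:
        raise ValueError(f"{entry.name}: cannot request an empty range")
    last = entry.offset + entry.size - 1
    return {"Range": f"bytes={entry.offset}-{last}"}


# Illustrative rows only; these numbers are made up for the example.
manifest = [
    Entry("README.md", 4_096, 2_000),
    Entry("tile_0001.parquet", 131_072, 8_600_000),
    Entry("tile_0002.parquet", 8_912_896, 8_800_000),
]

# Filter first (same intent as the WHERE clause), then plan the requests.
wanted = [e for e in manifest if e.name.startswith("tile_")]
headers = {e.name: range_header(e) for e in wanted}
```

Any HTTP client that supports Range requests can send these headers; S3 and GCS both accept them on object reads.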

Why cozip?

Ship archives that read like datasets.

cozip adds a Parquet-powered table of contents to ordinary ZIP archives. Filter and query the manifest, then jump straight to the bytes you need with a single byte-range request per file.

Reads like Parquet.

cozip makes a ZIP behave like Parquet. Random access on every entry, queryable manifest, predicate pushdown, byte-range reads from S3 or GCS.
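To see why a ZIP entry can be one byte range away, here is a minimal sketch on an ordinary archive using only Python's standard `zipfile` module. It is not cozip's API. It assumes entries are stored uncompressed (`ZIP_STORED`), which this page does not state, and `build_archive` and `entry_data_range` are hypothetical helper names.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed-size part of a ZIP local file header


def build_archive(files: dict) -> bytes:
    """Write an in-memory ZIP whose entries are stored without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def entry_data_range(blob: bytes, info: zipfile.ZipInfo) -> tuple:
    """Return (start, end) of an entry's raw data within the archive bytes."""
    header = blob[info.header_offset:info.header_offset + LOCAL_HEADER_SIZE]
    if header[:4] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    # File-name length and extra-field length sit at bytes 26..29.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    start = info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
    return start, start + info.compress_size


files = {
    "tile_0001.bin": b"A" * 1000,
    "tile_0002.bin": b"B" * 2000,
    "readme.txt": b"hello",
}
blob = build_archive(files)

# The central directory already records where every entry starts.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    infos = {i.filename: i for i in zf.infolist()}

start, end = entry_data_range(blob, infos["tile_0002.bin"])
payload = blob[start:end]  # a plain slice stands in for a byte-range request
```

For a stored entry the slice is the file itself. A compressed entry would need the same range fetched and then inflated.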

100% ZIP-spec compliant.

cozip conforms to APPNOTE 6.3.10. Every existing ZIP tool opens it without changes. unzip, libarchive, your browser, your collaborators.

Built on DuckDB and libzip.

Writers run on vendored libzip behind a stable C ABI. Readers run on a DuckDB C++ extension that speaks SQL on your ZIP. Python, R, and Julia bindings share the same core.

How it works

Two reads, then any file.

A tiny binary index at byte zero points to a Parquet manifest. Load it, query it locally, then fetch just the bytes you need.

How cozip reads work: first the cozip index at byte zero, then the Parquet manifest, then every file is a byte-range away.
  1. First read: Fetch the cozip index at byte zero. It points to the Parquet manifest.
  2. Second read: Load the manifest into memory. Query it locally with DuckDB.
  3. Fetch: Pull only the byte ranges your query asked for.
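The three steps above can be sketched with a toy container that counts range requests. Everything in it is a stand-in chosen for illustration: the 16-byte index layout (manifest offset plus manifest length) is invented and is not cozip's format, JSON takes the place of the Parquet manifest, a list comprehension takes the place of DuckDB, and `RangeSource`, `build_toy`, `open_manifest`, and `fetch` are hypothetical names.

```python
import json
import struct

# Hypothetical index layout: manifest offset and length, two little-endian u64s.
INDEX = struct.Struct("<QQ")


class RangeSource:
    """Stand-in for a remote object that serves byte ranges and counts them."""

    def __init__(self, blob: bytes):
        self.blob = blob
        self.requests = 0

    def read(self, offset: int, size: int) -> bytes:
        self.requests += 1
        return self.blob[offset:offset + size]


def build_toy(files: dict) -> bytes:
    """Lay out: index at byte zero, then file payloads, then the manifest."""
    rows, payload, offset = [], b"", INDEX.size
    for name, data in files.items():
        rows.append({"name": name, "offset": offset, "size": len(data)})
        payload += data
        offset += len(data)
    manifest = json.dumps(rows).encode("utf-8")
    return INDEX.pack(offset, len(manifest)) + payload + manifest


def open_manifest(src: RangeSource) -> list:
    """Read 1: the index at byte zero. Read 2: the manifest it points to."""
    manifest_offset, manifest_len = INDEX.unpack(src.read(0, INDEX.size))
    return json.loads(src.read(manifest_offset, manifest_len))


def fetch(src: RangeSource, row: dict) -> bytes:
    """One byte-range request per file the query selected."""
    return src.read(row["offset"], row["size"])


files = {"tile_0001.bin": b"A" * 100, "tile_0002.bin": b"B" * 200, "notes.txt": b"hi"}
src = RangeSource(build_toy(files))

manifest = open_manifest(src)          # two reads so far
reads_after_open = src.requests
hits = [r for r in manifest if r["name"] == "tile_0002.bin"]  # local filter, no I/O
data = fetch(src, hits[0])             # third read: just the bytes asked for
```

The point of the layout is that the read count does not grow with archive size: the manifest costs two requests, and each selected file costs one more.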

Install

Pick your language.

Same archive on disk, four ways to write it. Each page is a focused quickstart.