Quickstart
cozip in Julia.
Write archives, read them back, query them straight from DuckDB. Copy, paste, run.
    using Pkg
    Pkg.Registry.add(Pkg.RegistrySpec(url="https://github.com/asterisk-labs/AsteriskRegistry"))
    Pkg.activate(; temp=true)
    Pkg.add("Cozip")

Write
Cozip.create takes a DataFrame (or Arrow.Table) with name and path columns. Anything else rides along as user-defined manifest columns.
    using Cozip
    using DataFrames

    tmp = mktempdir()
    paths = String[]
    for i in 0:2
        p = joinpath(tmp, "file_$(lpad(i, 4, '0')).bin")
        write(p, repeat("hello cozip\n", 1000))
        push!(paths, p)
    end

    tbl = DataFrame(
        name = basename.(paths),
        path = paths,
    )

    Cozip.create("dataset.zip", tbl)
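The manifest above carries only the two required columns. As a rough sketch of the "anything else rides along" behaviour, the table below adds two extra columns. The column names `split` and `n_bytes` are hypothetical, chosen only for illustration, and the sketch assumes plain String and Int columns are accepted as user-defined manifest columns. The name `size` is avoided on purpose, since the manifest queried later in this page already exposes a `size` column.

```julia
using Cozip
using DataFrames

# Write three small payload files into a scratch directory.
tmp = mktempdir()
paths = [joinpath(tmp, "sample_$(i).bin") for i in 1:3]
for p in paths
    write(p, rand(UInt8, 256))
end

tbl = DataFrame(
    name = basename.(paths),          # required column
    path = paths,                     # required column
    split = ["train", "train", "test"],  # hypothetical user-defined column
    n_bytes = filesize.(paths),          # hypothetical user-defined column
)

# Assumption: extra columns are stored in the manifest as-is.
Cozip.create(joinpath(tmp, "dataset_with_meta.zip"), tbl)
```

Keeping per-file metadata in the manifest means it can later be filtered without opening any of the archived files.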
Read
Cozip.read works on a local path or a remote URL with the same call. It returns the manifest as a DataFrame, with an injected cozip:gdal_vsi column ready for ArchGDAL or Rasters.jl.
    using Cozip

    # local archive
    df = Cozip.read("dataset.zip")

    # or remote, no full download, two range requests under the hood
    df = Cozip.read(
        "https://huggingface.co/datasets/Major-TOM/Core-VIIRS-Nighttime-Light/" *
        "resolve/main/2024/MAJORTOM-VIIRS-NTL_2024_median_000.zip"
    )

    # hand the cozip:gdal_vsi path straight to ArchGDAL
    using ArchGDAL
    dataset = ArchGDAL.read(df[1, "cozip:gdal_vsi"])
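Since the result is an ordinary DataFrame, the usual DataFrames.jl tools apply to it. The following is a minimal round-trip sketch, not a definitive recipe: it builds a throwaway archive, reads it back, and filters the manifest by file name. It assumes the `name` column comes back unchanged from what was written and that the cozip:gdal_vsi column is injected for local archives as described above. The file names are made up for the example.

```julia
using Cozip
using DataFrames

# Build a small throwaway archive so the sketch runs on its own.
tmp = mktempdir()
names_in = ["a.bin", "b.bin", "notes.txt"]
paths = joinpath.(tmp, names_in)
for p in paths
    write(p, "hello cozip\n")
end
archive = joinpath(tmp, "roundtrip.zip")
Cozip.create(archive, DataFrame(name = names_in, path = paths))

# Read the manifest back and list its columns.
df = Cozip.read(archive)
println(names(df))

# Keep only the .bin entries, then pull out their GDAL paths.
bins = filter(:name => endswith(".bin"), df)
vsi_paths = bins[!, "cozip:gdal_vsi"]
```

The same `filter` call works unchanged on a manifest read from a remote URL.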
Query with DuckDB
The cozip community extension reads the same archive over SQL. Pair it with the DuckDB.jl package, or run it straight from the DuckDB CLI.
    -- one-time install
    INSTALL cozip FROM community;
    LOAD cozip;

    -- hello world, first 10 entries of the manifest
    SELECT *
    FROM read_cozip('https://huggingface.co/datasets/Major-TOM/Core-VIIRS-Nighttime-Light/resolve/main/2024/MAJORTOM-VIIRS-NTL_2024_median_000.zip')
    LIMIT 10;

    -- raw manifest, without the injected /vsisubfile/ column
    SELECT *
    FROM read_cozip(
      'https://huggingface.co/datasets/Major-TOM/Core-VIIRS-Nighttime-Light/resolve/main/2024/MAJORTOM-VIIRS-NTL_2024_median_000.zip',
      gdal_vsi := false
    )
    LIMIT 10;

    -- filter the manifest, keep the /vsisubfile/ paths for the biggest tifs
    SELECT name, "cozip:gdal_vsi", size
    FROM read_cozip('https://huggingface.co/datasets/Major-TOM/Core-VIIRS-Nighttime-Light/resolve/main/2024/MAJORTOM-VIIRS-NTL_2024_median_000.zip')
    WHERE name LIKE '%.tif'
    ORDER BY size DESC
    LIMIT 5;
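To run the same SQL from Julia instead of the CLI, one option is the DuckDB.jl package through its DBInterface API. The sketch below assumes the cozip community extension installs and loads on your DuckDB build and that the machine has network access. It reuses the first query from above and collects the result into a DataFrame.

```julia
using DuckDB
using DataFrames

url = "https://huggingface.co/datasets/Major-TOM/Core-VIIRS-Nighttime-Light/" *
      "resolve/main/2024/MAJORTOM-VIIRS-NTL_2024_median_000.zip"

# In-memory database, nothing is persisted to disk.
con = DBInterface.connect(DuckDB.DB, ":memory:")

# One statement per call: install once, then load for this session.
DBInterface.execute(con, "INSTALL cozip FROM community")
DBInterface.execute(con, "LOAD cozip")

# First 10 manifest entries, materialised as a DataFrame.
result = DBInterface.execute(con, "SELECT * FROM read_cozip('$url') LIMIT 10")
df = DataFrame(result)
println(first(df, 5))

DBInterface.close!(con)
```

Interpolating the URL into the SQL string is fine here because it is a fixed literal. For user-supplied values, a prepared statement would be the safer choice.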