## The Problem: ESA WorldCover
This started with the ESA WorldCover dataset — a global 10m land cover product based on Sentinel-1 and Sentinel-2 data. The full dataset is 2,561 GeoTIFF files totaling 116 GB. I wanted to convert it into a single PMTiles archive for a web-based flight simulator. PMTiles stores all tiles in a single file served via HTTP range requests — no tile server required.
I started with rio-pmtiles, the recommended Python tool built on Rasterio and GDAL. After over a week of runtime, it still hadn’t finished. It consumed so much memory that I had to configure 512 GB of swap space just to keep it alive. So I started building a replacement in Go.
## The Result
Two weeks of on-and-off work later, geotiff2pmtiles converts the full dataset in under 16 hours. rio-pmtiles never finished — so geotiff2pmtiles isn’t just faster, it’s the difference between getting a result and not getting one. I had a working prototype after a few hours, then spent the rest adding support for multiple projections, source formats, and resampling methods.
Most of the processing time goes to zoom level 13 — the highest resolution. At a tile size of 512 px, that’s 45 million tiles. Here’s how each zoom level breaks down:
| Zoom | Tiles | Throughput | Duration |
|---|---|---|---|
| 13 | 45,572,096 | 1,069/s | 710m 14s |
| 12 | 11,395,072 | 1,852/s | 102m 33s |
| 11 | 2,850,816 | 1,667/s | 28m 30s |
| 10 | 712,704 | 1,504/s | 7m 53s |
| 9 | 178,688 | 1,369/s | 2m 10s |
| 8 | 44,800 | 1,235/s | 36s |
| 7 | 11,264 | 995/s | 11s |
| Total | 60,765,440 | — | 14h 12m 7s |

## Auto-Detection
When you point geotiff2pmtiles at a GeoTIFF, it examines the file structure and GDAL metadata to automatically configure the processing pipeline:
- Float GeoTIFFs (e.g. Copernicus DEM) → Terrarium encoding for elevation data
- Multi-band satellite data (Sentinel-2, PlanetScope, HLS, Google Earth Engine exports) → automatic band ordering (RGB from the right bands), linear or log rescaling from the detected value range
- Categorical rasters (e.g. ESA WorldCover) → mode resampling to preserve class values
The detection reads per-band `DESCRIPTION` and `SCALE`/`OFFSET` items from GDAL metadata (TIFF tag 42112), with fallback to dataset-level strings. For most common satellite products, you can just run `geotiff2pmtiles input.tif output.pmtiles` and get sensible output without specifying any flags.
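The GDAL_METADATA tag stores a small XML document of `<Item>` elements, optionally scoped to a band via a `sample` attribute. Below is a minimal sketch of pulling per-band descriptions out of that XML; the struct and function names are illustrative, not the tool's actual API:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// gdalMetadata mirrors the XML stored in TIFF tag 42112 (GDAL_METADATA):
// a flat list of <Item> elements, optionally scoped to a band via "sample".
type gdalMetadata struct {
	Items []struct {
		Name   string `xml:"name,attr"`
		Sample string `xml:"sample,attr"`
		Value  string `xml:",chardata"`
	} `xml:"Item"`
}

// bandDescriptions extracts per-band DESCRIPTION items keyed by band index.
func bandDescriptions(raw []byte) (map[string]string, error) {
	var md gdalMetadata
	if err := xml.Unmarshal(raw, &md); err != nil {
		return nil, err
	}
	out := map[string]string{}
	for _, it := range md.Items {
		if it.Name == "DESCRIPTION" && it.Sample != "" {
			out[it.Sample] = it.Value
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`<GDALMetadata>
  <Item name="DESCRIPTION" sample="0">B4 (red)</Item>
  <Item name="DESCRIPTION" sample="1">B3 (green)</Item>
  <Item name="SCALE" sample="0">0.0001</Item>
</GDALMetadata>`)
	desc, _ := bandDescriptions(raw)
	fmt.Println(desc["0"], "/", desc["1"]) // B4 (red) / B3 (green)
}
```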
## Architecture
geotiff2pmtiles processes zoom levels top-down:
- Max zoom — tiles are generated directly from Cloud Optimized GeoTIFF (COG) files via per-pixel inverse projection and resampling
- Lower zooms — each level is built by downsampling 2×2 groups from the level above, avoiding redundant source I/O
- Output — tiles flow into a two-pass PMTiles writer that handles deduplication, clustering, and directory construction
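The 2×2 downsampling step for categorical data can be sketched as follows. This is a simplified, single-band version: the real pipeline operates on decoded image tiles, and `downsampleMode` is an illustrative name.

```go
package main

import "fmt"

// downsampleMode halves a categorical raster by taking the most common
// value in each 2x2 block; this preserves class codes, unlike averaging.
func downsampleMode(src []byte, w, h int) []byte {
	dst := make([]byte, (w/2)*(h/2))
	for y := 0; y < h; y += 2 {
		for x := 0; x < w; x += 2 {
			counts := map[byte]int{}
			for dy := 0; dy < 2; dy++ {
				for dx := 0; dx < 2; dx++ {
					counts[src[(y+dy)*w+x+dx]]++
				}
			}
			best, bestN := src[y*w+x], 0
			for v, n := range counts {
				if n > bestN {
					best, bestN = v, n
				}
			}
			dst[(y/2)*(w/2)+x/2] = best
		}
	}
	return dst
}

func main() {
	// 4x4 raster of land-cover class codes; each 2x2 block collapses
	// to its majority class.
	src := []byte{
		10, 10, 20, 30,
		10, 40, 20, 20,
		50, 50, 60, 60,
		50, 50, 60, 70,
	}
	fmt.Println(downsampleMode(src, 4, 4)) // [10 20 50 60]
}
```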
## Memory-Mapped I/O
The first constraint: handle 116 GB of input without loading it all into memory. The tool uses mmap to access TIFF tiles on demand — the OS manages paging, so only the portions currently being accessed reside in physical memory. Multiple workers read from the same mmap’d file concurrently via pread(2) without locking. Windows is not supported due to the reliance on POSIX mmap semantics.
## COG Tile Cache
Compressed TIFF tiles (JPEG, LZW, Deflate) are decoded on first access and cached as image.Image objects in a shared LRU cache, sized at concurrency × 128 entries. Cache keys are integer triplets (reader_id, ifd_level, tile_index) — switching from string keys eliminated 18% of CPU time.
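The key change can be illustrated with a comparable Go struct; `tileKey` is an illustrative name, and the LRU machinery is elided here:

```go
package main

import "fmt"

// tileKey is a comparable fixed-size struct, so it can index a map
// directly. String keys (e.g. fmt.Sprintf("%d/%d/%d", ...)) pay for
// allocation, formatting, and byte-wise string hashing on every lookup;
// a struct of three ints hashes in a few instructions with no allocation.
type tileKey struct {
	reader, level, index int
}

func main() {
	cache := map[tileKey][]byte{}
	k := tileKey{reader: 2, level: 0, index: 417}
	cache[k] = []byte("decoded tile")
	fmt.Println(string(cache[k])) // "decoded tile"
}
```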
Tiles are grouped into Hilbert-contiguous batches of 32 for processing. The Hilbert curve ordering keeps the 2D working set compact, so adjacent batches read from nearby source regions — improving cache hit rates compared to row-major or random ordering.
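The standard iterative formulation of the Hilbert index is enough to show the ordering; sorting tile coordinates by it keeps spatial neighbors adjacent in the processing queue (this sketch is not the tool's actual implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// xy2d maps (x, y) on an n×n grid (n a power of two) to its distance
// along the Hilbert curve.
func xy2d(n, x, y int) int {
	d := 0
	for s := n / 2; s > 0; s /= 2 {
		rx, ry := 0, 0
		if x&s > 0 {
			rx = 1
		}
		if y&s > 0 {
			ry = 1
		}
		d += s * s * ((3 * rx) ^ ry)
		// Rotate the quadrant so the curve stays continuous.
		if ry == 0 {
			if rx == 1 {
				x = s - 1 - x
				y = s - 1 - y
			}
			x, y = y, x
		}
	}
	return d
}

func main() {
	// Sorting tiles by Hilbert index keeps spatial neighbors adjacent,
	// so consecutive batches touch nearby source regions.
	tiles := [][2]int{{1, 0}, {0, 0}, {1, 1}, {0, 1}}
	sort.Slice(tiles, func(i, j int) bool {
		return xy2d(2, tiles[i][0], tiles[i][1]) < xy2d(2, tiles[j][0], tiles[j][1])
	})
	fmt.Println(tiles) // [[0 0] [0 1] [1 1] [1 0]]
}
```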
## Pyramid Downsampling
Lower zoom levels are built by combining 2×2 child tiles rather than re-reading source COGs. For elevation data (Terrarium encoding), the downsampler decodes RGB back to elevation values, averages valid samples, and re-encodes. Supported modes: bicubic (default), Lanczos-3, bilinear, nearest-neighbor, and mode (most common value — ideal for categorical data).
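Terrarium encoding packs elevation into RGB as elevation = (R·256 + G + B/256) − 32768. A sketch of the decode/average/re-encode cycle for one 2×2 group follows; the function names are illustrative:

```go
package main

import (
	"fmt"
	"math"
)

// Terrarium encoding packs elevation in meters into an RGB triplet:
//   elevation = (R*256 + G + B/256) - 32768
func terrariumEncode(elev float64) (r, g, b uint8) {
	v := elev + 32768
	r = uint8(math.Floor(v / 256))
	g = uint8(int(math.Floor(v)) % 256)
	b = uint8(math.Floor((v - math.Floor(v)) * 256))
	return
}

func terrariumDecode(r, g, b uint8) float64 {
	return float64(r)*256 + float64(g) + float64(b)/256 - 32768
}

func main() {
	// Downsampling must average decoded elevations and re-encode;
	// averaging the RGB bytes directly would corrupt the values.
	children := []float64{100, 104, 98, 102}
	sum := 0.0
	for _, e := range children {
		sum += e
	}
	r, g, b := terrariumEncode(sum / float64(len(children)))
	fmt.Println(terrariumDecode(r, g, b)) // 101
}
```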
## Disk-Backed Tile Store
Encoded tiles (PNG, WebP, JPEG) are held in a hybrid memory/disk store. A dedicated I/O goroutine spills tiles to a temp file when memory usage approaches ~90% of system RAM. Uniform tiles — where every pixel is the same color — are stored as 4-byte objects and never spilled, eliminating a substantial share of disk I/O for datasets with large uniform regions.
## Two-Pass PMTiles Writer
Pass 1 — Collection: Tiles are appended to a temp file and hashed with FNV-64a for deduplication. For datasets with many uniform tiles, this eliminates 30–50% of tile writes.
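A sketch of the dedup pass using Go's standard `hash/fnv` package; collision handling and the actual temp-file format are elided:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// tileHash computes the FNV-64a digest used as a dedup key: tiles
// with identical bytes share one entry in the archive.
func tileHash(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

func main() {
	seen := map[uint64]int64{} // hash -> offset of first copy
	tiles := [][]byte{[]byte("ocean"), []byte("land"), []byte("ocean")}
	var offset int64
	written := 0
	for _, t := range tiles {
		if _, dup := seen[tileHash(t)]; !dup {
			seen[tileHash(t)] = offset // first occurrence: append to temp file
			offset += int64(len(t))
			written++
		}
	}
	fmt.Println("unique tiles written:", written) // 2 of 3
}
```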
Pass 2 — Finalization: Entries are sorted by Hilbert tile-ID for clustering (enabling efficient HTTP range-request access), then rewritten in sorted order with gzip-compressed metadata. The root directory is constrained to 16 KiB to fit in the initial HTTP fetch that PMTiles clients make.
## Optimizations
I profiled the tool in a loop — run benchmarks, analyze the profile, identify the bottleneck, apply a fix, repeat. Here’s what that process found:
| Optimization | CPU before | CPU after | Impact |
|---|---|---|---|
| Integer cache keys | 101s (18%) | < 1s | Eliminated string hashing |
| Type-specific pixel reads | 57s (10%) | 0s | Eliminated interface boxing |
| Remove madvise | 48s (9%) | 0s | Eliminated redundant syscalls |
| Per-tile source/level prep | 37s (6%) | < 1s | Eliminated per-pixel recomputation |
| Batch work queue | — | — | Improved parallelism from 3.2× to 3.3× |
| Total | 594s CPU, 188s wall | 248s CPU, 81s wall | 2.3× faster |
Beyond the profiling loop, several resampling-specific optimizations made a difference:
- LUT acceleration — runtime `sin()` and `pow()` calls are replaced with precomputed lookup tables for resampling kernels and gamma correction. For Lanczos-3, each output pixel evaluates 36 kernel positions — a 1,024-entry LUT eliminates those `sin()` calls entirely. Bicubic kernels and gamma correction use similar LUTs.
- Lon/lat precomputation — since map tiles are axis-aligned, longitude varies only along x and latitude only along y. Precomputing per-axis arrays drops projection cost from O(n²) to O(n) per tile.
- Fast paths by image type — specialized implementations for YCbCr, RGBA, and Gray avoid the generic `image.Image` interface dispatch. For Lanczos-3, a single-tile fast path skips per-pixel cache lookups when all 36 kernel positions fall within one source tile.
Where the remaining CPU time goes: page faults for reading mmap’d COG data (19%), writing output pixels (16%), and YCbCr→RGB color conversion (14%). These represent the actual cost of decoding JPEG tiles and compositing output images — not overhead.
## Source Code
The project is open source: pspoerri/geotiff2pmtiles. Pre-built binaries for Linux and macOS (amd64/arm64) are on the releases page.
If you’re working with large raster datasets and PMTiles, give it a try — I’d welcome feedback and contributions.
