The Starting Point: ESA WorldCover#
This project started with the ESA WorldCover dataset - a global land cover product at 10 m resolution based on Sentinel-1 and Sentinel-2 data. The full dataset consists of 2,561 GeoTIFF files totaling 116 GB. I wanted to convert it into a single PMTiles archive for use in a web-based flight simulator.
PMTiles is a single-file tile archive format designed for the cloud. Instead of hosting millions of individual tile files or running a tile server, you store one .pmtiles file on any static hosting (e.g. S3). Clients fetch tiles via HTTP range requests. There is no server-side logic required, which makes deployment trivially simple.
I started with the recommended tool, rio-pmtiles, which is Python-based and built on top of Rasterio and GDAL. I let it run for over a week - and it never finished. Worse, the process consumed so much memory that I had to configure 512 GB of swap space just to keep it alive. Frustrated, I decided to use Cursor to build a new tool in Go.
geotiff2pmtiles#
Two weeks of on-and-off work and $100 in Opus 4.6 tokens later, I have geotiff2pmtiles, which does the whole conversion in 16 hours. I already had a working solution after a couple of hours, but I spent some extra time fleshing out the project. It supports multiple projections, several GeoTIFF source types, and a slew of resampling methods.
Most of the processing time for the ESA WorldCover dataset goes into zoom level 13 - the highest available resolution. With a tile size of 512 px, an impressive 45 million tiles are computed at that level alone.
Here’s how long each zoom level takes to process:
| Zoom | Tiles | Throughput | Duration |
|---|---|---|---|
| 13 | 45,572,096 | 1,069/s | 710m 14s |
| 12 | 11,395,072 | 1,852/s | 102m 33s |
| 11 | 2,850,816 | 1,667/s | 28m 30s |
| 10 | 712,704 | 1,504/s | 7m 53s |
| 9 | 178,688 | 1,369/s | 2m 10s |
| 8 | 44,800 | 1,235/s | 36s |
| 7 | 11,264 | 995/s | 11s |
| Total | 60,765,440 | - | 15h 49m 42s |
A tool that got the job done was ready within the first week, but I decided to add more features and improve the performance.

Architecture#
geotiff2pmtiles reads Cloud Optimized GeoTIFF (COG) tiles on demand from disk, generates map tiles in parallel, and writes them into a PMTiles archive. The pipeline flows from the maximum zoom level downward: at the highest zoom, tiles are generated directly from the source COG files via per-pixel inverse projection and resampling. Each subsequent lower zoom level is built by downsampling 2x2 groups of tiles from the level above, avoiding redundant I/O. All generated tiles are collected into a two-pass PMTiles writer that handles deduplication, clustering, and directory construction.
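The top-down flow can be sketched in a few lines of Go. This is a simplified illustration of the pipeline structure described above, not the tool's actual code - `renderFromSource` and `downsample` are placeholders for the per-pixel reprojection and 2x2 downsampling stages:

```go
package main

import "fmt"

type tileID struct{ z, x, y int }

// Placeholders for the real stages: per-pixel reprojection from the COGs,
// and combining four children into one parent tile.
func renderFromSource(t tileID) string     { return fmt.Sprintf("render %v", t) }
func downsample(children [4]string) string { return "downsampled" }

// buildPyramid renders the max zoom from the source, then derives each
// lower zoom from the level above - no source I/O below the top level.
func buildPyramid(maxZoom, minZoom, dim int) map[tileID]string {
	tiles := map[tileID]string{}
	for x := 0; x < dim; x++ {
		for y := 0; y < dim; y++ {
			t := tileID{maxZoom, x, y}
			tiles[t] = renderFromSource(t)
		}
	}
	for z := maxZoom - 1; z >= minZoom; z-- {
		dim /= 2
		for x := 0; x < dim; x++ {
			for y := 0; y < dim; y++ {
				c := [4]string{
					tiles[tileID{z + 1, 2 * x, 2 * y}],
					tiles[tileID{z + 1, 2*x + 1, 2 * y}],
					tiles[tileID{z + 1, 2 * x, 2*y + 1}],
					tiles[tileID{z + 1, 2*x + 1, 2*y + 1}],
				}
				tiles[tileID{z, x, y}] = downsample(c)
			}
		}
	}
	return tiles
}

func main() {
	// A 4x4 top level yields 16 + 4 + 1 = 21 tiles across three zooms.
	fmt.Println(len(buildPyramid(2, 0, 4)))
}
```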
Memory-Mapped I/O#
Instead of loading entire GeoTIFF files into memory, the tool uses memory-mapped I/O (mmap) to access TIFF tiles on demand. The operating system manages paging - only the portions of the file currently being accessed reside in physical memory. This keeps the memory footprint bounded regardless of input file size. Multiple workers can read from the same mmap’d file concurrently without locking, since reads from a read-only mapping never conflict.
This does mean that Windows is not supported - the tool relies on POSIX mmap semantics and runs on Linux and macOS only.
COG Tile Cache#
Raw TIFF tiles (JPEG, LZW, or Deflate compressed) are decoded on first access and cached as image.Image objects in a shared LRU cache. The cache is sized at concurrency × 128 entries - large enough that Hilbert-batched workers rarely evict tiles their neighbors still need, but small enough to avoid excessive memory use. Cache keys are integer triplets (reader_id, ifd_level, tile_index) - an early optimization that eliminated string-hashing overhead that was consuming 18% of CPU time.
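The shape of such a cache is straightforward. The sketch below shows an LRU keyed by a comparable struct - three machine words that Go's map hashes directly, with none of the formatting and string-hashing cost of a `"reader:level:tile"` key. Locking is omitted for brevity; the tool's shared cache is of course synchronized:

```go
package main

import (
	"container/list"
	"fmt"
)

// tileKey is the integer triplet: no string allocation or hashing per lookup.
type tileKey struct {
	reader, level, tile int
}

type entry struct {
	key tileKey
	val any // the decoded image.Image in the real tool
}

type lru struct {
	cap   int
	order *list.List // front = most recently used
	items map[tileKey]*list.Element
}

func newLRU(cap int) *lru {
	return &lru{cap: cap, order: list.New(), items: map[tileKey]*list.Element{}}
}

func (c *lru) get(k tileKey) (any, bool) {
	if e, ok := c.items[k]; ok {
		c.order.MoveToFront(e)
		return e.Value.(*entry).val, true
	}
	return nil, false
}

func (c *lru) put(k tileKey, v any) {
	if e, ok := c.items[k]; ok {
		e.Value.(*entry).val = v
		c.order.MoveToFront(e)
		return
	}
	c.items[k] = c.order.PushFront(&entry{k, v})
	if c.order.Len() > c.cap {
		old := c.order.Back() // evict least recently used
		c.order.Remove(old)
		delete(c.items, old.Value.(*entry).key)
	}
}

func main() {
	c := newLRU(2)
	c.put(tileKey{0, 0, 1}, "A")
	c.put(tileKey{0, 0, 2}, "B")
	c.get(tileKey{0, 0, 1})      // touch A so it survives the next eviction
	c.put(tileKey{0, 0, 3}, "C") // evicts B
	_, ok := c.get(tileKey{0, 0, 2})
	fmt.Println(ok) // false
}
```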
For non-COG (strip-based) TIFFs, strips are promoted to virtual tiles at open time by merging small strips into blocks of at least 256 rows. This ensures that resampling kernels (up to 6x6 for Lanczos-3) never span more than a 2x2 tile grid.
Parallel Tile Generation#
Tiles are grouped into Hilbert-contiguous batches of 32 and distributed across a configurable worker pool. This batch size balances spatial locality (good cache reuse) with load distribution (minimal tail imbalance). Each worker performs per-pixel inverse projection and resampling from the cached COG tiles.
The Hilbert curve ordering is important: it keeps the 2D working set compact, so workers processing adjacent batches read from nearby regions of the source data. This dramatically improves COG tile cache hit rates compared to row-major or random ordering.
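The mapping from tile coordinates to Hilbert index is the classic iterative algorithm; the batching helper around it is illustrative of the scheme described above rather than the tool's exact code:

```go
package main

import (
	"fmt"
	"sort"
)

// xy2d maps a coordinate on an n×n grid (n a power of two) to its index
// along the Hilbert curve - the standard iterative formulation.
func xy2d(n, x, y int) int {
	d := 0
	for s := n / 2; s > 0; s /= 2 {
		rx, ry := 0, 0
		if x&s > 0 {
			rx = 1
		}
		if y&s > 0 {
			ry = 1
		}
		d += s * s * ((3 * rx) ^ ry)
		// Rotate the quadrant so the curve stays continuous.
		if ry == 0 {
			if rx == 1 {
				x = s - 1 - x
				y = s - 1 - y
			}
			x, y = y, x
		}
	}
	return d
}

type tile struct{ x, y int }

// batches sorts tiles by Hilbert index and slices them into fixed-size
// groups, so each worker touches a spatially compact region.
func batches(tiles []tile, n, batchSize int) [][]tile {
	sort.Slice(tiles, func(i, j int) bool {
		return xy2d(n, tiles[i].x, tiles[i].y) < xy2d(n, tiles[j].x, tiles[j].y)
	})
	var out [][]tile
	for i := 0; i < len(tiles); i += batchSize {
		end := i + batchSize
		if end > len(tiles) {
			end = len(tiles)
		}
		out = append(out, tiles[i:end])
	}
	return out
}

func main() {
	var tiles []tile
	for x := 0; x < 4; x++ {
		for y := 0; y < 4; y++ {
			tiles = append(tiles, tile{x, y})
		}
	}
	for _, b := range batches(tiles, 4, 4) {
		fmt.Println(b) // each batch is one contiguous curve segment
	}
}
```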
Pyramid Downsampling#
Lower zoom levels are generated by combining 2x2 child tiles into parent tiles, rather than re-reading from the source COGs. This avoids redundant I/O and resampling work. For elevation data (Terrarium encoding), the downsampler decodes RGB back to elevation values, averages valid samples, and re-encodes - ensuring correct interpolation even across tile boundaries.
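The elevation round-trip is the key detail here. Terrarium encoding (the Mapzen convention) stores elevation as elevation = (R·256 + G + B/256) − 32768, so averaging the RGB channels directly would mix high and low bytes and corrupt the result. A minimal sketch, with nodata masking omitted:

```go
package main

import "fmt"

// decodeElev recovers elevation (meters) from a Terrarium RGB pixel.
func decodeElev(r, g, b uint8) float64 {
	return float64(r)*256 + float64(g) + float64(b)/256 - 32768
}

// encodeElev is the inverse: shift so sea level needs no sign bit, then
// split into a high byte, low byte, and fractional byte.
func encodeElev(elev float64) (r, g, b uint8) {
	v := elev + 32768
	return uint8(int(v/256) & 0xff), uint8(int(v) & 0xff), uint8(int(v*256) & 0xff)
}

// downsample2x2 averages four child pixels in elevation space and
// re-encodes the mean, rather than averaging channels.
func downsample2x2(px [4][3]uint8) (uint8, uint8, uint8) {
	var sum float64
	for _, p := range px {
		sum += decodeElev(p[0], p[1], p[2])
	}
	return encodeElev(sum / 4)
}

func main() {
	var px [4][3]uint8
	for i, e := range []float64{100, 200, 300, 400} {
		px[i][0], px[i][1], px[i][2] = encodeElev(e)
	}
	r, g, b := downsample2x2(px)
	fmt.Println(decodeElev(r, g, b)) // 250
}
```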
A fast path handles single-channel (grayscale) data, which is about 15x faster than the general RGBA path since it can skip color-space conversions.
Different downsampling modes are supported:
- Bicubic (4x4 Catmull-Rom) - default, smooth and sharp
- Lanczos-3 (6x6) - highest quality, best for photographic data
- Bilinear - fast, reasonable quality
- Nearest-neighbor - no interpolation, preserves exact values
- Mode (most common value) - ideal for categorical/classified data where interpolation is meaningless
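Mode resampling is the simplest of these to show. For a 2x2 parent pixel it reduces to picking the most common of four child values (ties go to whichever value reaches the maximum count first); the WorldCover class codes in the example - 10 for tree cover, 80 for water - follow the dataset's legend:

```go
package main

import "fmt"

// mode4 returns the most common of four child values. Any interpolation
// here would invent class codes that don't exist in the legend.
func mode4(v [4]uint8) uint8 {
	counts := map[uint8]int{}
	best := v[0]
	for _, x := range v {
		counts[x]++
		if counts[x] > counts[best] {
			best = x
		}
	}
	return best
}

func main() {
	// Three tree-cover pixels (10) outvote one water pixel (80).
	fmt.Println(mode4([4]uint8{10, 10, 80, 10})) // 10
}
```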
Disk-Backed Tile Store#
Encoded tiles (PNG, WebP, JPEG bytes - typically 5-25x smaller than raw pixels) are held in a hybrid memory/disk store. A dedicated I/O goroutine continuously spills encoded tiles to a temp file when memory usage approaches the configured limit (~90% of system RAM by default). Blocked workers wait on a sync.Cond until the I/O goroutine frees enough memory.
Uniform tiles - tiles where every pixel is the same color - are stored as 4-byte objects and never spilled to disk. For datasets like ESA WorldCover with large uniform regions (oceans, deserts), this optimization alone eliminates a significant fraction of disk I/O.
The temp file is published via an atomic pointer after the first write, allowing concurrent lock-free reads via pread(2) from worker goroutines that need to retrieve tiles for downsampling.
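The worker/spiller handshake can be condensed to a `sync.Cond` around a byte budget. In this sketch the temp file is reduced to a counter and all names are illustrative, not the tool's actual API; the point is the blocking pattern - `put` waits under the lock until `spill` frees memory and broadcasts:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type store struct {
	mu      sync.Mutex
	cond    *sync.Cond
	inMem   int // bytes currently held in memory
	spilled int // bytes moved to the "temp file"
	budget  int
}

func newStore(budget int) *store {
	s := &store{budget: budget}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// put blocks until the tile fits under the memory budget.
func (s *store) put(n int) {
	s.mu.Lock()
	for s.inMem+n > s.budget {
		s.cond.Wait() // woken by spill()
	}
	s.inMem += n
	s.mu.Unlock()
}

// spill moves everything currently in memory to disk and wakes waiters.
func (s *store) spill() {
	s.mu.Lock()
	s.spilled += s.inMem
	s.inMem = 0
	s.cond.Broadcast()
	s.mu.Unlock()
}

func (s *store) total() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.inMem + s.spilled
}

func main() {
	s := newStore(100)
	stop := make(chan struct{})
	go func() { // the dedicated spiller goroutine
		for {
			select {
			case <-stop:
				return
			default:
				time.Sleep(time.Millisecond)
				s.spill()
			}
		}
	}()
	for i := 0; i < 10; i++ {
		s.put(30) // blocks whenever the budget is nearly exhausted
	}
	close(stop)
	fmt.Println(s.total()) // 300: every byte is in memory or spilled
}
```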
Two-Pass PMTiles Writer#
The PMTiles v3 archive is assembled in two passes:
Pass 1 - Collection: Tiles are appended to a temp file as they arrive. Each tile is hashed with FNV-64a; if a tile with the same hash and length already exists, the writer reuses the existing offset instead of writing duplicate data. For datasets with many uniform tiles, this deduplication can eliminate 30-50% of tile writes.
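The dedup bookkeeping is small. The sketch below uses the standard library's `hash/fnv` and keys on the hash alone for brevity (the real writer keys on hash plus length, as noted above); the temp file is again reduced to an offset counter:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type dedupWriter struct {
	seen   map[uint64]int64 // FNV-64a hash -> offset of first occurrence
	offset int64            // current end of the temp file
	dupes  int
}

func fnv64a(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

// add returns the offset at which the tile's bytes live, appending only
// when this content has not been seen before.
func (w *dedupWriter) add(tile []byte) int64 {
	h := fnv64a(tile)
	if off, ok := w.seen[h]; ok {
		w.dupes++
		return off // reuse the existing data, nothing written
	}
	off := w.offset
	w.seen[h] = off
	w.offset += int64(len(tile)) // pretend-append to the temp file
	return off
}

func main() {
	w := &dedupWriter{seen: map[uint64]int64{}}
	ocean := []byte{0, 0, 80, 255} // a uniform-tile payload
	land := []byte{10, 120, 30, 255}
	a := w.add(ocean)
	w.add(land)
	b := w.add(ocean) // duplicate content
	fmt.Println(a == b, w.dupes) // true 1
}
```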
Pass 2 - Finalization: Entries are sorted by Hilbert tile-ID for clustering (this enables efficient HTTP range-request access in readers). Tile data is rewritten in sorted order to a new temp file, the directory structure is built, metadata is gzip-compressed, and the final archive is assembled: [Header] [Root Directory] [Metadata] [Leaf Directories] [Tile Data]. The root directory is constrained to 16 KiB to fit in the initial HTTP fetch that PMTiles clients make.
Resampling#
The resampling implementation went through several iterations to eliminate performance bottlenecks.
LUT-Accelerated Kernels#
Both the Lanczos-3 and bicubic kernels use precomputed lookup tables (1,024 entries each) instead of evaluating the kernel functions at runtime. The Lanczos-3 kernel involves 3·sin(πx)·sin(πx/3)/(π²x²) - two sin() calls per evaluation - which was a measurable CPU cost. The LUT replaces these with a table lookup and linear interpolation between entries.
Similarly, gamma correction uses a 4,096-entry LUT that maps [0, 4095] to uint8 values via power-law encoding. The step size (~0.06) is below the uint8 quantization threshold, so there’s no visible quality loss.
Lon/Lat Precomputation#
The naive approach computes longitude and latitude for every pixel independently - O(n²) trigonometric operations per tile. Since map tiles are axis-aligned, longitude only varies along x and latitude only varies along y. By precomputing arrays of tileSize longitude and latitude values, the projection cost drops to O(n) per tile. The arrays are recycled via sync.Pool to avoid allocation overhead.
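A sketch of that precomputation for Web Mercator, with function and pool names of my own choosing: longitude is linear in the pixel column, latitude comes from the inverse Mercator formula applied per row, and `sync.Pool` recycles the two slices between tiles:

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

const tileSize = 512

var lonLatPool = sync.Pool{
	New: func() any { return make([]float64, tileSize) },
}

// tileLonLat returns per-column longitudes and per-row latitudes (degrees,
// at pixel centers) for Web Mercator tile (z, x, y): O(n) trig instead of
// O(n²) for the full pixel grid.
func tileLonLat(z, x, y int) (lons, lats []float64) {
	n := float64(int(1) << z) // tiles per axis at this zoom
	lons = lonLatPool.Get().([]float64)
	lats = lonLatPool.Get().([]float64)
	for i := 0; i < tileSize; i++ {
		fx := float64(x) + (float64(i)+0.5)/tileSize
		lons[i] = fx/n*360 - 180
		fy := float64(y) + (float64(i)+0.5)/tileSize
		lats[i] = math.Atan(math.Sinh(math.Pi*(1-2*fy/n))) * 180 / math.Pi
	}
	return lons, lats
}

func main() {
	lons, lats := tileLonLat(0, 0, 0)
	fmt.Printf("%.2f %.2f\n", lons[0], lats[0]) // top-left of the world tile
	lonLatPool.Put(lons)                        // recycle for the next tile
	lonLatPool.Put(lats)
}
```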
Fast Paths by Image Type#
Resampling kernels have specialized implementations for each source image type (YCbCr, NYCBCRA, RGBA, Gray). Instead of going through the generic image.Image interface - which involves interface method dispatch and boxing per pixel - the specialized paths type-assert once and then access pixel data directly via struct fields. This eliminated another 10% of CPU time that was spent on interface boxing.
For Lanczos-3, when all 36 kernel positions (6x6) fall within a single source tile - the common case - a single-tile fast path avoids per-pixel tile cache lookups entirely.
Profiling-Driven Optimizations#
To reach the desired performance, I let Cursor run and profile the tool in a loop, deciding on its own what to focus on and improve next. Among other things, I ended up with this list of optimizations:
| Optimization | CPU before | CPU after | Impact |
|---|---|---|---|
| Integer cache keys | 101s (18%) | < 1s | Eliminated string hashing |
| Type-specific pixel reads | 57s (10%) | 0s | Eliminated interface boxing |
| Remove madvise | 48s (9%) | 0s | Eliminated redundant syscalls |
| Per-tile source/level prep | 37s (6%) | < 1s | Eliminated per-pixel recomputation |
| Batch work queue | - | - | Improved parallelism from 3.2x to 3.3x |
| Total | 594s CPU, 188s wall | 248s CPU, 81s wall | 2.3x faster |
I also got this nice summary of where the remaining CPU time is spent:
> The remaining CPU time is dominated by fundamental work: page faults for reading mmap’d COG data (19%), writing output pixels (16%), and YCbCr→RGB color conversion (14%). These represent the actual cost of decoding JPEG tiles and compositing output images - not overhead.
I should probably re-check the performance since I added a lot of features which I haven’t benchmarked yet.
Auto-Detection#
One feature I’m particularly happy with is the auto-detection system. When you point geotiff2pmtiles at a GeoTIFF, it examines the file structure and GDAL metadata to automatically configure the right processing pipeline:
- Float GeoTIFFs (e.g. Copernicus DEM) → Terrarium encoding for elevation data
- Multi-band satellite data (Sentinel-2, PlanetScope, HLS, Google Earth Engine exports) → automatic band ordering (RGB from the right bands), linear or log rescaling from the detected value range
- Categorical rasters (e.g. ESA WorldCover land cover) → mode resampling to preserve class values
The detection reads per-band DESCRIPTION and SCALE/OFFSET items from GDAL metadata (TIFF tag 42112), with fallback to dataset-level strings like "bands". For most common satellite products, you can just run geotiff2pmtiles input.tif output.pmtiles and get sensible output without specifying any flags.
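The decision order can be sketched as a small classifier. Everything here - the struct fields, the heuristics, the mode names - is purely illustrative of the detection logic described above, not the tool's actual API:

```go
package main

import "fmt"

// rasterInfo is a hypothetical summary of what detection reads from the
// file structure and GDAL metadata.
type rasterInfo struct {
	sampleFormat string // "float" or "int", from the TIFF SampleFormat tag
	bands        int
	categorical  bool // e.g. class descriptions found in GDAL metadata
}

func detectPipeline(r rasterInfo) string {
	switch {
	case r.sampleFormat == "float":
		return "terrarium" // elevation-style data
	case r.categorical:
		return "mode-resampling" // preserve class values
	case r.bands >= 3:
		return "rgb-rescale" // pick RGB bands, rescale the value range
	default:
		return "grayscale"
	}
}

func main() {
	dem := rasterInfo{sampleFormat: "float", bands: 1}
	worldcover := rasterInfo{sampleFormat: "int", bands: 1, categorical: true}
	sentinel2 := rasterInfo{sampleFormat: "int", bands: 13}
	fmt.Println(detectPipeline(dem), detectPipeline(worldcover), detectPipeline(sentinel2))
}
```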
pmtransform#
The project includes a companion tool, pmtransform, for modifying existing PMTiles archives without going back to the source GeoTIFFs. It operates in three modes:
- Passthrough - raw byte copy, no re-encoding. Used when only metadata changes.
- Re-encode - decode and re-encode tiles with a new format or quality setting. Useful for converting a JPEG tileset to WebP, or applying a fill color to empty tiles.
- Rebuild pyramid - full pyramid reconstruction from the max-zoom tiles. Required when changing zoom levels or resampling methods.
Since the PMTiles v3 header doesn’t store tile pixel dimensions, pmtransform decodes a sample tile at max zoom to discover the tile size before processing. Transform provenance is tracked in the PMTiles description field, so the full processing history is preserved across multiple transformations.
Source Code#
The project is open source and available on GitHub: pspoerri/geotiff2pmtiles.
Pre-built binaries for Linux and macOS (amd64/arm64) are available on the releases page. Windows is not supported due to the reliance on mmap.
