Browse Source

fix: stream DB-IP gunzip with size cap (SEC_REVIEW F49)

`DbipDownloader::gunzip` previously did `gzdecode(file_get_contents(
$gzPath))` — both calls allocate the full decompressed payload
before anything looks at the result. A malicious or compromised
DB-IP endpoint (or a TLS-trust misconfiguration) could serve a tiny
gzip whose decompressed form is multi-GB, OOM-killing the api or
filling /data before MmdbVerifier saw the file.

Replace with a streaming `gzopen` / `gzread` loop that pulls
`GUNZIP_CHUNK_BYTES = 64 KiB` at a time and aborts the moment the
running total exceeds `MAX_DECOMPRESSED_BYTES = 400 MiB`. The cap
matches the MaxMind tarball total cap from F48 so both downloaders
agree on what "too big" looks like; real `dbip-country-lite-*.mmdb`
is ~10 MiB, so 400 MiB is generous against future growth while
bounding the worst-case at "no single gunzip can fill /data".

Peak memory under the new path is the 64 KiB chunk plus the gzip
inflate window — orders of magnitude below the previous
"two copies of the entire decompressed payload in RAM" pattern.

On cap breach OR any other gunzip error, the partial output file is
unlinked so the caller never sees a half-decoded MMDB on disk; the
gz input is left in place so the operator can inspect what was
served. (`gzclose` / `fclose` on both handles is wrapped in a
shared `$cleanup` closure so each error branch tears down the same
way.)

Refactored into:

  - `gunzip()` (private, production: uses MAX_DECOMPRESSED_BYTES).
  - `gunzipWithCap()` (public @internal, accepts the cap as a
    parameter so the unit test can drive it with small fixtures
    instead of building >400 MiB of test data).
  - `gunzipImpl()` (private static, the actual streaming loop).

Regression tests in `api/tests/Unit/Enrichment/DbipDownloaderTest.php`:

  - `testNormalGunzipPasses` — small plaintext gzip round-trips.
  - `testOutputOverCapIsRejectedAndCleanedUp` — 4 KiB plaintext
    fails under a 1 KiB cap; partial output file is unlinked; gz
    input is preserved for operator inspection.
  - `testEmptyGzipIsRejected` — gzipped empty string still fails.
  - `testMissingInputIsRejected` — gzopen on a nonexistent path.
  - `testLargeInputStreamsCorrectly` — 256 KiB plaintext (4 chunks)
    streams through the loop and accumulates byte-for-byte.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chiappa 3 days ago
parent
commit
6580a5b3cd

+ 106 - 10
api/src/Infrastructure/Enrichment/Downloaders/DbipDownloader.php

@@ -23,6 +23,23 @@ final class DbipDownloader implements GeoIpDownloader
 {
     private const BASE_URL = 'https://download.db-ip.com/free';
 
+    /**
+     * SEC_REVIEW F49: cap on the total bytes we'll decompress out of a
+     * DB-IP gzip. Real `dbip-country-lite-*.mmdb` is ~10 MiB and
+     * `dbip-asn-lite-*.mmdb` is ~5 MiB; 400 MiB is generous against
+     * future growth while bounding the worst case at "no single
+     * gunzip can fill /data". Mirrors the MaxMind tarball total cap
+     * so both downloaders agree on what "too big" looks like.
+     */
+    private const MAX_DECOMPRESSED_BYTES = 400 * 1024 * 1024;
+
+    /**
+     * Stream chunk size for `gzread`. Decoupled from the cap so the
+     * peak memory footprint of a download is the chunk, not the
+     * decompressed file.
+     */
+    private const GUNZIP_CHUNK_BYTES = 64 * 1024;
+
     public function __construct(
         private readonly ClientInterface $http,
         private readonly Clock $clock,
@@ -105,23 +122,102 @@ final class DbipDownloader implements GeoIpDownloader
         );
     }
 
+    /**
+     * Streaming gunzip with a cap on the decompressed output
+     * (SEC_REVIEW F49). Reads {@see GUNZIP_CHUNK_BYTES} at a time so
+     * the peak memory footprint is the chunk, not the file; aborts
+     * before exceeding {@see MAX_DECOMPRESSED_BYTES}. A malicious or
+     * compromised DB-IP endpoint serving a gzip whose decompressed
+     * form is multi-GB no longer OOMs the api or fills the disk.
+     *
+     * Public-but-`@internal` so the unit test can drive it directly
+     * with a small fixture; production call site uses the default
+     * cap. Wraps the production `gunzip()` method so the real call
+     * site stays a single-arg path.
+     *
+     * @internal
+     */
+    public static function gunzipWithCap(string $gzPath, string $outPath, int $maxBytes): void
+    {
+        self::gunzipImpl($gzPath, $outPath, $maxBytes);
+    }
+
     private function gunzip(string $gzPath, string $outPath): void
     {
-        $compressed = @file_get_contents($gzPath);
-        if ($compressed === false) {
-            throw new DownloaderException(sprintf('cannot read downloaded gz at %s', $gzPath));
+        self::gunzipImpl($gzPath, $outPath, self::MAX_DECOMPRESSED_BYTES);
+    }
+
+    private static function gunzipImpl(string $gzPath, string $outPath, int $maxBytes): void
+    {
+        $in = @gzopen($gzPath, 'rb');
+        if ($in === false) {
+            throw new DownloaderException(sprintf('cannot gzopen %s', $gzPath));
         }
+        $out = @fopen($outPath, 'wb');
+        if ($out === false) {
+            gzclose($in);
+
+            throw new DownloaderException(sprintf('cannot fopen %s for write', $outPath));
+        }
+
+        $written = 0;
+        $cleanup = static function (mixed $in, mixed $out, string $outPath): void {
+            if ($in !== false) {
+                @gzclose($in);
+            }
+            if ($out !== false) {
+                @fclose($out);
+            }
+            // Don't leave a half-decoded file on disk: caller treats
+            // existence of `$outPath` as "verified ready to swap".
+            @unlink($outPath);
+        };
+
         try {
-            $plain = @gzdecode($compressed);
+            while (!gzeof($in)) {
+                $chunk = @gzread($in, self::GUNZIP_CHUNK_BYTES);
+                if ($chunk === false) {
+                    $cleanup($in, $out, $outPath);
+
+                    throw new DownloaderException(sprintf('gzread failed on %s', $gzPath));
+                }
+                if ($chunk === '') {
+                    continue;
+                }
+                $written += strlen($chunk);
+                if ($written > $maxBytes) {
+                    $cleanup($in, $out, $outPath);
+
+                    throw new DownloaderException(sprintf(
+                        'dbip gunzip output exceeds cap %d bytes',
+                        $maxBytes,
+                    ));
+                }
+                if (@fwrite($out, $chunk) === false) {
+                    $cleanup($in, $out, $outPath);
+
+                    throw new DownloaderException(sprintf('write to %s failed', $outPath));
+                }
+            }
         } catch (Throwable $e) {
-            throw new DownloaderException(sprintf('gzdecode failed for %s', $gzPath), 0, $e);
-        }
-        if ($plain === false || $plain === '') {
-            throw new DownloaderException(sprintf('gzdecode produced empty output for %s', $gzPath));
+            if (!$e instanceof DownloaderException) {
+                $cleanup($in, $out, $outPath);
+
+                throw new DownloaderException(sprintf('gunzip of %s failed', $gzPath), 0, $e);
+            }
+
+            throw $e;
         }
-        if (@file_put_contents($outPath, $plain) === false) {
-            throw new DownloaderException(sprintf('cannot write decoded mmdb to %s', $outPath));
+
+        @gzclose($in);
+        @fclose($out);
+
+        if ($written === 0) {
+            @unlink($outPath);
+
+            throw new DownloaderException(sprintf('gunzip produced empty output for %s', $gzPath));
         }
+
         @unlink($gzPath);
     }
 }

+ 127 - 0
api/tests/Unit/Enrichment/DbipDownloaderTest.php

@@ -0,0 +1,127 @@
+<?php
+
+declare(strict_types=1);
+
+namespace App\Tests\Unit\Enrichment;
+
+use App\Infrastructure\Enrichment\Downloaders\DbipDownloader;
+use App\Infrastructure\Enrichment\Downloaders\DownloaderException;
+use PHPUnit\Framework\TestCase;
+
+/**
+ * SEC_REVIEW F49 — `DbipDownloader::gunzipWithCap` streams via
+ * `gzopen`/`gzread` and aborts past a configurable cap so a
+ * compromised DB-IP endpoint serving a tiny gzip whose decompressed
+ * form is multi-GB cannot OOM the api or fill the disk.
+ */
+final class DbipDownloaderTest extends TestCase
+{
+    private string $tmpDir = '';
+
+    protected function setUp(): void
+    {
+        $this->tmpDir = sys_get_temp_dir() . '/irdb-dbip-' . bin2hex(random_bytes(6));
+        mkdir($this->tmpDir);
+    }
+
+    protected function tearDown(): void
+    {
+        if ($this->tmpDir !== '' && is_dir($this->tmpDir)) {
+            foreach (glob($this->tmpDir . '/*') ?: [] as $f) {
+                @unlink($f);
+            }
+            @rmdir($this->tmpDir);
+        }
+    }
+
+    public function testNormalGunzipPasses(): void
+    {
+        $gzPath = $this->writeGzip('hello world');
+        $outPath = $this->tmpDir . '/out.bin';
+
+        DbipDownloader::gunzipWithCap($gzPath, $outPath, maxBytes: 1024);
+
+        self::assertFileExists($outPath);
+        self::assertSame('hello world', file_get_contents($outPath));
+        // gz file is removed on success.
+        self::assertFileDoesNotExist($gzPath);
+    }
+
+    public function testOutputOverCapIsRejectedAndCleanedUp(): void
+    {
+        // 4 KiB of plaintext, cap at 1 KiB.
+        $gzPath = $this->writeGzip(str_repeat('A', 4096));
+        $outPath = $this->tmpDir . '/out.bin';
+
+        $threw = false;
+        try {
+            DbipDownloader::gunzipWithCap($gzPath, $outPath, maxBytes: 1024);
+        } catch (DownloaderException $e) {
+            $threw = true;
+            self::assertStringContainsString('exceeds cap 1024 bytes', $e->getMessage());
+        }
+        self::assertTrue($threw, 'expected DownloaderException for over-cap output');
+        // Partial output must NOT be left on disk — caller treats
+        // existence of `$outPath` as "verified ready to swap".
+        self::assertFileDoesNotExist($outPath);
+        // gz input was NOT cleaned up (caller decides whether to retry).
+        self::assertFileExists($gzPath);
+    }
+
+    public function testEmptyGzipIsRejected(): void
+    {
+        // gzip of an empty string produces a valid gzip with 0 bytes
+        // payload — the verifier downstream needs SOMETHING; an empty
+        // file is suspicious enough to bail on.
+        $gzPath = $this->writeGzip('');
+        $outPath = $this->tmpDir . '/out.bin';
+
+        $threw = false;
+        try {
+            DbipDownloader::gunzipWithCap($gzPath, $outPath, maxBytes: 1024);
+        } catch (DownloaderException $e) {
+            $threw = true;
+            self::assertStringContainsString('empty output', $e->getMessage());
+        }
+        self::assertTrue($threw);
+        self::assertFileDoesNotExist($outPath);
+    }
+
+    public function testMissingInputIsRejected(): void
+    {
+        $this->expectException(DownloaderException::class);
+        DbipDownloader::gunzipWithCap(
+            $this->tmpDir . '/does-not-exist.gz',
+            $this->tmpDir . '/out.bin',
+            maxBytes: 1024,
+        );
+    }
+
+    /**
+     * Streams >chunk-size of plaintext through the gunzip helper to
+     * prove the chunked read loop accumulates correctly.
+     */
+    public function testLargeInputStreamsCorrectly(): void
+    {
+        // 256 KiB of distinguishable bytes (4 × 64 KiB chunks).
+        $payload = str_repeat('abcd', 256 * 1024 / 4);
+        self::assertSame(256 * 1024, strlen($payload));
+        $gzPath = $this->writeGzip($payload);
+        $outPath = $this->tmpDir . '/big.bin';
+
+        DbipDownloader::gunzipWithCap($gzPath, $outPath, maxBytes: 512 * 1024);
+
+        self::assertFileExists($outPath);
+        self::assertSame($payload, file_get_contents($outPath));
+    }
+
+    private function writeGzip(string $plain): string
+    {
+        $path = $this->tmpDir . '/' . bin2hex(random_bytes(4)) . '.gz';
+        $compressed = gzencode($plain, 6);
+        self::assertNotFalse($compressed);
+        file_put_contents($path, $compressed);
+
+        return $path;
+    }
+}