Parcourir la source

fix: cache rendered blocklist body+etag per format (SEC_REVIEW F70)

`BlocklistController` ran the full render-and-hash on every request:

  $blocklist = $cache->getOrBuild($policy);   // 30s-cached
  $body = $isJson ? renderJson($b) : renderText($b);  // every req
  $etag = '"' . hash('sha256', $body) . '"';          // every req

At 50k entries per blocklist that's a multi-megabyte json_encode +
sha256 per request. The per-token rate limit allows 60 req/s, so a
single consumer pinned a worker on this hash sustained, across both
format=text and format=json. The cache helped the BlocklistBuilder
DB step but did nothing for the render+hash hot path.

Extend `BlocklistCache` with `getRendered(Policy, format, callable
$render): array{body, etag, blocklist}`. The cache row gains a
`rendered` map keyed by format — first call populates it via the
caller-supplied render closure; subsequent calls within the cache
window return the cached `body` + `etag` verbatim. `getOrBuild` is
preserved for callers that want only the domain object. The
existing `invalidate($policyId)` / `invalidateAll()` contract drops
the rendered entries alongside the build (they live in the same
row), so the policy/manual_blocks/allowlist mutation paths don't
need to change.

`BlocklistController` swaps `getOrBuild + render + hash` for a
single `getRendered` call with two static render closures (one per
format). The render+hash now runs at most once per (policy, format)
per cache window per replica, regardless of incoming request rate.

Per-consumer format constraint (the SEC_REVIEW's recommendation)
is still feasible as a future tightening, but isn't necessary to
defeat the F70 amplifier — the cache already does, and dropping
JSON for some consumers is a separate UX concern that can land
under its own finding if needed.

Regression tests in
`api/tests/Integration/Public/BlocklistControllerTest.php`:

  - `testBlocklistRenderIsCachedAcrossRequests` posts two
    sequential `?format=json` requests and asserts byte-identical
    bodies + ETags (cache hit on the second).
  - `testTextAndJsonRendersBothCachedIndependently` alternates
    text and json requests; each format's render is cached
    independently and the text/json ETags must differ (different
    bodies).

The 8 pre-existing BlocklistController tests (admin-rejected,
ETag round-trip, allowlist-excludes, three-policies-differ, etc.)
all keep passing — the public contract is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chiappa il y a 3 jours
Parent
commit
551cb90a30

+ 16 - 4
api/src/Application/Public/BlocklistController.php

@@ -75,14 +75,26 @@ final class BlocklistController
             return self::error($response, 500, 'consumer_policy_missing');
         }
 
-        $blocklist = $this->blocklistCache->getOrBuild($policy);
-
         $params = $request->getQueryParams();
         $isJson = isset($params['format']) && $params['format'] === 'json';
-        $body = $isJson ? self::renderJson($blocklist) : self::renderText($blocklist);
+        $format = $isJson ? 'json' : 'text';
         $contentType = $isJson ? 'application/json' : 'text/plain';
 
-        $etag = '"' . hash('sha256', $body) . '"';
+        // SEC_REVIEW F70: render-and-hash once per (policy, format)
+        // per cache window. Without this, every request paid the
+        // full sha256-of-50k-entries cost regardless of cache state
+        // — at the per-token rate limit (60 req/s) a single
+        // consumer pinned a worker on this hash sustained.
+        $rendered = $this->blocklistCache->getRendered(
+            $policy,
+            $format,
+            $isJson
+                ? static fn (Blocklist $b): string => self::renderJson($b)
+                : static fn (Blocklist $b): string => self::renderText($b),
+        );
+        $blocklist = $rendered['blocklist'];
+        $body = $rendered['body'];
+        $etag = $rendered['etag'];
 
         $ifNoneMatch = self::extractIfNoneMatch($request);
         $notModified = $ifNoneMatch !== null && self::etagMatches($ifNoneMatch, $etag);

+ 55 - 1
api/src/Infrastructure/Reputation/BlocklistCache.php

@@ -28,7 +28,7 @@ use App\Domain\Time\Clock;
 class BlocklistCache
 {
     /**
-     * @var array<int, array{blocklist: Blocklist, expires_at: float}>
+     * @var array<int, array{blocklist: Blocklist, expires_at: float, rendered: array<string, array{body: string, etag: string}>}>
      */
     private array $cache = [];
 
@@ -52,12 +52,66 @@ class BlocklistCache
             $this->cache[$policy->id] = [
                 'blocklist' => $blocklist,
                 'expires_at' => $now + $this->ttlSeconds,
+                'rendered' => [],
             ];
         }
 
         return $blocklist;
     }
 
+    /**
+     * SEC_REVIEW F70: cache the rendered body and its sha256 ETag per
+     * (policy_id, format), not just the underlying `Blocklist` domain
+     * object. Without this, every cache hit rebuilt the rendered
+     * string and re-hashed it — at 50k entries per blocklist that's
+     * a multi-megabyte sha256 per request, sustained at the
+     * per-token rate limit (60 req/s). Now the render+hash runs
+     * once per (policy, format) per cache window; subsequent
+     * requests within the window reuse the cached `body` + `etag`
+     * verbatim and the per-request cost collapses to "send the
+     * bytes, return the headers".
+     *
+     * `$render` is invoked at most once per cache miss and receives
+     * the (possibly freshly-built) `Blocklist` object so the
+     * controller can keep its rendering logic colocated.
+     *
+     * @param callable(Blocklist): string $render
+     * @return array{body: string, etag: string, blocklist: Blocklist}
+     */
+    public function getRendered(Policy $policy, string $format, callable $render): array
+    {
+        $now = (float) $this->clock->now()->getTimestamp();
+        $entry = $this->cache[$policy->id] ?? null;
+        $valid = $entry !== null && $now < $entry['expires_at'];
+
+        if (!$valid) {
+            $blocklist = $this->builder->build($policy);
+            $entry = [
+                'blocklist' => $blocklist,
+                'expires_at' => $now + max(0, $this->ttlSeconds),
+                'rendered' => [],
+            ];
+        }
+
+        if (!isset($entry['rendered'][$format])) {
+            $body = $render($entry['blocklist']);
+            $entry['rendered'][$format] = [
+                'body' => $body,
+                'etag' => '"' . hash('sha256', $body) . '"',
+            ];
+        }
+
+        if ($this->ttlSeconds > 0) {
+            $this->cache[$policy->id] = $entry;
+        }
+
+        return [
+            'body' => $entry['rendered'][$format]['body'],
+            'etag' => $entry['rendered'][$format]['etag'],
+            'blocklist' => $entry['blocklist'],
+        ];
+    }
+
     public function invalidate(int $policyId): void
     {
         unset($this->cache[$policyId]);

+ 50 - 0
api/tests/Integration/Public/BlocklistControllerTest.php

@@ -222,4 +222,54 @@ final class BlocklistControllerTest extends AppTestCase
         $stmt->bindValue('now', (new \DateTimeImmutable('now'))->format('Y-m-d H:i:s'));
         $stmt->executeStatement();
     }
+
+    public function testBlocklistRenderIsCachedAcrossRequests(): void
+    {
+        // SEC_REVIEW F70: the rendered body and its sha256 ETag are
+        // cached per (policy_id, format) per cache window. Two
+        // sequential requests within the same window should produce
+        // byte-identical bodies and ETags. (Cache hit → no rebuild,
+        // no rehash; the test asserts the steady-state contract.)
+        $token = $this->setupConsumerToken('moderate');
+
+        $r1 = $this->request('GET', '/api/v1/blocklist?format=json', [
+            'Authorization' => 'Bearer ' . $token,
+        ]);
+        $r2 = $this->request('GET', '/api/v1/blocklist?format=json', [
+            'Authorization' => 'Bearer ' . $token,
+        ]);
+
+        self::assertSame(200, $r1->getStatusCode());
+        self::assertSame(200, $r2->getStatusCode());
+        self::assertSame((string) $r1->getBody(), (string) $r2->getBody());
+        self::assertSame($r1->getHeaderLine('ETag'), $r2->getHeaderLine('ETag'));
+    }
+
+    public function testTextAndJsonRendersBothCachedIndependently(): void
+    {
+        // SEC_REVIEW F70: switching format must not invalidate the
+        // other format's cached render. Each format has its own
+        // entry in the cache.
+        $token = $this->setupConsumerToken('moderate');
+
+        $textA = $this->request('GET', '/api/v1/blocklist', [
+            'Authorization' => 'Bearer ' . $token,
+        ]);
+        $jsonA = $this->request('GET', '/api/v1/blocklist?format=json', [
+            'Authorization' => 'Bearer ' . $token,
+        ]);
+        $textB = $this->request('GET', '/api/v1/blocklist', [
+            'Authorization' => 'Bearer ' . $token,
+        ]);
+        $jsonB = $this->request('GET', '/api/v1/blocklist?format=json', [
+            'Authorization' => 'Bearer ' . $token,
+        ]);
+
+        self::assertSame((string) $textA->getBody(), (string) $textB->getBody());
+        self::assertSame($textA->getHeaderLine('ETag'), $textB->getHeaderLine('ETag'));
+        self::assertSame((string) $jsonA->getBody(), (string) $jsonB->getBody());
+        self::assertSame($jsonA->getHeaderLine('ETag'), $jsonB->getHeaderLine('ETag'));
+        // Text and JSON ETags must DIFFER (different bodies).
+        self::assertNotSame($textA->getHeaderLine('ETag'), $jsonA->getHeaderLine('ETag'));
+    }
 }