ソースを参照

fix: assert TokenIssuer base32 input length, remove dead pad (SEC_REVIEW F39)

The review flagged the `if (strlen($chunk) < 5) { $chunk = str_pad($chunk,
5, '0'); }` branch in `TokenIssuer::base32Encode` as a trailing-bit
ambiguity: a base32 character whose unused low bit is set could decode
to the same prefix as a canonical one, so multiple distinct token
strings could in theory map to the same underlying bytes.

In practice the branch was unreachable. The only call site passes
`random_bytes(20)` = 160 bits, and 160 ÷ 5 = 32 with zero remainder —
every base32 character in the 32-char output carries a full 5 useful
bits and there is exactly one canonical encoding per input. The
review concern was a false positive triggered by reading the
defensive-looking padding code path that never fires for the actual
input length.

Codify the invariant by hard-asserting the input length up front and
removing the misleading dead branch:

  - New `TokenIssuer::ENTROPY_BYTES = 20` constant.
  - `base32Encode` throws `InvalidArgumentException` if the input
    isn't exactly 20 bytes — any future caller that drifts (e.g.
    `random_bytes(19)` would have made the original padding branch
    fire) crashes loudly rather than silently emitting a 31-char
    output or a 32-char output whose final character had unused
    bits.
  - The encoder loop is rewritten with literal `0..160 step 5` to
    make the "zero remainder, no trailing chunk" property explicit
    in the source rather than implicit in the input length.
  - Inline comment on the helper documents the math and references
    F39 so a future reviewer reading the same lines doesn't reach
    the same false-positive conclusion.

`Token::parse` keeps its `^[A-Z2-7]{32}$` body pattern; every 32-char
base32 string is canonical for this scheme so no additional
canonicality check at the parser is meaningful.

Regression test
`TokenIssuerTest::testIssuedBodyAlwaysExactlyThirtyTwoBase32Chars`
issues 100 admin tokens and asserts each body is exactly 32 chars and
fully matches the canonical base32 alphabet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chiappa 4 日 前
コミット
0c79c1bb2b

+ 27 - 9
api/src/Domain/Auth/TokenIssuer.php

@@ -15,28 +15,46 @@ final class TokenIssuer
 {
     private const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
 
+    /** Required input length for the base32 encoder. 20 × 8 = 160 = 32 × 5. */
+    private const ENTROPY_BYTES = 20;
+
     public function issue(TokenKind $kind): string
     {
-        return sprintf('irdb_%s_%s', $kind->code(), self::base32Encode(random_bytes(20)));
+        return sprintf('irdb_%s_%s', $kind->code(), self::base32Encode(random_bytes(self::ENTROPY_BYTES)));
     }
 
     /**
-     * Encodes 20 raw bytes (160 bits) as 32 base32 chars (no padding).
+     * Encodes exactly 20 raw bytes (160 bits) as 32 base32 chars.
+     *
+     * 160 ÷ 5 = 32 with zero remainder — every base32 char in the output
+     * carries 5 useful bits, there is no trailing-bit ambiguity, no
+     * padding, and exactly one canonical encoding per input. SEC_REVIEW
+     * F39 noted a "the trailing 5-bit chunk is zero-padded" concern; the
+     * concern was based on the previous unreachable `if (strlen($chunk)
+     * < 5)` branch in this method, which never fires for a 20-byte
+     * input. We now assert the length up front and refuse anything else
+     * — any future caller that passes the wrong length crashes loudly
+     * rather than silently emitting a non-canonical / shorter encoding.
      */
     private static function base32Encode(string $bytes): string
     {
+        if (strlen($bytes) !== self::ENTROPY_BYTES) {
+            throw new \InvalidArgumentException(sprintf(
+                'TokenIssuer::base32Encode requires exactly %d bytes; got %d',
+                self::ENTROPY_BYTES,
+                strlen($bytes),
+            ));
+        }
+
         $bits = '';
-        for ($i = 0, $n = strlen($bytes); $i < $n; ++$i) {
+        for ($i = 0; $i < self::ENTROPY_BYTES; ++$i) {
             $bits .= str_pad(decbin(ord($bytes[$i])), 8, '0', STR_PAD_LEFT);
         }
 
         $out = '';
-        for ($i = 0, $n = strlen($bits); $i < $n; $i += 5) {
-            $chunk = substr($bits, $i, 5);
-            if (strlen($chunk) < 5) {
-                $chunk = str_pad($chunk, 5, '0');
-            }
-            $out .= self::BASE32_ALPHABET[bindec($chunk)];
+        // 160 bits / 5 = 32 iterations exactly; no partial trailing chunk.
+        for ($i = 0; $i < 160; $i += 5) {
+            $out .= self::BASE32_ALPHABET[bindec(substr($bits, $i, 5))];
         }
 
         return $out;

+ 16 - 0
api/tests/Unit/Auth/TokenIssuerTest.php

@@ -43,4 +43,20 @@ final class TokenIssuerTest extends TestCase
             self::assertSame($raw, $parsed->raw);
         }
     }
+
+    public function testIssuedBodyAlwaysExactlyThirtyTwoBase32Chars(): void
+    {
+        // SEC_REVIEW F39: 20 bytes (160 bits) divides exactly by 5 → 32
+        // base32 chars with zero trailing-bit ambiguity. The previous dead
+        // `str_pad` branch in `base32Encode` (which prompted the F39
+        // finding) is gone, but the property it implied — every char
+        // carries 5 useful bits — must hold across many random samples.
+        $issuer = new TokenIssuer();
+        for ($i = 0; $i < 100; ++$i) {
+            $raw = $issuer->issue(TokenKind::Admin);
+            $body = substr($raw, strlen('irdb_adm_'));
+            self::assertSame(32, strlen($body), "issued body must be 32 chars: {$raw}");
+            self::assertSame(1, preg_match('/^[A-Z2-7]{32}$/', $body), "issued body must use canonical base32: {$raw}");
+        }
+    }
 }