
M01 — Monorepo skeleton (done)

Built: repo layout per SPEC §11, both Dockerfiles, compose stack, toolchain.

Notes for next milestone:

  • DB schema empty; M02 owns all tables and seeds.
  • entrypoint.sh for api supports migrate mode and calls vendor/bin/phinx.
  • Healthcheck payloads are stubs; later milestones extend them.
  • Service-token bootstrap deferred to M03 (needs api_tokens table first).
  • CI runs locally via ./scripts/ci.sh (Docker-based, no host PHP/Node needed). No GitHub Actions workflow per project decision.
  • composer.json config pins platform.php to 8.3 in both subprojects so dependency resolution matches the FrankenPHP runtime image even when the build host's composer:2 image ships a newer PHP.

Deviations from SPEC: none. Added dependencies beyond SPEC §2: none.

M02 — Database & migrations (done)

Built: all SPEC §4 tables; idempotent seeds; IP/CIDR value objects.

Schema notes for next milestone:

  • users.password_hash is NOT in the schema (per SPEC §4; UI owns local-admin credentials).
  • api_tokens.kind enum values: reporter, consumer, admin, service (CHECK constraint enforced on both SQLite and MySQL: kind=reporter→reporter_id set & consumer_id null; kind=consumer→consumer_id set & reporter_id null; kind∈{admin,service}→both null).
  • All timestamps stored UTC. ISO 8601 strings on SQLite, DATETIME(6) on MySQL. Default CURRENT_TIMESTAMP / CURRENT_TIMESTAMP(6) accordingly.
  • ip_bin always 16 bytes; v4 mapped to ::ffff:0:0/96. Use App\Domain\Ip\IpAddress::fromString() for normalization and Cidr::fromString() for subnets. Internally CIDRs store v4 prefixes as 96 + originalPrefix for unified containment math.
  • DBAL Connection is wired through App\App\Container::build() and applies the four SQLite PRAGMAs (journal_mode=WAL, synchronous=NORMAL, busy_timeout=5000, foreign_keys=ON) on every new SQLite connection.
  • Phinx migrations extend App\Infrastructure\Db\Migrations\BaseMigration for adapter-aware timestamp/binary column helpers. The phinxlog table is unaffected.
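
The 16-byte normalization and the unified-prefix arithmetic noted above can be sketched as follows. This is a minimal illustration, not the actual App\Domain\Ip classes: the function names toIpBin, unifiedPrefix and cidrContains are hypothetical and exist only for this example.

```php
<?php
declare(strict_types=1);

/** Normalize an IPv4 or IPv6 literal to 16 bytes, mapping v4 into ::ffff:0:0/96. */
function toIpBin(string $ip): string
{
    $bin = @inet_pton($ip);
    if ($bin === false) {
        throw new InvalidArgumentException("Invalid IP: {$ip}");
    }
    // A 4-byte result means IPv4: prepend the 12-byte v4-mapped prefix.
    return strlen($bin) === 4
        ? str_repeat("\x00", 10) . "\xff\xff" . $bin
        : $bin;
}

/** Unified prefix length: v4 prefixes are shifted by 96 bits. */
function unifiedPrefix(int $prefix, bool $isV4): int
{
    return $isV4 ? 96 + $prefix : $prefix;
}

/** True when $ipBin falls inside the network $networkBin / $unifiedPrefix. */
function cidrContains(string $networkBin, int $unifiedPrefix, string $ipBin): bool
{
    $fullBytes = intdiv($unifiedPrefix, 8);
    $remBits   = $unifiedPrefix % 8;
    if (substr($networkBin, 0, $fullBytes) !== substr($ipBin, 0, $fullBytes)) {
        return false;
    }
    if ($remBits === 0) {
        return true;
    }
    $mask = (0xFF << (8 - $remBits)) & 0xFF;
    return (ord($networkBin[$fullBytes]) & $mask) === (ord($ipBin[$fullBytes]) & $mask);
}

$net = toIpBin('10.0.0.0');
var_dump(cidrContains($net, unifiedPrefix(8, true), toIpBin('10.20.30.40'))); // bool(true)
var_dump(cidrContains($net, unifiedPrefix(8, true), toIpBin('11.0.0.1')));    // bool(false)
```

Because v4 and v6 share one 128-bit space here, a single containment routine serves both families.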

Decisions made:

  • FK ON DELETE semantics:
    • policy_category_thresholds.policy_id → CASCADE (thresholds belong to policy).
    • policy_category_thresholds.category_id → RESTRICT (cannot drop a category in active use).
    • consumers.policy_id → RESTRICT (cannot drop a policy in active use).
    • reporters/consumers/manual_blocks/allowlist.created_by_user_id → SET NULL (preserve provenance after user delete).
    • api_tokens.{reporter_id,consumer_id} → CASCADE (deleting a reporter/consumer revokes its tokens).
    • reports.{category_id,reporter_id} → RESTRICT (preserve audit trail per SPEC hint).
    • ip_scores.category_id → CASCADE (scores meaningless without their category).
  • api_tokens is created via raw CREATE TABLE per adapter so the CHECK constraint on kind works on SQLite (which cannot ADD CHECK via ALTER TABLE) and on MySQL.
  • BINARY(16) on MySQL is implemented as Phinx's portable binary type with limit => 16 (yields VARBINARY(16)); this is functionally identical for our fixed-width 16-byte payload and avoids per-adapter raw SQL.
  • Fixed an M01 bug in config/phinx.php where rtrim($path, '.sqlite') mangled the SQLite path because rtrim's second arg is a character set; switched to passing the full path verbatim with empty suffix.
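
The rtrim pitfall behind that M01 fix is easy to reproduce. The path below is made up for illustration, and stripSuffix is a hypothetical suffix-safe helper shown only for contrast; the actual fix was to pass the full path verbatim.

```php
<?php
declare(strict_types=1);

// rtrim()'s second argument is a set of characters, not a suffix.
// Every trailing '.', 's', 'q', 'l', 'i', 't' or 'e' is stripped.
$path = 'data/test.sqlite'; // hypothetical path

var_dump(rtrim($path, '.sqlite')); // string(5) "data/"

/** Suffix-safe alternative (hypothetical helper, not the project's fix). */
function stripSuffix(string $value, string $suffix): string
{
    return $suffix !== '' && str_ends_with($value, $suffix)
        ? substr($value, 0, -strlen($suffix))
        : $value;
}

var_dump(stripSuffix($path, '.sqlite')); // string(9) "data/test"
```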

Deviations from SPEC: none. Added dependencies: none beyond SPEC §2.

M03 — API auth foundations (done)

Built: token kinds, hashing, RBAC, impersonation pattern, auth endpoints, service token bootstrap.

API contract decisions:

  • 401 = bad/expired/revoked/wrong-kind token (uniform body {"error":"unauthorized"})
  • 403 = authenticated but wrong role
  • 400 = service token without (or malformed) X-Acting-User-Id header
  • last_used_at updated synchronously (move to async in M14 if perf demands)
  • /api/v1/auth/* is service-token-only with no impersonation — these endpoints exist to bootstrap user records the UI can later impersonate, so requiring impersonation would be circular. The controller enforces kind=service directly.
  • X-Acting-User-Id is silently ignored on non-service tokens (per SPEC §8); only its absence on a service token triggers 400.
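
The X-Acting-User-Id rule for impersonated routes (not /api/v1/auth/*, which skips impersonation) might look roughly like this. The function name, the return shape, and the assumption that a well-formed header is a positive decimal integer are all illustrative and not taken from the codebase.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the impersonation header rule.
 * Returns ['status' => 400] on rejection, or ['actingUserId' => ?int] on success.
 * Assumption: a well-formed header is a positive decimal integer.
 */
function resolveActingUser(string $tokenKind, ?string $header): array
{
    if ($tokenKind !== 'service') {
        // The header is silently ignored on non-service tokens.
        return ['actingUserId' => null];
    }
    if ($header === null || preg_match('/^[1-9][0-9]*$/', $header) !== 1) {
        return ['status' => 400]; // missing or malformed on a service token
    }
    return ['actingUserId' => (int) $header];
}
```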

Notes for next milestone:

  • Reporter and consumer tokens have no role column; their auth carries reporter_id / consumer_id only. Reading principal->reporterId from request attrs is how M04's report endpoint will identify the reporter.
  • Admin endpoints in later milestones can use RbacMiddleware::require($responseFactory, Role::Operator) and similar calls. require() is a factory that takes the response factory and the required role; the response factory itself is available from the container.
  • AuthenticatedPrincipal carries an optional userId so M14 can introduce admin-token-bound-to-user without churn.

Schema deviation: api_tokens.role (nullable VARCHAR(32)) was added in migration 20260428130000_add_role_to_api_tokens.php. SPEC §4 doesn't enumerate it but SPEC §6 mandates that admin tokens carry a role; the column stores it. Non-admin token rows leave it NULL.

Token format: irdb_<kind3>_<32 base32 chars>, where kind3 is one of rep|con|adm|svc. 160 bits of entropy from random_bytes(20). The whole raw string is SHA-256 hashed for storage; token_prefix keeps the first 8 chars (irdb_<kind3>) for log readability. The .env.example documents how to generate a valid UI_SERVICE_TOKEN via TokenIssuer.
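
A minimal sketch of that token format, assuming a lowercase RFC 4648 base32 alphabet and a hex-encoded SHA-256 digest (neither detail is stated above). issueToken is a hypothetical stand-in for TokenIssuer, whose real signature may differ.

```php
<?php
declare(strict_types=1);

/** RFC 4648 base32 without padding. The lowercase alphabet is an assumption. */
function base32Encode(string $bytes): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyz234567';
    $bits = '';
    foreach (str_split($bytes) as $byte) {
        $bits .= str_pad(decbin(ord($byte)), 8, '0', STR_PAD_LEFT);
    }
    $out = '';
    foreach (str_split($bits, 5) as $chunk) {
        $out .= $alphabet[bindec(str_pad($chunk, 5, '0'))];
    }
    return $out;
}

/** Hypothetical stand-in for TokenIssuer: returns raw token, stored hash and prefix. */
function issueToken(string $kind3): array
{
    if (!in_array($kind3, ['rep', 'con', 'adm', 'svc'], true)) {
        throw new InvalidArgumentException("Unknown kind: {$kind3}");
    }
    // 20 random bytes = 160 bits = exactly 32 base32 characters.
    $raw = 'irdb_' . $kind3 . '_' . base32Encode(random_bytes(20));
    return [
        'raw'    => $raw,                 // shown once to the caller
        'hash'   => hash('sha256', $raw), // persisted (hex form assumed)
        'prefix' => substr($raw, 0, 8),   // e.g. "irdb_rep", kept for log readability
    ];
}

$token = issueToken('rep');
echo $token['prefix'], ' ', strlen($token['raw']), PHP_EOL; // irdb_rep 41
```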

Service-token rotation: out of scope this milestone — ServiceTokenBootstrap only handles "set or not set". Rotation means: deploy with the new value, restart api, manually revoke the old hash via a future tool. The bootstrap logs a warning when it inserts a new service token while another already exists.

Added dependencies: none.

M04 — Token system & ingest (done)

Built: reporter/consumer/token CRUD; POST /api/v1/report end-to-end; rate limiter; decay functions.

Notes for next milestone:

  • Synchronous score updates are correct but only touch the (ip, category) pair just reported. Bulk decay re-application is M05's recompute job.
  • PairScorer (api/src/Domain/Reputation/PairScorer.php) is the authoritative single-pair scorer; the bulk recompute job in M05 should call into it (or a near-clone) so behavior stays consistent. It depends on Clock, CategoryRepository, and ReportRepository::forScoring().
  • Decay shapes live as pure functions in Decay::value(DecayFunction, ageDays, decayParam) with seven unit tests against hand-computed reference values. M05's recompute will reuse this.
  • Rate limiter is in-process (PHP array on a singleton RateLimiter); document this in README. Multi-replica deployments need a shared store. The bucket capacity is API_RATE_LIMIT_PER_SECOND × 2 with refill = API_RATE_LIMIT_PER_SECOND per second; on exhaustion the middleware emits 429 with Retry-After: 1. Skipped on admin/auth routes.
  • Service tokens cannot be created via the admin API (kind=service → 400) and are filtered out of the list endpoint unconditionally; only the bootstrap path makes them. Revoke on a service token returns 403 from DELETE /api/v1/admin/tokens/{id}.
  • A token's raw value appears only in the create response payload (raw_token); we persist only its SHA-256 hash and the 8-char prefix.
  • ip_scores upsert is per-driver: SQLite uses ON CONFLICT(ip_bin, category_id) DO UPDATE, MySQL uses ON DUPLICATE KEY UPDATE. Single helper in IpScoreRepository::upsert().
  • Clock interface (App\Domain\Time\Clock) wraps wall-time for received_at, decay age, and rate-limit refill. SystemClock in production; FixedClock (with advance()) in tests.
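
The rate-limiter arithmetic (capacity = rate × 2, refill = rate per second) can be sketched as a token bucket. This is an illustrative class, not the project's RateLimiter; how buckets are keyed is left out, and time is passed in explicitly in the spirit of the Clock interface.

```php
<?php
declare(strict_types=1);

/** Minimal in-process token bucket; a sketch, not the project's RateLimiter. */
final class TokenBucket
{
    private float $tokens;
    private float $lastRefill;

    public function __construct(private int $ratePerSecond, float $now)
    {
        $this->tokens = (float) ($ratePerSecond * 2); // capacity = rate x 2
        $this->lastRefill = $now;
    }

    /** Returns true when the request may proceed, false when the caller should answer 429. */
    public function tryConsume(float $now): bool
    {
        $capacity = (float) ($this->ratePerSecond * 2);
        $elapsed  = max(0.0, $now - $this->lastRefill);
        // Refill at ratePerSecond tokens per second, capped at capacity.
        $this->tokens = min($capacity, $this->tokens + $elapsed * $this->ratePerSecond);
        $this->lastRefill = $now;

        if ($this->tokens < 1.0) {
            return false;
        }
        $this->tokens -= 1.0;
        return true;
    }
}
```

With a rate of 2, the bucket admits a burst of 4 requests, rejects the fifth, and admits 2 more after one second has passed.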

API contract decisions:

  • Admin endpoints (/api/v1/admin/{reporters,consumers,tokens}) require Admin role. RBAC is enforced via RbacMiddleware::require($rf, Role::Admin) on the route group.
  • Validation errors return 400 with {"error":"validation_failed","details":{"field":"reason"}}. Hand-rolled validators per controller — small surface, no third-party validator added.
  • DELETE on a reporter with existing reports returns 409 and flips is_active=false (soft delete) rather than removing the row; the audit trail is preserved per the FK RESTRICT semantics on reports.reporter_id.
  • Public POST /api/v1/report — wrong-kind tokens (admin/consumer/service) and inactive reporters both return 401 with the uniform {"error":"unauthorized"} envelope, matching the M03 convention. Bad IP / unknown category / oversized metadata return 400 with the validation envelope.
  • Metadata size limit: 4 KB after json_encode. Non-object metadata (arrays, scalars) is rejected.
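
A hand-rolled metadata check along those lines could look like the sketch below. The function name, the reason strings, the treatment of 4 KB as 4096 bytes, and the assumption that metadata is optional are illustrative and not confirmed by the notes above.

```php
<?php
declare(strict_types=1);

const METADATA_MAX_BYTES = 4096; // assumption: 4 KB = 4096 bytes, measured after json_encode

/**
 * Hypothetical validator: returns null when metadata is acceptable,
 * otherwise a short reason for the validation envelope.
 * $metadata is the value produced by json_decode($body) with objects as stdClass.
 */
function validateMetadata(mixed $metadata): ?string
{
    if ($metadata === null) {
        return null; // assumption: metadata is optional
    }
    if (!$metadata instanceof stdClass) {
        return 'must be a JSON object'; // rejects arrays and scalars
    }
    $encoded = json_encode($metadata);
    if ($encoded === false || strlen($encoded) > METADATA_MAX_BYTES) {
        return 'exceeds 4 KB';
    }
    return null;
}
```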

Deviations from SPEC: none. Added dependencies: none (chose hand-rolled validation over respect/validation).

M05 — Reputation engine & jobs (done)

Built: decay math (already in M04, edge-cases re-verified); job framework with atomic locks (JobLockRepository), run history (JobRunRepository), runner abstraction (JobRunner), registry (JobRegistry); concrete jobs RecomputeScoresJob (full + incremental), CleanupAuditJob, EnrichPendingJob (skeleton); TickJob dispatcher; /internal/jobs/{recompute-scores,cleanup-audit,enrich-pending,tick,refresh-geoip,status} endpoints behind InternalNetworkMiddleware + InternalTokenMiddleware; CLI jobs:run, jobs:status, scores:rebuild.

Notes for next milestone:

  • PairScorer (from M04) is reused by RecomputeScoresJob — both produce identical scores for the same pair.
  • EnrichPendingJob is a skeleton — M11 fills it in.
  • refresh-geoip endpoint returns 412 with {"error":"not_implemented"} — M11 wires it up.
  • Job results are returned synchronously; long jobs may exceed default request timeout. The /internal/* routes need an extended max_execution_time in production FrankenPHP config (deferred — current default is sufficient for the recompute's 240s ceiling).
  • Drop rule: score < 0.01 AND last_report_at < now − 90 days. RecomputeScoresJob backdates last_report_at to now − 366 days for orphan ip_scores rows (no surviving reports) so the same drop pass prunes them.
  • triggered_by convention: HTTP /internal/jobs/* calls use 'schedule' (assumed cron-driven); CLI uses 'manual'. The admin-API wrapper in M12 will pass 'manual' through for UI button triggers.
  • TickJob takes a Closure(): iterable<Job> rather than a direct JobRegistry reference — needed to break a build-time cycle in PHP-DI (registry holds tick; tick iterates registry). The closure is invoked at run time.
  • JobsController resolves jobs via JobRegistry::get($name), and the registry is populated lazily in the container factory in registration order: recompute, cleanup, enrich, tick.
  • Lock owner format: <pid>/<random hex>. Release verifies owner matches before deleting — defensive against expires_at-reclaim races.
  • Internal token middleware fails closed when INTERNAL_JOB_TOKEN is empty — better than silently exposing endpoints to anything inside the docker network.
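
The owner-checked release can be illustrated with an in-memory stand-in. The class below is hypothetical: the real JobLockRepository works against the database and gets its atomicity there, and the 8 random bytes in the owner string are an assumption.

```php
<?php
declare(strict_types=1);

/** In-memory stand-in for JobLockRepository, for illustration only. */
final class InMemoryJobLocks
{
    /** @var array<string, array{owner: string, expiresAt: int}> */
    private array $locks = [];

    public static function newOwner(): string
    {
        return getmypid() . '/' . bin2hex(random_bytes(8)); // <pid>/<random hex>
    }

    public function acquire(string $job, string $owner, int $now, int $ttl): bool
    {
        $held = $this->locks[$job] ?? null;
        if ($held !== null && $held['expiresAt'] > $now) {
            return false; // still held and not yet expired
        }
        // Free or expired: take (or reclaim) the lock.
        $this->locks[$job] = ['owner' => $owner, 'expiresAt' => $now + $ttl];
        return true;
    }

    public function release(string $job, string $owner): bool
    {
        // Only the current owner may release. A holder whose lock expired and
        // was reclaimed must not delete the new holder's lock.
        if (($this->locks[$job]['owner'] ?? null) !== $owner) {
            return false;
        }
        unset($this->locks[$job]);
        return true;
    }
}
```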

Deviations from SPEC: none. Added dependencies: none.

M06 — Manual blocks, allowlist (done)

Built: CRUD for manual_blocks and allowlist (single-IP and CIDR, v4 + v6); CidrEvaluator (in-process containment over a snapshot); CidrEvaluatorFactory (60s TTL cache + invalidate on writes); EffectiveStatusService (allowlist + manual; score+policy lands in M07); SPEC §M06 acceptance script passes end-to-end.

Notes for next milestone:

  • M07 wires CidrEvaluatorFactory into the distribution endpoint and finishes EffectiveStatusService by adding score-vs-policy evaluation. Inject CategoryRepository, IpScoreRepository, and the per-policy thresholds into the service alongside the existing evaluator.
  • Cache TTL is CIDR_EVALUATOR_TTL_SECONDS (default 60s); mutation endpoints invalidate explicitly and force a synchronous rebuild (get()) so an overlap WARNING fires inside the same request — operators see immediate feedback. Multi-replica deployments will see up to 60s of staleness across replicas — accepted.
  • Manual-block expiration cleanup: data model has expires_at, repo has findExpired($now) returning ids, but no job runs. Add in M14 hardening if desired, or leave as a documented limitation.
  • CIDR canonicalization picks recommendation (c) from the milestone doc: non-canonical input is silently normalized; the response body echoes normalized_from: <original> only when the normalization changed the input. Canonical input omits the field.
  • Repository inserts go through RepositoryBase::insertRow() for the binary-column ergonomics, but insertRow() returns executeStatement()'s row count — not the new id. The repos call (int) $this->connection()->lastInsertId() after insertRow() to recover the id. Same pattern ReportRepository::insert uses — kept consistent.
  • Cidr::fromBinary($networkBin, $unifiedPrefix) was added so repositories can hydrate stored rows back into the value object. The v4-vs-v6 heuristic mirrors what IpAddress::fromBinary does (v4-mapped IPv6 prefix + unified prefix ≥ 96 ⇒ render as v4).
  • CidrEvaluatorFactory is intentionally not final: EffectiveStatusServiceTest substitutes an in-memory stub via subclass to avoid spinning up the DB.
  • RBAC split per SPEC §6: list/show ⇒ Viewer, create/delete ⇒ Operator. Achieved with per-route RbacMiddleware::require(...) rather than group-level — a small departure from the all-Admin pattern used by reporters/consumers/tokens but the cleanest expression of "the same URL has different role requirements per method".
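
The canonicalization behavior (normalize silently, echo normalized_from only when the input changed) can be sketched as below. canonicalizeCidr is a hypothetical helper that handles only address/prefix input; the project's Cidr::fromString() may differ in validation and output details.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: canonicalize "address/prefix" by zeroing host bits.
 * Returns ['cidr' => ...] plus 'normalized_from' only when the input changed.
 */
function canonicalizeCidr(string $input): array
{
    [$addr, $prefix] = array_pad(explode('/', $input, 2), 2, null);
    $bin = @inet_pton((string) $addr);
    if ($bin === false || $prefix === null || !ctype_digit($prefix)) {
        throw new InvalidArgumentException("Invalid CIDR: {$input}");
    }
    $bits = (int) $prefix;
    if ($bits > strlen($bin) * 8) {
        throw new InvalidArgumentException("Prefix too long: {$input}");
    }
    // Zero every bit past the prefix, byte by byte.
    for ($i = 0, $n = strlen($bin); $i < $n; $i++) {
        $keep = max(0, min(8, $bits - $i * 8));
        $bin[$i] = chr(ord($bin[$i]) & ((0xFF << (8 - $keep)) & 0xFF));
    }
    $canonical = inet_ntop($bin) . '/' . $bits;

    return $canonical === $input
        ? ['cidr' => $canonical]
        : ['cidr' => $canonical, 'normalized_from' => $input];
}

print_r(canonicalizeCidr('10.1.2.3/8'));
// Array ( [cidr] => 10.0.0.0/8 [normalized_from] => 10.1.2.3/8 )
```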

Deviations from SPEC: none. Added dependencies: none.

M07 — Policies & distribution (done)

Built: policy CRUD with thresholds (replaces wholesale on PATCH); GET /api/v1/blocklist (text/plain + JSON) with ETag/If-None-Match round-trip; per-policy in-memory cache (30s TTL, invalidated on relevant mutations); BlocklistBuilder with allowlist filtering, manual-block dedup (broader CIDR wins), v4-then-v6 stable sort; per-policy preview endpoint; perf test 50k entries <500 ms (SQLite + JIT).

Notes for next milestone:

  • Per-policy cache TTL = 30 s (BLOCKLIST_CACHE_TTL_SECONDS). Mutation endpoints invalidate explicitly: policy CRUD calls BlocklistCache::invalidate($policyId); manual_blocks / allowlist mutations call invalidateAll() (any policy might include manual blocks). Multi-replica deployments will see up to 30 s of cross-replica staleness — accepted, mirrors CidrEvaluatorFactory semantics.
  • The text/plain format is universal (one IP/CIDR per line, no comments). Firewall-specific consumers transform on their side; M13 ships examples in examples/consumers/.
  • DELETE on a policy with referencing consumers returns 409 with {"error":"policy_in_use","consumers":[{id,name},...]}. Cascading the delete would be wrong here per SPEC §M07.2.
  • Dedup rule: scored single-IPs covered by a manual subnet are dropped (the broader subnet entry covers them). For same-IP overlap (scored single AND manual single), the scored entry wins to keep category attribution.
  • Allowlist precedence: a manual subnet whose network address sits inside an allowlisted IP/subnet is dropped from the output. Manual single IPs on the allowlist are filtered too. The CidrEvaluator already logs a WARNING when the two lists overlap.
  • ETag stability: SHA-256 over the rendered body (excluding generated_at). Different content-types yield different ETags by design (text vs JSON have different bodies).
  • If-None-Match parsing handles weak validators (W/"…") and the wildcard *.
  • Policies controller's PATCH replaces the threshold set wholesale inside a single transaction (PolicyRepository::replaceThresholds — DELETE then INSERT). Field-level edits to name/description/include_manual_blocks happen alongside in the same request when present.
  • Threshold body shape: {<category_slug>: <number>}; the controller resolves slugs to category ids. Unknown slug returns a 400 with the offending slug in the error message.
  • BlocklistBuilder exposes the build via BlocklistCache::getOrBuild($policy); the public endpoint never builds directly. Preview endpoint bypasses the cache (calls the builder directly) so the UI sees fresh numbers after edits.
  • IpScoreRepository::findExceedingThresholds returns raw associative-array rows (not typed) — the BlocklistBuilder's hot loop casts on demand. Saves ~25 % off the perf budget at 50k rows.
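
The ETag round-trip can be sketched as follows. Both functions are hypothetical; the quoted-hex ETag shape is an assumption, and the caller is expected to render the body without generated_at before hashing.

```php
<?php
declare(strict_types=1);

/** Strong ETag over the rendered body (the caller excludes generated_at beforehand). */
function etagFor(string $renderedBody): string
{
    return '"' . hash('sha256', $renderedBody) . '"';
}

/**
 * Hypothetical matcher for If-None-Match: accepts "*", comma-separated lists,
 * and weak validators (W/"...") using weak comparison.
 */
function ifNoneMatchHits(?string $header, string $etag): bool
{
    if ($header === null || trim($header) === '') {
        return false;
    }
    if (trim($header) === '*') {
        return true;
    }
    foreach (explode(',', $header) as $candidate) {
        $candidate = trim($candidate);
        if (str_starts_with($candidate, 'W/')) {
            $candidate = substr($candidate, 2); // weak comparison ignores the W/ flag
        }
        if ($candidate === $etag) {
            return true;
        }
    }
    return false;
}
```

A hit means the endpoint can answer 304 without a body; text/plain and JSON renderings hash differently, so each has its own ETag.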

Performance:

  • SPEC §M07.5 budget: 50k entries < 500 ms. Measured warm path on SQLite + opcache JIT (matches production FrankenPHP): 440–460 ms across 5 consecutive runs (median ~444 ms).
  • Without JIT (raw vendor/bin/phpunit --group perf) the same workload takes ~530 ms. The composer test-perf script enables JIT (-d opcache.enable_cli=1 -d opcache.jit_buffer_size=64M -d opcache.jit=tracing) so CI matches the production runtime.
  • Three key optimisations beat the budget: (a) subnets indexed by prefix length so containment is applyMaskFast + isset() rather than per-pair Cidr::contains(); (b) ksort on binary keys (one per family) instead of usort with a closure — closure dispatch dominates at 50k entries; (c) parallel hashes (ipText, categoriesByIp, maxScoreByIp) keyed on ip_bin instead of nested [] rows, so the row-merge loop avoids the per-iteration nested-array allocation.
  • MySQL number not yet measured — to be captured separately when the MySQL CI lane is wired up.
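
Optimisation (a) can be illustrated with a small sketch. applyMask, indexSubnets and isCovered are hypothetical names (applyMask stands in for the project's applyMaskFast, whose implementation is not shown here), and addresses are assumed to be 16-byte ip_bin values with unified prefixes.

```php
<?php
declare(strict_types=1);

/** Zero all bits past $prefix in a 16-byte address. */
function applyMask(string $ipBin, int $prefix): string
{
    $fullBytes = intdiv($prefix, 8);
    $remBits   = $prefix % 8;
    $out = substr($ipBin, 0, $fullBytes);
    if ($remBits > 0) {
        $out .= chr(ord($ipBin[$fullBytes]) & ((0xFF << (8 - $remBits)) & 0xFF));
    }
    return str_pad($out, 16, "\x00");
}

/**
 * Index subnets as [unifiedPrefix][networkBin] => true so that a lookup costs
 * one mask + isset() per distinct prefix length instead of one test per subnet.
 *
 * @param list<array{0: string, 1: int}> $subnets pairs of (16-byte network, unified prefix)
 */
function indexSubnets(array $subnets): array
{
    $index = [];
    foreach ($subnets as [$networkBin, $prefix]) {
        $index[$prefix][applyMask($networkBin, $prefix)] = true;
    }
    return $index;
}

/** True when any indexed subnet contains $ipBin. */
function isCovered(array $index, string $ipBin): bool
{
    foreach ($index as $prefix => $networks) {
        if (isset($networks[applyMask($ipBin, $prefix)])) {
            return true;
        }
    }
    return false;
}

$index = indexSubnets([[inet_pton('::ffff:10.0.0.0'), 104]]); // 10.0.0.0/8 in unified form
var_dump(isCovered($index, inet_pton('::ffff:10.9.8.7'))); // bool(true)
var_dump(isCovered($index, inet_pton('::ffff:11.0.0.1'))); // bool(false)
```

The number of distinct prefix lengths is at most 129, so the per-IP cost stays bounded no matter how many subnets are loaded.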

Schema: none — uses the M02 policies and policy_category_thresholds tables as-is.

Test surface added: tests/Unit/Reputation/PolicyEvaluatorTest.php, tests/Integration/Admin/PoliciesControllerTest.php, tests/Integration/Public/BlocklistControllerTest.php, tests/Integration/Reputation/BlocklistBuilderTest.php, tests/Integration/Perf/BlocklistPerfTest.php. Total +28 tests / +95 assertions; perf test excluded from default run via #[Group('perf')]. Suite passes 271 tests / 723 assertions, 0 deprecations.

Acceptance script: ran end-to-end against compose stack. Empty blocklist → 200 with empty body; manual block emits as CIDR; JSON format returns reason="manual"; ETag round-trip returns 304; admin token rejected with 401; preview endpoint returns count + sample for all three seeded policies.

Deviations from SPEC:

  • The migrate container's entrypoint runs Phinx migrations only; SPEC §10 says it should also run seeds. Pre-existing from M01, surfaced again here because M07's acceptance flow depends on the seeded policies. Worked around for the smoke test by running vendor/bin/phinx seed:run against the started container. Flagged for M13 polish (or earlier if another milestone is bitten by it).
  • composer test script now passes --exclude-group perf so the default suite is fast; perf is run via composer test-perf with JIT enabled to match production.
  • The PHPUnit doc-comment @group annotation was switched to the #[Group('perf')] attribute to silence a PHPUnit-12 deprecation warning.

Added dependencies: none.