M07 — Policies & Distribution API

Fresh Claude Code agent prompt. M06 must be complete and committed. Estimated effort: medium.

Mission

Implement policy CRUD, the policy-vs-score evaluator, the public GET /api/v1/blocklist endpoint with caching/ETag/text-and-JSON formats, and a per-policy preview endpoint for the UI. By the end, three different policies produce three different blocklists from identical underlying data, and the endpoint serves 50k entries in <500 ms.

Before you start

  1. Verify M06:

    git log --oneline -6
    cd api && composer test && composer stan && cd ..
    
  2. Read SPEC.md §4 (policies, policy_category_thresholds), §5 (output rule for an IP appearing on a policy's blocklist), §6 (Public API: /api/v1/blocklist; Admin API: policies + preview).

  3. Confirm the seed policies from M02 exist with sensible thresholds.

Tasks

1. Policy domain

In api/src/Domain/Policy/:

  • Policy.php — value object: id, name, description, includeManualBlocks, thresholds: array<int, float> (categoryId => threshold).
  • PolicyEvaluator.php:
    • Constructor takes a Policy and the current CidrEvaluator from M06.
    • evaluate(IpAddress $ip, array $scoresByCategory): EvaluationResult — returns one of: EXCLUDED_BY_ALLOWLIST, INCLUDED_BY_MANUAL_BLOCK, INCLUDED_BY_SCORE (with the matching categories), or EXCLUDED.
    • The score-side rule: an IP is included if any category in the policy meets its threshold. policy_category_thresholds rows define inclusion; absent rows mean "this category is ignored by this policy."
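
The outcome precedence and the score-side rule can be sketched as a pure function. This is only an illustration under stated assumptions: the function name evaluateScores, the boolean flags standing in for the CidrEvaluator lookups, the allowlist-before-manual-block ordering, and reading "meets" as >= are all assumptions to check against SPEC.md §5.

```php
<?php
declare(strict_types=1);

// Outcome names taken from the list above; the constants themselves are hypothetical.
const EXCLUDED_BY_ALLOWLIST = 'EXCLUDED_BY_ALLOWLIST';
const INCLUDED_BY_MANUAL_BLOCK = 'INCLUDED_BY_MANUAL_BLOCK';
const INCLUDED_BY_SCORE = 'INCLUDED_BY_SCORE';
const EXCLUDED = 'EXCLUDED';

/**
 * @param array<int, float> $thresholds       categoryId => threshold (absent = category ignored)
 * @param array<int, float> $scoresByCategory categoryId => current score
 * @return array{outcome: string, categories: list<int>}
 */
function evaluateScores(
    array $thresholds,
    array $scoresByCategory,
    bool $isAllowlisted,
    bool $isManuallyBlocked,
    bool $includeManualBlocks
): array {
    // Assumed precedence: allowlist wins over everything else.
    if ($isAllowlisted) {
        return ['outcome' => EXCLUDED_BY_ALLOWLIST, 'categories' => []];
    }
    if ($includeManualBlocks && $isManuallyBlocked) {
        return ['outcome' => INCLUDED_BY_MANUAL_BLOCK, 'categories' => []];
    }

    // Only categories with a threshold row take part; any single match includes the IP.
    $matched = [];
    foreach ($thresholds as $categoryId => $threshold) {
        if (isset($scoresByCategory[$categoryId]) && $scoresByCategory[$categoryId] >= $threshold) {
            $matched[] = $categoryId;
        }
    }

    return $matched === []
        ? ['outcome' => EXCLUDED, 'categories' => []]
        : ['outcome' => INCLUDED_BY_SCORE, 'categories' => $matched];
}
```

Note that a high score in a category without a threshold row never includes the IP, which is what lets three policies produce three different lists from the same scores.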

In api/src/Infrastructure/Db/PolicyRepository.php:

  • CRUD over policies and policy_category_thresholds (the join is small; load thresholds eagerly with each policy).
  • byName(string): ?Policy, byId(int): ?Policy.
  • Concurrent threshold updates: replace all thresholds for a policy in a single transaction.
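
A minimal sketch of the wholesale threshold replacement, assuming a PDO connection in exception mode. The function name replaceThresholds and the column names policy_id, category_id, and threshold are assumptions; use whatever SPEC.md §4 defines.

```php
<?php
declare(strict_types=1);

/**
 * Replaces every threshold row for one policy atomically.
 *
 * @param array<int, float> $thresholds categoryId => threshold
 */
function replaceThresholds(PDO $pdo, int $policyId, array $thresholds): void
{
    $pdo->beginTransaction();
    try {
        // Column names here are assumed, not taken from the schema.
        $delete = $pdo->prepare('DELETE FROM policy_category_thresholds WHERE policy_id = ?');
        $delete->execute([$policyId]);

        $insert = $pdo->prepare(
            'INSERT INTO policy_category_thresholds (policy_id, category_id, threshold) VALUES (?, ?, ?)'
        );
        foreach ($thresholds as $categoryId => $threshold) {
            $insert->execute([$policyId, $categoryId, $threshold]);
        }

        $pdo->commit();
    } catch (Throwable $e) {
        // Any failure leaves the previous threshold set untouched.
        $pdo->rollBack();
        throw $e;
    }
}
```

Because the delete and the inserts commit together, a concurrent reader sees either the old set or the new set, never a partial mix.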

2. Admin endpoints

In api/src/Application/Admin/PoliciesController.php:

  • GET /api/v1/admin/policies
  • GET /api/v1/admin/policies/{id} — includes thresholds.
  • POST /api/v1/admin/policies — body {name, description, include_manual_blocks, thresholds: {<category_slug>: <number>}}.
  • PATCH /api/v1/admin/policies/{id} — same body shape; replaces thresholds wholesale.
  • DELETE /api/v1/admin/policies/{id} — refuse if any consumer references this policy (409 with {"error":"policy_in_use","consumers":[...]}); cascade is wrong here.
  • GET /api/v1/admin/policies/{id}/preview — returns {count: int, sample: [string], generated_at}. Sample = first 50 entries. Same calculation as the distribution endpoint.

RBAC: Admin for write, Viewer for read.

3. Distribution endpoint

In api/src/Application/Public/BlocklistController.php:

  • GET /api/v1/blocklist — token must be kind=consumer. Resolves the consumer's policy, evaluates, returns the blocklist.
  • Output formats:
    • Default: text/plain. One entry per line. No comments. Lines are bare IPs (203.0.113.42, 2001:db8::1) or CIDRs (203.0.113.0/24, 2001:db8::/32).
    • ?format=json: JSON array of {ip_or_cidr, categories: [string], score: number|null, reason: "scored"|"manual"}. Allowlisted IPs never appear in either format.
  • Headers (both formats):
    • ETag: SHA-256 hex of the response body. Honor If-None-Match: return 304 with an empty body when it matches.
    • X-Blocklist-Generated-At: ISO 8601.
    • X-Blocklist-Entries: count.
    • X-Blocklist-Policy: policy name.
  • Caching: 30-second per-policy in-memory cache (key: policyId). Invalidation triggers: any mutation to policies, policy_category_thresholds, manual_blocks, or allowlist, plus a manual flag from M12's "rebuild scores" trigger. For now, TTL expiry alone is enough; mutation-driven invalidation comes almost for free if you follow the same CidrEvaluator invalidation pattern from M06.
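
The per-policy TTL cache could look roughly like the sketch below. The class name TtlCache, its remember/forget methods, and the injected clock are hypothetical, chosen so expiry can be tested without sleeping. Whether an in-process array survives between requests depends on the runtime (for example a long-lived worker process), which is an assumption about the deployment.

```php
<?php
declare(strict_types=1);

final class TtlCache
{
    /** @var array<int, array{expiresAt: float, value: mixed}> keyed by policy id */
    private array $items = [];

    private float $ttlSeconds;
    private Closure $clock;

    /** @param Closure(): float $clock returns the current time in seconds */
    public function __construct(float $ttlSeconds, Closure $clock)
    {
        $this->ttlSeconds = $ttlSeconds;
        $this->clock = $clock;
    }

    /** Returns the cached value for the policy, rebuilding it once the TTL has elapsed. */
    public function remember(int $policyId, callable $build)
    {
        $now = ($this->clock)();
        $hit = $this->items[$policyId] ?? null;
        if ($hit !== null && $hit['expiresAt'] > $now) {
            return $hit['value'];
        }

        $value = $build();
        $this->items[$policyId] = ['expiresAt' => $now + $this->ttlSeconds, 'value' => $value];

        return $value;
    }

    /** Hook for mutation-driven invalidation of a single policy. */
    public function forget(int $policyId): void
    {
        unset($this->items[$policyId]);
    }
}
```

A controller would call something like `$cache->remember($policy->id, fn () => $builder->build($policy))` and mutation endpoints would call `forget` for the affected policies.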

4. Blocklist computation

In api/src/Domain/Reputation/BlocklistBuilder.php:

  • build(Policy $policy): Blocklist — returns a list of entries with metadata.
  • Algorithm:
    1. Read all ip_scores rows joined to categories where the score meets at least one threshold for this policy. Use either a single SQL query with a UNION across category thresholds or a simpler "select all, filter in PHP" approach if the policy has few categories. Benchmark both on a 50k-row dataset and pick whichever is faster.
    2. Filter out IPs in the allowlist (CidrEvaluator::isAllowlisted).
    3. If include_manual_blocks, append all manual block entries (single IPs and CIDRs), filtering allowlisted ones.
    4. Deduplicate (an IP might be both scored and manually blocked).
    5. Sort: IPv4 first, then IPv6; lexical within each. The order must be deterministic so the ETag is stable.
  • Returns entries with the exact representation needed for both formats.

Blocklist value object: a list of BlocklistEntry { ipOrCidr, isCidr, categories?, score?, reason }.
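
Steps 4 and 5 of the algorithm (deduplicate, then sort) can be sketched on plain strings as follows. The function names cidrContains and compactAndSort are hypothetical, inputs are assumed to be valid addresses and CIDRs, and the nested loop is quadratic, so a real builder at 50k entries would likely want an indexed lookup instead.

```php
<?php
declare(strict_types=1);

/** True when $ip falls inside $cidr; addresses of different families never match. */
function cidrContains(string $cidr, string $ip): bool
{
    [$network, $prefix] = explode('/', $cidr, 2);
    $net = inet_pton($network);
    $addr = inet_pton($ip);
    if ($net === false || $addr === false || strlen($net) !== strlen($addr)) {
        return false;
    }

    $bits = (int) $prefix;
    $fullBytes = intdiv($bits, 8);
    if (substr($net, 0, $fullBytes) !== substr($addr, 0, $fullBytes)) {
        return false;
    }

    $rest = $bits % 8;
    if ($rest === 0) {
        return true;
    }
    // Compare only the leading $rest bits of the first partially covered byte.
    $mask = (0xFF << (8 - $rest)) & 0xFF;

    return (ord($net[$fullBytes]) & $mask) === (ord($addr[$fullBytes]) & $mask);
}

/**
 * @param list<string> $ips   bare addresses (scored or manually blocked)
 * @param list<string> $cidrs manually blocked subnets
 * @return list<string> deduplicated entries, IPv4 first, lexical within each family
 */
function compactAndSort(array $ips, array $cidrs): array
{
    $entries = array_values(array_unique($cidrs));
    foreach (array_unique($ips) as $ip) {
        foreach ($cidrs as $cidr) {
            if (cidrContains($cidr, $ip)) {
                continue 2; // the broader subnet entry already covers this address
            }
        }
        $entries[] = $ip;
    }

    usort($entries, static function (string $a, string $b): int {
        $familyOrder = str_contains($a, ':') <=> str_contains($b, ':');

        return $familyOrder !== 0 ? $familyOrder : strcmp($a, $b);
    });

    return $entries;
}
```

Using strcmp rather than the `<=>` operator on the strings keeps the comparison purely byte-wise, so the same input always yields the same order and therefore the same ETag.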

5. Performance

Add a perf test in api/tests/Integration/Perf/BlocklistPerfTest.php:

  • Seed 50k ip_scores rows (mixed v4 and v6, varied scores) plus 100 manual subnet blocks.
  • Time the blocklist build for the paranoid policy.
  • Assert <500 ms wall-clock.
  • Skip in default test runs (mark @group perf); run in CI as a separate job.

If you can't hit 500 ms, the bottleneck is almost certainly the SQL query. Options:

  • Add a covering index on ip_scores(category_id, score DESC) so threshold-filter scans are cheap.
  • Pre-aggregate per-IP "max score across all categories" into a derived column in ip_scores (mild denormalization). Out of scope unless 500 ms is unreachable; document it if you take this route.
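
When comparing query strategies or index changes, a small timing helper based on the monotonic clock is enough to get wall-clock numbers. This is a sketch; timeMs is a hypothetical helper, and the actual assertion belongs in the PHPUnit perf test.

```php
<?php
declare(strict_types=1);

/**
 * Runs $work once and returns [result, elapsed milliseconds].
 *
 * @return array{0: mixed, 1: float|int}
 */
function timeMs(callable $work): array
{
    $start = hrtime(true); // nanoseconds from a monotonic clock
    $result = $work();
    $elapsedMs = (hrtime(true) - $start) / 1_000_000;

    return [$result, $elapsedMs];
}

// Example usage with a stand-in workload; in the perf test the callable
// would wrap the blocklist build for the paranoid policy.
[$sum, $ms] = timeMs(static fn () => array_sum(range(1, 50000)));
```

Timing only the build call, not the seeding, keeps the measurement comparable between the UNION query and the filter-in-PHP variant.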

Implementation notes

  • Cache vs eviction: the per-policy 30s cache is keyed by policy_id. Memory bound: a deployment with 100 policies × 50k entries × ~50 bytes each needs ~250 MB. Acceptable for the default; flag it in PROGRESS.md as a known footprint.
  • JSON format: keep it small. Don't include audit/timestamp fields per entry; that's what the admin API is for.
  • Empty blocklist: 200 with empty body in text mode, [] in JSON. Still emit ETag.
  • ETag stability: the ETag must depend only on the data, not on time. Don't include generated_at in the body.
  • If-None-Match: parse standard format including weak validators (W/"..."). Strict comparison on the strong hash is fine.
  • Deduplication subtlety: if an IP is in ip_scores AND inside a manually blocked /24, you have two ways to include it (single + subnet). Prefer the broader one (the /24 subnet entry covers the IP); drop the single entry to keep the list compact.
  • Subnet expansion: never expand a /16 to 65k entries. Emit as CIDR.
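
The ETag and If-None-Match notes above can be sketched as two small functions. The names etagFor and ifNoneMatchHits are hypothetical. Wrapping the hash in double quotes follows HTTP entity-tag syntax and is an assumption about how the header should be emitted. Treating W/"..." as a match follows the weak comparison that HTTP defines for If-None-Match. Splitting on commas is a simplification that is safe only because hex hashes never contain commas.

```php
<?php
declare(strict_types=1);

/** Entity-tag derived only from the body, so identical data yields an identical tag. */
function etagFor(string $body): string
{
    return '"' . hash('sha256', $body) . '"';
}

/** True when the If-None-Match header value matches $etag, meaning a 304 is appropriate. */
function ifNoneMatchHits(?string $header, string $etag): bool
{
    if ($header === null || trim($header) === '') {
        return false;
    }
    if (trim($header) === '*') {
        return true; // "*" matches any current representation
    }

    foreach (explode(',', $header) as $candidate) {
        $candidate = trim($candidate);
        if (str_starts_with($candidate, 'W/')) {
            $candidate = substr($candidate, 2); // weak validator: compare the opaque tag only
        }
        if ($candidate === $etag) {
            return true;
        }
    }

    return false;
}
```

An empty blocklist still gets a tag, since etagFor('') is simply the SHA-256 of the empty string. The text and JSON formats have different bodies, so each format carries its own ETag.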

Out of scope (DO NOT)

  • UI changes — M08 onward.
  • Audit emission — M12.
  • Format generators for specific firewalls (iptables, nginx, HAProxy). The text/plain output is universal; per-firewall transformation is a client-side concern, with examples shipped in M13's examples/consumers/.
  • Compression (gzip) — let FrankenPHP/Caddy handle it via standard headers if needed; don't roll your own.
  • Streaming responses — buffered text response is fine at 50k entries.
  • New dependencies.

Acceptance

cd api && composer cs && composer stan && composer test && cd ..
cd api && vendor/bin/phpunit --group perf && cd ..

docker compose down -v
cp .env.example .env
docker compose up -d
sleep 15

ADMIN_TOKEN=$(docker compose exec -T api php bin/console auth:create-token --kind=admin --role=admin --quiet)

# Create a consumer + token (requires a policy_id; use the seeded "moderate")
POLICY_ID=$(curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8081/api/v1/admin/policies \
  | php -r '$j=json_decode(stream_get_contents(STDIN),true); foreach($j["items"] as $p){if($p["name"]==="moderate"){echo $p["id"];break;}}')
CONSUMER=$(curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d "{\"name\":\"firewall-1\",\"description\":\"edge\",\"policy_id\":$POLICY_ID}" \
  http://localhost:8081/api/v1/admin/consumers)
CONSUMER_ID=$(echo "$CONSUMER" | php -r 'echo json_decode(stream_get_contents(STDIN),true)["id"];')
TOKEN_RESP=$(curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d "{\"kind\":\"consumer\",\"consumer_id\":$CONSUMER_ID}" \
  http://localhost:8081/api/v1/admin/tokens)
CONSUMER_TOKEN=$(echo "$TOKEN_RESP" | php -r 'echo json_decode(stream_get_contents(STDIN),true)["raw_token"];')

# Empty blocklist initially
curl -s -H "Authorization: Bearer $CONSUMER_TOKEN" http://localhost:8081/api/v1/blocklist
# -> empty body, 200

# Insert a manual block; blocklist now contains it
curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d '{"kind":"subnet","cidr":"198.51.100.0/24","reason":"x"}' \
  http://localhost:8081/api/v1/admin/manual-blocks > /dev/null
sleep 1
curl -s -H "Authorization: Bearer $CONSUMER_TOKEN" http://localhost:8081/api/v1/blocklist | grep -q "198.51.100.0/24"

# JSON format
curl -s -H "Authorization: Bearer $CONSUMER_TOKEN" \
  "http://localhost:8081/api/v1/blocklist?format=json" | grep -q '"reason":"manual"'

# ETag round-trip
ETAG=$(curl -s -D - -H "Authorization: Bearer $CONSUMER_TOKEN" \
  http://localhost:8081/api/v1/blocklist -o /dev/null | grep -i '^etag:' | cut -d' ' -f2 | tr -d '\r')
test "$(curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $CONSUMER_TOKEN" \
  -H "If-None-Match: $ETAG" http://localhost:8081/api/v1/blocklist)" = "304"

# Three policies, three different counts after seeding scored data
# (Seed at least one IP with a high enough score that paranoid catches it but strict doesn't.)
# Detailed seeding handled by an integration test; here just verify the preview endpoint differs:
for P in strict moderate paranoid; do
  PID=$(curl -s -H "Authorization: Bearer $ADMIN_TOKEN" http://localhost:8081/api/v1/admin/policies \
    | php -r "\$j=json_decode(stream_get_contents(STDIN),true); foreach(\$j['items'] as \$p){if(\$p['name']==='$P'){echo \$p['id'];break;}}")
  curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
    http://localhost:8081/api/v1/admin/policies/$PID/preview
  echo
done

# Token wrong kind: admin can't pull blocklist
test "$(curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8081/api/v1/blocklist)" = "401"

docker compose down -v

Handoff

  1. Commit:

    feat(M07): policies, blocklist distribution endpoint
    
    - policy CRUD with thresholds (replaces wholesale on PATCH)
    - GET /api/v1/blocklist (text + json), ETag with If-None-Match round-trip
    - per-policy 30s cache, invalidated on relevant mutations
    - BlocklistBuilder with allowlist filtering and manual-block dedup
    - perf test: 50k entries < 500ms (sqlite)
    
  2. Append to PROGRESS.md:

    ## M07 — Policies & distribution (done)
    
    **Built:** policy CRUD, blocklist endpoint, preview endpoint, ETag, perf-tested at 50k entries.
    
    **Notes for next milestone:**
    - Per-policy cache TTL = 30s. Mutation endpoints invalidate the cache for affected policies.
    - The text/plain format is universal; firewall-specific consumers transform on their side. Examples land in M13.
    - DELETE on a policy with consumers returns 409 with the consumer list.
    - Performance: SQLite hits the 500ms target with [add measured number]. MySQL [add measured number].
    
    **Deviations from SPEC:** [list any, e.g. additional index added]
    **Added dependencies:** none.
    
  3. Stop. Do not start M08.