# M14 — Security Hardening

> Fresh Claude Code agent prompt. M13 must be complete and committed.
> Estimated effort: medium.

## Mission

Harden both containers: security headers, full brute-force lockout for local admin, audit secret-scrubbing in logs, token entropy verification, backup guidance verification, expired-manual-block cleanup. By the end, a security review checklist passes.

## Before you start

1. Verify M13:
   ```bash
   git log --oneline -13
   ```
2. Read `SPEC.md` §8 (auth, especially CSRF/sessions), §10 (backup notes), §12 M14 (the hardening milestone — your reference).
3. The OWASP Top 10 is a useful mental model for this milestone. Don't treat it as a checklist; use it to ask "did I think about each of these?"

## Tasks

### 1. Security headers

In both `api` and `ui` Caddy configs, add a header bundle on every response:

- `Strict-Transport-Security: max-age=31536000; includeSubDomains` — only when `APP_ENV=production`. Don't HSTS in dev or you'll lock yourself out of localhost.
- `X-Content-Type-Options: nosniff`
- `X-Frame-Options: DENY` (UI) / `X-Frame-Options: SAMEORIGIN` (api)
- `Referrer-Policy: strict-origin-when-cross-origin`
- `Permissions-Policy: geolocation=(), microphone=(), camera=()`
- **CSP for the UI**:
  - `default-src 'self'`
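  - `script-src 'self' 'wasm-unsafe-eval'` (do not add `'unsafe-eval'` by default; the standard Alpine build evaluates attribute expressions with `new Function` and needs it, so prefer Alpine's CSP build and only allow `'unsafe-eval'` if a build dep demands it)
  - `style-src 'self' 'unsafe-inline'` (Tailwind is compiled to a stylesheet, but dynamic elements such as score bars use inline styles)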
  - `img-src 'self' data:` (data: for tiny inline icons)
  - `connect-src 'self' <API_BASE_URL>` if the UI ever does direct browser→api calls (it doesn't today; but htmx might add one)
  - `frame-ancestors 'none'`
  - `base-uri 'self'`
  - `form-action 'self'`
  - Test that the UI doesn't violate its own CSP. Run a browser, check the console, fix any violations by either tightening the page's HTML or relaxing CSP minimally with comment justification.

- **CSP for the api**: very restrictive (`default-src 'none'; frame-ancestors 'none'`) since the api serves only JSON, the OpenAPI viewer, and YAML. The `/api/docs` page does need styles+scripts for RapiDoc/Elements; relax CSP only on that route.
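
When iterating on the header bundle, a small helper that lists which required headers are missing can be quicker to read than a series of `grep` calls. The following is a minimal sketch, not part of the spec; the function name `missing_headers` is hypothetical, and it only checks that a header is present, not that its value is correct.

```bash
# Print each required header name that is absent from a raw header dump.
# Usage: missing_headers "$(curl -sI URL)" Header-One Header-Two ...
# Exit status is 0 only when nothing is missing.
# (missing_headers is a hypothetical helper name, not something the spec defines.)
missing_headers() (
  dump=$1
  shift
  status=0
  for name in "$@"; do
    # Header names are case-insensitive; match "Name:" at the start of a line.
    if ! printf '%s\n' "$dump" | grep -qi "^${name}:"; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  exit "$status"
)
```

For example, `missing_headers "$(curl -sI http://localhost:8080/login)" X-Content-Type-Options X-Frame-Options Content-Security-Policy Referrer-Policy` prints nothing and exits 0 once the UI bundle is complete.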

### 2. Local admin brute-force lockout (full)

Replace M08's basic 5/30s throttle with a progressive lockout (state is held in memory, so it does not survive a restart of the ui container):

- Track failed attempts per `(LOCAL_ADMIN_USERNAME, source_ip)` pair in a small in-memory store (singleton service in the ui container) plus the session.
- Failure progression: 1–4 failures fast retry; 5 failures → 1-minute lockout; 10 → 5-minute lockout; 15+ → 30-minute lockout. Reset the counter on a successful login.
- Key the lockout on the username and the source IP together, not on the username alone, so attackers can't lock out the legitimate admin connecting from another IP.
- Log every failure at WARN, every lockout at ERROR, with the source IP. Don't log the attempted password.
- Document in `doc/auth-flows.md` (update from M13) — including how to clear a lockout (restart the ui container, since this is in-memory; the lockout is intentionally short enough that this is rarely needed).
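
The failure progression above reduces to a small threshold table. The sketch below expresses it in shell purely to pin down the arithmetic for `doc/auth-flows.md`; the real logic lives in the ui container. It assumes a tier stays in force from its threshold until the next one is reached (for example, failures 5 through 9 each map to one minute), which is one reading of the progression, and `lockout_seconds` is a hypothetical name.

```bash
# Map a consecutive-failure count to a lockout duration in seconds.
# Assumption: a tier applies from its threshold until the next threshold is reached.
# (lockout_seconds is a hypothetical name used only for this illustration.)
lockout_seconds() (
  failures=$1
  if [ "$failures" -ge 15 ]; then
    echo 1800   # 30 minutes
  elif [ "$failures" -ge 10 ]; then
    echo 300    # 5 minutes
  elif [ "$failures" -ge 5 ]; then
    echo 60     # 1 minute
  else
    echo 0      # 1-4 failures: retry immediately
  fi
)
```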

### 3. Token entropy verification

In `api/tests/Unit/Auth/`:

- `TokenEntropyTest.php`: generates 1000 tokens and asserts they are all distinct and that each random part carries ≥160 bits (32 base32 chars × 5 bits each).
- Verifies the format `irdb_<3>_<32 base32 chars>`.
- Confirms `random_bytes` (CSPRNG) is the source.
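
Outside PHPUnit, the format and distinctness checks can be approximated from the shell, which is handy when eyeballing tokens printed by the CLI. This is a rough sketch under assumptions: the `<3>` segment is taken to be three lowercase letters (as in `rep`, `con`, `adm`, `svc` from the acceptance script), the alphabet is RFC 4648 base32 (`A-Z2-7`), and both function names are hypothetical.

```bash
# Succeeds when the argument matches irdb_<3 lowercase letters>_<32 base32 chars>.
# Assumption: the kind segment is lowercase and the alphabet is A-Z plus 2-7.
is_token_format() {
  printf '%s\n' "$1" | grep -Eq '^irdb_[a-z]{3}_[A-Z2-7]{32}$'
}

# Succeeds when every line on stdin is unique (no duplicates after sorting).
all_distinct() {
  [ -z "$(sort | uniq -d)" ]
}
```

Passing these checks says nothing about the randomness source; only the PHPUnit test can confirm that `random_bytes` is used.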

### 4. Logs scrubbed of secrets

- Audit all log output paths. Search the codebase for places that might log:
  - Bearer tokens (any `Authorization` header content).
  - `LOCAL_ADMIN_PASSWORD_HASH`.
  - `OIDC_CLIENT_SECRET`.
  - `MAXMIND_LICENSE_KEY`.
  - Database passwords.
- Add a Monolog processor that scrubs known-sensitive keys from the context array before formatting. Pattern:
  ```
  ['authorization' => 'Bearer abc...'] → ['authorization' => 'Bearer ***']
  ```
- Add a test that constructs a log record with a Bearer token in context and asserts the formatted output is scrubbed.
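
To support the audit step, one option is to scan captured logs for the literal values of the secrets listed above. The sketch below assumes secrets live in an env file as unquoted `KEY=value` lines; `leaked_secrets` is a hypothetical helper, and a clean result only shows that the exact values did not appear, not that no derived or encoded form leaked.

```bash
# Report which secret keys from an env file have their value present in a log file.
# Usage: leaked_secrets ENV_FILE LOG_FILE KEY...
# Prints one leaked key name per line; exit status is 1 if any leak was found.
# Assumption: values are stored unquoted as KEY=value (hypothetical helper).
leaked_secrets() (
  env_file=$1
  log_file=$2
  shift 2
  status=0
  for key in "$@"; do
    # Take everything after the first "=" so values containing "=" stay intact.
    value=$(grep "^${key}=" "$env_file" | head -n 1 | cut -d= -f2-)
    [ -n "$value" ] || continue   # unset or empty: nothing to leak
    # -F treats the value as a fixed string, so regex metacharacters are harmless.
    if grep -qF -- "$value" "$log_file"; then
      printf '%s\n' "$key"
      status=1
    fi
  done
  exit "$status"
)
```

A typical run would first save `docker compose logs` to a file, then call `leaked_secrets .env that-file OIDC_CLIENT_SECRET MAXMIND_LICENSE_KEY LOCAL_ADMIN_PASSWORD_HASH`.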

### 5. Expired manual block cleanup

A small loose end from M06: manual blocks have `expires_at` but nothing prunes expired ones. Two approaches:

- **Filter at read time**: every read of `manual_blocks` ignores rows with `expires_at < now`. The CidrEvaluator may already do this; verify, and fix it if not. Pros: zero new infrastructure. Cons: rows accumulate.
- **Add a cleanup job**: register `CleanupExpiredManualBlocksJob` that deletes them daily.

Recommended: do both. Filter at read for correctness, prune in a daily job for tidiness.

If adding a job: register it, add an audit entry per delete, verify with a test.

### 6. Rate limiting beyond the public API

- The current rate limiter applies only to public API endpoints. Add a soft limit to login attempts on the UI (covered by §2 above).
- Consider whether admin endpoints need a limit. Real abuse on admin endpoints is rare (Bearer-authed humans/UI). Leave admin unrated unless you measure a problem.
- Document the rate-limit posture in `doc/api-overview.md` (update from M13).

### 7. Backups

Verify M13's README has clear instructions for:

- **SQLite + Docker volume**: `docker run --rm -v irdb-data:/data -v "$(pwd)":/backup alpine tar czf /backup/irdb-backup.tar.gz -C /data .` (quote `$(pwd)` so paths with spaces work); describe the equivalent restore.
- **MySQL**: `mysqldump` example via `docker compose exec`.
- **Restore**: the inverse, with the api container stopped during restore.
- **What NOT to back up**: rotating tokens (they're recoverable), GeoIP DBs (re-downloadable).

Add to `doc/architecture.md` (update from M13): a "Disaster Recovery" subsection covering the same.
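
The restore is the mirror image of the `tar czf ... -C /data .` backup: extract with `tar xzf` into the same directory. The sketch below shows the pair on plain directories so the round trip can be tried without Docker; the function names are hypothetical, and in the real procedure the same `tar` commands would run inside the `alpine` container against the `irdb-data` volume with the api container stopped.

```bash
# Archive the contents of a directory (mirrors `tar czf ARCHIVE -C /data .`).
# Usage: backup_dir SOURCE_DIR ARCHIVE   (hypothetical helper name)
backup_dir() {
  tar czf "$2" -C "$1" .
}

# Restore an archive into a target directory (the inverse operation).
# Usage: restore_dir ARCHIVE TARGET_DIR  (hypothetical helper name)
# Assumption: the target is empty or may be overwritten; stale files are not removed.
restore_dir() {
  mkdir -p "$2" && tar xzf "$1" -C "$2"
}
```

Keeping the archive outside the directory being archived, as the `/backup` mount does, avoids `tar` trying to include its own output.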

### 8. Secrets at rest verification

- Confirm tokens are never stored in plaintext (M03 work; verify with a manual SQL inspection).
- Confirm no secret values appear in `audit_log.payload`.
- Confirm `/api/v1/admin/config` masks all the secrets it should (M12).
- Add a regression test that scans the schema for any column literally named `password` or containing `_secret` and asserts none store unhashed values (best-effort sanity check).

### 9. Dependency vulnerability scan

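- Add a CI job: `composer audit` (PHP) and `npm audit --omit=dev --audit-level=high` (UI). Fail on critical/high; without `--audit-level`, `npm audit` exits non-zero on findings of any severity.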
- Document the policy: when an audit fails, an admin reviews and either patches or accepts with a documented exception.

### 10. Final security review checklist

Add `doc/security.md` capturing the actual posture: authn, authz, transport, data at rest, secrets management, logging, rate limits, supply chain. Concrete, factual, ≤300 lines. Do **not** make claims you can't back up.

## Implementation notes

- **CSP iteration**: enable in "Report-Only" mode first if you want a faster cycle (`Content-Security-Policy-Report-Only`), check the browser console, then switch to enforcing.
- **HSTS gotcha**: HSTS is sticky in browsers. If you turn it on in dev with `localhost`, you may break local development for yourself. Gate strictly on `APP_ENV=production`.
- **Brute-force lockout vs UX**: too aggressive = legit admins lock themselves out. The 1/5/30 progression is moderate. Don't go to "permanent ban" — the local admin path is a recovery channel, not a daily-use channel.
- **Auditing the auditor**: changes to `audit_log` config (retention, etc.) should themselves be audited. Verify the M12 emitter wraps any settings endpoint that touches audit retention.
- **Don't introduce new attack surface in the name of "hardening"**: e.g., don't add a "lockout-clear" endpoint reachable from the API. Reset is via container restart; that's safer.

## Out of scope (DO NOT)

- WAF rules, IPS integration, fail2ban for the admin UI itself.
- 2FA on local admin. Use OIDC for that.
- mTLS between containers. The Docker network isolation is the trust boundary; documenting that is enough.
- Penetration test report. The agent is not a pentester.
- Encryption at rest of the SQLite file. The volume's host-level disk encryption is the right layer.
- Audit log signing / tamper-evidence. Future work.

## Acceptance

```bash
cd api && composer cs && composer stan && composer test && cd ..
cd ui  && composer cs && composer stan && composer test && cd ..

# composer + npm audit
cd api && composer audit && cd ..
cd ui  && npm ci && npm audit --omit=dev --audit-level=high && cd ..

docker compose down -v
cp .env.example .env
docker compose up -d
sleep 15

# Security headers present on UI
HEADERS=$(curl -sI http://localhost:8080/login)
echo "$HEADERS" | grep -qi "X-Content-Type-Options: nosniff"
echo "$HEADERS" | grep -qi "X-Frame-Options: DENY"
echo "$HEADERS" | grep -qi "Content-Security-Policy:"
echo "$HEADERS" | grep -qi "Referrer-Policy:"

# Headers on API
HEADERS=$(curl -sI http://localhost:8081/healthz)
echo "$HEADERS" | grep -qi "X-Content-Type-Options: nosniff"
echo "$HEADERS" | grep -qi "X-Frame-Options:"

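# In production mode, HSTS appears (manual; skip if not testing prod).
# Setting APP_ENV on the curl command has no effect; set APP_ENV=production
# for the containers, restart the stack, then run:
# curl -sI http://localhost:8080/login | grep -i "Strict-Transport-Security"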

# Local admin lockout: 5 fails should trigger lockout
COOKIE=$(mktemp)
for i in 1 2 3 4 5; do
  CSRF=$(curl -s -c $COOKIE http://localhost:8080/login | grep -oE 'name="csrf_token" value="[^"]+"' | cut -d'"' -f4)
  curl -s -b $COOKIE -c $COOKIE -X POST \
    -d "csrf_token=$CSRF&username=admin&password=WRONG" \
    http://localhost:8080/login/local > /dev/null
done
CSRF=$(curl -s -c $COOKIE http://localhost:8080/login | grep -oE 'name="csrf_token" value="[^"]+"' | cut -d'"' -f4)
RESP=$(curl -s -b $COOKIE -c $COOKIE -X POST \
  -d "csrf_token=$CSRF&username=admin&password=test1234" \
  http://localhost:8080/login/local -L)
echo "$RESP" | grep -qi "locked\|too many\|wait"

# Bearer tokens never appear unmasked in logs
docker compose logs 2>&1 | grep -E "Bearer irdb_(rep|con|adm|svc)_[A-Z2-7]+" && \
  { echo "TOKEN LEAKED IN LOGS"; exit 1; } || true

# Token entropy test passes
cd api && vendor/bin/phpunit --filter TokenEntropyTest && cd ..

# Expired manual block test (insert one with a past expires_at, run cleanup, verify it's gone or filtered)
ADMIN_TOKEN=$(docker compose exec -T api php bin/console auth:create-token --kind=admin --role=admin --quiet)
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2)
curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d '{"kind":"ip","ip":"203.0.113.250","reason":"expired test","expires_at":"2020-01-01T00:00:00Z"}' \
  http://localhost:8081/api/v1/admin/manual-blocks > /dev/null
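# Run cleanup if you added a job; either way the expired block must not be listed.
# (grep -v would succeed on any non-matching line, so assert absence explicitly.)
if curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8081/api/v1/admin/manual-blocks | grep -q "203.0.113.250"; then
  echo "EXPIRED BLOCK STILL LISTED"; exit 1
fi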

# Quick CSP smoke test: load the UI in headless chrome (manual or via puppeteer in CI), no CSP violations
# (omit if no headless browser available; rely on developer manual verification)

docker compose down -v
```

## Handoff

1. Commit:
   ```
   feat(M14): security hardening

   - CSP, HSTS (prod), X-Content-Type-Options, X-Frame-Options, Referrer-Policy
   - local admin brute-force lockout (1/5/30 progression, by user+ip)
   - log scrubbing of Bearer tokens and known secrets via Monolog processor
   - token entropy regression test
   - expired manual block read-time filter + daily cleanup job
   - composer audit + npm audit in CI
   - doc/security.md describing posture; backup/restore in README and architecture.md
   ```

2. Append to `PROGRESS.md`:
   ```markdown
   ## M14 — Hardening (done)

   **Built:** security headers, lockout, log scrubbing, audits, doc/security.md.

   **Production checklist (run before exposing to internet):**
   - APP_ENV=production
   - Real OIDC tenant configured
   - Strong LOCAL_ADMIN_PASSWORD_HASH or LOCAL_ADMIN_ENABLED=false
   - Reverse proxy with TLS in front
   - Backups configured
   - composer audit / npm audit clean
   - Logs piped to your aggregator
   - MAXMIND_LICENSE_KEY set so refresh-geoip works
   - Scheduler running (host cron / systemd / sidecar)

   **Known limitations:**
   - In-process rate limiter and lockout state are per-replica.
   - Audit log is append-only but not tamper-evident; sign+chain is future work.
   - No 2FA on local admin (use OIDC instead).

   **Build complete.** All 14 milestones executed.
   ```

3. **Stop.** Final milestone reached.