
M14 — Security Hardening

Fresh Claude Code agent prompt. M13 must be complete and committed. Estimated effort: medium.

Mission

Harden both containers: security headers, full brute-force lockout for local admin, audit secret-scrubbing in logs, token entropy verification, backup guidance verification, expired-manual-block cleanup. By the end, a security review checklist passes.

Before you start

  1. Verify M13:

    git log --oneline -13
    
  2. Read SPEC.md §8 (auth, especially CSRF/sessions), §10 (backup notes), §12 M14 (the hardening milestone — your reference).

  3. The OWASP top 10 is a useful mental model for this milestone. Don't take it as a checklist; do treat it as "did I think about each of these?"

Tasks

1. Security headers

In both api and ui Caddy configs, add a header bundle on every response:

  • Strict-Transport-Security: max-age=31536000; includeSubDomains — only when APP_ENV=production. Don't HSTS in dev or you'll lock yourself out of localhost.
  • X-Content-Type-Options: nosniff
  • X-Frame-Options: DENY (UI) / X-Frame-Options: SAMEORIGIN (api)
  • Referrer-Policy: strict-origin-when-cross-origin
  • Permissions-Policy: geolocation=(), microphone=(), camera=()
  • CSP for the UI:

    • default-src 'self'
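    • script-src 'self' 'wasm-unsafe-eval' ('wasm-unsafe-eval' only permits WebAssembly compilation; drop it unless a build dep demands it. Alpine's standard build needs 'unsafe-eval' to evaluate expressions, so use Alpine's CSP build rather than allowing it)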
    • style-src 'self' 'unsafe-inline' (Tailwind compiled, but inline styles for dynamic things like score bars)
    • img-src 'self' data: (data: for tiny inline icons)
    • connect-src 'self', plus <API_BASE_URL> only if the UI ever makes direct browser→api calls (it doesn't today, but an htmx feature might add one)
    • frame-ancestors 'none'
    • base-uri 'self'
    • form-action 'self'
    • Test that the UI doesn't violate its own CSP. Run a browser, check the console, fix any violations by either tightening the page's HTML or relaxing CSP minimally with comment justification.
  • CSP for the api: very restrictive (default-src 'none'; frame-ancestors 'none') since the api serves only JSON, the OpenAPI viewer, and YAML. The /api/docs page does need styles+scripts for RapiDoc/Elements; relax CSP only on that route.
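
The header bundle above can be sketched as data plus a small assembly function. This is a minimal Python illustration of the logic only: the real configuration lives in the Caddy configs, and the names build_csp and security_headers, the 'ui'/'api' surface labels, and the idea of passing APP_ENV in as a string are all hypothetical.

```python
from typing import Dict, List, Optional

# Directive values copied from the UI CSP list in this task.
UI_CSP: Dict[str, List[str]] = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "'wasm-unsafe-eval'"],
    "style-src": ["'self'", "'unsafe-inline'"],
    "img-src": ["'self'", "data:"],
    "connect-src": ["'self'"],
    "frame-ancestors": ["'none'"],
    "base-uri": ["'self'"],
    "form-action": ["'self'"],
}

# API policy from the text: nothing allowed by default, no framing.
API_CSP: Dict[str, List[str]] = {
    "default-src": ["'none'"],
    "frame-ancestors": ["'none'"],
}


def build_csp(directives: Dict[str, List[str]],
              api_base_url: Optional[str] = None) -> str:
    """Serialize directives into a Content-Security-Policy header value."""
    merged = {name: list(sources) for name, sources in directives.items()}
    # Only widen connect-src when the browser really calls the api directly.
    if api_base_url and "connect-src" in merged:
        merged["connect-src"].append(api_base_url)
    return "; ".join(f"{name} {' '.join(sources)}"
                     for name, sources in merged.items())


def security_headers(app_env: str, surface: str) -> Dict[str, str]:
    """Return the header bundle for surface 'ui' or 'api' (hypothetical labels)."""
    headers = {
        "X-Content-Type-Options": "nosniff",
        "X-Frame-Options": "DENY" if surface == "ui" else "SAMEORIGIN",
        "Referrer-Policy": "strict-origin-when-cross-origin",
        "Permissions-Policy": "geolocation=(), microphone=(), camera=()",
        "Content-Security-Policy": build_csp(UI_CSP if surface == "ui" else API_CSP),
    }
    # HSTS is sticky in browsers, so it is emitted in production only.
    if app_env == "production":
        headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    return headers
```

Keeping the directives as data makes each relaxation a one-line diff that can carry its justification comment. The /api/docs exception would be a third directive set applied to that route only.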

2. Local admin brute-force lockout (full)

Replace M08's basic 5/30s throttle with a progressive lockout:

  • Track failed attempts per (LOCAL_ADMIN_USERNAME, source_ip) pair in a small in-memory store (singleton service in the ui container) plus the session.
  • Failure progression: 1–4 failures fast retry; 5 failures → 1-minute lockout; 10 → 5-minute lockout; 15+ → 30-minute lockout. Reset the counter on a successful login.
  • Key each lock on the (username, IP) pair, not on the username alone, so an attacker at one IP can't lock out the legitimate admin connecting from another IP.
  • Log every failure at WARN, every lockout at ERROR, with the source IP. Don't log the attempted password.
  • Document in doc/auth-flows.md (update from M13) — including how to clear a lockout (restart the ui container, since this is in-memory; the lockout is intentionally short enough that this is rarely needed).
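
The failure progression can be sketched as a small in-memory tracker. This is a hedged Python illustration, not the ui container's actual service: the class name LockoutTracker and its methods are hypothetical. It reads the tiers as ranges (5 to 9 failures lock for 1 minute, 10 to 14 for 5 minutes, 15 or more for 30 minutes), which is one interpretation of the progression. State is keyed on the (username, source_ip) pair.

```python
import time
from typing import Callable, Dict, Tuple

# (minimum failure count, lockout seconds), checked from the highest tier down.
LOCKOUT_TIERS = [(15, 30 * 60), (10, 5 * 60), (5, 60)]

Key = Tuple[str, str]  # (username, source_ip)


class LockoutTracker:
    """In-memory failure counter; all state is lost on process restart."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._failures: Dict[Key, int] = {}
        self._locked_until: Dict[Key, float] = {}

    def is_locked(self, username: str, ip: str) -> bool:
        until = self._locked_until.get((username, ip))
        return until is not None and self._clock() < until

    def record_failure(self, username: str, ip: str) -> int:
        """Count one failure; return the lockout it triggers in seconds (0 if none)."""
        key = (username, ip)
        count = self._failures.get(key, 0) + 1
        self._failures[key] = count
        for threshold, seconds in LOCKOUT_TIERS:
            if count >= threshold:
                self._locked_until[key] = self._clock() + seconds
                return seconds
        return 0  # failures 1-4: fast retry, no lock

    def record_success(self, username: str, ip: str) -> None:
        """A successful login resets the counter and any lock for this pair."""
        key = (username, ip)
        self._failures.pop(key, None)
        self._locked_until.pop(key, None)
```

The login handler would call is_locked before checking the password, so that a correct password submitted during a lockout is still refused. It would log at WARN when record_failure returns 0 and at ERROR otherwise.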

3. Token entropy verification

In api/tests/Unit/Auth/:

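  • TokenEntropyTest.php generates 1000 tokens and asserts that each random part encodes at least 160 bits (32 base32 chars × 5 bits) and that all 1000 are distinct. Distinctness is a sanity check, not a proof of entropy.
  • Verifies the format irdb_<3-letter kind>_<32 base32 chars>.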
  • Confirms random_bytes (CSPRNG) is the source.
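
The arithmetic behind the format can be sketched as follows. This is a hedged Python illustration, not the project's PHP generator: generate_token is a hypothetical name, the kind prefixes (rep, con, adm, svc) are taken from the log-grep pattern in the acceptance script, and secrets.token_bytes stands in for PHP's random_bytes. Twenty random bytes are 160 bits, and base32 encodes 5 bits per character, so the body is exactly 32 characters with no padding.

```python
import base64
import re
import secrets

# Kinds assumed from the acceptance script's log-grep pattern.
TOKEN_PATTERN = re.compile(r"irdb_(rep|con|adm|svc)_[A-Z2-7]{32}")

RANDOM_BYTES = 20  # 20 bytes * 8 = 160 bits


def generate_token(kind: str) -> str:
    """Build irdb_<kind>_<32 base32 chars> from a CSPRNG."""
    raw = secrets.token_bytes(RANDOM_BYTES)
    # 160 bits / 5 bits per base32 char = 32 chars, so no '=' padding appears.
    body = base64.b32encode(raw).decode("ascii")
    return f"irdb_{kind}_{body}"


def check_tokens(count: int = 1000) -> bool:
    """Mirror the test's assertions: every token well-formed, all distinct."""
    tokens = [generate_token("adm") for _ in range(count)]
    well_formed = all(TOKEN_PATTERN.fullmatch(t) for t in tokens)
    return well_formed and len(set(tokens)) == count
```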

4. Logs scrubbed of secrets

  • Audit all log output paths. Search the codebase for places that might log:
    • Bearer tokens (any Authorization header content).
    • LOCAL_ADMIN_PASSWORD_HASH.
    • OIDC_CLIENT_SECRET.
    • MAXMIND_LICENSE_KEY.
    • Database passwords.
  • Add a Monolog processor that scrubs known-sensitive keys from the context array before formatting. Pattern:

    ['authorization' => 'Bearer abc...'] → ['authorization' => 'Bearer ***']
    
  • Add a test that constructs a log record with a Bearer token in context and asserts the formatted output is scrubbed.
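
The scrubbing rule can be sketched as a pure function over the context array. This is a hedged Python illustration of the logic only; the real implementation would be a Monolog processor in PHP. The function name scrub_context and the lowercase key names in SENSITIVE_KEYS (including db_password) are hypothetical and should be matched to the keys the codebase actually logs.

```python
import re
from typing import Any, Dict

# Context keys whose values are always replaced (names are assumptions).
SENSITIVE_KEYS = {
    "local_admin_password_hash",
    "oidc_client_secret",
    "maxmind_license_key",
    "db_password",
    "password",
}

# Matches "Bearer <anything up to whitespace>" wherever it appears in a string.
BEARER_RE = re.compile(r"(Bearer)\s+\S+", re.IGNORECASE)

MASK = "***"


def scrub_context(context: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of context with known secrets masked."""
    scrubbed: Dict[str, Any] = {}
    for key, value in context.items():
        if key.lower() in SENSITIVE_KEYS:
            scrubbed[key] = MASK  # mask regardless of the value's type
        elif isinstance(value, dict):
            scrubbed[key] = scrub_context(value)  # recurse into nested context
        elif isinstance(value, str):
            scrubbed[key] = BEARER_RE.sub(r"\1 " + MASK, value)
        else:
            scrubbed[key] = value
    return scrubbed
```

Masking by key and by value pattern covers both cases in the audit list: named secrets such as OIDC_CLIENT_SECRET, and Bearer tokens that show up inside arbitrary header or message strings.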

5. Expired manual block cleanup

A small loose end from M06: manual blocks have expires_at but nothing prunes expired ones. Two approaches:

  • Filter at read time: every read of manual_blocks ignores rows with expires_at < now. The CidrEvaluator may already do this; verify, and fix if not. Pros: zero new infrastructure. Cons: rows accumulate.
  • Add a cleanup job: register CleanupExpiredManualBlocksJob that deletes them daily.

Recommended: do both. Filter at read for correctness, prune in a daily job for tidiness.

If adding a job: register it, add an audit entry per delete, verify with a test.
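
Both approaches can be sketched against an in-memory SQLite database. This is a hedged Python illustration, not the project's CleanupExpiredManualBlocksJob. The schema, the audit action name, and the function names are hypothetical. It also assumes that a NULL expires_at means "never expires" and that timestamps are stored as UTC ISO-8601 strings in one fixed format, so that string comparison orders them correctly.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical minimal schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE manual_blocks (
    id INTEGER PRIMARY KEY,
    ip TEXT NOT NULL,
    expires_at TEXT  -- NULL is assumed to mean "never expires"
);
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    action TEXT NOT NULL,
    payload TEXT NOT NULL
);
"""


def active_blocks(conn: sqlite3.Connection, now: str) -> List[Tuple[int, str]]:
    """Read-time filter: expired rows are never returned to callers."""
    return conn.execute(
        "SELECT id, ip FROM manual_blocks "
        "WHERE expires_at IS NULL OR expires_at >= ? ORDER BY id",
        (now,),
    ).fetchall()


def cleanup_expired(conn: sqlite3.Connection, now: str) -> int:
    """Daily prune: delete expired rows, writing one audit entry per delete."""
    expired = conn.execute(
        "SELECT id, ip FROM manual_blocks "
        "WHERE expires_at IS NOT NULL AND expires_at < ?",
        (now,),
    ).fetchall()
    for block_id, ip in expired:
        conn.execute(
            "INSERT INTO audit_log (action, payload) VALUES (?, ?)",
            ("manual_block.expired_delete", f"id={block_id} ip={ip}"),
        )
        conn.execute("DELETE FROM manual_blocks WHERE id = ?", (block_id,))
    conn.commit()
    return len(expired)
```

The read-time filter keeps results correct even if the job has not run yet, which is why doing both is cheap insurance.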

6. Rate limiting beyond the public API

  • The current rate limiter applies only to public API endpoints. Add a soft limit to login attempts on the UI (covered by Task 2 above).
  • Consider whether admin endpoints need a limit. Real abuse on admin endpoints is rare (callers are Bearer-authenticated humans or the UI). Leave admin endpoints without a rate limit unless you measure a problem.
  • Document the rate-limit posture in doc/api-overview.md (update from M13).

7. Backups

Verify M13's README has clear instructions for:

  • SQLite + Docker volume: docker run --rm -v irdb-data:/data -v "$(pwd)":/backup alpine tar czf /backup/irdb-backup.tar.gz -C /data . (and describe the equivalent restore).
  • MySQL: mysqldump example via docker compose exec.
  • Restore: the inverse, with the api container stopped during restore.
  • What NOT to back up: rotating tokens (they can be reissued), GeoIP DBs (re-downloadable).

Add to doc/architecture.md (update from M13): a "Disaster Recovery" subsection covering the same.

8. Secrets at rest verification

  • Confirm tokens are never stored in plaintext (M03 work; verify with a manual SQL inspection).
  • Confirm no secret values appear in audit_log.payload.
  • Confirm /api/v1/admin/config masks all the secrets it should (M12).
  • Add a regression test that scans the schema for any column literally named password or containing _secret and asserts none store unhashed values (best-effort sanity check).
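
The schema scan in the last bullet can be sketched for the SQLite case. This is a hedged Python illustration of the first half of that regression test, finding the suspect columns. The function name suspect_columns is hypothetical, and a MySQL deployment would query information_schema instead. The real test would then sample the values in each reported column and assert that they look hashed or masked.

```python
import re
import sqlite3
from typing import List, Tuple

# "literally named password" or "containing _secret", per the bullet above.
SUSPECT_COLUMN = re.compile(r"^password$|_secret", re.IGNORECASE)


def suspect_columns(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return (table, column) pairs whose names suggest stored secrets."""
    found: List[Tuple[str, str]] = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            column = row[1]
            if SUSPECT_COLUMN.search(column):
                found.append((table, column))
    return found
```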

9. Dependency vulnerability scan

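  • Add a CI job: composer audit (PHP) and npm audit --omit=dev --audit-level=high (UI). Fail on critical/high; without --audit-level, npm audit exits non-zero on a vulnerability of any severity.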
  • Document the policy: when an audit fails, an admin reviews and either patches or accepts with a documented exception.
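
If the CI job needs to apply the critical/high threshold itself, for example to print a summary before failing, the gate can be sketched as below. This is a hedged Python illustration. It assumes the report shape emitted by npm audit --json in npm 7 and later, where metadata.vulnerabilities holds per-severity counts; check that against the npm version in use. The function name blocking_count is hypothetical.

```python
import json

# Severities that fail the build under the stated policy.
BLOCKING_SEVERITIES = ("high", "critical")


def blocking_count(report_json: str) -> int:
    """Count high and critical findings in an `npm audit --json` report."""
    report = json.loads(report_json)
    counts = report.get("metadata", {}).get("vulnerabilities", {})
    return sum(int(counts.get(level, 0)) for level in BLOCKING_SEVERITIES)
```

A CI step could pipe the report into a script that exits non-zero when blocking_count is positive. Documented exceptions would be handled before that point, not by loosening the threshold.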

10. Final security review checklist

Add doc/security.md capturing the actual posture: authn, authz, transport, data at rest, secrets management, logging, rate limits, supply chain. Concrete, factual, ≤300 lines. Do not make claims you can't back up.

Implementation notes

  • CSP iteration: enable in "Report-Only" mode first if you want a faster cycle (Content-Security-Policy-Report-Only), check the browser console, then switch to enforcing.
  • HSTS gotcha: HSTS is sticky in browsers. If you turn it on in dev with localhost, you may break local development for yourself. Gate strictly on APP_ENV=production.
  • Brute-force lockout vs UX: too aggressive = legit admins lock themselves out. The 1/5/30 progression is moderate. Don't go to "permanent ban" — the local admin path is a recovery channel, not a daily-use channel.
  • Auditing the auditor: changes to audit_log config (retention, etc.) should themselves be audited. Verify the M12 emitter wraps any settings endpoint that touches audit retention.
  • Don't introduce new attack surface in the name of "hardening": e.g., don't add a "lockout-clear" endpoint reachable from the API. Reset is via container restart; that's safer.

Out of scope (DO NOT)

  • WAF rules, IPS integration, fail2ban for the admin UI itself.
  • 2FA on local admin. Use OIDC for that.
  • mTLS between containers. The Docker network isolation is the trust boundary; documenting that is enough.
  • Penetration test report. The agent is not a pentester.
  • Encryption at rest of the SQLite file. The volume's host-level disk encryption is the right layer.
  • Audit log signing / tamper-evidence. Future work.

Acceptance

cd api && composer cs && composer stan && composer test && cd ..
cd ui  && composer cs && composer stan && composer test && cd ..

# composer + npm audit
cd api && composer audit && cd ..
cd ui  && npm ci && npm audit --omit=dev --audit-level=high && cd ..

docker compose down -v
cp .env.example .env
docker compose up -d
sleep 15

# Security headers present on UI
HEADERS=$(curl -sI http://localhost:8080/login)
echo "$HEADERS" | grep -qi "X-Content-Type-Options: nosniff"
echo "$HEADERS" | grep -qi "X-Frame-Options: DENY"
echo "$HEADERS" | grep -qi "Content-Security-Policy:"
echo "$HEADERS" | grep -qi "Referrer-Policy:"

# Headers on API
HEADERS=$(curl -sI http://localhost:8081/healthz)
echo "$HEADERS" | grep -qi "X-Content-Type-Options: nosniff"
echo "$HEADERS" | grep -qi "X-Frame-Options:"

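# In production mode, HSTS appears (skip if not testing prod).
# Manual: APP_ENV must be set on the containers, not on curl. Bring the stack up with
# APP_ENV=production, then check curl -sI ... for Strict-Transport-Security.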

# Local admin lockout: 5 fails should trigger lockout
COOKIE=$(mktemp)
for i in 1 2 3 4 5; do
  CSRF=$(curl -s -c $COOKIE http://localhost:8080/login | grep -oE 'name="csrf_token" value="[^"]+"' | cut -d'"' -f4)
  curl -s -b $COOKIE -c $COOKIE -X POST \
    -d "csrf_token=$CSRF&username=admin&password=WRONG" \
    http://localhost:8080/login/local > /dev/null
done
CSRF=$(curl -s -c $COOKIE http://localhost:8080/login | grep -oE 'name="csrf_token" value="[^"]+"' | cut -d'"' -f4)
RESP=$(curl -s -b $COOKIE -c $COOKIE -X POST \
  -d "csrf_token=$CSRF&username=admin&password=test1234" \
  http://localhost:8080/login/local -L)
echo "$RESP" | grep -qi "locked\|too many\|wait"

# Bearer tokens never appear unmasked in logs
docker compose logs 2>&1 | grep -E "Bearer irdb_(rep|con|adm|svc)_[A-Z2-7]+" && \
  { echo "TOKEN LEAKED IN LOGS"; exit 1; } || true

# Token entropy test passes
cd api && vendor/bin/phpunit --filter TokenEntropyTest && cd ..

# Expired manual block test (insert one with a past expires_at, run cleanup, verify it's gone or filtered)
ADMIN_TOKEN=$(docker compose exec -T api php bin/console auth:create-token --kind=admin --role=admin --quiet)
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2)
curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d '{"kind":"ip","ip":"203.0.113.250","reason":"expired test","expires_at":"2020-01-01T00:00:00Z"}' \
  http://localhost:8081/api/v1/admin/manual-blocks > /dev/null
# Run cleanup if you added a job; otherwise just verify the read-time filter:
curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8081/api/v1/admin/manual-blocks | grep -q "203.0.113.250" && \
  { echo "EXPIRED MANUAL BLOCK STILL LISTED"; exit 1; } || true

# Quick CSP smoke test: load the UI in headless chrome (manual or via puppeteer in CI), no CSP violations
# (omit if no headless browser available; rely on developer manual verification)

docker compose down -v

Handoff

  1. Commit:

    feat(M14): security hardening
    
    - CSP, HSTS (prod), X-Content-Type-Options, X-Frame-Options, Referrer-Policy
    - local admin brute-force lockout (1/5/30 progression, by user+ip)
    - log scrubbing of Bearer tokens and known secrets via Monolog processor
    - token entropy regression test
    - expired manual block read-time filter + daily cleanup job
    - composer audit + npm audit in CI
    - doc/security.md describing posture; backup/restore in README and architecture.md
    
  2. Append to PROGRESS.md:

    ## M14 — Hardening (done)
    
    **Built:** security headers, lockout, log scrubbing, audits, doc/security.md.
    
    **Production checklist (run before exposing to internet):**
    - APP_ENV=production
    - Real OIDC tenant configured
    - Strong LOCAL_ADMIN_PASSWORD_HASH or LOCAL_ADMIN_ENABLED=false
    - Reverse proxy with TLS in front
    - Backups configured
    - composer audit / npm audit clean
    - Logs piped to your aggregator
    - MAXMIND_LICENSE_KEY set so refresh-geoip works
    - Scheduler running (host cron / systemd / sidecar)
    
    **Known limitations:**
    - In-process rate limiter and lockout state are per-replica.
    - Audit log is append-only but not tamper-evident; sign+chain is future work.
    - No 2FA on local admin (use OIDC instead).
    
    **Build complete.** All 14 milestones executed.
    
  3. Stop. Final milestone reached.