M11 — GeoIP / ASN Enrichment

Fresh Claude Code agent prompt. M07 must be complete. M08–M10 are not strictly required, but finishing them first is the recommended order, and the UI and search-filter tasks below build on M09. Estimated effort: small to medium.

Mission

Wire up MaxMind GeoLite2 enrichment: a wrapper service, a working enrich-pending job (replacing the M05 skeleton), the refresh-geoip job (replacing the M05 stub that returned 412), and UI display of country flag and ASN on the IP detail page.

Before you start

  1. Verify previous milestones (especially M05, M07, M09):

    git log --oneline -10
    cd api && composer test && cd ..
    
  2. Read SPEC.md §2 (GeoIP/ASN section), §4 (ip_enrichment table), §6 (refresh-geoip and enrich-pending job endpoints), §10 (where the DBs live; /data/geoip/), §15 (note out-of-scope items).

  3. Decide whether to test with a real MaxMind license. If not, use small fixture .mmdb files committed to the repo for tests. The maxmind-db/reader library (MaxMind-DB-Reader-php, which geoip2/geoip2 builds on) can read fixtures.

Tasks

1. MaxMind wrapper

In api/src/Domain/Enrichment/:

  • EnrichmentResult.php — value object: countryCode: ?string, asn: ?int, asOrg: ?string, enrichedAt: DateTimeImmutable.
  • EnrichmentService.php interface: enrich(IpAddress $ip): EnrichmentResult.
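
As a rough illustration of the value object described above, a minimal sketch follows. It assumes PHP 8.1+ for readonly properties. The `unknown()` and `isEmpty()` helpers are hypothetical conveniences and are not named in SPEC.md.

```php
<?php
declare(strict_types=1);

// Sketch of the value object. unknown() and isEmpty() are illustrative
// additions, not part of the spec.
final class EnrichmentResult
{
    public function __construct(
        public readonly ?string $countryCode,
        public readonly ?int $asn,
        public readonly ?string $asOrg,
        public readonly DateTimeImmutable $enrichedAt,
    ) {
    }

    // Result for an IP the databases know nothing about, or when no DB is loaded.
    public static function unknown(?DateTimeImmutable $now = null): self
    {
        return new self(null, null, null, $now ?? new DateTimeImmutable());
    }

    // True when every lookup field is null.
    public function isEmpty(): bool
    {
        return $this->countryCode === null
            && $this->asn === null
            && $this->asOrg === null;
    }
}
```

A helper like `isEmpty()` gives the job a cheap way to recognise all-null results without inspecting each field.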

In api/src/Infrastructure/Enrichment/:

  • MaxMindEnrichmentService.php — implements the interface using geoip2/geoip2. Accepts paths to two .mmdb files (Country and ASN). Lazy-loads the readers; if a file is missing, log a warning once and return a result with all-null fields. Add geoip2/geoip2 to api/composer.json if it isn't already (allowed; SPEC §2 names MaxMind).
  • EnrichmentRepository.php:
    • find(string $ipBin): ?EnrichmentRow
    • upsert(string $ipBin, EnrichmentResult $result): void
    • findPending(int $limit): array<string> — returns ip_bin values that exist in reports or manual_blocks but not in ip_enrichment. Order by MIN(received_at) so older entries get caught up first.
    • Used by the job and by the admin endpoint GET /api/v1/admin/ips/{ip} (already returning the field, was null until now).
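
One possible shape for the findPending query is sketched below using plain PDO. Only `ip_bin`, `received_at`, `enriched_at` and the three table names come from this document. The `created_at` column on manual_blocks is an assumption, so check SPEC.md §4 for the real schema. The extra `enriched_at IS NULL` condition is there so that rows reset by `?reenrich=true` are picked up again.

```php
<?php
declare(strict_types=1);

// Sketch of findPending(). manual_blocks.created_at is an assumed column name.
function findPending(PDO $pdo, int $limit): array
{
    $sql = <<<'SQL'
        SELECT c.ip_bin
        FROM (
            SELECT ip_bin, received_at AS seen_at FROM reports
            UNION ALL
            SELECT ip_bin, created_at AS seen_at FROM manual_blocks
        ) AS c
        LEFT JOIN ip_enrichment AS e ON e.ip_bin = c.ip_bin
        WHERE e.ip_bin IS NULL          -- never enriched
           OR e.enriched_at IS NULL     -- reset for re-enrichment
        GROUP BY c.ip_bin
        ORDER BY MIN(c.seen_at) ASC     -- oldest first
        LIMIT :limit
        SQL;

    $stmt = $pdo->prepare($sql);
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->execute();

    // One ip_bin value per pending address.
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

Binding the limit with `PDO::PARAM_INT` avoids drivers quoting it as a string.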

2. enrich-pending job — full implementation

Replace the skeleton in api/src/Application/Jobs/EnrichPendingJob.php:

  • Pulls a batch from EnrichmentRepository::findPending(limit=200).
  • For each ip: calls EnrichmentService::enrich, upserts the result.
  • If the MaxMind DBs aren't present (e.g. MAXMIND_LICENSE_KEY never set, no fallback .mmdbs):
    • The service returns all-null results. Don't store them: findPending excludes any IP that already has a row, so these would become poison rows that are never retried. Instead, log a single warning per job run and exit cleanly with items_processed=0.
  • Default interval: 300s. Max runtime: 60s.
  • Idempotent: if an IP is already enriched, skip it (the findPending query already excludes them).
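
The control flow above might look roughly like the following sketch. Every parameter is a hypothetical stand-in (plain callables rather than the real EnrichmentService and EnrichmentRepository), so treat it as an outline of the early exit and the runtime budget rather than the job's actual signature.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of one enrich-pending run. All parameters are hypothetical stand-ins:
 *   $pending    list of ip_bin values from findPending()
 *   $dbsPresent whether both .mmdb files are readable
 *   $enrich     fn(string $ipBin): array   lookup result for one IP
 *   $upsert     fn(string $ipBin, array $result): void
 *   $warn       fn(string $message): void
 * Returns the number of items processed.
 */
function runEnrichPending(
    array $pending,
    bool $dbsPresent,
    callable $enrich,
    callable $upsert,
    callable $warn,
    float $maxRuntimeSeconds = 60.0,
): int {
    if (!$dbsPresent) {
        // One warning per run and no rows written.
        $warn('GeoIP databases missing; skipping enrichment');
        return 0;
    }

    $deadline = microtime(true) + $maxRuntimeSeconds;
    $processed = 0;

    foreach ($pending as $ipBin) {
        if (microtime(true) >= $deadline) {
            break; // leave the rest for the next tick
        }
        $upsert($ipBin, $enrich($ipBin));
        $processed++;
    }

    return $processed;
}
```

Checking for the databases once, before the loop, is what keeps the warning to a single line per run.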

3. refresh-geoip job — full implementation

Replace the stub in api/src/Application/Jobs/RefreshGeoipJob.php:

  • If MAXMIND_LICENSE_KEY is empty: return 412 Precondition Failed from the HTTP handler with {"error":"no_license_key"}. The job itself shouldn't be invoked — the controller short-circuits.
  • Otherwise:
    • Download GeoLite2-Country.tar.gz and GeoLite2-ASN.tar.gz from MaxMind's permalink endpoint using HTTPS + license key.
    • Verify the tarball's SHA-256 against the matching .sha256 URL.
    • Extract to a temp dir.
    • Atomic-replace the existing .mmdb files at GEOIP_COUNTRY_DB and GEOIP_ASN_DB. Use rename within the same filesystem.
    • Reload the in-process readers (clear any cached singleton).
  • Default interval: 7 days (JOB_GEOIP_REFRESH_INTERVAL_DAYS). Max runtime: 5 minutes.
  • On HTTP/network failure: write a failure run entry, log clearly, don't leave partial files.
  • Use Guzzle (already in api deps).
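
The verify and atomic-replace steps can be sketched with PHP built-ins alone. The sketch assumes the .sha256 response carries the hex digest as its first whitespace-separated token. Both function names are hypothetical. `checksumMatches()` would run against the downloaded tarball, and `atomicInstall()` against each extracted .mmdb file.

```php
<?php
declare(strict_types=1);

// Returns true when $file hashes to the digest in $sha256Body. Assumption:
// the digest is the first whitespace-separated token of the .sha256 response.
function checksumMatches(string $file, string $sha256Body): bool
{
    $parts = preg_split('/\s+/', trim($sha256Body));
    $expected = strtolower($parts[0] ?? '');
    $actual = hash_file('sha256', $file);

    return $actual !== false && $expected !== '' && hash_equals($expected, $actual);
}

// Copies $source next to $target under a temporary name, then renames it into
// place. rename() within one filesystem replaces the target in a single step,
// so readers see either the old file or the new one.
function atomicInstall(string $source, string $target): void
{
    $tmp = tempnam(dirname($target), 'mmdb');
    if ($tmp === false) {
        throw new RuntimeException('could not create temp file');
    }
    chmod($tmp, 0644); // tempnam() creates the file with mode 0600

    if (!copy($source, $tmp) || !rename($tmp, $target)) {
        @unlink($tmp); // do not leave a partial file behind
        throw new RuntimeException('could not install ' . basename($target));
    }
}
```

Creating the temporary file in the target's own directory matters here: if the directory is missing, `tempnam()` falls back to the system temp directory, and a rename across filesystems is no longer atomic.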

4. UI: IP detail enrichment panel

The endpoint GET /api/v1/admin/ips/{ip} already returns the enrichment block; from M09 the field is null. After this milestone the data fills in.

Update ui/resources/views/pages/ips/detail.twig:

  • If enrichment.country_code is null, show "Unknown" greyed out.
  • Otherwise show the country flag (Unicode regional indicator) + country name (use a small mapping or a JSON lookup table).
  • ASN: show as AS{asn} {as_org}, link to bgp.he.net or similar (target=_blank, rel=noopener) — optional but nice.
  • Add enriched_at as a small timestamp footer ("Enriched 4 hours ago").
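
The flag and ASN formatting can be done in small helpers, sketched below. The function names are hypothetical, and whether they end up as Twig filters or as fields precomputed by the controller is left open. The flag is built from the two Unicode regional indicator symbols that correspond to the letters of the ISO code.

```php
<?php
declare(strict_types=1);

// Maps an ISO 3166-1 alpha-2 code to its flag emoji, or null for bad input.
// Regional indicators U+1F1E6..U+1F1FF encode in UTF-8 as F0 9F 87 A6..BF,
// so each letter only needs an offset on the last byte.
function countryFlag(?string $countryCode): ?string
{
    if ($countryCode === null || !preg_match('/\A[A-Za-z]{2}\z/', $countryCode)) {
        return null;
    }
    $flag = '';
    foreach (str_split(strtoupper($countryCode)) as $letter) {
        $flag .= "\xF0\x9F\x87" . chr(0xA6 + ord($letter) - ord('A'));
    }
    return $flag;
}

// Formats the ASN line as "AS{asn} {as_org}"; null when no ASN is known.
function asnLabel(?int $asn, ?string $asOrg): ?string
{
    if ($asn === null) {
        return null;
    }
    return trim(sprintf('AS%d %s', $asn, $asOrg ?? ''));
}
```

Returning null for missing or malformed input lets the template fall through to the greyed-out "Unknown" case.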

5. Search filters

The IPs list page already accepts country and asn filters from M09. They should now actually filter results: the API joins ip_enrichment into the search query. Add a simple country dropdown populated from the set of countries seen so far (either via one extra endpoint or computed on the fly).

6. Update healthz

/healthz on api now reports GeoIP DB status:

{
  "status": "ok",
  "db": {"connected": true, "driver": "sqlite"},
  "geoip": {
    "country_db_present": true,
    "asn_db_present": true,
    "country_db_modified": "2026-04-20T...",
    "asn_db_modified": "2026-04-20T..."
  }
}

Missing DBs don't make /healthz unhealthy (the system still works without enrichment). Just report the state.
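
A sketch of how the geoip block could be assembled is shown below. The function name is hypothetical. Two details are assumptions, since the sample above elides them: the timestamps are rendered as ISO 8601 in UTC, and the `*_modified` fields are null when a file is absent.

```php
<?php
declare(strict_types=1);

// Builds the "geoip" block for /healthz. Assumption: the *_modified fields
// are null when the file is absent.
function geoipHealth(string $countryDb, string $asnDb): array
{
    $modified = static function (string $path): ?string {
        clearstatcache(true, $path); // avoid a stale mtime after a refresh
        $mtime = is_file($path) ? filemtime($path) : false;
        return $mtime === false ? null : gmdate('Y-m-d\TH:i:s\Z', $mtime);
    };

    return [
        'country_db_present' => is_file($countryDb),
        'asn_db_present' => is_file($asnDb),
        'country_db_modified' => $modified($countryDb),
        'asn_db_modified' => $modified($asnDb),
    ];
}
```

The function only reports state and never throws for a missing file, which matches the rule that absent DBs do not make /healthz unhealthy.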

Implementation notes

  • Build-time vs runtime DBs: The Dockerfile may bake DBs in at build time if MAXMIND_LICENSE_KEY is set as a build arg; otherwise they're absent until refresh-geoip runs. Either way, the runtime path is /data/geoip/. The Dockerfile copies build-time DBs into /data/geoip/ if present.
  • License key handling: never log it. Don't include it in error messages or job_runs.details. Mask in any echoed config.
  • Atomic file replace: tempnam() in /data/geoip/, write the new file, rename() to the target. Avoid leaving partials if the process crashes.
  • MaxMind library: use geoip2/geoip2. Don't roll your own .mmdb parser. Don't use a service that calls back to MaxMind on every lookup — the local DB is the point.
  • IPv6: the same DBs cover both families. Verify with a v6 lookup test.
  • Large batches: 200 per tick is a safe default. Each lookup is fast; 200 takes well under a second.
  • Tests: ship two small fixture .mmdb files (the geoip2/geoip2 test fixtures are publicly licensed and small; you can vendor them in api/tests/Fixtures/geoip/). Use them in unit tests.
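
For the license key note above, one way to keep the key out of logs and job_runs.details is to pass every outgoing message through a redaction helper. The sketch below is illustrative only: the function name is hypothetical, and the `license_key` query-parameter name is an assumption about how the download URL carries the key.

```php
<?php
declare(strict_types=1);

// Removes the license key from text headed for logs or job_runs.details.
// The "license_key" query-parameter name is an assumption.
function redactLicenseKey(string $text, string $licenseKey): string
{
    if ($licenseKey !== '') {
        $text = str_replace($licenseKey, '***', $text);
    }
    // Also mask any license_key=... parameter, in case a different key leaked.
    return preg_replace('/(license_key=)[^&\s"\']+/i', '$1***', $text) ?? $text;
}
```

Exception messages from HTTP clients often embed the full request URL, so applying the helper to those before logging is the case most worth covering.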

Out of scope (DO NOT)

  • Other enrichment sources (Spamhaus, IPInfo, AbuseIPDB). MaxMind only.
  • Per-request enrichment lookups in the report endpoint. Enrichment is a background concern.
  • Reverse-DNS / WHOIS enrichment.
  • Auditing the enrichment job (M12 owns audit emission generally; this job logs to its job_runs row).
  • New API endpoints beyond what's listed.
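  • Mass re-enrichment of all IPs on every refresh-geoip run. A new DB leaves existing rows untouched. Add a ?reenrich=true flag to refresh-geoip that, if true, also nulls enriched_at on existing rows so findPending picks them up again, and run that only on explicit request. Note that findPending as defined in task 1 only returns IPs with no ip_enrichment row, so it must also treat rows with a NULL enriched_at as pending for the flag to have any effect.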

Acceptance

cd api && composer cs && composer stan && composer test && cd ..

docker compose down -v
cp .env.example .env
# DO NOT set MAXMIND_LICENSE_KEY for the first part of the test
docker compose up -d
sleep 15

ADMIN_TOKEN=$(docker compose exec -T api php bin/console auth:create-token --kind=admin --role=admin --quiet)
# -f2- keeps everything after the first '=', in case the token itself contains '='
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2-)

# Without DBs / license key: refresh-geoip returns 412
test "$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer $INTERNAL_TOKEN" \
  -X POST http://localhost:8081/internal/jobs/refresh-geoip)" = "412"

# enrich-pending no-ops cleanly when DBs are missing
RESP=$(curl -s -X POST -H "Authorization: Bearer $INTERNAL_TOKEN" \
  http://localhost:8081/internal/jobs/enrich-pending)
echo "$RESP" | grep -q '"status":"success"'
echo "$RESP" | grep -q '"items_processed":0'

# /healthz reports geoip status
curl -s http://localhost:8081/healthz | grep -q '"country_db_present":false'

# With fixture DBs present (copy them into the volume)
docker compose cp api/tests/Fixtures/geoip/. api:/data/geoip/
# Submit a report for an IP that's in the fixture
RID=$(curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d '{"name":"test","trust_weight":1.0}' \
  http://localhost:8081/api/v1/admin/reporters | php -r 'echo json_decode(stream_get_contents(STDIN),true)["id"];')
RT=$(curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d "{\"kind\":\"reporter\",\"reporter_id\":$RID}" \
  http://localhost:8081/api/v1/admin/tokens | php -r 'echo json_decode(stream_get_contents(STDIN),true)["raw_token"];')
curl -s -X POST -H "Authorization: Bearer $RT" -H "Content-Type: application/json" \
  -d '{"ip":"81.2.69.142","category":"brute_force"}' \
  http://localhost:8081/api/v1/report > /dev/null

# Run enrichment
curl -s -X POST -H "Authorization: Bearer $INTERNAL_TOKEN" \
  http://localhost:8081/internal/jobs/enrich-pending | grep -q '"items_processed":1'

# IP detail returns enrichment fields populated
curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8081/api/v1/admin/ips/81.2.69.142 | grep -qE '"country_code":"(GB|US)"'

# /healthz reflects DB presence
curl -s http://localhost:8081/healthz | grep -q '"country_db_present":true'

docker compose down -v

Handoff

  1. Commit:

    feat(M11): MaxMind GeoLite2 enrichment
    
    - EnrichmentService backed by geoip2/geoip2
    - enrich-pending job (replaces M05 skeleton): 200 per tick, no-ops cleanly without DBs
    - refresh-geoip job: download + verify + atomic replace, 412 without license key
    - IP detail UI shows country flag + ASN (graceful when null)
    - /healthz reports geoip db status
    - country/asn filters on IPs list now functional
    
  2. Append to PROGRESS.md:

    ## M11 — Enrichment (done)
    
    **Built:** GeoIP wrapper, both jobs, UI display, healthz fields.
    
    **Notes for next milestone:**
    - DBs live at /data/geoip/. Without MAXMIND_LICENSE_KEY there is no download path, so the .mmdb files must be supplied manually (mounted or copied into /data/geoip/).
    - License key never logged.
    - Re-enrichment is opt-in via ?reenrich=true on refresh-geoip.
    
    **Deviations from SPEC:** none.
    **Added dependencies:** geoip2/geoip2 (mentioned in SPEC §2 as the planned library).
    
  3. Stop. Do not start M12.