M11 — GeoIP / ASN Enrichment

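Fresh Claude Code agent prompt. M07 must be complete. M09 is effectively assumed as well, because Tasks 1, 4, and 5 build on its IpEnrichmentRepository, IP detail endpoint, and list filters. M08 and M10 are not strictly required, but following the milestone order is recommended. Estimated effort: small to medium.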

Mission

Wire up MMDB-based GeoIP/ASN enrichment with three pluggable providers — DB-IP Lite (default, no auth required), MaxMind GeoLite2 (opt-in, license key), IPinfo Lite (opt-in, token). Build a single lookup wrapper, a working enrich-pending job (replacing the M05 skeleton), the refresh-geoip job (replacing the M05 stub that returned 412), and UI display of country flag and ASN on the IP detail page.

The provider abstraction is intentionally narrow: only the download path forks per provider. The on-disk format (MMDB) and the lookup path are common.

Before you start

  1. Verify previous milestones (especially M05, M07, M09):

    git log --oneline -10
    cd api && composer test && cd ..
    
  2. Read SPEC.md §2 (GeoIP/ASN section), §4 (ip_enrichment table), §6 (refresh-geoip and enrich-pending job endpoints), §10 (where the DBs live; /data/geoip/), §15 (note out-of-scope items).

  3. Pick a provider for development. All three speak MMDB; the lookup code does not care which is on disk. The default for fresh installs is DB-IP because it needs no credentials.

| Provider | Auth | License | Update cadence | Compression | Integrity check published | Attribution required |
|---|---|---|---|---|---|---|
| DB-IP Lite (default) | none | CC BY 4.0 | monthly (1st) | .mmdb.gz (single file) | no | yes: "IP Geolocation by DB-IP" |
| MaxMind GeoLite2 (opt-in) | license key | MaxMind EULA, free tier | twice weekly | .tar.gz (directory) | yes: .sha256 companion | no |
| IPinfo Lite (opt-in) | token | IPinfo TOS, free tier | weekly | .mmdb (uncompressed) | no | yes: "powered by IPinfo" |

  4. Test fixtures live in api/tests/Fixtures/geoip/ and are committed to the repo. They use the public GeoLite2-City-Test.mmdb / GeoLite2-ASN-Test.mmdb style fixtures from the maxmind/MaxMind-DB repo (Apache-2.0, vendorable). They cover IP 81.2.69.142 (GB) and a small IPv6 set. Acceptance does not depend on a real provider being reachable.

Tasks

1. MMDB wrapper

In api/src/Domain/Enrichment/:

  • EnrichmentResult.php — value object: countryCode: ?string, asn: ?int, asOrg: ?string, enrichedAt: DateTimeImmutable.
  • EnrichmentService.php interface: enrich(IpAddress $ip): EnrichmentResult.

In api/src/Infrastructure/Enrichment/:

  • MmdbEnrichmentService.php — implements EnrichmentService against any MMDB file. Accepts paths to two .mmdb files (Country and ASN) plus a RecordAdapter keyed on the configured provider. Lazy-loads readers; if a file is missing or unreadable, log a warning once per process lifetime and return an all-null result.
    • Use MaxMind\Db\Reader::get($ip) directly (the lower-level open-format reader; ships as a transitive dep of geoip2/geoip2). Avoid the higher-level GeoIp2\Database\Reader::country() accessor: it is MaxMind-shape-specific and breaks on IPinfo's flat record schema.
    • Add geoip2/geoip2 to api/composer.json (allowed; SPEC §2 names MaxMind, and the package is the canonical PHP MMDB reader).
  • RecordAdapter.php: small interface with extractCountryCode(array $record): ?string, extractAsn(array $record): ?int, extractAsOrg(array $record): ?string. Two implementations cover the three providers:
    • MaxMindRecordAdapter — country: $record['country']['iso_code']; ASN: $record['autonomous_system_number'], $record['autonomous_system_organization']. (DB-IP shares this schema.)
    • IpinfoRecordAdapter — country: $record['country_code'] (uppercase ISO-3166); ASN: $record['asn'] (string like "AS13335" — strip prefix, cast to int), $record['as_name'].
  • EnrichmentRepository.php (in api/src/Infrastructure/Reputation/): replaces the existing read-only IpEnrichmentRepository rather than living next to it, so only one repository class remains:
    • find(string $ipBin): ?array — keep the existing M09 shape.
    • upsert(string $ipBin, string $ipText, EnrichmentResult $result): void — driver-aware UPSERT (mirrors IpScoreRepository::upsert for SQLite/MySQL split).
    • findPending(int $limit): array<string>. Returns ip_bin values that exist in reports or manual_blocks but either have no ip_enrichment row or have one with enriched_at IS NULL. Order by MIN(received_at) so older entries get caught up first. Use UNION over the two source tables, GROUP BY ip_bin, then LEFT JOIN ip_enrichment and keep only the unmatched or un-enriched rows.
    • clearAllEnrichedAt(): int. Used only by the ?reenrich=true flag on refresh-geoip. Sets enriched_at = NULL so findPending picks the rows up again. Returns the affected row count for the job's items_processed.
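
The adapter split described above could look roughly like the following. This is a minimal sketch, not the required implementation: it assumes PHP 8.0+, puts all three types in one file for brevity, and treats empty strings as "unknown", which is a choice the spec does not make. The record keys are the ones named in the task list.

```php
<?php
declare(strict_types=1);

interface RecordAdapter
{
    /** @param array<string, mixed> $record raw record as returned by the MMDB reader */
    public function extractCountryCode(array $record): ?string;
    public function extractAsn(array $record): ?int;
    public function extractAsOrg(array $record): ?string;
}

/** Nested MaxMind-style schema; per the task list, DB-IP files share it. */
final class MaxMindRecordAdapter implements RecordAdapter
{
    public function extractCountryCode(array $record): ?string
    {
        $code = $record['country']['iso_code'] ?? null;
        return is_string($code) && $code !== '' ? strtoupper($code) : null;
    }

    public function extractAsn(array $record): ?int
    {
        $asn = $record['autonomous_system_number'] ?? null;
        return is_int($asn) ? $asn : null;
    }

    public function extractAsOrg(array $record): ?string
    {
        $org = $record['autonomous_system_organization'] ?? null;
        return is_string($org) && $org !== '' ? $org : null;
    }
}

/** Flat IPinfo-style schema: country_code, asn ("AS13335"), as_name. */
final class IpinfoRecordAdapter implements RecordAdapter
{
    public function extractCountryCode(array $record): ?string
    {
        $code = $record['country_code'] ?? null;
        return is_string($code) && $code !== '' ? strtoupper($code) : null;
    }

    public function extractAsn(array $record): ?int
    {
        $asn = $record['asn'] ?? null;
        // Accept "AS13335" (or a bare number string); anything else is unknown.
        if (!is_string($asn) || preg_match('/^(?:AS)?(\d+)$/i', trim($asn), $m) !== 1) {
            return null;
        }
        return (int) $m[1];
    }

    public function extractAsOrg(array $record): ?string
    {
        $org = $record['as_name'] ?? null;
        return is_string($org) && $org !== '' ? $org : null;
    }
}
```

With this shape, MmdbEnrichmentService only needs the raw array from the reader plus whichever adapter matches the configured provider; a missing key falls through to null instead of raising.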

2. enrich-pending job — full implementation

Replace the skeleton in api/src/Application/Jobs/EnrichPendingJob.php:

  • Pulls a batch from EnrichmentRepository::findPending(limit=200).
  • For each ip: calls EnrichmentService::enrich, upserts the result.
  • If the configured MMDBs aren't present (e.g. opt-in provider whose credential was never set, or refresh-geoip hasn't run yet, or the fixtures weren't mounted):
    • The service returns all-null results. Don't store them — that would create poison rows. Detect by countryCode === null && asn === null and skip.
    • Log a single warning per job run (not per IP) and exit cleanly with items_processed=0.
  • Default interval: 300s. Max runtime: 60s.
  • Idempotent: if an IP is already enriched, skip it (the findPending query already excludes them).
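
One way to picture the batch loop and the "skip all-null, warn once" rule is the sketch below. It assumes PHP 8.0+. runEnrichBatch, isEmpty(), and the callable-based wiring are hypothetical stand-ins for the real job, service, repository, and logger; only the EnrichmentResult fields come from Task 1.

```php
<?php
declare(strict_types=1);

final class EnrichmentResult
{
    public function __construct(
        public ?string $countryCode,
        public ?int $asn,
        public ?string $asOrg,
        public DateTimeImmutable $enrichedAt,
    ) {
    }

    /** All-null results must not be stored (they would become poison rows). */
    public function isEmpty(): bool
    {
        return $this->countryCode === null && $this->asn === null;
    }
}

/**
 * Hypothetical helper: enriches one batch and returns items_processed.
 *
 * @param list<string> $pendingIps                      batch from findPending()
 * @param callable(string): EnrichmentResult $enrich      lookup via the service
 * @param callable(string, EnrichmentResult): void $upsert persistence
 * @param callable(string): void $warn                    logger
 */
function runEnrichBatch(array $pendingIps, callable $enrich, callable $upsert, callable $warn): int
{
    $processed = 0;
    $warned = false;

    foreach ($pendingIps as $ip) {
        $result = $enrich($ip);

        if ($result->isEmpty()) {
            // Log once per job run, not once per IP.
            if (!$warned) {
                $warn('enrich-pending: lookup returned no data; are the MMDB files present?');
                $warned = true;
            }
            continue;
        }

        $upsert($ip, $result);
        $processed++;
    }

    return $processed;
}
```

When no MMDB is on disk every result is empty, so the function returns 0 after a single warning, which matches the items_processed=0 expectation in the acceptance script.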

3. refresh-geoip job — full implementation

Replace the stub in api/src/Application/Jobs/RefreshGeoipJob.php:

  • The job is provider-agnostic. Provider-specific logic sits behind a GeoIpDownloader interface in api/src/Infrastructure/Enrichment/Downloaders/:

    interface GeoIpDownloader {
      public function name(): string;          // "dbip" | "maxmind" | "ipinfo"
      public function requiresCredential(): bool;
      public function hasCredential(): bool;   // false ⇒ controller short-circuits 412
      /** @return array{country: string, asn: string} paths to verified .mmdb files in $tempDir */
      public function download(string $tempDir): array;
    }
    
  • Three implementations:

    • DbipDownloader (default)
      • URLs: https://download.db-ip.com/free/dbip-country-lite-YYYY-MM.mmdb.gz and …asn-lite….
      • On 404 (early-month rollover edge: monthly cuts publish on/around the 1st), fall back to the previous month. Cap at one fallback step.
      • Verify each file by: (a) gzip-integrity (gzdecode round-trip), (b) opening the decoded MMDB with MaxMind\Db\Reader and reading metadata (fails fast on truncation/corruption), (c) sane node count: metadata.nodeCount > 100_000 for country, > 50_000 for ASN. No SHA-256 published; this stack is the substitute.
      • requiresCredential() returns false; hasCredential() always returns true.

    • MaxMindDownloader (opt-in)
      • URLs: MaxMind's permalink endpoint https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key=…&suffix=tar.gz (and GeoLite2-ASN).
      • Verify the tarball's SHA-256 against the matching …&suffix=tar.gz.sha256 URL.
      • Extract the .tar.gz, walk the resulting directory for the .mmdb file (MaxMind's tarball nests one).
      • requiresCredential() returns true; hasCredential() checks MAXMIND_LICENSE_KEY !== ''.

    • IPinfoDownloader (opt-in)
      • URLs: https://ipinfo.io/data/free/country.mmdb?token=… and …/free/asn.mmdb?token=…. Direct MMDB, no compression.
      • Verify identically to DB-IP, minus the gzip step (no integrity file published; metadata + node-count sanity check).
      • requiresCredential() returns true; hasCredential() checks IPINFO_TOKEN !== ''.
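
The DB-IP month fallback is easy to get subtly wrong, because subtracting "one month" from the 31st can land back in the current month. A minimal sketch of the URL composition follows. dbipCandidateUrls and downloadFirstAvailable are hypothetical helper names, the fetch callable stands in for the Guzzle call, and using UTC for "now" is an assumption. The template is the country URL pattern quoted above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: expands a date-stamped URL template into the URLs to
 * try, in order: current month first, then exactly one fallback month.
 *
 * @return list<string>
 */
function dbipCandidateUrls(string $template, DateTimeImmutable $now): array
{
    // Anchor on the 1st before stepping back: "-1 month" from e.g. March 31
    // would otherwise overflow into early March instead of February.
    $current = $now
        ->setTimezone(new DateTimeZone('UTC'))
        ->modify('first day of this month');
    $previous = $current->modify('-1 month');

    return [
        str_replace('YYYY-MM', $current->format('Y-m'), $template),
        str_replace('YYYY-MM', $previous->format('Y-m'), $template),
    ];
}

/**
 * Hypothetical helper: returns the first body that exists.
 *
 * @param list<string> $urls
 * @param callable(string): ?string $fetch returns null on HTTP 404
 */
function downloadFirstAvailable(array $urls, callable $fetch): string
{
    foreach ($urls as $url) {
        $body = $fetch($url);
        if ($body !== null) {
            return $body;
        }
    }
    throw new RuntimeException('DB-IP file not published for the current or previous month');
}
```

Because the candidate list has exactly two entries, the "cap at one fallback step" rule is enforced by construction rather than by a loop counter.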

  • Job flow (provider-independent):

    • At the HTTP-handler level: if the selected downloader has requiresCredential() && !hasCredential(), return 412 Precondition Failed with {"error":"no_credential","provider":"<name>","missing":"MAXMIND_LICENSE_KEY"} (or IPINFO_TOKEN). Don't even start the job. For provider=dbip this 412 path is unreachable, since DB-IP needs no credential.
    • Otherwise the job:
      • Acquires its lock (default interval 7 days, JOB_GEOIP_REFRESH_INTERVAL_DAYS; max runtime 5 minutes).
      • Calls $downloader->download($tempDir).
      • Atomic-replaces the existing files at GEOIP_COUNTRY_DB and GEOIP_ASN_DB. tempnam() in the same filesystem as the target, write, rename() to the target. Avoid leaving partials if the process crashes.
      • Reloads in-process readers (MmdbEnrichmentService::reloadReaders() clears its cached MaxMind\Db\Reader instances).
      • On success: items_processed = sum of metadata.nodeCount from both files (rough indicator).
      • Optional ?reenrich=true query flag: after a successful refresh, also call EnrichmentRepository::clearAllEnrichedAt(). Reflect the count in the response. Default off.

  • On HTTP/network failure: write a failure run entry, log clearly with provider name (no credential in any log line), don't leave partial files.

  • Use Guzzle (already in api deps).
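
The atomic-replace step in the job flow can be sketched as follows. atomicReplace is a hypothetical helper name, and the chmod to 0644 is an assumption about who needs to read the file (tempnam() creates files with mode 0600). The same-directory check matters because tempnam() can silently fall back to the system temp directory, and a rename() across filesystems is no longer a single atomic step.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: replaces $targetPath with the contents of $sourcePath
 * so readers only ever see the complete old file or the complete new one.
 */
function atomicReplace(string $sourcePath, string $targetPath): void
{
    $targetDir = dirname($targetPath);
    if (!is_dir($targetDir)) {
        throw new RuntimeException("Target directory missing: {$targetDir}");
    }

    $tmp = tempnam($targetDir, '.geoip-');
    if ($tmp === false) {
        throw new RuntimeException("Could not create temp file in {$targetDir}");
    }

    try {
        // tempnam() may fall back to the system temp dir; refuse that case.
        if (realpath(dirname($tmp)) !== realpath($targetDir)) {
            throw new RuntimeException('Temp file was not created next to the target');
        }
        if (!copy($sourcePath, $tmp)) {
            throw new RuntimeException("Could not copy {$sourcePath}");
        }
        chmod($tmp, 0644); // assumption: other processes must be able to read the DB
        if (!rename($tmp, $targetPath)) {
            throw new RuntimeException("Could not move temp file over {$targetPath}");
        }
    } finally {
        // After a successful rename the temp name is gone; otherwise clean up.
        if (is_file($tmp)) {
            unlink($tmp);
        }
    }
}
```

A crash before rename() leaves at most a stray .geoip-* temp file, never a truncated country.mmdb or asn.mmdb.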

4. UI: IP detail enrichment panel

The endpoint GET /api/v1/admin/ips/{ip} already returns the enrichment block; from M09 the field is null. After this milestone the data fills in.

Update ui/resources/views/pages/ips/detail.twig:

  • If enrichment.country_code is null, show "Unknown" greyed out.
  • Otherwise show the country flag (Unicode regional indicator) + country name (use a small mapping or a JSON lookup table).
  • ASN: show as AS{asn} {as_org}, link to bgp.he.net or similar (target=_blank, rel=noopener) — optional but nice.
  • Add enriched_at as a small timestamp footer ("Enriched 4 hours ago").
  • Attribution footer under the panel: read the configured provider from the dashboard config endpoint (or expose via GET /api/v1/admin/config if not already; or pass through Twig globals) and render:
    • dbip → IP Geolocation by <a href="https://db-ip.com">DB-IP</a> (CC BY 4.0).
    • ipinfo → IP data powered by <a href="https://ipinfo.io">IPinfo</a>.
    • maxmind → no attribution required; render nothing.
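
The country flag can be derived from the two-letter code alone, since each letter A to Z maps to a Unicode regional indicator symbol (U+1F1E6 to U+1F1FF). A small sketch, assuming the conversion happens in PHP (for example behind a Twig filter); countryFlag is a hypothetical name, and the bytes are built by hand so the helper does not depend on the mbstring extension.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns an ISO 3166-1 alpha-2 code into its flag emoji.
 * Returns null for null or malformed input so the template can show "Unknown".
 */
function countryFlag(?string $countryCode): ?string
{
    if ($countryCode === null || preg_match('/^[A-Za-z]{2}$/', $countryCode) !== 1) {
        return null;
    }

    $flag = '';
    foreach (str_split(strtoupper($countryCode)) as $letter) {
        // U+1F1E6 (regional indicator A) is F0 9F 87 A6 in UTF-8; B..Z only
        // change the last byte, up to BF for Z.
        $flag .= "\xF0\x9F\x87" . chr(0xA6 + ord($letter) - ord('A'));
    }

    return $flag;
}
```

Whether the pair renders as a flag or as two boxed letters depends on the viewer's font, so the country name next to it remains the reliable label.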

5. Search filters

The IPs list page already accepts country and asn filters from M09. They should now actually filter results: the API joins ip_enrichment on the search query (already wired in IpScoreRepository::searchIps). Add a simple country dropdown populated from the set of countries seen so far, served by a new GET /api/v1/admin/ips/countries endpoint. It returns [{code, count}] from SELECT country_code, COUNT(*) FROM ip_enrichment WHERE country_code IS NOT NULL GROUP BY country_code ORDER BY country_code.

6. Update healthz

/healthz on api now reports GeoIP DB status:

{
  "status": "ok",
  "db": {"connected": true, "driver": "sqlite"},
  "geoip": {
    "provider": "dbip",
    "provider_configured": true,
    "country_db_present": true,
    "asn_db_present": true,
    "country_db_modified": "2026-04-20T...",
    "asn_db_modified": "2026-04-20T..."
  }
}

  • provider_configured is always true for dbip; for maxmind and ipinfo it is true only when the credential is set.
  • Missing DBs don't make /healthz unhealthy (the system still works without enrichment). Just report the state.
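
Building the geoip block might look like the sketch below. geoipHealth is a hypothetical function name, and two details are assumptions because the sample JSON elides them: the exact timestamp format (UTC ISO 8601 here) and reporting null for the modified fields when a file is absent.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: assembles the "geoip" section of /healthz.
 * It only reports state; it never marks the service unhealthy.
 *
 * @return array<string, string|bool|null>
 */
function geoipHealth(string $provider, bool $credentialSet, string $countryDb, string $asnDb): array
{
    // File mtime as a UTC ISO 8601 string, or null when the file is absent.
    $modified = static function (string $path): ?string {
        if (!is_file($path)) {
            return null;
        }
        $mtime = filemtime($path);
        return $mtime === false ? null : gmdate('Y-m-d\TH:i:s\Z', $mtime);
    };

    return [
        'provider' => $provider,
        // DB-IP needs no credential, so it always counts as configured.
        'provider_configured' => $provider === 'dbip' || $credentialSet,
        'country_db_present' => is_file($countryDb),
        'asn_db_present' => is_file($asnDb),
        'country_db_modified' => $modified($countryDb),
        'asn_db_modified' => $modified($asnDb),
    ];
}
```

The credential itself is passed in only as a boolean, which keeps the secret out of the health payload by construction.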

Implementation notes

Cross-provider

  • Stable on-disk filenames. Whatever provider supplied them, the runtime paths are GEOIP_COUNTRY_DB=/data/geoip/country.mmdb and GEOIP_ASN_DB=/data/geoip/asn.mmdb (generalize the SPEC §9 defaults — see "Deviations from SPEC" in the handoff). Downloaders write to a temp dir and the job atomic-renames to these stable paths. The lookup service never sees provider details.
  • Atomic file replace. tempnam() in /data/geoip/, write the new file, rename() to the target. Avoid leaving partials if the process crashes.
  • MMDB library. Use geoip2/geoip2 for the package; use the underlying MaxMind\Db\Reader class directly so the same code reads MaxMind, DB-IP, and IPinfo files. Don't roll your own .mmdb parser. Don't use a service that calls back to a remote API on every lookup — the local DB is the point.
  • IPv6. All three providers' DBs cover both families. Verify with a v6 lookup test against the fixtures.
  • Large batches. 200 per tick is a safe default. Each lookup is microseconds; 200 takes well under a second.
  • Tests. The fixture path is provider-independent: ship two small .mmdb files in api/tests/Fixtures/geoip/ and have the test harness point GEOIP_COUNTRY_DB/GEOIP_ASN_DB at them. Use the MaxMindRecordAdapter for fixture-based tests since the public test MMDBs use MaxMind's schema.

Provider-specific

  • DB-IP: monthly cadence — flag if country_db_modified is older than 45 days in healthz (warning, not error). License is CC BY 4.0; the UI footer + README must credit DB-IP. URL pattern is date-stamped; downloader composes from now() and falls back one month on 404.
  • MaxMind: never log the license key. Don't include it in error messages, job_runs.details, or any echoed config. Mask in the masked-config endpoint.
  • IPinfo: same — never log the token. Same masking treatment.
  • Build-time vs runtime DBs. The Dockerfile may bake DBs in at build time when an opt-in provider's credential is set as a build arg; otherwise they're absent until refresh-geoip runs. With DB-IP default, the entrypoint can optionally trigger an initial refresh-geoip on first boot if the files are missing — out of scope for this milestone; leave for M14 hardening.
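
Since both opt-in providers carry the credential in the download URL's query string, an exception message that echoes the URL would leak it. One defensive option is to scrub known secrets from any text before it reaches a log line or job_runs.details. The sketch below is an illustration only: redactSecrets is a hypothetical helper, not something the spec mandates, and the "***" placeholder is arbitrary.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: replaces every occurrence of the given secrets,
 * including their URL-encoded forms, with a fixed placeholder.
 */
function redactSecrets(string $message, string ...$secrets): string
{
    foreach ($secrets as $secret) {
        if ($secret === '') {
            continue; // unset credential: nothing to hide
        }
        $message = str_replace(
            [$secret, rawurlencode($secret), urlencode($secret)],
            '***',
            $message
        );
    }

    return $message;
}
```

A call such as redactSecrets($e->getMessage(), $licenseKey, $ipinfoToken) before logging covers both providers; the variable names in that call are placeholders for wherever the configured credentials live.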

Out of scope (DO NOT)

  • Other enrichment sources (Spamhaus, AbuseIPDB, internal corporate feeds). Three providers is the cap; the abstraction is enough.
  • Per-request enrichment lookups in the report endpoint. Enrichment is a background concern.
  • Reverse-DNS / WHOIS enrichment.
  • Auditing the enrichment job (M12 owns audit emission generally; this job logs to its job_runs row).
  • New API endpoints beyond what's listed (the /admin/ips/countries endpoint is the only addition).
  • Mass re-enrichment of all IPs on every refresh-geoip run. New DB ⇒ existing rows stay. The ?reenrich=true flag opts into clearing enriched_at so findPending picks the rows up again, and only on explicit request.
  • A fourth provider. Pick from the three above.
  • Auto-bootstrapping the DB on first container start. The job runs on schedule; first-run will populate.

Acceptance

The acceptance script is structured into three blocks: default provider (DB-IP, no credentials), then opt-ins (MaxMind, IPinfo). The fixture-based assertions are provider-independent and are the load-bearing checks for correctness.

cd api && composer cs && composer stan && composer test && cd ..

docker compose down -v
cp .env.example .env
# Default config: GEOIP_PROVIDER=dbip, no MAXMIND_LICENSE_KEY, no IPINFO_TOKEN
docker compose up -d
sleep 15

ADMIN_TOKEN=$(docker compose exec -T api php bin/console auth:create-token --kind=admin --role=admin --quiet)
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2)

# --- Block A: default provider (DB-IP) ---

# DB-IP needs no credential — refresh-geoip does NOT 412.
# (Skip the live download in CI; assert the controller doesn't short-circuit.)
test "$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer $INTERNAL_TOKEN" \
  -X POST 'http://localhost:8081/internal/jobs/refresh-geoip?dry_run=1')" != "412"

# enrich-pending no-ops cleanly when DBs are missing (regardless of provider)
RESP=$(curl -s -X POST -H "Authorization: Bearer $INTERNAL_TOKEN" \
  http://localhost:8081/internal/jobs/enrich-pending)
echo "$RESP" | grep -q '"status":"success"'
echo "$RESP" | grep -q '"items_processed":0'

# /healthz reports geoip status with provider name
curl -s http://localhost:8081/healthz | grep -q '"provider":"dbip"'
curl -s http://localhost:8081/healthz | grep -q '"country_db_present":false'

# Fixture-based functional check (provider-independent path)
docker compose cp api/tests/Fixtures/geoip/. api:/data/geoip/
RID=$(curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d '{"name":"test","trust_weight":1.0}' \
  http://localhost:8081/api/v1/admin/reporters | php -r 'echo json_decode(stream_get_contents(STDIN),true)["id"];')
RT=$(curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d "{\"kind\":\"reporter\",\"reporter_id\":$RID}" \
  http://localhost:8081/api/v1/admin/tokens | php -r 'echo json_decode(stream_get_contents(STDIN),true)["raw_token"];')
curl -s -X POST -H "Authorization: Bearer $RT" -H "Content-Type: application/json" \
  -d '{"ip":"81.2.69.142","category":"brute_force"}' \
  http://localhost:8081/api/v1/report > /dev/null

curl -s -X POST -H "Authorization: Bearer $INTERNAL_TOKEN" \
  http://localhost:8081/internal/jobs/enrich-pending | grep -q '"items_processed":1'

curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8081/api/v1/admin/ips/81.2.69.142 | grep -qE '"country_code":"(GB|US)"'

curl -s http://localhost:8081/healthz | grep -q '"country_db_present":true'

docker compose down -v

# --- Block B: MaxMind opt-in ---

cp .env.example .env
echo 'GEOIP_PROVIDER=maxmind' >> .env
# Leave MAXMIND_LICENSE_KEY empty
docker compose up -d
sleep 15
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2)

# Missing license key now triggers 412 (not under DB-IP default)
test "$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer $INTERNAL_TOKEN" \
  -X POST http://localhost:8081/internal/jobs/refresh-geoip)" = "412"
curl -s http://localhost:8081/healthz | grep -q '"provider":"maxmind"'
curl -s http://localhost:8081/healthz | grep -q '"provider_configured":false'

docker compose down -v

# --- Block C: IPinfo opt-in ---

cp .env.example .env
echo 'GEOIP_PROVIDER=ipinfo' >> .env
# Leave IPINFO_TOKEN empty
docker compose up -d
sleep 15
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2)

test "$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer $INTERNAL_TOKEN" \
  -X POST http://localhost:8081/internal/jobs/refresh-geoip)" = "412"
curl -s http://localhost:8081/healthz | grep -q '"provider":"ipinfo"'

docker compose down -v

Handoff

  1. Commit:

    feat(M11): MMDB enrichment with DB-IP / MaxMind / IPinfo providers
    
    - EnrichmentService backed by MaxMind\Db\Reader (open MMDB format)
    - GeoIpDownloader abstraction; DB-IP default, MaxMind & IPinfo opt-in
    - enrich-pending job (replaces M05 skeleton): 200 per tick, no-ops cleanly without DBs
    - refresh-geoip job: provider-aware download + verify + atomic replace
     - 412 only when an opt-in provider's credential is unset
    - IP detail UI shows country flag + ASN with provider attribution (graceful when null)
    - /healthz reports provider, configured state, DB presence + mtimes
    - country/asn filters on IPs list now functional; /admin/ips/countries dropdown source
    
  2. Append to PROGRESS.md:

    ## M11 — Enrichment (done)
    
    **Built:** MMDB wrapper, three pluggable downloaders (DB-IP / MaxMind / IPinfo),
    both jobs, UI display + attribution, healthz fields, country dropdown source.
    
    **Notes for next milestone:**
    - DBs live at /data/geoip/{country,asn}.mmdb (renamed from SPEC §9 defaults to be
     provider-agnostic; see "Deviations" below).
    - Default provider is DB-IP — no credential required, never returns 412.
    - MaxMind and IPinfo paths return 412 when their credential is empty.
    - License key / IPinfo token never logged.
    - Re-enrichment is opt-in via ?reenrich=true on refresh-geoip.
    - DB-IP and IPinfo: no upstream integrity file; verification is gzip-decode
     (DB-IP only) + MMDB metadata + node-count sanity. MaxMind keeps SHA-256.
    - Attribution rendered in UI for DB-IP and IPinfo per their license terms.
    
    **Deviations from SPEC:**
    - SPEC §9 named GEOIP_COUNTRY_DB=/data/geoip/GeoLite2-Country.mmdb. Renamed
     to /data/geoip/country.mmdb so the path is provider-agnostic. Documented
     in .env.example.
    - SPEC §2 names MaxMind GeoLite2 specifically; we keep MaxMind as a first-class
     provider but default to DB-IP (also MMDB) for friction-free self-hosting.
    
    **Added dependencies:** geoip2/geoip2 (mentioned in SPEC §2 as the planned
    library; we use its underlying MaxMind\Db\Reader for cross-provider support).
    
    **Added env vars:** GEOIP_PROVIDER (default `dbip`; values `dbip|maxmind|ipinfo`),
    IPINFO_TOKEN (used only when provider=ipinfo). MAXMIND_LICENSE_KEY was already
    in .env.example.
    
  3. Stop. Do not start M12.