# M11 — GeoIP / ASN Enrichment

> Fresh Claude Code agent prompt. M07 must be complete (M08–M10 not strictly required, but recommended order).
> Estimated effort: small to medium.

## Mission

Wire up MMDB-based GeoIP/ASN enrichment with three pluggable providers — **DB-IP Lite (default, no auth required)**, **MaxMind GeoLite2 (opt-in, license key)**, **IPinfo Lite (opt-in, token)**. Build a single lookup wrapper, a working `enrich-pending` job (replacing the M05 skeleton), the `refresh-geoip` job (replacing the M05 stub that returned 412), and UI display of country flag and ASN on the IP detail page.

The provider abstraction is intentionally narrow: only the **download** path forks per provider. The on-disk format (MMDB) and the lookup path are common.

## Before you start

1. Verify previous milestones (especially M05, M07, M09):

   ```bash
   git log --oneline -10
   cd api && composer test && cd ..
   ```

2. Read `SPEC.md` §2 (GeoIP/ASN section), §4 (`ip_enrichment` table), §6 (`refresh-geoip` and `enrich-pending` job endpoints), §10 (where the DBs live; `/data/geoip/`), §15 (note out-of-scope items).
3. **Pick a provider for development.** All three speak MMDB; the lookup code does not care which is on disk. The default for fresh installs is DB-IP because it needs no credentials.

   | Provider | Auth | License | Update cadence | Compression | Integrity check published | Attribution required |
   |---|---|---|---|---|---|---|
   | **DB-IP Lite** (default) | none | CC BY 4.0 | monthly (1st) | `.mmdb.gz` (single file) | no | yes — "IP Geolocation by DB-IP" |
   | MaxMind GeoLite2 (opt-in) | license key | MaxMind EULA, free tier | twice weekly | `.tar.gz` (directory) | yes — `.sha256` companion | no |
   | IPinfo Lite (opt-in) | token | IPinfo TOS, free tier | weekly | `.mmdb` (uncompressed) | no | yes — "powered by IPinfo" |

4. Test fixtures live in `api/tests/Fixtures/geoip/` and are committed to the repo.
   They use the public `GeoLite2-City-Test.mmdb` / `GeoLite2-ASN-Test.mmdb` style fixtures from the `maxmind/MaxMind-DB` repo (Apache-2.0, vendorable). They cover IP `81.2.69.142` (GB) and a small IPv6 set. Acceptance does not depend on a real provider being reachable.

## Tasks

### 1. MMDB wrapper

In `api/src/Domain/Enrichment/`:

- `EnrichmentResult.php` — value object: `countryCode: ?string`, `asn: ?int`, `asOrg: ?string`, `enrichedAt: DateTimeImmutable`.
- `EnrichmentService.php` interface: `enrich(IpAddress $ip): EnrichmentResult`.

In `api/src/Infrastructure/Enrichment/`:

- `MmdbEnrichmentService.php` — implements `EnrichmentService` against any MMDB file. Accepts paths to two `.mmdb` files (Country and ASN) plus a `RecordAdapter` keyed on the configured provider. Lazy-loads readers; if a file is missing or unreadable, log a warning **once per process lifetime** and return an all-null result.
  - Use `MaxMind\Db\Reader::get($ip)` directly (the lower-level open-format reader; ships as a transitive dep of `geoip2/geoip2`). Avoid the higher-level `GeoIp2\Database\Reader::country()` accessor — it's MaxMind-shape-specific and breaks on IPinfo's flat record schema.
  - Add `geoip2/geoip2` to `api/composer.json` (allowed; SPEC §2 names MaxMind, and the package is the canonical PHP MMDB reader).
- `RecordAdapter.php` — small interface with `extractCountryCode(array $record): ?string`, `extractAsn(array $record): ?int`, `extractAsOrg(array $record): ?string`. Two implementations cover the three providers:
  - `MaxMindRecordAdapter` — country: `$record['country']['iso_code']`; ASN: `$record['autonomous_system_number']`, `$record['autonomous_system_organization']`. (DB-IP shares this schema, so it reuses this adapter.)
  - `IpinfoRecordAdapter` — country: `$record['country_code']` (uppercase ISO-3166); ASN: `$record['asn']` (string like `"AS13335"` — strip prefix, cast to int), `$record['as_name']`.
- `EnrichmentRepository.php` (new file under `api/src/Infrastructure/Reputation/` to live next to `IpEnrichmentRepository`, OR replace the existing read-only `IpEnrichmentRepository` — pick the latter; keep one class):
  - `find(string $ipBin): ?array` — keep the existing M09 shape.
  - `upsert(string $ipBin, string $ipText, EnrichmentResult $result): void` — driver-aware UPSERT (mirrors `IpScoreRepository::upsert` for SQLite/MySQL split).
  - `findPending(int $limit): array` — `ip_bin` values that exist in `reports` or `manual_blocks` but either have no `ip_enrichment` row or have a row whose `enriched_at` is NULL (the second case is what lets rows cleared by `clearAllEnrichedAt()` be picked up again). Order by `MIN(received_at)` so older entries get caught up first. Use `UNION` over the two source tables, `GROUP BY ip_bin`, `LEFT JOIN ip_enrichment`, and keep rows where the join found nothing or `enriched_at IS NULL`.
  - `clearAllEnrichedAt(): int` — used only by the `?reenrich=true` flag on `refresh-geoip`. Sets `enriched_at = NULL` so `findPending` re-picks rows up. Returns affected row count for the job's `items_processed`.

### 2. `enrich-pending` job — full implementation

Replace the skeleton in `api/src/Application/Jobs/EnrichPendingJob.php`:

- Pulls a batch from `EnrichmentRepository::findPending(limit=200)`.
- For each ip: calls `EnrichmentService::enrich`, upserts the result.
- If the configured MMDBs aren't present (e.g. opt-in provider whose credential was never set, or `refresh-geoip` hasn't run yet, or the fixtures weren't mounted):
  - The service returns all-null results. **Don't store them** — that would create poison rows. Detect by `countryCode === null && asn === null` and skip.
  - Log a single warning per job run (not per IP) and exit cleanly with `items_processed=0`.
- Default interval: 300s. Max runtime: 60s.
- Idempotent: if an IP is already enriched, skip it (the `findPending` query already excludes them).

### 3. `refresh-geoip` job — full implementation

Replace the stub in `api/src/Application/Jobs/RefreshGeoipJob.php`:

- The job is provider-agnostic.
  Provider-specific logic sits behind a `GeoIpDownloader` interface in `api/src/Infrastructure/Enrichment/Downloaders/`:

  ```php
  interface GeoIpDownloader
  {
      public function name(): string;              // "dbip" | "maxmind" | "ipinfo"
      public function requiresCredential(): bool;
      public function hasCredential(): bool;       // false ⇒ controller short-circuits 412

      /** @return array{country: string, asn: string} paths to verified .mmdb files in $tempDir */
      public function download(string $tempDir): array;
  }
  ```

- Three implementations:
  - **`DbipDownloader`** (default)
    - URLs: `https://download.db-ip.com/free/dbip-country-lite-YYYY-MM.mmdb.gz` and `…asn-lite…`.
    - On 404 (early-month rollover edge: monthly cuts publish on/around the 1st), fall back to previous month. Cap at one fallback step.
    - Verify each file by: (a) gzip-integrity (`gzdecode` round-trip), (b) opening the decoded MMDB with `MaxMind\Db\Reader` and reading metadata (fails fast on truncation/corruption), (c) sane row count: `metadata.nodeCount > 100_000` for country, `> 50_000` for ASN. No SHA-256 published; this stack is the substitute.
    - `requiresCredential()` returns false; `hasCredential()` always true.
  - **`MaxMindDownloader`** (opt-in)
    - URLs: MaxMind's permalink endpoint `https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key=…&suffix=tar.gz` (and `GeoLite2-ASN`).
    - Verify the tarball's SHA-256 against the matching `…&suffix=tar.gz.sha256` URL.
    - Extract the `.tar.gz`, walk the resulting directory for the `.mmdb` file (MaxMind's tarball nests one).
    - `requiresCredential()` true; `hasCredential()` checks `MAXMIND_LICENSE_KEY !== ''`.
  - **`IPinfoDownloader`** (opt-in)
    - URLs: `https://ipinfo.io/data/free/country.mmdb?token=…` and `…/free/asn.mmdb?token=…`. Direct MMDB, no compression.
    - Verify identically to DB-IP (no integrity file published; metadata + node-count sanity check).
    - `requiresCredential()` true; `hasCredential()` checks `IPINFO_TOKEN !== ''`.
- Job flow (provider-independent):
  - At the HTTP-handler level: if the selected downloader has `requiresCredential() && !hasCredential()`, return `412 Precondition Failed` with `{"error":"no_credential","provider":"maxmind","missing":"MAXMIND_LICENSE_KEY"}` (or `"provider":"ipinfo"` with `IPINFO_TOKEN`). Don't even start the job. **For provider=dbip this 412 path is unreachable**, since DB-IP needs no credential.
  - Otherwise the job:
    - Acquires its lock (default interval 7 days, `JOB_GEOIP_REFRESH_INTERVAL_DAYS`; max runtime 5 minutes).
    - Calls `$downloader->download($tempDir)`.
    - Atomic-replaces the existing files at `GEOIP_COUNTRY_DB` and `GEOIP_ASN_DB`. `tempnam()` in the same filesystem as the target, write, `rename()` to the target. Avoid leaving partials if the process crashes.
    - Reloads in-process readers (`MmdbEnrichmentService::reloadReaders()` clears its cached `MaxMind\Db\Reader` instances).
  - On success: `items_processed` = sum of `metadata.nodeCount` from both files (rough indicator).
  - Optional `?reenrich=true` query flag: after a successful refresh, also call `EnrichmentRepository::clearAllEnrichedAt()`. Reflect the count in the response. Default off.
  - On HTTP/network failure: write a failure run entry, log clearly with provider name (no credential in any log line), don't leave partial files.
- Use Guzzle (already in api deps).

### 4. UI: IP detail enrichment panel

The endpoint `GET /api/v1/admin/ips/{ip}` already returns the `enrichment` block; from M09 the field is null. After this milestone the data fills in.

Update `ui/resources/views/pages/ips/detail.twig`:

- If `enrichment.country_code` is null, show "Unknown" greyed out.
- Otherwise show the country flag (Unicode regional indicator) + country name (use a small mapping or a JSON lookup table).
- ASN: show as `AS{asn} {as_org}`, link to bgp.he.net or similar (target=_blank, rel=noopener) — optional but nice.
- Add `enriched_at` as a small timestamp footer ("Enriched 4 hours ago").
- **Attribution footer** under the panel: read the configured provider from the dashboard config endpoint (or expose via `GET /api/v1/admin/config` if not already; or pass through Twig globals) and render:
  - `dbip` → `IP Geolocation by DB-IP` (CC BY 4.0).
  - `ipinfo` → `IP data powered by IPinfo`.
  - `maxmind` → no attribution required; render nothing.

### 5. Search filters

The IPs list page already accepts `country` and `asn` filters from M09. They should now actually filter results — the api joins `ip_enrichment` on the search query (already wired in `IpScoreRepository::searchIps`). Add a simple country dropdown using the populated set of countries seen so far via a new `GET /api/v1/admin/ips/countries` endpoint (returns `[{code, count}]` from `SELECT country_code, COUNT(*) FROM ip_enrichment WHERE country_code IS NOT NULL GROUP BY country_code ORDER BY country_code`).

### 6. Update healthz

`/healthz` on api now reports GeoIP DB status:

```json
{
  "status": "ok",
  "db": {"connected": true, "driver": "sqlite"},
  "geoip": {
    "provider": "dbip",
    "provider_configured": true,
    "country_db_present": true,
    "asn_db_present": true,
    "country_db_modified": "2026-04-20T...",
    "asn_db_modified": "2026-04-20T..."
  }
}
```

- `provider_configured` is `true` for `dbip` always, `true` for `maxmind`/`ipinfo` when the credential is set.
- Missing DBs don't make `/healthz` unhealthy (the system still works without enrichment). Just report the state.

## Implementation notes

### Cross-provider

- **Stable on-disk filenames.** Whatever provider supplied them, the runtime paths are `GEOIP_COUNTRY_DB=/data/geoip/country.mmdb` and `GEOIP_ASN_DB=/data/geoip/asn.mmdb` (generalize the SPEC §9 defaults — see "Deviations from SPEC" in the handoff). Downloaders write to a temp dir and the job atomic-renames to these stable paths. The lookup service never sees provider details.
- **Atomic file replace.** `tempnam()` in `/data/geoip/`, write the new file, `rename()` to the target.
  Avoid leaving partials if the process crashes.
- **MMDB library.** Use `geoip2/geoip2` for the package; use the underlying `MaxMind\Db\Reader` class directly so the same code reads MaxMind, DB-IP, and IPinfo files. Don't roll your own `.mmdb` parser. Don't use a service that calls back to a remote API on every lookup — the local DB is the point.
- **IPv6.** All three providers' DBs cover both families. Verify with a v6 lookup test against the fixtures.
- **Large batches.** 200 per tick is a safe default. Each lookup is microseconds; 200 takes well under a second.
- **Tests.** The fixture path is provider-independent: ship two small `.mmdb` files in `api/tests/Fixtures/geoip/` and have the test harness point `GEOIP_COUNTRY_DB`/`GEOIP_ASN_DB` at them. Use the `MaxMindRecordAdapter` for fixture-based tests since the public test MMDBs use MaxMind's schema.

### Provider-specific

- **DB-IP**: monthly cadence — flag if `country_db_modified` is older than 45 days in healthz (warning, not error). License is CC BY 4.0; the UI footer + README must credit DB-IP. URL pattern is date-stamped; downloader composes from `now()` and falls back one month on 404.
- **MaxMind**: never log the license key. Don't include it in error messages, `job_runs.details`, or any echoed config. Mask in the masked-config endpoint.
- **IPinfo**: same — never log the token. Same masking treatment.
- **Build-time vs runtime DBs**. The Dockerfile may bake DBs in at build time when an opt-in provider's credential is set as a build arg; otherwise they're absent until `refresh-geoip` runs. With DB-IP default, the entrypoint can optionally trigger an initial `refresh-geoip` on first boot if the files are missing — out of scope for this milestone; leave for M14 hardening.

## Out of scope (DO NOT)

- Other enrichment sources (Spamhaus, AbuseIPDB, internal corporate feeds). Three providers is the cap; the abstraction is enough.
- Per-request enrichment lookups in the report endpoint.
  Enrichment is a background concern.
- Reverse-DNS / WHOIS enrichment.
- Auditing the enrichment job (M12 owns audit emission generally; this job logs to its `job_runs` row).
- New API endpoints beyond what's listed (the `/admin/ips/countries` endpoint is the only addition).
- Mass re-enrichment of all IPs on every refresh-geoip run. New DB ⇒ existing rows stay. The `?reenrich=true` flag opts into clearing `enriched_at` so `findPending` re-picks them up — only on explicit request.
- A fourth provider. Pick from the three above.
- Auto-bootstrapping the DB on first container start. The job runs on schedule; first-run will populate.

## Acceptance

The acceptance script is structured into three blocks: default provider (DB-IP, no credentials), then opt-ins (MaxMind, IPinfo). The fixture-based assertions are provider-independent and are the load-bearing checks for correctness.

```bash
cd api && composer cs && composer stan && composer test && cd ..

docker compose down -v
cp .env.example .env
# Default config: GEOIP_PROVIDER=dbip, no MAXMIND_LICENSE_KEY, no IPINFO_TOKEN
docker compose up -d
sleep 15

ADMIN_TOKEN=$(docker compose exec -T api php bin/console auth:create-token --kind=admin --role=admin --quiet)
# -f2- keeps the whole value even if the token itself contains '='
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2-)

# --- Block A: default provider (DB-IP) ---

# DB-IP needs no credential — refresh-geoip does NOT 412.
# (Skip the live download in CI; assert the controller doesn't short-circuit.)
test "$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer $INTERNAL_TOKEN" \
  -X POST 'http://localhost:8081/internal/jobs/refresh-geoip?dry_run=1')" != "412"

# enrich-pending no-ops cleanly when DBs are missing (regardless of provider)
RESP=$(curl -s -X POST -H "Authorization: Bearer $INTERNAL_TOKEN" \
  http://localhost:8081/internal/jobs/enrich-pending)
echo "$RESP" | grep -q '"status":"success"'
echo "$RESP" | grep -q '"items_processed":0'

# /healthz reports geoip status with provider name
curl -s http://localhost:8081/healthz | grep -q '"provider":"dbip"'
curl -s http://localhost:8081/healthz | grep -q '"country_db_present":false'

# Fixture-based functional check (provider-independent path)
docker compose cp api/tests/Fixtures/geoip/. api:/data/geoip/

RID=$(curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d '{"name":"test","trust_weight":1.0}' \
  http://localhost:8081/api/v1/admin/reporters | php -r 'echo json_decode(stream_get_contents(STDIN),true)["id"];')
RT=$(curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d "{\"kind\":\"reporter\",\"reporter_id\":$RID}" \
  http://localhost:8081/api/v1/admin/tokens | php -r 'echo json_decode(stream_get_contents(STDIN),true)["raw_token"];')

curl -s -X POST -H "Authorization: Bearer $RT" -H "Content-Type: application/json" \
  -d '{"ip":"81.2.69.142","category":"brute_force"}' \
  http://localhost:8081/api/v1/report > /dev/null

curl -s -X POST -H "Authorization: Bearer $INTERNAL_TOKEN" \
  http://localhost:8081/internal/jobs/enrich-pending | grep -q '"items_processed":1'

curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8081/api/v1/admin/ips/81.2.69.142 | grep -qE '"country_code":"(GB|US)"'

curl -s http://localhost:8081/healthz | grep -q '"country_db_present":true'

docker compose down -v

# --- Block B: MaxMind opt-in ---

cp .env.example .env
echo 'GEOIP_PROVIDER=maxmind' >> .env
# Leave MAXMIND_LICENSE_KEY empty
docker compose up -d
sleep 15
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2-)

# Missing license key now triggers 412 (not under DB-IP default)
test "$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer $INTERNAL_TOKEN" \
  -X POST http://localhost:8081/internal/jobs/refresh-geoip)" = "412"

curl -s http://localhost:8081/healthz | grep -q '"provider":"maxmind"'
curl -s http://localhost:8081/healthz | grep -q '"provider_configured":false'

docker compose down -v

# --- Block C: IPinfo opt-in ---

cp .env.example .env
echo 'GEOIP_PROVIDER=ipinfo' >> .env
# Leave IPINFO_TOKEN empty
docker compose up -d
sleep 15
INTERNAL_TOKEN=$(grep ^INTERNAL_JOB_TOKEN= .env | cut -d= -f2-)

test "$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer $INTERNAL_TOKEN" \
  -X POST http://localhost:8081/internal/jobs/refresh-geoip)" = "412"

curl -s http://localhost:8081/healthz | grep -q '"provider":"ipinfo"'

docker compose down -v
```

## Handoff

1. Commit:

   ```
   feat(M11): MMDB enrichment with DB-IP / MaxMind / IPinfo providers

   - EnrichmentService backed by MaxMind\Db\Reader (open MMDB format)
   - GeoIpDownloader abstraction; DB-IP default, MaxMind & IPinfo opt-in
   - enrich-pending job (replaces M05 skeleton): 200 per tick, no-ops cleanly without DBs
   - refresh-geoip job: provider-aware download + verify + atomic replace
   - 412 only when an opt-in provider's credential is unset
   - IP detail UI shows country flag + ASN with provider attribution (graceful when null)
   - /healthz reports provider, configured state, DB presence + mtimes
   - country/asn filters on IPs list now functional; /admin/ips/countries dropdown source
   ```

2. Append to `PROGRESS.md`:

   ```markdown
   ## M11 — Enrichment (done)

   **Built:** MMDB wrapper, three pluggable downloaders (DB-IP / MaxMind / IPinfo), both jobs, UI display + attribution, healthz fields, country dropdown source.

   **Notes for next milestone:**
   - DBs live at /data/geoip/{country,asn}.mmdb (renamed from SPEC §9 defaults to be provider-agnostic; see "Deviations" below).
   - Default provider is DB-IP — no credential required, never returns 412.
   - MaxMind and IPinfo paths return 412 when their credential is empty.
   - License key / IPinfo token never logged.
   - Re-enrichment is opt-in via ?reenrich=true on refresh-geoip.
   - DB-IP and IPinfo: no upstream integrity file; verification is gzip-decode (DB-IP only) + MMDB metadata + node-count sanity. MaxMind keeps SHA-256.
   - Attribution rendered in UI for DB-IP and IPinfo per their license terms.

   **Deviations from SPEC:**
   - SPEC §9 named GEOIP_COUNTRY_DB=/data/geoip/GeoLite2-Country.mmdb. Renamed to /data/geoip/country.mmdb so the path is provider-agnostic. Documented in .env.example.
   - SPEC §2 names MaxMind GeoLite2 specifically; we keep MaxMind as a first-class provider but default to DB-IP (also MMDB) for friction-free self-hosting.

   **Added dependencies:** geoip2/geoip2 (mentioned in SPEC §2 as the planned library; we use its underlying MaxMind\Db\Reader for cross-provider support).

   **Added env vars:** GEOIP_PROVIDER (default `dbip`; values `dbip|maxmind|ipinfo`), IPINFO_TOKEN (used only when provider=ipinfo). MAXMIND_LICENSE_KEY was already in .env.example.
   ```

3. **Stop.** Do not start M12.
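
For reference while implementing Tasks 1 and 3, the three small pieces of pure logic they describe (stripping the `AS` prefix from IPinfo's ASN string, composing the date-stamped DB-IP URL with a single previous-month fallback, and the `tempnam()` + `rename()` replace) could be sketched roughly as below. This is a minimal sketch assuming PHP 8.0+, not the required implementation: the function names `parseIpinfoAsn`, `dbipCandidateUrls`, and `atomicReplace` are hypothetical, and the URL helper takes the template as a parameter so it only relies on the `YYYY-MM` pattern quoted in Task 3.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: normalise IPinfo's "AS13335"-style value to an int.
 * Returns null for anything that is not "AS<digits>" or bare digits.
 */
function parseIpinfoAsn(mixed $raw): ?int
{
    if (!is_string($raw) || preg_match('/^(?:AS)?(\d+)$/i', trim($raw), $m) !== 1) {
        return null;
    }

    return (int) $m[1];
}

/**
 * Hypothetical helper: expand a date-stamped URL template into the
 * current-month URL followed by exactly one previous-month fallback.
 *
 * @return array{0: string, 1: string}
 */
function dbipCandidateUrls(DateTimeImmutable $now, string $template): array
{
    // Anchor on the 1st first: "31 March minus one month" would otherwise
    // overflow back into March instead of landing in February.
    $current = $now->modify('first day of this month');
    $previous = $current->modify('-1 month');

    return [
        str_replace('YYYY-MM', $current->format('Y-m'), $template),
        str_replace('YYYY-MM', $previous->format('Y-m'), $template),
    ];
}

/**
 * Hypothetical helper: copy $source into place at $target without exposing
 * a partial file. The temp file is created next to the target so that
 * rename() stays on a single filesystem.
 */
function atomicReplace(string $source, string $target): void
{
    $tmp = tempnam(dirname($target), 'geoip-');
    if ($tmp === false) {
        throw new RuntimeException('Could not create a temp file next to ' . $target);
    }

    if (!copy($source, $tmp) || !rename($tmp, $target)) {
        @unlink($tmp); // best effort: never leave a stray partial behind
        throw new RuntimeException('Could not replace ' . $target);
    }
}
```

With these definitions, `parseIpinfoAsn('AS13335')` returns `13335`, and calling `dbipCandidateUrls()` on 2026-03-31 with the country template from Task 3 yields the `2026-03` URL followed by the `2026-02` one. A downloader would try the first URL and move to the second only on a 404, which keeps the fallback capped at one step.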