docs: add update workflow to README and an admin manual

README gains a short "Updating" section with the standard
git-pull / compose-up-build loop, plus the common pitfalls
(don't `restart`, don't `down -v`).

doc/admin-manual.md is the longer companion: deployment
lifecycle, image/container management, volume operations
(including the uid-1000 chown fix for pre-F18 volumes),
troubleshooting recipes, rollback caveats, scheduler ops,
disk hygiene, and a security-update workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chiappa 4 days ago
parent
commit
5a26a19be6
2 changed files with 596 additions and 0 deletions
  1. README.md (+27, -0)
  2. doc/admin-manual.md (+569, -0)

+ 27 - 0
README.md

@@ -191,6 +191,33 @@ config in `api/docker/Caddyfile`); external requests get `404`.
 
 ---
 
+## Updating
+
+Pull new code and rebuild — Docker's layer cache makes the rebuild fast
+when only app code changes:
+
+```bash
+git pull
+docker compose -f docker-compose.yml -f compose.scheduler.yml up --build -d
+docker compose logs -f          # Ctrl-C once migrate exits 0 and api/ui are healthy
+```
+
+`up --build -d` rebuilds local images and recreates only the containers
+whose image hash or config changed. The `migrate` container reruns Phinx
+automatically; new migrations in `db/migrations/` are picked up in order.
+The `irdb-data` volume persists, so SQLite state, GeoIP MMDBs, and the
+audit log carry forward across updates.
+
+Don't use `docker compose restart` to pick up new code — that just bounces
+the existing containers. Don't use `docker compose down -v` either — that
+deletes the volume.
+
+Edge cases (failed migrations, force-rebuild, rollback, fixing volume
+ownership after a uid change, disk cleanup, scheduler ops) are covered in
+the [admin manual](./doc/admin-manual.md).
+
+---
+
 ## Backups
 
 The api's persistent state lives in one of two places.

+ 569 - 0
doc/admin-manual.md

@@ -0,0 +1,569 @@
+# IRDB Admin Manual
+
+> Audience: operators running the IRDB Compose stack on a host they
+> own. Covers the day-to-day deployment lifecycle — updating, rebuilding,
+> rolling back, troubleshooting, and disk hygiene. For the in-app admin
+> workflows (creating reporters, tuning policies, reading the audit
+> log) see [`user-manual.md`](./user-manual.md). For architecture see
+> [`architecture.md`](./architecture.md).
+
+This document picks up where the [`README.md` Quickstart](../README.md#quickstart-5-minutes)
+leaves off: the stack is running, you have admin access, now you need
+to keep it running across upgrades, recover from breakage, and reason
+about the moving parts.
+
+---
+
+## 1. Update workflow
+
+### 1.1 The standard loop
+
+```bash
+git pull
+docker compose -f docker-compose.yml -f compose.scheduler.yml up --build -d
+docker compose logs -f
+```
+
+That's the whole thing for the happy path. `Ctrl-C` out of the logs
+once you see `migrate` exit 0, `api` reach healthy, and `ui` reach
+healthy — `-d` keeps everything running.
+
+What each piece does:
+
+| Step                         | What it actually does                                                                                                                            |
+|------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|
+| `git pull`                   | Updates the source tree on the host — Dockerfiles, compose file, app code, migrations, scheduler crontab.                                         |
+| `--build`                    | Rebuilds local images. Docker layer cache reuses everything that didn't change. App-code-only changes rebuild in seconds; `composer.lock` changes re-run `composer install`. |
+| `up -d`                      | Compares each service's desired image+config against what's running and recreates **only** the containers whose image hash or config changed.    |
+| `docker compose logs -f`     | Follows logs from all services. Useful while waiting for migrations to apply.                                                                    |
+
+The dependency chain in `docker-compose.yml` orders things correctly
+without any manual sequencing:
+
+1. `migrate` runs first (Phinx applies any new migrations, exits 0).
+2. `api` waits for `migrate` to complete successfully, then starts.
+3. `ui` waits for `api`'s `/healthz` to pass, then starts.
+4. `scheduler` (overlay) waits for `api` to be healthy.
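+
+The chain above ends with `ui` healthy, so a deploy script can wait on the
+two health endpoints instead of watching logs. A minimal sketch, assuming
+the ports used in § 1.3; the function name `wait_for_url` and the default
+120-second timeout are illustrative and not part of the repo:
+
+```bash
+# Poll URL until it returns a 2xx status, or give up after TIMEOUT seconds.
+wait_for_url() {
+    local url=$1 timeout=${2:-120} waited=0
+    until curl -fs -o /dev/null "$url"; do
+        if [ "$waited" -ge "$timeout" ]; then
+            echo "timed out waiting for $url" >&2
+            return 1
+        fi
+        sleep 2
+        waited=$((waited + 2))
+    done
+    echo "$url is up"
+}
+
+# Usage:
+#   wait_for_url http://localhost:8081/healthz && wait_for_url http://localhost:8080/healthz
+```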
+
+### 1.2 What you don't need to do
+
+These are common confusions — none of them apply to a normal update:
+
+- **`docker compose pull`** — that's for fetching images from a registry. This stack builds locally from `./api` and `./ui`, so a `pull` does nothing useful.
+- **`docker compose restart`** — kicks the existing container without changing the image. **Does not pick up new code.** Don't use this for updates.
+- **`docker compose down` then `up`** — works, but unnecessary churn. `up --build -d` does selective recreation in one step.
+- **`docker compose down -v`** — **destroys** the `irdb-data` volume and your SQLite database with it. Don't run this casually.
+- **`sudo docker ...`** — on Linux, your user should be in the `docker` group; on macOS / Docker Desktop, sudo isn't needed at all.
+
+### 1.3 Verifying the new code is live
+
+```bash
+docker compose ps -a                # all should show "Up (healthy)" except migrate ("Exited (0)"); -a includes exited containers
+docker compose images               # CREATED column should show your fresh build timestamp
+curl -s http://localhost:8081/healthz | jq    # api responds, db reachable, jobs fresh
+curl -s http://localhost:8080/healthz         # ui can reach api
+```
+
+If you want to spot-check a specific code change:
+
+```bash
+docker compose exec api cat /app/src/some/changed/File.php | head -20
+```
+
+(You can't `git rev-parse HEAD` inside the container — `.dockerignore`
+excludes the `.git` directory from the build context.)
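+
+Because the image carries no `.git`, one option is to record the commit on
+the host at deploy time. A sketch only: the function name and the
+`.deployed-rev` file are hypothetical, not something the repo ships:
+
+```bash
+# Write the currently checked-out commit hash to FILE (default: .deployed-rev).
+record_deployed_rev() {
+    local file=${1:-.deployed-rev} rev
+    rev=$(git rev-parse HEAD) || return 1
+    printf '%s\n' "$rev" > "$file"
+    echo "recorded $rev in $file"
+}
+
+# Usage (from the repo root, after `git pull` and before `up --build -d`):
+#   record_deployed_rev
+```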
+
+### 1.4 Rebuild scope: the whole stack vs one service
+
+For most updates, rebuild the whole stack — it's only marginally
+slower because of layer caching, and it avoids "half-updated" states.
+
+If you really only want one service, e.g., the UI:
+
+```bash
+docker compose up --build -d ui
+```
+
+But notice: if your change touched `api` (which `ui` depends on), `ui`
+won't see the new api until `api` is also recreated. Whole-stack
+rebuild dodges this.
+
+### 1.5 Force rebuild (ignore cache)
+
+Rare — usually only when debugging a build that produced obviously
+stale output despite a code change:
+
+```bash
+docker compose build --no-cache
+docker compose up -d
+```
+
+This re-runs every Dockerfile step from scratch. On the api image
+that means re-fetching Alpine packages and re-running
+`composer install`; expect a few minutes.
+
+### 1.6 Updating with the scheduler overlay
+
+If you deploy with the scheduler sidecar (the `compose.scheduler.yml`
+overlay), include it in **every** compose command; otherwise the
+scheduler container ends up orphaned:
+
+```bash
+docker compose -f docker-compose.yml -f compose.scheduler.yml up --build -d
+docker compose -f docker-compose.yml -f compose.scheduler.yml logs -f scheduler
+docker compose -f docker-compose.yml -f compose.scheduler.yml down
+```
+
+Set `COMPOSE_FILE=docker-compose.yml:compose.scheduler.yml` in your
+shell to avoid retyping the `-f` flags:
+
+```bash
+export COMPOSE_FILE=docker-compose.yml:compose.scheduler.yml
+docker compose up --build -d
+```
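+
+To confirm the overlay is actually in effect, list the services of the
+merged configuration with `docker compose config --services`. A small
+sketch (`has_service` is an illustrative helper name):
+
+```bash
+# Succeed if SERVICE is part of the effective (merged) compose configuration.
+has_service() {
+    docker compose config --services | grep -qx "$1"
+}
+
+# Usage:
+#   has_service scheduler || echo "scheduler overlay not loaded; check COMPOSE_FILE" >&2
+```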
+
+---
+
+## 2. Image and container lifecycle
+
+### 2.1 Inspecting state
+
+```bash
+docker compose ps              # services in this project
+docker compose images          # which image each service is running
+docker compose top             # processes inside each container
+docker compose port api 8081   # host port mapping
+docker compose config          # effective merged compose file
+```
+
+### 2.2 Rebuilding a single service
+
+```bash
+docker compose build api       # rebuild api image without recreating containers
+docker compose up -d api       # then recreate the api container
+```
+
+Equivalent shortcut:
+
+```bash
+docker compose up --build -d api
+```
+
+### 2.3 Recreating without rebuilding (rare)
+
+When config changed but you don't want a rebuild — e.g., you only
+edited `.env`:
+
+```bash
+docker compose up -d           # detects env diff, recreates affected containers
+```
+
+### 2.4 Stopping vs killing
+
+```bash
+docker compose stop            # graceful: SIGTERM, then SIGKILL after grace period
+docker compose kill            # immediate SIGKILL — only when stop hangs
+docker compose start           # bring stopped containers back up (no recreate, no rebuild)
+```
+
+`stop` then `start` keeps the same container instance and writable
+layer. `up -d` on a stopped project recreates if the image or config
+changed since stop.
+
+---
+
+## 3. Volume management
+
+### 3.1 Volumes you should know about
+
+| Volume name        | Mounted at                  | Contents                                                           |
+|--------------------|-----------------------------|--------------------------------------------------------------------|
+| `irdb_irdb-data`   | `/data` in `migrate` + `api` | SQLite db (`irdb.sqlite`), GeoIP MMDBs (`geoip/*.mmdb`), backups.  |
+| `irdb_mysql-data`  | `/var/lib/mysql` in `mysql` | (Optional) MySQL data files. Only when `DB_DRIVER=mysql`.          |
+
+The volume name prefix (`irdb_`) is the Compose project name — it
+defaults to the directory name, so if you cloned into `irdb/` you get
+`irdb_irdb-data`. If you cloned somewhere else, run `docker volume
+ls | grep -i irdb-data` to find it.
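+
+The lookup can be scripted so later commands don't hard-code the prefix. A
+sketch, assuming exactly one volume on the host matches `irdb-data`
+(`find_data_volume` is an illustrative name):
+
+```bash
+# Print the first volume whose name contains "irdb-data"; fail if none exists.
+find_data_volume() {
+    local name
+    name=$(docker volume ls -q --filter name=irdb-data | head -n 1)
+    [ -n "$name" ] || { echo "no irdb-data volume found" >&2; return 1; }
+    printf '%s\n' "$name"
+}
+
+# Usage:
+#   VOLUME=$(find_data_volume) && docker run --rm -v "$VOLUME":/data alpine ls -la /data
+```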
+
+### 3.2 Inspecting a volume's contents
+
+You can't `cd` into the volume from the host on macOS (it lives inside
+the Docker Desktop VM) and on Linux it requires root. Use a one-shot
+container instead:
+
+```bash
+docker run --rm -v irdb_irdb-data:/data alpine ls -la /data
+docker run --rm -v irdb_irdb-data:/data alpine du -sh /data /data/geoip
+```
+
+### 3.3 Fixing ownership after a uid change
+
+The `api` and `migrate` containers run as **uid 1000** (the `app`
+user). If a volume was created when the containers ran as root —
+e.g., from a pre-`SEC_REVIEW F18` deployment — the files inside are
+root-owned and the new uid 1000 process cannot write to them. The
+symptom is:
+
+```
+PDOException: SQLSTATE[HY000]: General error: 8 attempt to write
+a readonly database
+```
+
+Fix without losing data:
+
+```bash
+docker compose down                                                  # stop, keep volume
+docker run --rm -u 0 -v irdb_irdb-data:/data alpine \
+    chown -R 1000:1000 /data
+docker compose up --build -d
+```
+
+Verify:
+
+```bash
+docker run --rm -v irdb_irdb-data:/data alpine stat -c '%u' /data    # should print 1000
+```
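+
+`stat` only inspects the top-level directory. To catch stragglers deeper in
+the tree, list everything not owned by the expected uid. A sketch, assuming
+the `find` in use supports `-user` with a numeric id (GNU find does; treat
+the Alpine invocation in the comment as an assumption):
+
+```bash
+# Print every path under DIR that is not owned by OWNER (user name or numeric uid).
+list_foreign_owned() {
+    local dir=$1 owner=${2:-1000}
+    find "$dir" ! -user "$owner" -print
+}
+
+# On the host: list_foreign_owned /some/dir 1000
+# Against the volume (assumes busybox find was built with -user support):
+#   docker run --rm -v irdb_irdb-data:/data alpine find /data ! -user 1000
+```
+
+Empty output means every path is owned by the expected uid.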
+
+If the volume's data is disposable (dev / fresh environment), the
+nuclear option is faster:
+
+```bash
+docker compose down -v && docker compose up --build -d
+```
+
+### 3.4 Backing up the volume (whole-tarball)
+
+The SQLite-API-aware backup is documented in
+[`README.md` § Backups](../README.md#backups). For a whole-volume
+tarball — useful before any risky volume operation:
+
+```bash
+docker compose stop api
+docker run --rm -v irdb_irdb-data:/data -v "$(pwd):/backup" alpine \
+    tar czf /backup/irdb-volume-$(date +%F).tar.gz -C /data .
+docker compose start api
+```
+
+Restore:
+
+```bash
+docker compose down
+docker run --rm -v irdb_irdb-data:/data -v "$(pwd):/backup" alpine \
+    sh -c 'rm -rf /data/* /data/.[!.]* && tar xzf /backup/irdb-volume-2026-05-01.tar.gz -C /data'
+docker compose up -d
+```
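+
+Since the restore wipes `/data` before unpacking, it is worth checking that
+the archive is readable first. A sketch (`verify_backup` is an illustrative
+name; it only proves the tarball can be listed, not that the database
+inside is consistent):
+
+```bash
+# Succeed only if ARCHIVE exists, is non-empty, and lists cleanly as a gzip tarball.
+verify_backup() {
+    local archive=$1
+    [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
+    tar tzf "$archive" > /dev/null 2>&1 ||
+        { echo "not a readable gzip tarball: $archive" >&2; return 1; }
+    echo "ok: $archive"
+}
+
+# Usage: verify_backup irdb-volume-2026-05-01.tar.gz && echo "safe to restore"
+```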
+
+### 3.5 Removing a volume explicitly
+
+```bash
+docker compose down                       # stop containers (don't use -v)
+docker volume rm irdb_irdb-data           # then remove the volume by name
+```
+
+`docker volume rm` refuses to remove a volume that's still attached
+to a container — that's a safety feature, don't fight it. Stop the
+relevant container first.
+
+---
+
+## 4. Troubleshooting
+
+### 4.1 "attempt to write a readonly database"
+
+Cause: the SQLite file or its directory isn't writable by uid 1000.
+Most commonly a pre-F18 volume — see § 3.3 for the chown fix.
+
+Other causes worth ruling out:
+
+- The host filesystem the volume lives on is full → `df -h` (Linux) or `docker system df` (any).
+- The host filesystem is read-only (e.g., emergency-mounted /) → `mount | grep ' / '`.
+- A previous migrate run left a SQLite rollback journal (`irdb.sqlite-journal`) owned by a different uid; the chown in § 3.3 fixes this too.
+
+### 4.2 Migration fails
+
+The `migrate` container exits non-zero, `api` doesn't start because
+the `service_completed_successfully` gate isn't satisfied. Diagnosis:
+
+```bash
+docker compose logs migrate
+```
+
+Common failures:
+
+- **Schema conflict** — a migration tries to add a column that already exists, usually because the migration was edited after being applied. Phinx tracks applied migrations in `phinxlog`. Rolling back manually:
+
+  ```bash
+  docker compose run --rm migrate vendor/bin/phinx rollback \
+      --configuration=config/phinx.php --target=<previous_version>
+  ```
+
+- **Permission error** — see § 3.3.
+- **Constraint violation** during data migration — fix the data or the migration; rerun.
+
+After fixing, just `docker compose up --build -d` again. Phinx skips
+migrations already recorded in `phinxlog` and applies only the pending
+ones.
+
+### 4.3 Healthcheck never passes
+
+`api` shows `Up (unhealthy)` or `ui` won't start because `api` isn't
+healthy. Diagnose:
+
+```bash
+docker compose exec api wget -qO- http://localhost:8081/healthz
+docker compose logs api --tail 100
+```
+
+Likely culprits:
+
+- Missing or invalid env var (api validates required env on boot — look for the explicit error near the top of the log).
+- `migrate` did not actually finish before `api` started — happens if you bypassed compose with manual `docker run`.
+- Database unreachable (MySQL mode only — check `mysql` is up + healthy first).
+
+### 4.4 Container won't start at all
+
+```bash
+docker compose ps -a            # see exit code
+docker compose logs <service>
+```
+
+Common:
+
+- **`.env` missing** — Compose validates `env_file: .env` and refuses to start. `cp .env.example .env` and fill in secrets.
+- **Port conflict** — host port 8080 or 8081 in use by something else. Either stop the other process, or remap in `docker-compose.yml`.
+- **Image build failed** — read the build output above the start attempt.
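+
+For the port-conflict case, a quick probe shows whether something already
+listens on the host ports. A sketch that relies on bash's `/dev/tcp`
+redirection (a bash feature, not available in plain `sh`) and only checks
+the loopback address; `port_in_use` is an illustrative name:
+
+```bash
+# Succeed if a TCP connection to 127.0.0.1:PORT can be opened.
+port_in_use() {
+    (exec 3<> "/dev/tcp/127.0.0.1/$1") 2> /dev/null
+}
+
+# Usage (with the stack stopped):
+#   for p in 8080 8081; do port_in_use "$p" && echo "port $p is taken"; done
+```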
+
+### 4.5 Migrate looks stuck
+
+Phinx output is buffered through PHP's stdout. If a single migration
+takes a while (large data backfill), it can look frozen. Check
+liveness:
+
+```bash
+docker compose top migrate
+```
+
+If you see a `php` or `phinx` process consuming CPU, it's working.
+If the process is gone but the container is still listed, check exit
+code with `docker compose ps -a`.
+
+### 4.6 Disk fills up
+
+Symptoms: builds fail with "no space left on device", or `api`
+crashes with SQLite "disk I/O error".
+
+```bash
+docker system df                 # what's using the docker disk
+docker image prune -f            # remove dangling images (safe)
+docker builder prune -f          # remove old build cache (safe)
+docker volume ls -f dangling=true   # volumes with no associated container
+```
+
+Don't run `docker system prune --volumes` casually — it removes
+unused volumes including ones you might still want. Use the targeted
+commands above.
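+
+A threshold check makes it possible to notice the problem from cron before
+the disk is actually full. A sketch using POSIX `df -P`; the function name
+and the 90% threshold are illustrative:
+
+```bash
+# Print the usage percentage (digits only) of the filesystem holding PATH.
+disk_used_pct() {
+    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
+}
+
+# Usage: warn when Docker's data root (commonly /var/lib/docker on Linux) passes 90%.
+#   [ "$(disk_used_pct /var/lib/docker)" -lt 90 ] || echo "disk almost full" >&2
+```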
+
+For the `irdb-data` volume specifically, the audit log is the largest
+growing table. The `cleanup-audit` job prunes it according to
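+`JOB_AUDIT_RETENTION_DAYS` (default 180 days). If the audit log has run
+away, lower the retention and run the job as a one-shot. Note that
+`${INTERNAL_JOB_TOKEN}` is expanded by the host shell, so it must be set
+there: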
+
+```bash
+docker compose exec api curl -s -X POST http://localhost:8081/internal/jobs/cleanup-audit \
+    -H "Authorization: Bearer ${INTERNAL_JOB_TOKEN}"
+```
+
+---
+
+## 5. Rolling back
+
+If a deploy breaks things and you need to get back to the prior
+version:
+
+```bash
+git log --oneline -10                                  # find the previous good commit
+git checkout <prev-commit>
+docker compose -f docker-compose.yml -f compose.scheduler.yml up --build -d
+```
+
+**Caveat: migrations are not automatically reversed.** If the broken
+deploy added a migration, the schema is still at the new version
+when you check out the old code. Two paths:
+
+1. **Old code is forward-compatible with the new schema** (most
+   additive migrations — new columns, new tables — qualify). Just
+   redeploy old code; it ignores the new shape.
+2. **Old code can't run against the new schema** (column rename, type
+   change, dropped column). Roll the migration back too:
+
+   ```bash
+   docker compose run --rm migrate vendor/bin/phinx rollback \
+       --configuration=config/phinx.php --target=<previous_version>
+   ```
+
+   Then redeploy old code.
+
+If the broken deploy corrupted data, restore from a backup
+([`README.md` § Backups](../README.md#backups) — use the SQLite
+`.backup` file or the volume tarball, not a partial state).
+
+After investigating and fixing on `main`, return to the branch tip:
+
+```bash
+git checkout main
+git pull
+docker compose -f docker-compose.yml -f compose.scheduler.yml up --build -d
+```
+
+---
+
+## 6. Scheduler operations
+
+The scheduler sidecar (`compose.scheduler.yml` overlay) runs busybox
+`crond` and posts to `/internal/jobs/tick` once a minute. The endpoint
+is bound to RFC1918 + loopback only — see `api/docker/Caddyfile`.
+
+### 6.1 Verify the scheduler is firing
+
+```bash
+docker compose logs -f scheduler           # one POST per minute
+docker compose exec api curl -s http://localhost:8081/healthz | jq .jobs
+```
+
+The `jobs` block in `/healthz` shows the most-recent successful tick;
+if it's > 5 minutes stale and the scheduler is "running", investigate.
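+
+The staleness rule can be expressed as a small check. The exact shape of
+the `jobs` block is not shown here, so this sketch takes the last-tick time
+as epoch seconds and leaves the extraction to the caller; `is_stale` and
+its arguments are illustrative:
+
+```bash
+# Succeed if LAST (epoch seconds) is more than MAX_AGE seconds in the past.
+is_stale() {
+    local last=$1 max_age=${2:-300} now
+    now=$(date +%s)
+    [ $((now - last)) -gt "$max_age" ]
+}
+
+# Usage: is_stale "$last_tick_epoch" 300 && echo "scheduler tick is stale" >&2
+```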
+
+### 6.2 Force a job run
+
+Each job has its own endpoint under `/internal/jobs/`:
+
+```bash
+docker compose exec api curl -s -X POST \
+    http://localhost:8081/internal/jobs/<job-name> \
+    -H "Authorization: Bearer ${INTERNAL_JOB_TOKEN}"
+```
+
+Available jobs: `recompute-scores`, `cleanup-audit`,
+`cleanup-expired-manual-blocks`, `enrich-pending`, `refresh-geoip`,
+`tick` (dispatcher). `GET /internal/jobs/status` returns the latest
+run record per job.
+
+Admins logged into the UI can also trigger a job from the Settings →
+Jobs screen, which posts to `/api/v1/admin/jobs/trigger/{name}` —
+that path uses an admin token, not `INTERNAL_JOB_TOKEN`.
+
+### 6.3 Switching scheduler styles
+
+Three options — pick exactly one to avoid double-firing:
+
+| Style          | When to use                                                              | How                                                                                          |
+|----------------|--------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
+| Sidecar        | Default. Self-contained, no host setup.                                  | Include `compose.scheduler.yml` overlay.                                                     |
+| Host cron      | You already manage cron centrally (e.g., Ansible).                       | Drop `examples/scheduler/host.crontab` into `/etc/cron.d/`. Don't include the overlay.        |
+| systemd timer  | Modern Linux without crond, you want timer accuracy + journal logging.   | Install `examples/scheduler/irdb-tick.{service,timer}` into `/etc/systemd/system`. Don't include the overlay. |
+
+If you migrate from sidecar to host cron / systemd: stop the sidecar
+(`docker compose stop scheduler && docker compose rm -f scheduler`,
+and stop including `compose.scheduler.yml` on subsequent `up`
+invocations), enable the host driver, and verify exactly one tick
+per minute lands by tailing api logs.
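+
+Counting ticks per minute can be automated. A sketch that reads
+`docker compose logs -t api` output on stdin; it assumes the api log lines
+for tick requests contain the path `/internal/jobs/tick`, which depends on
+how access logging is configured:
+
+```bash
+# Read log lines on stdin and print "<minute> <count>" for each minute that saw a tick.
+ticks_per_minute() {
+    grep -F '/internal/jobs/tick' |
+        awk 'match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]/) {
+                 count[substr($0, RSTART, RLENGTH)]++
+             }
+             END { for (m in count) print m, count[m] }' |
+        sort
+}
+
+# Usage: any minute with a count other than 1 deserves a look.
+#   docker compose logs -t --since 10m api | ticks_per_minute
+```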
+
+---
+
+## 7. Multi-host and scaling notes
+
+The default deployment is single-host. A few caveats if you scale:
+
+- **SQLite mode is single-host only.** Move to MySQL (see [`README.md` § MySQL](../README.md#mysql-optional)) before adding a second api replica.
+- **The `migrate` container must run exactly once** per deploy, not once per replica. If you orchestrate manually, gate api startup on a single-shot migrate completion.
+- **The UI session store is local file-backed** (`/tmp` in the ui container). Multiple ui replicas need either sticky sessions at the load balancer, or a shared session store (Redis) — currently not configured out of the box.
+- **The scheduler must run exactly once globally**, not per host. If you run host cron on every node, you'll multi-fire jobs. Pick one node.
+
+---
+
+## 8. Security update workflow
+
+The base images (`dunglas/frankenphp:1-php8.3-alpine` for api/ui,
+`alpine:3.21@sha256:…` digest-pinned for the scheduler) are referenced
+in their Dockerfiles. To pull security fixes:
+
+```bash
+docker compose build --pull         # forces re-pull of base images
+docker compose up -d
+```
+
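+`--pull` is the difference: without it, the build reuses the locally cached
+base image even when the registry has a newer image under the same tag.
+The digest-pinned scheduler base only changes when the digest in its
+Dockerfile is updated.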
+
+For app-level security updates (Composer dependencies):
+
+```bash
+cd api
+composer update --no-dev            # updates composer.lock on the host
+cd ..
+docker compose up --build -d        # rebuild picks up the new lock file
+```
+
+Review the changelog of any updated package before deploying. The
+[`doc/SEC_REVIEW.md`](./SEC_REVIEW.md) document tracks security
+findings + their fixes per commit; `git log --grep SEC_REVIEW`
+surfaces the pattern.
+
+---
+
+## 9. Operational quick reference
+
+Daily operations:
+
+```bash
+docker compose ps                                # are all services up?
+docker compose logs -f api ui                    # tail app logs
+docker compose exec api sh                       # shell into api (read-only rootfs; /tmp is writable)
+```
+
+Deploys:
+
+```bash
+git pull && docker compose -f docker-compose.yml -f compose.scheduler.yml up --build -d
+```
+
+Backups (SQLite, online):
+
+```bash
+docker compose exec api sh -c \
+    'sqlite3 /data/irdb.sqlite ".backup /data/irdb-backup.sqlite"'
+docker compose cp api:/data/irdb-backup.sqlite ./irdb-backup-$(date +%F).sqlite
+```
+
+Health:
+
+```bash
+curl -s http://localhost:8081/healthz | jq
+curl -s http://localhost:8080/healthz
+```
+
+Disk:
+
+```bash
+docker system df
+docker image prune -f && docker builder prune -f
+```
+
+Volume rescue:
+
+```bash
+docker run --rm -u 0 -v irdb_irdb-data:/data alpine chown -R 1000:1000 /data
+```
+
+---
+
+## 10. See also
+
+- [`README.md`](../README.md) — quickstart, secrets, reverse proxy, MySQL/OIDC setup, backups.
+- [`user-manual.md`](./user-manual.md) — UI walkthrough, screen-by-screen.
+- [`architecture.md`](./architecture.md) — system design, container topology, where state lives.
+- [`security.md`](./security.md) — threat model, hardening choices.
+- [`SEC_REVIEW.md`](./SEC_REVIEW.md) — security findings log.
+- [`auth-flows.md`](./auth-flows.md) — local admin, OIDC, token kinds.
+- [`api-overview.md`](./api-overview.md) — REST surface, common conventions.