create DB service to record messages #4

Open
opened 2025-05-05 21:03:19 +00:00 by jfig (Owner) · 0 comments

## High‑level plan

1. **Introduce a lightweight *Logging Service***
   A new container running a small FastAPI app that exposes `POST /api/v1/log`, receives the JSON payload, and writes it to Postgres.
   **Why?** It decouples database latency from the Matrix bot. The bot already uses httpx, so calling the new service adds no new dependency.

2. **Refine schema & future‑proofing**

   ```sql
   CREATE TABLE messages (
       event_id    TEXT PRIMARY KEY,
       room_id     TEXT NOT NULL,
       user_id     TEXT NOT NULL,
       ts_ms       BIGINT NOT NULL,
       body        TEXT NOT NULL,
       received_at TIMESTAMPTZ DEFAULT now()  -- ingestion time
   );
   ```

   Other event types can get separate tables later (e.g. `bot_replies`, `edits`, `reactions`) without touching this ingest path.

3. **Flow**

   `matrix_service.on_message`
   ⮕ posts the JSON to the **Logging Service** without blocking (fire-and-forget via `asyncio.create_task`);
   ⮕ continues with the current AI delegation logic.

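4. **Fault handling**

   * If the Logging Service is down, warn once and skip logging until it is healthy again (circuit breaker).
   * Using `event_id` as the primary key makes logging naturally idempotent. With `INSERT … ON CONFLICT (event_id) DO NOTHING`, a retried or re-delivered event is skipped instead of raising a unique-violation error.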

5. **Observability**

   * Add structured logging (`logfmt`) and a `/healthz` health endpoint to both services.
   * Dashboards can later query Postgres directly or scrape metrics.

6. **Migrations**

   * Adopt Alembic from day 1, even with a single revision.

7. **Security**

   * Require a shared secret (`LOG_TOKEN`) in a request header on write endpoints.
   * Keep Postgres credentials in `.env` only; never bake credentials into images.

8. **Testing**

   * Unit: serialize a fake event and assert that the Logging Service returns 200.
   * Integration (docker‑compose, in CI): run all services, send a message, then query Postgres with `psql -c "SELECT count(*) …"`.
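
Item 2's columns map onto fields that every Matrix message event carries. Below is a minimal sketch of that mapping. It assumes the Matrix service has the raw client-server event as a dict with `event_id`, `sender`, `origin_server_ts` and `content.body`, and that `user_id` and `ts_ms` come from `sender` and `origin_server_ts`. `MessageRow` and `row_from_event` are hypothetical names, not existing code in the repo.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class MessageRow:
    """One row of the `messages` table; received_at is filled in by Postgres."""
    event_id: str
    room_id: str
    user_id: str
    ts_ms: int
    body: str


def row_from_event(room_id: str, event: Dict[str, Any]) -> MessageRow:
    """Map a raw Matrix event dict onto the `messages` columns.

    Assumption: `sender` is the user ID and `origin_server_ts` is the
    homeserver timestamp in milliseconds since the Unix epoch.
    """
    return MessageRow(
        event_id=event["event_id"],
        room_id=room_id,  # room_id may be absent from /sync timeline events, so pass it in
        user_id=event["sender"],
        ts_ms=int(event["origin_server_ts"]),
        body=event.get("content", {}).get("body", ""),  # body is NOT NULL
    )


if __name__ == "__main__":
    fake_event = {
        "type": "m.room.message",
        "event_id": "$abc123",
        "sender": "@alice:example.org",
        "origin_server_ts": 1746478999000,
        "content": {"msgtype": "m.text", "body": "hello bot"},
    }
    # asdict() yields a JSON-ready payload for POST /api/v1/log
    print(asdict(row_from_event("!room:example.org", fake_event)))
```

Keeping the row as a frozen dataclass means the payload shape and the table columns are defined in one place on the Matrix side.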
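
Item 4's "warn once, skip logging until healthy" behaviour is a small circuit breaker. Here is one possible sketch. It assumes a consecutive-failure threshold and a fixed cooldown, after which trial calls are allowed again; both default values are hypothetical. The clock is injectable so the logic can be tested without sleeping.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("matrix_service.log_client")


class CircuitBreaker:
    """Skip calls to a failing dependency and warn only once per outage."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold    # consecutive failures before opening
        self.cooldown_s = cooldown_s  # skip period before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call should be attempted now."""
        if self.opened_at is None:
            return True  # closed: normal operation
        # Open: skip until the cooldown has elapsed, then allow a trial call.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        if self.opened_at is not None:
            log.info("Logging Service healthy again; resuming logging")
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures < self.threshold:
            return
        if self.opened_at is None:
            # The single warning for this outage.
            log.warning("Logging Service unavailable; skipping logging until healthy")
        self.opened_at = self.clock()  # (re)start the cooldown window
```

In a `log_event` helper, you would call `allow()` before posting and `record_success()` or `record_failure()` afterwards. With concurrent fire-and-forget tasks, several trial calls may slip through after a cooldown. That seems harmless for logging.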
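
Item 8's unit test can run without Docker by pointing the client at a throwaway stub. The sketch below is stdlib-only:

* `http.server` stands in for the Logging Service (the real one is FastAPI).
* The test serializes a fake event and posts it to `/api/v1/log` with a token header.
* It asserts a 200 and checks that the stub received the same JSON.

The `X-Log-Token` header name, the token value and the stub handler are assumptions for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = "test-token"  # hypothetical LOG_TOKEN value for the test


class StubLogHandler(BaseHTTPRequestHandler):
    """Accepts POST /api/v1/log with the right token and records the JSON body."""
    received: list = []

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        ok = self.path == "/api/v1/log" and self.headers.get("X-Log-Token") == TOKEN
        if ok:
            StubLogHandler.received.append(body)
        self.send_response(200 if ok else 401)
        self.end_headers()

    def log_message(self, *args) -> None:  # keep test output quiet
        pass


def test_fake_event_is_accepted() -> None:
    server = HTTPServer(("127.0.0.1", 0), StubLogHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        event = {"event_id": "$e1", "room_id": "!r:x", "user_id": "@a:x",
                 "ts_ms": 1000, "body": "hi"}
        req = urllib.request.Request(
            f"http://127.0.0.1:{server.server_port}/api/v1/log",
            data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json", "X-Log-Token": TOKEN},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=2) as resp:
            assert resp.status == 200
        assert StubLogHandler.received[-1] == event
    finally:
        server.shutdown()
        server.server_close()
```

pytest collects `test_fake_event_is_accepted` as is. The plan mentions pytest-asyncio, which would only be needed once the test exercises the async client directly.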

## Implementation proposal (concrete steps & deliverables)

| # | Deliverable | File / change |
| --- | --- | --- |
| **1** | **Add Postgres** container | `docker-compose.yml` → new `postgres` service with a volume and env vars. |
| **2** | **Logging Service** | `logging_service/`<br>• `main.py` (FastAPI, asyncpg)<br>• `Dockerfile`<br>• `requirements.txt` |
| **3** | **Matrix Service patch** | `matrix_service/main.py`<br>• Add `LOG_HANDLER_URL` & `LOG_TOKEN` env vars<br>• `async def log_event(payload):` (httpx POST, 2 s timeout, no `raise_for_status()` call)<br>• In `on_message()`: `asyncio.create_task(log_event(payload))` |
| **4** | **Schema & migrations** | `migrations/` with Alembic revision `001_create_messages_table.py` |
| **5** | **Environment samples** | `.env.example` updated with PG creds, `LOG_TOKEN`, etc. |
| **6** | **CI tests** | Add `tests/test_logging.py` (pytest‑asyncio) & a GitHub/Gitea Actions workflow. |
| **7** | **Docs** | `README.md` section *“Conversation Logging”* with setup instructions and query examples. |
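
The core of `logging_service/main.py` is a token check plus one idempotent `INSERT`. Below is a minimal, framework-free sketch of that logic. It assumes the token arrives as a single header value and uses stdlib `sqlite3` as a stand-in for asyncpg/Postgres. Both databases accept `INSERT … ON CONFLICT (event_id) DO NOTHING`, though SQLite uses `?` placeholders where asyncpg uses `$1`, `$2`, …. `handle_log` and its status codes are hypothetical. In FastAPI the same logic would sit inside the `POST /api/v1/log` route.

```python
import hmac
import sqlite3
from typing import Any, Dict, Optional

# Expected payload fields and types, mirroring the `messages` columns.
REQUIRED = {"event_id": str, "room_id": str, "user_id": str, "ts_ms": int, "body": str}

# SQLite stand-in for the Postgres table (TIMESTAMPTZ/now() become TEXT/CURRENT_TIMESTAMP).
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    event_id    TEXT PRIMARY KEY,
    room_id     TEXT NOT NULL,
    user_id     TEXT NOT NULL,
    ts_ms       BIGINT NOT NULL,
    body        TEXT NOT NULL,
    received_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

# Same statement shape as in Postgres; asyncpg would use $1..$5 instead of ?.
INSERT = """
INSERT INTO messages (event_id, room_id, user_id, ts_ms, body)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (event_id) DO NOTHING
"""


def handle_log(conn: sqlite3.Connection, expected_token: str,
               token_header: Optional[str], payload: Dict[str, Any]) -> int:
    """Process one POST /api/v1/log body and return an HTTP status code."""
    # Constant-time comparison of the shared secret.
    if token_header is None or not hmac.compare_digest(
            token_header.encode(), expected_token.encode()):
        return 401
    for field, typ in REQUIRED.items():
        if not isinstance(payload.get(field), typ):
            return 422  # missing or wrongly typed field
    conn.execute(INSERT, tuple(payload[field] for field in REQUIRED))
    conn.commit()
    # A re-sent event_id is silently skipped, so retries also get 200.
    return 200
```

Returning 200 for duplicates keeps retries from the Matrix side harmless. Distinguishing "inserted" from "already present" is possible if ever needed: with asyncpg, compare the command status string that `execute` returns.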

**Estimated effort**

| Task | Role | Effort (h) |
| --- | --- | --- |
| Postgres & schema | DevOps | 2 |
| Logging Service code | Backend | 4 |
| Matrix patch | Backend | 1 |
| Tests & CI | QA/Dev | 3 |
| Docs & review | Tech writer | 1 |
| **Total** | | **11 h** |

The proposal keeps today’s two‑microservice pattern, adds one focused microservice, and keeps database writes off both the latency‑sensitive (Matrix) and compute‑heavy (AI) paths. It fulfils the immediate goal (store `timestamp • room • user • message`) while creating a clean lane for future enrichment (e.g. thread-context building, analytics, or GDPR exports).

jfig added this to the Message recording in DB project 2025-06-24 16:27:21 +00:00
jfig added a new dependency 2025-07-12 18:15:19 +00:00
jfig removed a dependency 2025-07-12 18:16:31 +00:00
Reference: jfig/botbot#4