create DB service to record messages #4


jfig · 2025-05-05T21:03:19Z


## High-level plan

1. **Introduce a lightweight *Logging Service***
   *New container, small FastAPI app* exposing `POST /api/v1/log` to receive the JSON payload and write it to Postgres.
   **Why?** It decouples database latency from the Matrix bot, and the Matrix service already uses httpx, so sending the payload adds no new client library.

2. **Refine schema & future-proofing**

   ```sql
   CREATE TABLE messages (
       event_id    TEXT PRIMARY KEY,
       room_id     TEXT NOT NULL,
       user_id     TEXT NOT NULL,
       ts_ms       BIGINT NOT NULL,           -- message timestamp in milliseconds
       body        TEXT NOT NULL,
       received_at TIMESTAMPTZ DEFAULT now()  -- ingestion time
   );
   ```

   Separate tables (e.g. `bot_replies`, `edits`, `reactions`) can be added later without touching this ingest path.

3. **Flow**
   `matrix_service.on_message`
   ⮕ post the JSON payload to the **Logging Service** (non-blocking, fire-and-forget via `asyncio.create_task`).
   ⮕ continue the current AI delegation logic.

4. **Fault handling**
   * If the Logging Service is down: warn once, then skip logging until it is healthy again (circuit breaker).
   * Use `event_id` as the primary key and insert with `ON CONFLICT (event_id) DO NOTHING`, so a retried or duplicated delivery is a no-op and logging is idempotent.

5. **Observability**
   * Add structured logging (`logfmt`) and a `/healthz` health endpoint to both services.
   * Dashboards can later query Postgres directly or scrape metrics.

6. **Migrations**
   * Adopt Alembic from day 1, even with a single revision.

7. **Security**
   * Require a shared secret (`LOG_TOKEN`) in a request header on write endpoints.
   * Keep Postgres credentials in `.env` only; never bake credentials into images.

8. **Testing**
   * Unit: serialize a fake event and assert a 200 response from the Logging Service.
   * Integration (docker-compose): run all services, send a message, then query Postgres with `psql -c "SELECT count(*) …"` in CI.
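
Steps 2 and 4 can be tried locally before Postgres is in place. This is a minimal sketch, not the service code: it uses Python's built-in `sqlite3` as a stand-in for Postgres/asyncpg, maps `BIGINT` / `TIMESTAMPTZ DEFAULT now()` to SQLite's `INTEGER` / `CURRENT_TIMESTAMP`, and the `insert_message` helper is hypothetical. The part that carries over is `ON CONFLICT (event_id) DO NOTHING`, which both Postgres and SQLite (3.24 or newer) accept, so a re-delivered event leaves exactly one row.

```python
import sqlite3

# SQLite stand-in for the proposed Postgres `messages` table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    event_id    TEXT PRIMARY KEY,
    room_id     TEXT NOT NULL,
    user_id     TEXT NOT NULL,
    ts_ms       INTEGER NOT NULL,
    body        TEXT NOT NULL,
    received_at TEXT DEFAULT CURRENT_TIMESTAMP  -- ingestion time (UTC)
)
"""

# Same statement works in Postgres (with $1..$5 placeholders instead of ?).
# A second insert with an existing event_id is silently skipped.
INSERT_SQL = """
INSERT INTO messages (event_id, room_id, user_id, ts_ms, body)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (event_id) DO NOTHING
"""


def insert_message(conn: sqlite3.Connection, payload: dict) -> None:
    """Hypothetical helper: store one message; re-delivery is a no-op."""
    conn.execute(
        INSERT_SQL,
        (payload["event_id"], payload["room_id"], payload["user_id"],
         payload["ts_ms"], payload["body"]),
    )
    conn.commit()


def demo() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    payload = {
        "event_id": "$evt1",
        "room_id": "!room:example.org",
        "user_id": "@alice:example.org",
        "ts_ms": 1_700_000_000_000,
        "body": "hello",
    }
    insert_message(conn, payload)
    insert_message(conn, payload)  # simulated retry of the same event
    (count,) = conn.execute("SELECT count(*) FROM messages").fetchone()
    conn.close()
    return count


print(demo())  # 1
```

With asyncpg, the Logging Service would run the same statement via `await conn.execute(...)` using `$1`…`$5` placeholders.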
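
For step 3, one detail matters: the asyncio documentation advises keeping a reference to tasks returned by `asyncio.create_task`, because the event loop holds only weak references and an unreferenced task can be garbage-collected before it finishes. A minimal sketch, assuming a hypothetical `fire_and_forget` helper and a stand-in `log_event` that sleeps instead of posting with httpx:

```python
from __future__ import annotations

import asyncio
from typing import Any, Coroutine

# Strong references to in-flight background tasks; the event loop keeps only weak ones.
_background_tasks: set[asyncio.Task] = set()


def fire_and_forget(coro: Coroutine[Any, Any, None]) -> asyncio.Task:
    """Schedule `coro` without awaiting it; drop our reference once it finishes."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def log_event(payload: dict, sink: list) -> None:
    """Stand-in for the httpx POST to the Logging Service (hypothetical)."""
    await asyncio.sleep(0.01)  # simulated network latency
    sink.append(("logged", payload["event_id"]))


async def on_message(payload: dict, sink: list) -> None:
    fire_and_forget(log_event(payload, sink))
    # AI delegation continues immediately; it never waits for logging.
    sink.append(("delegated", payload["event_id"]))


async def main() -> list:
    sink: list = []
    await on_message({"event_id": "$evt1"}, sink)
    await asyncio.gather(*_background_tasks)  # only so the demo can observe the log
    return sink


print(asyncio.run(main()))  # [('delegated', '$evt1'), ('logged', '$evt1')]
```

The printed order shows the delegation step completing before the log write, which is the non-blocking behaviour the flow asks for.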
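
Step 4's "warn once, skip logging until healthy" rule is a small circuit breaker. A sketch under assumptions: the `LogCircuitBreaker` class and its 30 s cool-down are hypothetical, and because the plan does not say how health is detected, this breaker simply lets one trial request through after the cool-down (a "half-open" state) rather than polling `/healthz`.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("matrix_service.log_client")


class LogCircuitBreaker:
    """Hypothetical breaker: opens on failure, allows a trial after `cooldown_s`."""

    def __init__(self, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.open_since: Optional[float] = None  # None = closed, logging enabled

    def allow(self) -> bool:
        """Return True if a log request should be attempted now."""
        if self.open_since is None:
            return True
        # Half-open: permit a trial request once the cool-down has elapsed.
        return self.clock() - self.open_since >= self.cooldown_s

    def record_success(self) -> None:
        if self.open_since is not None:
            log.info("logging service healthy again; resuming message logging")
        self.open_since = None

    def record_failure(self) -> None:
        if self.open_since is None:
            # Warn only when the breaker opens, not for every skipped message.
            log.warning("logging service unavailable; skipping message logging")
        self.open_since = self.clock()


def demo() -> list:
    now = [0.0]
    breaker = LogCircuitBreaker(cooldown_s=30.0, clock=lambda: now[0])
    results = [breaker.allow()]      # closed
    breaker.record_failure()         # warns once, opens at t=0
    now[0] = 10.0
    results.append(breaker.allow())  # still cooling down
    now[0] = 31.0
    results.append(breaker.allow())  # half-open trial allowed
    breaker.record_success()         # closes again
    results.append(breaker.allow())
    return results


print(demo())  # [True, False, True, True]
```

In `log_event`, the Matrix service would call `allow()` before posting and `record_success()` or `record_failure()` afterwards.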
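
For step 5, logfmt lines are space-separated `key=value` pairs, with a value quoted when it contains a space, `=` or `"`. A minimal stdlib-only sketch; the `LogfmtFormatter` class and its `EXTRA_KEYS` field names are assumptions, not an existing API:

```python
import logging
import time


def to_logfmt(fields: dict) -> str:
    """Render a dict as one logfmt line, quoting values that need it."""
    parts = []
    for key, value in fields.items():
        text = "" if value is None else str(value)
        if text == "" or any(ch in text for ch in ' ="'):
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


class LogfmtFormatter(logging.Formatter):
    """Emit records as `ts=... level=... logger=... msg=...` plus selected extras."""

    converter = time.gmtime  # render timestamps in UTC
    EXTRA_KEYS = ("event_id", "room_id", "status")  # hypothetical extra fields

    def format(self, record: logging.LogRecord) -> str:
        fields = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        for key in self.EXTRA_KEYS:
            if hasattr(record, key):
                fields[key] = getattr(record, key)
        return to_logfmt(fields)


print(to_logfmt({"level": "info", "msg": "message stored", "event_id": "$evt1"}))
# level=info msg="message stored" event_id=$evt1
```

Attach it with `handler.setFormatter(LogfmtFormatter())` and pass per-message fields through `logger.info("message stored", extra={"event_id": event_id})`.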
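
Step 7's shared-secret check should use a constant-time comparison such as Python's `hmac.compare_digest`. A framework-agnostic sketch; the `X-Log-Token` header name and the `is_authorized` helper are assumptions, since the plan only says the token travels "in header":

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str],
                  header_name: str = "X-Log-Token") -> bool:
    """Return True only if the request carries the shared LOG_TOKEN.

    `header_name` is a hypothetical choice; lookup is case-insensitive here.
    """
    if not expected_token:
        # Fail closed: an unset LOG_TOKEN must not leave the write endpoint open.
        return False
    lowered = {k.lower(): v for k, v in headers.items()}
    supplied = lowered.get(header_name.lower())
    if supplied is None:
        return False
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())


print(is_authorized({"x-log-token": "s3cret"}, "s3cret"))  # True
```

The FastAPI handler would run this check before touching the database and answer 401 when it returns `False`.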
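
The serialization half of step 8's unit test needs no network. In this sketch `FakeEvent` and `event_to_payload` are hypothetical, and the field mapping (`sender` → `user_id`, millisecond server timestamp → `ts_ms`) is an assumption about what the Matrix service will send; the HTTP call that should return 200 is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a Matrix text-message event."""
    event_id: str
    sender: str
    server_timestamp: int  # milliseconds since the Unix epoch
    body: str


def event_to_payload(room_id: str, event: FakeEvent) -> dict:
    """Hypothetical mapping from a Matrix event to the Logging Service payload."""
    return {
        "event_id": event.event_id,
        "room_id": room_id,
        "user_id": event.sender,
        "ts_ms": event.server_timestamp,
        "body": event.body,
    }


def test_event_serializes_to_schema_columns() -> None:
    event = FakeEvent("$evt1", "@alice:example.org", 1_700_000_000_000, "hello")
    payload = event_to_payload("!room:example.org", event)
    decoded = json.loads(json.dumps(payload))  # round-trip as it would go over HTTP
    # Keys must line up with the columns of the `messages` table.
    assert set(decoded) == {"event_id", "room_id", "user_id", "ts_ms", "body"}
    assert decoded["ts_ms"] == 1_700_000_000_000
    assert decoded["user_id"] == "@alice:example.org"


test_event_serializes_to_schema_columns()
```

Under pytest the explicit call at the bottom is unnecessary; it is there so the file also runs as a plain script.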

## Implementation proposal (concrete steps & deliverables)

| #     | Deliverable                | File / change |
| ----- | -------------------------- | ------------- |
| **1** | **Add Postgres** container | `docker-compose.yml` → new `postgres` service with a volume and env vars. |
| **2** | **Logging Service**        | `logging_service/` • `main.py` (FastAPI, asyncpg) • `Dockerfile` • `requirements.txt` |
| **3** | **Matrix Service patch**   | `matrix_service/main.py` • Add `LOG_HANDLER_URL` & `LOG_TOKEN` env vars • `async def log_event(payload):` (httpx POST, 2 s timeout, no `raise_for_status()` call, so HTTP error statuses do not raise) • In `on_message()`: `asyncio.create_task(log_event(payload))` |
| **4** | **Schema & migrations**    | `migrations/` with Alembic revision `001_create_messages_table.py` |
| **5** | **Environment samples**    | `.env.example` updated with Postgres credentials, `LOG_TOKEN`, etc. |
| **6** | **CI tests**               | Add `tests/test_logging.py` (pytest-asyncio) and a GitHub/Gitea Actions workflow. |
| **7** | **Docs**                   | `README.md` section *"Conversation Logging"* with setup instructions and query examples. |
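
Deliverables 1 and 5 imply a little configuration glue in the Logging Service. A sketch of assembling its Postgres DSN from `.env` values: `POSTGRES_USER`, `POSTGRES_PASSWORD` and `POSTGRES_DB` are variables the official `postgres` Docker image reads, while `POSTGRES_HOST`, `POSTGRES_PORT`, `Settings` and `load_settings` are assumptions (the host default reuses the `postgres` service name from deliverable 1). The resulting `postgresql://` URL is a form asyncpg accepts as its `dsn` argument.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote

REQUIRED = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB", "LOG_TOKEN")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object for the Logging Service."""
    database_url: str
    log_token: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from the environment (or an explicit mapping, for tests)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    # Percent-encode credentials so characters like '@' or ' ' cannot break the URL.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    db = quote(env["POSTGRES_DB"], safe="")
    host = env.get("POSTGRES_HOST", "postgres")  # assumed compose service name
    port = env.get("POSTGRES_PORT", "5432")
    return Settings(
        database_url=f"postgresql://{user}:{password}@{host}:{port}/{db}",
        log_token=env["LOG_TOKEN"],
    )


sample = {"POSTGRES_USER": "bot", "POSTGRES_PASSWORD": "p@ss word",
          "POSTGRES_DB": "botbot", "LOG_TOKEN": "change-me"}
print(load_settings(sample).database_url)
# postgresql://bot:p%40ss%20word@postgres:5432/botbot
```

Failing fast on missing variables keeps a misconfigured container from starting with an open or broken write path.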

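**Estimated effort**

| Task                 | Role        | Hours  |
| -------------------- | ----------- | ------ |
| Postgres & schema    | DevOps      | 2      |
| Logging Service code | Backend     | 4      |
| Matrix patch         | Backend     | 1      |
| Tests & CI           | QA/Dev      | 3      |
| Docs & review        | Tech writer | 1      |
| **Total**            |             | **11** |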

The proposal keeps today's two-microservice pattern, adds one focused microservice, and leaves both the latency-sensitive (Matrix) and compute-heavy (AI) paths untouched. It fulfils the immediate goal (store `timestamp • room • user • message`) while creating a clean lane for future enrichment, e.g. thread-context building, analytics, or GDPR exports.


jfig added this to the Message recording in DB project 2025-06-24 16:27:21 +00:00

jfig added a new dependency 2025-07-12 18:15:19 +00:00

jfig/IssueTraker#2 - second issue

jfig removed a dependency 2025-07-12 18:16:31 +00:00

jfig/IssueTraker#2 - second issue


Due Date

No due date set.

Dependencies

No dependencies set.

Reference: jfig/botbot#4