FTE-05 · AI Security Guard

Vision Alert

Turn the NVR you already own into a 24/7 AI guard. It reads your existing camera feeds, tracks people and vehicles, watches the zones you draw, and fires a WhatsApp snapshot the moment a rule breaks — no rip-and-replace.

MVP · live on a real NVR | Computer vision | YOLO11 + ByteTrack | AI-native 3/5 | FTE-fit 4/5
Ingest: RTSP (existing Dahua NVR)
Proven: ~12 FPS (live, read-only)
Zone rules: 3 (intrusion · loiter · line-cross)
Tests: 24 passing
What runs today vs. the roadmap

Running today

Read and confirmed in source · pytest → 24 passed.
  • Live RTSP ingest, NVR-proven — pulls the existing Dahua sub-stream read-only with auto-reconnect; verified against a real recorder at ~12 FPS.
  • YOLO11 detection + object tracking — class-filtered to people and vehicles, with a from-scratch ByteTrack tracker giving stable track IDs.
  • Zone rules engine — intrusion (point-in-polygon), loiter (dwell ≥ N s) and line-cross (directed), each with per-track cooldown to kill repeat spam.
  • Browser dashboard — channel selector, live annotated MJPEG view, a react-konva zone/line editor, and a live alert feed over SSE.
  • Alert store + WhatsApp snapshots — alerts and snapshots persisted; a rule break sends an annotated image via the WhatsApp bridge in seconds.
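
As an illustration of the zone rules described above, here is a minimal sketch of how intrusion, loiter, and directed line-cross checks with a per-track cooldown could fit together. It is not the project's source: the `ZoneRules` class, its field names, and the default thresholds are hypothetical, and the line-cross test treats the line as infinite for brevity.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


def point_in_polygon(p: Point, poly: list[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[i - 1]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_at_y:
                inside = not inside
    return inside


def side_of_line(p: Point, a: Point, b: Point) -> float:
    """Sign of the cross product: >0 on one side of a->b, <0 on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


@dataclass
class ZoneRules:
    """Hypothetical rules engine; names and defaults are assumptions."""
    zone: list[Point]
    line: tuple[Point, Point]
    loiter_s: float = 10.0    # dwell threshold N, in seconds
    cooldown_s: float = 30.0  # minimum gap between repeat alerts per track
    _entered: dict = field(default_factory=dict)     # track_id -> zone entry time
    _last_pos: dict = field(default_factory=dict)    # track_id -> previous point
    _last_fired: dict = field(default_factory=dict)  # (rule, track_id) -> time

    def _fire(self, rule: str, tid: int, now: float, out: list) -> None:
        # Per-track, per-rule cooldown suppresses repeat alerts.
        last = self._last_fired.get((rule, tid))
        if last is None or now - last >= self.cooldown_s:
            self._last_fired[(rule, tid)] = now
            out.append((rule, tid))

    def update(self, tid: int, p: Point, now: float) -> list[tuple[str, int]]:
        """Feed one tracked position; return the alerts it triggers."""
        alerts: list[tuple[str, int]] = []
        if point_in_polygon(p, self.zone):
            start = self._entered.setdefault(tid, now)
            self._fire("intrusion", tid, now, alerts)
            if now - start >= self.loiter_s:
                self._fire("loiter", tid, now, alerts)
        else:
            self._entered.pop(tid, None)  # leaving the zone resets the dwell timer
        prev = self._last_pos.get(tid)
        a, b = self.line
        # Directed crossing: only negative-side -> positive-side counts.
        # Simplification: the line is treated as infinite, not as a segment.
        if prev is not None and side_of_line(prev, a, b) < 0 < side_of_line(p, a, b):
            self._fire("line-cross", tid, now, alerts)
        self._last_pos[tid] = p
        return alerts
```

In this sketch, each tracked object's reference point (for example, the bottom centre of its box) would be passed to `update` once per frame, keyed by its track ID. Stable track IDs are what make the dwell timer and the cooldown meaningful.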

On the roadmap

Add the judgment half of a guard's job.
  • VLM reasoning agent — describe the incident in plain language ("a person climbing the fence near Gate 2"), classify intent and severity, and suppress false positives (couriers, staff, shadows).
  • Contextual escalation — silent log for benign, written WhatsApp summary for real, phone/siren for critical.
  • Production GPU serving + scale — prove the Triton/TensorRT + hardware-decode path and batch inference across 9–32 channels (today: 1–2).
  • Hardening — durable event store, evidence-clip capture, auth on the dashboard, plus PPE / fire-and-smoke / loss-prevention variants.
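
The contextual escalation item above is still on the roadmap, so the following is only a sketch of what the severity-to-channel mapping might look like. The `Severity` enum, the channel names, and the choice to have higher tiers also include the lower-tier actions are all assumptions, not existing behaviour.

```python
from enum import Enum


class Severity(Enum):
    BENIGN = "benign"
    REAL = "real"
    CRITICAL = "critical"


# Hypothetical channel names. Cumulative tiers are an assumption:
# the roadmap only names the headline action for each level.
ESCALATION = {
    Severity.BENIGN: ["log"],
    Severity.REAL: ["log", "whatsapp_summary"],
    Severity.CRITICAL: ["log", "whatsapp_summary", "phone_call", "siren"],
}


def escalate(severity: Severity) -> list[str]:
    """Return the notification channels to trigger for a classified incident."""
    return list(ESCALATION[severity])
```

Keeping the mapping in a plain table like this would let a site tune its escalation policy without touching the detection or reasoning code.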

Honesty note: today's "AI" is classical CV perception — it detects, it doesn't yet understand. An alert is a row, not a narrated incident. The reasoning agent is the upgrade.

The role it replaces

A monitoring-room seat.

The guard watching a wall of feeds for the one moment that matters — replaced by an agent that never blinks, never gets bored, and runs on a single GPU next to the recorder you already bought.

~1 guard seat / site · 8–24 hrs/day
The 60-second showcase
veloce://fte-05 — live stairwell feed
01 · On the real NVR feed, drag a polygon across a doorway in the browser zone editor.
02 · Walk through it. A green track box follows you and the intrusion rule fires.
03 · An annotated snapshot lands on WhatsApp in seconds, and on the live alert feed.

Vision Alert is a real, working perception pipeline with an unusually high-value upgrade path. The plumbing a security FTE needs — 24/7 feeds, stable tracking, alerts, snapshot notifications — is already pointed at exactly the right seat. Bolt a VLM reasoning agent onto the crops it already produces, and "it detects" becomes "it understands, decides, and tells you what happened."