How AiDrift Sees Your Session
A short, readable primer on the model behind the drift score. The last section names the math for readers who want it.
The question behind every drift alert
When an AI coding session goes wrong, it's rarely because a single turn was bad. It's because the conversation slowly walked away from what you asked for. Files you never mentioned start getting edited. Tools you didn't need start getting called. The agent is still helpful, still competent — just not about your problem anymore.
A scalar "drift score" can tell you this is happening. It can't tell you where the session went or what the agent is working on now. That second question is the interesting one.
The idea: a session has a shape
Every AI coding session has two things it's about:
- What you asked for — your first prompt, the files you referenced, the terms you used.
- What the agent is actually doing — the files it's touching, the tools it's running, the concepts it keeps returning to.
Write those two things down and you get a picture. Some items sit near your original ask. Some sit far away. Some are clustered tightly together (the agent is in the zone). Some are scattered (the agent is exploring — or lost). That picture is what we call a session map.
A map gives you a language the scalar score never could:
- "The session has moved" — specific things the agent is working on now, and how far they are from your anchor.
- "The session has a focus" — the handful of items it keeps returning to.
- "Two sub-agents are doing the same work" — their maps overlap.
Four things the session map gives you
1. An intent anchor
The session is anchored to what you originally asked for. We pull keywords and file references from your first prompt and plant them as a fixed point. Everything else is measured against this anchor.
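As an illustration, anchor extraction can be as simple as a couple of regexes over the first prompt. This is a minimal sketch, not AiDrift's actual extractor; the function name, path pattern, and stopword list are invented for the example:

```python
import re

def extract_anchor(first_prompt: str) -> dict:
    """Sketch: pull file references and keywords from the first
    user turn to seed the intent anchor (illustrative only)."""
    # File references: anything that looks like a path with an extension.
    files = re.findall(r"[\w./-]+\.\w{1,4}\b", first_prompt)
    # Keywords: alphabetic tokens of 3+ chars, minus trivial stopwords.
    stopwords = {"the", "a", "an", "to", "in", "and", "of", "for", "is"}
    words = re.findall(r"[a-zA-Z_]{3,}", first_prompt.lower())
    keywords = sorted({w for w in words if w not in stopwords})
    return {"files": files, "keywords": keywords}

anchor = extract_anchor("Fix the pagination bug in billing/invoice.py")
# anchor["files"] → ["billing/invoice.py"]
# "pagination" is among anchor["keywords"]
```

Everything extracted here becomes a fixed seed node; later items are measured against it, never the other way around.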
2. Focal points
A session has a few items it revolves around. The file that keeps getting read. The function that keeps getting edited. The concept that keeps being mentioned. These focal points are the short answer to "what is this session actually about, right now?" They're the items you'd want in a one-line summary.
3. Scope distance
How far has the agent walked from your anchor? We measure the shortest path through the map from your intent anchor to where the agent is currently working. A small distance means the agent is doing what you asked. A growing distance is drift, and unlike a scalar score, it comes with a reason: "Agent moved from billing/* to auth/* over the last 14 turns" is a drift alert you can act on.
4. Overlap regions
When a session spawns sub-agents, each child builds its own sub-map. When two children's maps overlap — same files, same concepts, same cluster — they're doing redundant work. Catching this early saves compute and cleanup. It's the single drift pattern that's almost impossible to spot from a scalar score alone.
Evidence tiers: not all signals are equal
A drift signal that comes from the agent actually editing a file is not the same as a signal that comes from a keyword showing up in a user turn. We label every item and connection in the map with an evidence tier:
- Observed — we saw it happen. The agent read this file. The user said this word. Confidence 1.0.
- Inferred — we reasoned about it. Two files are probably related because they share a symbol. A concept probably matches your intent because of keyword overlap. Confidence below 1.0.
- Uncertain — flagged for review. We don't trust it enough to act on it, but we don't throw it away.
Every alert the dashboard shows you carries its tier. You can filter to Observed-only if you want to be strict.
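The tier model is small enough to sketch directly. The class and field names below are illustrative, not AiDrift's real types; the point is that the tier travels with the evidence and gates what can become an alert:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    OBSERVED = "observed"    # seen directly in the logs
    INFERRED = "inferred"    # reasoned from observed items
    UNCERTAIN = "uncertain"  # kept for review, never acted on

@dataclass
class Evidence:
    item: str
    tier: Tier
    confidence: float

    @property
    def actionable(self) -> bool:
        # Only observed and inferred evidence can drive an alert;
        # uncertain items are quarantined, not discarded.
        return self.tier is not Tier.UNCERTAIN

edit = Evidence("edited billing/invoice.py", Tier.OBSERVED, 1.0)
guess = Evidence("possible auth refactor", Tier.UNCERTAIN, 0.3)
# edit.actionable is True; guess.actionable is False
```

Filtering the dashboard to Observed-only is then just a filter on `tier`.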
Why this is honest
The session map is not a prediction model. It's a transparent accounting of things that happened and things you said. We don't invoke an LLM to decide what the session is about — we extract it from the logs you already have. We don't guess at connections — we label what we're sure about and clearly mark what we inferred.
If a drift alert is wrong, you can open the map, see the exact chain of observed events that produced it, and understand why. That's the bar. Nothing magical, nothing unverifiable.
Appendix: the math, named
For readers who want to know what's under the hood.
- The session map is a directed, weighted multigraph. Nodes are items (files, symbols, tools, concepts, turns). Edges are events (a tool call, a reference, a mention). Edge weights decay with recency — the agent's latest hour matters more than its first.
- Intent anchor is a seed node tied to the first user turn's extracted keywords and file references.
- Focal points are computed by ranking nodes: degree centrality for a fast read, PageRank for a stable one. The top-k nodes by PageRank surface in the UI.
- Scope distance is the weighted mean geodesic distance (shortest-path length through the graph) from the intent anchor to the top-k currently-active nodes. Edge weights act as inverse distances so stronger recent activity pulls the distance down.
- Focal shift is the normalized set distance between early-window focal points and late-window focal points. It catches the case where the agent has quietly moved onto a different part of the codebase.
- Scope clusters come from Leiden community detection (a refinement of the Louvain algorithm). Run at a fixed resolution and seed, Leiden stays stable across small session deltas, which matters for a live dashboard.
- Overlap regions between sub-agent maps are computed as the weighted Jaccard similarity over cluster membership — intersecting clusters weighted by the edge mass they contain.
- Evidence tiers are a three-level confidence label applied at extraction time, inspired by the same pattern used in industrial knowledge-base tooling: directly observed events get 1.0, reasoned inferences get a lower confidence, and anything we can't ground gets quarantined.
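The first four bullets fit in one short sketch. This uses networkx and an invented mini-session; the half-life, node names, and event ages are assumptions for illustration, not AiDrift's real parameters:

```python
import math
import networkx as nx

# Edge weight: sum of exp(-age / half_life) over the edge's events,
# so the agent's latest activity counts more than its first.
def decayed_weight(event_ages, half_life=10.0):
    return sum(math.exp(-age / half_life) for age in event_ages)

G = nx.DiGraph()
G.add_edge("intent", "billing/invoice.py", weight=decayed_weight([30, 28]))
G.add_edge("billing/invoice.py", "compute_total", weight=decayed_weight([25]))
G.add_edge("intent", "auth/login.py", weight=decayed_weight([2]))   # recent, off-anchor
G.add_edge("auth/login.py", "verify_token", weight=decayed_weight([1, 0]))

# Focal points: PageRank over edge weights, top-k for the UI.
rank = nx.pagerank(G, weight="weight")
focal = sorted(rank, key=rank.get, reverse=True)[:3]

# Scope distance: shortest paths from the anchor with inverse weights
# as lengths, averaged over the currently-active nodes.
for u, v, d in G.edges(data=True):
    d["dist"] = 1.0 / d["weight"]
active = ["verify_token", "compute_total"]
scope = sum(nx.shortest_path_length(G, "intent", n, weight="dist")
            for n in active) / len(active)
```

Because recent edges carry more weight, the `auth/*` path is "cheap" to reach here; drift shows up as the anchor-to-activity distance creeping up turn over turn, not as any single number spiking.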
All of this runs locally on the logs AiDrift already watches. No paid API calls. Tree-sitter for code structure, keyword extraction for user concepts, standard graph algorithms for everything else.
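The focal-shift and overlap metrics are plain set arithmetic, small enough to show whole. The session data below is made up and the helper names are illustrative:

```python
def jaccard_distance(a: set, b: set) -> float:
    """Focal shift: 1 - |A ∩ B| / |A ∪ B| between the early-window
    and late-window focal-point sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def weighted_jaccard(mass_a: dict, mass_b: dict) -> float:
    """Overlap between two sub-agent maps: shared clusters weighted
    by the edge mass each cluster contains."""
    keys = mass_a.keys() | mass_b.keys()
    num = sum(min(mass_a.get(k, 0.0), mass_b.get(k, 0.0)) for k in keys)
    den = sum(max(mass_a.get(k, 0.0), mass_b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0

early = {"billing/invoice.py", "compute_total"}
late = {"auth/login.py", "verify_token", "compute_total"}
shift = jaccard_distance(early, late)          # 1 - 1/4 = 0.75

child_a = {"billing": 5.0, "auth": 1.0}        # cluster -> edge mass
child_b = {"auth": 1.0, "search": 2.0}
overlap = weighted_jaccard(child_a, child_b)   # 1/8 = 0.125
```

A focal shift near 1.0 means the agent's center of gravity has moved almost entirely; a sub-agent overlap creeping toward 1.0 means two children are converging on the same work.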
Further reading
- Leiden community detection — V.A. Traag, L. Waltman, N.J. van Eck (2019)
- PageRank — L. Page, S. Brin (1999)
- The general practice of evidence tiering for knowledge-base edges is well-established in industrial knowledge-graph tooling; it's the same discipline applied here to drift evidence.