title: "Calibration methodology — what 'calibrated to your data' actually means" slug: mechanism/calibration awareness: solution-aware internal: false description: "The methodology, seriously. Three calibration inputs, one cycle, honest limits. Written for the senior strategist who needs to know what this actually does before the agency signs the annual." published: "2026-04-24" no_cta: false

Calibration methodology — what 'calibrated to your data' actually means

What 'calibrated' means here, and what it does not mean elsewhere

"Calibrated" is one of those words that has been reused so often by so many vendors that the signal is nearly gone. A tool is "calibrated" when a marketing page wants to sound precise. A model is "calibrated" when an engineer wants to soften the word "trained." A dashboard is "calibrated" when the copywriter has run out of synonyms. None of those uses are what this page is about.

Calibration, in the sense this page uses it, is a specific technical arrangement between a model and one agency's data that is distinguishable from three adjacent arrangements. It is not a static model. A static model is trained on a snapshot of the world, shipped, and then left to drift as the world moves. It is not a shared model. A shared model sees every customer's data at training time and produces an output that is the arithmetic mean of 600 agencies' decisions — useless to any one of them. And it is not a fine-tuned model, at least not in the sense the term is usually used. A fine-tuned model is trained on your data once, at onboarding, and then frozen. The calibration it got on day one is the calibration it still has on day 400, which is to say it has decayed into a static model wearing a different label.

Calibration here means something narrower and harder: the model's weights update every cycle against your agency's decisions, data, and outcomes, and only your agency's. The cycle is weekly. The data is yours. The feedback loop is architected as a first-class input, not as a usage-telemetry afterthought. When a team member marks up a Living Brief — the weekly strategic document the Digital Twin produces after each calibration cycle — the markup is read as training signal, not as a customer-support ticket. That is the mechanism. Everything else on this page is the detail of what makes it work.
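To make the arrangement concrete, here is a minimal sketch of the cycle's interface in Python. Every name in it is an assumption rather than anything Khorvad publishes; what it restates is the paragraph above: one agency's decisions, data, outcomes, and brief markup go in, and only that agency's weights come out.

```python
from dataclasses import dataclass, field


@dataclass
class WeeklySignals:
    """One agency's inputs for one calibration cycle (hypothetical shape)."""
    decisions: list = field(default_factory=list)     # approvals, kills, CEP choices
    account_data: list = field(default_factory=list)  # campaign and creative performance
    outcomes: list = field(default_factory=list)      # did last week's hypothesis hold?
    brief_markup: list = field(default_factory=list)  # Living Brief comments, read as signal


def calibrate(weights: dict, signals: WeeklySignals) -> dict:
    """One weekly update for one tenant. Placeholder arithmetic stands in
    for the real weight update; what the signature encodes is the isolation:
    no other agency's data is reachable from inside this call."""
    seen = (len(signals.decisions) + len(signals.account_data)
            + len(signals.outcomes) + len(signals.brief_markup))
    return {**weights,
            "cycles": weights.get("cycles", 0) + 1,
            "signals_seen": weights.get("signals_seen", 0) + seen}
```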

A Digital Twin, in case you arrived here via the /approach section 5 link, is a calibrated intelligence layer trained on your agency's own data, decisions, and outcomes — not a generic AI, not a shared model. The name is intentionally narrow. The architecture is what the name describes, not a metaphor for it.

The three calibration inputs

A calibrated model is only as sharp as the three signals it reads. On a Digital Twin configured for a performance-marketing agency, the three inputs are the decisions your team makes, the data your accounts generate, and the outcomes your campaigns produce. Each one carries a specific kind of information that the other two cannot substitute for.

Your decisions. The briefs your senior strategist actually wrote, not the ones she meant to write. The creative directions Elena approved. The creative directions Elena killed, and the reason. The campaigns Marcus launched versus the campaigns Marcus paused. The Category Entry Points — the CEPs, the mental cues consumers use when they think of the category — that your team chose to lean into this quarter, and the ones it chose to skip. These are the scarcest input. Most agencies have never been asked to record them, which means most agencies do not know how many decisions their senior strategist makes in a week, or what the distribution of decisions looks like across accounts. The calibration cycle reads them explicitly. The first full week of onboarding includes a pass where we reconstruct the prior quarter's decision log from whatever artifacts the agency did keep — Slack threads, brief revisions, email approvals — and seed it into the model. After that, the cycle captures decisions as they happen.

Your data. Account performance at the campaign and creative level. Creative fatigue curves. Audience segment behavior and drop-off. The Jobs To Be Done — the JTBD shape the brief committed to, so the model can evaluate whether the creative actually served that job or drifted. Media-mix and pacing data. Every Living Brief outcome — shipped, revised, killed. This is the easiest input to gather and the most frequently misused. A shared-pool platform reads the same data and averages it against 600 other agencies. The calibration here reads it against your decisions and your outcomes only, which is what makes it pattern-match your taste instead of the market's.

Your outcomes. Did the hypothesis hold? Did the creative the twin ranked first actually outperform? Which angle worked on which CEP, and which one died? What did the Get-To-By — the GET/TO/BY statement your strategist wrote — get right about the audience, and what did it miss? Outcomes are the feedback signal. They are what lets the calibration cycle tighten next week's output. Without outcomes, calibration degrades into pattern-matching the inputs, which is what a fine-tuned model does.
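The three inputs only compound because outcomes can be joined back to the decisions they test. A sketch of the three record shapes, with field names that are assumptions, makes the join explicit:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DecisionRecord:
    """One strategist decision, reconstructed at onboarding or captured live."""
    decision_id: str
    made_on: date
    kind: str       # "brief_written" | "creative_approved" | "creative_killed" | "cep_chosen"
    rationale: str  # the reason, where the agency recorded one
    source: str     # "slack_thread" | "brief_revision" | "email_approval" | "live"


@dataclass
class AccountDatum:
    """One performance observation at the campaign or creative level."""
    account: str
    metric: str     # e.g. "fatigue_curve", "segment_dropoff", "pacing"
    value: float


@dataclass
class OutcomeRecord:
    """The result a decision produced: the signal that closes the loop."""
    decision_id: str  # joins back to the DecisionRecord it tests
    held: bool        # did the hypothesis hold?
    note: str         # what worked, what died, in the strategist's words
```

The decision_id join is the load-bearing detail: without it, outcomes cannot tighten next week's output, and calibration degrades into the input pattern-matching described above.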

These three inputs, together, produce what a single input alone cannot: a model whose outputs are recognizably in your strategist's voice and grounded in your accounts' reality, recalibrated weekly so that the outputs stay sharp as the market shifts.

The calibration cycle, week by week

The calibration cycle is what turns the three inputs into a compounding asset. It runs on a weekly cadence. The shape holds across all cohorts.

[Diagram: the calibration cycle. A three-node linear flow: your agency's data (decisions · outcomes) feeds the calibration step (weights update); calibration produces the Living Brief (weekly output); a feedback arrow loops from the brief back into calibration, weekly.]
The calibration cycle — your data, one week, back again.
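Expressed as code, the diagram is a loop in which last week's markup becomes this week's training signal. Everything below is a stand-in with assumed names, not the real pipeline; it exists to show where the feedback arrow lands:

```python
def gather_signals() -> dict:
    """Stand-in: pull this week's decisions, account data, and outcomes."""
    return {"decisions": [], "account_data": [], "outcomes": [], "markup": []}


def update_weights(weights: dict, signals: dict) -> dict:
    """Stand-in for the weight update; only this tenant's signals are visible."""
    seen = sum(len(v) for v in signals.values())
    return {**weights, "cycles": weights.get("cycles", 0) + 1,
            "signals_seen": weights.get("signals_seen", 0) + seen}


def emit_living_brief(weights: dict) -> dict:
    """Stand-in: render Monday's Living Brief from the current weights."""
    return {"cycle": weights.get("cycles", 0), "open_questions": []}


def collect_markup(brief: dict) -> list:
    """Stand-in: the strategist's structured comments on the brief."""
    return []


def run_weeks(weights: dict, weeks: int) -> dict:
    """The diagram as a loop: data in, weights updated, brief out,
    markup back in as next week's signal."""
    markup: list = []                       # Week 1 starts with no markup yet
    for _ in range(weeks):
        signals = gather_signals()
        signals["markup"] = markup          # last week's markup trains this week
        weights = update_weights(weights, signals)
        brief = emit_living_brief(weights)
        markup = collect_markup(brief)      # read the way a senior reads a junior's draft
    return weights
```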

Week 1. The first Living Brief arrives after the Parallax Test. The brief is, at this point, calibrated on whatever decision log and outcome data the onboarding reconstructed — a fair first pass, but noticeably generic in places where your agency's taste is specific. Your senior strategist reads the brief the way she would read a junior's draft: what's useful, what's missed, what's confidently wrong. She marks it up. The markup is structured — each comment carries a signal type (missed, wrong, out-of-scope, on-target). That structure is what lets the markup train.
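The signal types are what make the markup trainable rather than anecdotal. A sketch of the comment structure, where the four signal values are the ones named above and everything else is an assumed name:

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    """The four markup signal types named above."""
    MISSED = "missed"              # the brief should have covered this and didn't
    WRONG = "wrong"                # the brief asserted something incorrect
    OUT_OF_SCOPE = "out-of-scope"  # not this brief's job
    ON_TARGET = "on-target"        # keep doing this


@dataclass
class MarkupComment:
    """One strategist comment on a Living Brief, readable as a labeled example."""
    brief_section: str  # which part of the brief the comment attaches to
    signal: Signal
    note: str           # the strategist's words, in her own register
```

Free-text comments alone would be customer-support tickets; the signal type is what turns each one into a labeled training example.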

Week 2. Re-calibration runs. The founder runs this personally for every agency in the first cohort, because the founder is the one who can read a week-one markup and tell the difference between "the model got the SMP wrong" — the Strategic Market Position — and "the model used the right SMP but described it in the wrong vocabulary." Those two failures require different calibration responses. The new Living Brief arrives on Monday morning, noticeably sharper on the dimensions your team flagged.

Week 4. The calibration has compounded across three weekly updates. The model starts pattern-matching your agency's taste, not just your agency's data. When the Monday brief suggests a creative angle, the angle now reads like something a senior on your team would have written — the register is right, the frame is right, the CEP prioritization lines up with how your agency actually thinks. This is the point at which the twin stops being a useful draft tool and starts being a second senior strategist on your bench.

Week 12. The calibration has seen your agency through a full quarter — a seasonal cycle, a creative fatigue arc, a brief-revision loop, a client review. The local optimum the model has reached is specific to your agency. No other twin has it. No other agency has access to it. If you walked away from Khorvad the next day, you would walk away with your weights under the Portability Covenant, and nobody else would ever see them.
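The Week 12 claim, your weights and nobody else's, is an access-control property before it is a contractual one. A sketch of the tenancy shape it implies; the class and method names are assumptions, and the Portability Covenant's actual terms live on its own page:

```python
class TenantWeightStore:
    """Hypothetical per-tenant store. The rule it encodes: every read and
    write is keyed by one agency, there is no cross-tenant path, and there
    is no pooled aggregate for anyone to average against."""

    def __init__(self) -> None:
        self._weights: dict[str, dict] = {}

    def update(self, agency_id: str, new_weights: dict) -> None:
        self._weights[agency_id] = new_weights

    def export(self, agency_id: str) -> dict:
        """What walking away would hand you: your weights, a full copy."""
        return dict(self._weights[agency_id])
```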

What calibration does not cover

An honest methodology names its limits. Three are worth naming here.

Novel cultural moments. The twin calibrates on the pattern your team has taught it. When a new platform emerges, or a generational taste shifts, or a cultural event reframes what an audience expects, the twin has no prior calibration on the new ground. Your senior strategist has to decide whether the pattern she built carries over, and the twin will be wrong about the first few briefs in the new territory until it recalibrates. This is not a bug. It is what any calibrated system does on out-of-distribution inputs, and it is why your strategist's judgment stays load-bearing.

Inside-the-agency politics. The twin reads what was decided and what the outcome was. It does not read why the decision was made, when the reason was that a client's marketing director and the agency's creative lead cannot work together. Those decisions show up in the calibration as noise. The twin will sometimes suggest the creative the model would have ranked first, unaware that it was the one the client vetoed personally. Your strategist will still have to read the brief and catch these. The twin flags the output; your team decides which of its suggestions the room will actually accept.

Causation. The twin surfaces correlations with a confidence score. The score is honest — it reads the covariance structure and reports it. The twin does not assert causation, because the data it has cannot distinguish causation from coincidence. When the brief says "the skincare audience over-indexed on routine-simplification framing in Q3," it means the correlation was strong enough to mention, not that routine-simplification caused the lift. Your strategist's job is to decide which correlations are causal and which ones are the calendar.
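The correlation-not-causation line is mechanical, not rhetorical, and it can be shown in a few lines. The sketch below uses the standard library's Pearson correlation (statistics.correlation, Python 3.10+); the thresholds and the report wording are assumptions:

```python
import statistics


def report_correlation(framing_share: list[float], lift: list[float],
                       label: str) -> str:
    """Report a correlation honestly: strength and sample size, no causal verb.
    Thresholds here are illustrative, not Khorvad's."""
    r = statistics.correlation(framing_share, lift)  # Pearson's r
    n = len(lift)
    if abs(r) < 0.3 or n < 8:
        return f"{label}: too weak or too few points to mention (r={r:.2f}, n={n})"
    return (f"{label}: over-indexed (r={r:.2f}, n={n}); correlation only, "
            "the data cannot rule out the calendar")
```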

Mental availability is worth one note here. The framework — Sharp and Romaniuk's work on how brands get retrieved in buying moments — is something the twin reads as an input (the CEPs you chose, the distinctive assets you briefed to). It is not something the twin computes on your behalf. Mental availability is a property of your category, not of your model. The twin can tell you whether last quarter's CEP investment paid out in the metrics you care about; it cannot tell you which CEP will be load-bearing in 2028.

For the Head of Strategy reading this

If you are the senior strategist reading this page, the shape of your Monday morning once calibration is running is this. A Living Brief is waiting in your inbox, generated between Sunday and Monday by the calibrated layer. The brief covers the accounts you asked it to cover — not every account, only the ones where your judgment is the load-bearing input. The brief ranks creative angles by fit against the CEP and JTBD shapes you approved last quarter. It flags where last week's hypothesis did not hold. It asks two or three questions it could not answer from the data alone.
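As a structure, the Monday brief the paragraph describes has four parts. The field names below are assumptions; the parts are the ones just listed:

```python
from dataclasses import dataclass, field


@dataclass
class LivingBrief:
    """Hypothetical shape of the Monday brief described above."""
    accounts_covered: list[str]             # only where your judgment is load-bearing
    ranked_angles: list[tuple[str, float]]  # (creative angle, fit vs approved CEP/JTBD shapes)
    failed_hypotheses: list[str]            # where last week's hypothesis did not hold
    open_questions: list[str] = field(default_factory=list)

    def looks_honest(self) -> bool:
        """A brief with zero open questions is confidently hallucinating."""
        return len(self.open_questions) > 0
```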

The questions are the part that matters. They are how you know the twin is honest about what calibration does not cover. A twin that returns zero questions is a twin that is confidently hallucinating. The ones you see will be the ones your juniors would have asked, if they were senior enough to notice them.

The full Head-of-Strategy treatment of the four named objections, the Feel-Felt-Found framing around the craft-replacement concern, and the Further Reading list (Romaniuk, Sharp, Andjelic, Cole) lives on the Head of Strategy page. If you are deciding whether to recommend the Parallax Test to the agency owner who sent you this link, that page is the one to read next.

The Stump Session is where we answer the question the twin cannot answer. Bring the hardest strategic question on your plate this week — the one the model would hedge on, the one your team has been circling for a month. Forty-five minutes, founder-run, no deck. The single next step from this page is to book the Stump Session.