Earlier we said we'd show you how Vinci is actually made — the part people assume is the secret. So here it is, in plain language. The full technical paper, with code, ships August 8 alongside the weights.
We're writing it out openly on purpose. If the thing that's supposed to earn your trust is a method we won't show you, that's not trust, it's faith. We'd rather show our work.
The idea in one breath
Constitutional Fine-Tuning is how we take a strong open model and train it to actually live by a written set of rules you can read.
There are two documents, and both are public.
The Constitution is the rulebook — about 5,000 words on what Vinci values, what it refuses, and what it commits to do. The Character document is the personality — how it sounds, how it carries those values in a real conversation, the voice you'd recognize.
You read both before you deploy anything. Then you check that the model in front of you behaves like the documents say it should. That's the whole point: the spec isn't internal, it ships with the model, and it's the thing we're on the hook for.
How it actually works
The mechanics are three techniques, and none of them are exotic. They're standard tools used all over the open-model world. Here's what each one does, without the jargon doing the explaining.
Showing it who it is. We give the model a set of examples that reinforce its own identity and voice — so it knows who it's supposed to be and stays that way. (In the paper: identity supervised fine-tuning.)
Showing it how we do things here. We give it worked examples of the Constitution in action across the situations it'll actually face — the equivalent of onboarding someone with "here's how we handle this, and this, and this," rather than handing them a handbook and hoping. (Behavioral supervised fine-tuning.)
Teaching it taste, not just rules. This is the one that does the most work. We show the model pairs of responses — a good one and a worse one — and teach it to prefer the good one. And we built those pairs specifically around the failure modes everyone's tired of: the reflexive "Great question!", the throat-clearing "I should note…", the safety lecture on a completely safe request. We trained those out, on purpose, by always preferring the version that just does the work. (Direct Preference Optimization.)
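For readers who want the math behind that last step, here is a minimal sketch of the standard DPO loss for a single pair. It is an illustration, not our training code: the PreferencePair class, the example pair, the log-probability values, and beta = 0.1 are all assumptions made up for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One training example: same prompt, a preferred and a dispreferred reply."""
    prompt: str
    chosen: str    # the version that just does the work
    rejected: str  # the version showing a failure mode


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    Inputs are summed token log-probabilities of each response under the
    model being trained (policy) and under a frozen reference copy.
    beta = 0.1 is an assumed value, not a documented setting.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical pair built around the reflexive-praise failure mode.
pair = PreferencePair(
    prompt="How do I rotate an API key?",
    chosen="Create a new key, update your services, then revoke the old one.",
    rejected="Great question! I should note that key rotation is important...",
)

# Made-up log-probabilities: the policy has shifted toward the chosen reply.
loss = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

The loss falls as the trained model raises the probability of the chosen reply relative to the reference and lowers it for the rejected one. When the policy and the reference agree exactly, the loss is log 2.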
There's also a quieter piece. Instead of only telling the model the rules, we let it learn from material written as if the rules were already the norm — the way you absorb a company's culture by working inside it, not by reading the policy doc once. That's what makes the values stick under pressure instead of peeling off the first time someone pushes.
What we borrowed, and from whom
We want to be straight about this, because it's part of the point.
We didn't invent these techniques. The synthetic-document approach, the rewrite-to-align method that does a lot of the heavy lifting for character training, and the inference-time work that keeps a model's personality stable when someone's trying to jailbreak it all come out of published research from Anthropic, OpenAI, DeepMind, and Apollo Research. We adapted them to run on top of an open base rather than in from-scratch training, and we name every source in the methodology paper.
What's ours is the combination: a public Constitution as the source of truth, a Character document for the personality, preference pairs built around real-world failure modes, and a verification bundle so you can check the result. Not the ingredients — the recipe and the fact that we're cooking it in the open.
One thing worth clearing up
Earlier we made a point of saying we don't cram knowledge into the model — that facts belong in the library it reads from, kept current, not frozen into the weights. This might look like the opposite of that. It isn't.
We're not baking facts into Vinci here. Facts change, and those still live in retrieval, where they stay fresh.
What we're baking in is character — how it behaves, what it refuses, who it is. That should be stable. You don't want a model whose values drift with the news cycle. So the split is simple: knowledge stays in the library, character goes in the weights, and Constitutional Fine-Tuning is how the character gets there.
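To make the split concrete, here is a toy sketch of the "library" side: facts are looked up at question time and placed in the prompt, so updating a fact means editing the library, not retraining. The function names, the keyword-overlap scoring, and the sample library are all hypothetical and stand in for a real retrieval system.

```python
def retrieve(question: str, library: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by shared words with the question (a toy scorer)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        library,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, library: list[str], top_k: int = 2) -> str:
    """Put retrieved facts in the prompt; behavior comes from the weights."""
    passages = retrieve(question, library, top_k)
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical library: edit these strings and the answers change,
# with no change to the model itself.
library = [
    "The filing deadline is April 30.",
    "Office hours are nine to five.",
]
prompt = build_prompt("When is the filing deadline?", library, top_k=1)
```

Nothing in this sketch touches how the model behaves. That part is fixed at training time, which is the point of the split.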
What it's not
A methodology post that only lists strengths is a sales brochure. A few honest limits, because they matter more than the strengths.
It's not a promise of perfect behavior. Fine-tuned open models can still be jailbroken. We publish the adversarial test results precisely because they show where Vinci holds and where it doesn't, and the Constitution commits us to publishing a diagnosis and fix within 30 days whenever Vinci's deployed behavior drifts from the spec.
It's not us advancing safety research. We're standing on other people's safety work, adapting it, and saying so. We're not going to pretend we built that stack alone.
It's not the only way to do this. Other teams will make different, reasonable choices. We picked this combination because it works on open bases, produces something you can verify, and lets a small team ship a credible product.
And it's not Canada-specific. The methodology is general. vinci-studio happens to specialize in regulated professional work where Canadian context matters, but the approach works for any specialization on any open base.
Why this matters for open AI
Closed providers ask you to trust that their model behaves the way they say. You can't check.
Open providers hand you weights, but usually no written spec for how the thing is meant to act and no published tests — so you've got a model that does things without telling you what things, or why.
Constitutional Fine-Tuning is our attempt to close that gap from the open side. The weights are open. The Constitution spells out intended behavior. The Character document spells out intended personality. The adversarial results show measured behavior. You line up what we intended, what we described, and what we measured against what you actually see in your own deployment.
That's all "verifiable" means. It's not a slogan. It's just the set of documents that let you check.
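As one small example of what checking can look like on your side, the sketch below scans a sample of your own deployment's responses for two of the failure modes named earlier and reports how often each appears. It is an assumption-laden illustration, not part of the shipped bundle: the pattern names and regular expressions are ours for this example, and a real check would also lean on the published adversarial results.

```python
import re

# Hypothetical patterns for two failure modes; tune these for your own traffic.
FAILURE_PATTERNS = {
    "reflexive_praise": re.compile(r"^\s*great question\b", re.IGNORECASE),
    "throat_clearing": re.compile(r"\bI should note\b", re.IGNORECASE),
}


def failure_rates(responses: list[str]) -> dict[str, float]:
    """Return the fraction of responses matching each failure pattern."""
    total = len(responses)
    if total == 0:
        return {name: 0.0 for name in FAILURE_PATTERNS}
    return {
        name: sum(1 for r in responses if pattern.search(r)) / total
        for name, pattern in FAILURE_PATTERNS.items()
    }


# Made-up sample of logged responses.
sample = [
    "Great question! Here is how retention works.",
    "The retention period is seven years.",
    "I should note that this is a complex area.",
    "Done. The file is attached.",
]
rates = failure_rates(sample)
```

A rate well above what the published results suggest is a signal to dig in, and the kind of drift the 30-day commitment is meant to cover.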
What ships August 8
vinci-studio is the first model this process produces. It comes with the open weights (Apache 2.0, on Hugging Face), the Constitution v1.0, the Character document v1.0, the full methodology writeup with code, and the adversarial test results — HarmBench, JailbreakBench, and a Chinese-model censorship testbed — run against the exact weights we deploy.
The bundle is the product. The model just runs it.
Talk soon,
The SimpleDirect team