Skip to main content

The lineup

Three models. One principle: yours.

Our models are open-weight. You download the files. They run on your hardware. No API calls, no cloud dependency.

flash-1-mini

Bilingual legal citation, compliance, edge & personal use

Free — forever

Tier 1 (laptop)

Specs

Parameters
4 billion
Context length
8K tokens
Recommended quantization
Q4_K_M (~2.7 GB)
Minimum hardware
Any laptop with 4+ GB RAM
License
Apache 2.0 — open weights, commercial use OK

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Multiple GGUF quantization levels
  • Tuned for Canadian legal citation accuracy
  • Vision-capable — reads images and text
  • Strong multi-part instruction-following
  • Runs on Pi-class to laptop hardware

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime
View on HuggingFace

flash-1

Business daily driver, RAG, function calling

$99 one-time

Tier 1–2 (laptop to workstation)

Specs

Parameters
9 billion
Context length
16K tokens
Recommended quantization
Q4_K_M (~5.5 GB)
Minimum hardware
8 GB RAM or entry GPU
License
Free for orgs under 50 employees

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Multiple GGUF quantization levels
  • Balanced reasoning and speed
  • Function calling and document Q&A (RAG)
  • Quantizations from Q3_K_M to fp16

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime
Download links open with launch

flash-1-pro

Enterprise, defense, complex reasoning, multi-user

$499 one-time

Tier 2–4 (workstation to colo)

Specs

Parameters
27 billion
Context length
32K tokens
Recommended quantization
Q4_K_M (~16 GB)
Minimum hardware
24+ GB VRAM or 32+ GB RAM
License
Commercial license required for orgs over 50

Capabilities

  • Bilingual English / French
  • Citation-grounded responses
  • Multiple GGUF quantization levels
  • Strongest reasoning and instruction-following
  • Function calling and RAG for agentic workloads
  • Optimized for vLLM multi-user deployments

Compatibility

  • Ollama
  • LM Studio
  • llama.cpp
  • Any GGUF-compliant runtime
  • vLLM for multi-user inference
Download links open with launch

Where should I run this?

See the ownership spectrum

Coming next

An open dataset release (July 2026), and flash-1-GC — a variant oriented to Canadian public-sector requirements — as a free public demo after September 2026.

Which model should you use?

Three questions. Two minutes. We don't need your email.

Solo user or a team?

Solo→ flash-1-mini or flash-1
Team→ flash-1 or flash-1-pro

Do you handle regulated data (legal, health, financial)?

Yes→ flash-1 minimum, flash-1-pro recommended
No→ flash-1-mini works for most cases

Air-gapped or multi-user deployment?

Yes→ flash-1-pro
No→ flash-1-mini or flash-1 is enough

Licensing

Free tier

flash-1-mini is free forever — personal, professional, commercial, doesn't matter. Download it and use it.

Small organizations (under 50 employees)

flash-1 and flash-1-pro are one-time purchases. No per-seat fees. No subscription.

Enterprise (50+ employees, regulated industries, defense)

Commercial licensing available with deployment support. See the Enterprise page or contact us.

Questions you might be having

If you have one we missed, ask us directly.

What does "open-weight" actually mean?

You get the actual model files. You can inspect them, run them, fine-tune them. There is no hosted version of our models that you have to go through us for. The weights live on your machine.

What is GGUF? Do I need anything special to run it?

GGUF is a file format for AI models. It runs on Ollama, LM Studio, llama.cpp, and any compliant runtime. If you have a modern Mac, Windows, or Linux machine with the recommended RAM, you already have what you need. Download the file, point your runtime at it, done.

Are these fine-tuned, and on what base?

flash-1-mini is fine-tuned from an open, Apache-2.0-licensed Qwen base model — not Llama — and we attribute that base as the license requires (see the NOTICE file). That cuts both ways, honestly: the gains are real and measured against that exact base, and you still own what you download. Apache 2.0 is irrevocable — it lets you run, modify, and redistribute the model commercially, and nobody can pull the license out from under you. We fine-tune on Canadian infrastructure.

Can I commercialize the outputs?

Yes. Outputs are yours. Use them in work product, client deliverables, internal tools, products you sell. The model license covers running the model. It does not claim ownership of what you generate.

What about benchmarks?

Benchmarks publish with each model launch. flash-1-mini's are live now — see the model page for the full table, including the honest tradeoffs. We don't pre-announce numbers or cherry-pick; the regressions sit right next to the gains. Real numbers, on the actual model you can download.

How do model updates work?

You get a notification when a new version is available. You decide whether to download it. The version you already have continues to work forever — we cannot break or revoke a model you already downloaded.

What if SimpleDirect goes out of business?

Your models keep working. They are files on your hardware. We cannot revoke them, even by ceasing to exist. This is the entire point of open-weight ownership.

Does this work without internet?

Yes. Once downloaded, the model runs entirely on your hardware. Airplane mode is fine. So is an air-gapped network.

What operating systems are supported?

GGUF runtimes work on macOS, Windows, and Linux. The September 2026 desktop app ships on macOS and Windows at launch, with Linux to follow.

Is my data ever sent to you?

No. We do not operate inference servers. We could not collect your prompts even if we wanted to. The desktop app, when it ships, sends no telemetry. See the privacy policy for full details.

Get notified when models ship

flash-1-mini is available now. flash-1 in July. flash-1-pro in September. One email per launch.

Join the waitlist