🧠 Models & Context Window¶
The Big Picture: Every time you chat with Copilot CLI, you're talking to an AI model — a specific "brain" — through a context window — a shared "whiteboard." Understanding these two concepts will help you get better results and spend your budget wisely.
What is a Model?¶
Think of each model as a different chef in a restaurant kitchen. They all cook food, but each has different strengths, speeds, and price tags.
When you open the /model menu, you're choosing which chef handles your order.
| Model | Provider | Personality | Context | Cost | Best For |
|---|---|---|---|---|---|
| Claude Sonnet 4.5 | Anthropic | 🧑‍🍳 Reliable head chef | 200k | 1x | Balanced everyday work |
| Claude Opus 4.6 | Anthropic | 👨‍🍳 Michelin-star chef | 200k | 1x | Hardest problems, best quality |
| Claude Haiku 4.5 | Anthropic | 🍔 Fast-food cook | 200k | 0.33x | Quick questions, saving budget |
| Claude Opus 4.6 1M | Anthropic | 👨‍🍳 Same Michelin chef, massive kitchen | 1,000k | 6x | Huge codebases (internal only) |
| GPT-5.1 | OpenAI | 🍝 Chef from a different restaurant chain | 200k | 1x | Different perspective |
| GPT-5.4 mini | OpenAI | 🥡 Their fast option | 200k | 0.33x | Quick tasks, budget-friendly |
Bottom Line
You don't need to memorise this table. Just remember: Opus 4.6 = smartest at standard cost, Haiku = cheapest, and 1M = huge but expensive.
The /model Menu Explained¶
Type /model and you'll see something like this:
┌─────────────────────────────────┬─────────┬──────┐
│ Model │ Context │ Cost │
├─────────────────────────────────┼─────────┼──────┤
│ Claude Opus 4.6 │ 200k │ 1x │
│ Claude Sonnet 4.5 │ 200k │ 1x │
│ Claude Haiku 4.5 │ 200k │ 0.33x│
│ Claude Opus 4.6 (1M context) │ 1000k │ 6x │
│ GPT-5.1 │ 200k │ 1x │
│ GPT-5.4 mini │ 200k │ 0.33x│
└─────────────────────────────────┴─────────┴──────┘
Two columns matter: Context and Cost. Let's break each one down.
Context Column (200k, 1000k)¶
This is the size of the whiteboard the AI uses to hold your entire conversation.
- 200k tokens ≈ 150,000 words ≈ roughly 2–3 full novels
- 1,000k tokens ≈ 750,000 words ≈ roughly 10–15 novels
What does 'k' mean?
The "k" stands for thousand. So 200k = 200,000 tokens. Think of it like kilometres — 200k is 200,000 of something.
The bigger the context, the more information the AI can "see" at once — your messages, files you've shared, its own instructions, and more.
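The ≈ figures above come from simple arithmetic. Here's a minimal sketch, assuming the ~0.75 words-per-token rule of thumb used on this page (real token counts vary by content):

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from this page; real ratios vary

def tokens_to_words(tokens: int) -> int:
    """Roughly how many words fit in a context window of a given size."""
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(200_000))    # 150000 words (a 200k window)
print(tokens_to_words(1_000_000))  # 750000 words (the 1M window)
```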
Cost Column (1x, 0.33x, 6x)¶
This is how many premium requests each message costs from your monthly budget.
Think of it like a café budget:
The Café Budget Analogy ☕
Imagine you get $100/month to spend at the AI café.
| Model | Cost per "Coffee" | Coffees You Get |
|---|---|---|
| Haiku 4.5 / GPT-5.4 mini | $0.33 each | ~300 coffees ☕☕☕ |
| Sonnet 4.5 / Opus 4.6 / GPT-5.1 | $1.00 each | 100 coffees ☕ |
| Opus 4.6 1M | $6.00 each | ~16 coffees ☕ |
The cheap coffee is still good — it's just smaller and simpler. The $6 coffee is the same quality as the $1 coffee, but served on a massive table (1M context).
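The café arithmetic can be sketched in a few lines. This is just the analogy made executable; the $100 budget is illustrative, and the multipliers come from the Cost column above:

```python
MONTHLY_BUDGET = 100  # illustrative "café budget" from the analogy above

# Cost multipliers from the Cost column of the /model table
COST_MULTIPLIER = {
    "Claude Haiku 4.5": 0.33,
    "GPT-5.4 mini": 0.33,
    "Claude Opus 4.6": 1.0,
    "GPT-5.1": 1.0,
    "Claude Opus 4.6 (1M)": 6.0,
}

def messages_affordable(model: str) -> int:
    """How many 'coffees' (messages) the monthly budget buys on a model."""
    return int(MONTHLY_BUDGET / COST_MULTIPLIER[model])

print(messages_affordable("Claude Haiku 4.5"))      # 303
print(messages_affordable("Claude Opus 4.6"))       # 100
print(messages_affordable("Claude Opus 4.6 (1M)"))  # 16
```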
Which Model Should You Use?¶
Here's a simple decision guide:
What are you doing?
│
├── 💬 Quick question or simple task
│ └── → Haiku 4.5 (0.33x) — save your budget
│
├── 📚 Learning / daily work / writing
│ └── → Opus 4.6 (1x) — best brain at standard cost
│
├── 🏗️ Massive project with huge files
│ └── → Opus 4.6 1M (6x) — only when you truly need the space
│
└── 🔄 Want a different perspective or style
└── → GPT-5.1 (1x) — different "restaurant," different approach
Default recommendation
Start with Opus 4.6. It gives you the best quality at standard cost (1x). Switch to Haiku when you're doing simple things and want to stretch your budget.
What Are Tokens?¶
Tokens are the unit of measurement for the whiteboard. Everything — your messages, the AI's replies, files, instructions — gets converted into tokens.
Common misconception
One word ≠ one token. It's not a 1-to-1 relationship!
Rule of thumb: 1 token ≈ ¾ of a word (~4 characters)
Think of tokens like LEGO bricks:
- Short, common words (like "the", "hello") = 1 brick
- Longer or unusual words (like "PowerShell") = 2–3 bricks
- Technical strings get broken into many bricks
| Example Text | Approximate Tokens |
|---|---|
| Hello | 1 token |
| Good morning | 2 tokens |
| PowerShell | 2–3 tokens |
| New-AzResourceGroup | 5–7 tokens |
| M365CPI52224224.onmicrosoft.com | ~10 tokens |
| A full page of text (~400 words) | ~500–800 tokens |
| An entire novel (~80,000 words) | ~100,000 tokens |
Why does this matter?
Because the context window is measured in tokens. When you paste a long error message or a big file, it might eat up more whiteboard space than you'd expect — every character costs tokens.
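You can estimate the cost of a paste before you hit Enter. A minimal sketch using the ~4-characters-per-token rule of thumb from this page; real tokenizers split text differently (especially code and unusual strings), so treat this as a ballpark only:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token.
    Real tokenizers vary, so this is only a ballpark figure."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello"))        # 1
print(estimate_tokens("Hello " * 400))  # a long pasted blob adds up fast
```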
The Whiteboard (Context Window)¶
The context window is a shared whiteboard between you and the AI. Everything — your questions, the AI's answers, files, tool definitions — lives on this whiteboard.
Here's what it looks like as it fills up:
╔══════════════════════════════════════════════════╗
║ THE WHITEBOARD (200k) ║
╠══════════════════════════════════════════════════╣
║ ║
║ ██████████████████████░░░░░░░░░░░░░░░░░░░░░░░░ ║
║ ▲ Used: ~35% Free: ~60% ▲ ║
║ │ │ ║
║ System/Tools Buffer ║
║ (loaded before (5%) ║
║ you say hello!) ║
║ ║
║ Stage 1: 70% used — Everything is fine 😊 ║
║ ████████████████████████████████░░░░░░░░░░░░░░ ║
║ ║
║ Stage 2: 85% used — Warning appears ⚠️ ║
║ ██████████████████████████████████████░░░░░░░░ ║
║ ║
║ Stage 3: 95% used — Auto-compact kicks in 🔄 ║
║ █████████████████████████████████████████████░░ ║
║ ║
║ Stage 4: 100% full — Can't process! 🛑 ║
║ ████████████████████████████████████████████████ ║
║ ║
╚══════════════════════════════════════════════════╝
What's ON the Whiteboard?¶
Run /context and you'll see how the whiteboard is divided. Here's what each section means:
| Section | Typical Size | What It Is | Analogy |
|---|---|---|---|
| System / Tools | ~30–40% | The AI's instructions and all available tools (Azure, GitHub, MCP servers, skills) | The café's operating manual — menu, rules, procedures — always pinned to the whiteboard |
| Messages | Grows over time | Your entire conversation history — every question and answer | The order history — every coffee you've ordered today |
| Free space | Shrinks over time | Room for more conversation | Empty whiteboard — space for new orders |
| Buffer | ~5% | Safety margin so the AI doesn't crash mid-response | Reserved space — like keeping the last page of a notebook blank |
Surprising Fact
System/Tools takes up 30–40% before you even say hello! That's like walking into a café and finding the whiteboard already half-full with the menu, health regulations, and staff procedures — before the first customer arrives.
This is why your 200k context window doesn't really give you 200k of conversation space. You effectively start with ~120–140k of usable space.
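That "usable space" figure is simple subtraction. A minimal sketch, assuming the rough overhead fractions quoted above (your actual `/context` numbers will differ):

```python
def usable_context(window: int, system_frac: float = 0.35,
                   buffer_frac: float = 0.05) -> int:
    """Tokens left for conversation after system/tools overhead and buffer.
    The default fractions are the rough figures quoted on this page."""
    return round(window * (1 - system_frac - buffer_frac))

print(usable_context(200_000))    # 120000, within the ~120-140k quoted above
print(usable_context(1_000_000))  # 600000
```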
Traffic Light System¶
Use this to decide when to take action:
🟢 GREEN (0–60% used) Everything is fine. Keep going.
🟡 YELLOW (60–80% used) Consider /compact or /new soon.
🔴 RED (80%+ used) Act NOW — /compact, save work, or /clear.
When you're in the RED zone
Don't ignore it. At 95%+, the system will try to auto-compact (summarise and shrink the conversation), but this can lose important details. It's better to manage it yourself before it gets critical.
What to do:

- Run `/compact` to summarise the conversation (frees up space)
- Save any important decisions to your instructions file
- If needed, run `/clear` or `/new` to start fresh
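The traffic-light thresholds can be sketched as a small helper. This is just the zones above expressed in code; the 60/80 cut-offs are the ones this page recommends:

```python
def context_zone(percent_used: float) -> str:
    """Map context usage to the traffic-light zones on this page."""
    if percent_used < 60:
        return "GREEN: everything is fine, keep going"
    if percent_used < 80:
        return "YELLOW: consider /compact or /new soon"
    return "RED: act now (/compact, save work, or /clear)"

print(context_zone(45))  # GREEN zone
print(context_zone(85))  # RED zone
```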
Managing Your Whiteboard¶
Five practical strategies to keep your whiteboard healthy:
| # | Strategy | What to Do | Why It Helps |
|---|---|---|---|
| 1 | Be selective with `@` | Don't `@` entire folders or huge files unless you need them | Every file you reference eats whiteboard space |
| 2 | Start new sessions for new topics | Use `/new` when switching to a completely different task | A fresh whiteboard = maximum space |
| 3 | Use `/compact` proactively | Run `/compact` when you hit the yellow zone (60–80%) | Summarises conversation, frees up tokens |
| 4 | Keep thinking OFF for simple tasks | Toggle with `Ctrl+T` for simple questions | Extended thinking uses extra tokens |
| 5 | Save to instructions, then `/clear` | Put important context in `.github/copilot-instructions.md`, then clear | The "passport strategy" — your context survives session resets because instructions are reloaded automatically |
The Passport Strategy 🛂
Think of your instructions file as a passport. When you /clear a session, the whiteboard is wiped clean — but your passport (instructions file) is always reloaded. So anything saved there survives across sessions.
This is perfect for things like:
- Your tenant IDs and environment details
- Preferred commands or workflows
- Project-specific context you always need
Model Switching Mid-Conversation¶
You can switch models at any time using /model. Here's what happens:
What carries over (stays on the whiteboard):
- ✅ Your entire conversation history
- ✅ Files that were read into context
- ✅ Decisions and plans already discussed
What changes (the new chef takes over):
- 🔄 The "brain" processing everything
- 🔄 Quality and style of responses
- 🔄 How the AI interprets nuance
It's like switching chefs mid-meal
The new chef reads all the previous chef's notes (your conversation), but they might interpret the recipe differently. The kitchen (whiteboard) stays the same — only the person cooking changes.
Best Practices for Model Switching¶
| Do | Don't |
|---|---|
| ✅ Start with Opus 4.6 for best quality at 1x cost | ❌ Don't switch models mid-complex-task — it can cause confusion |
| ✅ Switch to Haiku for simple follow-ups to save budget | ❌ Don't assume the new model "knows" implicit context |
| ✅ Run `/compact` before switching (clean whiteboard for the new brain) | ❌ Don't switch repeatedly back and forth |
| ✅ Brief the new model after switching ("I'm working on X, we decided Y") | ❌ Don't use the 1M model for short conversations |
The "Internal Only" 1M Model¶
The Claude Opus 4.6 (1M context) model is special:
- Same brain as regular Opus 4.6 — same intelligence, same quality
- Bigger whiteboard — 1,000,000 tokens instead of 200,000 (5× larger)
- 6× the cost — each message costs 6 premium requests
- Available to Microsoft employees only (internal access)
Don't use it all the time
Using the 1M model for everyday questions is like renting a football stadium to cook dinner for two. You're paying 6× the price for space you'll never fill.
When to actually use 1M:

- You need to analyse a massive codebase (many large files at once)
- You're working with very large documents (100+ page specs)
- Your session has been going so long that 200k isn't enough
- `/compact` can't free up enough space for what you need
Smart Approach
Start with regular Opus 4.6 (1x cost)
│
├── Session getting long? → Run /compact
│
├── /compact isn't enough? → Try /new with key context
│
└── Still need more space? → NOW switch to 1M (6x cost)
This way, you only pay the 6× premium when you genuinely need the extra space.
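The escalation ladder above can be sketched as a decision helper. A minimal sketch of this page's advice, with illustrative inputs (in practice you'd read your usage from `/context`):

```python
def next_step(context_pct: float, already_compacted: bool,
              started_fresh: bool) -> str:
    """Cheapest-first escalation: only pay the 6x premium as a last resort."""
    if context_pct < 80:
        return "stay on Opus 4.6 (1x)"
    if not already_compacted:
        return "run /compact"
    if not started_fresh:
        return "try /new with key context"
    return "switch to Opus 4.6 1M (6x)"

print(next_step(50, False, False))  # plenty of space: stay put
print(next_step(90, True, True))    # exhausted cheaper options: go 1M
```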
Thinking Toggle¶
The thinking toggle controls whether you can see the AI's internal reasoning process.
| Setting | What Happens | When to Use |
|---|---|---|
| Thinking ON | You see the AI's step-by-step reasoning before the answer | Complex decisions, debugging, understanding why |
| Thinking OFF | You just see the final answer | Simple questions, saving tokens |
Toggle it with: Ctrl+T
Windows Terminal Conflict
In Windows Terminal, Ctrl+T opens a new tab instead of toggling thinking. This is a known conflict.
Workaround: Use a different terminal emulator, or check if your terminal lets you rebind the Ctrl+T shortcut.
Extended thinking is a separate feature — the model automatically "thinks harder" on complex problems, spending more tokens on reasoning before answering. You don't control this directly; it happens when the model detects a hard problem.
Quick Decision Flowchart¶
When in doubt, use this:
Need Copilot CLI help?
│
├── 💬 Quick question or lookup
│ └── → Haiku 4.5 (0.33x) 💰 Save budget
│
├── 📚 Learning / daily work / writing code
│ └── → Opus 4.6 (1x) ⭐ Best brain, standard price
│
├── 🏗️ Massive project, huge files
│ └── → Opus 4.6 1M (6x) 🏟️ Only when truly needed
│
└── 🔄 Want a different style or approach
└── → GPT-5.1 (1x) 🍝 Different restaurant
Check Your Budget
Run /usage at any time to see how many premium requests you have left this month. Plan accordingly!
Summary¶
| Concept | One-Sentence Explanation |
|---|---|
| Model | The AI "brain" — different models = different strengths, speeds, and costs |
| Token | The unit of measurement — roughly ¾ of a word (~4 characters) |
| Context window | The shared whiteboard — everything must fit on it |
| `/model` | Switch which brain you're using |
| `/context` | Check how full your whiteboard is |
| `/compact` | Summarise conversation to free up whiteboard space |
| `/usage` | Check your remaining monthly budget |
| `Ctrl+T` | Toggle the AI's visible thinking process |
Next: Useful Commands → for a complete reference of every command mentioned here.