🧠 Models & Context Window¶
The Big Picture: Every time you chat with Copilot CLI, you're talking to an AI model — a specific "brain" — through a context window — a shared "whiteboard." Understanding these two concepts will help you get better results and spend your budget wisely.
What is a Model?¶
Think of each model as a different chef in a restaurant kitchen. They all cook food, but each has different strengths, speeds, and price tags.
When you open the /model menu, you're choosing which chef handles your order.
| Model | Provider | Personality | Context | Cost | Best For |
|---|---|---|---|---|---|
| Claude Sonnet 4.5 | Anthropic | 🧑‍🍳 Reliable head chef | 200k | 1x | Balanced everyday work |
| Claude Opus 4.6 | Anthropic | 👨‍🍳 Michelin-star chef | 200k | 1x | Hardest problems, best quality |
| Claude Haiku 4.5 | Anthropic | 🍔 Fast-food cook | 200k | 0.33x | Quick questions, saving budget |
| Claude Opus 4.6 1M | Anthropic | 👨‍🍳 Same Michelin chef, massive kitchen | 1,000k | 6x | Huge codebases (internal only) |
| GPT-5.1 | OpenAI | 🍝 Chef from a different restaurant chain | 200k | 1x | Different perspective |
| GPT-5.4 mini | OpenAI | 🥡 Their fast option | 200k | 0.33x | Quick tasks, budget-friendly |
Bottom Line
You don't need to memorise this table. Just remember: Opus 4.6 = smartest at standard cost, Haiku = cheapest, and 1M = huge but expensive.
The /model Menu Explained¶
Type /model and you'll see something like this:
┌─────────────────────────────────┬─────────┬──────┐
│ Model │ Context │ Cost │
├─────────────────────────────────┼─────────┼──────┤
│ Claude Opus 4.6 │ 200k │ 1x │
│ Claude Sonnet 4.5 │ 200k │ 1x │
│ Claude Haiku 4.5 │ 200k │ 0.33x│
│ Claude Opus 4.6 (1M context) │ 1000k │ 6x │
│ GPT-5.1 │ 200k │ 1x │
│ GPT-5.4 mini │ 200k │ 0.33x│
└─────────────────────────────────┴─────────┴──────┘
Two columns matter: Context and Cost. Let's break each one down.
Context Column (200k, 1000k)¶
This is the size of the whiteboard the AI uses to hold your entire conversation.
- 200k tokens ≈ 150,000 words ≈ roughly 2–3 full novels
- 1,000k tokens ≈ 750,000 words ≈ roughly 10–15 novels
What does 'k' mean?
The "k" stands for thousand. So 200k = 200,000 tokens. Think of it like kilometres — 200k is 200,000 of something.
The bigger the context, the more information the AI can "see" at once — your messages, files you've shared, its own instructions, and more.
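The ≈ figures above come from simple arithmetic. Here's a minimal sketch, assuming the ~0.75 words-per-token rule of thumb used on this page (real token counts vary by content):

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from this page; real ratios vary

def tokens_to_words(tokens: int) -> int:
    """Roughly how many words fit in a context window of a given size."""
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(200_000))    # 150000 words (a 200k window)
print(tokens_to_words(1_000_000))  # 750000 words (the 1M window)
```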
Cost Column (1x, 0.33x, 6x)¶
This is how many premium requests each message costs from your monthly budget.
Think of it like a café budget:
The Café Budget Analogy ☕
Imagine you get $100/month to spend at the AI café.
| Model | Cost per "Coffee" | Coffees You Get |
|---|---|---|
| Haiku 4.5 / GPT-5.4 mini | $0.33 each | ~300 coffees ☕☕☕ |
| Sonnet 4.5 / Opus 4.6 / GPT-5.1 | $1.00 each | 100 coffees ☕ |
| Opus 4.6 1M | $6.00 each | ~16 coffees ☕ |
The cheap coffee is still good — it's just smaller and simpler. The $6 coffee is the same quality as the $1 coffee, but served on a massive table (1M context).
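The café arithmetic can be sketched in a few lines. This is just the analogy made executable; the $100 budget is illustrative, and the multipliers come from the Cost column above:

```python
MONTHLY_BUDGET = 100  # illustrative "café budget" from the analogy above

# Cost multipliers from the Cost column of the /model table
COST_MULTIPLIER = {
    "Claude Haiku 4.5": 0.33,
    "GPT-5.4 mini": 0.33,
    "Claude Opus 4.6": 1.0,
    "GPT-5.1": 1.0,
    "Claude Opus 4.6 (1M)": 6.0,
}

def messages_affordable(model: str) -> int:
    """How many 'coffees' (messages) the monthly budget buys on a model."""
    return int(MONTHLY_BUDGET / COST_MULTIPLIER[model])

print(messages_affordable("Claude Haiku 4.5"))      # 303
print(messages_affordable("Claude Opus 4.6"))       # 100
print(messages_affordable("Claude Opus 4.6 (1M)"))  # 16
```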
Which Model Should You Use?¶
Here's a simple decision guide:
What are you doing?
│
├── 💬 Quick question or simple task
│ └── → Haiku 4.5 (0.33x) — save your budget
│
├── 📚 Learning / daily work / writing
│ └── → Opus 4.6 (1x) — best brain at standard cost
│
├── 🏗️ Massive project with huge files
│ └── → Opus 4.6 1M (6x) — only when you truly need the space
│
└── 🔄 Want a different perspective or style
└── → GPT-5.1 (1x) — different "restaurant," different approach
Default recommendation
Start with Opus 4.6. It gives you the best quality at standard cost (1x). Switch to Haiku when you're doing simple things and want to stretch your budget.
What Are Tokens?¶
Tokens are the unit of measurement for the whiteboard. Everything — your messages, the AI's replies, files, instructions — gets converted into tokens.
Common misconception
One word ≠ one token. It's not a 1-to-1 relationship!
Rule of thumb: 1 token ≈ ¾ of a word (~4 characters)
Think of tokens like LEGO bricks:
- Short, common words (like "the", "hello") = 1 brick
- Longer or unusual words (like "PowerShell") = 2–3 bricks
- Technical strings get broken into many bricks
| Example Text | Approximate Tokens |
|---|---|
| Hello | 1 token |
| Good morning | 2 tokens |
| PowerShell | 2–3 tokens |
| New-AzResourceGroup | 5–7 tokens |
| M365CPI52224224.onmicrosoft.com | ~10 tokens |
| A full page of text (~400 words) | ~500–800 tokens |
| An entire novel (~80,000 words) | ~100,000 tokens |
Why does this matter?
Because the context window is measured in tokens. When you paste a long error message or a big file, it might eat up more whiteboard space than you'd expect — every character costs tokens.
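You can estimate the cost of a paste before you hit Enter. A minimal sketch using the ~4-characters-per-token rule of thumb from this page; real tokenizers split text differently (especially code and unusual strings), so treat this as a ballpark only:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token.
    Real tokenizers vary, so this is only a ballpark figure."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello"))        # 1
print(estimate_tokens("Hello " * 400))  # a long pasted blob adds up fast
```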
The Whiteboard (Context Window)¶
The context window is a shared whiteboard between you and the AI. Everything — your questions, the AI's answers, files, tool definitions — lives on this whiteboard.
Here's what it looks like as it fills up:
╔══════════════════════════════════════════════════╗
║ THE WHITEBOARD (200k) ║
╠══════════════════════════════════════════════════╣
║ ║
║ ██████████████████████░░░░░░░░░░░░░░░░░░░░░░░░ ║
║ ▲ Used: ~35% Free: ~60% ▲ ║
║ │ │ ║
║ System/Tools Buffer ║
║ (loaded before (5%) ║
║ you say hello!) ║
║ ║
║ Stage 1: 70% used — Everything is fine 😊 ║
║ ████████████████████████████████░░░░░░░░░░░░░░ ║
║ ║
║ Stage 2: 85% used — Warning appears ⚠️ ║
║ ██████████████████████████████████████░░░░░░░░ ║
║ ║
║ Stage 3: 95% used — Auto-compact kicks in 🔄 ║
║ █████████████████████████████████████████████░░ ║
║ ║
║ Stage 4: 100% full — Can't process! 🛑 ║
║ ████████████████████████████████████████████████ ║
║ ║
╚══════════════════════════════════════════════════╝
What's ON the Whiteboard?¶
Run /context and you'll see how the whiteboard is divided. Here's what each section means:
| Section | Typical Size | What It Is | Analogy |
|---|---|---|---|
| System / Tools | ~30–40% | The AI's instructions and all available tools (Azure, GitHub, MCP servers, skills) | The café's operating manual — menu, rules, procedures — always pinned to the whiteboard |
| Messages | Grows over time | Your entire conversation history — every question and answer | The order history — every coffee you've ordered today |
| Free space | Shrinks over time | Room for more conversation | Empty whiteboard — space for new orders |
| Buffer | ~5% | Safety margin so the AI doesn't crash mid-response | Reserved space — like keeping the last page of a notebook blank |
Surprising Fact
System/Tools takes up 30–40% before you even say hello! That's like walking into a café and finding the whiteboard already half-full with the menu, health regulations, and staff procedures — before the first customer arrives.
This is why your 200k context window doesn't really give you 200k of conversation space. You effectively start with ~120–140k of usable space.
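That "usable space" figure is simple subtraction. A minimal sketch, assuming the rough overhead fractions quoted above (your actual `/context` numbers will differ):

```python
def usable_context(window: int, system_frac: float = 0.35,
                   buffer_frac: float = 0.05) -> int:
    """Tokens left for conversation after system/tools overhead and buffer.
    The default fractions are the rough figures quoted on this page."""
    return round(window * (1 - system_frac - buffer_frac))

print(usable_context(200_000))    # 120000, within the ~120-140k quoted above
print(usable_context(1_000_000))  # 600000
```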
Traffic Light System¶
Use this to decide when to take action:
🟢 GREEN (0–60% used) Everything is fine. Keep going.
🟡 YELLOW (60–80% used) Consider /compact or /new soon.
🔴 RED (80%+ used) Act NOW — /compact, save work, or /clear.
When you're in the RED zone
Don't ignore it. At 95%+, the system will try to auto-compact (summarise and shrink the conversation), but this can lose important details. It's better to manage it yourself before it gets critical.
What to do:

- Run `/compact` to summarise the conversation (frees up space)
- Save any important decisions to your instructions file
- If needed, run `/clear` or `/new` to start fresh
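The traffic-light thresholds can be sketched as a small helper. This is just the zones above expressed in code; the 60/80 cut-offs are the ones this page recommends:

```python
def context_zone(percent_used: float) -> str:
    """Map context usage to the traffic-light zones on this page."""
    if percent_used < 60:
        return "GREEN: everything is fine, keep going"
    if percent_used < 80:
        return "YELLOW: consider /compact or /new soon"
    return "RED: act now (/compact, save work, or /clear)"

print(context_zone(45))  # GREEN zone
print(context_zone(85))  # RED zone
```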
Managing Your Whiteboard¶
Five practical strategies to keep your whiteboard healthy:
| # | Strategy | What to Do | Why It Helps |
|---|---|---|---|
| 1 | Be selective with `@` | Don't `@` entire folders or huge files unless you need them | Every file you reference eats whiteboard space |
| 2 | Start new sessions for new topics | Use `/new` when switching to a completely different task | A fresh whiteboard = maximum space |
| 3 | Use `/compact` proactively | Run `/compact` when you hit the yellow zone (60–80%) | Summarises conversation, frees up tokens |
| 4 | Keep thinking OFF for simple tasks | Toggle with `Ctrl+T` for simple questions | Extended thinking uses extra tokens |
| 5 | Save to instructions, then `/clear` | Put important context in `.github/copilot-instructions.md`, then clear | The "passport strategy" — your context survives session resets because instructions are reloaded automatically |
The Passport Strategy 🛂
Think of your instructions file as a passport. When you /clear a session, the whiteboard is wiped clean — but your passport (instructions file) is always reloaded. So anything saved there survives across sessions.
This is perfect for things like:
- Your tenant IDs and environment details
- Preferred commands or workflows
- Project-specific context you always need
Model Switching Mid-Conversation¶
You can switch models at any time using /model. Here's what happens:
What carries over (stays on the whiteboard):
- ✅ Your entire conversation history
- ✅ Files that were read into context
- ✅ Decisions and plans already discussed
What changes (the new chef takes over):
- 🔄 The "brain" processing everything
- 🔄 Quality and style of responses
- 🔄 How the AI interprets nuance
It's like switching chefs mid-meal
The new chef reads all the previous chef's notes (your conversation), but they might interpret the recipe differently. The kitchen (whiteboard) stays the same — only the person cooking changes.
Best Practices for Model Switching¶
| Do | Don't |
|---|---|
| ✅ Start with Opus 4.6 for best quality at 1x cost | ❌ Don't switch models mid-complex-task — it can cause confusion |
| ✅ Switch to Haiku for simple follow-ups to save budget | ❌ Don't assume the new model "knows" implicit context |
| ✅ Run `/compact` before switching (clean whiteboard for the new brain) | ❌ Don't switch repeatedly back and forth |
| ✅ Brief the new model after switching ("I'm working on X, we decided Y") | ❌ Don't use the 1M model for short conversations |
The "Internal Only" 1M Model¶
The Claude Opus 4.6 (1M context) model is special:
- Same brain as regular Opus 4.6 — same intelligence, same quality
- Bigger whiteboard — 1,000,000 tokens instead of 200,000 (5× larger)
- 6× the cost — each message costs 6 premium requests
- Available to Microsoft employees only (internal access)
Don't use it all the time
Using the 1M model for everyday questions is like renting a football stadium to cook dinner for two. You're paying 6× the price for space you'll never fill.
When to actually use 1M:

- You need to analyse a massive codebase (many large files at once)
- You're working with very large documents (100+ page specs)
- Your session has been going so long that 200k isn't enough
- `/compact` can't free up enough space for what you need
Smart Approach
Start with regular Opus 4.6 (1x cost)
│
├── Session getting long? → Run /compact
│
├── /compact isn't enough? → Try /new with key context
│
└── Still need more space? → NOW switch to 1M (6x cost)
This way, you only pay the 6× premium when you genuinely need the extra space.
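The escalation ladder above can be sketched as a decision helper. A minimal sketch of this page's advice, with illustrative inputs (in practice you'd read your usage from `/context`):

```python
def next_step(context_pct: float, already_compacted: bool,
              started_fresh: bool) -> str:
    """Cheapest-first escalation: only pay the 6x premium as a last resort."""
    if context_pct < 80:
        return "stay on Opus 4.6 (1x)"
    if not already_compacted:
        return "run /compact"
    if not started_fresh:
        return "try /new with key context"
    return "switch to Opus 4.6 1M (6x)"

print(next_step(50, False, False))  # plenty of space: stay put
print(next_step(90, True, True))    # exhausted cheaper options: go 1M
```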
Thinking Toggle¶
The thinking toggle controls whether you can see the AI's internal reasoning process.
| Setting | What Happens | When to Use |
|---|---|---|
| Thinking ON | You see the AI's step-by-step reasoning before the answer | Complex decisions, debugging, understanding why |
| Thinking OFF | You just see the final answer | Simple questions, saving tokens |
Toggle it with: Ctrl+T
Windows Terminal Conflict
In Windows Terminal, Ctrl+T opens a new tab instead of toggling thinking. This is a known conflict.
Workaround: Use a different terminal emulator, or check if your terminal lets you rebind the Ctrl+T shortcut.
Extended thinking is a separate feature — the model automatically "thinks harder" on complex problems, spending more tokens on reasoning before answering. You don't control this directly; it happens when the model detects a hard problem.
Quick Decision Flowchart¶
When in doubt, use this:
Need Copilot CLI help?
│
├── 💬 Quick question or lookup
│ └── → Haiku 4.5 (0.33x) 💰 Save budget
│
├── 📚 Learning / daily work / writing code
│ └── → Opus 4.6 (1x) ⭐ Best brain, standard price
│
├── 🏗️ Massive project, huge files
│ └── → Opus 4.6 1M (6x) 🏟️ Only when truly needed
│
└── 🔄 Want a different style or approach
└── → GPT-5.1 (1x) 🍝 Different restaurant
Check Your Budget
Run /usage at any time to see how many premium requests you have left this month. Plan accordingly!
Summary¶
| Concept | One-Sentence Explanation |
|---|---|
| Model | The AI "brain" — different models = different strengths, speeds, and costs |
| Token | The unit of measurement — roughly ¾ of a word (~4 characters) |
| Context window | The shared whiteboard — everything must fit on it |
| `/model` | Switch which brain you're using |
| `/context` | Check how full your whiteboard is |
| `/compact` | Summarise conversation to free up whiteboard space |
| `/usage` | Check your remaining monthly budget |
| `Ctrl+T` | Toggle the AI's visible thinking process |
Next: Useful Commands → for a complete reference of every command mentioned here.