
🧠 Models & Context Window

The Big Picture: Every time you chat with Copilot CLI, you're talking to an AI model — a specific "brain" — through a context window — a shared "whiteboard." Understanding these two concepts will help you get better results and spend your budget wisely.


What is a Model?

Think of each model as a different chef in a restaurant kitchen. They all cook food, but each has different strengths, speeds, and price tags.

When you open the /model menu, you're choosing which chef handles your order.

| Model | Provider | Personality | Context | Cost | Best For |
|---|---|---|---|---|---|
| Claude Sonnet 4.5 | Anthropic | 🧑‍🍳 Reliable head chef | 200k | 1x | Balanced everyday work |
| Claude Opus 4.6 | Anthropic | 👨‍🍳 Michelin-star chef | 200k | 1x | Hardest problems, best quality |
| Claude Haiku 4.5 | Anthropic | 🍔 Fast-food cook | 200k | 0.33x | Quick questions, saving budget |
| Claude Opus 4.6 1M | Anthropic | 👨‍🍳 Same Michelin chef, massive kitchen | 1,000k | 6x | Huge codebases (internal only) |
| GPT-5.1 | OpenAI | 🍝 Chef from a different restaurant chain | 200k | 1x | Different perspective |
| GPT-5.4 mini | OpenAI | 🥡 Their fast option | 200k | 0.33x | Quick tasks, budget-friendly |

Bottom Line

You don't need to memorise this table. Just remember: Opus 4.6 = smartest at standard cost, Haiku = cheapest, and 1M = huge but expensive.


The /model Menu Explained

Type /model and you'll see something like this:

┌─────────────────────────────────┬─────────┬──────┐
│ Model                           │ Context │ Cost │
├─────────────────────────────────┼─────────┼──────┤
│ Claude Opus 4.6                 │ 200k    │ 1x   │
│ Claude Sonnet 4.5               │ 200k    │ 1x   │
│ Claude Haiku 4.5                │ 200k    │ 0.33x│
│ Claude Opus 4.6 (1M context)    │ 1000k   │ 6x   │
│ GPT-5.1                         │ 200k    │ 1x   │
│ GPT-5.4 mini                    │ 200k    │ 0.33x│
└─────────────────────────────────┴─────────┴──────┘

Two columns matter: Context and Cost. Let's break each one down.


Context Column (200k, 1000k)

This is the size of the whiteboard the AI uses to hold your entire conversation.

  • 200k tokens ≈ 150,000 words ≈ roughly 2–3 full novels
  • 1,000k tokens ≈ 750,000 words ≈ roughly 10–15 novels

What does 'k' mean?

The "k" stands for thousand. So 200k = 200,000 tokens. Think of it like kilometres — 200k is 200,000 of something.

The bigger the context, the more information the AI can "see" at once — your messages, files you've shared, its own instructions, and more.


Cost Column (1x, 0.33x, 6x)

This is how many premium requests each message costs from your monthly budget.

Think of it like a café budget:

The Café Budget Analogy ☕

Imagine you get $100/month to spend at the AI café.

| Model | Cost per "Coffee" | Coffees You Get |
|---|---|---|
| Haiku 4.5 / GPT-5.4 mini | $0.33 each | ~300 coffees ☕☕☕ |
| Sonnet 4.5 / Opus 4.6 / GPT-5.1 | $1.00 each | 100 coffees ☕ |
| Opus 4.6 1M | $6.00 each | ~16 coffees ☕ |

The cheap coffee is still good — it's just smaller and simpler. The $6 coffee is the same quality as the $1 coffee, but served on a massive table (1M context).
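The arithmetic behind the café analogy is simple enough to sketch. Here's a minimal, hypothetical helper (the budget figure and multipliers come from the analogy above; this is not a real Copilot CLI API):

```python
# Hypothetical helper: how many messages a monthly budget buys
# at each cost multiplier from the table above.
def requests_affordable(monthly_budget: float, cost_multiplier: float) -> int:
    """Number of messages you can send before the budget runs out."""
    return int(monthly_budget // cost_multiplier)

budget = 100  # premium requests per month, as in the $100 analogy

print(requests_affordable(budget, 0.33))  # Haiku 4.5 / GPT-5.4 mini -> 303
print(requests_affordable(budget, 1.0))   # Sonnet / Opus / GPT-5.1  -> 100
print(requests_affordable(budget, 6.0))   # Opus 4.6 1M              -> 16
```

The 18× gap between the cheapest and the most expensive option is why picking the right model per task matters so much.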


Which Model Should You Use?

Here's a simple decision guide:

What are you doing?
├── 💬 Quick question or simple task
│   └── → Haiku 4.5 (0.33x) — save your budget
├── 📚 Learning / daily work / writing
│   └── → Opus 4.6 (1x) — best brain at standard cost
├── 🏗️ Massive project with huge files
│   └── → Opus 4.6 1M (6x) — only when you truly need the space
└── 🔄 Want a different perspective or style
    └── → GPT-5.1 (1x) — different "restaurant," different approach

Default recommendation

Start with Opus 4.6. It gives you the best quality at standard cost (1x). Switch to Haiku when you're doing simple things and want to stretch your budget.
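The decision guide above can be written as a tiny lookup. This is purely illustrative; the task categories are made up for the sketch, and the fallback mirrors the default recommendation:

```python
# Hypothetical decision helper mirroring the guide above.
# Task categories are illustrative, not an official API.
def pick_model(task: str) -> str:
    guide = {
        "quick": "Claude Haiku 4.5",      # 0.33x: save your budget
        "daily": "Claude Opus 4.6",       # 1x: best brain at standard cost
        "huge":  "Claude Opus 4.6 (1M)",  # 6x: only when you need the space
        "alt":   "GPT-5.1",               # 1x: different perspective
    }
    # Default recommendation: start with Opus 4.6.
    return guide.get(task, "Claude Opus 4.6")

print(pick_model("quick"))  # Claude Haiku 4.5
print(pick_model("essay"))  # unknown task: falls back to Claude Opus 4.6
```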


What Are Tokens?

Tokens are the unit of measurement for the whiteboard. Everything — your messages, the AI's replies, files, instructions — gets converted into tokens.

Common misconception

One word ≠ one token. It's not a 1-to-1 relationship!

Rule of thumb: 1 token ≈ ¾ of a word (~4 characters)

Think of tokens like LEGO bricks:

  • Short, common words (like "the", "hello") = 1 brick
  • Longer or unusual words (like "PowerShell") = 2–3 bricks
  • Technical strings get broken into many bricks
| Example Text | Approximate Tokens |
|---|---|
| Hello | 1 token |
| Good morning | 2 tokens |
| PowerShell | 2–3 tokens |
| New-AzResourceGroup | 5–7 tokens |
| M365CPI52224224.onmicrosoft.com | ~10 tokens |
| A full page of text (~400 words) | ~500–800 tokens |
| An entire novel (~80,000 words) | ~100,000 tokens |

Why does this matter?

Because the context window is measured in tokens. When you paste a long error message or a big file, it might eat up more whiteboard space than you'd expect — every character costs tokens.
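You can get a ballpark token count from the "~4 characters per token" rule of thumb. A minimal sketch, assuming that rule (real tokenizers use subword splitting and will differ, especially on technical strings):

```python
# Rough token estimate from the "~4 characters per token" rule of thumb.
# Real tokenizers vary; this is only a ballpark, not an exact count.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello"))                # 1
print(estimate_tokens("New-AzResourceGroup"))  # 5
page = "word " * 400                           # roughly a full page of text
print(estimate_tokens(page))                   # 500
```

Handy before pasting a large log or file: estimate first, and you won't be surprised by how much whiteboard space it eats.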


The Whiteboard (Context Window)

The context window is a shared whiteboard between you and the AI. Everything — your questions, the AI's answers, files, tool definitions — lives on this whiteboard.

Here's what it looks like as it fills up:

╔══════════════════════════════════════════════════╗
║              THE WHITEBOARD (200k)               ║
╠══════════════════════════════════════════════════╣
║                                                  ║
║  ██████████████████████░░░░░░░░░░░░░░░░░░░░░░░░ ║
║  ▲ Used: ~35%              Free: ~60%        ▲   ║
║  │                                           │   ║
║  System/Tools                             Buffer ║
║  (loaded before                           (5%)   ║
║   you say hello!)                                ║
║                                                  ║
║  Stage 1: 70% used — Everything is fine 😊       ║
║  ████████████████████████████████░░░░░░░░░░░░░░ ║
║                                                  ║
║  Stage 2: 85% used — Warning appears ⚠️          ║
║  ██████████████████████████████████████░░░░░░░░ ║
║                                                  ║
║  Stage 3: 95% used — Auto-compact kicks in 🔄    ║
║  █████████████████████████████████████████████░░ ║
║                                                  ║
║  Stage 4: 100% full — Can't process! 🛑          ║
║  ████████████████████████████████████████████████ ║
║                                                  ║
╚══════════════════════════════════════════════════╝

What's ON the Whiteboard?

Run /context and you'll see how the whiteboard is divided. Here's what each section means:

| Section | Typical Size | What It Is | Analogy |
|---|---|---|---|
| System / Tools | ~30–40% | My instructions, all available tools (Azure, GitHub, MCP servers, skills) | The café's operating manual — menu, rules, procedures — always pinned to the whiteboard |
| Messages | Grows over time | Our entire conversation history — every question and answer | The order history — every coffee you've ordered today |
| Free space | Shrinks over time | Room for more conversation | Empty whiteboard — space for new orders |
| Buffer | ~5% | Safety margin so I don't crash mid-response | Reserved space — like keeping the last page of a notebook blank |

Surprising Fact

System/Tools takes up 30–40% before you even say hello! That's like walking into a café and finding the whiteboard already half-full with the menu, health regulations, and staff procedures — before the first customer arrives.

This is why your 200k context window doesn't really give you 200k of conversation space. You effectively start with ~120–140k of usable space.
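The usable-space arithmetic is easy to sketch. A minimal example, assuming ~35% system/tools overhead and the ~5% buffer described above (actual overhead depends on which tools are loaded):

```python
# How much of a context window is actually usable for conversation,
# given the overhead fractions described above (assumed values).
def usable_context(window: int, system_frac: float = 0.35,
                   buffer_frac: float = 0.05) -> int:
    """Tokens left for messages after system/tools and the safety buffer."""
    return round(window * (1 - system_frac - buffer_frac))

print(usable_context(200_000))    # 120000: a 200k window nets ~120k
print(usable_context(1_000_000))  # 600000: the 1M model nets ~600k
```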


Traffic Light System

Use this to decide when to take action:

🟢 GREEN  (0–60% used)    Everything is fine. Keep going.
🟡 YELLOW (60–80% used)   Consider /compact or /new soon.
🔴 RED    (80%+ used)     Act NOW — /compact, save work, or /clear.

When you're in the RED zone

Don't ignore it. At 95%+, the system will try to auto-compact (summarise and shrink the conversation), but this can lose important details. It's better to manage it yourself before it gets critical.

What to do:

  1. Run /compact to summarise the conversation (frees up space)
  2. Save any important decisions to your instructions file
  3. If needed, run /clear or /new to start fresh
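The traffic-light thresholds above amount to a simple check. A hypothetical sketch (the percentages come from the traffic light; the function itself is not part of Copilot CLI):

```python
# Traffic-light check for context usage, mirroring the thresholds above.
def context_zone(used_tokens: int, window: int = 200_000) -> str:
    pct = used_tokens / window * 100
    if pct >= 80:
        return "RED: act now (/compact, save work, or /clear)"
    if pct >= 60:
        return "YELLOW: consider /compact or /new soon"
    return "GREEN: keep going"

print(context_zone(50_000))   # GREEN (25% used)
print(context_zone(140_000))  # YELLOW (70% used)
print(context_zone(180_000))  # RED (90% used)
```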

Managing Your Whiteboard

Five practical strategies to keep your whiteboard healthy:

| # | Strategy | What to Do | Why It Helps |
|---|---|---|---|
| 1 | Be selective with @ | Don't @ entire folders or huge files unless you need them | Every file you reference eats whiteboard space |
| 2 | Start new sessions for new topics | Use /new when switching to a completely different task | A fresh whiteboard = maximum space |
| 3 | Use /compact proactively | Run /compact when you hit the yellow zone (60–80%) | Summarises conversation, frees up tokens |
| 4 | Keep thinking OFF for simple tasks | Toggle with Ctrl+T for simple questions | Extended thinking uses extra tokens |
| 5 | Save to instructions, then /clear | Put important context in .github/copilot-instructions.md, then clear | The "passport strategy" — your context survives session resets because instructions are reloaded automatically |

The Passport Strategy 🛂

Think of your instructions file as a passport. When you /clear a session, the whiteboard is wiped clean — but your passport (instructions file) is always reloaded. So anything saved there survives across sessions.

This is perfect for things like:

  • Your tenant IDs and environment details
  • Preferred commands or workflows
  • Project-specific context you always need

Model Switching Mid-Conversation

You can switch models at any time using /model. Here's what happens:

What carries over (stays on the whiteboard):

  • ✅ Your entire conversation history
  • ✅ Files that were read into context
  • ✅ Decisions and plans already discussed

What changes (the new chef takes over):

  • 🔄 The "brain" processing everything
  • 🔄 Quality and style of responses
  • 🔄 How the AI interprets nuance

It's like switching chefs mid-meal

The new chef reads all the previous chef's notes (your conversation), but they might interpret the recipe differently. The kitchen (whiteboard) stays the same — only the person cooking changes.


Best Practices for Model Switching

| Do | Don't |
|---|---|
| ✅ Start with Opus 4.6 for best quality at 1x cost | ❌ Don't switch models mid-complex-task — it can cause confusion |
| ✅ Switch to Haiku for simple follow-ups to save budget | ❌ Don't assume the new model "knows" implicit context |
| ✅ Run /compact before switching (clean whiteboard for the new brain) | ❌ Don't switch repeatedly back and forth |
| ✅ Brief the new model after switching ("I'm working on X, we decided Y") | ❌ Don't use the 1M model for short conversations |

The "Internal Only" 1M Model

The Claude Opus 4.6 (1M context) model is special:

  • Same brain as regular Opus 4.6 — same intelligence, same quality
  • Bigger whiteboard — 1,000,000 tokens instead of 200,000 (5× larger)
  • 6× the cost — each message costs 6 premium requests
  • Available to Microsoft employees only (internal access)

Don't use it all the time

Using the 1M model for everyday questions is like renting a football stadium to cook dinner for two. You're paying 6× the price for space you'll never fill.

When to actually use 1M:

  • You need to analyse a massive codebase (many large files at once)
  • You're working with very large documents (100+ page specs)
  • Your session has been going so long that 200k isn't enough
  • /compact can't free up enough space for what you need

Smart Approach

Start with regular Opus 4.6 (1x cost)
    ├── Session getting long? → Run /compact
    ├── /compact isn't enough? → Try /new with key context
    └── Still need more space? → NOW switch to 1M (6x cost)

This way, you only pay the 6× premium when you genuinely need the extra space.
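The escalation ladder above can be expressed as "always return the cheapest remaining option". A hypothetical sketch (the step names echo the ladder; none of this is a real CLI interface):

```python
# Sketch of the escalation ladder above: only pay 6x when nothing cheaper works.
def next_step(used_pct: float, tried_compact: bool = False,
              tried_new: bool = False) -> str:
    """Return the cheapest remaining option for a too-full context."""
    if used_pct < 60:
        return "keep going on Opus 4.6 (1x)"
    if not tried_compact:
        return "run /compact"
    if not tried_new:
        return "start /new with key context saved"
    return "switch to Opus 4.6 1M (6x)"

print(next_step(50))                                      # keep going (1x)
print(next_step(85))                                      # run /compact first
print(next_step(85, tried_compact=True))                  # then try /new
print(next_step(85, tried_compact=True, tried_new=True))  # only now pay 6x
```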


Thinking Toggle

The thinking toggle controls whether you can see the AI's internal reasoning process.

| Setting | What Happens | When to Use |
|---|---|---|
| Thinking ON | You see the AI's step-by-step reasoning before the answer | Complex decisions, debugging, understanding why |
| Thinking OFF | You just see the final answer | Simple questions, saving tokens |

Toggle it with: Ctrl+T

Windows Terminal Conflict

In Windows Terminal, Ctrl+T opens a new tab instead of toggling thinking. This is a known conflict.

Workaround: Use a different terminal emulator, or check if your terminal lets you rebind the Ctrl+T shortcut.

Extended thinking is a separate feature — the model automatically "thinks harder" on complex problems, spending more tokens on reasoning before answering. You don't control this directly; it happens when the model detects a hard problem.


Quick Decision Flowchart

When in doubt, use this:

Need Copilot CLI help?
├── 💬 Quick question or lookup
│   └── → Haiku 4.5 (0.33x) 💰 Save budget
├── 📚 Learning / daily work / writing code
│   └── → Opus 4.6 (1x) ⭐ Best brain, standard price
├── 🏗️ Massive project, huge files
│   └── → Opus 4.6 1M (6x) 🏟️ Only when truly needed
└── 🔄 Want a different style or approach
    └── → GPT-5.1 (1x) 🍝 Different restaurant

Check Your Budget

Run /usage at any time to see how many premium requests you have left this month. Plan accordingly!

> /usage
Premium requests: 73 / 100 remaining (resets in 12 days)

Summary

| Concept | One-Sentence Explanation |
|---|---|
| Model | The AI "brain" — different models = different strengths, speeds, and costs |
| Token | The unit of measurement — roughly ¾ of a word (~4 characters) |
| Context window | The shared whiteboard — everything must fit on it |
| /model | Switch which brain you're using |
| /context | Check how full your whiteboard is |
| /compact | Summarise conversation to free up whiteboard space |
| /usage | Check your remaining monthly budget |
| Ctrl+T | Toggle the AI's visible thinking process |

Next: Useful Commands → for a complete reference of every command mentioned here.