Chat Mode
Have multi-turn conversations with AI models where every response builds on previous context. Perfect for iterative problem-solving, pair programming, and in-depth discussions.
Overview
Chat mode enables:
- Context preservation - AI remembers the entire conversation
- Iterative refinement - Build on previous responses
- File references - Use @filename syntax in any message
- Model persistence - Same model throughout the session
- System prompts - Custom model personas applied to all messages
- Error recovery - Retry on failures without losing history
Basic Usage
Starting a Chat Session
# Basic chat
zo --chat 'Let us discuss Rust lifetimes'
> Can you give me an example?
> What happens if I violate the rules?
> exit
# With model selection (slash command)
zo --chat /sonnet 'Explain async/await'
> Show me a practical example
> How does it compare to threads?
> exit
# With model selection (CLI flag)
zo --chat --model opus 'Let us design a system'
> What about scalability?
> How would you handle failures?
> exit
Exiting Chat
Multiple ways to exit:
- Type exit
- Type quit
- Type q
- Press Ctrl+D (EOF)
zo --chat 'Hello'
> Let us talk about databases
> exit # ← Exits the chat session
Multiline Input
For entering multiline messages, use one of these key combinations:
- Alt-Enter - Works in most modern terminals (recommended)
- Ctrl-O - Fallback for terminals where Alt-Enter doesn't work
- Ctrl-J - Alternative binding
Press Enter normally to submit your message.
zo --chat 'Help me write a function'
> def process_data(items): # Alt-Enter to continue
>     for item in items: # Alt-Enter to continue
>         yield item * 2 # Enter to submit
Chat with Initial Context
File References
Start the chat with file context using @filename syntax:
# Single file
zo --chat @code.rs 'Let us refactor this together'
> Make it more functional
> Add error handling
> Now add tests
> exit
# Multiple files
zo --chat @v1.py @v2.py 'Help me merge these versions'
> Focus on the database code
> What about the API changes?
> exit
# With model selection
zo --chat /sonnet @main.rs 'Code review session'
> Check the error handling
> What about performance?
> exit
Piped Input (STDIN)
Start the chat with piped content:
# From git diff
git diff | zo --chat 'Let us review these changes'
> What about the tests?
> Should I refactor anything?
> exit
# From error log
cat error.log | zo --chat /debugger 'Help me debug this'
> What should I check first?
> I tried that, still failing
> exit
# From command output
cargo build 2>&1 | zo --chat 'Let us fix these errors'
> Start with the most critical
> Explain that error in detail
> exit
Combining File References and STDIN
# Git diff + README context
git diff | zo --chat @README.md 'Do these changes need documentation updates?'
> Should I update the examples?
> Write the updated section
> exit
Interactive File References
You can use @filename syntax in any message during the chat, not just the initial prompt:
zo --chat 'I need help with my project'
> @src/main.rs Review this file
> @tests/test.rs Do the tests cover the main logic?
> @Cargo.toml Should I add any dependencies?
> exit
File Error Handling: If a file doesn't exist or can't be read during chat, zo displays an error and prompts you again without exiting the session.
Use Cases
Pair Programming
zo --chat /coder @app.rs 'Let us implement authentication'
> Add password hashing with bcrypt
> Now add JWT token generation
> Add refresh token logic
> Write unit tests for the auth flow
> exit
Why it works: Each response builds on the previous code. The AI remembers what you've already implemented.
Iterative Debugging
cat stacktrace.log | zo --chat /debugger 'Let us debug this crash'
> What is the most likely cause?
> I checked that, the variable is initialized
> Could it be a race condition?
> How do I verify that?
> exit
Why it works: The AI maintains context about what you've already tried, making suggestions more targeted.
Learning Sessions
zo --chat /teacher 'Teach me about Redis internals'
> How does it achieve such high performance?
> Explain the data structures it uses
> How does persistence work?
> What about replication?
> Give me a practical example
> exit
Why it works: Each explanation builds on previous ones, creating a coherent learning path.
Architecture Design
zo --chat /architect 'Design a real-time notification system'
> It needs to handle 100k concurrent users
> What about message persistence?
> How do we handle failures?
> What is the deployment strategy?
> Estimate the costs
> exit
Why it works: Design decisions compound. Later questions can reference earlier architectural choices.
Code Review Conversations
zo --chat /reviewer @pull_request.diff 'Comprehensive PR review'
> What are the security concerns?
> I fixed the SQL injection. What else?
> Is the error handling sufficient?
> Any performance issues?
> exit
Why it works: Iterative review where you can ask for clarification and address feedback incrementally.
Documentation Writing
zo --chat /writer @lib.rs @examples/ 'Help me write comprehensive docs'
> Start with an overview
> Now document the main API functions
> Add usage examples for each function
> Include common pitfalls section
> exit
Why it works: Documentation sections build on each other. Later sections can reference earlier explanations.
Technical Details
Context Preservation
zo maintains a complete message history:
pub struct ChatSession {
    messages: Vec<Message>, // Full conversation history
    model_entry: ModelEntry,
    client: OpenRouterClient,
}
Every message includes:
- System prompt (if defined for custom model)
- All previous user messages
- All previous assistant responses
- Current user message
Token implications: Long conversations consume more tokens. Most models have context limits (8K-200K tokens).
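The assembly of each request from the stored history could be sketched as follows. This is an illustration, not zo's actual code; the `Message` shape and role names mirror the OpenAI-style chat schema that OpenRouter accepts:

```rust
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,    // "system", "user", or "assistant"
    content: String,
}

/// Build the full payload for one request: optional system prompt first,
/// then the entire prior conversation, then the new user message.
fn build_request(system_prompt: Option<&str>, history: &[Message], current: &str) -> Vec<Message> {
    let mut msgs = Vec::new();
    if let Some(sp) = system_prompt {
        msgs.push(Message { role: "system".into(), content: sp.into() });
    }
    msgs.extend(history.iter().cloned());
    msgs.push(Message { role: "user".into(), content: current.into() });
    msgs
}
```

Because the whole history is re-sent each time, the payload for message N contains every earlier turn, which is what makes follow-up questions work without repeating context.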
Input Handling
Interactive Input with Piped STDIN
When you start chat with piped STDIN, zo uses a special technique:
cat file.txt | zo --chat 'Analyze this'
# STDIN consumed for first message
> Follow-up question # ← This reads from keyboard (TTY)
> exit
How it works: On Unix systems, zo reads subsequent messages from /dev/tty instead of stdin. This allows interactive input even when initial stdin was piped.
Non-Unix systems: Falls back to regular stdin (may not work with piped initial input).
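The /dev/tty fallback could be sketched like this. The helper names are hypothetical and the real input handling in zo may differ; the sketch only shows the core idea of detecting a piped stdin and opening the controlling terminal instead:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, IsTerminal};

/// Decide where interactive follow-up messages should come from.
fn interactive_source(stdin_is_tty: bool) -> &'static str {
    if stdin_is_tty { "stdin" } else { "/dev/tty" }
}

/// Open the chosen source as a buffered reader (Unix only: /dev/tty is
/// the controlling terminal, available even when stdin is a pipe).
fn open_interactive_input() -> io::Result<Box<dyn BufRead>> {
    match interactive_source(io::stdin().is_terminal()) {
        "stdin" => Ok(Box::new(BufReader::new(io::stdin()))),
        path => Ok(Box::new(BufReader::new(File::open(path)?))),
    }
}
```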
File References in Chat
File references are parsed fresh in each message:
zo --chat 'Hello'
> @file1.txt Analyze this
# File read and sent with this message
> @file2.txt Now analyze this one
# Different file read and sent with this message
> exit
Each @filename reference is independent and read at the time of the message.
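The per-message parsing might look roughly like this sketch. The exact token rules (whitespace-delimited words with a leading `@`) are an assumption, not zo's documented grammar:

```rust
/// Split a chat message into file references and the remaining prompt text.
/// Every "@path" word is collected for reading at send time; everything
/// else is kept as the prompt.
fn extract_file_refs(message: &str) -> (Vec<String>, String) {
    let mut files = Vec::new();
    let mut rest = Vec::new();
    for word in message.split_whitespace() {
        match word.strip_prefix('@') {
            Some(path) if !path.is_empty() => files.push(path.to_string()),
            _ => rest.push(word),
        }
    }
    (files, rest.join(" "))
}
```

Running this over `@src/main.rs Review this file` yields one file reference (`src/main.rs`) plus the prompt `Review this file`, which is why each message can bring in fresh file content.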
Error Recovery
If an API call fails during chat:
zo --chat 'Test'
> This is a long question...
# Network error occurs
Error: Network request failed
Retry? [y/N]: y
# Re-sends the same message with full history
History preservation: Your message history is NOT modified on error. You can retry with the same context.
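A retry flow with this property could be sketched as below: the pending message is appended to history only after a successful response, so a failed request leaves the history untouched and can simply be re-sent. All names here are illustrative, not zo's internals:

```rust
/// Send `message` with the full `history`; on failure, ask whether to
/// retry (standing in for the "Retry? [y/N]" prompt) and re-send with
/// identical context. History is committed only on success.
fn send_with_retry<F, G>(
    history: &mut Vec<String>,
    message: &str,
    mut api_call: F,
    mut wants_retry: G,
) -> Result<String, String>
where
    F: FnMut(&[String], &str) -> Result<String, String>,
    G: FnMut() -> bool,
{
    loop {
        match api_call(history.as_slice(), message) {
            Ok(reply) => {
                history.push(message.to_string());
                history.push(reply.clone());
                return Ok(reply);
            }
            Err(_) if wants_retry() => continue, // history untouched; re-send
            Err(e) => return Err(e),
        }
    }
}
```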
Model Selection
The model is selected at chat session start and used for all messages:
# Model selected here ↓
zo --chat /sonnet 'Question 1'
> Question 2 # Uses sonnet
> Question 3 # Uses sonnet
> exit
# Cannot change mid-chat
zo --chat /sonnet 'Question'
> /opus 'Question' # ← This doesn't switch models!
# Treats "/opus" as part of the message text
To change models: Exit and start a new chat session.
System Prompts
If using a custom model with a system prompt, it applies to the entire conversation:
# ~/.config/zo/config.toml
[[custom_models]]
name = "coder"
model = "anthropic/claude-sonnet-4.5"
system_prompt = "You are an expert programmer. Provide concise, tested code."
zo --chat /coder 'Write a function'
> Make it faster # System prompt still applies
> Add tests # System prompt still applies
> exit
The system prompt is sent with every request, ensuring consistent behavior.
Best Practices
✅ Do This
# Start with clear context
zo --chat @project.rs 'Let us review and improve this code'
# Ask follow-up questions
> Focus on the error handling
> Can you explain that pattern?
# Build iteratively
zo --chat 'Design a URL shortener'
> Add analytics tracking
> Now add rate limiting
> How do we scale it?
# Use appropriate models
zo --chat /opus 'Complex system design discussion'
zo --chat /flash 'Quick questions'
# Exit cleanly
> exit
❌ Avoid This
# Don't repeat context unnecessarily
> Here is my code again: [paste entire code]
# The AI already has it from earlier messages
# Don't try to change models mid-chat
> /gpt4 'switch to GPT-4' # Doesn't work
# Don't ignore context limits
# Very long chat sessions may hit model context limits
# Don't use for single questions
zo --chat 'What is 2+2?' # Just use: zo 'What is 2+2?'
> exit
Advanced Patterns
Multi-File Workflow
zo --chat 'Help me build a web API'
> @schema.sql Start with database schema
> @models.py Generate Python models from that schema
> @api.py Create REST endpoints
> @tests/test_api.py Write integration tests
> exit
Iterative Refinement
zo --chat /coder 'Write a binary search tree in Rust'
> Add a delete method
> Make it generic over any Ord type
> Add iterators
> Optimize the rebalancing
> Add comprehensive documentation
> exit
Problem Solving Session
cat benchmark.txt | zo --chat 'My app is slow, help me optimize'
> Profile the hot path
> I see malloc is called frequently. Explain.
> How do I use a memory pool?
> Show me the implementation
> How much improvement should I expect?
> exit
Educational Dialogue
zo --chat /teacher 'Explain how compilers work'
> What is the difference between lexing and parsing?
> Show me a simple example
> How does semantic analysis work?
> What about optimization passes?
> Can you recommend resources to learn more?
> exit
Shell Integration
Quick Chat Alias
# ~/.bashrc or ~/.zshrc
alias chat='zo --chat'
alias chat-code='zo --chat /coder'
alias chat-review='zo --chat /reviewer'
alias chat-debug='zo --chat /debugger'
Usage:
chat 'Let us discuss databases'
chat-code @main.rs 'Refactor this'
chat-review @pr.diff 'Review this PR'
Chat with Context Function
# Start chat with git context
gitchat() {
    git diff | zo --chat /reviewer 'Let us review these changes'
}
# Chat about errors
debugchat() {
    "$@" 2>&1 | zo --chat /debugger 'Let us fix this'
}
Usage:
gitchat
debugchat cargo build
Comparison: Chat Mode vs Single Request
Use Chat Mode When:
✅ You need to ask follow-up questions
✅ Building something iteratively
✅ Exploring a topic in depth
✅ Debugging complex issues
✅ Pair programming sessions
✅ Learning conversations
Use Single Request When:
✅ One-off questions
✅ Quick transformations
✅ Simple analysis
✅ Scripting/automation
✅ Pipeline processing
Examples Gallery
Real-World Chat Sessions
Session 1: Feature Implementation
zo --chat /coder @app.rs 'Add user authentication'
> Use JWT tokens
> Add password reset functionality
> Add rate limiting on login endpoint
> Write integration tests
> Document the auth flow in comments
> exit
Session 2: Performance Investigation
cat profile.txt | zo --chat /debugger 'App is using 2GB memory'
> Show me common memory leak patterns in this context
> I see a lot of Vec allocations. Explain.
> Should I use a different data structure?
> Show me how to use a slab allocator here
> exit
Session 3: Learning New Concept
zo --chat /teacher 'Explain the ownership system of Rust'
> Give me a simple example
> What happens if I try to use a value after moving it?
> How do borrowing and references work?
> When should I use Rc vs Arc?
> Give me a real-world example using all these concepts
> exit
Session 4: Architecture Review
zo --chat @architecture.md 'Review this system design'
> What are the bottlenecks?
> How should we handle database failover?
> What about cross-region replication?
> Estimate the infrastructure costs
> Suggest monitoring strategy
> exit
Troubleshooting
Context Limit Exceeded
Problem: Error about context/token limit.
Solution:
- Use models with larger context windows (Claude has 200K, GPT-4 has 128K)
- Start a new chat session to clear history
- Summarize earlier parts of the conversation
# Summarize and start fresh
> Summarize what we discussed so far
# Copy summary
> exit
zo --chat 'Continuing from: [paste summary]'
Can't Input After STDIN
Problem: Can't type after piping initial input.
Solution: This should work automatically on Unix systems. If not:
# Workaround: save to file first
cat data.txt > /tmp/context.txt
zo --chat @/tmp/context.txt 'Discuss this'
Lost Context
Problem: AI seems to forget earlier messages.
This shouldn't happen - zo sends full history. If it does:
- Check for API errors in earlier messages
- Verify the model has sufficient context window
- File an issue - this is a bug
Model Not Using System Prompt
Problem: Custom model not following system prompt.
Solution: Verify configuration:
# List models (shows custom ones too)
zo +list-models | grep yourmodel
# Check config file
cat ~/.config/zo/config.toml
The system prompt must be defined in a [[custom_models]] section.
Performance Considerations
Token Usage
Every message sends full history:
Message 1: system_prompt + user1
Message 2: system_prompt + user1 + assistant1 + user2
Message 3: system_prompt + user1 + assistant1 + user2 + assistant2 + user3
Cumulative token cost grows quadratically with the number of exchanges, because each request re-sends all previous messages.
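The quadratic growth can be illustrated with a back-of-envelope calculation. Assume, for simplicity, that every user message and every reply cost roughly the same number of tokens; request n then carries 2n - 1 messages (n user turns plus n - 1 replies):

```rust
/// Total input tokens across a whole session of `exchanges` requests,
/// assuming each individual message costs roughly `per_msg` tokens.
/// Request n re-sends all prior turns, i.e. carries (2n - 1) messages.
fn total_input_tokens(exchanges: u64, per_msg: u64) -> u64 {
    (1..=exchanges).map(|n| (2 * n - 1) * per_msg).sum()
}
```

Since the odd numbers 1 + 3 + ... + (2n - 1) sum to n², the cumulative cost is exchanges² × per_msg: doubling the number of exchanges roughly quadruples total input-token cost, which is why starting a fresh session with a summary pays off.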
Tips:
- Keep sessions focused
- Use cheaper models for long conversations
- Start new session when changing topics
Latency
Each message includes full history, so:
- Longer conversations = more tokens to process
- Slower response times as chat grows
- More API cost per message
Recommendation: For very long sessions (20+ exchanges), consider starting fresh and providing a summary.
Memory Usage
Full conversation history kept in memory. For typical sessions (<100 messages), this is negligible (<1MB).
Next Steps
- STDIN Pipelines → Pipe command output to zo
- Custom Models → Create specialized chat personas
- Shell Integration → Workflow integration
- Configuration → Set up custom models