Chat Mode
Have multi-turn conversations with AI models where every response builds on previous context. Perfect for iterative problem-solving, pair programming, and in-depth discussions.
Overview
Chat mode enables:
- Context preservation - AI remembers the entire conversation
- Iterative refinement - Build on previous responses
- File references - Use @filename syntax in any message
- Model persistence - Same model throughout the session
- System prompts - Custom model personas applied to all messages
- Error recovery - Retry on failures without losing history
Basic Usage
Starting a Chat Session
# Basic chat
zo --chat 'Let us discuss Rust lifetimes'
> Can you give me an example?
> What happens if I violate the rules?
> exit
# With model selection (slash command)
zo --chat /sonnet 'Explain async/await'
> Show me a practical example
> How does it compare to threads?
> exit
# With model selection (CLI flag)
zo --chat --model opus 'Let us design a system'
> What about scalability?
> How would you handle failures?
> exit
Exiting Chat
Multiple ways to exit:
- Type exit
- Type quit
- Type q
- Press Ctrl+D (EOF)
zo --chat 'Hello'
> Let us talk about databases
> exit # ← Exits the chat session
Multiline Input
For entering multiline messages, use one of these key combinations:
- Alt-Enter - Works in most modern terminals (recommended)
- Ctrl-O - Fallback for terminals where Alt-Enter doesn't work
- Ctrl-J - Alternative binding
Press Enter normally to submit your message.
zo --chat 'Help me write a function'
> def process_data(items): # Alt-Enter to continue
>     for item in items: # Alt-Enter to continue
>         yield item * 2 # Enter to submit
Chat with Initial Context
File References
Start the chat with file context using @filename syntax:
# Single file
zo --chat @code.rs 'Let us refactor this together'
> Make it more functional
> Add error handling
> Now add tests
> exit
# Multiple files
zo --chat @v1.py @v2.py 'Help me merge these versions'
> Focus on the database code
> What about the API changes?
> exit
# With model selection
zo --chat /sonnet @main.rs 'Code review session'
> Check the error handling
> What about performance?
> exit
Piped Input (STDIN)
Start the chat with piped content:
# From git diff
git diff | zo --chat 'Let us review these changes'
> What about the tests?
> Should I refactor anything?
> exit
# From error log
cat error.log | zo --chat /debugger 'Help me debug this'
> What should I check first?
> I tried that, still failing
> exit
# From command output
cargo build 2>&1 | zo --chat 'Let us fix these errors'
> Start with the most critical
> Explain that error in detail
> exit
Combining File References and STDIN
# Git diff + README context
git diff | zo --chat @README.md 'Do these changes need documentation updates?'
> Should I update the examples?
> Write the updated section
> exit
Interactive File References
You can use @filename syntax in any message during the chat, not just the initial prompt:
zo --chat 'I need help with my project'
> @src/main.rs Review this file
> @tests/test.rs Do the tests cover the main logic?
> @Cargo.toml Should I add any dependencies?
> exit
File Error Handling: If a file doesn't exist or can't be read during chat, zo displays an error and prompts you again without exiting the session.
Use Cases
Pair Programming
zo --chat /coder @app.rs 'Let us implement authentication'
> Add password hashing with bcrypt
> Now add JWT token generation
> Add refresh token logic
> Write unit tests for the auth flow
> exit
Why it works: Each response builds on the previous code. The AI remembers what you've already implemented.
Iterative Debugging
cat stacktrace.log | zo --chat /debugger 'Let us debug this crash'
> What is the most likely cause?
> I checked that, the variable is initialized
> Could it be a race condition?
> How do I verify that?
> exit
Why it works: The AI maintains context about what you've already tried, making suggestions more targeted.
Learning Sessions
zo --chat /teacher 'Teach me about Redis internals'
> How does it achieve such high performance?
> Explain the data structures it uses
> How does persistence work?
> What about replication?
> Give me a practical example
> exit
Why it works: Each explanation builds on previous ones, creating a coherent learning path.
Architecture Design
zo --chat /architect 'Design a real-time notification system'
> It needs to handle 100k concurrent users
> What about message persistence?
> How do we handle failures?
> What is the deployment strategy?
> Estimate the costs
> exit
Why it works: Design decisions compound. Later questions can reference earlier architectural choices.
Code Review Conversations
zo --chat /reviewer @pull_request.diff 'Comprehensive PR review'
> What are the security concerns?
> I fixed the SQL injection. What else?
> Is the error handling sufficient?
> Any performance issues?
> exit
Why it works: Iterative review where you can ask for clarification and address feedback incrementally.
Documentation Writing
zo --chat /writer @lib.rs @examples/ 'Help me write comprehensive docs'
> Start with an overview
> Now document the main API functions
> Add usage examples for each function
> Include common pitfalls section
> exit
Why it works: Documentation sections build on each other. Later sections can reference earlier explanations.
Technical Details
Context Preservation
zo maintains a complete message history:
pub struct ChatSession {
    messages: Vec<Message>, // Full conversation history
    model_entry: ModelEntry,
    client: OpenRouterClient,
}
Every message includes:
- System prompt (if defined for custom model)
- All previous user messages
- All previous assistant responses
- Current user message
Token implications: Long conversations consume more tokens. Most models have context limits (8K-200K tokens).
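The assembly of each request from the stored history could be sketched as follows. This is an illustration, not zo's actual code; the `Message` shape and role names mirror the OpenAI-style chat schema that OpenRouter accepts:

```rust
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,    // "system", "user", or "assistant"
    content: String,
}

/// Build the full payload for one request: optional system prompt first,
/// then the entire prior conversation, then the new user message.
fn build_request(system_prompt: Option<&str>, history: &[Message], current: &str) -> Vec<Message> {
    let mut msgs = Vec::new();
    if let Some(sp) = system_prompt {
        msgs.push(Message { role: "system".into(), content: sp.into() });
    }
    msgs.extend(history.iter().cloned());
    msgs.push(Message { role: "user".into(), content: current.into() });
    msgs
}
```

Because the whole history is re-sent each time, the payload for message N contains every earlier turn, which is what makes follow-up questions work without repeating context.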
Input Handling
Interactive Input with Piped STDIN
When you start chat with piped STDIN, zo uses a special technique:
cat file.txt | zo --chat 'Analyze this'
# STDIN consumed for first message
> Follow-up question # ← This reads from keyboard (TTY)
> exit
How it works: On Unix systems, zo reads subsequent messages from /dev/tty instead of stdin. This allows interactive input even when initial stdin was piped.
Non-Unix systems: Falls back to regular stdin (may not work with piped initial input).
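The /dev/tty fallback could be sketched like this. The helper names are hypothetical and the real input handling in zo may differ; the sketch only shows the core idea of detecting a piped stdin and opening the controlling terminal instead:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, IsTerminal};

/// Decide where interactive follow-up messages should come from.
fn interactive_source(stdin_is_tty: bool) -> &'static str {
    if stdin_is_tty { "stdin" } else { "/dev/tty" }
}

/// Open the chosen source as a buffered reader (Unix only: /dev/tty is
/// the controlling terminal, available even when stdin is a pipe).
fn open_interactive_input() -> io::Result<Box<dyn BufRead>> {
    match interactive_source(io::stdin().is_terminal()) {
        "stdin" => Ok(Box::new(BufReader::new(io::stdin()))),
        path => Ok(Box::new(BufReader::new(File::open(path)?))),
    }
}
```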
File References in Chat
File references are parsed fresh in each message:
zo --chat 'Hello'
> @file1.txt Analyze this
# File read and sent with this message
> @file2.txt Now analyze this one
# Different file read and sent with this message
> exit
Each @filename reference is independent and read at the time of the message.
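The per-message parsing might look roughly like this sketch. The exact token rules (whitespace-delimited words with a leading `@`) are an assumption, not zo's documented grammar:

```rust
/// Split a chat message into file references and the remaining prompt text.
/// Every "@path" word is collected for reading at send time; everything
/// else is kept as the prompt.
fn extract_file_refs(message: &str) -> (Vec<String>, String) {
    let mut files = Vec::new();
    let mut rest = Vec::new();
    for word in message.split_whitespace() {
        match word.strip_prefix('@') {
            Some(path) if !path.is_empty() => files.push(path.to_string()),
            _ => rest.push(word),
        }
    }
    (files, rest.join(" "))
}
```

Running this over `@src/main.rs Review this file` yields one file reference (`src/main.rs`) plus the prompt `Review this file`, which is why each message can bring in fresh file content.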
Error Recovery
If an API call fails during chat:
zo --chat 'Test'
> This is a long question...
# Network error occurs
Error: Network request failed
Retry? [y/N]: y
# Re-sends the same message with full history
History preservation: Your message history is NOT modified on error. You can retry with the same context.
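A retry flow with this property could be sketched as below: the pending message is appended to history only after a successful response, so a failed request leaves the history untouched and can simply be re-sent. All names here are illustrative, not zo's internals:

```rust
/// Send `message` with the full `history`; on failure, ask whether to
/// retry (standing in for the "Retry? [y/N]" prompt) and re-send with
/// identical context. History is committed only on success.
fn send_with_retry<F, G>(
    history: &mut Vec<String>,
    message: &str,
    mut api_call: F,
    mut wants_retry: G,
) -> Result<String, String>
where
    F: FnMut(&[String], &str) -> Result<String, String>,
    G: FnMut() -> bool,
{
    loop {
        match api_call(history.as_slice(), message) {
            Ok(reply) => {
                history.push(message.to_string());
                history.push(reply.clone());
                return Ok(reply);
            }
            Err(_) if wants_retry() => continue, // history untouched; re-send
            Err(e) => return Err(e),
        }
    }
}
```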
Model Selection
The model is selected at chat session start and used for all messages:
# Model selected here ↓
zo --chat /sonnet 'Question 1'
> Question 2 # Uses sonnet
> Question 3 # Uses sonnet
> exit
# Cannot change mid-chat
zo --chat /sonnet 'Question'
> /opus 'Question' # ← This doesn't switch models!
# Treats "/opus" as part of the message text
To change models: Exit and start a new chat session.
System Prompts
If using a custom model with a system prompt, it applies to the entire conversation:
# ~/.config/zo/config.toml
[[custom_models]]
name = "coder"
model = "anthropic/claude-sonnet-4.5"
system_prompt = "You are an expert programmer. Provide concise, tested code."
zo --chat /coder 'Write a function'
> Make it faster # System prompt still applies
> Add tests # System prompt still applies
> exit
The system prompt is sent with every request, ensuring consistent behavior.
Best Practices
✅ Do This
# Start with clear context
zo --chat @project.rs 'Let us review and improve this code'
# Ask follow-up questions
> Focus on the error handling
> Can you explain that pattern?
# Build iteratively
zo --chat 'Design a URL shortener'
> Add analytics tracking
> Now add rate limiting
> How do we scale it?
# Use appropriate models
zo --chat /opus 'Complex system design discussion'
zo --chat /flash 'Quick questions'
# Exit cleanly
> exit
❌ Avoid This
# Don't repeat context unnecessarily
> Here is my code again: [paste entire code]
# The AI already has it from earlier messages
# Don't try to change models mid-chat
> /gpt4 'switch to GPT-4' # Doesn't work
# Don't ignore context limits
# Very long chat sessions may hit model context limits
# Don't use for single questions
zo --chat 'What is 2+2?' # Just use: zo 'What is 2+2?'
> exit
Advanced Patterns
Multi-File Workflow
zo --chat 'Help me build a web API'
> @schema.sql Start with database schema
> @models.py Generate Python models from that schema
> @api.py Create REST endpoints
> @tests/test_api.py Write integration tests
> exit
Iterative Refinement
zo --chat /coder 'Write a binary search tree in Rust'
> Add a delete method
> Make it generic over any Ord type
> Add iterators
> Optimize the rebalancing
> Add comprehensive documentation
> exit
Problem Solving Session
cat benchmark.txt | zo --chat 'My app is slow, help me optimize'
> Profile the hot path
> I see malloc is called frequently. Explain.
> How do I use a memory pool?
> Show me the implementation
> How much improvement should I expect?
> exit
Educational Dialogue
zo --chat /teacher 'Explain how compilers work'
> What is the difference between lexing and parsing?
> Show me a simple example
> How does semantic analysis work?
> What about optimization passes?
> Can you recommend resources to learn more?
> exit
Shell Integration
Quick Chat Alias
# ~/.bashrc or ~/.zshrc
alias chat='zo --chat'
alias chat-code='zo --chat /coder'
alias chat-review='zo --chat /reviewer'
alias chat-debug='zo --chat /debugger'
Usage:
chat 'Let us discuss databases'
chat-code @main.rs 'Refactor this'
chat-review @pr.diff 'Review this PR'
Chat with Context Function
# Start chat with git context
gitchat() {
    git diff | zo --chat /reviewer 'Let us review these changes'
}
# Chat about errors
debugchat() {
    "$@" 2>&1 | zo --chat /debugger 'Let us fix this'
}
Usage:
gitchat
debugchat cargo build
Comparison: Chat Mode vs Single Request
Use Chat Mode When:
✅ You need to ask follow-up questions
✅ Building something iteratively
✅ Exploring a topic in depth
✅ Debugging complex issues
✅ Pair programming sessions
✅ Learning conversations
Use Single Request When:
✅ One-off questions
✅ Quick transformations
✅ Simple analysis
✅ Scripting/automation
✅ Pipeline processing
Examples Gallery
Real-World Chat Sessions
Session 1: Feature Implementation
zo --chat /coder @app.rs 'Add user authentication'
> Use JWT tokens
> Add password reset functionality
> Add rate limiting on login endpoint
> Write integration tests
> Document the auth flow in comments
> exit
Session 2: Performance Investigation
cat profile.txt | zo --chat /debugger 'App is using 2GB memory'
> Show me common memory leak patterns in this context
> I see a lot of Vec allocations. Explain.
> Should I use a different data structure?
> Show me how to use a slab allocator here
> exit
Session 3: Learning New Concept
zo --chat /teacher 'Explain the ownership system of Rust'
> Give me a simple example
> What happens if I try to use a value after moving it?
> How do borrowing and references work?
> When should I use Rc vs Arc?
> Give me a real-world example using all these concepts
> exit
Session 4: Architecture Review
zo --chat @architecture.md 'Review this system design'
> What are the bottlenecks?
> How should we handle database failover?
> What about cross-region replication?
> Estimate the infrastructure costs
> Suggest monitoring strategy
> exit
Troubleshooting
Context Limit Exceeded
Problem: Error about context/token limit.
Solution:
- Use models with larger context windows (Claude has 200K, GPT-4 has 128K)
- Start a new chat session to clear history
- Summarize earlier parts of the conversation
# Summarize and start fresh
> Summarize what we discussed so far
# Copy summary
> exit
zo --chat 'Continuing from: [paste summary]'
Can't Input After STDIN
Problem: Can't type after piping initial input.
Solution: This should work automatically on Unix systems. If not:
# Workaround: save to file first
cat data.txt > /tmp/context.txt
zo --chat @/tmp/context.txt 'Discuss this'
Lost Context
Problem: AI seems to forget earlier messages.
This shouldn't happen - zo sends full history. If it does:
- Check for API errors in earlier messages
- Verify the model has sufficient context window
- File an issue - this is a bug
Model Not Using System Prompt
Problem: Custom model not following system prompt.
Solution: Verify configuration:
# List models (shows custom ones too)
zo +list-models | grep yourmodel
# Check config file
cat ~/.config/zo/config.toml
The system prompt must be defined in a [[custom_models]] section.
Performance Considerations
Token Usage
Every message sends full history:
Message 1: system_prompt + user1
Message 2: system_prompt + user1 + assistant1 + user2
Message 3: system_prompt + user1 + assistant1 + user2 + assistant2 + user3
Cumulative token cost grows quadratically with the number of exchanges, because each request re-sends all previous messages.
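The quadratic growth can be illustrated with a back-of-envelope calculation. Assume, for simplicity, that every user message and every reply cost roughly the same number of tokens; request n then carries 2n - 1 messages (n user turns plus n - 1 replies):

```rust
/// Total input tokens across a whole session of `exchanges` requests,
/// assuming each individual message costs roughly `per_msg` tokens.
/// Request n re-sends all prior turns, i.e. carries (2n - 1) messages.
fn total_input_tokens(exchanges: u64, per_msg: u64) -> u64 {
    (1..=exchanges).map(|n| (2 * n - 1) * per_msg).sum()
}
```

Since the odd numbers 1 + 3 + ... + (2n - 1) sum to n², the cumulative cost is exchanges² × per_msg: doubling the number of exchanges roughly quadruples total input-token cost, which is why starting a fresh session with a summary pays off.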
Tips:
- Keep sessions focused
- Use cheaper models for long conversations
- Start new session when changing topics
Latency
Each message includes full history, so:
- Longer conversations = more tokens to process
- Slower response times as chat grows
- More API cost per message
Recommendation: For very long sessions (20+ exchanges), consider starting fresh and providing a summary.
Memory Usage
Full conversation history kept in memory. For typical sessions (<100 messages), this is negligible (<1MB).
Next Steps
- STDIN Pipelines → Pipe command output to zo
- Custom Models → Create specialized chat personas
- Shell Integration → Workflow integration
- Configuration → Set up custom models