
Data Analysis Examples

Use zo for quick data analysis, processing, and insights from the command line.

CSV Analysis

Basic Statistics

cat sales.csv | zo 'Calculate summary statistics:
- Total revenue
- Average order value
- Number of transactions
- Revenue by category'
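
Figures produced by a model are worth spot-checking on anything that matters. The sketch below recomputes the transaction count, total, and mean for one column with awk so the numbers can be compared with zo's answer. The helper name `csv_totals`, the single header row, the 1-based column index, and the absence of quoted commas are all assumptions for illustration, not something the page specifies.

```shell
# Sum a numeric column and report count, total, and mean.
# Assumptions (not from the original page): one header row, no quoted commas
# inside fields, and a 1-based column index passed as the second argument.
csv_totals() {
    local file="$1" col="$2"
    awk -F',' -v c="$col" '
        NR > 1 && $c != "" { n++; sum += $c }   # skip header and empty cells
        END {
            if (n == 0) { print "count=0 total=0 mean=0"; exit }
            printf "count=%d total=%.2f mean=%.2f\n", n, sum, sum / n
        }
    ' "$file"
}
```

For example, `csv_totals sales.csv 3` would report on the third column of sales.csv, if that is where the revenue lives.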

Find Patterns

cat customer_data.csv | zo 'Analyze this customer data:
- Find patterns in purchase behavior
- Identify high-value customers
- Suggest segmentation strategy'

Data Quality Check

cat data.csv | zo 'Check data quality:
- Missing values per column
- Outliers
- Duplicate records
- Data type issues'

Compare Datasets

zo @dataset1.csv @dataset2.csv 'Compare these datasets:
- Schema differences
- Value distribution changes
- New/removed records'

JSON Processing

Extract Data

curl -s https://1.0.0.1/cdn-cgi/trace | zo 'Extract:
- IP
- Location
- TLS Version'

Transform Format

cat data.json | zo 'Convert to CSV with columns: id, name, email, created_at'

Query Data

cat large_api_response.json | zo 'Find all users with:
- More than 100 followers
- Created in 2024
- Located in California'

Generate jq Command

cat sample.json | zo 'Generate jq command to extract all email addresses from nested user objects'

Log Analysis

Error Summary

tail -n 1000 app.log | zo 'Summarize errors:
- Most common error types
- Error frequency
- Affected components
- Suggested fixes'

Pattern Detection

grep -i 'error' /var/log/nginx/error.log | zo 'Find patterns:
- Time of day with most errors
- Common error causes
- Suspicious IP addresses
- Recommended actions'

Performance Analysis

cat slow_queries.log | zo 'Analyze slow queries:
- Slowest queries
- Common patterns in slow queries
- Optimization suggestions
- Index recommendations'

Security Analysis

cat access.log | zo 'Security analysis:
- Suspicious request patterns
- Potential attack attempts
- Unusual user agents
- Recommended security measures'

SQL Generation

Simple Query

zo @schema.sql 'Generate query to:
- Find top 10 customers by total purchases
- Include customer name, email, purchase count, and total spent
- Order by total spent descending'

Complex Query

zo @schema.sql 'Generate query with:
- Join customers, orders, and products
- Filter for 2024 purchases
- Group by product category
- Calculate revenue per category
- Include percentage of total'

Schema Design

zo 'Generate SQL schema for:
- E-commerce platform
- Users, products, orders, reviews
- Include foreign keys, indexes
- Add constraints and defaults'

Data Transformation

Format Conversion

# CSV to JSON
cat data.csv | zo 'Convert to JSON array'
 
# JSON to CSV
cat data.json | zo 'Convert to CSV'
 
# XML to JSON
cat data.xml | zo 'Convert to JSON'

Data Cleaning

cat messy_data.csv | zo 'Clean this data:
- Remove duplicates
- Fix date formats to YYYY-MM-DD
- Standardize phone numbers
- Fill missing values with median'
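
Some cleaning steps are mechanical enough to do before the data reaches the model, which leaves the prompt free for the judgment calls. As a minimal sketch, exact duplicate rows can be dropped with awk while keeping the header and the original row order. `csv_dedupe` is a hypothetical helper name, and it only treats byte-identical lines as duplicates.

```shell
# Drop exact duplicate rows while keeping the header and original order.
# Assumption: line 1 is a header and duplicates are byte-identical lines.
csv_dedupe() {
    # Always print line 1; print any other line only the first time it is seen.
    awk 'NR == 1 || !seen[$0]++' "$1"
}
```

The result can then be piped on, for example `csv_dedupe messy_data.csv | zo 'Fix date formats to YYYY-MM-DD'`.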

System Diagnostics

Memory Analysis

ps aux --sort=-%mem | head -20 | zo 'Memory analysis:
- Highest memory consumers
- Potential memory leaks
- Processes to investigate
- Recommended actions'

Disk Usage

du -sh * | sort -hr | head -20 | zo 'Disk usage analysis:
- Largest directories
- What can be cleaned
- Suggested optimizations'

Network Analysis

netstat -tuan | zo 'Network analysis:
- Active connections
- Listening ports
- Security concerns
- Recommendations'

Process Analysis

ps aux | zo 'Analyze running processes:
- Resource-intensive processes
- Zombie processes
- Unusual processes
- Recommendations'

API Data Analysis

Rate Limit Analysis

cat api_logs.txt | zo 'Analyze API usage:
- Requests per endpoint
- Peak usage times
- Users hitting rate limits
- Capacity recommendations'

Response Time Analysis

cat api_metrics.json | zo 'Analyze response times:
- Slowest endpoints
- P95, P99 latencies
- Performance trends
- Optimization targets'
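
Percentiles have an exact definition, so they can be computed directly and used to check the model's reading of the data. The sketch below uses the nearest-rank method: sort the values and take the one at rank ceil(p/100 * N). It assumes one numeric latency per line on stdin, which is a hypothetical layout; api_metrics.json would first need to be flattened into that shape. `percentile` is an illustrative name.

```shell
# Nearest-rank percentile: sort ascending, take the value at rank ceil(p * N / 100).
# Assumption: stdin holds one numeric value per line (hypothetical input layout).
percentile() {
    local p="$1"
    sort -n | awk -v p="$p" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1                      # no input, no percentile
            r = p * NR / 100
            rank = (r == int(r)) ? r : int(r) + 1    # ceil(r)
            if (rank < 1) rank = 1
            print v[rank]
        }
    '
}
```

With a file of latencies, `percentile 95 < latencies.txt` prints the P95 value and `percentile 99 < latencies.txt` the P99.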

Error Rate Analysis

cat api_errors.log | zo 'Error rate analysis:
- Error frequency by endpoint
- Most common error types
- Client vs server errors
- Suggested fixes'

Data Visualization Suggestions

Chart Recommendations

cat sales_data.csv | zo 'Recommend visualizations:
- Best chart types for this data
- Key metrics to highlight
- Interactive features to include
- Color scheme suggestions'

Python Plotting Code

cat data.csv | zo /coder 'Generate Python code to:
- Load this CSV
- Create 3 meaningful visualizations
- Use matplotlib or seaborn
- Include titles and labels'

Dashboard Design

cat metrics.json | zo 'Design dashboard:
- Key metrics to display
- Layout suggestions
- Chart types
- Real-time vs historical views'

Statistical Analysis

Descriptive Statistics

cat survey_results.csv | zo 'Calculate:
- Mean, median, mode for each numeric column
- Standard deviation
- Quartiles
- Correlation matrix'

Hypothesis Testing

cat experiment_data.csv | zo 'Statistical analysis:
- Compare control vs treatment groups
- Calculate p-value
- Determine statistical significance
- Interpret results'

Regression Analysis

cat sales_data.csv | zo 'Regression analysis:
- Predict sales based on marketing spend
- Calculate R-squared
- Identify significant variables
- Interpret coefficients'

Data Pipeline Examples

ETL Pipeline

#!/bin/bash
set -euo pipefail  # stop at the first failed step

# Extract (-f makes curl fail on HTTP errors instead of saving an error page)
curl -fsS https://api.example.com/data > raw_data.json

# Transform
cat raw_data.json | zo 'Transform to CSV with columns: id, name, amount, date' > clean_data.csv

# Load (generate SQL)
zo @schema.sql @clean_data.csv 'Generate INSERT statements'
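
Because the transform step writes model output straight to a file, the file may not always be bare CSV; an introductory sentence or a Markdown fence on the first line would break the load step. A cheap guard is to compare the first line with the header that was requested. This is a sketch: `check_csv_header` is a hypothetical helper, and it assumes the columns come back comma-separated without spaces.

```shell
# Verify that a CSV file starts with the expected header line.
# Assumption: the transform should emit exactly the requested columns,
# comma-separated, on line 1; anything else is treated as a failure.
check_csv_header() {
    local file="$1" expected="$2" actual
    actual=$(head -n 1 "$file" | tr -d '\r')   # tolerate CRLF line endings
    if [ "$actual" != "$expected" ]; then
        echo "unexpected header in $file: $actual" >&2
        return 1
    fi
}
```

Placed between the Transform and Load steps, `check_csv_header clean_data.csv 'id,name,amount,date' || exit 1` stops the pipeline before any SQL is generated from a malformed file.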

Data Quality Pipeline

#!/bin/bash
 
# Check quality
issues=$(cat data.csv | zo 'List data quality issues as JSON')
 
# Fix issues
echo "$issues" | zo 'Generate Python script to fix these data quality issues'

Advanced Analysis

Time Series Analysis

cat time_series.csv | zo 'Time series analysis:
- Identify trends
- Detect seasonality
- Find anomalies
- Forecast next 30 days
- Confidence intervals'

Cohort Analysis

cat user_cohorts.csv | zo 'Cohort analysis:
- Retention rates by cohort
- Cohort size trends
- Revenue per cohort
- Churn patterns
- Recommendations'

Shell Functions for Data Analysis

Add to ~/.bashrc or ~/.zshrc:

# Quick CSV stats
csvstats() {
    cat "$1" | zo 'Summary statistics for all numeric columns'
}
 
# Find correlations
csvcorr() {
    cat "$1" | zo 'Correlation matrix for numeric columns. Highlight strong correlations.'
}
 
# Data quality check
datacheck() {
    cat "$1" | zo 'Data quality report:
    - Missing values
    - Outliers
    - Duplicates
    - Type mismatches'
}
 
# Generate SQL
sqlgen() {
    zo @schema.sql "$*"
}
 
# API analysis
apistat() {
    cat "$1" | zo 'API usage statistics:
    - Request distribution
    - Error rates
    - Response times
    - Top consumers'
}
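
When one of these functions is called without an argument or with a mistyped path, cat prints an error but zo is still started on empty input. A small guard avoids that. The sketch below adds a hypothetical `require_file` helper (not part of zo) and shows `csvstats` rewritten to use it; the other functions could follow the same pattern.

```shell
# Return non-zero with a message unless the argument names a readable file.
# `require_file` is a hypothetical helper, not part of zo.
require_file() {
    if [ -z "${1:-}" ]; then
        echo "usage: expected a file path" >&2
        return 2
    fi
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi
}

# Quick CSV stats, guarded: zo only runs when the file can be read.
csvstats() {
    require_file "${1:-}" || return
    cat "$1" | zo 'Summary statistics for all numeric columns'
}
```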

Tips for Data Analysis

Provide Context

# ❌ Too vague
cat data.csv | zo 'Analyze this'
 
# ✅ Specific
cat sales_data.csv | zo 'Analyze 2024 sales data:
- Monthly revenue trends
- Top performing products
- Regional performance
- YoY growth rate'
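
Part of the context can be generated from the file itself. The sketch below prints the column names and the number of data rows so they can be embedded in the prompt, which is mainly useful when only a sample of a large file is piped in. `csv_context` is a hypothetical helper and assumes line 1 is the header.

```shell
# Print a one-line description of a CSV file: column names and data-row count.
# Assumption: line 1 is the header; `csv_context` is a hypothetical helper.
csv_context() {
    local file="$1" header rows
    header=$(head -n 1 "$file" | tr -d '\r')
    rows=$(awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$file")   # lines minus header
    echo "Columns: $header. Data rows: $rows."
}
```

The prompt then needs double quotes so the substitution expands, for example `cat sales_data.csv | zo "Analyze 2024 sales data. $(csv_context sales_data.csv)"`.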

Sample Large Datasets

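# First 1000 lines (header plus 999 rows) for quick analysis
head -n 1000 large_file.csv | zo 'Quick analysis'

# Random sample of 1000 rows, keeping the header as the first line
{ head -n 1 large_file.csv; tail -n +2 large_file.csv | shuf -n 1000; } | zo 'Sample analysis'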

Iterate with Chat Mode

zo --chat @data.csv 'Let us analyze this sales data'
> What are the top 10 products?
> Show me revenue trends
> Any seasonal patterns?
> exit

Use Appropriate Models

# Quick analysis - fast model
cat data.csv | zo /flash 'Quick summary'
 
# Deep analysis - smart model
cat data.csv | zo /opus 'Comprehensive statistical analysis'
