# Environment Information Collection Guide

This guide provides systematic approaches for gathering environment information during error troubleshooting while maintaining strict privacy and security standards.

## Core Privacy Principles

### Never Collect Without Authorization

**Absolutely Forbidden** (never collect these):
- Passwords or password hashes
- API keys, tokens, or credentials
- Private keys or certificates
- Personally identifiable information (PII): names, emails, addresses, phone numbers
- Financial information
- Session cookies
- Authentication headers
- Database connection strings with credentials

**Requires Explicit User Permission**:
- Project-specific file paths
- Custom configuration files
- Custom environment variables
- Application logs (may contain sensitive data)
- Network configurations
- User-specific system settings

**Generally Safe** (collect without permission):
- Public software versions
- Operating system type and version
- Public package versions
- Standard error messages
- Command outputs that don't reveal sensitive paths or data

### Privacy-First Collection Strategy

1. **Assess Necessity**: Only collect information directly relevant to the error
2. **Request Permission**: When in doubt, ask the user first
3. **Sanitize Output**: Remove sensitive data before recording
4. **Minimize Scope**: Collect the smallest amount needed
5. **Explain Purpose**: Tell the user why specific information is needed

## Environment Information Categories

### 1. System Environment

**What to Collect:**
- Operating system and version
- System architecture (x86, ARM, etc.)
- Shell/terminal type
- Locale and encoding settings

**Collection Commands:**

```bash
# Cross-platform OS detection
# Linux/Mac
uname -a
uname -s  # Just OS name
uname -r  # Just kernel version

# Windows
ver
systeminfo | findstr /B /C:"OS Name" /C:"OS Version"

# Architecture
uname -m  # Linux/Mac
echo %PROCESSOR_ARCHITECTURE%  # Windows

# Locale
locale  # Linux/Mac
echo $LANG  # Unix-like
chcp  # Windows (code page)
```

**Privacy Notes:**
- These commands generally don't expose sensitive information
- Hostname may be included in `uname -a` output (consider sanitizing)

### 2. Language Runtime Environment

**What to Collect:**
- Programming language version
- Runtime environment details
- Virtual environment status

**Collection Commands by Language:**

#### Python

```bash
# Version
python --version
python3 --version

# Detailed info
python -c "import sys; print(sys.version)"

# Virtual environment detection
echo $VIRTUAL_ENV  # Unix-like
echo %VIRTUAL_ENV%  # Windows

# Python path
python -c "import sys; print(sys.executable)"
```

#### Node.js/JavaScript

```bash
# Versions
node --version
npm --version
yarn --version

# Node environment
echo $NODE_ENV

# Global package location
npm config get prefix
```

#### Java

```bash
# Version
java -version
javac -version

# Runtime details
java -XshowSettings:properties -version 2>&1 | grep 'java.version'
```

#### Ruby

```bash
# Version
ruby --version

# Gem environment
gem env
```

#### Go

```bash
# Version
go version

# Environment
go env
```

**Privacy Notes:**
- Installation paths may reveal usernames (sanitize if needed)
- Custom environment variables may contain sensitive data

### 3. Package Dependencies

**What to Collect:**
- Installed package versions (relevant to error)
- Package manager version
- Dependency conflicts
- Lock file status

**Collection Commands:**

#### Python (pip)

```bash
# Specific package version
pip show <package-name>
pip list | grep <package-name>

# All packages (use sparingly, large output)
pip list

# Dependency conflicts
pip check

# Requirements file
cat requirements.txt  # Request permission first
```

#### Python (conda)

```bash
# Environment info
conda info

# Installed packages
conda list

# Environment exports
conda env export  # Use with caution, may be large
```

#### Node.js (npm)

```bash
# Specific package version
npm list <package-name>
npm view <package-name> version

# Global packages
npm list -g --depth=0

# Dependency audit
npm doctor
npm audit

# Lock file status
ls -l package-lock.json
```

#### Ruby (gem)

```bash
# Specific gem version
gem list <gem-name>

# All gems
gem list

# Gem environment
gem env
```

**Privacy Notes:**
- Package lists can be large; collect only relevant packages when possible
- Lock files may contain private registry URLs (review before sharing)
- Package names might reveal business logic (request permission for private packages)

### 4. Configuration Files

**What to Collect:**

Configuration files often contain sensitive data. Always exercise caution.

**Approach:**

1. **Identify relevant config**: Only collect configs directly related to the error
2. **Request permission**: Always ask before reading project-specific configs
3. **Sanitize**: Remove credentials, API keys, and sensitive values before recording
4. **Provide context**: Explain why the config is needed

**Common Configuration Files:**

```bash
# Python
# - setup.py, setup.cfg, pyproject.toml, tox.ini

# Node.js
# - package.json (usually safe), .npmrc (check for tokens)

# General
# - .env files (NEVER share without sanitization)
# - config.json, config.yaml (sanitize before sharing)
```

**Sanitization Example:**

```yaml
# Before sanitization
database:
  host: db.example.com
  username: admin
  password: super_secret_123
  port: 5432

# After sanitization
database:
  host: [REDACTED]
  username: [REDACTED]
  password: [REDACTED]
  port: 5432
```

### 5. Environment Variables

**What to Collect:**

Environment variables often contain sensitive data. Collect with extreme caution.

**Approach:**

1. **Be Specific**: Only check specific, relevant variables
2. **Avoid Wildcards**: Never run `env` or `printenv` without filtering
3. **Sanitize**: Redact values that might be sensitive
4. **Public Variables Only**: Prefer checking well-known, non-sensitive variables

**Safe Environment Variables:**

```bash
# Locale and encoding
echo $LANG
echo $LC_ALL

# Shell
echo $SHELL

# Path (usually safe, but may reveal usernames)
echo $PATH

# Node environment
echo $NODE_ENV

# Python path
echo $PYTHONPATH
```

**Potentially Sensitive Variables:**

```bash
# Requires permission or sanitization
# - API keys: $API_KEY, $SECRET_KEY, $TOKEN
# - Database URLs: $DATABASE_URL
# - Credentials: $USERNAME, $PASSWORD
# - Custom app settings: $APP_* variables
```

**Collection Command (filtered):**

```bash
# Safe: Check specific variable
echo $LANG

# Risky: List all variables (AVOID unless necessary)
env
printenv

# Better: Filter for specific patterns
env | grep -i "^PYTHON"
env | grep -i "^NODE"
```

### 6. Network and Connectivity

**What to Collect:**
- Network connectivity status
- DNS resolution (for external services)
- Proxy settings
- Firewall status (general)

**Collection Commands:**

```bash
# Test connectivity
ping -c 4 google.com  # Linux/Mac
ping -n 4 google.com  # Windows

# DNS resolution
nslookup example.com
dig example.com  # Linux/Mac

# Proxy settings (may contain credentials - sanitize)
echo $HTTP_PROXY
echo $HTTPS_PROXY

# Network interfaces (general info)
ifconfig  # Linux/Mac
ipconfig  # Windows

# Firewall status (general)
sudo ufw status  # Linux (Ubuntu)
netsh advfirewall show allprofiles  # Windows (requires admin)
```

**Privacy Notes:**
- Internal IP addresses are generally low-risk
- Proxy settings may contain authentication credentials (sanitize)
- Network topology might be sensitive in enterprise environments

### 7. File System and Permissions

**What to Collect:**
- File existence and permissions (for files mentioned in error)
- Directory structure (limited)
- Disk space (if relevant)

**Collection Commands:**

```bash
# File info (request permission if non-system file)
ls -la /path/to/file  # Linux/Mac
dir "C:\path\to\file"  # Windows

# Permissions
stat /path/to/file  # Linux/Mac

# Disk space
df -h  # Linux/Mac
wmic logicaldisk get size,freespace,caption  # Windows

# Check if file exists
test -f /path/to/file && echo "exists" || echo "not found"
```

**Privacy Notes:**
- File paths may reveal usernames or project structure (request permission)
- Avoid listing directory contents unless necessary

## Collection Workflow

### Step 1: Assess Relevance

Before collecting any information, ask:
- Is this directly related to the error?
- Will this information help diagnose or resolve the issue?
- Is there a less invasive way to get the same information?
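
For environment variables, the last question in Step 1 (a less invasive way to get the same information) can often be answered by checking whether a variable is set without ever printing its value. The following is a minimal sketch, assuming bash (it relies on indirect expansion, `${!name}`); the helper name `var_status` is hypothetical and not part of the original guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original guide): report whether an
# environment variable is set without revealing its value.
var_status() {
  local name="$1"
  if [ -z "${!name+x}" ]; then
    # ${!name+x} expands to nothing only when the named variable is unset
    echo "$name: unset"
  elif [ -z "${!name}" ]; then
    echo "$name: set (empty)"
  else
    echo "$name: set (value hidden)"
  fi
}

# Example: confirm the variable exists before asking to see anything more.
var_status DATABASE_URL
```

Knowing that a variable is missing or empty is often enough to explain a configuration error, so the value itself may never need to be requested.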

### Step 2: Categorize Sensitivity

Classify the information:
- **Public**: Widely available, non-sensitive (e.g., OS version)
- **Private**: User-specific but non-confidential (e.g., package versions)
- **Confidential**: May contain sensitive data (e.g., config files)
- **Secret**: Credentials, keys, PII (NEVER collect without explicit permission)

### Step 3: Request Permission When Needed

For private or confidential information:

```
"To diagnose this error, I need to check [specific information].
This will involve [specific action].
Is it okay to proceed?"
```

Example:

```
"To diagnose this database connection error, I need to check your database configuration settings.
This will involve reading the config/database.yml file.
Any sensitive values will be redacted.
Is it okay to proceed?"
```

### Step 4: Collect and Sanitize

Execute the collection command and immediately sanitize:

1. **Capture output**
2. **Review for sensitive data**
3. **Redact or replace sensitive values**
4. **Document what was redacted**

### Step 5: Document Collection

Record what was collected and why:
- What information was gathered
- Why it was needed
- What commands were used
- What was sanitized

## Sanitization Techniques

### Pattern-Based Redaction

Common patterns to redact:

```bash
# API keys (various formats)
AIza[0-9A-Za-z_-]{35}         # Google API keys
sk_live_[0-9a-zA-Z]{24}       # Stripe keys
[0-9a-f]{32}                  # Generic 32-char hex keys

# Email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# URLs with credentials
https?://[^:]+:[^@]+@[^/]+

# IP addresses (if needed)
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# File paths with usernames
/home/[^/]+/ -> /home/[USERNAME]/
C:\\Users\\[^\\]+\\ -> C:\Users\[USERNAME]\
```

### Replacement Strategies

```bash
# Replace with generic placeholder
password: super_secret_123 → password: [REDACTED]

# Replace with type indicator
api_key: sk_live_abc123xyz → api_key: [API_KEY]

# Partial redaction
email: john.doe@example.com → email: [***]@example.com

# Anonymize paths
/home/john/project → /home/[USER]/project
```

## Quick Reference: Collection Decision Tree

```
Need environment information?
    ↓
Is it sensitive or user-specific?
    ├─ NO → Collect directly
    │        (e.g., OS version, Python version)
    │
    └─ YES → Does it contain credentials or PII?
              ├─ YES → Request explicit permission
              │         ↓
              │    Permission granted?
              │    ├─ YES → Collect and sanitize
              │    └─ NO → Find alternative approach
              │
              └─ NO → Is it project-specific?
                       ├─ YES → Request permission
                       └─ NO → Collect and sanitize proactively
```

## Common Scenarios

### Scenario 1: Module Not Found Error

**Information Needed:**
- Python version
- pip version
- Virtual environment status
- Package installation status

**Collection:**

```bash
python --version
pip --version
echo $VIRTUAL_ENV
pip show <package-name>
```

**Privacy Impact:** Low (all public information)

### Scenario 2: Database Connection Error

**Information Needed:**
- Database client version
- Connection configuration (sanitized)
- Network connectivity

**Collection:**

```bash
# Client version (safe)
psql --version  # PostgreSQL
mysql --version  # MySQL

# Configuration (REQUIRES PERMISSION)
# Request permission, then read config with sanitization

# Connectivity (safe)
ping -c 4 database.host.com
nslookup database.host.com
```

**Privacy Impact:** Medium-High (config contains credentials)

### Scenario 3: Build Failure

**Information Needed:**
- Compiler/build tool version
- System libraries
- Build configuration

**Collection:**

```bash
# Build tools (safe)
gcc --version
make --version
cmake --version

# Package manager (safe)
apt list --installed | grep <package-name>  # Debian/Ubuntu
brew info <package-name>  # macOS

# Build config (request permission for project-specific)
cat CMakeLists.txt
cat Makefile
```

**Privacy Impact:** Low-Medium (build config might reveal project details)

## Best Practices Summary

1. **Collect Minimally**: Only gather what's directly relevant
2. **Request Permission**: When information is user-specific or potentially sensitive
3. **Sanitize Proactively**: Remove credentials and PII before recording
4. **Document Purpose**: Explain why information is needed
5. **Validate Necessity**: Double-check if collection is truly required
6. **Use Specific Commands**: Avoid broad commands like `env` or `find /`
7. **Respect User Privacy**: When uncertain, err on the side of asking permission
8. **Provide Context**: Help users understand what information will be accessed

## Red Flags: Never Collect

- Raw credential files (.env, credentials.json)
- Browser cookies or session storage
- SSH keys or SSL certificates
- Database dumps
- Full process listings (might expose arguments with credentials)
- Complete environment variable dumps
- User home directory listings
- Git repository contents (without permission)
- Application logs (without permission and sanitization)
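
## Example: A Sanitization Filter

The redaction patterns and replacement strategies described under Sanitization Techniques can be combined into a single filter that command output is piped through before it is recorded. The following is a minimal sketch, assuming a `sed` that supports `-E` (GNU and BSD both do); the function name `sanitize` and the list of key names in the last rule are hypothetical choices for illustration. The patterns are deliberately simple and will miss many credential formats, so output should still be reviewed by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: redact common sensitive patterns from text on stdin.
# Rule order matters: URL credentials are handled before email addresses so
# that "user:pass@host" is not mistaken for an email.
sanitize() {
  sed -E \
    -e 's#(https?://)[^:/@[:space:]]+:[^@/[:space:]]+@#\1[REDACTED]@#g' \
    -e 's#[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})#[***]@\1#g' \
    -e 's#sk_live_[0-9A-Za-z]{24,}#[API_KEY]#g' \
    -e 's#/home/[^/[:space:]]+/#/home/[USER]/#g' \
    -e 's#/Users/[^/[:space:]]+/#/Users/[USER]/#g' \
    -e 's#^([[:space:]]*(password|passwd|secret|token)[[:space:]]*[:=][[:space:]]*).*#\1[REDACTED]#'
}

# Usage: pipe any collected output through the filter before recording it.
printf '%s\n' 'password: super_secret_123' | sanitize
printf '%s\n' '/home/john/project' | sanitize
```

Keeping the rules in one function makes it easier to document what was redacted (Step 4) and to extend the list when a new pattern turns up.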