2025-11-29 18:47:55 +08:00

Environment Information Collection Guide

This guide provides systematic approaches for gathering environment information during error troubleshooting while maintaining strict privacy and security standards.

Core Privacy Principles

Never Collect Without Authorization

Absolutely Forbidden (never collect these):

  • Passwords or password hashes
  • API keys, tokens, or credentials
  • Private keys or certificates
  • Personally identifiable information (PII): names, emails, addresses, phone numbers
  • Financial information
  • Session cookies
  • Authentication headers
  • Database connection strings with credentials

Requires Explicit User Permission:

  • Project-specific file paths
  • Custom configuration files
  • Custom environment variables
  • Application logs (may contain sensitive data)
  • Network configurations
  • User-specific system settings

Generally Safe (collect without permission):

  • Public software versions
  • Operating system type and version
  • Public package versions
  • Standard error messages
  • Command outputs that don't reveal sensitive paths or data

Privacy-First Collection Strategy

  1. Assess Necessity: Only collect information directly relevant to the error
  2. Request Permission: When in doubt, ask the user first
  3. Sanitize Output: Remove sensitive data before recording
  4. Minimize Scope: Collect the smallest amount needed
  5. Explain Purpose: Tell the user why specific information is needed

Environment Information Categories

1. System Environment

What to Collect:

  • Operating system and version
  • System architecture (x86, ARM, etc.)
  • Shell/terminal type
  • Locale and encoding settings

Collection Commands:

# Cross-platform OS detection
# Linux/Mac
uname -a
uname -s  # Just OS name
uname -r  # Just kernel version

# Windows
ver
systeminfo | findstr /B /C:"OS Name" /C:"OS Version"

# Architecture
uname -m  # Linux/Mac
echo %PROCESSOR_ARCHITECTURE%  # Windows

# Locale
locale  # Linux/Mac
echo $LANG  # Linux/Mac
chcp  # Windows (code page)

Privacy Notes:

  • These commands generally don't expose sensitive information
  • Hostname may be included in uname -a output (consider sanitizing)
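
Where a single cross-platform check is preferable to separate shell commands, the same facts can be read from Python's standard library. The sketch below is illustrative only: the function name collect_system_info and the key names are assumptions, not part of this guide. It deliberately avoids platform.node(), which returns the hostname.

```python
import locale
import platform


def collect_system_info() -> dict:
    """Gather basic, generally non-sensitive system facts.

    platform.node() (the hostname) is deliberately not called.
    """
    return {
        "os": platform.system(),             # e.g. "Linux", "Darwin", "Windows"
        "os_release": platform.release(),    # kernel or OS release string
        "architecture": platform.machine(),  # e.g. "x86_64", "arm64"
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```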

2. Language Runtime Environment

What to Collect:

  • Programming language version
  • Runtime environment details
  • Virtual environment status

Collection Commands by Language:

Python

# Version
python --version
python3 --version

# Detailed info
python -c "import sys; print(sys.version)"

# Virtual environment detection
echo $VIRTUAL_ENV  # Unix-like
echo %VIRTUAL_ENV%  # Windows

# Python path
python -c "import sys; print(sys.executable)"

Node.js/JavaScript

# Versions
node --version
npm --version
yarn --version

# Node environment
echo $NODE_ENV

# Global package location
npm config get prefix

Java

# Version
java -version
javac -version

# Runtime details
java -XshowSettings:properties -version 2>&1 | grep 'java.version'

Ruby

# Version
ruby --version

# Gem environment
gem env

Go

# Version
go version

# Environment
go env

Privacy Notes:

  • Installation paths may reveal usernames (sanitize if needed)
  • Custom environment variables may contain sensitive data
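
As a hedged illustration of the first note, the sketch below reports the Python version and virtual environment status while replacing a leading home-directory prefix in the interpreter path with "~". The helper names mask_home and collect_python_runtime are hypothetical, and the venv check assumes a venv-style environment where sys.prefix differs from sys.base_prefix.

```python
import os
import sys


def mask_home(path: str, home: str) -> str:
    """Replace a leading home-directory prefix with "~"."""
    home = home.rstrip("/\\")
    if home and (path == home or path.startswith((home + "/", home + "\\"))):
        return "~" + path[len(home):]
    return path


def collect_python_runtime() -> dict:
    """Report runtime facts without exposing the user's home directory."""
    return {
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # True inside venv-style environments (assumption noted above)
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "executable": mask_home(sys.executable or "", os.path.expanduser("~")),
    }
```

For example, mask_home("/home/john/.venv/bin/python", "/home/john") returns "~/.venv/bin/python", while a system path such as "/usr/bin/python3" is returned unchanged.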

3. Package Dependencies

What to Collect:

  • Installed package versions (relevant to error)
  • Package manager version
  • Dependency conflicts
  • Lock file status

Collection Commands:

Python (pip)

# Specific package version
pip show <package-name>
pip list | grep <package-name>

# All packages (use sparingly, large output)
pip list

# Dependency conflicts
pip check

# Requirements file
cat requirements.txt  # Request permission first

Python (conda)

# Environment info
conda info

# Installed packages
conda list <package-name>

# Environment exports
conda env export  # Use with caution: may be large and includes the environment prefix path

Node.js (npm)

# Installed version in the current project
npm list <package-name>
# Latest version published on the registry (not the installed one)
npm view <package-name> version

# Global packages
npm list -g --depth=0

# Environment health check (npm, node, registry access, permissions)
npm doctor
# Known vulnerabilities in dependencies
npm audit

# Lock file status
ls -l package-lock.json

Ruby (gem)

# Specific gem version
gem list <gem-name>

# All gems
gem list

# Gem environment
gem env

Privacy Notes:

  • Package lists can be large; collect only relevant packages when possible
  • Lock files may contain private registry URLs (review before sharing)
  • Package names might reveal business logic (request permission for private packages)
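
To keep output limited to the packages relevant to an error, versions can also be queried by name from inside Python instead of listing everything. This is a minimal sketch assuming Python 3.8 or later; the function name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Return {name: version or None} for only the requested distributions."""
    result = {}
    for name in package_names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result
```

Calling installed_versions(["requests", "numpy"]) would report just those two entries, with None for any that are missing.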

4. Configuration Files

What to Collect: Configuration files often contain sensitive data. Always exercise caution.

Approach:

  1. Identify relevant config: Only collect configs directly related to the error
  2. Request permission: Always ask before reading project-specific configs
  3. Sanitize: Remove credentials, API keys, and sensitive values before recording
  4. Provide context: Explain why the config is needed

Common Configuration Files:

# Python
# - setup.py, setup.cfg, pyproject.toml, tox.ini

# Node.js
# - package.json (usually safe), .npmrc (check for tokens)

# General
# - .env files (NEVER share without sanitization)
# - config.json, config.yaml (sanitize before sharing)

Sanitization Example:

# Before sanitization
database:
  host: db.example.com
  username: admin
  password: super_secret_123
  port: 5432

# After sanitization
database:
  host: [REDACTED]
  username: [REDACTED]
  password: [REDACTED]
  port: 5432
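
The redaction shown above can be sketched as a small recursive function. It assumes the configuration has already been parsed into Python dicts and lists; the set of sensitive key names is an assumption chosen to match the example, and redact_config is a hypothetical helper name.

```python
# Key names treated as sensitive (an assumption; extend for your config)
SENSITIVE_KEYS = {"host", "username", "password"}


def redact_config(value, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of a parsed config with sensitive values replaced."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]"
            if str(key).lower() in sensitive_keys
            else redact_config(item, sensitive_keys)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_config(item, sensitive_keys) for item in value]
    return value  # scalars under non-sensitive keys pass through unchanged
```

Redacting by key name rather than by value means a secret is hidden even when its format is not recognizable.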

5. Environment Variables

What to Collect: Environment variables often contain sensitive data. Collect with extreme caution.

Approach:

  1. Be Specific: Only check specific, relevant variables
  2. Avoid Wildcards: Never do env or printenv without filtering
  3. Sanitize: Redact values that might be sensitive
  4. Public Variables Only: Prefer checking well-known, non-sensitive variables

Safe Environment Variables:

# Locale and encoding
echo $LANG
echo $LC_ALL

# Shell
echo $SHELL

# Path (usually safe, but may reveal usernames)
echo $PATH

# Node environment
echo $NODE_ENV

# Python path
echo $PYTHONPATH

Potentially Sensitive Variables:

# Requires permission or sanitization
# - API keys: $API_KEY, $SECRET_KEY, $TOKEN
# - Database URLs: $DATABASE_URL
# - Credentials: $USERNAME, $PASSWORD
# - Custom app settings: $APP_* variables

Collection Command (filtered):

# Safe: Check specific variable
echo $LANG

# Risky: List all variables (AVOID unless necessary)
env
printenv

# Better: Filter for specific patterns
env | grep -i "^PYTHON"
env | grep -i "^NODE"
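
An allowlist is a stricter alternative to pattern filtering, because anything not named explicitly is never read. The sketch below is one possible shape; the allowlist contents and the function name collect_safe_env are assumptions.

```python
import os

# Variables considered safe to report (an assumption; adjust as needed)
SAFE_VARIABLES = ("LANG", "LC_ALL", "SHELL", "NODE_ENV")


def collect_safe_env(environ=None, allowed=SAFE_VARIABLES):
    """Return only allowlisted variables that are actually set."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in allowed if name in environ}
```

With this approach a variable such as API_KEY is skipped even if it is set, since it is not on the list.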

6. Network and Connectivity

What to Collect:

  • Network connectivity status
  • DNS resolution (for external services)
  • Proxy settings
  • Firewall status (general)

Collection Commands:

# Test connectivity
ping -c 4 google.com  # Linux/Mac
ping -n 4 google.com  # Windows

# DNS resolution
nslookup example.com
dig example.com  # Linux/Mac

# Proxy settings (may contain credentials - sanitize)
echo $HTTP_PROXY
echo $HTTPS_PROXY

# Network interfaces (general info)
ip addr  # Linux (ifconfig may not be installed on newer distributions)
ifconfig  # Mac, older Linux
ipconfig  # Windows

# Firewall status (general)
sudo ufw status  # Linux (Ubuntu)
netsh advfirewall show allprofiles  # Windows (requires admin)

Privacy Notes:

  • Internal IP addresses are generally low-risk
  • Proxy settings may contain authentication credentials (sanitize)
  • Network topology might be sensitive in enterprise environments
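
Proxy URLs often take the form http://user:password@host:port. A minimal sketch of stripping the credentials before recording the value, using only the standard library, is shown below; the function name strip_url_credentials is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_url_credentials(url: str) -> str:
    """Replace any user:password part of a URL with [REDACTED]."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing to redact
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal needs its brackets restored
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"[REDACTED]@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, "http://user:pass@proxy.example.com:8080" becomes "http://[REDACTED]@proxy.example.com:8080".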

7. File System and Permissions

What to Collect:

  • File existence and permissions (for files mentioned in error)
  • Directory structure (limited)
  • Disk space (if relevant)

Collection Commands:

# File info (request permission if non-system file)
ls -la /path/to/file  # Linux/Mac
dir C:\path\to\file  # Windows

# Permissions
stat /path/to/file  # Linux/Mac

# Disk space
df -h  # Linux/Mac
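wmic logicaldisk get size,freespace,caption  # Windows (wmic is deprecated on recent releases)
Get-CimInstance Win32_LogicalDisk | Select-Object DeviceID,Size,FreeSpace  # Windows PowerShell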

# Check if file exists
test -f /path/to/file && echo "exists" || echo "not found"

Privacy Notes:

  • File paths may reveal usernames or project structure (request permission)
  • Avoid listing directory contents unless necessary
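
When only existence and permissions matter, a single file can be inspected without listing its directory or reading its contents. The following is a sketch with a hypothetical helper name, describe_file; it returns metadata only.

```python
import os
import stat


def describe_file(path: str) -> dict:
    """Report existence, type, mode, and size for one path (no contents)."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return {"exists": False}
    return {
        "exists": True,
        "is_file": stat.S_ISREG(info.st_mode),
        "mode": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
        "size_bytes": info.st_size,
    }
```

The path itself is not included in the result, so the caller decides whether and how to record it.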

Collection Workflow

Step 1: Assess Relevance

Before collecting any information, ask:

  • Is this directly related to the error?
  • Will this information help diagnose or resolve the issue?
  • Is there a less invasive way to get the same information?

Step 2: Categorize Sensitivity

Classify the information:

  • Public: Widely available, non-sensitive (e.g., OS version)
  • Private: User-specific but non-confidential (e.g., package versions)
  • Confidential: May contain sensitive data (e.g., config files)
  • Secret: Credentials, keys, PII (NEVER collect without explicit permission)

Step 3: Request Permission When Needed

For private or confidential information:

"To diagnose this error, I need to check [specific information].
This will involve [specific action].
Is it okay to proceed?"

Example:

"To diagnose this database connection error, I need to check your database
configuration settings. This will involve reading the config/database.yml file.
Any sensitive values will be redacted. Is it okay to proceed?"

Step 4: Collect and Sanitize

Execute the collection command and immediately sanitize:

  1. Capture output
  2. Review for sensitive data
  3. Redact or replace sensitive values
  4. Document what was redacted

Step 5: Document Collection

Record what was collected and why:

  • What information was gathered
  • Why it was needed
  • What commands were used
  • What was sanitized
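
One way to keep this record consistent is a small data structure with one field per item above. The class below is a hypothetical sketch; its name and field names are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionRecord:
    """Hypothetical record of one collection step."""

    what: str                                      # information gathered
    why: str                                       # reason it was needed
    commands: list = field(default_factory=list)   # commands that were run
    sanitized: list = field(default_factory=list)  # what was redacted

    def summary(self) -> str:
        lines = [
            f"Collected: {self.what}",
            f"Reason: {self.why}",
            "Commands: " + (", ".join(self.commands) or "none"),
            "Sanitized: " + (", ".join(self.sanitized) or "nothing"),
        ]
        return "\n".join(lines)
```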

Sanitization Techniques

Pattern-Based Redaction

Common patterns to redact:

# API keys (various formats)
AIza[0-9A-Za-z_-]{35}  # Google API keys
sk_live_[0-9a-zA-Z]{24}  # Stripe keys
[0-9a-f]{32}  # Generic 32-char hex keys

# Email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# URLs with credentials
https?://[^:]+:[^@]+@[^/]+

# IP addresses (if needed)
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# File paths with usernames
/home/[^/]+/  -> /home/[USERNAME]/
C:\\Users\\[^\\]+\\  -> C:\Users\[USERNAME]\
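
A minimal sketch of applying several of these patterns in order is shown below. The rule list, placeholder names, and the redact function are assumptions for illustration. URL credentials are handled first so that the password part is not later mistaken for an email address. The generic hex and IP patterns are left out here because they also match harmless values such as checksums and version strings.

```python
import re

# (pattern, replacement) pairs, applied in order
REDACTION_RULES = [
    (re.compile(r"(https?://)[^:@/\s]+:[^@/\s]+@"), r"\1[REDACTED]@"),
    (re.compile(r"AIza[0-9A-Za-z_-]{35}"), "[API_KEY]"),
    (re.compile(r"sk_live_[0-9a-zA-Z]{24}"), "[API_KEY]"),
    (re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"), "[EMAIL]"),
    (re.compile(r"/home/[^/\s]+/"), "/home/[USERNAME]/"),
    (re.compile(r"([A-Za-z]:\\Users\\)[^\\\s]+\\"), r"\1[USERNAME]\\"),
]


def redact(text: str) -> str:
    """Apply every redaction rule to the text and return the result."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Pattern matching is a safety net, not a guarantee: output should still be reviewed by eye before it is recorded.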

Replacement Strategies

# Replace with generic placeholder
password: super_secret_123  →  password: [REDACTED]

# Replace with type indicator
api_key: sk_live_abc123xyz  →  api_key: [API_KEY]

# Partial redaction
email: john.doe@example.com  →  email: [***]@example.com

# Anonymize paths
/home/john/project  →  /home/[USER]/project
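
Partial redaction can be sketched with a capture group that keeps the part worth preserving. The example below masks the local part of an email address and keeps the domain; the function name mask_email_local_part is hypothetical.

```python
import re

# Group 1 captures the domain so it can be kept in the output
_EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})")


def mask_email_local_part(text: str) -> str:
    """Replace the part before "@" with [***], leaving the domain visible."""
    return _EMAIL.sub(r"[***]@\1", text)
```

Keeping the domain is useful when the provider matters to the diagnosis, but it should be dropped too if the domain itself identifies the user.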

Quick Reference: Collection Decision Tree

Need environment information?
    ↓
Is it sensitive or user-specific?
    ├─ NO → Collect directly
    │        (e.g., OS version, Python version)
    │
    └─ YES → Does it contain credentials or PII?
               ├─ YES → Request explicit permission
               │         ↓
               │      Permission granted?
               │         ├─ YES → Collect and sanitize
               │         └─ NO → Find alternative approach
               │
               └─ NO → Is it project-specific?
                        ├─ YES → Request permission
                        └─ NO → Collect and sanitize proactively
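
The tree above can be expressed as a short function, which makes each branch easy to check. This is an illustrative sketch; the enum, its member names, and the parameter names are assumptions.

```python
from enum import Enum


class Action(Enum):
    COLLECT_DIRECTLY = "collect directly"
    REQUEST_EXPLICIT_PERMISSION = "request explicit permission"
    REQUEST_PERMISSION = "request permission"
    COLLECT_AND_SANITIZE = "collect and sanitize proactively"


def decide(sensitive_or_user_specific: bool,
           has_credentials_or_pii: bool = False,
           project_specific: bool = False) -> Action:
    """Follow the decision tree from top to bottom."""
    if not sensitive_or_user_specific:
        return Action.COLLECT_DIRECTLY
    if has_credentials_or_pii:
        # If permission is refused, find an alternative approach instead
        return Action.REQUEST_EXPLICIT_PERMISSION
    if project_specific:
        return Action.REQUEST_PERMISSION
    return Action.COLLECT_AND_SANITIZE
```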

Common Scenarios

Scenario 1: Module Not Found Error

Information Needed:

  • Python version
  • pip version
  • Virtual environment status
  • Package installation status

Collection:

python --version
pip --version
echo $VIRTUAL_ENV
pip show <package-name>

Privacy Impact: Low (all public information)
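
pip show works with the distribution name, which can differ from the name used in the import statement. As a complementary check, the sketch below asks the running interpreter whether it can locate a module by import name, without importing it. The function name module_available is hypothetical.

```python
import importlib.util


def module_available(import_name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    try:
        return importlib.util.find_spec(import_name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or a malformed name
        return False
```

If this returns False while pip show reports the package as installed, pip and python are likely pointing at different environments.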

Scenario 2: Database Connection Error

Information Needed:

  • Database client version
  • Connection configuration (sanitized)
  • Network connectivity

Collection:

# Client version (safe)
psql --version  # PostgreSQL
mysql --version  # MySQL

# Configuration (REQUIRES PERMISSION)
# Request permission, then read config with sanitization

# Connectivity (safe)
ping -c 4 database.host.com
nslookup database.host.com

Privacy Impact: Medium-High (config contains credentials)

Scenario 3: Build Failure

Information Needed:

  • Compiler/build tool version
  • System libraries
  • Build configuration

Collection:

# Build tools (safe)
gcc --version
make --version
cmake --version

# Package manager (safe)
apt list --installed | grep <lib-name>  # Debian/Ubuntu
brew info <lib-name>  # macOS

# Build config (request permission for project-specific)
cat CMakeLists.txt
cat Makefile

Privacy Impact: Low-Medium (build config might reveal project details)

Best Practices Summary

  1. Collect Minimally: Only gather what's directly relevant
  2. Request Permission: When information is user-specific or potentially sensitive
  3. Sanitize Proactively: Remove credentials and PII before recording
  4. Document Purpose: Explain why information is needed
  5. Validate Necessity: Double-check if collection is truly required
  6. Use Specific Commands: Avoid broad commands like env or find /
  7. Respect User Privacy: When uncertain, err on the side of asking permission
  8. Provide Context: Help users understand what information will be accessed

Red Flags: Never Collect

  • Raw credential files (.env, credentials.json)
  • Browser cookies or session storage
  • SSH keys or SSL certificates
  • Database dumps
  • Full process listings (might expose arguments with credentials)
  • Complete environment variable dumps
  • User home directory listings
  • Git repository contents (without permission)
  • Application logs (without permission and sanitization)