Files

Zhongwei Li e732da8316 Initial commit

2025-11-29 18:47:55 +08:00

13 KiB

Raw Permalink Blame History

Environment Information Collection Guide

This guide provides systematic approaches for gathering environment information during error troubleshooting while maintaining strict privacy and security standards.

Core Privacy Principles

Never Collect Without Authorization

Absolutely Forbidden (never collect these):

Passwords or password hashes
API keys, tokens, or credentials
Private keys or certificates
Personal identifiable information (PII): names, emails, addresses, phone numbers
Financial information
Session cookies
Authentication headers
Database connection strings with credentials

Requires Explicit User Permission:

Project-specific file paths
Custom configuration files
Custom environment variables
Application logs (may contain sensitive data)
Network configurations
User-specific system settings

Generally Safe (collect without permission):

Public software versions
Operating system type and version
Public package versions
Standard error messages
Command outputs that don't reveal sensitive paths or data

Privacy-First Collection Strategy

Assess Necessity: Only collect information directly relevant to the error
Request Permission: When in doubt, ask the user first
Sanitize Output: Remove sensitive data before recording
Minimize Scope: Collect the smallest amount needed
Explain Purpose: Tell the user why specific information is needed

Environment Information Categories

1. System Environment

What to Collect:

Operating system and version
System architecture (x86, ARM, etc.)
Shell/terminal type
Locale and encoding settings

Collection Commands:

# Cross-platform OS detection
# Linux/Mac
uname -a
uname -s  # Just OS name
uname -r  # Just kernel version

# Windows
ver
systeminfo | findstr /B /C:"OS Name" /C:"OS Version"

# Architecture
uname -m  # Linux/Mac
echo %PROCESSOR_ARCHITECTURE%  # Windows

# Locale
locale  # Linux/Mac
echo %LANG%  # Unix-like
chcp  # Windows (code page)

Privacy Notes:

These commands generally don't expose sensitive information
Hostname may be included in uname -a output (consider sanitizing)

2. Language Runtime Environment

What to Collect:

Programming language version
Runtime environment details
Virtual environment status

Collection Commands by Language:

Python

# Version
python --version
python3 --version

# Detailed info
python -c "import sys; print(sys.version)"

# Virtual environment detection
echo $VIRTUAL_ENV  # Unix-like
echo %VIRTUAL_ENV%  # Windows

# Python path
python -c "import sys; print(sys.executable)"

Node.js/JavaScript

# Versions
node --version
npm --version
yarn --version

# Node environment
echo $NODE_ENV

# Global package location
npm config get prefix

Java

# Version
java -version
javac -version

# Runtime details
java -XshowSettings:properties -version 2>&1 | grep 'java.version'

Ruby

# Version
ruby --version

# Gem environment
gem env

Go

# Version
go version

# Environment
go env

Privacy Notes:

Installation paths may reveal usernames (sanitize if needed)
Custom environment variables may contain sensitive data

3. Package Dependencies

What to Collect:

Installed package versions (relevant to error)
Package manager version
Dependency conflicts
Lock file status

Collection Commands:

Python (pip)

# Specific package version
pip show <package-name>
pip list | grep <package-name>

# All packages (use sparingly, large output)
pip list

# Dependency conflicts
pip check

# Requirements file
cat requirements.txt  # Request permission first

Python (conda)

# Environment info
conda info

# Installed packages
conda list <package-name>

# Environment exports
conda env export  # Use with caution, may be large

Node.js (npm)

# Specific package version
npm list <package-name>
npm view <package-name> version

# Global packages
npm list -g --depth=0

# Dependency audit
npm doctor
npm audit

# Lock file status
ls -l package-lock.json

Ruby (gem)

# Specific gem version
gem list <gem-name>

# All gems
gem list

# Gem environment
gem env

Privacy Notes:

Package lists can be large; collect only relevant packages when possible
Lock files may contain private registry URLs (review before sharing)
Package names might reveal business logic (request permission for private packages)

4. Configuration Files

What to Collect: Configuration files often contain sensitive data. Always exercise caution.

Approach:

Identify relevant config: Only collect configs directly related to the error
Request permission: Always ask before reading project-specific configs
Sanitize: Remove credentials, API keys, and sensitive values before recording
Provide context: Explain why the config is needed

Common Configuration Files:

# Python
# - setup.py, setup.cfg, pyproject.toml, tox.ini

# Node.js
# - package.json (usually safe), .npmrc (check for tokens)

# General
# - .env files (NEVER share without sanitization)
# - config.json, config.yaml (sanitize before sharing)

Sanitization Example:

# Before sanitization
database:
  host: db.example.com
  username: admin
  password: super_secret_123
  port: 5432

# After sanitization
database:
  host: [REDACTED]
  username: [REDACTED]
  password: [REDACTED]
  port: 5432

5. Environment Variables

What to Collect: Environment variables often contain sensitive data. Collect with extreme caution.

Approach:

Be Specific: Only check specific, relevant variables
Avoid Wildcards: Never do env or printenv without filtering
Sanitize: Redact values that might be sensitive
Public Variables Only: Prefer checking well-known, non-sensitive variables

Safe Environment Variables:

# Locale and encoding
echo $LANG
echo $LC_ALL

# Shell
echo $SHELL

# Path (usually safe, but may reveal usernames)
echo $PATH

# Node environment
echo $NODE_ENV

# Python path
echo $PYTHONPATH

Potentially Sensitive Variables:

# Requires permission or sanitization
# - API keys: $API_KEY, $SECRET_KEY, $TOKEN
# - Database URLs: $DATABASE_URL
# - Credentials: $USERNAME, $PASSWORD
# - Custom app settings: $APP_* variables

Collection Command (filtered):

# Safe: Check specific variable
echo $LANG

# Risky: List all variables (AVOID unless necessary)
env
printenv

# Better: Filter for specific patterns
env | grep -i "^PYTHON"
env | grep -i "^NODE"

6. Network and Connectivity

What to Collect:

Network connectivity status
DNS resolution (for external services)
Proxy settings
Firewall status (general)

Collection Commands:

# Test connectivity
ping -c 4 google.com  # Linux/Mac
ping -n 4 google.com  # Windows

# DNS resolution
nslookup example.com
dig example.com  # Linux/Mac

# Proxy settings (may contain credentials - sanitize)
echo $HTTP_PROXY
echo $HTTPS_PROXY

# Network interfaces (general info)
ifconfig  # Linux/Mac
ipconfig  # Windows

# Firewall status (general)
sudo ufw status  # Linux (Ubuntu)
netsh advfirewall show allprofiles  # Windows (requires admin)

Privacy Notes:

Internal IP addresses are generally low-risk
Proxy settings may contain authentication credentials (sanitize)
Network topology might be sensitive in enterprise environments

7. File System and Permissions

What to Collect:

File existence and permissions (for files mentioned in error)
Directory structure (limited)
Disk space (if relevant)

Collection Commands:

# File info (request permission if non-system file)
ls -la /path/to/file  # Linux/Mac
dir /path/to/file  # Windows

# Permissions
stat /path/to/file  # Linux/Mac

# Disk space
df -h  # Linux/Mac
wmic logicaldisk get size,freespace,caption  # Windows

# Check if file exists
test -f /path/to/file && echo "exists" || echo "not found"

Privacy Notes:

File paths may reveal usernames or project structure (request permission)
Avoid listing directory contents unless necessary

Collection Workflow

Step 1: Assess Relevance

Before collecting any information, ask:

Is this directly related to the error?
Will this information help diagnose or resolve the issue?
Is there a less invasive way to get the same information?

Step 2: Categorize Sensitivity

Classify the information:

Public: Widely available, non-sensitive (e.g., OS version)
Private: User-specific but non-confidential (e.g., package versions)
Confidential: May contain sensitive data (e.g., config files)
Secret: Credentials, keys, PII (NEVER collect without explicit permission)

Step 3: Request Permission When Needed

For private or confidential information:

"To diagnose this error, I need to check [specific information].
This will involve [specific action].
Is it okay to proceed?"

Example:

"To diagnose this database connection error, I need to check your database
configuration settings. This will involve reading the config/database.yml file.
Any sensitive values will be redacted. Is it okay to proceed?"

Step 4: Collect and Sanitize

Execute the collection command and immediately sanitize:

Capture output
Review for sensitive data
Redact or replace sensitive values
Document what was redacted

Step 5: Document Collection

Record what was collected and why:

What information was gathered
Why it was needed
What commands were used
What was sanitized

Sanitization Techniques

Pattern-Based Redaction

Common patterns to redact:

# API keys (various formats)
AIza[0-9A-Za-z-_]{35}  # Google API keys
sk_live_[0-9a-zA-Z]{24}  # Stripe keys
[0-9a-f]{32}  # Generic 32-char hex keys

# Email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# URLs with credentials
https?://[^:]+:[^@]+@[^/]+

# IP addresses (if needed)
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# File paths with usernames
/home/[^/]+/  -> /home/[USERNAME]/
C:\\Users\\[^\\]+\\  -> C:\Users\[USERNAME]\

Replacement Strategies

# Replace with generic placeholder
password: super_secret_123  →  password: [REDACTED]

# Replace with type indicator
api_key: sk_live_abc123xyz  →  api_key: [API_KEY]

# Partial redaction
email: john.doe@example.com  →  email: [***]@example.com

# Anonymize paths
/home/john/project  →  /home/[USER]/project

Quick Reference: Collection Decision Tree

Need environment information?
    ↓
Is it sensitive or user-specific?
    ├─ NO → Collect directly
    │        (e.g., OS version, Python version)
    │
    └─ YES → Does it contain credentials or PII?
               ├─ YES → Request explicit permission
               │         ↓
               │      Permission granted?
               │         ├─ YES → Collect and sanitize
               │         └─ NO → Find alternative approach
               │
               └─ NO → Is it project-specific?
                        ├─ YES → Request permission
                        └─ NO → Collect and sanitize proactively

Common Scenarios

Scenario 1: Module Not Found Error

Information Needed:

Python version
pip version
Virtual environment status
Package installation status

Collection:

python --version
pip --version
echo $VIRTUAL_ENV
pip show <package-name>

Privacy Impact: Low (all public information)

Scenario 2: Database Connection Error

Information Needed:

Database client version
Connection configuration (sanitized)
Network connectivity

Collection:

# Client version (safe)
psql --version  # PostgreSQL
mysql --version  # MySQL

# Configuration (REQUIRES PERMISSION)
# Request permission, then read config with sanitization

# Connectivity (safe)
ping -c 4 database.host.com
nslookup database.host.com

Privacy Impact: Medium-High (config contains credentials)

Scenario 3: Build Failure

Information Needed:

Compiler/build tool version
System libraries
Build configuration

Collection:

# Build tools (safe)
gcc --version
make --version
cmake --version

# Package manager (safe)
apt list --installed | grep <lib-name>  # Debian/Ubuntu
brew info <lib-name>  # macOS

# Build config (request permission for project-specific)
cat CMakeLists.txt
cat Makefile

Privacy Impact: Low-Medium (build config might reveal project details)

Best Practices Summary

Collect Minimally: Only gather what's directly relevant
Request Permission: When information is user-specific or potentially sensitive
Sanitize Proactively: Remove credentials and PII before recording
Document Purpose: Explain why information is needed
Validate Necessity: Double-check if collection is truly required
Use Specific Commands: Avoid broad commands like env or find /
Respect User Privacy: When uncertain, err on the side of asking permission
Provide Context: Help users understand what information will be accessed

Red Flags: Never Collect

Raw credential files (.env, credentials.json)
Browser cookies or session storage
SSH keys or SSL certificates
Database dumps
Full process listings (might expose arguments with credentials)
Complete environment variable dumps
User home directory listings
Git repository contents (without permission)
Application logs (without permission and sanitization)

13 KiB Raw Permalink Blame History

Environment Information Collection Guide

Core Privacy Principles

Never Collect Without Authorization

Privacy-First Collection Strategy

Environment Information Categories

1. System Environment

2. Language Runtime Environment

Python

Node.js/JavaScript

Java

Ruby

Go

3. Package Dependencies

Python (pip)

Python (conda)

Node.js (npm)

Ruby (gem)

4. Configuration Files

5. Environment Variables

6. Network and Connectivity

7. File System and Permissions

Collection Workflow

Step 1: Assess Relevance

Step 2: Categorize Sensitivity

Step 3: Request Permission When Needed

Step 4: Collect and Sanitize

Step 5: Document Collection

Sanitization Techniques

Pattern-Based Redaction

Replacement Strategies

Quick Reference: Collection Decision Tree

Common Scenarios

Scenario 1: Module Not Found Error

Scenario 2: Database Connection Error

Scenario 3: Build Failure

Best Practices Summary

Red Flags: Never Collect

13 KiB

Raw Permalink Blame History