2025-11-30 09:05:19 +08:00

Python Scripting Reference

Detailed patterns and examples for Python automation scripts.

Subprocess Patterns

Two-Stage Subprocess (Avoid Shell Parsing)

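Problem: A pipeline passed as a single string with shell=True is parsed by the shell first, so regex characters such as ( ) and | can be interpreted as shell syntax before the target command ever sees them.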

Don't: shell=True with complex patterns

cmd = 'curl -s "url" | grep -oE "pattern(with|parens)"'
subprocess.run(cmd, shell=True, ...)

Do: Separate calls with input= piping

curl_result = subprocess.run(['curl', '-s', url],
                            capture_output=True, text=True)
grep_result = subprocess.run(['grep', '-oE', pattern],
                            input=curl_result.stdout,
                            capture_output=True, text=True)
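
The two-stage pattern can be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation: `run_pipeline` is a hypothetical name, and the two child Python processes only stand in for curl and grep so the example runs without network access.

```python
import subprocess
import sys

def run_pipeline(first: list[str], second: list[str]) -> str:
    """Run `first`, then feed its stdout to `second` through input=."""
    stage1 = subprocess.run(first, capture_output=True, text=True, check=True)
    # The second stage is not run with check=True: a filter such as grep
    # exits with status 1 when nothing matches, which is not an error here.
    stage2 = subprocess.run(second, input=stage1.stdout,
                            capture_output=True, text=True)
    return stage2.stdout

# Stand-ins for curl and grep (assumption: any producer/consumer pair works)
producer = [sys.executable, "-c", "print('alpha'); print('beta(1|2)')"]
consumer = [sys.executable, "-c",
            "import sys; print(''.join(l for l in sys.stdin if '(' in l), end='')"]

output = run_pipeline(producer, consumer)
print(output, end="")
```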

Why List Arguments Work

  • Python executes the command directly, with no shell in between
  • Each argument is passed to the program as a literal string
  • Special characters such as |(){} are treated as text, not shell operators
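
The points above can be checked directly. In this sketch, `echo_args` is a hypothetical helper that starts a child Python process and reports the arguments it actually received; nothing is expanded or split along the way.

```python
import subprocess
import sys

def echo_args(*args: str) -> list[str]:
    """Return the arguments a child process receives, one per list entry."""
    child_code = "import sys; print('\\n'.join(sys.argv[1:]))"
    result = subprocess.run([sys.executable, "-c", child_code, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

# Characters a shell would treat as operators, variables, or wildcards
received = echo_args("pattern(with|parens)", "$HOME", "*.txt")
print(received)
```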

When shell=True Is Needed

Only use for hard-coded commands that require shell features:

  • * wildcards
  • ~ home directory expansion
  • && operators
  • Environment variable expansion

# Hard-coded command only
subprocess.run('ls *.txt | wc -l', shell=True, ...)
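
Several of these shell features have standard-library counterparts, which can remove the need for shell=True in some scripts. The sketch below is illustrative only: `count_matching` and the `DEMO_VAR` variable are made-up names, and the temporary directory exists just to make the example self-contained.

```python
import glob
import os
import tempfile

def count_matching(directory: str, pattern: str) -> int:
    """Rough equivalent of `ls <pattern> | wc -l` for one directory."""
    return len(glob.glob(os.path.join(directory, pattern)))

with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(tmp_dir, name), "w").close()
    txt_count = count_matching(tmp_dir, "*.txt")  # wildcard expansion

home_notes = os.path.expanduser("~/notes")        # ~ expansion

os.environ["DEMO_VAR"] = "/srv"                   # hypothetical variable
expanded = os.path.expandvars("$DEMO_VAR/data")   # variable expansion

print(txt_count, expanded)
```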

Debugging Subprocess Failures

Workflow

  1. Test command in bash first - Verify it works outside Python
  2. Add debug output:
    result = subprocess.run(cmd, ...)
    print(f"stdout: {result.stdout[:100]}")
    print(f"stderr: {result.stderr}")
    print(f"returncode: {result.returncode}")
    
  3. Check stderr for shell errors - Syntax errors indicate shell parsing issues
  4. Rewrite without shell=True - Use list arguments and two-stage pattern
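
The workflow above can be folded into one helper. This is a sketch under assumptions: `run_debug` is a hypothetical name, and the failing child process is only there to produce something to inspect. `shlex.join` prints the command in shell-quoted form, so it can be pasted into bash for step 1.

```python
import shlex
import subprocess
import sys

def run_debug(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd and print the details needed to diagnose a failure."""
    print(f"command:    {shlex.join(cmd)}", file=sys.stderr)
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"stdout:     {result.stdout[:100]!r}", file=sys.stderr)
    print(f"stderr:     {result.stderr!r}", file=sys.stderr)
    print(f"returncode: {result.returncode}", file=sys.stderr)
    return result

# A child process that fails on purpose: sys.exit('boom') writes the
# message to stderr and exits with status 1.
result = run_debug([sys.executable, "-c", "import sys; sys.exit('boom')"])
```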

Common Errors

Error                                   | Cause                      | Solution
----------------------------------------|----------------------------|--------------------------------
syntax error near unexpected token '('  | Shell parsing regex/parens | Two-stage subprocess
command not found                       | PATH issue or typo         | Check command exists with which
Empty stdout                            | Command construction error | Debug with stderr output
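
The "command not found" check can also be done from Python before the subprocess is started. A minimal sketch, assuming a hypothetical helper named `require_command`; `shutil.which` searches PATH much like the `which` command does.

```python
import shutil

def require_command(name: str) -> str:
    """Return the full path of an executable, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"Command not found in PATH: {name}")
    return path

# A deliberately bogus name to show the failure path
missing = "definitely-not-a-real-command-xyz"
try:
    require_command(missing)
    message = ""
except FileNotFoundError as exc:
    message = str(exc)

print(message)
```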

Debugging Invisible Characters

Problem: Files with invisible characters (backspace, null bytes) cause mysterious errors.

Symptoms:

  • LaTeX: Unicode character ^^H (U+0008) not set up for use with LaTeX
  • Commands fail with "invalid character" but file looks normal

Detection:

# Show all characters including invisible ones
od -c file.txt

# Check specific line range
sed -n '10,20p' file.txt | od -c

# Find backspaces
grep -P '\x08' file.txt

Example output:

0000000    %   %       f   i   l   e   .  \n  \b   \   b   e   g   i
                                            ^^^ backspace character

Fix:

# Remove all backspace characters
tr -d '\b' < corrupted.tex > clean.tex

# Remove all non-printable characters except newlines
# (this also strips tabs and, typically, non-ASCII bytes)
tr -cd '[:print:]\n' < file.txt > clean.txt
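
The same detection and cleanup can be sketched in Python, which is convenient when tabs and non-ASCII text need to survive. The function names are hypothetical, and treating newline, carriage return, and tab as allowed is an assumption that may need adjusting per file type.

```python
import unicodedata

ALLOWED = "\n\r\t"  # control characters considered legitimate here

def find_control_chars(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each unexpected control character."""
    hits = []
    # split("\n") rather than splitlines(), which would also break lines
    # on some of the control characters being searched for
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits

def strip_control_chars(text: str) -> str:
    """Drop control characters but keep the allowed ones and all other text."""
    return "".join(ch for ch in text
                   if ch in ALLOWED or unicodedata.category(ch) != "Cc")

sample = "%% file.\n\x08\\begin{document}\n"
hits = find_control_chars(sample)
cleaned = strip_control_chars(sample)
print(hits)
```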

Prevention: Use proper quoting when generating files (see Bash reference for LaTeX string escaping).

Error Handling

Basic Pattern

import sys
import subprocess

try:
    result = subprocess.run(['command'],
                          capture_output=True,
                          text=True,
                          check=True)  # Raises on non-zero exit
except subprocess.CalledProcessError as e:
    print(f"Error: Command failed with exit code {e.returncode}", file=sys.stderr)
    print(f"stderr: {e.stderr}", file=sys.stderr)
    sys.exit(1)
except FileNotFoundError:
    print("Error: Command not found in PATH", file=sys.stderr)
    sys.exit(1)

File Operations

try:
    with open(file_path, 'r') as f:
        content = f.read()
except FileNotFoundError:
    print(f"Error: File not found: {file_path}", file=sys.stderr)
    sys.exit(1)
except PermissionError:
    print(f"Error: Permission denied: {file_path}", file=sys.stderr)
    sys.exit(1)
except OSError as e:  # IOError is an alias of OSError in Python 3
    print(f"Error reading file: {e}", file=sys.stderr)
    sys.exit(1)

Argparse Patterns

Multi-Mode Scripts

import argparse

parser = argparse.ArgumentParser(description='Script description')
parser.add_argument('input', nargs='?', help='Input file or topic')
parser.add_argument('--url', help='Direct URL mode')
parser.add_argument('--verify', action='store_true', help='Verify output')
args = parser.parse_args()

# Validate combinations
if not args.input and not args.url:
    parser.error("Provide either input or --url")
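
Wrapping the parser in a function that accepts an argument list makes the mode validation easy to exercise without a real command line. This is a sketch; `parse_cli` is a hypothetical name and the sample arguments are invented.

```python
import argparse
from typing import Optional

def parse_cli(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when None) and validate the mode combination."""
    parser = argparse.ArgumentParser(description="Script description")
    parser.add_argument("input", nargs="?", help="Input file or topic")
    parser.add_argument("--url", help="Direct URL mode")
    parser.add_argument("--verify", action="store_true", help="Verify output")
    args = parser.parse_args(argv)
    if not args.input and not args.url:
        # parser.error prints usage and exits with status 2
        parser.error("Provide either input or --url")
    return args

topic_mode = parse_cli(["quantum computing", "--verify"])
url_mode = parse_cli(["--url", "https://example.com/page"])
print(topic_mode.input, url_mode.url)
```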

Common Flag Patterns

parser.add_argument('-v', '--verbose', action='store_true',
                   help='Verbose output')
parser.add_argument('-f', '--force', action='store_true',
                   help='Force operation')
parser.add_argument('-o', '--output', default='output.txt',
                   help='Output file')
parser.add_argument('--count', type=int, default=5,
                   help='Number of items')
parser.add_argument('--config', type=str,
                   help='Config file path')

Mutually Exclusive Groups

group = parser.add_mutually_exclusive_group()
group.add_argument('--json', action='store_true')
group.add_argument('--yaml', action='store_true')
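
A short sketch of how the group behaves when both flags are given: argparse reports the conflict and exits with status 2. `parse_format` is a hypothetical wrapper used only for the demonstration.

```python
import argparse

def parse_format(argv: list[str]) -> argparse.Namespace:
    """Parse output-format flags; --json and --yaml cannot be combined."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--json", action="store_true")
    group.add_argument("--yaml", action="store_true")
    return parser.parse_args(argv)

json_only = parse_format(["--json"])

try:
    parse_format(["--json", "--yaml"])
    conflict_code = None
except SystemExit as exc:
    # argparse has already printed the error message to stderr
    conflict_code = exc.code

print(json_only.json, conflict_code)
```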

Environment Variables

import os
import sys

# ✅ Never hardcode credentials
API_KEY = os.getenv('API_KEY')
if not API_KEY:
    print("Error: API_KEY environment variable not set", file=sys.stderr)
    sys.exit(1)

# ✅ Provide defaults
LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')
OUTPUT_DIR = os.getenv('OUTPUT_DIR', './output')

# ✅ Type conversion with defaults
MAX_RETRIES = int(os.getenv('MAX_RETRIES', '3'))
TIMEOUT = float(os.getenv('TIMEOUT', '30.0'))
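
A bare `int(os.getenv(...))` raises a generic ValueError when the variable holds something like "abc". One possible refinement, sketched here with a hypothetical helper `env_int` and an invented `DEMO_UNSET` variable, is to name the offending variable in the error.

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to default."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None

os.environ["MAX_RETRIES"] = "5"
os.environ.pop("DEMO_UNSET", None)  # make sure the demo variable is absent

retries = env_int("MAX_RETRIES", 3)
fallback = env_int("DEMO_UNSET", 3)
print(retries, fallback)
```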

File Processing Patterns

Process Files Matching Pattern

import glob
import sys

def process_files(pattern: str) -> list[str]:
    """Find and process files matching pattern."""
    files = glob.glob(pattern, recursive=True)
    results = []

    for file in files:
        try:
            with open(file, 'r') as f:
                content = f.read()
                results.append(process(content))
        except IOError as e:
            print(f"Error reading {file}: {e}", file=sys.stderr)

    return results

Safe File Writing

import os
import tempfile

def safe_write(file_path: str, content: str) -> None:
    """Write to a temp file first, then atomically replace the target."""
    # Write to a temp file in the same directory so the final rename
    # stays on one filesystem
    dir_name = os.path.dirname(file_path) or '.'
    with tempfile.NamedTemporaryFile(mode='w', dir=dir_name,
                                     delete=False) as tmp:
        tmp.write(content)
        tmp_path = tmp.name

    # Atomic replace: os.replace overwrites the target in a single step
    os.replace(tmp_path, file_path)
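
A variant of the same idea, sketched under assumptions: it swaps the file into place with `os.replace` and removes the temporary file if anything fails before the swap. `safe_write_with_cleanup` is a hypothetical name, and whether the `fsync` call is worth its cost depends on the script.

```python
import os
import tempfile

def safe_write_with_cleanup(file_path: str, content: str) -> None:
    """Write via a temp file and leave no stray temp file behind on failure."""
    dir_name = os.path.dirname(file_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, file_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "config.txt")
    safe_write_with_cleanup(target, "first")
    safe_write_with_cleanup(target, "second")  # overwrites in place
    with open(target) as f:
        final_text = f.read()
    leftovers = os.listdir(tmp_dir)

print(final_text, leftovers)
```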

URL Verification

import subprocess

def verify_url(url: str) -> bool:
    """Verify URL is accessible with HTTP HEAD request."""
    result = subprocess.run(['curl', '-I', '-s', url],
                          capture_output=True, text=True)

    if 'HTTP/2 200' in result.stdout or 'HTTP/1.1 200' in result.stdout:
        if 'content-type:' in result.stdout.lower():
            return True
    return False
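
Matching the exact strings 'HTTP/2 200' and 'HTTP/1.1 200' misses other protocol versions and other 2xx codes. One possible refinement is to parse the status line instead. The sketch below works on header text only, so it runs without network access; the function names and the sample headers are made up for illustration.

```python
import re
from typing import Optional

# Matches status lines such as "HTTP/1.1 200 OK" or "HTTP/2 200"
STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})", re.MULTILINE)

def final_status(headers: str) -> Optional[int]:
    """Return the last status code in the header text, or None if absent."""
    codes = STATUS_LINE.findall(headers)
    # When redirects are followed there is one header block per hop,
    # so the last status line describes the final response.
    return int(codes[-1]) if codes else None

def is_success(headers: str) -> bool:
    """True when the final response has a 2xx status code."""
    status = final_status(headers)
    return status is not None and 200 <= status < 300

sample_headers = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://example.com/new\r\n\r\n"
    "HTTP/2 200 \r\n"
    "content-type: text/html\r\n\r\n"
)
print(final_status(sample_headers), is_success(sample_headers))
```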

Automation Script Patterns

Dry-Run Mode

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--force', action='store_true',
                   help='Apply changes (dry-run by default)')
args = parser.parse_args()

dry_run = not args.force

# Use dry_run flag throughout script
for item in items:
    change_description = f"rename {item['old']} → {item['new']}"

    if dry_run:
        print(f"→ Would {change_description}")
    else:
        apply_change(item)
        print(f"✓ Applied: {change_description}")
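
The dry-run flag is easier to test when the loop lives in a function. A self-contained sketch, assuming a hypothetical `rename_all` helper and throwaway files in a temporary directory:

```python
import os
import tempfile

def rename_all(items: list[dict], dry_run: bool = True) -> list[str]:
    """Rename files, or only describe the renames when dry_run is set."""
    messages = []
    for item in items:
        description = f"rename {item['old']} -> {item['new']}"
        if dry_run:
            messages.append(f"Would {description}")
        else:
            os.rename(item["old"], item["new"])
            messages.append(f"Applied: {description}")
    return messages

with tempfile.TemporaryDirectory() as tmp_dir:
    old_path = os.path.join(tmp_dir, "draft.txt")
    new_path = os.path.join(tmp_dir, "final.txt")
    open(old_path, "w").close()
    items = [{"old": old_path, "new": new_path}]

    preview = rename_all(items)                    # nothing changes on disk
    untouched_after_preview = os.path.exists(old_path)

    applied = rename_all(items, dry_run=False)     # performs the rename
    renamed_after_apply = os.path.exists(new_path)

print(preview[0].split()[0], applied[0].split()[0])
```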

Backup-First Pattern

from datetime import datetime
import shutil

def backup_before_modify(config_path: str) -> str:
    """Create timestamped backup before modifications."""
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    backup_path = f"{config_path}.backup.{timestamp}"

    shutil.copy2(config_path, backup_path)
    print(f"✓ Backup created: {backup_path}")

    return backup_path

# Use in operations
if not dry_run:
    backup_before_modify(config_path)
    update_config(config_path)
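
A backup is most useful when it is restored automatically after a failed update. The sketch below extends the pattern in that direction; `modify_with_backup` and `broken_update` are hypothetical names, and the simulated failure exists only to show the restore path.

```python
import os
import shutil
import tempfile
from datetime import datetime
from typing import Callable

def modify_with_backup(config_path: str,
                       update: Callable[[str], None]) -> str:
    """Back up config_path, run update, and restore the backup on failure."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_path = f"{config_path}.backup.{timestamp}"
    shutil.copy2(config_path, backup_path)
    try:
        update(config_path)
    except Exception:
        shutil.copy2(backup_path, config_path)  # roll back
        raise
    return backup_path

def broken_update(path: str) -> None:
    """Simulate an update that corrupts the file and then fails."""
    with open(path, "w") as f:
        f.write("half-written")
    raise RuntimeError("simulated failure")

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "app.conf")
    with open(config_path, "w") as f:
        f.write("key = 1\n")
    try:
        modify_with_backup(config_path, broken_update)
    except RuntimeError:
        pass
    with open(config_path) as f:
        restored = f.read()

print(repr(restored))
```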

Self-Documenting Output

print("=" * 70)
print("CONFIGURATION MIGRATION")
print("=" * 70)
print()

print("Step 1: Analyzing input files")
print("-" * 70)
files = find_files()
print(f"Found: {len(files)} files")
for f in files[:5]:
    print(f"  • {f}")
print()

print("Step 2: Validating configuration")
print("-" * 70)
errors = validate_config()
if errors:
    print(f"✗ Found {len(errors)} errors")
    for error in errors:
        print(f"  • {error}")
else:
    print("✓ Configuration valid")
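
The repeated banner and divider lines can be factored into small helpers so every step is formatted the same way. A sketch with hypothetical function names; returning strings instead of printing keeps the helpers easy to test.

```python
WIDTH = 70

def banner(title: str) -> str:
    """Return a three-line banner with the title in upper case."""
    return "\n".join(["=" * WIDTH, title.upper(), "=" * WIDTH])

def step(number: int, title: str) -> str:
    """Return a step heading followed by a divider line."""
    return f"Step {number}: {title}\n" + "-" * WIDTH

def status(ok: bool, message: str) -> str:
    """Prefix a message with a check mark or a cross."""
    return f"{'✓' if ok else '✗'} {message}"

print(banner("Configuration migration"))
print()
print(step(1, "Analyzing input files"))
print(status(True, "Configuration valid"))
```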

Common Pitfalls

Using shell=True Unnecessarily

# Vulnerable and error-prone
subprocess.run(f'rm -rf {user_input}', shell=True)  # DANGER

Use List Arguments

subprocess.run(['rm', '-rf', '--', user_input])  # No shell injection; '--' stops option parsing

Not Handling Encoding

result = subprocess.run(['cmd'], capture_output=True)
print(result.stdout)  # bytes, not string

Specify text=True

result = subprocess.run(['cmd'], capture_output=True, text=True)
print(result.stdout)  # string

Ignoring Errors

result = subprocess.run(['cmd'])
# No error handling

Check Exit Code

result = subprocess.run(['cmd'], capture_output=True, text=True)
if result.returncode != 0:
    print(f"Error: {result.stderr}", file=sys.stderr)
    sys.exit(1)

Validation Tools

# Check syntax
python3 -m py_compile script.py

# Lint with pylint
pip install pylint
pylint script.py

# Format with black
pip install black
black script.py

# Type check with mypy
pip install mypy
mypy script.py
