Initial commit

2025-11-30 09:05:19 +08:00
commit 09fec2555b
96 changed files with 24269 additions and 0 deletions
--- a/skills/writing-scripts/references/bash.md
+++ b/skills/writing-scripts/references/bash.md
@@ -0,0 +1,529 @@
+# Bash Scripting Reference
+
+Detailed patterns and examples for Bash automation scripts.
+
+## Error Handling
+
+### Essential Settings
+
+Put these at the top of **every** Bash script:
+
+```bash
+#!/usr/bin/env bash
+set -Eeuo pipefail
+trap cleanup SIGINT SIGTERM ERR EXIT
+
+cleanup() {
+    trap - SIGINT SIGTERM ERR EXIT
+    # Cleanup code here (remove temp files, etc.)
+}
+```
+
+### Flag Breakdown
+
+**`-E` (errtrace):** `ERR` traps are inherited by functions, command substitutions, and subshells
+```bash
+trap 'echo "Error"' ERR
+func() { false; }  # Trap fires (wouldn't without -E)
+```
+
+**`-e` (errexit):** Stop on first error
+```bash
+command_fails  # Script exits here
+never_runs     # Never executes
+```
+
+**`-u` (nounset):** Catch undefined variables
+```bash
+echo "$TYPO"  # Error: TYPO: unbound variable (not silent)
+```
+
+**`-o pipefail`:** Detect failures in pipes
+```bash
+false | true  # Fails (not just last command status)
+```
+
+**`trap`:** Run cleanup on exit/error/signal
+
+## String Escaping for LaTeX and Special Characters
+
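+**Problem:** Escape-interpreting `echo` variants (`echo -e`, `shopt -s xpg_echo`, or the `echo` built into `sh`/dash and zsh) turn backslash sequences into control characters, which corrupts LaTeX commands and special text. Bash's builtin `echo` prints backslashes literally by default, but scripts often end up running under one of the other variants.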
+
+**Dangerous sequences:** `\b` (backspace), `\n` (newline), `\t` (tab), `\r` (return)
+
+### Example Failure
+
+```bash
+# ❌ Wrong: echo -e turns \b into a backspace character
+echo -e "\\begin{document}" >> file.tex  # Becomes: <backspace>egin{document}
+echo -e "\\bibliographystyle{ACM}" >> file.tex  # Becomes: <backspace>ibliographystyle{ACM}
+```
+
+### Safe Approaches
+
+**1. Single quotes** (Best for simple cases):
+```bash
+echo '\begin{document}' >> file.tex  # ✅ No interpretation
+echo '\bibliographystyle{ACM-Reference-Format}' >> file.tex  # ✅ Safe
+```
+
+**2. Double backslashes** (When variables needed):
+```bash
+cmd="begin"
+echo "\\${cmd}{document}" >> file.tex  # ✅ Plain echo: "\\" → one literal backslash, variable expands
+echo -e "\\\\${cmd}{document}" >> file.tex  # ✅ With -e: 4 backslashes → \begin
+```
+
+**3. Printf** (More predictable):
+```bash
+printf '%s\n' '\begin{document}' >> file.tex  # ✅ Literal strings
+printf '%s\n' '\bibliographystyle{ACM-Reference-Format}' >> file.tex
+```
+
+**4. Heredoc** (Best for multi-line LaTeX):
+```bash
+cat >> file.tex << 'EOF'  # ✅ Note quoted delimiter
+\begin{document}
+\section{Title}
+\bibliographystyle{ACM-Reference-Format}
+\end{document}
+EOF
+```
+
+### Quick Reference
+
+| Character | `echo -e` (any quotes) | `echo` single-quotes | Heredoc |
+|-----------|-------------------|-------------------|---------|
+| `\b` | ❌ Backspace | ✅ Literal | ✅ Literal |
+| `\n` | ❌ Newline | ✅ Literal | ✅ Literal |
+| `\t` | ❌ Tab | ✅ Literal | ✅ Literal |
+| Variables | ✅ In double quotes | ❌ Don't expand | ✅ With unquoted `EOF` |
+
+**Rule of thumb:** For LaTeX, use single quotes or heredocs to avoid escape sequence interpretation.
+
+## Variable Quoting
+
+### Always Quote Variables
+
+```bash
+# ✅ Always quote variables
+file="my file.txt"
+cat "$file"          # Correct
+
+# ❌ Unquoted breaks on spaces
+cat $file            # WRONG: tries to cat "my" and "file.txt"
+```
+
+### Array Expansion
+
+```bash
+files=("file 1.txt" "file 2.txt")
+
+# ✅ Quote array expansion
+for file in "${files[@]}"; do
+    echo "$file"
+done
+
+# ❌ Unquoted splits on spaces
+for file in ${files[@]}; do
+    echo "$file"  # WRONG: treats spaces as separators
+done
+```
+
+## Script Directory Detection
+
+```bash
+# Get directory where script is located
+script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd -P)
+
+# Use for relative paths
+source "${script_dir}/config.sh"
+data_file="${script_dir}/../data/input.txt"
+```
+
+## Functions
+
+### Function Template
+
+```bash
+# Document functions with comments
+# Args:
+#   $1 - input file
+#   $2 - output file
+# Returns:
+#   0 on success, 1 on error
+process_file() {
+    local input="$1"
+    local output="$2"
+
+    if [[ ! -f "$input" ]]; then
+        echo "Error: Input file not found: $input" >&2
+        return 1
+    fi
+
+    # Process file
+    grep pattern "$input" > "$output"
+}
+
+# Call function
+if process_file "input.txt" "output.txt"; then
+    echo "Success"
+else
+    echo "Failed" >&2
+    exit 1
+fi
+```
+
+### Local Variables
+
+Always use `local` for function variables:
+
+```bash
+process_data() {
+    local data="$1"  # ✅ Local to function
+    local result
+
+    result=$(transform "$data")
+    echo "$result"
+}
+```
+
+## Error Messages
+
+### Write Errors to Stderr
+
+```bash
+# ✅ Write errors to stderr
+echo "Error: File not found" >&2
+
+# ✅ Exit with non-zero code
+exit 1
+
+# ❌ Don't write errors to stdout
+echo "Error: File not found"
+```
+
+### Structured Error Handling
+
+```bash
+error() {
+    echo "Error: $*" >&2
+    exit 1
+}
+
+warn() {
+    echo "Warning: $*" >&2
+}
+
+# Usage
+[[ -f "$config" ]] || error "Config file not found: $config"
+[[ -w "$output" ]] || warn "Output file not writable: $output"
+```
+
+## Checking Commands Exist
+
+```bash
+if ! command -v jq &> /dev/null; then
+    echo "Error: jq is required but not installed" >&2
+    exit 1
+fi
+
+# Check multiple commands
+for cmd in curl jq sed; do
+    if ! command -v "$cmd" &> /dev/null; then
+        echo "Error: $cmd is required but not installed" >&2
+        exit 1
+    fi
+done
+```
+
+## Parallel Processing
+
+```bash
+# Run commands in parallel, wait for all
+for file in *.txt; do
+    process_file "$file" &
+done
+wait
+
+echo "All files processed"
+```
+
+### Parallel with Error Handling
+
+```bash
+pids=()
+for file in *.txt; do
+    process_file "$file" &
+    pids+=($!)
+done
+
+# Wait and check exit codes
+failed=0
+for pid in "${pids[@]}"; do
+    if ! wait "$pid"; then
+        failed=$((failed + 1))  # ((failed++)) returns status 1 when failed is 0, which aborts under set -e
+    fi
+done
+
+if [[ $failed -gt 0 ]]; then
+    echo "Error: $failed jobs failed" >&2
+    exit 1
+fi
+```
+
+## Configuration Files
+
+### Loading Config
+
+```bash
+# Load config file if exists
+config_file="${script_dir}/config.sh"
+if [[ -f "$config_file" ]]; then
+    source "$config_file"
+else
+    # Default values
+    LOG_DIR="/var/log"
+    BACKUP_DIR="/backup"
+fi
+```
+
+### Safe Config Sourcing
+
+```bash
+# Validate config before sourcing
+validate_config() {
+    local config="$1"
+
+    # Check syntax
+    if ! bash -n "$config" 2>/dev/null; then
+        echo "Error: Invalid syntax in $config" >&2
+        return 1
+    fi
+
+    return 0
+}
+
+if validate_config "$config_file"; then
+    source "$config_file"
+else
+    exit 1
+fi
+```
+
+## Argument Parsing
+
+### Simple Pattern
+
+```bash
+# Parse flags
+VERBOSE=false
+FORCE=false
+
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        -v|--verbose)
+            VERBOSE=true
+            shift
+            ;;
+        -f|--force)
+            FORCE=true
+            shift
+            ;;
+        -o|--output)
+            OUTPUT="$2"
+            shift 2
+            ;;
+        *)
+            echo "Unknown option: $1" >&2
+            exit 1
+            ;;
+    esac
+done
+```
+
+### Usage Function
+
+```bash
+usage() {
+    cat << EOF
+Usage: $0 [OPTIONS] INPUT OUTPUT
+
+Process files with various options.
+
+OPTIONS:
+    -v, --verbose    Verbose output
+    -f, --force      Force operation
+    -o, --output     Output file
+    -h, --help       Show this help
+
+EXAMPLES:
+    $0 input.txt output.txt
+    $0 -v --force input.txt output.txt
+EOF
+}
+
+# Show usage on error or -h
+# ${1:-} avoids an "unbound variable" error under set -u when no arguments are given
+[[ "${1:-}" == "-h" || "${1:-}" == "--help" ]] && usage && exit 0
+[[ $# -lt 2 ]] && usage >&2 && exit 1
+```
+
+## Temporary Files
+
+### Safe Temp File Creation
+
+```bash
+# Create temp file
+tmpfile=$(mktemp)
+trap 'rm -f "$tmpfile"' EXIT  # Single quotes: path is expanded when the trap runs
+
+# Use temp file
+curl -s "$url" > "$tmpfile"
+process "$tmpfile"
+
+# Cleanup happens automatically via trap
+```
+
+### Temp Directory
+
+```bash
+# Create temp directory
+tmpdir=$(mktemp -d)
+trap 'rm -rf "$tmpdir"' EXIT  # Single quotes: path is expanded when the trap runs
+
+# Use temp directory
+download_files "$tmpdir"
+process_directory "$tmpdir"
+```
+
+## Common Patterns
+
+### File Existence Checks
+
+```bash
+# Check file exists
+[[ -f "$file" ]] || error "File not found: $file"
+
+# Check directory exists
+[[ -d "$dir" ]] || error "Directory not found: $dir"
+
+# Check file readable
+[[ -r "$file" ]] || error "File not readable: $file"
+
+# Check file writable
+[[ -w "$file" ]] || error "File not writable: $file"
+```
+
+### String Comparisons
+
+```bash
+# Check empty string
+[[ -z "$var" ]] && error "Variable is empty"
+
+# Check non-empty string
+[[ -n "$var" ]] || error "Variable not set"
+
+# String equality
+[[ "$a" == "$b" ]] && echo "Equal"
+
+# Pattern matching
+[[ "$file" == *.txt ]] && echo "Text file"
+```
+
+### Numeric Comparisons
+
+```bash
+# Greater than
+[[ $count -gt 10 ]] && echo "More than 10"
+
+# Less than or equal
+[[ $count -le 5 ]] && echo "5 or fewer"
+
+# Equal
+[[ $count -eq 0 ]] && echo "Zero"
+```
+
+## Common Pitfalls
+
+### ❌ Unquoted Variables
+
+```bash
+file=$1
+cat $file  # Breaks with spaces
+```
+
+### ✅ Always Quote
+
+```bash
+file="$1"
+cat "$file"
+```
+
+### ❌ Escape Sequences in LaTeX
+
+```bash
+# Corrupts \begin, \bibitem, etc.
+echo -e "\\begin{document}" >> file.tex  # Creates <backspace>egin
+```
+
+### ✅ Use Single Quotes or Heredocs
+
+```bash
+echo '\begin{document}' >> file.tex
+# Or:
+cat >> file.tex << 'EOF'
+\begin{document}
+EOF
+```
+
+### ❌ No Error Handling
+
+```bash
+#!/bin/bash
+command_that_might_fail
+continue_anyway
+```
+
+### ✅ Fail Fast
+
+```bash
+#!/usr/bin/env bash
+set -Eeuo pipefail
+command_that_might_fail  # Script exits on failure
+```
+
+### ❌ Unvalidated User Input
+
+```bash
+rm -rf /$user_input  # DANGER
+```
+
+### ✅ Validate Input
+
+```bash
+# Validate directory name
+if [[ ! "$user_input" =~ ^[a-zA-Z0-9_-]+$ ]]; then
+    error "Invalid directory name"
+fi
+```
+
+## Validation Tools
+
+```bash
+# Check syntax
+bash -n script.sh
+
+# Static analysis with shellcheck
+brew install shellcheck  # macOS
+apt install shellcheck   # Ubuntu
+shellcheck script.sh
+
+# Run with debug mode
+bash -x script.sh
+```
+
+## References
+
+- Bash error handling: https://bertvv.github.io/cheat-sheets/Bash.html
+- ShellCheck: https://www.shellcheck.net/
+- Bash best practices: https://mywiki.wooledge.org/BashGuide
--- a/skills/writing-scripts/references/python.md
+++ b/skills/writing-scripts/references/python.md
@@ -0,0 +1,406 @@
+# Python Scripting Reference
+
+Detailed patterns and examples for Python automation scripts.
+
+## Subprocess Patterns
+
+### Two-Stage Subprocess (Avoid Shell Parsing)
+
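+**Problem:** With `shell=True` the shell re-parses the whole command string, so quotes, parentheses, and pipes inside a pattern can be misread, and interpolated input can inject commands.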
+
+**❌ Don't: shell=True with complex patterns**
+```python
+cmd = 'curl -s "url" | grep -oE "pattern(with|parens)"'
+subprocess.run(cmd, shell=True, ...)
+```
+
+**✅ Do: Separate calls with input= piping**
+```python
+curl_result = subprocess.run(['curl', '-s', url],
+                            capture_output=True, text=True)
+grep_result = subprocess.run(['grep', '-oE', pattern],
+                            input=curl_result.stdout,
+                            capture_output=True, text=True)
+```
+
+### Why List Arguments Work
+
+- Python executes command directly (no shell interpretation)
+- Arguments passed as literal strings
+- Special chars like `|(){}` treated as text, not operators
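+
+The points above can be checked with a small experiment. This is a minimal sketch rather than part of the original reference: the `tricky` string is a made-up argument, and `sys.executable` stands in for an external command so the example runs wherever Python does.
+
+```python
+import subprocess
+import sys
+
+# Hypothetical argument full of characters a shell would treat as operators.
+tricky = 'a|b (c) {d} && $HOME'
+
+# The child script simply echoes its first argument back on stdout.
+result = subprocess.run(
+    [sys.executable, '-c', 'import sys; print(sys.argv[1])', tricky],
+    capture_output=True, text=True, check=True,
+)
+
+# The argument arrives unchanged because no shell ever parsed it.
+echoed = result.stdout.rstrip('\n')
+print(echoed == tricky)
+```
+
+If the same string were interpolated into a `shell=True` command, the shell would try to interpret the pipe, parentheses, and `$HOME` instead of passing them through.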
+
+### When shell=True Is Needed
+
+Only use for hard-coded commands that require shell features:
+- `*` wildcards
+- `~` home directory expansion
+- `&&` operators
+- Environment variable expansion
+
+```python
+# Hard-coded command only
+subprocess.run('ls *.txt | wc -l', shell=True, ...)
+```
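+
+When the command is not hard-coded, some of these shell features have standard-library counterparts that avoid the shell altogether. The sketch below is illustrative only; the paths `~/.config/app.conf` and `$HOME/.cache` are hypothetical.
+
+```python
+import glob
+import os
+
+# Wildcards: glob.glob expands the pattern a shell would expand for `ls *.txt`.
+txt_files = glob.glob('*.txt')
+print(f"{len(txt_files)} .txt files")
+
+# Home directory: expanduser replaces a leading `~`.
+config_path = os.path.expanduser('~/.config/app.conf')
+
+# Environment variables: expandvars substitutes $NAME and ${NAME} when set.
+cache_dir = os.path.expandvars('$HOME/.cache')
+```
+
+The resulting strings can then be passed as list arguments to `subprocess.run` without `shell=True`.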
+
+## Debugging Subprocess Failures
+
+### Workflow
+
+1. **Test command in bash first** - Verify it works outside Python
+2. **Add debug output:**
+   ```python
+   result = subprocess.run(cmd, ...)
+   print(f"stdout: {result.stdout[:100]}")
+   print(f"stderr: {result.stderr}")
+   print(f"returncode: {result.returncode}")
+   ```
+3. **Check stderr for shell errors** - Syntax errors indicate shell parsing issues
+4. **Rewrite without shell=True** - Use list arguments and two-stage pattern
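+
+Steps 2 to 4 can be folded into one small helper. This is a sketch under assumptions: `run_debug` is a hypothetical name, and the child process is a stand-in that deliberately fails so there is something to inspect.
+
+```python
+import subprocess
+import sys
+
+def run_debug(cmd: list[str]) -> subprocess.CompletedProcess:
+    """Run cmd without a shell and print the values worth inspecting."""
+    result = subprocess.run(cmd, capture_output=True, text=True)
+    print(f"cmd: {cmd}", file=sys.stderr)
+    print(f"returncode: {result.returncode}", file=sys.stderr)
+    print(f"stdout: {result.stdout[:100]!r}", file=sys.stderr)
+    print(f"stderr: {result.stderr!r}", file=sys.stderr)
+    return result
+
+# Stand-in command: writes to stderr and exits with status 3.
+result = run_debug([sys.executable, '-c',
+                    'import sys; sys.stderr.write("boom"); sys.exit(3)'])
+```
+
+Printing with `!r` makes trailing newlines and other invisible characters visible in the debug output.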
+
+### Common Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `syntax error near unexpected token '('` | Shell parsing regex/parens | Two-stage subprocess |
+| `command not found` | PATH issue or typo | Check command exists with `which` |
+| Empty stdout | Command construction error | Debug with stderr output |
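+
+For the `command not found` row, the check can also be done from Python before any subprocess is started. A minimal sketch, assuming a hypothetical helper name `missing_commands` and an example requirement list:
+
+```python
+import shutil
+import sys
+
+def missing_commands(names: list[str]) -> list[str]:
+    """Return the names that shutil.which cannot find on PATH."""
+    return [name for name in names if shutil.which(name) is None]
+
+# Example requirement list; adjust to the tools the script actually calls.
+required = ['curl', 'grep']
+missing = missing_commands(required)
+if missing:
+    print(f"Error: missing required commands: {', '.join(missing)}",
+          file=sys.stderr)
+```
+
+A real script would typically exit with a non-zero status at this point instead of continuing.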
+
+### Debugging Invisible Characters
+
+**Problem:** Files with invisible characters (backspace, null bytes) cause mysterious errors.
+
+**Symptoms:**
+- LaTeX: `Unicode character ^^H (U+0008) not set up for use with LaTeX`
+- Commands fail with "invalid character" but file looks normal
+
+**Detection:**
+```bash
+# Show all characters including invisible ones
+od -c file.txt
+
+# Check specific line range
+sed -n '10,20p' file.txt | od -c
+
+# Find backspaces (-P requires GNU grep; it is not available in BSD/macOS grep)
+grep -P '\x08' file.txt
+```
+
+**Example output:**
+```
+0000000    %   %       f   i   l   e   .  \n  \b   \   b   e   g   i
+                                            ^^^ backspace character
+```
+
+**Fix:**
+```bash
+# Remove all backspace characters
+tr -d '\b' < corrupted.tex > clean.tex
+
+# Remove all non-printable characters (keeps newlines; also drops tabs)
+tr -cd '[:print:]\n' < file.txt > clean.txt
+```
+
+**Prevention:** Use proper quoting when generating files (see Bash reference for LaTeX string escaping).
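+
+The same detection can be done from a Python script. The sketch below is an illustration, not the original author's tooling: `find_control_chars` is a hypothetical helper, and `sample` is made-up data resembling the corrupted line above.
+
+```python
+def find_control_chars(data: bytes) -> list[tuple[int, int]]:
+    """Return (offset, byte value) for control bytes other than tab, LF, CR."""
+    allowed = {0x09, 0x0A, 0x0D}
+    return [(i, b) for i, b in enumerate(data)
+            if (b < 0x20 or b == 0x7F) and b not in allowed]
+
+# Made-up sample: a comment line followed by a \begin whose \b became 0x08.
+sample = b'%% file.\n\x08egin{document}\n'
+for offset, value in find_control_chars(sample):
+    print(f"offset {offset}: 0x{value:02x}")
+```
+
+For a real file, the bytes could come from `pathlib.Path(path).read_bytes()`.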
+
+## Error Handling
+
+### Basic Pattern
+
+```python
+import sys
+import subprocess
+
+try:
+    result = subprocess.run(['command'],
+                          capture_output=True,
+                          text=True,
+                          check=True)  # Raises on non-zero exit
+except subprocess.CalledProcessError as e:
+    print(f"Error: Command failed with exit code {e.returncode}", file=sys.stderr)
+    print(f"stderr: {e.stderr}", file=sys.stderr)
+    sys.exit(1)
+except FileNotFoundError:
+    print("Error: Command not found in PATH", file=sys.stderr)
+    sys.exit(1)
+```
+
+### File Operations
+
+```python
+try:
+    with open(file_path, 'r') as f:
+        content = f.read()
+except FileNotFoundError:
+    print(f"Error: File not found: {file_path}", file=sys.stderr)
+    sys.exit(1)
+except PermissionError:
+    print(f"Error: Permission denied: {file_path}", file=sys.stderr)
+    sys.exit(1)
+except IOError as e:
+    print(f"Error reading file: {e}", file=sys.stderr)
+    sys.exit(1)
+```
+
+## Argparse Patterns
+
+### Multi-Mode Scripts
+
+```python
+import argparse
+
+parser = argparse.ArgumentParser(description='Script description')
+parser.add_argument('input', nargs='?', help='Input file or topic')
+parser.add_argument('--url', help='Direct URL mode')
+parser.add_argument('--verify', action='store_true', help='Verify output')
+args = parser.parse_args()
+
+# Validate combinations
+if not args.input and not args.url:
+    parser.error("Provide either input or --url")
+```
+
+### Common Flag Patterns
+
+```python
+parser.add_argument('-v', '--verbose', action='store_true',
+                   help='Verbose output')
+parser.add_argument('-f', '--force', action='store_true',
+                   help='Force operation')
+parser.add_argument('-o', '--output', default='output.txt',
+                   help='Output file')
+parser.add_argument('--count', type=int, default=5,
+                   help='Number of items')
+parser.add_argument('--config', type=str,
+                   help='Config file path')
+```
+
+### Mutually Exclusive Groups
+
+```python
+group = parser.add_mutually_exclusive_group()
+group.add_argument('--json', action='store_true')
+group.add_argument('--yaml', action='store_true')
+```
+
+## Environment Variables
+
+```python
+import os
+import sys
+
+# ✅ Never hardcode credentials
+API_KEY = os.getenv('API_KEY')
+if not API_KEY:
+    print("Error: API_KEY environment variable not set", file=sys.stderr)
+    sys.exit(1)
+
+# ✅ Provide defaults
+LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')
+OUTPUT_DIR = os.getenv('OUTPUT_DIR', './output')
+
+# ✅ Type conversion with defaults
+MAX_RETRIES = int(os.getenv('MAX_RETRIES', '3'))
+TIMEOUT = float(os.getenv('TIMEOUT', '30.0'))
+```
+
+## File Processing Patterns
+
+### Process Files Matching Pattern
+
+```python
+import glob
+import sys
+
+def process_files(pattern: str) -> list[str]:
+    """Find and process files matching pattern."""
+    files = glob.glob(pattern, recursive=True)
+    results = []
+
+    for file in files:
+        try:
+            with open(file, 'r') as f:
+                content = f.read()
+                results.append(process(content))
+        except IOError as e:
+            print(f"Error reading {file}: {e}", file=sys.stderr)
+
+    return results
+```
+
+### Safe File Writing
+
+```python
+import os
+import tempfile
+
+def safe_write(file_path: str, content: str):
+    """Write to temp file first, then atomically replace the target."""
+    # Write to temp file in same directory so the rename stays on one filesystem
+    dir_name = os.path.dirname(file_path) or '.'
+    with tempfile.NamedTemporaryFile(mode='w', dir=dir_name,
+                                     delete=False) as tmp:
+        tmp.write(content)
+        tmp_path = tmp.name
+
+    # Atomic replace (os.replace also overwrites an existing target)
+    os.replace(tmp_path, file_path)
+```
+
+## URL Verification
+
+```python
+import subprocess
+
+def verify_url(url: str) -> bool:
+    """Verify URL is accessible with HTTP HEAD request."""
+    result = subprocess.run(['curl', '-I', '-s', url],
+                          capture_output=True, text=True)
+
+    if 'HTTP/2 200' in result.stdout or 'HTTP/1.1 200' in result.stdout:
+        if 'content-type:' in result.stdout.lower():
+            return True
+    return False
+```
+
+## Automation Script Patterns
+
+### Dry-Run Mode
+
+```python
+import argparse
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--force', action='store_true',
+                   help='Apply changes (dry-run by default)')
+args = parser.parse_args()
+
+dry_run = not args.force
+
+# Use dry_run flag throughout script
+for item in items:
+    change_description = f"rename {item['old']} → {item['new']}"
+
+    if dry_run:
+        print(f"→ Would {change_description}")
+    else:
+        print(f"✓ Applying: {change_description}")
+        apply_change(item)
+```
+
+### Backup-First Pattern
+
+```python
+from datetime import datetime
+import shutil
+
+def backup_before_modify(config_path: str) -> str:
+    """Create timestamped backup before modifications."""
+    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+    backup_path = f"{config_path}.backup.{timestamp}"
+
+    shutil.copy2(config_path, backup_path)
+    print(f"✓ Backup created: {backup_path}")
+
+    return backup_path
+
+# Use in operations
+if not dry_run:
+    backup_before_modify(config_path)
+    update_config(config_path)
+```
+
+### Self-Documenting Output
+
+```python
+print("=" * 70)
+print("CONFIGURATION MIGRATION")
+print("=" * 70)
+print()
+
+print("Step 1: Analyzing input files")
+print("-" * 70)
+files = find_files()
+print(f"Found: {len(files)} files")
+for f in files[:5]:
+    print(f"  • {f}")
+print()
+
+print("Step 2: Validating configuration")
+print("-" * 70)
+errors = validate_config()
+if errors:
+    print(f"✗ Found {len(errors)} errors")
+    for error in errors:
+        print(f"  • {error}")
+else:
+    print("✓ Configuration valid")
+```
+
+## Common Pitfalls
+
+### ❌ Using shell=True Unnecessarily
+
+```python
+# Vulnerable and error-prone
+subprocess.run(f'rm -rf {user_input}', shell=True)  # DANGER
+```
+
+### ✅ Use List Arguments
+
+```python
+subprocess.run(['rm', '-rf', '--', user_input])  # No shell injection; still validate user_input
+```
+
+### ❌ Not Handling Encoding
+
+```python
+result = subprocess.run(['cmd'], capture_output=True)
+print(result.stdout)  # bytes, not string
+```
+
+### ✅ Specify text=True
+
+```python
+result = subprocess.run(['cmd'], capture_output=True, text=True)
+print(result.stdout)  # string
+```
+
+### ❌ Ignoring Errors
+
+```python
+result = subprocess.run(['cmd'])
+# No error handling
+```
+
+### ✅ Check Exit Code
+
+```python
+result = subprocess.run(['cmd'], capture_output=True, text=True)
+if result.returncode != 0:
+    print(f"Error: {result.stderr}", file=sys.stderr)
+    sys.exit(1)
+```
+
+## Validation Tools
+
+```bash
+# Check syntax
+python3 -m py_compile script.py
+
+# Lint with pylint
+pip install pylint
+pylint script.py
+
+# Format with black
+pip install black
+black script.py
+
+# Type check with mypy
+pip install mypy
+mypy script.py
+```
+
+## References
+
+- Python subprocess docs: https://docs.python.org/3/library/subprocess.html
+- Real Python subprocess guide: https://realpython.com/python-subprocess/
+- Argparse tutorial: https://docs.python.org/3/howto/argparse.html