Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 09:05:19 +08:00
commit 09fec2555b
96 changed files with 24269 additions and 0 deletions

View File

@@ -0,0 +1,529 @@
# Bash Scripting Reference
Detailed patterns and examples for Bash automation scripts.
## Error Handling
### Essential Settings
Put these at the top of **every** Bash script:
```bash
#!/usr/bin/env bash
set -Eeuo pipefail
trap cleanup SIGINT SIGTERM ERR EXIT
cleanup() {
trap - SIGINT SIGTERM ERR EXIT
# Cleanup code here (remove temp files, etc.)
}
```
### Flag Breakdown
**`-E` (errtrap):** Error traps work in functions
```bash
trap 'echo "Error"' ERR
func() { false; } # Trap fires (wouldn't without -E)
```
**`-e` (errexit):** Stop on first error
```bash
command_fails # Script exits here
never_runs # Never executes
```
**`-u` (nounset):** Catch undefined variables
```bash
echo "$TYPO" # Error: TYPO: unbound variable (not silent)
```
**`-o pipefail`:** Detect failures in pipes
```bash
false | true # Fails (not just last command status)
```
**`trap`:** Run cleanup on exit/error/signal
## String Escaping for LaTeX and Special Characters
**Problem:** Bash interprets escape sequences in double-quoted strings, which corrupts LaTeX commands and special text.
**Dangerous sequences:** `\b` (backspace), `\n` (newline), `\t` (tab), `\r` (return)
### Example Failure
```bash
# ❌ Wrong: Creates backspace character
echo "\\begin{document}" >> file.tex # Becomes: <backspace>egin{document}
echo "\\bibliographystyle{ACM}" >> file.tex # Becomes: <backspace>ibliographystyle{ACM}
```
### Safe Approaches
**1. Single quotes** (Best for simple cases):
```bash
echo '\begin{document}' >> file.tex # ✅ No interpretation
echo '\bibliographystyle{ACM-Reference-Format}' >> file.tex # ✅ Safe
```
**2. Double backslashes** (When variables needed):
```bash
echo "\\\\begin{document}" >> file.tex # ✅ 4 backslashes → \b
cmd="begin"
echo "\\\\${cmd}{document}" >> file.tex # ✅ Works with variables
```
**3. Printf** (More predictable):
```bash
printf '%s\n' '\begin{document}' >> file.tex # ✅ Literal strings
printf '%s\n' '\bibliographystyle{ACM-Reference-Format}' >> file.tex
```
**4. Heredoc** (Best for multi-line LaTeX):
```bash
cat >> file.tex << 'EOF' # ✅ Note quoted delimiter
\begin{document}
\section{Title}
\bibliographystyle{ACM-Reference-Format}
\end{document}
EOF
```
### Quick Reference
| Character | Echo double-quotes | Echo single-quotes | Heredoc |
|-----------|-------------------|-------------------|---------|
| `\b` | ❌ Backspace | ✅ Literal | ✅ Literal |
| `\n` | ❌ Newline | ✅ Literal | ✅ Literal |
| `\t` | ❌ Tab | ✅ Literal | ✅ Literal |
| Variables | ✅ Work | ❌ Don't expand | ✅ With `"EOF"` |
**Rule of thumb:** For LaTeX, use single quotes or heredocs to avoid escape sequence interpretation.
## Variable Quoting
### Always Quote Variables
```bash
# ✅ Always quote variables
file="my file.txt"
cat "$file" # Correct
# ❌ Unquoted breaks on spaces
cat $file # WRONG: tries to cat "my" and "file.txt"
```
### Array Expansion
```bash
files=("file 1.txt" "file 2.txt")
# ✅ Quote array expansion
for file in "${files[@]}"; do
echo "$file"
done
# ❌ Unquoted splits on spaces
for file in ${files[@]}; do
echo "$file" # WRONG: treats spaces as separators
done
```
## Script Directory Detection
```bash
# Get directory where script is located
script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd -P)
# Use for relative paths
source "${script_dir}/config.sh"
data_file="${script_dir}/../data/input.txt"
```
## Functions
### Function Template
```bash
# Document functions with comments
# Args:
# $1 - input file
# $2 - output file
# Returns:
# 0 on success, 1 on error
process_file() {
local input="$1"
local output="$2"
if [[ ! -f "$input" ]]; then
echo "Error: Input file not found: $input" >&2
return 1
fi
# Process file
grep pattern "$input" > "$output"
}
# Call function
if process_file "input.txt" "output.txt"; then
echo "Success"
else
echo "Failed" >&2
exit 1
fi
```
### Local Variables
Always use `local` for function variables:
```bash
process_data() {
local data="$1" # ✅ Local to function
local result
result=$(transform "$data")
echo "$result"
}
```
## Error Messages
### Write Errors to Stderr
```bash
# ✅ Write errors to stderr
echo "Error: File not found" >&2
# ✅ Exit with non-zero code
exit 1
# ❌ Don't write errors to stdout
echo "Error: File not found"
```
### Structured Error Handling
```bash
error() {
echo "Error: $*" >&2
exit 1
}
warn() {
echo "Warning: $*" >&2
}
# Usage
[[ -f "$config" ]] || error "Config file not found: $config"
[[ -w "$output" ]] || warn "Output file not writable: $output"
```
## Checking Commands Exist
```bash
if ! command -v jq &> /dev/null; then
echo "Error: jq is required but not installed" >&2
exit 1
fi
# Check multiple commands
for cmd in curl jq sed; do
if ! command -v "$cmd" &> /dev/null; then
echo "Error: $cmd is required but not installed" >&2
exit 1
fi
done
```
## Parallel Processing
```bash
# Run commands in parallel, wait for all
for file in *.txt; do
process_file "$file" &
done
wait
echo "All files processed"
```
### Parallel with Error Handling
```bash
pids=()
for file in *.txt; do
process_file "$file" &
pids+=($!)
done
# Wait and check exit codes
failed=0
for pid in "${pids[@]}"; do
if ! wait "$pid"; then
((failed++))
fi
done
if [[ $failed -gt 0 ]]; then
echo "Error: $failed jobs failed" >&2
exit 1
fi
```
## Configuration Files
### Loading Config
```bash
# Load config file if exists
config_file="${script_dir}/config.sh"
if [[ -f "$config_file" ]]; then
source "$config_file"
else
# Default values
LOG_DIR="/var/log"
BACKUP_DIR="/backup"
fi
```
### Safe Config Sourcing
```bash
# Validate config before sourcing
validate_config() {
local config="$1"
# Check syntax
if ! bash -n "$config" 2>/dev/null; then
echo "Error: Invalid syntax in $config" >&2
return 1
fi
return 0
}
if validate_config "$config_file"; then
source "$config_file"
else
exit 1
fi
```
## Argument Parsing
### Simple Pattern
```bash
# Parse flags
VERBOSE=false
FORCE=false
while [[ $# -gt 0 ]]; do
case "$1" in
-v|--verbose)
VERBOSE=true
shift
;;
-f|--force)
FORCE=true
shift
;;
-o|--output)
OUTPUT="$2"
shift 2
;;
*)
echo "Unknown option: $1" >&2
exit 1
;;
esac
done
```
### Usage Function
```bash
usage() {
cat << EOF
Usage: $0 [OPTIONS] INPUT OUTPUT
Process files with various options.
OPTIONS:
-v, --verbose Verbose output
-f, --force Force operation
-o, --output Output file
-h, --help Show this help
EXAMPLES:
$0 input.txt output.txt
$0 -v --force input.txt output.txt
EOF
}
# Show usage on error or -h
[[ "$1" == "-h" || "$1" == "--help" ]] && usage && exit 0
[[ $# -lt 2 ]] && usage && exit 1
```
## Temporary Files
### Safe Temp File Creation
```bash
# Create temp file
tmpfile=$(mktemp)
trap "rm -f '$tmpfile'" EXIT
# Use temp file
curl -s "$url" > "$tmpfile"
process "$tmpfile"
# Cleanup happens automatically via trap
```
### Temp Directory
```bash
# Create temp directory
tmpdir=$(mktemp -d)
trap "rm -rf '$tmpdir'" EXIT
# Use temp directory
download_files "$tmpdir"
process_directory "$tmpdir"
```
## Common Patterns
### File Existence Checks
```bash
# Check file exists
[[ -f "$file" ]] || error "File not found: $file"
# Check directory exists
[[ -d "$dir" ]] || error "Directory not found: $dir"
# Check file readable
[[ -r "$file" ]] || error "File not readable: $file"
# Check file writable
[[ -w "$file" ]] || error "File not writable: $file"
```
### String Comparisons
```bash
# Check empty string
[[ -z "$var" ]] && error "Variable is empty"
# Check non-empty string
[[ -n "$var" ]] || error "Variable not set"
# String equality
[[ "$a" == "$b" ]] && echo "Equal"
# Pattern matching
[[ "$file" == *.txt ]] && echo "Text file"
```
### Numeric Comparisons
```bash
# Greater than
[[ $count -gt 10 ]] && echo "More than 10"
# Less than or equal
[[ $count -le 5 ]] && echo "5 or fewer"
# Equal
[[ $count -eq 0 ]] && echo "Zero"
```
## Common Pitfalls
### ❌ Unquoted Variables
```bash
file=$1
cat $file # Breaks with spaces
```
### ✅ Always Quote
```bash
file="$1"
cat "$file"
```
### ❌ Escape Sequences in LaTeX
```bash
# Corrupts \begin, \bibitem, etc.
echo "\\begin{document}" >> file.tex # Creates <backspace>egin
```
### ✅ Use Single Quotes or Heredocs
```bash
echo '\begin{document}' >> file.tex
# Or:
cat >> file.tex << 'EOF'
\begin{document}
EOF
```
### ❌ No Error Handling
```bash
#!/bin/bash
command_that_might_fail
continue_anyway
```
### ✅ Fail Fast
```bash
#!/usr/bin/env bash
set -Eeuo pipefail
command_that_might_fail # Script exits on failure
```
### ❌ Unvalidated User Input
```bash
rm -rf /$user_input # DANGER
```
### ✅ Validate Input
```bash
# Validate directory name
if [[ ! "$user_input" =~ ^[a-zA-Z0-9_-]+$ ]]; then
error "Invalid directory name"
fi
```
## Validation Tools
```bash
# Check syntax
bash -n script.sh
# Static analysis with shellcheck
brew install shellcheck # macOS
apt install shellcheck # Ubuntu
shellcheck script.sh
# Run with debug mode
bash -x script.sh
```
## References
- Bash error handling: https://bertvv.github.io/cheat-sheets/Bash.html
- ShellCheck: https://www.shellcheck.net/
- Bash best practices: https://mywiki.wooledge.org/BashGuide

View File

@@ -0,0 +1,406 @@
# Python Scripting Reference
Detailed patterns and examples for Python automation scripts.
## Subprocess Patterns
### Two-Stage Subprocess (Avoid Shell Parsing)
**Problem:** Using `shell=True` with complex patterns causes shell parsing issues.
**❌ Don't: shell=True with complex patterns**
```python
cmd = 'curl -s "url" | grep -oE "pattern(with|parens)"'
subprocess.run(cmd, shell=True, ...)
```
**✅ Do: Separate calls with input= piping**
```python
curl_result = subprocess.run(['curl', '-s', url],
capture_output=True, text=True)
grep_result = subprocess.run(['grep', '-oE', pattern],
input=curl_result.stdout,
capture_output=True, text=True)
```
### Why List Arguments Work
- Python executes command directly (no shell interpretation)
- Arguments passed as literal strings
- Special chars like `|(){}` treated as text, not operators
### When shell=True Is Needed
Only use for hard-coded commands that require shell features:
- `*` wildcards
- `~` home directory expansion
- `&&` operators
- Environment variable expansion
```python
# Hard-coded command only
subprocess.run('ls *.txt | wc -l', shell=True, ...)
```
## Debugging Subprocess Failures
### Workflow
1. **Test command in bash first** - Verify it works outside Python
2. **Add debug output:**
```python
result = subprocess.run(cmd, ...)
print(f"stdout: {result.stdout[:100]}")
print(f"stderr: {result.stderr}")
print(f"returncode: {result.returncode}")
```
3. **Check stderr for shell errors** - Syntax errors indicate shell parsing issues
4. **Rewrite without shell=True** - Use list arguments and two-stage pattern
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `syntax error near unexpected token '('` | Shell parsing regex/parens | Two-stage subprocess |
| `command not found` | PATH issue or typo | Check command exists with `which` |
| Empty stdout | Command construction error | Debug with stderr output |
### Debugging Invisible Characters
**Problem:** Files with invisible characters (backspace, null bytes) cause mysterious errors.
**Symptoms:**
- LaTeX: `Unicode character ^^H (U+0008) not set up for use with LaTeX`
- Commands fail with "invalid character" but file looks normal
**Detection:**
```bash
# Show all characters including invisible ones
od -c file.txt
# Check specific line range
sed -n '10,20p' file.txt | od -c
# Find backspaces
grep -P '\x08' file.txt
```
**Example output:**
```
0000000 % % f i l e . \n \b \ b e g i
^^^ backspace character
```
**Fix:**
```bash
# Remove all backspace characters
tr -d '\b' < corrupted.tex > clean.tex
# Remove all control characters (preserve newlines)
tr -cd '[:print:]\n' < file.txt > clean.txt
```
**Prevention:** Use proper quoting when generating files (see Bash reference for LaTeX string escaping).
## Error Handling
### Basic Pattern
```python
import sys
import subprocess
try:
result = subprocess.run(['command'],
capture_output=True,
text=True,
check=True) # Raises on non-zero exit
except subprocess.CalledProcessError as e:
print(f"Error: Command failed with exit code {e.returncode}", file=sys.stderr)
print(f"stderr: {e.stderr}", file=sys.stderr)
sys.exit(1)
except FileNotFoundError:
print("Error: Command not found in PATH", file=sys.stderr)
sys.exit(1)
```
### File Operations
```python
try:
with open(file_path, 'r') as f:
content = f.read()
except FileNotFoundError:
print(f"Error: File not found: {file_path}", file=sys.stderr)
sys.exit(1)
except PermissionError:
print(f"Error: Permission denied: {file_path}", file=sys.stderr)
sys.exit(1)
except IOError as e:
print(f"Error reading file: {e}", file=sys.stderr)
sys.exit(1)
```
## Argparse Patterns
### Multi-Mode Scripts
```python
import argparse
parser = argparse.ArgumentParser(description='Script description')
parser.add_argument('input', nargs='?', help='Input file or topic')
parser.add_argument('--url', help='Direct URL mode')
parser.add_argument('--verify', action='store_true', help='Verify output')
args = parser.parse_args()
# Validate combinations
if not args.input and not args.url:
parser.error("Provide either input or --url")
```
### Common Flag Patterns
```python
parser.add_argument('-v', '--verbose', action='store_true',
help='Verbose output')
parser.add_argument('-f', '--force', action='store_true',
help='Force operation')
parser.add_argument('-o', '--output', default='output.txt',
help='Output file')
parser.add_argument('--count', type=int, default=5,
help='Number of items')
parser.add_argument('--config', type=str,
help='Config file path')
```
### Mutually Exclusive Groups
```python
group = parser.add_mutually_exclusive_group()
group.add_argument('--json', action='store_true')
group.add_argument('--yaml', action='store_true')
```
## Environment Variables
```python
import os
# ✅ Never hardcode credentials
API_KEY = os.getenv('API_KEY')
if not API_KEY:
print("Error: API_KEY environment variable not set", file=sys.stderr)
sys.exit(1)
# ✅ Provide defaults
LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')
OUTPUT_DIR = os.getenv('OUTPUT_DIR', './output')
# ✅ Type conversion with defaults
MAX_RETRIES = int(os.getenv('MAX_RETRIES', '3'))
TIMEOUT = float(os.getenv('TIMEOUT', '30.0'))
```
## File Processing Patterns
### Process Files Matching Pattern
```python
import glob
import sys
def process_files(pattern: str) -> list[str]:
"""Find and process files matching pattern."""
files = glob.glob(pattern, recursive=True)
results = []
for file in files:
try:
with open(file, 'r') as f:
content = f.read()
results.append(process(content))
except IOError as e:
print(f"Error reading {file}: {e}", file=sys.stderr)
return results
```
### Safe File Writing
```python
import tempfile
import shutil
def safe_write(file_path: str, content: str):
"""Write to temp file first, then atomic move."""
# Write to temp file in same directory
dir_name = os.path.dirname(file_path)
with tempfile.NamedTemporaryFile(mode='w', dir=dir_name,
delete=False) as tmp:
tmp.write(content)
tmp_path = tmp.name
# Atomic move
shutil.move(tmp_path, file_path)
```
## URL Verification
```python
import subprocess
def verify_url(url: str) -> bool:
"""Verify URL is accessible with HTTP HEAD request."""
result = subprocess.run(['curl', '-I', '-s', url],
capture_output=True, text=True)
if 'HTTP/2 200' in result.stdout or 'HTTP/1.1 200' in result.stdout:
if 'content-type:' in result.stdout.lower():
return True
return False
```
## Automation Script Patterns
### Dry-Run Mode
```python
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--force', action='store_true',
help='Apply changes (dry-run by default)')
args = parser.parse_args()
dry_run = not args.force
# Use dry_run flag throughout script
for item in items:
change_description = f"Would rename {item['old']} → {item['new']}"
if dry_run:
print(f"→ {change_description}")
else:
print(f"✓ {change_description}")
apply_change(item)
```
### Backup-First Pattern
```python
from datetime import datetime
import shutil
def backup_before_modify(config_path: str) -> str:
"""Create timestamped backup before modifications."""
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
backup_path = f"{config_path}.backup.{timestamp}"
shutil.copy2(config_path, backup_path)
print(f"✓ Backup created: {backup_path}")
return backup_path
# Use in operations
if not dry_run:
backup_before_modify(config_path)
update_config(config_path)
```
### Self-Documenting Output
```python
print("=" * 70)
print("CONFIGURATION MIGRATION")
print("=" * 70)
print()
print("Step 1: Analyzing input files")
print("-" * 70)
files = find_files()
print(f"Found: {len(files)} files")
for f in files[:5]:
print(f" • {f}")
print()
print("Step 2: Validating configuration")
print("-" * 70)
errors = validate_config()
if errors:
print(f"✗ Found {len(errors)} errors")
for error in errors:
print(f" • {error}")
else:
print("✓ Configuration valid")
```
## Common Pitfalls
### ❌ Using shell=True Unnecessarily
```python
# Vulnerable and error-prone
subprocess.run(f'rm -rf {user_input}', shell=True) # DANGER
```
### ✅ Use List Arguments
```python
subprocess.run(['rm', '-rf', user_input]) # Safe
```
### ❌ Not Handling Encoding
```python
result = subprocess.run(['cmd'], capture_output=True)
print(result.stdout) # bytes, not string
```
### ✅ Specify text=True
```python
result = subprocess.run(['cmd'], capture_output=True, text=True)
print(result.stdout) # string
```
### ❌ Ignoring Errors
```python
result = subprocess.run(['cmd'])
# No error handling
```
### ✅ Check Exit Code
```python
result = subprocess.run(['cmd'], capture_output=True, text=True)
if result.returncode != 0:
print(f"Error: {result.stderr}", file=sys.stderr)
sys.exit(1)
```
## Validation Tools
```bash
# Check syntax
python3 -m py_compile script.py
# Lint with pylint
pip install pylint
pylint script.py
# Format with black
pip install black
black script.py
# Type check with mypy
pip install mypy
mypy script.py
```
## References
- Python subprocess docs: https://docs.python.org/3/library/subprocess.html
- Real Python subprocess guide: https://realpython.com/python-subprocess/
- Argparse tutorial: https://docs.python.org/3/howto/argparse.html