Initial commit

2025-11-30 08:45:11 +08:00
commit 42c0b6ee81
16 changed files with 3608 additions and 0 deletions
--- a/skills/python-code-review/examples.md
+++ b/skills/python-code-review/examples.md
@@ -0,0 +1,503 @@
+# Python Code Review Examples
+
+This file contains example code review scenarios demonstrating common issues and recommended fixes.
+
+## Example 1: Security Vulnerability - SQL Injection
+
+### Before (Vulnerable Code)
+
+```python
+# user_service.py:15
+def get_user_by_email(email):
+    query = f"SELECT * FROM users WHERE email = '{email}'"
+    cursor.execute(query)
+    return cursor.fetchone()
+```
+
+### Review Comment
+
+**Severity**: Critical
+**Category**: Security
+**File**: user_service.py:16
+
+SQL injection vulnerability detected. User input is interpolated directly into the SQL query, allowing attackers to execute arbitrary SQL commands.
+
+**Attack example**:
+```python
+email = "'; DROP TABLE users; --"
+# Results in: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'
+```
+
+### After (Fixed Code)
+
+```python
+# user_service.py:15
+def get_user_by_email(email):
+    query = "SELECT * FROM users WHERE email = %s"
+    cursor.execute(query, (email,))
+    return cursor.fetchone()
+```
+
+**Reference**: OWASP A03:2021 - Injection
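+
+The fix can be checked end to end with a minimal, self-contained sketch. It assumes an in-memory `sqlite3` database rather than the driver used above, so the placeholder is `?` instead of `%s` (the style depends on the driver's DB-API `paramstyle`). The `conn` parameter and the table contents are illustrative only.
+
+```python
+import sqlite3
+
+
+def get_user_by_email(conn, email):
+    # The driver binds `email` as data; it is never parsed as SQL.
+    query = "SELECT id, email FROM users WHERE email = ?"
+    return conn.execute(query, (email,)).fetchone()
+
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
+conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))
+
+found = get_user_by_email(conn, "alice@example.com")
+attack = get_user_by_email(conn, "'; DROP TABLE users; --")
+remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
+```
+
+In this sketch the malicious string simply matches no row, and the `users` table is still intact afterwards.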
+
+---
+
+## Example 2: Performance Issue - N+1 Query Problem (Django)
+
+### Before (Inefficient Code)
+
+```python
+# views.py:45
+def get_posts_with_authors(request):
+    posts = Post.objects.all()  # 1 query
+    result = []
+    for post in posts:
+        result.append({
+            'title': post.title,
+            'author': post.author.name  # N additional queries!
+        })
+    return JsonResponse(result, safe=False)
+```
+
+### Review Comment
+
+**Severity**: Important
+**Category**: Performance
+**File**: views.py:48
+
+N+1 query problem detected. For 100 posts, this executes 101 database queries (1 for posts + 100 for authors). This causes severe performance degradation under load.
+
+### After (Optimized Code)
+
+```python
+# views.py:45
+def get_posts_with_authors(request):
+    posts = Post.objects.select_related('author').all()  # 1 query with JOIN
+    result = []
+    for post in posts:
+        result.append({
+            'title': post.title,
+            'author': post.author.name
+        })
+    return JsonResponse(result, safe=False)
+```
+
+**Performance gain**: 101 queries → 1 query for 100 posts (about 100x fewer database round trips)
+
+**Reference**: Django QuerySet optimization
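+
+The query counts can be reproduced without Django. The sketch below is an assumption-laden stand-in: it uses an in-memory `sqlite3` database with hypothetical `post` and `author` tables, and counts `SELECT` statements through `set_trace_callback`. The single `JOIN` is roughly what `select_related('author')` asks the database to do.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
+    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
+    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob');
+    INSERT INTO post (title, author_id) VALUES ('A', 1), ('B', 2), ('C', 1);
+""")
+
+executed = []  # SELECT statements seen by the database
+
+
+def record_select(sql):
+    # Keep only SELECT statements so other statements are ignored
+    if sql.lstrip().upper().startswith("SELECT"):
+        executed.append(sql)
+
+
+conn.set_trace_callback(record_select)
+
+# N+1 pattern: one query for the posts, then one more per post for its author
+posts = conn.execute("SELECT title, author_id FROM post ORDER BY id").fetchall()
+n_plus_one = []
+for title, author_id in posts:
+    row = conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
+    n_plus_one.append((title, row[0]))
+n_plus_one_queries = len(executed)
+
+executed.clear()
+
+# JOIN: the same data in a single query
+joined = conn.execute(
+    "SELECT post.title, author.name FROM post "
+    "JOIN author ON author.id = post.author_id ORDER BY post.id"
+).fetchall()
+join_queries = len(executed)
+```
+
+With three posts the loop issues four `SELECT` statements and the `JOIN` issues one, while both approaches return the same rows.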
+
+---
+
+## Example 3: Code Quality - Mutable Default Argument
+
+### Before (Buggy Code)
+
+```python
+# utils.py:22
+def add_item(item, items=[]):
+    items.append(item)
+    return items
+
+# Usage that reveals the bug:
+list1 = add_item('a')  # ['a']
+list2 = add_item('b')  # ['a', 'b'] - UNEXPECTED!
+```
+
+### Review Comment
+
+**Severity**: Important
+**Category**: Code Quality
+**File**: utils.py:22
+
+Mutable default argument antipattern. The default list `[]` is created once when the function is defined, not each time it's called. All invocations share the same list object, causing unexpected state persistence.
+
+### After (Fixed Code)
+
+```python
+# utils.py:22
+def add_item(item, items=None):
+    if items is None:
+        items = []
+    items.append(item)
+    return items
+
+# Now works correctly:
+list1 = add_item('a')  # ['a']
+list2 = add_item('b')  # ['b'] - CORRECT!
+```
+
+**Reference**: Common Python Gotchas
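+
+The shared state can be observed directly. This short sketch repeats the buggy function and inspects the standard `__defaults__` attribute, where Python stores the default values evaluated at definition time.
+
+```python
+def add_item(item, items=[]):
+    items.append(item)
+    return items
+
+
+first = add_item('a')
+second = add_item('b')
+
+# Both calls returned the very same list object...
+shared = first is second
+# ...which is the default stored on the function itself.
+stored_default = add_item.__defaults__[0]
+```
+
+Because `first`, `second`, and the stored default are one object, every call without an explicit `items` argument keeps mutating it.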
+
+---
+
+## Example 4: PEP 8 Compliance - Naming Conventions
+
+### Before (Non-compliant Code)
+
+```python
+# data_processor.py:10
+def CalculateUserAge(BirthDate):
+    CurrentYear = 2025
+    user_birth_year = BirthDate.year
+    AGE = CurrentYear - user_birth_year
+    return AGE
+```
+
+### Review Comment
+
+**Severity**: Minor
+**Category**: Style
+**File**: data_processor.py:10-14
+
+Multiple PEP 8 naming violations:
+- Function name should be `snake_case`, not `PascalCase`
+- Parameter name should be `snake_case`, not `PascalCase`
+- Local variables should be lowercase, not mixed case or UPPERCASE
+- UPPERCASE is reserved for constants
+
+### After (Compliant Code)
+
+```python
+# data_processor.py:10
+def calculate_user_age(birth_date):
+    current_year = 2025
+    user_birth_year = birth_date.year
+    age = current_year - user_birth_year
+    return age
+```
+
+**Reference**: PEP 8 - Naming Conventions
+
+---
+
+## Example 5: Best Practice - Context Manager for Resource Handling
+
+### Before (Resource Leak Risk)
+
+```python
+# file_processor.py:30
+def process_log_file(filepath):
+    file = open(filepath, 'r')
+    data = file.read()
+    results = analyze(data)
+    file.close()  # May not execute if analyze() raises exception
+    return results
+```
+
+### Review Comment
+
+**Severity**: Important
+**Category**: Best Practices
+**File**: file_processor.py:31
+
+Missing context manager for file handling. If `analyze()` raises an exception, `file.close()` never executes, leaving the file handle open (resource leak).
+
+### After (Safe Code)
+
+```python
+# file_processor.py:30
+def process_log_file(filepath):
+    with open(filepath, 'r') as file:
+        data = file.read()
+        results = analyze(data)
+    # File automatically closed even if exception occurs
+    return results
+```
+
+**Bonus improvement**:
+```python
+# Even better with pathlib
+from pathlib import Path
+
+def process_log_file(filepath):
+    data = Path(filepath).read_text()
+    return analyze(data)
+```
+
+**Reference**: PEP 343 - The "with" Statement
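+
+A quick way to confirm the guarantee is to force a failure inside the `with` block and inspect the handle afterwards. In this sketch `analyze` is a hypothetical stand-in that always raises, and the log file is a temporary one created only for the demonstration.
+
+```python
+import tempfile
+from pathlib import Path
+
+
+def analyze(data):
+    # Hypothetical stand-in that always fails
+    raise ValueError("malformed log")
+
+
+with tempfile.TemporaryDirectory() as tmp:
+    log_path = Path(tmp) / "app.log"
+    log_path.write_text("line 1\n")
+
+    handle = None
+    try:
+        with open(log_path, 'r') as file:
+            handle = file  # keep a reference to inspect after the block
+            analyze(file.read())
+    except ValueError:
+        pass
+
+    # The context manager closed the file while the exception propagated
+    closed_after_error = handle.closed
+```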
+
+---
+
+## Example 6: Security - Hardcoded Credentials
+
+### Before (Security Risk)
+
+```python
+# config.py:5
+DATABASE_CONFIG = {
+    'host': 'prod-db.example.com',
+    'user': 'admin',
+    'password': 'SuperSecret123!',  # NEVER do this
+    'database': 'production'
+}
+```
+
+### Review Comment
+
+**Severity**: Critical
+**Category**: Security
+**File**: config.py:8
+
+Hardcoded credentials detected. Passwords in source code:
+1. Are visible to anyone with repository access
+2. Get committed to version control history
+3. Can't be rotated without code changes
+4. May be exposed in logs or error messages
+
+### After (Secure Code)
+
+```python
+# config.py:5
+import os
+
+DATABASE_CONFIG = {
+    'host': os.getenv('DB_HOST', 'localhost'),
+    'user': os.getenv('DB_USER'),
+    'password': os.getenv('DB_PASSWORD'),
+    'database': os.getenv('DB_NAME', 'production')
+}
+
+# Validate required environment variables
+required_vars = ['DB_USER', 'DB_PASSWORD']
+missing = [var for var in required_vars if not os.getenv(var)]
+if missing:
+    raise RuntimeError(f"Missing required environment variables: {missing}")
+```
+
+**Additional security**:
+```bash
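+# Use environment files (not committed to git). os.getenv() does not read
+# .env by itself: load it with a tool such as python-dotenv or export the
+# variables in the shell or process manager that starts the application.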
+echo "DB_PASSWORD=..." > .env
+echo ".env" >> .gitignore
+```
+
+**Reference**: OWASP A07:2021 - Identification and Authentication Failures
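+
+Module-level validation like the one above is awkward to test because it runs at import time. One possible variation, sketched here with a hypothetical `load_database_config` function, accepts the environment as a mapping so the same checks can be exercised with plain dictionaries.
+
+```python
+import os
+
+REQUIRED_VARS = ('DB_USER', 'DB_PASSWORD')
+
+
+def load_database_config(env=None):
+    # Fall back to the real process environment when no mapping is given
+    env = os.environ if env is None else env
+    missing = [var for var in REQUIRED_VARS if not env.get(var)]
+    if missing:
+        raise RuntimeError(f"Missing required environment variables: {missing}")
+    return {
+        'host': env.get('DB_HOST', 'localhost'),
+        'user': env['DB_USER'],
+        'password': env['DB_PASSWORD'],
+        'database': env.get('DB_NAME', 'production'),
+    }
+
+
+# Placeholder values for illustration only
+config = load_database_config({'DB_USER': 'app', 'DB_PASSWORD': 'example-only'})
+
+try:
+    load_database_config({'DB_USER': 'app'})
+except RuntimeError as exc:
+    error_message = str(exc)
+```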
+
+---
+
+## Example 7: Performance - Pandas Optimization
+
+### Before (Inefficient Code)
+
+```python
+# data_analysis.py:50
+import pandas as pd
+
+def calculate_discounts(df):
+    # Anti-pattern: Iterating over DataFrame rows
+    discounts = []
+    for index, row in df.iterrows():
+        if row['total'] > 100:
+            discount = row['total'] * 0.1
+        else:
+            discount = 0
+        discounts.append(discount)
+    df['discount'] = discounts
+    return df
+```
+
+### Review Comment
+
+**Severity**: Important
+**Category**: Performance
+**File**: data_analysis.py:55
+
+`iterrows()` is one of the slowest ways to process a DataFrame because it builds a new Series object for every row. For 10,000 rows, it can be around 100x slower than an equivalent vectorized operation.
+
+### After (Vectorized Code)
+
+```python
+# data_analysis.py:50
+import pandas as pd
+
+def calculate_discounts(df):
+    # Vectorized operation - operates on entire column at once
+    df['discount'] = (df['total'] * 0.1).where(df['total'] > 100, 0)
+    return df
+
+# Alternative using numpy where:
+import numpy as np
+
+def calculate_discounts(df):
+    df['discount'] = np.where(df['total'] > 100, df['total'] * 0.1, 0)
+    return df
+```
+
+**Performance**: Vectorized operations run in optimized C code and often achieve a 50-100x speedup on large datasets; the exact gain depends on the data, so measure before and after.
+
+**Reference**: Pandas Performance Optimization
+
+---
+
+## Example 8: Testing - Missing Edge Cases
+
+### Before (Incomplete Tests)
+
+```python
+# test_validators.py:15
+def test_email_validation():
+    assert is_valid_email('user@example.com') == True
+    assert is_valid_email('invalid-email') == False
+```
+
+### Review Comment
+
+**Severity**: Important
+**Category**: Testing
+**File**: test_validators.py:15
+
+Email validation tests are insufficient. Missing edge cases:
+- Empty string
+- None value
+- Email with special characters
+- Multiple @ symbols
+- Missing domain
+- Whitespace handling
+- Maximum length validation
+
+### After (Comprehensive Tests)
+
+```python
+# test_validators.py:15
+import pytest
+
+@pytest.mark.parametrize('email,expected', [
+    # Valid emails
+    ('user@example.com', True),
+    ('first.last@example.co.uk', True),
+    ('user+tag@example.com', True),
+
+    # Invalid emails
+    ('invalid-email', False),
+    ('', False),
+    ('user@', False),
+    ('user@@example.com', False),
+    ('@example.com', False),
+    ('user @example.com', False),
+    ('a' * 256 + '@example.com', False),  # Too long
+])
+def test_email_validation(email, expected):
+    assert is_valid_email(email) == expected
+
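+def test_email_validation_with_none():
+    # Assumes the validator rejects non-string input by raising TypeError;
+    # adjust if the project decides to return False for None instead.
+    with pytest.raises(TypeError):
+        is_valid_email(None)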
+```
+
+**Reference**: Testing Best Practices
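+
+The tests above call `is_valid_email`, which this file never shows. A hypothetical implementation that would satisfy them might look like the sketch below. The regular expression, the 254-character limit, and the `TypeError` for non-strings are assumptions made for illustration, not the project's actual validator, and the pattern is far stricter than what the email RFCs allow.
+
+```python
+import re
+
+# Deliberately simple pattern: local part, one "@", dotted domain
+_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")
+MAX_EMAIL_LENGTH = 254  # assumed upper bound for a full address
+
+
+def is_valid_email(email):
+    if not isinstance(email, str):
+        raise TypeError("email must be a string")
+    if len(email) > MAX_EMAIL_LENGTH:
+        return False
+    # fullmatch rejects leading or trailing junk, including whitespace
+    return _EMAIL_RE.fullmatch(email) is not None
+```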
+
+---
+
+## Example 9: Architecture - Separation of Concerns (FastAPI)
+
+### Before (Tightly Coupled Code)
+
+```python
+# main.py:25
+from fastapi import FastAPI
+import psycopg2
+
+app = FastAPI()
+
+@app.get('/users/{user_id}')
+def get_user(user_id: int):
+    # Business logic mixed with data access and presentation
+    conn = psycopg2.connect("dbname=mydb user=admin password=secret")
+    cursor = conn.cursor()
+    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
+    user = cursor.fetchone()
+    conn.close()
+
+    if user:
+        return {'id': user[0], 'name': user[1], 'email': user[2]}
+    return {'error': 'User not found'}
+```
+
+### Review Comment
+
+**Severity**: Important
+**Category**: Architecture
+**File**: main.py:25-41
+
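+Multiple problems, most of them stemming from poor separation of concerns:
+1. Database connection logic in the route handler
+2. SQL built by string interpolation (here only the `int` type on `user_id` prevents injection)
+3. Hardcoded credentials
+4. No error handling: a missing user returns HTTP 200 with an error body, and the connection leaks if the query raises
+5. Manual dict construction from positional row indexes
+6. No dependency injection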
+
+### After (Layered Architecture)
+
+```python
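+# schemas.py (Pydantic response model; assumes Pydantic v2)
+from pydantic import BaseModel, ConfigDict
+
+class User(BaseModel):
+    model_config = ConfigDict(from_attributes=True)  # build from ORM objects
+
+    id: int
+    name: str
+    email: str
+
+# database.py (assumes SQLAlchemy 1.4+)
+import os
+from sqlalchemy import create_engine
+from sqlalchemy.orm import declarative_base, sessionmaker
+
+SQLALCHEMY_DATABASE_URL = os.environ['DATABASE_URL']  # fail fast if unset
+engine = create_engine(SQLALCHEMY_DATABASE_URL)
+SessionLocal = sessionmaker(bind=engine)
+Base = declarative_base()
+
+def get_db():
+    # Yield one session per request and always close it afterwards
+    db = SessionLocal()
+    try:
+        yield db
+    finally:
+        db.close()
+
+# models.py (SQLAlchemy ORM model for the users table)
+from sqlalchemy import Column, Integer, String
+from .database import Base
+
+class User(Base):
+    __tablename__ = 'users'
+
+    id = Column(Integer, primary_key=True)
+    name = Column(String)
+    email = Column(String)
+
+# repositories.py
+from sqlalchemy.orm import Session
+from . import models
+
+class UserRepository:
+    def get_by_id(self, db: Session, user_id: int):
+        return db.query(models.User).filter(models.User.id == user_id).first()
+
+# main.py
+from fastapi import FastAPI, Depends, HTTPException
+from sqlalchemy.orm import Session
+from . import database, repositories, schemas
+
+app = FastAPI()
+user_repo = repositories.UserRepository()
+
+@app.get('/users/{user_id}', response_model=schemas.User)
+def get_user(user_id: int, db: Session = Depends(database.get_db)):
+    user = user_repo.get_by_id(db, user_id)
+    if not user:
+        raise HTTPException(status_code=404, detail='User not found')
+    return user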
+```
+
+**Benefits**:
+- Clear separation of concerns
+- Dependency injection
+- Type safety with Pydantic
+- SQL injection protection via ORM
+- Reusable repository pattern
+- Proper error handling
+
+**Reference**: FastAPI Best Practices, Repository Pattern
+
+---
+
+## Summary of Common Issues
+
+1. **Security**: SQL injection, XSS, hardcoded credentials, insecure cryptography
+2. **Performance**: N+1 queries, inefficient loops, missing indexes, no caching
+3. **Code Quality**: Mutable defaults, global state, poor naming, missing docstrings
+4. **Style**: PEP 8 violations, inconsistent formatting, magic numbers
+5. **Best Practices**: Missing context managers, no type hints, poor error handling
+6. **Testing**: Insufficient coverage, missing edge cases, no integration tests
+7. **Architecture**: Tight coupling, mixed concerns, no dependency injection
+
+Use these examples as reference when conducting reviews. Adapt the feedback style and technical depth to the codebase context.