gh-olino3-forge-forge-plugin/skills/python-code-review/examples.md

# Python Code Review Examples

This file contains example code review scenarios demonstrating common issues and recommended fixes.

## Example 1: Security Vulnerability - SQL Injection

### Before (Vulnerable Code)

```python
# user_service.py:15
def get_user_by_email(email):
    query = f"SELECT * FROM users WHERE email = '{email}'"
    cursor.execute(query)
    return cursor.fetchone()
```

### Review Comment

**Severity**: Critical
**Category**: Security
**File**: user_service.py:16

SQL injection vulnerability detected. User input is directly interpolated into SQL query, allowing attackers to execute arbitrary SQL commands.

**Attack example**:
```python
email = "'; DROP TABLE users; --"
# Results in: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'
```

### After (Fixed Code)

```python
# user_service.py:15
def get_user_by_email(email):
    query = "SELECT * FROM users WHERE email = %s"
    cursor.execute(query, (email,))
    return cursor.fetchone()
```

**Reference**: OWASP A03:2021 - Injection

---

## Example 2: Performance Issue - N+1 Query Problem (Django)

### Before (Inefficient Code)

```python
# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.all()  # 1 query
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name  # N additional queries!
        })
    return JsonResponse(result, safe=False)
```

### Review Comment

**Severity**: Important
**Category**: Performance
**File**: views.py:48

N+1 query problem detected. For 100 posts, this executes 101 database queries (1 for posts + 100 for authors). This causes severe performance degradation under load.

### After (Optimized Code)

```python
# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.select_related('author').all()  # 1 query with JOIN
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name
        })
    return JsonResponse(result, safe=False)
```

**Performance gain**: 101 queries → 1 query (100x improvement for 100 posts)

**Reference**: Django QuerySet optimization

---

## Example 3: Code Quality - Mutable Default Argument

### Before (Buggy Code)

```python
# utils.py:22
def add_item(item, items=[]):
    items.append(item)
    return items

# Usage that reveals the bug:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['a', 'b'] - UNEXPECTED!
```

### Review Comment

**Severity**: Important
**Category**: Code Quality
**File**: utils.py:22

Mutable default argument antipattern. The default list `[]` is created once when the function is defined, not each time it's called. All invocations share the same list object, causing unexpected state persistence.

### After (Fixed Code)

```python
# utils.py:22
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

# Now works correctly:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['b'] - CORRECT!
```

**Reference**: Common Python Gotchas

---

## Example 4: PEP 8 Compliance - Naming Conventions

### Before (Non-compliant Code)

```python
# data_processor.py:10
def CalculateUserAge(BirthDate):
    CurrentYear = 2025
    user_birth_year = BirthDate.year
    AGE = CurrentYear - user_birth_year
    return AGE
```

### Review Comment

**Severity**: Minor
**Category**: Style
**File**: data_processor.py:10-15

Multiple PEP 8 naming violations:
- Function name should be `snake_case`, not `PascalCase`
- Parameter name should be `snake_case`, not `PascalCase`
- Local variables should be lowercase, not mixed case or UPPERCASE
- UPPERCASE is reserved for constants

### After (Compliant Code)

```python
# data_processor.py:10
def calculate_user_age(birth_date):
    current_year = 2025
    user_birth_year = birth_date.year
    age = current_year - user_birth_year
    return age
```

**Reference**: PEP 8 - Naming Conventions

---

## Example 5: Best Practice - Context Manager for Resource Handling

### Before (Resource Leak Risk)

```python
# file_processor.py:30
def process_log_file(filepath):
    file = open(filepath, 'r')
    data = file.read()
    results = analyze(data)
    file.close()  # May not execute if analyze() raises exception
    return results
```

### Review Comment

**Severity**: Important
**Category**: Best Practices
**File**: file_processor.py:31

Missing context manager for file handling. If `analyze()` raises an exception, `file.close()` never executes, leaving the file handle open (resource leak).

### After (Safe Code)

```python
# file_processor.py:30
def process_log_file(filepath):
    with open(filepath, 'r') as file:
        data = file.read()
        results = analyze(data)
    # File automatically closed even if exception occurs
    return results
```

**Bonus improvement**:
```python
# Even better with pathlib
from pathlib import Path

def process_log_file(filepath):
    data = Path(filepath).read_text()
    return analyze(data)
```

**Reference**: PEP 343 - The "with" Statement

---

## Example 6: Security - Hardcoded Credentials

### Before (Security Risk)

```python
# config.py:5
DATABASE_CONFIG = {
    'host': 'prod-db.example.com',
    'user': 'admin',
    'password': 'SuperSecret123!',  # NEVER do this
    'database': 'production'
}
```

### Review Comment

**Severity**: Critical
**Category**: Security
**File**: config.py:8

Hardcoded credentials detected. Passwords in source code:
1. Are visible to anyone with repository access
2. Get committed to version control history
3. Can't be rotated without code changes
4. May be exposed in logs or error messages

### After (Secure Code)

```python
# config.py:5
import os

DATABASE_CONFIG = {
    'host': os.getenv('DB_HOST', 'localhost'),
    'user': os.getenv('DB_USER'),
    'password': os.getenv('DB_PASSWORD'),
    'database': os.getenv('DB_NAME', 'production')
}

# Validate required environment variables
required_vars = ['DB_USER', 'DB_PASSWORD']
missing = [var for var in required_vars if not os.getenv(var)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")
```

**Additional security**:
```bash
# Use environment files (not committed to git)
echo "DB_PASSWORD=..." > .env
echo ".env" >> .gitignore
```

**Reference**: OWASP A07:2021 - Identification and Authentication Failures

---

## Example 7: Performance - Pandas Optimization

### Before (Inefficient Code)

```python
# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Anti-pattern: Iterating over DataFrame rows
    discounts = []
    for index, row in df.iterrows():
        if row['total'] > 100:
            discount = row['total'] * 0.1
        else:
            discount = 0
        discounts.append(discount)
    df['discount'] = discounts
    return df
```

### Review Comment

**Severity**: Important
**Category**: Performance
**File**: data_analysis.py:53

Using `iterrows()` on DataFrame - this is one of the slowest operations in pandas. For 10,000 rows, this can be 100x slower than vectorized operations.

### After (Vectorized Code)

```python
# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Vectorized operation - operates on entire column at once
    df['discount'] = (df['total'] * 0.1).where(df['total'] > 100, 0)
    return df

# Alternative using numpy where:
import numpy as np

def calculate_discounts(df):
    df['discount'] = np.where(df['total'] > 100, df['total'] * 0.1, 0)
    return df
```

**Performance**: Vectorized operations use optimized C code, achieving 50-100x speedup on large datasets.

**Reference**: Pandas Performance Optimization

---

## Example 8: Testing - Missing Edge Cases

### Before (Incomplete Tests)

```python
# test_validators.py:15
def test_email_validation():
    assert is_valid_email('user@example.com') == True
    assert is_valid_email('invalid-email') == False
```

### Review Comment

**Severity**: Important
**Category**: Testing
**File**: test_validators.py:15

Email validation tests are insufficient. Missing edge cases:
- Empty string
- None value
- Email with special characters
- Multiple @ symbols
- Missing domain
- Whitespace handling
- Maximum length validation

### After (Comprehensive Tests)

```python
# test_validators.py:15
import pytest

@pytest.mark.parametrize('email,expected', [
    # Valid emails
    ('user@example.com', True),
    ('first.last@example.co.uk', True),
    ('user+tag@example.com', True),

    # Invalid emails
    ('invalid-email', False),
    ('', False),
    ('user@', False),
    ('user@@example.com', False),
    ('@example.com', False),
    ('user @example.com', False),
    ('a' * 256 + '@example.com', False),  # Too long
])
def test_email_validation(email, expected):
    assert is_valid_email(email) == expected

def test_email_validation_with_none():
    with pytest.raises(TypeError):
        is_valid_email(None)
```

**Reference**: Testing Best Practices

---

## Example 9: Architecture - Separation of Concerns (FastAPI)

### Before (Tightly Coupled Code)

```python
# main.py:25
from fastapi import FastAPI
import psycopg2

app = FastAPI()

@app.get('/users/{user_id}')
def get_user(user_id: int):
    # Business logic mixed with data access and presentation
    conn = psycopg2.connect("dbname=mydb user=admin password=secret")
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
    user = cursor.fetchone()
    conn.close()

    if user:
        return {'id': user[0], 'name': user[1], 'email': user[2]}
    return {'error': 'User not found'}
```

### Review Comment

**Severity**: Important
**Category**: Architecture
**File**: main.py:25-38

Multiple violations of separation of concerns:
1. Database connection logic in route handler
2. SQL injection vulnerability
3. Hardcoded credentials
4. No error handling
5. Manual dict construction
6. No dependency injection

### After (Layered Architecture)

```python
# models.py
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# database.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import os

SQLALCHEMY_DATABASE_URL = os.getenv('DATABASE_URL')
engine = create_engine(SQLALCHEMY_DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# repositories.py
from sqlalchemy.orm import Session
from . import models

class UserRepository:
    def get_by_id(self, db: Session, user_id: int):
        return db.query(models.User).filter(models.User.id == user_id).first()

# main.py
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from . import models, database, repositories

app = FastAPI()
user_repo = UserRepository()

@app.get('/users/{user_id}', response_model=models.User)
def get_user(user_id: int, db: Session = Depends(database.get_db)):
    user = user_repo.get_by_id(db, user_id)
    if not user:
        raise HTTPException(status_code=404, detail='User not found')
    return user
```

**Benefits**:
- Clear separation of concerns
- Dependency injection
- Type safety with Pydantic
- SQL injection protection via ORM
- Reusable repository pattern
- Proper error handling

**Reference**: FastAPI Best Practices, Repository Pattern

---

## Summary of Common Issues

1. **Security**: SQL injection, XSS, hardcoded credentials, insecure cryptography
2. **Performance**: N+1 queries, inefficient loops, missing indexes, no caching
3. **Code Quality**: Mutable defaults, global state, poor naming, missing docstrings
4. **Style**: PEP 8 violations, inconsistent formatting, magic numbers
5. **Best Practices**: Missing context managers, no type hints, poor error handling
6. **Testing**: Insufficient coverage, missing edge cases, no integration tests
7. **Architecture**: Tight coupling, mixed concerns, no dependency injection

Use these examples as reference when conducting reviews. Adapt the feedback style and technical depth to the codebase context.