504 lines
12 KiB
Markdown
504 lines
12 KiB
Markdown
# Python Code Review Examples
|
|
|
|
This file contains example code review scenarios demonstrating common issues and recommended fixes.
|
|
|
|
## Example 1: Security Vulnerability - SQL Injection
|
|
|
|
### Before (Vulnerable Code)
|
|
|
|
```python
|
|
# user_service.py:15
|
|
def get_user_by_email(email):
|
|
query = f"SELECT * FROM users WHERE email = '{email}'"
|
|
cursor.execute(query)
|
|
return cursor.fetchone()
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Critical
|
|
**Category**: Security
|
|
**File**: user_service.py:16
|
|
|
|
SQL injection vulnerability detected. User input is directly interpolated into SQL query, allowing attackers to execute arbitrary SQL commands.
|
|
|
|
**Attack example**:
|
|
```python
|
|
email = "'; DROP TABLE users; --"
|
|
# Results in: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'
|
|
```
|
|
|
|
### After (Fixed Code)
|
|
|
|
```python
|
|
# user_service.py:15
|
|
def get_user_by_email(email):
|
|
query = "SELECT * FROM users WHERE email = %s"
|
|
cursor.execute(query, (email,))
|
|
return cursor.fetchone()
|
|
```
|
|
|
|
**Reference**: OWASP A03:2021 - Injection
|
|
|
|
---
|
|
|
|
## Example 2: Performance Issue - N+1 Query Problem (Django)
|
|
|
|
### Before (Inefficient Code)
|
|
|
|
```python
|
|
# views.py:45
|
|
def get_posts_with_authors(request):
|
|
posts = Post.objects.all() # 1 query
|
|
result = []
|
|
for post in posts:
|
|
result.append({
|
|
'title': post.title,
|
|
'author': post.author.name # N additional queries!
|
|
})
|
|
return JsonResponse(result, safe=False)
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Important
|
|
**Category**: Performance
|
|
**File**: views.py:48
|
|
|
|
N+1 query problem detected. For 100 posts, this executes 101 database queries (1 for posts + 100 for authors). This causes severe performance degradation under load.
|
|
|
|
### After (Optimized Code)
|
|
|
|
```python
|
|
# views.py:45
|
|
def get_posts_with_authors(request):
|
|
posts = Post.objects.select_related('author').all() # 1 query with JOIN
|
|
result = []
|
|
for post in posts:
|
|
result.append({
|
|
'title': post.title,
|
|
'author': post.author.name
|
|
})
|
|
return JsonResponse(result, safe=False)
|
|
```
|
|
|
|
**Performance gain**: 101 queries → 1 query (100x improvement for 100 posts)
|
|
|
|
**Reference**: Django QuerySet optimization
|
|
|
|
---
|
|
|
|
## Example 3: Code Quality - Mutable Default Argument
|
|
|
|
### Before (Buggy Code)
|
|
|
|
```python
|
|
# utils.py:22
|
|
def add_item(item, items=[]):
|
|
items.append(item)
|
|
return items
|
|
|
|
# Usage that reveals the bug:
|
|
list1 = add_item('a') # ['a']
|
|
list2 = add_item('b') # ['a', 'b'] - UNEXPECTED!
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Important
|
|
**Category**: Code Quality
|
|
**File**: utils.py:22
|
|
|
|
Mutable default argument antipattern. The default list `[]` is created once when the function is defined, not each time it's called. All invocations share the same list object, causing unexpected state persistence.
|
|
|
|
### After (Fixed Code)
|
|
|
|
```python
|
|
# utils.py:22
|
|
def add_item(item, items=None):
|
|
if items is None:
|
|
items = []
|
|
items.append(item)
|
|
return items
|
|
|
|
# Now works correctly:
|
|
list1 = add_item('a') # ['a']
|
|
list2 = add_item('b') # ['b'] - CORRECT!
|
|
```
|
|
|
|
**Reference**: Common Python Gotchas
|
|
|
|
---
|
|
|
|
## Example 4: PEP 8 Compliance - Naming Conventions
|
|
|
|
### Before (Non-compliant Code)
|
|
|
|
```python
|
|
# data_processor.py:10
|
|
def CalculateUserAge(BirthDate):
|
|
CurrentYear = 2025
|
|
user_birth_year = BirthDate.year
|
|
AGE = CurrentYear - user_birth_year
|
|
return AGE
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Minor
|
|
**Category**: Style
|
|
**File**: data_processor.py:10-15
|
|
|
|
Multiple PEP 8 naming violations:
|
|
- Function name should be `snake_case`, not `PascalCase`
|
|
- Parameter name should be `snake_case`, not `PascalCase`
|
|
- Local variables should be lowercase, not mixed case or UPPERCASE
|
|
- UPPERCASE is reserved for constants
|
|
|
|
### After (Compliant Code)
|
|
|
|
```python
|
|
# data_processor.py:10
|
|
def calculate_user_age(birth_date):
|
|
current_year = 2025
|
|
user_birth_year = birth_date.year
|
|
age = current_year - user_birth_year
|
|
return age
|
|
```
|
|
|
|
**Reference**: PEP 8 - Naming Conventions
|
|
|
|
---
|
|
|
|
## Example 5: Best Practice - Context Manager for Resource Handling
|
|
|
|
### Before (Resource Leak Risk)
|
|
|
|
```python
|
|
# file_processor.py:30
|
|
def process_log_file(filepath):
|
|
file = open(filepath, 'r')
|
|
data = file.read()
|
|
results = analyze(data)
|
|
file.close() # May not execute if analyze() raises exception
|
|
return results
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Important
|
|
**Category**: Best Practices
|
|
**File**: file_processor.py:31
|
|
|
|
Missing context manager for file handling. If `analyze()` raises an exception, `file.close()` never executes, leaving the file handle open (resource leak).
|
|
|
|
### After (Safe Code)
|
|
|
|
```python
|
|
# file_processor.py:30
|
|
def process_log_file(filepath):
|
|
with open(filepath, 'r') as file:
|
|
data = file.read()
|
|
results = analyze(data)
|
|
# File automatically closed even if exception occurs
|
|
return results
|
|
```
|
|
|
|
**Bonus improvement**:
|
|
```python
|
|
# Even better with pathlib
|
|
from pathlib import Path
|
|
|
|
def process_log_file(filepath):
|
|
data = Path(filepath).read_text()
|
|
return analyze(data)
|
|
```
|
|
|
|
**Reference**: PEP 343 - The "with" Statement
|
|
|
|
---
|
|
|
|
## Example 6: Security - Hardcoded Credentials
|
|
|
|
### Before (Security Risk)
|
|
|
|
```python
|
|
# config.py:5
|
|
DATABASE_CONFIG = {
|
|
'host': 'prod-db.example.com',
|
|
'user': 'admin',
|
|
'password': 'SuperSecret123!', # NEVER do this
|
|
'database': 'production'
|
|
}
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Critical
|
|
**Category**: Security
|
|
**File**: config.py:8
|
|
|
|
Hardcoded credentials detected. Passwords in source code:
|
|
1. Are visible to anyone with repository access
|
|
2. Get committed to version control history
|
|
3. Can't be rotated without code changes
|
|
4. May be exposed in logs or error messages
|
|
|
|
### After (Secure Code)
|
|
|
|
```python
|
|
# config.py:5
|
|
import os
|
|
|
|
DATABASE_CONFIG = {
|
|
'host': os.getenv('DB_HOST', 'localhost'),
|
|
'user': os.getenv('DB_USER'),
|
|
'password': os.getenv('DB_PASSWORD'),
|
|
'database': os.getenv('DB_NAME', 'production')
|
|
}
|
|
|
|
# Validate required environment variables
|
|
required_vars = ['DB_USER', 'DB_PASSWORD']
|
|
missing = [var for var in required_vars if not os.getenv(var)]
|
|
if missing:
|
|
raise RuntimeError(f"Missing required environment variables: {missing}")
|
|
```
|
|
|
|
**Additional security**:
|
|
```bash
|
|
# Use environment files (not committed to git)
|
|
echo "DB_PASSWORD=..." > .env
|
|
echo ".env" >> .gitignore
|
|
```
|
|
|
|
**Reference**: OWASP A07:2021 - Identification and Authentication Failures
|
|
|
|
---
|
|
|
|
## Example 7: Performance - Pandas Optimization
|
|
|
|
### Before (Inefficient Code)
|
|
|
|
```python
|
|
# data_analysis.py:50
|
|
import pandas as pd
|
|
|
|
def calculate_discounts(df):
|
|
# Anti-pattern: Iterating over DataFrame rows
|
|
discounts = []
|
|
for index, row in df.iterrows():
|
|
if row['total'] > 100:
|
|
discount = row['total'] * 0.1
|
|
else:
|
|
discount = 0
|
|
discounts.append(discount)
|
|
df['discount'] = discounts
|
|
return df
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Important
|
|
**Category**: Performance
|
|
**File**: data_analysis.py:53
|
|
|
|
Using `iterrows()` on DataFrame - this is one of the slowest operations in pandas. For 10,000 rows, this can be 100x slower than vectorized operations.
|
|
|
|
### After (Vectorized Code)
|
|
|
|
```python
|
|
# data_analysis.py:50
|
|
import pandas as pd
|
|
|
|
def calculate_discounts(df):
|
|
# Vectorized operation - operates on entire column at once
|
|
df['discount'] = (df['total'] * 0.1).where(df['total'] > 100, 0)
|
|
return df
|
|
|
|
# Alternative using numpy where:
|
|
import numpy as np
|
|
|
|
def calculate_discounts(df):
|
|
df['discount'] = np.where(df['total'] > 100, df['total'] * 0.1, 0)
|
|
return df
|
|
```
|
|
|
|
**Performance**: Vectorized operations use optimized C code, achieving 50-100x speedup on large datasets.
|
|
|
|
**Reference**: Pandas Performance Optimization
|
|
|
|
---
|
|
|
|
## Example 8: Testing - Missing Edge Cases
|
|
|
|
### Before (Incomplete Tests)
|
|
|
|
```python
|
|
# test_validators.py:15
|
|
def test_email_validation():
|
|
assert is_valid_email('user@example.com') == True
|
|
assert is_valid_email('invalid-email') == False
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Important
|
|
**Category**: Testing
|
|
**File**: test_validators.py:15
|
|
|
|
Email validation tests are insufficient. Missing edge cases:
|
|
- Empty string
|
|
- None value
|
|
- Email with special characters
|
|
- Multiple @ symbols
|
|
- Missing domain
|
|
- Whitespace handling
|
|
- Maximum length validation
|
|
|
|
### After (Comprehensive Tests)
|
|
|
|
```python
|
|
# test_validators.py:15
|
|
import pytest
|
|
|
|
@pytest.mark.parametrize('email,expected', [
|
|
# Valid emails
|
|
('user@example.com', True),
|
|
('first.last@example.co.uk', True),
|
|
('user+tag@example.com', True),
|
|
|
|
# Invalid emails
|
|
('invalid-email', False),
|
|
('', False),
|
|
('user@', False),
|
|
('user@@example.com', False),
|
|
('@example.com', False),
|
|
('user @example.com', False),
|
|
('a' * 256 + '@example.com', False), # Too long
|
|
])
|
|
def test_email_validation(email, expected):
|
|
assert is_valid_email(email) == expected
|
|
|
|
def test_email_validation_with_none():
|
|
with pytest.raises(TypeError):
|
|
is_valid_email(None)
|
|
```
|
|
|
|
**Reference**: Testing Best Practices
|
|
|
|
---
|
|
|
|
## Example 9: Architecture - Separation of Concerns (FastAPI)
|
|
|
|
### Before (Tightly Coupled Code)
|
|
|
|
```python
|
|
# main.py:25
|
|
from fastapi import FastAPI
|
|
import psycopg2
|
|
|
|
app = FastAPI()
|
|
|
|
@app.get('/users/{user_id}')
|
|
def get_user(user_id: int):
|
|
# Business logic mixed with data access and presentation
|
|
conn = psycopg2.connect("dbname=mydb user=admin password=secret")
|
|
cursor = conn.cursor()
|
|
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
|
|
user = cursor.fetchone()
|
|
conn.close()
|
|
|
|
if user:
|
|
return {'id': user[0], 'name': user[1], 'email': user[2]}
|
|
return {'error': 'User not found'}
|
|
```
|
|
|
|
### Review Comment
|
|
|
|
**Severity**: Important
|
|
**Category**: Architecture
|
|
**File**: main.py:25-38
|
|
|
|
Multiple violations of separation of concerns:
|
|
1. Database connection logic in route handler
|
|
2. SQL injection vulnerability
|
|
3. Hardcoded credentials
|
|
4. No error handling
|
|
5. Manual dict construction
|
|
6. No dependency injection
|
|
|
|
### After (Layered Architecture)
|
|
|
|
```python
|
|
# models.py
|
|
from pydantic import BaseModel
|
|
|
|
class User(BaseModel):
|
|
id: int
|
|
name: str
|
|
email: str
|
|
|
|
# database.py
|
|
from sqlalchemy import create_engine
|
|
from sqlalchemy.orm import sessionmaker
|
|
import os
|
|
|
|
SQLALCHEMY_DATABASE_URL = os.getenv('DATABASE_URL')
|
|
engine = create_engine(SQLALCHEMY_DATABASE_URL)
|
|
SessionLocal = sessionmaker(bind=engine)
|
|
|
|
def get_db():
|
|
db = SessionLocal()
|
|
try:
|
|
yield db
|
|
finally:
|
|
db.close()
|
|
|
|
# repositories.py
|
|
from sqlalchemy.orm import Session
|
|
from . import models
|
|
|
|
class UserRepository:
|
|
def get_by_id(self, db: Session, user_id: int):
|
|
return db.query(models.User).filter(models.User.id == user_id).first()
|
|
|
|
# main.py
|
|
from fastapi import FastAPI, Depends, HTTPException
|
|
from sqlalchemy.orm import Session
|
|
from . import models, database, repositories
|
|
|
|
app = FastAPI()
|
|
user_repo = UserRepository()
|
|
|
|
@app.get('/users/{user_id}', response_model=models.User)
|
|
def get_user(user_id: int, db: Session = Depends(database.get_db)):
|
|
user = user_repo.get_by_id(db, user_id)
|
|
if not user:
|
|
raise HTTPException(status_code=404, detail='User not found')
|
|
return user
|
|
```
|
|
|
|
**Benefits**:
|
|
- Clear separation of concerns
|
|
- Dependency injection
|
|
- Type safety with Pydantic
|
|
- SQL injection protection via ORM
|
|
- Reusable repository pattern
|
|
- Proper error handling
|
|
|
|
**Reference**: FastAPI Best Practices, Repository Pattern
|
|
|
|
---
|
|
|
|
## Summary of Common Issues
|
|
|
|
1. **Security**: SQL injection, XSS, hardcoded credentials, insecure cryptography
|
|
2. **Performance**: N+1 queries, inefficient loops, missing indexes, no caching
|
|
3. **Code Quality**: Mutable defaults, global state, poor naming, missing docstrings
|
|
4. **Style**: PEP 8 violations, inconsistent formatting, magic numbers
|
|
5. **Best Practices**: Missing context managers, no type hints, poor error handling
|
|
6. **Testing**: Insufficient coverage, missing edge cases, no integration tests
|
|
7. **Architecture**: Tight coupling, mixed concerns, no dependency injection
|
|
|
|
Use these examples as reference when conducting reviews. Adapt the feedback style and technical depth to the codebase context.
|