Files
2025-11-30 08:45:11 +08:00

12 KiB

Python Code Review Examples

This file contains example code review scenarios demonstrating common issues and recommended fixes.

Example 1: Security Vulnerability - SQL Injection

Before (Vulnerable Code)

# user_service.py:15
def get_user_by_email(email):
    query = f"SELECT * FROM users WHERE email = '{email}'"
    cursor.execute(query)
    return cursor.fetchone()

Review Comment

Severity: Critical Category: Security File: user_service.py:16

SQL injection vulnerability detected. User input is directly interpolated into SQL query, allowing attackers to execute arbitrary SQL commands.

Attack example:

email = "'; DROP TABLE users; --"
# Results in: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'

After (Fixed Code)

# user_service.py:15
def get_user_by_email(email):
    query = "SELECT * FROM users WHERE email = %s"
    cursor.execute(query, (email,))
    return cursor.fetchone()

Reference: OWASP A03:2021 - Injection


Example 2: Performance Issue - N+1 Query Problem (Django)

Before (Inefficient Code)

# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.all()  # 1 query
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name  # N additional queries!
        })
    return JsonResponse(result, safe=False)

Review Comment

Severity: Important Category: Performance File: views.py:48

N+1 query problem detected. For 100 posts, this executes 101 database queries (1 for posts + 100 for authors). This causes severe performance degradation under load.

After (Optimized Code)

# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.select_related('author').all()  # 1 query with JOIN
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name
        })
    return JsonResponse(result, safe=False)

Performance gain: 101 queries → 1 query (100x improvement for 100 posts)

Reference: Django QuerySet optimization


Example 3: Code Quality - Mutable Default Argument

Before (Buggy Code)

# utils.py:22
def add_item(item, items=[]):
    items.append(item)
    return items

# Usage that reveals the bug:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['a', 'b'] - UNEXPECTED!

Review Comment

Severity: Important Category: Code Quality File: utils.py:22

Mutable default argument antipattern. The default list [] is created once when the function is defined, not each time it's called. All invocations share the same list object, causing unexpected state persistence.

After (Fixed Code)

# utils.py:22
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

# Now works correctly:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['b'] - CORRECT!

Reference: Common Python Gotchas


Example 4: PEP 8 Compliance - Naming Conventions

Before (Non-compliant Code)

# data_processor.py:10
def CalculateUserAge(BirthDate):
    CurrentYear = 2025
    user_birth_year = BirthDate.year
    AGE = CurrentYear - user_birth_year
    return AGE

Review Comment

Severity: Minor Category: Style File: data_processor.py:10-15

Multiple PEP 8 naming violations:

  • Function name should be snake_case, not PascalCase
  • Parameter name should be snake_case, not PascalCase
  • Local variables should be lowercase, not mixed case or UPPERCASE
  • UPPERCASE is reserved for constants

After (Compliant Code)

# data_processor.py:10
def calculate_user_age(birth_date):
    current_year = 2025
    user_birth_year = birth_date.year
    age = current_year - user_birth_year
    return age

Reference: PEP 8 - Naming Conventions


Example 5: Best Practice - Context Manager for Resource Handling

Before (Resource Leak Risk)

# file_processor.py:30
def process_log_file(filepath):
    file = open(filepath, 'r')
    data = file.read()
    results = analyze(data)
    file.close()  # May not execute if analyze() raises exception
    return results

Review Comment

Severity: Important Category: Best Practices File: file_processor.py:31

Missing context manager for file handling. If analyze() raises an exception, file.close() never executes, leaving the file handle open (resource leak).

After (Safe Code)

# file_processor.py:30
def process_log_file(filepath):
    with open(filepath, 'r') as file:
        data = file.read()
        results = analyze(data)
    # File automatically closed even if exception occurs
    return results

Bonus improvement:

# Even better with pathlib
from pathlib import Path

def process_log_file(filepath):
    data = Path(filepath).read_text()
    return analyze(data)

Reference: PEP 343 - The "with" Statement


Example 6: Security - Hardcoded Credentials

Before (Security Risk)

# config.py:5
DATABASE_CONFIG = {
    'host': 'prod-db.example.com',
    'user': 'admin',
    'password': 'SuperSecret123!',  # NEVER do this
    'database': 'production'
}

Review Comment

Severity: Critical Category: Security File: config.py:8

Hardcoded credentials detected. Passwords in source code:

  1. Are visible to anyone with repository access
  2. Get committed to version control history
  3. Can't be rotated without code changes
  4. May be exposed in logs or error messages

After (Secure Code)

# config.py:5
import os

DATABASE_CONFIG = {
    'host': os.getenv('DB_HOST', 'localhost'),
    'user': os.getenv('DB_USER'),
    'password': os.getenv('DB_PASSWORD'),
    'database': os.getenv('DB_NAME', 'production')
}

# Validate required environment variables
required_vars = ['DB_USER', 'DB_PASSWORD']
missing = [var for var in required_vars if not os.getenv(var)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")

Additional security:

# Use environment files (not committed to git)
echo "DB_PASSWORD=..." > .env
echo ".env" >> .gitignore

Reference: OWASP A07:2021 - Identification and Authentication Failures


Example 7: Performance - Pandas Optimization

Before (Inefficient Code)

# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Anti-pattern: Iterating over DataFrame rows
    discounts = []
    for index, row in df.iterrows():
        if row['total'] > 100:
            discount = row['total'] * 0.1
        else:
            discount = 0
        discounts.append(discount)
    df['discount'] = discounts
    return df

Review Comment

Severity: Important Category: Performance File: data_analysis.py:53

Using iterrows() on DataFrame - this is one of the slowest operations in pandas. For 10,000 rows, this can be 100x slower than vectorized operations.

After (Vectorized Code)

# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Vectorized operation - operates on entire column at once
    df['discount'] = (df['total'] * 0.1).where(df['total'] > 100, 0)
    return df

# Alternative using numpy where:
import numpy as np

def calculate_discounts(df):
    df['discount'] = np.where(df['total'] > 100, df['total'] * 0.1, 0)
    return df

Performance: Vectorized operations use optimized C code, achieving 50-100x speedup on large datasets.

Reference: Pandas Performance Optimization


Example 8: Testing - Missing Edge Cases

Before (Incomplete Tests)

# test_validators.py:15
def test_email_validation():
    assert is_valid_email('user@example.com') == True
    assert is_valid_email('invalid-email') == False

Review Comment

Severity: Important Category: Testing File: test_validators.py:15

Email validation tests are insufficient. Missing edge cases:

  • Empty string
  • None value
  • Email with special characters
  • Multiple @ symbols
  • Missing domain
  • Whitespace handling
  • Maximum length validation

After (Comprehensive Tests)

# test_validators.py:15
import pytest

@pytest.mark.parametrize('email,expected', [
    # Valid emails
    ('user@example.com', True),
    ('first.last@example.co.uk', True),
    ('user+tag@example.com', True),

    # Invalid emails
    ('invalid-email', False),
    ('', False),
    ('user@', False),
    ('user@@example.com', False),
    ('@example.com', False),
    ('user @example.com', False),
    ('a' * 256 + '@example.com', False),  # Too long
])
def test_email_validation(email, expected):
    assert is_valid_email(email) == expected

def test_email_validation_with_none():
    with pytest.raises(TypeError):
        is_valid_email(None)

Reference: Testing Best Practices


Example 9: Architecture - Separation of Concerns (FastAPI)

Before (Tightly Coupled Code)

# main.py:25
from fastapi import FastAPI
import psycopg2

app = FastAPI()

@app.get('/users/{user_id}')
def get_user(user_id: int):
    # Business logic mixed with data access and presentation
    conn = psycopg2.connect("dbname=mydb user=admin password=secret")
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
    user = cursor.fetchone()
    conn.close()

    if user:
        return {'id': user[0], 'name': user[1], 'email': user[2]}
    return {'error': 'User not found'}

Review Comment

Severity: Important Category: Architecture File: main.py:25-38

Multiple violations of separation of concerns:

  1. Database connection logic in route handler
  2. SQL injection vulnerability
  3. Hardcoded credentials
  4. No error handling
  5. Manual dict construction
  6. No dependency injection

After (Layered Architecture)

# models.py
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# database.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import os

SQLALCHEMY_DATABASE_URL = os.getenv('DATABASE_URL')
engine = create_engine(SQLALCHEMY_DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# repositories.py
from sqlalchemy.orm import Session
from . import models

class UserRepository:
    def get_by_id(self, db: Session, user_id: int):
        return db.query(models.User).filter(models.User.id == user_id).first()

# main.py
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from . import models, database, repositories

app = FastAPI()
user_repo = UserRepository()

@app.get('/users/{user_id}', response_model=models.User)
def get_user(user_id: int, db: Session = Depends(database.get_db)):
    user = user_repo.get_by_id(db, user_id)
    if not user:
        raise HTTPException(status_code=404, detail='User not found')
    return user

Benefits:

  • Clear separation of concerns
  • Dependency injection
  • Type safety with Pydantic
  • SQL injection protection via ORM
  • Reusable repository pattern
  • Proper error handling

Reference: FastAPI Best Practices, Repository Pattern


Summary of Common Issues

  1. Security: SQL injection, XSS, hardcoded credentials, insecure cryptography
  2. Performance: N+1 queries, inefficient loops, missing indexes, no caching
  3. Code Quality: Mutable defaults, global state, poor naming, missing docstrings
  4. Style: PEP 8 violations, inconsistent formatting, magic numbers
  5. Best Practices: Missing context managers, no type hints, poor error handling
  6. Testing: Insufficient coverage, missing edge cases, no integration tests
  7. Architecture: Tight coupling, mixed concerns, no dependency injection

Use these examples as reference when conducting reviews. Adapt the feedback style and technical depth to the codebase context.