zhongwei/gh-olino3-forge-forge-plugin

Fork 0

Files

Zhongwei Li 42c0b6ee81 Initial commit

2025-11-30 08:45:11 +08:00

12 KiB

Raw Permalink Blame History

Python Code Review Examples

This file contains example code review scenarios demonstrating common issues and recommended fixes.

Example 1: Security Vulnerability - SQL Injection

Before (Vulnerable Code)

# user_service.py:15
def get_user_by_email(email):
    query = f"SELECT * FROM users WHERE email = '{email}'"
    cursor.execute(query)
    return cursor.fetchone()

Review Comment

Severity: Critical Category: Security File: user_service.py:16

SQL injection vulnerability detected. User input is directly interpolated into SQL query, allowing attackers to execute arbitrary SQL commands.

Attack example:

email = "'; DROP TABLE users; --"
# Results in: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'

After (Fixed Code)

# user_service.py:15
def get_user_by_email(email):
    query = "SELECT * FROM users WHERE email = %s"
    cursor.execute(query, (email,))
    return cursor.fetchone()

Reference: OWASP A03:2021 - Injection

Example 2: Performance Issue - N+1 Query Problem (Django)

Before (Inefficient Code)

# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.all()  # 1 query
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name  # N additional queries!
        })
    return JsonResponse(result, safe=False)

Review Comment

Severity: Important Category: Performance File: views.py:48

N+1 query problem detected. For 100 posts, this executes 101 database queries (1 for posts + 100 for authors). This causes severe performance degradation under load.

After (Optimized Code)

# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.select_related('author').all()  # 1 query with JOIN
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name
        })
    return JsonResponse(result, safe=False)

Performance gain: 101 queries → 1 query (100x improvement for 100 posts)

Reference: Django QuerySet optimization

Example 3: Code Quality - Mutable Default Argument

Before (Buggy Code)

# utils.py:22
def add_item(item, items=[]):
    items.append(item)
    return items

# Usage that reveals the bug:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['a', 'b'] - UNEXPECTED!

Review Comment

Severity: Important Category: Code Quality File: utils.py:22

Mutable default argument antipattern. The default list [] is created once when the function is defined, not each time it's called. All invocations share the same list object, causing unexpected state persistence.

After (Fixed Code)

# utils.py:22
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

# Now works correctly:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['b'] - CORRECT!

Reference: Common Python Gotchas

Example 4: PEP 8 Compliance - Naming Conventions

Before (Non-compliant Code)

# data_processor.py:10
def CalculateUserAge(BirthDate):
    CurrentYear = 2025
    user_birth_year = BirthDate.year
    AGE = CurrentYear - user_birth_year
    return AGE

Review Comment

Severity: Minor Category: Style File: data_processor.py:10-15

Multiple PEP 8 naming violations:

Function name should be snake_case, not PascalCase
Parameter name should be snake_case, not PascalCase
Local variables should be lowercase, not mixed case or UPPERCASE
UPPERCASE is reserved for constants

After (Compliant Code)

# data_processor.py:10
def calculate_user_age(birth_date):
    current_year = 2025
    user_birth_year = birth_date.year
    age = current_year - user_birth_year
    return age

Reference: PEP 8 - Naming Conventions

Example 5: Best Practice - Context Manager for Resource Handling

Before (Resource Leak Risk)

# file_processor.py:30
def process_log_file(filepath):
    file = open(filepath, 'r')
    data = file.read()
    results = analyze(data)
    file.close()  # May not execute if analyze() raises exception
    return results

Review Comment

Severity: Important Category: Best Practices File: file_processor.py:31

Missing context manager for file handling. If analyze() raises an exception, file.close() never executes, leaving the file handle open (resource leak).

After (Safe Code)

# file_processor.py:30
def process_log_file(filepath):
    with open(filepath, 'r') as file:
        data = file.read()
        results = analyze(data)
    # File automatically closed even if exception occurs
    return results

Bonus improvement:

# Even better with pathlib
from pathlib import Path

def process_log_file(filepath):
    data = Path(filepath).read_text()
    return analyze(data)

Reference: PEP 343 - The "with" Statement

Example 6: Security - Hardcoded Credentials

Before (Security Risk)

# config.py:5
DATABASE_CONFIG = {
    'host': 'prod-db.example.com',
    'user': 'admin',
    'password': 'SuperSecret123!',  # NEVER do this
    'database': 'production'
}

Review Comment

Severity: Critical Category: Security File: config.py:8

Hardcoded credentials detected. Passwords in source code:

Are visible to anyone with repository access
Get committed to version control history
Can't be rotated without code changes
May be exposed in logs or error messages

After (Secure Code)

# config.py:5
import os

DATABASE_CONFIG = {
    'host': os.getenv('DB_HOST', 'localhost'),
    'user': os.getenv('DB_USER'),
    'password': os.getenv('DB_PASSWORD'),
    'database': os.getenv('DB_NAME', 'production')
}

# Validate required environment variables
required_vars = ['DB_USER', 'DB_PASSWORD']
missing = [var for var in required_vars if not os.getenv(var)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")

Additional security:

# Use environment files (not committed to git)
echo "DB_PASSWORD=..." > .env
echo ".env" >> .gitignore

Reference: OWASP A07:2021 - Identification and Authentication Failures

Example 7: Performance - Pandas Optimization

Before (Inefficient Code)

# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Anti-pattern: Iterating over DataFrame rows
    discounts = []
    for index, row in df.iterrows():
        if row['total'] > 100:
            discount = row['total'] * 0.1
        else:
            discount = 0
        discounts.append(discount)
    df['discount'] = discounts
    return df

Review Comment

Severity: Important Category: Performance File: data_analysis.py:53

Using iterrows() on DataFrame - this is one of the slowest operations in pandas. For 10,000 rows, this can be 100x slower than vectorized operations.

After (Vectorized Code)

# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Vectorized operation - operates on entire column at once
    df['discount'] = (df['total'] * 0.1).where(df['total'] > 100, 0)
    return df

# Alternative using numpy where:
import numpy as np

def calculate_discounts(df):
    df['discount'] = np.where(df['total'] > 100, df['total'] * 0.1, 0)
    return df

Performance: Vectorized operations use optimized C code, achieving 50-100x speedup on large datasets.

Reference: Pandas Performance Optimization

Example 8: Testing - Missing Edge Cases

Before (Incomplete Tests)

# test_validators.py:15
def test_email_validation():
    assert is_valid_email('user@example.com') == True
    assert is_valid_email('invalid-email') == False

Review Comment

Severity: Important Category: Testing File: test_validators.py:15

Email validation tests are insufficient. Missing edge cases:

Empty string
None value
Email with special characters
Multiple @ symbols
Missing domain
Whitespace handling
Maximum length validation

After (Comprehensive Tests)

# test_validators.py:15
import pytest

@pytest.mark.parametrize('email,expected', [
    # Valid emails
    ('user@example.com', True),
    ('first.last@example.co.uk', True),
    ('user+tag@example.com', True),

    # Invalid emails
    ('invalid-email', False),
    ('', False),
    ('user@', False),
    ('user@@example.com', False),
    ('@example.com', False),
    ('user @example.com', False),
    ('a' * 256 + '@example.com', False),  # Too long
])
def test_email_validation(email, expected):
    assert is_valid_email(email) == expected

def test_email_validation_with_none():
    with pytest.raises(TypeError):
        is_valid_email(None)

Reference: Testing Best Practices

Example 9: Architecture - Separation of Concerns (FastAPI)

Before (Tightly Coupled Code)

# main.py:25
from fastapi import FastAPI
import psycopg2

app = FastAPI()

@app.get('/users/{user_id}')
def get_user(user_id: int):
    # Business logic mixed with data access and presentation
    conn = psycopg2.connect("dbname=mydb user=admin password=secret")
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
    user = cursor.fetchone()
    conn.close()

    if user:
        return {'id': user[0], 'name': user[1], 'email': user[2]}
    return {'error': 'User not found'}

Review Comment

Severity: Important Category: Architecture File: main.py:25-38

Multiple violations of separation of concerns:

Database connection logic in route handler
SQL injection vulnerability
Hardcoded credentials
No error handling
Manual dict construction
No dependency injection

After (Layered Architecture)

# models.py
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# database.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import os

SQLALCHEMY_DATABASE_URL = os.getenv('DATABASE_URL')
engine = create_engine(SQLALCHEMY_DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# repositories.py
from sqlalchemy.orm import Session
from . import models

class UserRepository:
    def get_by_id(self, db: Session, user_id: int):
        return db.query(models.User).filter(models.User.id == user_id).first()

# main.py
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from . import models, database, repositories

app = FastAPI()
user_repo = UserRepository()

@app.get('/users/{user_id}', response_model=models.User)
def get_user(user_id: int, db: Session = Depends(database.get_db)):
    user = user_repo.get_by_id(db, user_id)
    if not user:
        raise HTTPException(status_code=404, detail='User not found')
    return user

Benefits:

Clear separation of concerns
Dependency injection
Type safety with Pydantic
SQL injection protection via ORM
Reusable repository pattern
Proper error handling

Reference: FastAPI Best Practices, Repository Pattern

Summary of Common Issues

Security: SQL injection, XSS, hardcoded credentials, insecure cryptography
Performance: N+1 queries, inefficient loops, missing indexes, no caching
Code Quality: Mutable defaults, global state, poor naming, missing docstrings
Style: PEP 8 violations, inconsistent formatting, magic numbers
Best Practices: Missing context managers, no type hints, poor error handling
Testing: Insufficient coverage, missing edge cases, no integration tests
Architecture: Tight coupling, mixed concerns, no dependency injection

Use these examples as reference when conducting reviews. Adapt the feedback style and technical depth to the codebase context.

12 KiB Raw Permalink Blame History

Python Code Review Examples

Example 1: Security Vulnerability - SQL Injection

Before (Vulnerable Code)

Review Comment

After (Fixed Code)

Example 2: Performance Issue - N+1 Query Problem (Django)

Before (Inefficient Code)

Review Comment

After (Optimized Code)

Example 3: Code Quality - Mutable Default Argument

Before (Buggy Code)

Review Comment

After (Fixed Code)

Example 4: PEP 8 Compliance - Naming Conventions

Before (Non-compliant Code)

Review Comment

After (Compliant Code)

Example 5: Best Practice - Context Manager for Resource Handling

Before (Resource Leak Risk)

Review Comment

After (Safe Code)

Example 6: Security - Hardcoded Credentials

Before (Security Risk)

Review Comment

After (Secure Code)

Example 7: Performance - Pandas Optimization

Before (Inefficient Code)

Review Comment

After (Vectorized Code)

Example 8: Testing - Missing Edge Cases

Before (Incomplete Tests)

Review Comment

After (Comprehensive Tests)

Example 9: Architecture - Separation of Concerns (FastAPI)

Before (Tightly Coupled Code)

Review Comment

After (Layered Architecture)

Summary of Common Issues

12 KiB

Raw Permalink Blame History