# Python Code Review Examples

This file contains example code review scenarios demonstrating common issues and recommended fixes.

## Example 1: Security Vulnerability - SQL Injection

### Before (Vulnerable Code)

```python
# user_service.py:15
def get_user_by_email(email):
    query = f"SELECT * FROM users WHERE email = '{email}'"
    cursor.execute(query)
    return cursor.fetchone()
```

### Review Comment

**Severity**: Critical
**Category**: Security
**File**: user_service.py:16

SQL injection vulnerability detected. User input is interpolated directly into the SQL query, allowing attackers to execute arbitrary SQL commands.

**Attack example**:

```python
email = "'; DROP TABLE users; --"
# Results in: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'
```

### After (Fixed Code)

```python
# user_service.py:15
def get_user_by_email(email):
    # Parameterized query: the driver passes `email` as data, never as SQL.
    # %s is the placeholder for psycopg2 and MySQL drivers; sqlite3 uses ?
    query = "SELECT * FROM users WHERE email = %s"
    cursor.execute(query, (email,))
    return cursor.fetchone()
```

**Reference**: OWASP A03:2021 - Injection

---

## Example 2: Performance Issue - N+1 Query Problem (Django)

### Before (Inefficient Code)

```python
# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.all()  # 1 query
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name  # N additional queries!
        })
    return JsonResponse(result, safe=False)
```

### Review Comment

**Severity**: Important
**Category**: Performance
**File**: views.py:48

N+1 query problem detected. For 100 posts, this executes 101 database queries (1 for posts + 100 for authors). The query count grows linearly with the number of posts, which causes severe performance degradation under load.
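
To make the 101-query count concrete, here is a minimal sketch that reproduces the same access pattern outside Django, using only the standard-library `sqlite3` module. The `post`/`author` schema and the names `CountingConnection`, `make_db`, `lazy_lookup`, and `joined_lookup` are hypothetical, invented for this illustration. The Django ORM generates its own SQL, so treat this as an analogy for what happens on each loop iteration, not as what Django literally runs.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that counts every statement sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_db(n_posts):
    """Build an in-memory database with one author per post (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)"
    )
    for i in range(n_posts):
        conn.execute("INSERT INTO author VALUES (?, ?)", (i, f"author-{i}"))
        conn.execute("INSERT INTO post VALUES (?, ?, ?)", (i, f"post-{i}", i))
    conn.commit()
    return conn


def lazy_lookup(db):
    """N+1 pattern: one query for the posts, then one more per post."""
    rows = db.execute("SELECT title, author_id FROM post").fetchall()
    result = []
    for title, author_id in rows:
        (name,) = db.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        result.append({"title": title, "author": name})
    return result


def joined_lookup(db):
    """Single query: a JOIN fetches each post together with its author."""
    rows = db.execute(
        "SELECT post.title, author.name FROM post "
        "JOIN author ON author.id = post.author_id"
    ).fetchall()
    return [{"title": title, "author": name} for title, name in rows]


if __name__ == "__main__":
    conn = make_db(100)
    for lookup in (lazy_lookup, joined_lookup):
        db = CountingConnection(conn)
        lookup(db)
        print(f"{lookup.__name__}: {db.queries} queries")
```

With 100 posts, the per-row version issues 101 statements and the joined version issues 1, while both return the same data.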
### After (Optimized Code)

```python
# views.py:45
def get_posts_with_authors(request):
    posts = Post.objects.select_related('author').all()  # 1 query with JOIN
    result = []
    for post in posts:
        result.append({
            'title': post.title,
            'author': post.author.name
        })
    return JsonResponse(result, safe=False)
```

**Performance gain**: 101 queries → 1 query (about 100x fewer queries for 100 posts)

**Reference**: Django QuerySet optimization

---

## Example 3: Code Quality - Mutable Default Argument

### Before (Buggy Code)

```python
# utils.py:22
def add_item(item, items=[]):
    items.append(item)
    return items

# Usage that reveals the bug:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['a', 'b'] - UNEXPECTED!
```

### Review Comment

**Severity**: Important
**Category**: Code Quality
**File**: utils.py:22

Mutable default argument antipattern. The default list `[]` is created once when the function is defined, not each time it's called. Every call that omits `items` shares the same list object, so state persists unexpectedly between calls.

### After (Fixed Code)

```python
# utils.py:22
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

# Now works correctly:
list1 = add_item('a')  # ['a']
list2 = add_item('b')  # ['b'] - CORRECT!
```

**Reference**: Common Python Gotchas

---

## Example 4: PEP 8 Compliance - Naming Conventions

### Before (Non-compliant Code)

```python
# data_processor.py:10
def CalculateUserAge(BirthDate):
    CurrentYear = 2025
    user_birth_year = BirthDate.year
    AGE = CurrentYear - user_birth_year
    return AGE
```

### Review Comment

**Severity**: Minor
**Category**: Style
**File**: data_processor.py:10-15

Multiple PEP 8 naming violations:

- Function name should be `snake_case`, not `PascalCase`
- Parameter name should be `snake_case`, not `PascalCase`
- Local variables should be lowercase, not mixed case or UPPERCASE
- UPPERCASE is reserved for constants

### After (Compliant Code)

```python
# data_processor.py:10
def calculate_user_age(birth_date):
    current_year = 2025
    user_birth_year = birth_date.year
    age = current_year - user_birth_year
    return age
```

**Reference**: PEP 8 - Naming Conventions

---

## Example 5: Best Practice - Context Manager for Resource Handling

### Before (Resource Leak Risk)

```python
# file_processor.py:30
def process_log_file(filepath):
    file = open(filepath, 'r')
    data = file.read()
    results = analyze(data)
    file.close()  # May not execute if analyze() raises exception
    return results
```

### Review Comment

**Severity**: Important
**Category**: Best Practices
**File**: file_processor.py:31

Missing context manager for file handling. If `analyze()` raises an exception, `file.close()` never executes, leaving the file handle open (resource leak).
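
The difference in cleanup behavior can be checked with a small stand-in object. The sketch below is illustrative only: `TrackedResource`, `failing_analyze`, and the two `process_*` functions are hypothetical names, and the resource merely records whether `close()` was called instead of wrapping a real file.

```python
class TrackedResource:
    """Hypothetical stand-in for a file handle that records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress the exception


def failing_analyze(resource):
    """Simulate analyze() raising partway through processing."""
    raise ValueError("analysis failed")


def process_without_with(resource):
    failing_analyze(resource)
    resource.close()  # skipped because failing_analyze raised


def process_with_with(resource):
    with resource:
        failing_analyze(resource)  # __exit__ still runs and closes the resource


def closed_after(process):
    """Run process on a fresh resource and report whether it ended up closed."""
    resource = TrackedResource()
    try:
        process(resource)
    except ValueError:
        pass
    return resource.closed


if __name__ == "__main__":
    print("manual close():", closed_after(process_without_with))
    print("with statement:", closed_after(process_with_with))
```

Here `closed_after(process_without_with)` returns `False` and `closed_after(process_with_with)` returns `True`, because `__exit__` runs even when the body raises.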
### After (Safe Code)

```python
# file_processor.py:30
def process_log_file(filepath):
    with open(filepath, 'r') as file:
        data = file.read()
        results = analyze(data)
    # File automatically closed even if exception occurs
    return results
```

**Bonus improvement**:

```python
# Even better with pathlib
from pathlib import Path

def process_log_file(filepath):
    data = Path(filepath).read_text()
    return analyze(data)
```

**Reference**: PEP 343 - The "with" Statement

---

## Example 6: Security - Hardcoded Credentials

### Before (Security Risk)

```python
# config.py:5
DATABASE_CONFIG = {
    'host': 'prod-db.example.com',
    'user': 'admin',
    'password': 'SuperSecret123!',  # NEVER do this
    'database': 'production'
}
```

### Review Comment

**Severity**: Critical
**Category**: Security
**File**: config.py:8

Hardcoded credentials detected. Passwords in source code:

1. Are visible to anyone with repository access
2. Get committed to version control history
3. Can't be rotated without code changes
4. May be exposed in logs or error messages

### After (Secure Code)

```python
# config.py:5
import os

DATABASE_CONFIG = {
    'host': os.getenv('DB_HOST', 'localhost'),
    'user': os.getenv('DB_USER'),
    'password': os.getenv('DB_PASSWORD'),
    'database': os.getenv('DB_NAME', 'production')
}

# Validate required environment variables
required_vars = ['DB_USER', 'DB_PASSWORD']
missing = [var for var in required_vars if not os.getenv(var)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")
```

**Additional security**:

```bash
# Use environment files (not committed to git)
echo "DB_PASSWORD=..." > .env
echo ".env" >> .gitignore
```

**Reference**: OWASP A07:2021 - Identification and Authentication Failures

---

## Example 7: Performance - Pandas Optimization

### Before (Inefficient Code)

```python
# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Anti-pattern: Iterating over DataFrame rows
    discounts = []
    for index, row in df.iterrows():
        if row['total'] > 100:
            discount = row['total'] * 0.1
        else:
            discount = 0
        discounts.append(discount)
    df['discount'] = discounts
    return df
```

### Review Comment

**Severity**: Important
**Category**: Performance
**File**: data_analysis.py:53

Using `iterrows()` on a DataFrame is one of the slowest ways to process rows in pandas. For 10,000 rows, this can be 100x slower than vectorized operations.

### After (Vectorized Code)

```python
# data_analysis.py:50
import pandas as pd

def calculate_discounts(df):
    # Vectorized operation - operates on entire column at once
    df['discount'] = (df['total'] * 0.1).where(df['total'] > 100, 0)
    return df

# Alternative using numpy where:
import numpy as np

def calculate_discounts(df):
    df['discount'] = np.where(df['total'] > 100, df['total'] * 0.1, 0)
    return df
```

**Performance**: Vectorized operations run in optimized C code, which can yield a 50-100x speedup on large datasets.

**Reference**: Pandas Performance Optimization

---

## Example 8: Testing - Missing Edge Cases

### Before (Incomplete Tests)

```python
# test_validators.py:15
def test_email_validation():
    assert is_valid_email('user@example.com') == True
    assert is_valid_email('invalid-email') == False
```

### Review Comment

**Severity**: Important
**Category**: Testing
**File**: test_validators.py:15

Email validation tests are insufficient.
Missing edge cases:

- Empty string
- None value
- Email with special characters
- Multiple @ symbols
- Missing domain
- Whitespace handling
- Maximum length validation

### After (Comprehensive Tests)

```python
# test_validators.py:15
import pytest

@pytest.mark.parametrize('email,expected', [
    # Valid emails
    ('user@example.com', True),
    ('first.last@example.co.uk', True),
    ('user+tag@example.com', True),
    # Invalid emails
    ('invalid-email', False),
    ('', False),
    ('user@', False),
    ('user@@example.com', False),
    ('@example.com', False),
    ('user @example.com', False),
    ('a' * 256 + '@example.com', False),  # Too long
])
def test_email_validation(email, expected):
    assert is_valid_email(email) == expected

def test_email_validation_with_none():
    with pytest.raises(TypeError):
        is_valid_email(None)
```

**Reference**: Testing Best Practices

---

## Example 9: Architecture - Separation of Concerns (FastAPI)

### Before (Tightly Coupled Code)

```python
# main.py:25
from fastapi import FastAPI
import psycopg2

app = FastAPI()

@app.get('/users/{user_id}')
def get_user(user_id: int):
    # Business logic mixed with data access and presentation
    conn = psycopg2.connect("dbname=mydb user=admin password=secret")
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
    user = cursor.fetchone()
    conn.close()
    if user:
        return {'id': user[0], 'name': user[1], 'email': user[2]}
    return {'error': 'User not found'}
```

### Review Comment

**Severity**: Important
**Category**: Architecture
**File**: main.py:25-38

Multiple issues, most stemming from poor separation of concerns:

1. Database connection logic in route handler
2. String-interpolated SQL (an injection risk if the `int` validation on `user_id` is ever relaxed)
3. Hardcoded credentials
4. No error handling (a missing user returns HTTP 200 with an error body)
5. Manual dict construction
6. No dependency injection

### After (Layered Architecture)

```python
# schemas.py (Pydantic response model)
from pydantic import BaseModel, ConfigDict

class User(BaseModel):
    # Pydantic v2; on v1 use `class Config: orm_mode = True` instead
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str
    email: str

# models.py (SQLAlchemy ORM model; table and column names are assumed)
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)

# database.py
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

SQLALCHEMY_DATABASE_URL = os.getenv('DATABASE_URL')
engine = create_engine(SQLALCHEMY_DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# repositories.py
from sqlalchemy.orm import Session
from . import models

class UserRepository:
    def get_by_id(self, db: Session, user_id: int):
        return db.query(models.User).filter(models.User.id == user_id).first()

# main.py
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from . import schemas, database, repositories

app = FastAPI()
user_repo = repositories.UserRepository()

@app.get('/users/{user_id}', response_model=schemas.User)
def get_user(user_id: int, db: Session = Depends(database.get_db)):
    user = user_repo.get_by_id(db, user_id)
    if not user:
        raise HTTPException(status_code=404, detail='User not found')
    return user
```

**Benefits**:

- Clear separation of concerns
- Dependency injection
- Type safety with Pydantic
- SQL injection protection via ORM
- Reusable repository pattern
- Proper error handling

**Reference**: FastAPI Best Practices, Repository Pattern

---

## Summary of Common Issues

1. **Security**: SQL injection, XSS, hardcoded credentials, insecure cryptography
2. **Performance**: N+1 queries, inefficient loops, missing indexes, no caching
3. **Code Quality**: Mutable defaults, global state, poor naming, missing docstrings
4. **Style**: PEP 8 violations, inconsistent formatting, magic numbers
5. **Best Practices**: Missing context managers, no type hints, poor error handling
6. **Testing**: Insufficient coverage, missing edge cases, no integration tests
7. **Architecture**: Tight coupling, mixed concerns, no dependency injection

Use these examples as a reference when conducting reviews. Adapt the feedback style and technical depth to the codebase context.