Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 17:51:02 +08:00
commit ff1f4bd119
252 changed files with 72682 additions and 0 deletions


@@ -0,0 +1,300 @@
# OWASP Top 10 to CWE Mapping with Semgrep Rules
## Table of Contents
- [A01:2021 - Broken Access Control](#a012021---broken-access-control)
- [A02:2021 - Cryptographic Failures](#a022021---cryptographic-failures)
- [A03:2021 - Injection](#a032021---injection)
- [A04:2021 - Insecure Design](#a042021---insecure-design)
- [A05:2021 - Security Misconfiguration](#a052021---security-misconfiguration)
- [A06:2021 - Vulnerable and Outdated Components](#a062021---vulnerable-and-outdated-components)
- [A07:2021 - Identification and Authentication Failures](#a072021---identification-and-authentication-failures)
- [A08:2021 - Software and Data Integrity Failures](#a082021---software-and-data-integrity-failures)
- [A09:2021 - Security Logging and Monitoring Failures](#a092021---security-logging-and-monitoring-failures)
- [A10:2021 - Server-Side Request Forgery (SSRF)](#a102021---server-side-request-forgery-ssrf)
## A01:2021 - Broken Access Control
### CWE Mappings
- CWE-22: Path Traversal
- CWE-23: Relative Path Traversal
- CWE-35: Path Traversal: '.../...//'
- CWE-352: Cross-Site Request Forgery (CSRF)
- CWE-434: Unrestricted Upload of Dangerous File Type
- CWE-639: Authorization Bypass Through User-Controlled Key
- CWE-918: Server-Side Request Forgery (SSRF)
### Semgrep Rules
```bash
# Path traversal detection
semgrep --config "r/python.lang.security.audit.path-traversal"
# Hard-coded secrets (this rule detects generic secrets; it does not detect missing authorization checks)
semgrep --config "r/generic.secrets.security.detected-generic-secret"
# CSRF protection
semgrep --config "r/javascript.express.security.audit.express-check-csurf-middleware-usage"
```
### Detection Patterns
- Unrestricted file access using user input
- Missing or improper authorization checks
- Insecure direct object references (IDOR)
- Elevation of privilege vulnerabilities
## A02:2021 - Cryptographic Failures
### CWE Mappings
- CWE-259: Use of Hard-coded Password
- CWE-326: Inadequate Encryption Strength
- CWE-327: Use of Broken/Risky Crypto Algorithm
- CWE-328: Reversible One-Way Hash
- CWE-330: Use of Insufficiently Random Values
- CWE-780: Use of RSA Without OAEP
### Semgrep Rules
```bash
# Weak crypto algorithms
semgrep --config "p/crypto"
# Hard-coded secrets
semgrep --config "p/secrets"
# Insecure random
semgrep --config "r/python.lang.security.audit.insecure-random"
```
### Detection Patterns
- Use of MD5, SHA1 for cryptographic purposes
- Hard-coded passwords, API keys, tokens
- Weak encryption algorithms (DES, RC4)
- Insecure random number generation
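
To illustrate the last pattern, here is a minimal sketch of token generation with Python's `secrets` module, which draws from the operating system's CSPRNG, instead of the `random` module. The function name `new_session_token` is hypothetical and used only for illustration.

```python
import secrets


def new_session_token(num_bytes: int = 32) -> str:
    """Return a URL-safe token built from num_bytes of CSPRNG output."""
    # secrets.token_urlsafe is intended for security tokens;
    # random.random() and random.randint() are not.
    return secrets.token_urlsafe(num_bytes)


token = new_session_token()
```

A 32-byte token carries 256 bits of entropy, which is a common choice for session identifiers and password-reset links.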
## A03:2021 - Injection
### CWE Mappings
- CWE-79: Cross-site Scripting (XSS)
- CWE-89: SQL Injection
- CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (eval injection)
- CWE-917: Expression Language Injection
- CWE-943: Improper Neutralization of Special Elements in Data Query Logic
### Semgrep Rules
```bash
# SQL Injection
semgrep --config "r/python.django.security.injection.sql"
semgrep --config "r/javascript.sequelize.security.audit.sequelize-injection"
# XSS
semgrep --config "r/javascript.express.security.audit.xss"
semgrep --config "r/python.flask.security.audit.template-xss"
# Command Injection
semgrep --config "r/python.lang.security.audit.dangerous-subprocess-use"
# Code Injection
semgrep --config "r/python.lang.security.audit.exec-used"
semgrep --config "r/javascript.lang.security.audit.eval-detected"
```
### Detection Patterns
- Unsafe SQL query construction
- Unescaped user input in HTML context
- OS command execution with user input
- Use of eval() or similar dynamic code execution
## A04:2021 - Insecure Design
### CWE Mappings
- CWE-209: Generation of Error Message with Sensitive Information
- CWE-256: Unprotected Storage of Credentials
- CWE-501: Trust Boundary Violation
- CWE-522: Insufficiently Protected Credentials
### Semgrep Rules
```bash
# Information disclosure
semgrep --config "r/python.flask.security.audit.debug-enabled"
# Missing security controls
semgrep --config "p/security-audit"
```
### Detection Patterns
- Debug mode enabled in production
- Verbose error messages exposing internals
- Missing rate limiting
- Insecure default configurations
## A05:2021 - Security Misconfiguration
### CWE Mappings
- CWE-16: Configuration
- CWE-611: Improper Restriction of XML External Entity Reference
- CWE-614: Sensitive Cookie in HTTPS Session Without 'Secure' Attribute
- CWE-756: Missing Custom Error Page
- CWE-776: Improper Restriction of Recursive Entity References in DTDs
### Semgrep Rules
```bash
# XXE vulnerabilities
semgrep --config "r/python.lang.security.audit.avoid-lxml-in-xml-parsing"
# Insecure cookie settings
semgrep --config "r/javascript.express.security.audit.express-cookie-settings"
# CORS misconfiguration
semgrep --config "r/javascript.express.security.audit.express-cors-misconfiguration"
```
### Detection Patterns
- XML External Entity (XXE) vulnerabilities
- Insecure cookie flags (missing Secure, HttpOnly, SameSite)
- Open CORS policies
- Unnecessary features enabled
## A06:2021 - Vulnerable and Outdated Components
### CWE Mappings
- CWE-1035: Using Components with Known Vulnerabilities
- CWE-1104: Use of Unmaintained Third Party Components
### Semgrep Rules
```bash
# Known vulnerable dependencies
semgrep --config "p/supply-chain"
# Deprecated APIs
semgrep --config "p/owasp-top-ten"
```
### Detection Patterns
- Outdated library versions
- Dependencies with known CVEs
- Use of deprecated/unmaintained packages
- Insecure package imports
## A07:2021 - Identification and Authentication Failures
### CWE Mappings
- CWE-287: Improper Authentication
- CWE-288: Authentication Bypass Using Alternate Path/Channel
- CWE-306: Missing Authentication for Critical Function
- CWE-307: Improper Restriction of Excessive Authentication Attempts
- CWE-521: Weak Password Requirements
- CWE-798: Use of Hard-coded Credentials
- CWE-916: Use of Password Hash With Insufficient Computational Effort
### Semgrep Rules
```bash
# Weak password hashing
semgrep --config "r/python.lang.security.audit.hashlib-md5-used"
# Missing authentication
semgrep --config "p/jwt"
# Session management
semgrep --config "r/javascript.express.security.audit.express-session-misconfiguration"
```
### Detection Patterns
- Weak password hashing (MD5, SHA1 without salt)
- Missing multi-factor authentication
- Predictable session identifiers
- Credential stuffing vulnerabilities
## A08:2021 - Software and Data Integrity Failures
### CWE Mappings
- CWE-345: Insufficient Verification of Data Authenticity
- CWE-502: Deserialization of Untrusted Data
- CWE-829: Inclusion of Functionality from Untrusted Control Sphere
- CWE-915: Improperly Controlled Modification of Dynamically-Determined Object Attributes
### Semgrep Rules
```bash
# Unsafe deserialization
semgrep --config "r/python.lang.security.audit.unsafe-pickle"
semgrep --config "r/javascript.lang.security.audit.unsafe-deserialization"
# Prototype pollution
semgrep --config "r/javascript.lang.security.audit.prototype-pollution"
```
### Detection Patterns
- Unsafe deserialization (pickle, YAML, JSON)
- Missing integrity checks on updates
- Prototype pollution in JavaScript
- Unsafe code loading from external sources
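
As a sketch of an integrity check on a downloaded update, the following compares a payload's SHA-256 digest with an expected value. The function name `verify_sha256` is hypothetical, and the example assumes the expected digest was obtained over a trusted channel (for example, a signed release manifest).

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True only if payload hashes to the expected SHA-256 digest."""
    actual_hex = hashlib.sha256(payload).hexdigest()
    # compare_digest runs in constant time for equal-length inputs
    return hmac.compare_digest(actual_hex, expected_hex.lower())


expected = hashlib.sha256(b"update-v1").hexdigest()
ok = verify_sha256(b"update-v1", expected)
tampered = verify_sha256(b"update-v1-modified", expected)
```

A bare hash only detects corruption or tampering relative to the published digest; authenticity of the digest itself still depends on how it was delivered.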
## A09:2021 - Security Logging and Monitoring Failures
### CWE Mappings
- CWE-117: Improper Output Neutralization for Logs
- CWE-223: Omission of Security-relevant Information
- CWE-532: Information Exposure Through Log Files
- CWE-778: Insufficient Logging
### Semgrep Rules
```bash
# Log injection
semgrep --config "r/python.lang.security.audit.logging-unsanitized-input"
# Sensitive data in logs
semgrep --config "p/secrets"
```
### Detection Patterns
- Log injection vulnerabilities
- Sensitive data logged (passwords, tokens)
- Missing security event logging
- Insufficient audit trails
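
A minimal sketch of one mitigation for log injection: escaping carriage returns and line feeds before user-controlled values reach the logger. The helper name `sanitize_for_log` and the logger name `auth` are assumptions for illustration.

```python
import logging


def sanitize_for_log(value: str) -> str:
    """Escape CR and LF so user input cannot start a forged log line."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


logger = logging.getLogger("auth")
username = "alice\nINFO admin logged in"
# The forged second line stays on the same log record
logger.warning("Failed login for user=%s", sanitize_for_log(username))
```

Structured logging (for example, emitting JSON records) is another option, since the serializer escapes control characters itself.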
## A10:2021 - Server-Side Request Forgery (SSRF)
### CWE Mappings
- CWE-918: Server-Side Request Forgery (SSRF)
### Semgrep Rules
```bash
# SSRF detection
semgrep --config "r/python.requests.security.audit.requests-http-request"
semgrep --config "r/javascript.lang.security.audit.detect-unsafe-url"
```
### Detection Patterns
- Unvalidated URL fetching
- Internal network access via user input
- Missing URL validation
- Bypassing access controls via SSRF
## Using This Mapping
### Scan for Specific OWASP Category
```bash
# Example: Scan for Injection vulnerabilities (A03)
semgrep --config "r/python.django.security.injection.sql" \
  --config "r/python.lang.security.audit.exec-used" \
  /path/to/code
```
### Comprehensive OWASP Top 10 Scan
```bash
semgrep --config="p/owasp-top-ten" /path/to/code
```
### Filter by CWE
```bash
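# Scan and filter results by CWE
# metadata.cwe may be a single string or a list such as ["CWE-89: ..."],
# so match on the identifier instead of testing for equality
semgrep --config="p/security-audit" --json /path/to/code | \
  jq '.results[] | select(.extra.metadata.cwe | tostring | test("CWE-89\\b"))'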
```
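
The same filter can be sketched in Python for cases where `jq` is not available. This assumes the `--json` output exposes findings under `results[].extra.metadata.cwe`, as the command above does, and that the field may hold either one string or a list of strings. The function name `filter_by_cwe` is hypothetical.

```python
import json


def filter_by_cwe(semgrep_json: str, cwe_id: str) -> list:
    """Return findings whose metadata.cwe names cwe_id (e.g. "CWE-89")."""
    matches = []
    for finding in json.loads(semgrep_json).get("results", []):
        cwe = finding.get("extra", {}).get("metadata", {}).get("cwe") or []
        # Normalise: a rule may store one string or a list of strings
        entries = [cwe] if isinstance(cwe, str) else list(cwe)
        # Compare the identifier before any ":" so CWE-89 does not match CWE-890
        if any(str(entry).split(":")[0].strip() == cwe_id for entry in entries):
            matches.append(finding)
    return matches
```

Typical usage would be to save the scan output with `semgrep --config "p/security-audit" --json /path/to/code > findings.json` and pass the file contents to `filter_by_cwe`.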
## References
- [OWASP Top 10 2021](https://owasp.org/Top10/)
- [CWE/SANS Top 25](https://cwe.mitre.org/top25/)
- [Semgrep Rule Registry](https://semgrep.dev/explore)


@@ -0,0 +1,471 @@
# Vulnerability Remediation Guide
Security remediation patterns organized by vulnerability category.
## Table of Contents
- [SQL Injection](#sql-injection)
- [Cross-Site Scripting (XSS)](#cross-site-scripting-xss)
- [Command Injection](#command-injection)
- [Path Traversal](#path-traversal)
- [Insecure Deserialization](#insecure-deserialization)
- [Weak Cryptography](#weak-cryptography)
- [Authentication & Session Management](#authentication--session-management)
- [CSRF](#csrf)
- [SSRF](#ssrf)
- [XXE](#xxe)
## SQL Injection
### Vulnerability Pattern
```python
# VULNERABLE
query = f"SELECT * FROM users WHERE id = {user_id}"
cursor.execute(query)
```
### Secure Remediation
```python
# SECURE: Use parameterized queries
query = "SELECT * FROM users WHERE id = %s"
cursor.execute(query, (user_id,))
# Or use ORM
user = User.objects.get(id=user_id)
```
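
A runnable sketch of the parameterized approach, using the standard library's `sqlite3` so it needs no database server. The table and the `find_user` helper are hypothetical. Note that placeholder syntax depends on the driver: `sqlite3` uses `?`, while drivers such as psycopg2 use `%s` as in the snippet above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])


def find_user(user_id):
    # The driver binds user_id as data; it is never parsed as SQL
    cur = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,))
    return cur.fetchall()


legit = find_user(1)
# With string interpolation this payload would return every row
attack = find_user("1 OR 1=1")
```

Because the payload is bound as a single text value, it is compared against `id` as a whole and matches no row.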
### Framework-Specific Solutions
**Django:**
```python
# Use Django ORM (safe by default)
User.objects.filter(email=user_email)
# For raw SQL, use parameterized queries
User.objects.raw('SELECT * FROM myapp_user WHERE email = %s', [user_email])
```
**Node.js (Sequelize):**
```javascript
// Use parameterized queries
User.findAll({
  where: { email: userEmail }
});
// Or use replacements
sequelize.query(
  'SELECT * FROM users WHERE email = :email',
  { replacements: { email: userEmail } }
);
```
**Java (JDBC):**
```java
// Use PreparedStatement
String query = "SELECT * FROM users WHERE id = ?";
PreparedStatement stmt = conn.prepareStatement(query);
stmt.setInt(1, userId);
ResultSet rs = stmt.executeQuery();
```
## Cross-Site Scripting (XSS)
### Vulnerability Pattern
```javascript
// VULNERABLE
element.innerHTML = userInput;
document.write(userInput);
```
### Secure Remediation
```javascript
// SECURE: Use textContent for text
element.textContent = userInput;
// Or properly escape HTML
element.innerHTML = escapeHtml(userInput);
function escapeHtml(unsafe) {
  return unsafe
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}
```
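
When HTML is assembled on the server in Python, the standard library offers equivalent escaping. The sketch below uses `html.escape`; the function name `render_comment` is hypothetical.

```python
import html


def render_comment(user_input: str) -> str:
    """Wrap user text in a paragraph after escaping HTML metacharacters."""
    # With quote=True, html.escape also escapes double and single quotes
    return "<p>" + html.escape(user_input, quote=True) + "</p>"


rendered = render_comment("<b>&")
```

This kind of escaping is only suitable for HTML element content and quoted attribute values; other contexts such as URLs or inline JavaScript need their own encoding.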
### Framework-Specific Solutions
**React:**
```javascript
// React auto-escapes by default
<div>{userInput}</div>
// For HTML content, sanitize first
import DOMPurify from 'dompurify';
<div dangerouslySetInnerHTML={{__html: DOMPurify.sanitize(userInput)}} />
```
**Flask/Jinja2:**
```python
# Templates auto-escape by default
{{ user_input }}
# For HTML content, sanitize in the view, then render the result in the template
from markupsafe import Markup
import bleach
safe_html = Markup(bleach.clean(user_input))
{{ safe_html }}
```
**Django:**
```django
{# Auto-escaped by default #}
{{ user_input }}
{# Mark as safe only after sanitization; sanitized_html is cleaned in the view #}
{{ sanitized_html|safe }}
```
## Command Injection
### Vulnerability Pattern
```python
# VULNERABLE
os.system(f"ping {user_host}")
subprocess.call(f"ls {user_directory}", shell=True)
```
### Secure Remediation
```python
# SECURE: Use subprocess with list arguments
import re
import subprocess
subprocess.run(['ping', '-c', '1', user_host],
               capture_output=True, check=True)
# Validate input against allowlist; the first character must not be "-",
# otherwise the value could be parsed as a ping option
if not re.fullmatch(r'[a-zA-Z0-9][a-zA-Z0-9.-]*', user_host):
    raise ValueError("Invalid hostname")
subprocess.run(['ping', '-c', '1', user_host])
```
**Node.js:**
```javascript
// VULNERABLE
exec(`ls ${userDir}`);
// SECURE
const { execFile } = require('child_process');
execFile('ls', [userDir], (error, stdout) => {
  // Handle output
});
```
## Path Traversal
### Vulnerability Pattern
```python
# VULNERABLE
file_path = os.path.join('/uploads', user_filename)
with open(file_path) as f:
    return f.read()
```
### Secure Remediation
```python
# SECURE: Validate and normalize path
from pathlib import Path
def safe_join(directory, user_path):
    # Normalize and resolve path
    base_dir = Path(directory).resolve()
    file_path = (base_dir / user_path).resolve()
    # Ensure it's within base directory; compare path components, because a
    # string prefix test would also accept siblings such as /uploads_evil
    if base_dir not in file_path.parents:
        raise ValueError("Path traversal detected")
    return file_path
try:
    safe_path = safe_join('/uploads', user_filename)
    with open(safe_path) as f:
        return f.read()
except ValueError:
    return "Invalid filename"
```
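
The containment test deserves care: comparing paths as strings and comparing them by component can disagree. The sketch below uses `PurePosixPath` so it runs the same on any operating system; the directory names are hypothetical.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/uploads")
inside = PurePosixPath("/uploads/report.txt")
sibling = PurePosixPath("/uploads_private/secret.txt")


def prefix_check(path: PurePosixPath) -> bool:
    # String comparison: "/uploads_private/..." also starts with "/uploads"
    return str(path).startswith(str(base))


def component_check(path: PurePosixPath) -> bool:
    # Component comparison: base must be an actual ancestor directory
    return base in path.parents
```

Here `prefix_check(sibling)` is true even though the file lies outside `/uploads`, while `component_check(sibling)` is false. Pure paths do not touch the filesystem, so real code still needs `Path.resolve()` first to collapse `..` segments and symlinks.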
## Insecure Deserialization
### Vulnerability Pattern
```python
# VULNERABLE
import pickle
data = pickle.loads(user_data)
```
### Secure Remediation
```python
# SECURE: Use safe formats like JSON
import json
data = json.loads(user_data)
# If you must deserialize, validate and restrict
import yaml
data = yaml.safe_load(user_data) # Use safe_load, not load
```
**Node.js:**
```javascript
// VULNERABLE
const data = eval(userInput);
const obj = Function(userInput)();
// SECURE
const data = JSON.parse(userInput);
// For complex objects, use schema validation
const Joi = require('joi');
const schema = Joi.object({
  name: Joi.string().required(),
  email: Joi.string().email().required()
});
const { value, error } = schema.validate(JSON.parse(userInput));
```
## Weak Cryptography
### Vulnerability Pattern
```python
# VULNERABLE
import hashlib
password_hash = hashlib.md5(password.encode()).hexdigest()
```
### Secure Remediation
```python
# SECURE: Use bcrypt or argon2
import bcrypt
# Hashing
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt())
# Verification
if bcrypt.checkpw(password.encode(), stored_hash):
    print("Password correct")
# Or use argon2
from argon2 import PasswordHasher
ph = PasswordHasher()
argon2_hash = ph.hash(password)
ph.verify(argon2_hash, password)  # raises VerifyMismatchError on mismatch
```
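
Where third-party packages cannot be installed, a salted, memory-hard hash can be sketched with the standard library's `hashlib.scrypt`. This is an illustration under stated assumptions, not a replacement for the libraries above: the helper names are hypothetical, the cost parameters (`n=2**14, r=8, p=1`) are example values to be tuned, and `hashlib.scrypt` is only available when Python is built against a suitable OpenSSL.

```python
import hashlib
import hmac
import os


def hash_password(password: str) -> tuple:
    """Return (salt, digest) for storage alongside the user record."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
```

Storing the cost parameters next to the salt makes it possible to raise them later and rehash on the next successful login.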
**Encryption:**
```python
# VULNERABLE
from Crypto.Cipher import DES
cipher = DES.new(key, DES.MODE_ECB)
# SECURE: Use AES-GCM
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)
ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
```
## Authentication & Session Management
### Vulnerability Pattern
```javascript
// VULNERABLE
app.use(session({
  secret: 'weak-secret',
  cookie: { secure: false }
}));
```
### Secure Remediation
```javascript
// SECURE
const session = require('express-session');
app.use(session({
  secret: process.env.SESSION_SECRET, // Strong random secret
  resave: false,
  saveUninitialized: false,
  cookie: {
    secure: true, // HTTPS only
    httpOnly: true, // No JavaScript access
    sameSite: 'strict', // CSRF protection
    maxAge: 3600000 // 1 hour
  }
}));
```
**Password Requirements:**
```python
# Implement strong password policy
import re
def validate_password(password):
    if len(password) < 12:
        return False
    if not re.search(r'[A-Z]', password):
        return False
    if not re.search(r'[a-z]', password):
        return False
    if not re.search(r'[0-9]', password):
        return False
    if not re.search(r'[!@#$%^&*(),.?":{}|<>]', password):
        return False
    return True
```
## CSRF
### Vulnerability Pattern
```python
# VULNERABLE: No CSRF protection
@app.route('/transfer', methods=['POST'])
def transfer():
    amount = request.form['amount']
    to_account = request.form['to']
    # Process transfer
```
### Secure Remediation
```python
# SECURE: Use CSRF tokens
from flask_wtf.csrf import CSRFProtect
csrf = CSRFProtect(app)
@app.route('/transfer', methods=['POST'])
def transfer():
    # CSRF token automatically validated by CSRFProtect; do not add
    # @csrf.exempt here, since it would disable the check for this route
    amount = request.form['amount']
    to_account = request.form['to']
```
**Express.js:**
```javascript
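// Note: the csurf package has been deprecated by its maintainers;
// evaluate a maintained alternative for new projects
const csrf = require('csurf');
const csrfProtection = csrf({ cookie: true }); // requires cookie-parser
app.post('/transfer', csrfProtection, (req, res) => {
  // CSRF token validated
  const { amount, to } = req.body;
});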
```
## SSRF
### Vulnerability Pattern
```python
# VULNERABLE
import requests
url = request.args.get('url')
response = requests.get(url)
```
### Secure Remediation
```python
# SECURE: Validate URLs and use allowlist
import ipaddress
import requests
from urllib.parse import urlparse
ALLOWED_DOMAINS = ['api.example.com', 'cdn.example.com']
def safe_fetch(url):
    parsed = urlparse(url)
    # Check protocol
    if parsed.scheme not in ['http', 'https']:
        raise ValueError("Invalid protocol")
    # Check host against allowlist (hostname excludes port and userinfo)
    if parsed.hostname not in ALLOWED_DOMAINS:
        raise ValueError("Domain not allowed")
    # Block internal IPs; raise outside the try block so the error
    # is not swallowed by the except clause
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        ip = None  # Not an IP literal, continue
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        raise ValueError("Private IP not allowed")
    # Do not follow redirects, which could lead to an unvalidated host
    return requests.get(url, timeout=5, allow_redirects=False)
```
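
The internal-address check can be exercised on its own without any network access. The sketch below classifies IP literals with the standard library's `ipaddress` module; the function name `is_internal_address` is hypothetical.

```python
import ipaddress


def is_internal_address(host: str) -> bool:
    """Return True for IP literals that point at internal or special ranges."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; hostnames need a separate DNS-resolution check
        return False
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_unspecified)
```

The address `169.254.169.254`, used by several cloud metadata services, falls in the link-local range and is rejected. A hostname that resolves to an internal address is not caught here, so production code would also need to validate the resolved address.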
## XXE
### Vulnerability Pattern
```python
# VULNERABLE
from lxml import etree
tree = etree.parse(user_xml)
```
### Secure Remediation
```python
# SECURE: Disable external entities
from lxml import etree
parser = etree.XMLParser(
    resolve_entities=False,
    no_network=True,
    dtd_validation=False
)
tree = etree.parse(user_xml, parser)
# Or use defusedxml
from defusedxml import ElementTree
tree = ElementTree.parse(user_xml)
```
**Node.js:**
```javascript
// Use secure XML parser
const libxmljs = require('libxmljs');
const xml = libxmljs.parseXml(userXml, {
  noent: false, // Disable entity expansion
  dtdload: false,
  dtdvalid: false
});
```
## General Security Principles
1. **Input Validation**: Validate all user input against expected format
2. **Output Encoding**: Encode output based on context (HTML, URL, SQL, etc.)
3. **Least Privilege**: Grant minimum necessary permissions
4. **Defense in Depth**: Use multiple layers of security controls
5. **Fail Securely**: Ensure failures don't expose sensitive data
6. **Secure Defaults**: Use secure configuration by default
7. **Keep Dependencies Updated**: Regularly update libraries and frameworks
## Testing Remediation
After applying fixes:
1. **Verify with Semgrep**: Re-scan to ensure vulnerability is resolved
```bash
semgrep --config <ruleset> fixed_file.py
```
2. **Manual Testing**: Attempt to exploit the vulnerability
3. **Code Review**: Have peer review the fix
4. **Integration Tests**: Add tests to prevent regression
## References
- [OWASP Cheat Sheet Series](https://cheatsheetseries.owasp.org/)
- [CWE Mitigations](https://cwe.mitre.org/)
- [Semgrep Autofix](https://semgrep.dev/docs/writing-rules/autofix/)


@@ -0,0 +1,425 @@
# Semgrep Rule Library
Curated collection of useful Semgrep rulesets and custom rule writing guidance.
## Table of Contents
- [Official Rulesets](#official-rulesets)
- [Language-Specific Rules](#language-specific-rules)
- [Framework-Specific Rules](#framework-specific-rules)
- [Custom Rule Writing](#custom-rule-writing)
- [Rule Testing](#rule-testing)
## Official Rulesets
### Comprehensive Rulesets
| Ruleset | Config | Description | Use Case |
|---------|--------|-------------|----------|
| Auto | `auto` | Automatically selected rules based on detected languages | Quick scans, baseline |
| Security Audit | `p/security-audit` | Comprehensive security rules across languages | Deep security review |
| OWASP Top 10 | `p/owasp-top-ten` | OWASP Top 10 2021 coverage | Compliance, security gates |
| CWE Top 25 | `p/cwe-top-25` | SANS/CWE Top 25 dangerous errors | Critical vulnerability detection |
| CI | `p/ci` | Fast, low false-positive rules for CI/CD | Pull request gates |
| Default | `p/default` | Balanced security and quality rules | General purpose scanning |
### Specialized Rulesets
| Ruleset | Config | Focus Area |
|---------|--------|------------|
| Secrets | `p/secrets` | Hard-coded credentials, API keys |
| Cryptography | `p/crypto` | Weak crypto, hashing issues |
| Supply Chain | `p/supply-chain` | Dependency vulnerabilities |
| JWT | `p/jwt` | JSON Web Token security |
| SQL Injection | `p/sql-injection` | SQL injection patterns |
| XSS | `p/xss` | Cross-site scripting |
| Command Injection | `p/command-injection` | OS command injection |
## Language-Specific Rules
### Python
```bash
# Django security
semgrep --config "p/django"
# Flask security
semgrep --config "r/python.flask.security"
# General Python security
semgrep --config "r/python.lang.security"
# Specific vulnerabilities
semgrep --config "r/python.lang.security.audit.exec-used"
semgrep --config "r/python.lang.security.audit.unsafe-pickle"
semgrep --config "r/python.lang.security.audit.dangerous-subprocess-use"
```
**Key Python Rules:**
- `python.django.security.injection.sql.sql-injection-db-cursor-execute`
- `python.flask.security.xss.audit.template-xss`
- `python.lang.security.audit.exec-used`
- `python.lang.security.audit.dangerous-os-module-methods`
- `python.lang.security.audit.hashlib-md5-used`
### JavaScript/TypeScript
```bash
# Express.js security
semgrep --config "p/express"
# React security
semgrep --config "p/react"
# Node.js security
semgrep --config "r/javascript.lang.security"
# Specific vulnerabilities
semgrep --config "r/javascript.lang.security.audit.eval-detected"
semgrep --config "r/javascript.lang.security.audit.unsafe-exec"
```
**Key JavaScript Rules:**
- `javascript.express.security.audit.xss.mustache.var-in-href`
- `javascript.lang.security.audit.eval-detected`
- `javascript.lang.security.audit.path-traversal`
- `javascript.sequelize.security.audit.sequelize-injection-express`
### Java
```bash
# Spring security
semgrep --config "p/spring"
# General Java security
semgrep --config "r/java.lang.security"
# Specific frameworks
semgrep --config "r/java.spring.security"
```
**Key Java Rules:**
- `java.lang.security.audit.sqli.jdbc-sqli`
- `java.lang.security.audit.xxe.xmlinputfactory-xxe`
- `java.spring.security.audit.spring-cookie-missing-httponly`
### Go
```bash
# Go security rules
semgrep --config "r/go.lang.security"
# Specific vulnerabilities
semgrep --config "r/go.lang.security.audit.net.use-of-tls-with-go-sql-driver"
semgrep --config "r/go.lang.security.audit.crypto.use_of_weak_crypto"
```
### PHP
```bash
# PHP security
semgrep --config "p/php"
# Laravel security
semgrep --config "r/php.laravel.security"
# Specific vulnerabilities
semgrep --config "r/php.lang.security.audit.sqli"
semgrep --config "r/php.lang.security.audit.dangerous-exec"
```
## Framework-Specific Rules
### Web Frameworks
**Django:**
```bash
semgrep --config "p/django"
# Covers: SQL injection, XSS, CSRF, auth issues
```
**Flask:**
```bash
semgrep --config "r/python.flask.security"
# Covers: XSS, debug mode, secure cookies
```
**Express.js:**
```bash
semgrep --config "p/express"
# Covers: XSS, CSRF, session config, CORS
```
**Spring Boot:**
```bash
semgrep --config "p/spring"
# Covers: SQL injection, XXE, auth, SSRF
```
### Cloud & Infrastructure
**Terraform:**
```bash
semgrep --config "r/terraform.lang.security"
# Covers: S3 buckets, security groups, encryption
```
**Kubernetes:**
```bash
semgrep --config "r/yaml.kubernetes.security"
# Covers: privileged containers, secrets, rbac
```
**Docker:**
```bash
semgrep --config "r/dockerfile.security"
# Covers: unsafe base images, secrets, root user
```
## Custom Rule Writing
### Rule Anatomy
```yaml
rules:
  - id: custom-rule-id
    pattern: execute($SQL)
    message: Potential security issue detected
    severity: WARNING
    languages: [python]
    metadata:
      category: security
      cwe: "CWE-89"
      owasp: "A03:2021-Injection"
      confidence: HIGH
```
### Pattern Types
**1. Basic Pattern**
```yaml
pattern: dangerous_function($ARG)
```
**2. Pattern-Inside (Context)**
```yaml
patterns:
  - pattern: execute($QUERY)
  - pattern-inside: |
      $QUERY = $USER_INPUT + ...
      ...
```
**3. Pattern-Not (Exclusion)**
```yaml
patterns:
  - pattern: execute($QUERY)
  - pattern-not: execute("SELECT * FROM safe_table")
```
**4. Pattern-Either (OR logic)**
```yaml
pattern-either:
  - pattern: eval($ARG)
  - pattern: exec($ARG)
```
**5. Metavariable Comparison**
```yaml
patterns:
  - pattern: crypto.encrypt($DATA, $KEY)
  - metavariable-comparison:
      metavariable: $KEY
      comparison: len($KEY) < 16
```
### Example Custom Rules
**Detect Hard-coded AWS Keys:**
```yaml
rules:
  - id: hardcoded-aws-key
    patterns:
      - pattern-regex: 'AKIA[0-9A-Z]{16}'
    message: Hard-coded AWS access key detected
    severity: ERROR
    languages: [python, javascript, java, go]
    metadata:
      category: security
      cwe: "CWE-798"
      confidence: HIGH
```
**Detect Unsafe File Operations:**
```yaml
rules:
  - id: unsafe-file-read
    patterns:
      - pattern: open($PATH, ...)
      - pattern-inside: |
          def $FUNC(..., $USER_INPUT, ...):
            ...
            $PATH = ... + $USER_INPUT + ...
            ...
    message: File path constructed from user input (path traversal risk)
    severity: WARNING
    languages: [python]
    metadata:
      cwe: "CWE-22"
      owasp: "A01:2021-Broken-Access-Control"
```
**Detect Missing CSRF Protection:**
```yaml
rules:
  - id: flask-missing-csrf
    patterns:
      - pattern: |
          @app.route($PATH, methods=[..., "POST", ...])
          def $FUNC(...):
            ...
      - pattern-not-inside: |
          @csrf.exempt
          ...
      - pattern-not-inside: |
          csrf_token = ...
          ...
    message: POST route without CSRF protection
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-352"
      owasp: "A01:2021-Broken-Access-Control"
```
**Detect Insecure Random:**
```yaml
rules:
  - id: insecure-random-for-crypto
    patterns:
      - pattern-either:
          - pattern: random.random()
          - pattern: random.randint(...)
      # A function name cannot contain an ellipsis, so capture it in a
      # metavariable and constrain it with a regex
      - pattern-inside: |
          def $FUNC(...):
            ...
      - metavariable-regex:
          metavariable: $FUNC
          regex: .*_token$
    message: Using insecure random for security token
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-330"
      fix: "Use secrets module: secrets.token_bytes(32)"
```
### Rule Metadata Best Practices
Include comprehensive metadata:
```yaml
metadata:
  category: security # Type of issue
  cwe: "CWE-XXX" # CWE mapping
  owasp: "AXX:2021-Name" # OWASP category
  confidence: HIGH|MEDIUM|LOW # Detection confidence
  likelihood: HIGH|MEDIUM|LOW # Exploitation likelihood
  impact: HIGH|MEDIUM|LOW # Security impact
  subcategory: [vuln-type] # More specific categorization
  source-rule: url # If adapted from elsewhere
  references:
    - https://example.com/docs
```
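
A team that wants to enforce this convention could lint its rules before committing them. The sketch below checks a rule, already loaded into a Python dict (for example with a YAML parser), for a set of required metadata keys. Both the function name `missing_metadata` and the choice of required keys are assumptions for illustration, not a Semgrep requirement.

```python
# Hypothetical team policy: keys every custom rule must carry
REQUIRED_METADATA = ("category", "cwe", "owasp", "confidence")


def missing_metadata(rule: dict) -> list:
    """Return the required metadata keys that the rule does not define."""
    metadata = rule.get("metadata") or {}
    return [key for key in REQUIRED_METADATA if key not in metadata]


rule = {
    "id": "custom-rule-id",
    "pattern": "execute($SQL)",
    "metadata": {"category": "security", "cwe": "CWE-89"},
}
gaps = missing_metadata(rule)
```

Running such a check in CI next to `semgrep --validate` keeps metadata gaps from reaching the shared rule set.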
## Rule Testing
### Test File Structure
```
custom-rules/
├── rules.yaml # Your custom rules
└── tests/
    ├── test-sqli.py # Test cases
    └── test-xss.js # Test cases
```
### Writing Tests
```python
# tests/test-sqli.py
# ruleid: custom-sql-injection
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
# ok: custom-sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
```
### Running Tests
```bash
# Test custom rules
semgrep --config rules.yaml --test tests/
# Validate rule syntax
semgrep --validate --config rules.yaml
```
## Rule Performance Optimization
### 1. Use Specific Patterns
```yaml
# SLOW
pattern: $X
# FAST
pattern: dangerous_function($X)
```
### 2. Limit Language Scope
```yaml
# Only scan relevant languages
languages: [python, javascript]
```
### 3. Use Pattern-Inside Wisely
```yaml
# Narrow down context early
patterns:
  - pattern-inside: |
      def handle_request(...):
        ...
  - pattern: execute($QUERY)
```
### 4. Exclude Test Files
```yaml
paths:
  exclude:
    - "*/test_*.py"
    - "*/tests/*"
    - "*_test.go"
```
## Community Rules
Explore community-contributed rules:
```bash
# Browse rules by technology
semgrep --config "r/python.django"
semgrep --config "r/javascript.react"
semgrep --config "r/go.gorilla"
# Browse by vulnerability type
semgrep --config "r/generic.secrets"
semgrep --config "r/generic.html-templates"
```
**Useful Community Rulesets:**
- `r/python.aws-lambda.security` - AWS Lambda security
- `r/terraform.aws.security` - AWS Terraform
- `r/dockerfile.best-practice` - Docker best practices
- `r/yaml.github-actions.security` - GitHub Actions security
## References
- [Semgrep Rule Syntax](https://semgrep.dev/docs/writing-rules/rule-syntax/)
- [Semgrep Registry](https://semgrep.dev/explore)
- [Pattern Examples](https://semgrep.dev/docs/writing-rules/pattern-examples/)
- [Rule Writing Tutorial](https://semgrep.dev/learn)