# Semgrep Rule Library

A curated collection of Semgrep rulesets, plus guidance on writing, testing, and tuning custom rules.

## Table of Contents

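- [Official Rulesets](#official-rulesets)
- [Language-Specific Rules](#language-specific-rules)
- [Framework-Specific Rules](#framework-specific-rules)
- [Custom Rule Writing](#custom-rule-writing)
- [Rule Testing](#rule-testing)
- [Rule Performance Optimization](#rule-performance-optimization)
- [Community Rules](#community-rules)
- [References](#references)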

## Official Rulesets

### Comprehensive Rulesets

| Ruleset | Config | Description | Use Case |
|---------|--------|-------------|----------|
| Auto | `auto` | Automatically selected rules based on detected languages | Quick scans, baseline |
| Security Audit | `p/security-audit` | Comprehensive security rules across languages | Deep security review |
| OWASP Top 10 | `p/owasp-top-ten` | OWASP Top 10 2021 coverage | Compliance, security gates |
| CWE Top 25 | `p/cwe-top-25` | CWE Top 25 Most Dangerous Software Weaknesses | Critical vulnerability detection |
| CI | `p/ci` | Fast, low false-positive rules for CI/CD | Pull request gates |
| Default | `p/default` | Balanced security and quality rules | General purpose scanning |

### Specialized Rulesets

| Ruleset | Config | Focus Area |
|---------|--------|------------|
| Secrets | `p/secrets` | Hard-coded credentials, API keys |
| Cryptography | `p/crypto` | Weak crypto, hashing issues |
| Supply Chain | `p/supply-chain` | Dependency vulnerabilities |
| JWT | `p/jwt` | JSON Web Token security |
| SQL Injection | `p/sql-injection` | SQL injection patterns |
| XSS | `p/xss` | Cross-site scripting |
| Command Injection | `p/command-injection` | OS command injection |
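
Several rulesets from the tables above can be combined in a single scan by repeating `--config`. A minimal Python sketch of assembling such a command follows. The helper name `build_scan_command`, the default target `.`, and the `src/` path are illustrative assumptions, not part of Semgrep:

```python
from typing import List, Sequence


def build_scan_command(configs: Sequence[str], target: str = ".") -> List[str]:
    """Build a semgrep argv list with one --config flag per ruleset.

    Hypothetical helper for illustration: it only assembles arguments
    and does not run Semgrep.
    """
    if not configs:
        raise ValueError("at least one ruleset config is required")
    command = ["semgrep"]
    for config in configs:
        command += ["--config", config]
    command.append(target)
    return command


# Combine a broad ruleset with a specialized one ("src/" is a made-up path).
command = build_scan_command(["p/security-audit", "p/secrets"], target="src/")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues.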

## Language-Specific Rules

### Python

```bash
# Django security
semgrep --config "p/django"

# Flask security
semgrep --config "r/python.flask.security"

# General Python security
semgrep --config "r/python.lang.security"

# Specific vulnerabilities
semgrep --config "r/python.lang.security.audit.exec-used"
semgrep --config "r/python.lang.security.audit.unsafe-pickle"
semgrep --config "r/python.lang.security.audit.dangerous-subprocess-use"
```

**Key Python Rules:**
- `python.django.security.injection.sql.sql-injection-db-cursor-execute`
- `python.flask.security.xss.audit.template-xss`
- `python.lang.security.audit.exec-used`
- `python.lang.security.audit.dangerous-os-module-methods`
- `python.lang.security.audit.hashlib-md5-used`

### JavaScript/TypeScript

```bash
# Express.js security
semgrep --config "p/express"

# React security
semgrep --config "p/react"

# Node.js security
semgrep --config "r/javascript.lang.security"

# Specific vulnerabilities
semgrep --config "r/javascript.lang.security.audit.eval-detected"
semgrep --config "r/javascript.lang.security.audit.unsafe-exec"
```

**Key JavaScript Rules:**
- `javascript.express.security.audit.xss.mustache.var-in-href`
- `javascript.lang.security.audit.eval-detected`
- `javascript.lang.security.audit.path-traversal`
- `javascript.sequelize.security.audit.sequelize-injection-express`

### Java

```bash
# Spring security
semgrep --config "p/spring"

# General Java security
semgrep --config "r/java.lang.security"

# Specific frameworks
semgrep --config "r/java.spring.security"
```

**Key Java Rules:**
- `java.lang.security.audit.sqli.jdbc-sqli`
- `java.lang.security.audit.xxe.xmlinputfactory-xxe`
- `java.spring.security.audit.spring-cookie-missing-httponly`

### Go

```bash
# Go security rules
semgrep --config "r/go.lang.security"

# Specific vulnerabilities
semgrep --config "r/go.lang.security.audit.net.use-of-tls-with-go-sql-driver"
semgrep --config "r/go.lang.security.audit.crypto.use_of_weak_crypto"
```

### PHP

```bash
# PHP security
semgrep --config "p/php"

# Laravel security
semgrep --config "r/php.laravel.security"

# Specific vulnerabilities
semgrep --config "r/php.lang.security.audit.sqli"
semgrep --config "r/php.lang.security.audit.dangerous-exec"
```

## Framework-Specific Rules

### Web Frameworks

**Django:**
```bash
semgrep --config "p/django"
# Covers: SQL injection, XSS, CSRF, auth issues
```

**Flask:**
```bash
semgrep --config "r/python.flask.security"
# Covers: XSS, debug mode, secure cookies
```

**Express.js:**
```bash
semgrep --config "p/express"
# Covers: XSS, CSRF, session config, CORS
```

**Spring Boot:**
```bash
semgrep --config "p/spring"
# Covers: SQL injection, XXE, auth, SSRF
```

### Cloud & Infrastructure

**Terraform:**
```bash
semgrep --config "r/terraform.lang.security"
# Covers: S3 buckets, security groups, encryption
```

**Kubernetes:**
```bash
semgrep --config "r/yaml.kubernetes.security"
# Covers: privileged containers, secrets, rbac
```

**Docker:**
```bash
semgrep --config "r/dockerfile.security"
# Covers: unsafe base images, secrets, root user
```

## Custom Rule Writing

### Rule Anatomy

```yaml
rules:
  - id: custom-rule-id
    pattern: execute($SQL)
    message: Potential security issue detected
    severity: WARNING
    languages: [python]
    metadata:
      category: security
      cwe: "CWE-89"
      owasp: "A03:2021-Injection"
      confidence: HIGH
```
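
A search-mode rule like the one above needs `id`, `message`, `severity`, `languages`, and at least one pattern operator. The following Python sketch is a rough pre-flight check on a rule that has already been loaded into a dict. `missing_rule_fields` is a hypothetical helper, and `semgrep --validate` remains the authoritative check:

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("id", "message", "severity", "languages")
# Top-level pattern operators for search-mode rules (taint-mode rules differ).
PATTERN_KEYS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def missing_rule_fields(rule: Dict[str, Any]) -> List[str]:
    """Return the names of required fields absent from a rule dict."""
    missing = [field for field in REQUIRED_FIELDS if field not in rule]
    if not any(key in rule for key in PATTERN_KEYS):
        missing.append("pattern operator")
    return missing


rule = {
    "id": "custom-rule-id",
    "pattern": "execute($SQL)",
    "message": "Potential security issue detected",
    "severity": "WARNING",
    "languages": ["python"],
}
print(missing_rule_fields(rule))  # → []
```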

### Pattern Types

**1. Basic Pattern**
```yaml
pattern: dangerous_function($ARG)
```

**2. Pattern-Inside (Context)**
```yaml
patterns:
  - pattern: execute($QUERY)
  - pattern-inside: |
      $QUERY = $USER_INPUT + ...
      ...
```

**3. Pattern-Not (Exclusion)**
```yaml
patterns:
  - pattern: execute($QUERY)
  - pattern-not: execute("SELECT * FROM safe_table")
```

**4. Pattern-Either (OR logic)**
```yaml
pattern-either:
  - pattern: eval($ARG)
  - pattern: exec($ARG)
```

**5. Metavariable Comparison**
```yaml
patterns:
  - pattern: crypto.generate_key($BITS)
  - metavariable-comparison:
      metavariable: $BITS
      # Compares the literal value bound to $BITS; len() is not a supported function here
      comparison: $BITS < 2048
```

### Example Custom Rules

**Detect Hard-coded AWS Keys:**
```yaml
rules:
  - id: hardcoded-aws-key
    patterns:
      - pattern-regex: 'AKIA[0-9A-Z]{16}'
    message: Hard-coded AWS access key detected
    severity: ERROR
    languages: [python, javascript, java, go]
    metadata:
      category: security
      cwe: "CWE-798"
      confidence: HIGH
```
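
The expression can be sanity-checked outside Semgrep before the rule ships. A small Python sketch using the same regex follows; `find_aws_key_ids` is a hypothetical helper name, and the sample string is made up for the demonstration:

```python
import re

# Same expression as the pattern-regex in the rule above.
AWS_ACCESS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")


def find_aws_key_ids(text: str) -> list:
    """Return every substring that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nname = "akia-lowercase-is-ignored"'
print(find_aws_key_ids(sample))  # → ['AKIAIOSFODNN7EXAMPLE']
```

Because the expression has no word boundaries, it also matches inside longer uppercase strings. Adding `\b` anchors is one way to tighten it if that produces noise.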

**Detect Unsafe File Operations:**
```yaml
rules:
  - id: unsafe-file-read
    patterns:
      - pattern: open($PATH, ...)
      - pattern-inside: |
          def $FUNC(..., $USER_INPUT, ...):
            ...
            $PATH = ... + $USER_INPUT + ...
            ...
    message: File path constructed from user input (path traversal risk)
    severity: WARNING
    languages: [python]
    metadata:
      cwe: "CWE-22"
      owasp: "A01:2021-Broken-Access-Control"
```

**Detect Missing CSRF Protection:**
```yaml
rules:
  - id: flask-missing-csrf
    patterns:
      - pattern: |
          @app.route($PATH, methods=[..., "POST", ...])
          def $FUNC(...):
            ...
      - pattern-not-inside: |
          @csrf.exempt
          ...
      - pattern-not-inside: |
          csrf_token = ...
          ...
    message: POST route without CSRF protection
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-352"
      owasp: "A01:2021-Broken-Access-Control"
```

**Detect Insecure Random:**
```yaml
rules:
  - id: insecure-random-for-crypto
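    patterns:
      - pattern-either:
          - pattern: random.random()
          - pattern: random.randint(...)
      - pattern-inside: |
          def $FUNC(...):
            ...
      # Restrict matches to functions whose name ends in "_token"
      - metavariable-regex:
          metavariable: $FUNC
          regex: .*_token$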
    message: Using insecure random for security token
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-330"
      fix: "Use secrets module: secrets.token_bytes(32)"
```
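
For context, here is a sketch of Python code this rule is meant to flag, next to the `secrets`-based alternative named in the rule's `fix` metadata. Both function names are hypothetical and were chosen to end in `_token`:

```python
import random
import secrets


def weak_reset_token() -> str:
    # Intended to be flagged: random is a predictable PRNG, unsuitable for tokens.
    return str(random.randint(0, 10**12))


def strong_reset_token() -> str:
    # Not flagged: secrets draws from the operating system's cryptographic RNG.
    return secrets.token_bytes(32).hex()


print(len(strong_reset_token()))  # → 64
```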

### Rule Metadata Best Practices

Include comprehensive metadata:
```yaml
metadata:
  category: security          # Type of issue
  cwe: "CWE-XXX"             # CWE mapping
  owasp: "AXX:2021-Name"     # OWASP category
  confidence: HIGH|MEDIUM|LOW # Detection confidence
  likelihood: HIGH|MEDIUM|LOW # Exploitation likelihood
  impact: HIGH|MEDIUM|LOW     # Security impact
  subcategory: [vuln-type]   # More specific categorization
  source-rule: url           # If adapted from elsewhere
  references:
    - https://example.com/docs
```
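
Semgrep treats `metadata` as free-form, so conventions like the ones above are only kept if something checks them. A minimal Python sketch of such a check follows. `metadata_problems` is a hypothetical helper, and it assumes `cwe` is a single string as in the template:

```python
import re
from typing import Any, Dict, List

LEVELS = {"HIGH", "MEDIUM", "LOW"}
CWE_ID = re.compile(r"^CWE-\d+")


def metadata_problems(metadata: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a rule's metadata dict."""
    problems = []
    if not CWE_ID.match(str(metadata.get("cwe", ""))):
        problems.append("cwe should look like 'CWE-89'")
    for key in ("confidence", "likelihood", "impact"):
        if key in metadata and metadata[key] not in LEVELS:
            problems.append(f"{key} should be one of HIGH, MEDIUM, LOW")
    return problems


print(metadata_problems({"cwe": "CWE-89", "confidence": "HIGH"}))  # → []
print(metadata_problems({"cwe": "89", "impact": "SEVERE"}))
```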

## Rule Testing

### Test File Structure
```
custom-rules/
├── rules/
│   ├── custom-sql-injection.yaml   # Rule file
│   └── custom-xss.yaml
└── tests/
    ├── custom-sql-injection.py     # Test cases; base name matches the rule file
    └── custom-xss.js
```

### Writing Tests

```python
# tests/custom-sql-injection.py

# ruleid: custom-sql-injection
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")

# ok: custom-sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
```

### Running Tests

```bash
# Test custom rules (rule and test files are paired by base name)
semgrep --test --config rules/ tests/

# Validate rule syntax
semgrep --validate --config rules/
```
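
Each `# ruleid:` or `# ok:` annotation under Writing Tests applies to the line that follows it. The Python sketch below is a simplified model of that mapping, not Semgrep's own parser, which also handles other annotation kinds (such as `todoruleid`) and other comment styles. `expected_findings` is a hypothetical helper:

```python
import re
from typing import List, Tuple

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_findings(source: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, kind, rule_id) for the line after each annotation."""
    expectations = []
    for index, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            # The annotation applies to the line that follows the comment.
            expectations.append((index + 1, match.group(1), match.group(2)))
    return expectations


test_file = '''# ruleid: custom-sql-injection
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")

# ok: custom-sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
'''
for expectation in expected_findings(test_file):
    print(expectation)
```

For the sample above, line 2 is expected to produce a finding and line 5 is expected to stay clean.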

## Rule Performance Optimization

### 1. Use Specific Patterns
```yaml
# SLOW
pattern: $X

# FAST
pattern: dangerous_function($X)
```

### 2. Limit Language Scope
```yaml
# Only scan relevant languages
languages: [python, javascript]
```

### 3. Use Pattern-Inside Wisely
```yaml
# Narrow down context early
patterns:
  - pattern-inside: |
      def handle_request(...):
        ...
  - pattern: execute($QUERY)
```

### 4. Exclude Test Files
```yaml
paths:
  exclude:
    - "*/test_*.py"
    - "*/tests/*"
    - "*_test.go"
```

## Community Rules

Explore community-contributed rules:

```bash
# Run rules for a specific technology
semgrep --config "r/python.django"
semgrep --config "r/javascript.react"
semgrep --config "r/go.gorilla"

# Run rules by vulnerability type
semgrep --config "r/generic.secrets"
semgrep --config "r/generic.html-templates"
```

**Useful Community Rulesets:**
- `r/python.aws-lambda.security` - AWS Lambda security
- `r/terraform.aws.security` - AWS Terraform
- `r/dockerfile.best-practice` - Docker best practices
- `r/yaml.github-actions.security` - GitHub Actions security

## References

- [Semgrep Rule Syntax](https://semgrep.dev/docs/writing-rules/rule-syntax/)
- [Semgrep Registry](https://semgrep.dev/explore)
- [Pattern Examples](https://semgrep.dev/docs/writing-rules/pattern-examples/)
- [Rule Writing Tutorial](https://semgrep.dev/learn)