Initial commit

2025-11-29 17:51:02 +08:00
commit ff1f4bd119
252 changed files with 72682 additions and 0 deletions
--- a/skills/appsec/sast-semgrep/SKILL.md
+++ b/skills/appsec/sast-semgrep/SKILL.md
@@ -0,0 +1,284 @@
+---
+name: sast-semgrep
+description: >
+  Static application security testing (SAST) using Semgrep for vulnerability detection,
+  security code review, and secure coding guidance with OWASP and CWE framework mapping.
+  Use when: (1) Scanning code for security vulnerabilities across multiple languages,
+  (2) Performing security code reviews with pattern-based detection, (3) Integrating
+  SAST checks into CI/CD pipelines, (4) Providing remediation guidance with OWASP Top 10
+  and CWE mappings, (5) Creating custom security rules for organization-specific patterns,
+  (6) Analyzing dependencies for known vulnerabilities.
+version: 0.1.0
+maintainer: SirAppSec
+category: appsec
+tags: [sast, semgrep, vulnerability-scanning, code-security, owasp, cwe, security-review]
+frameworks: [OWASP, CWE, SANS-25]
+dependencies:
+  python: ">=3.8"
+  packages: [semgrep]
+  tools: [git]
+references:
+  - https://semgrep.dev/docs/
+  - https://owasp.org/Top10/
+  - https://cwe.mitre.org/
+---
+
+# SAST with Semgrep
+
+## Overview
+
+Perform comprehensive static application security testing using Semgrep, a fast, open-source
+static analysis tool. This skill provides automated vulnerability detection, security code
+review workflows, and remediation guidance mapped to OWASP Top 10 and CWE standards.
+
+## Quick Start
+
+Scan a codebase for security vulnerabilities:
+
+```bash
+semgrep --config=auto --severity=ERROR --severity=WARNING /path/to/code
+```
+
+Run with OWASP Top 10 ruleset:
+
+```bash
+semgrep --config="p/owasp-top-ten" /path/to/code
+```
+
+## Core Workflows
+
+### Workflow 1: Initial Security Scan
+
+1. Identify the primary languages in the codebase
+2. Run `scripts/semgrep_scan.py` with appropriate rulesets
+3. Parse findings and categorize by severity (CRITICAL, HIGH, MEDIUM, LOW)
+4. Map findings to OWASP Top 10 and CWE categories
+5. Generate prioritized remediation report
+
+### Workflow 2: Security Code Review
+
+1. For pull requests or commits, run targeted scans on changed files
+2. Use `semgrep --baseline-commit <ref>` to report only findings introduced since that commit
+3. Flag high-severity findings as blocking issues
+4. Provide inline remediation guidance from `references/remediation_guide.md`
+5. Link findings to secure coding patterns
+
+### Workflow 3: Custom Rule Development
+
+1. Identify organization-specific security patterns to detect
+2. Create custom Semgrep rules in YAML format using `assets/rule_template.yaml`
+3. Test rules against known vulnerable code samples
+4. Integrate custom rules into CI/CD pipeline
+5. Document rules in `references/custom_rules.md`
+
+### Workflow 4: CI/CD Integration
+
+1. Add Semgrep to CI/CD pipeline using `assets/ci_config_examples/`
+2. Configure baseline scanning for pull requests
+3. Set severity thresholds (fail on CRITICAL/HIGH)
+4. Generate SARIF output for security dashboards
+5. Track metrics: vulnerabilities found, fix rate, false positives
+
+## Security Considerations
+
+- **Sensitive Data Handling**: Semgrep scans code locally; ensure scan results don't leak
+  secrets or proprietary code patterns. Use `--max-lines-per-finding` to limit output.
+
+- **Access Control**: Semgrep scans require read access to source code. Restrict scan
+  result access to authorized security and development teams.
+
+- **Audit Logging**: Log all scan executions with timestamps, user, commit hash, and
+  findings count for compliance auditing.
+
+- **Compliance**: SAST scan records can serve as evidence for SOC2, PCI-DSS, and GDPR
+  secure-development controls. Maintain scan history and remediation tracking.
+
+- **Safe Defaults**: Use `--config=auto` for balanced detection; it selects rules through the
+  Semgrep Registry and does not run with metrics disabled. For security-critical
+  applications, use `--config="p/security-audit"` for comprehensive coverage.
+
+## Language Support
+
+Semgrep supports 30+ languages including:
+- **Web**: JavaScript, TypeScript, Python, Ruby, PHP, Java, C#, Go
+- **Mobile**: Swift, Kotlin, Java (Android)
+- **Infrastructure**: Terraform, Dockerfile, YAML, JSON
+- **Other**: C, C++, Rust, Scala, Solidity
+
+## Bundled Resources
+
+### Scripts
+
+- `scripts/semgrep_scan.py` - Full-featured scanning with OWASP/CWE mapping and reporting
+- `scripts/baseline_scan.sh` - Quick baseline scan for CI/CD
+- `scripts/diff_scan.sh` - Scan only changed files (for PRs)
+
+### References
+
+- `references/owasp_cwe_mapping.md` - OWASP Top 10 to CWE mapping with Semgrep rules
+- `references/remediation_guide.md` - Vulnerability remediation patterns by category
+- `references/rule_library.md` - Curated list of useful Semgrep rulesets
+
+### Assets
+
+- `assets/rule_template.yaml` - Template for creating custom Semgrep rules
+- `assets/ci_config_examples/` - CI/CD integration examples (GitHub Actions, GitLab CI)
+- `assets/semgrep_config.yaml` - Recommended Semgrep configuration
+
+## Common Patterns
+
+### Pattern 1: Daily Security Baseline Scan
+
+```bash
+# Run comprehensive scan and generate report
+scripts/semgrep_scan.py --config security-audit \
+  --output results.json \
+  --format json \
+  --severity HIGH CRITICAL
+```
+
+### Pattern 2: Pull Request Security Gate
+
+```bash
+# Scan only changed files, fail on HIGH/CRITICAL
+scripts/diff_scan.sh --fail-on high \
+  --base-branch main \
+  --output sarif
+```
+
+### Pattern 3: Vulnerability Research
+
+```bash
+# Search for specific vulnerability patterns
+semgrep --config "r/javascript.lang.security.audit.xss" \
+  --json /path/to/code | jq '.results'
+```
+
+### Pattern 4: Custom Rule Validation
+
+```bash
+# Test custom rule against vulnerable samples
+semgrep --config assets/custom_rules.yaml \
+  --test tests/vulnerable_samples/
+```
+
+## Integration Points
+
+### CI/CD Integration
+
+- **GitHub Actions**: Use `semgrep/semgrep-action@v1` with SARIF upload
+- **GitLab CI**: Run as security scanning job with artifact reports
+- **Jenkins**: Execute as build step with quality gate integration
+- **pre-commit hooks**: Run lightweight scans on staged files
+
+See `assets/ci_config_examples/` for ready-to-use configurations.
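+
+For the pre-commit option, a minimal `.pre-commit-config.yaml` might look like the sketch
+below. The hook repository URL, the `rev` tag, and the ruleset are assumptions; verify them
+against the Semgrep documentation and pin to a release you have tested:
+
+```yaml
+# Hypothetical example: confirm the repo URL and pin `rev` to a release tag you have tested
+repos:
+  - repo: https://github.com/semgrep/pre-commit
+    rev: v1.50.0
+    hooks:
+      - id: semgrep
+        # --error makes the hook exit non-zero when findings are reported
+        args: ["--config", "p/security-audit", "--error", "--skip-unknown-extensions"]
+```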
+
+### Security Tool Integration
+
+- **SIEM/SOAR**: Export findings in JSON/SARIF for ingestion
+- **Vulnerability Management**: Integrate with Jira, DefectDojo, or ThreadFix
+- **IDE Integration**: Use Semgrep IDE plugins for real-time detection
+- **Secret Scanning**: Combine with tools like trufflehog, gitleaks
+
+### SDLC Integration
+
+- **Requirements Phase**: Define security requirements and custom rules
+- **Development**: IDE plugins provide real-time feedback
+- **Code Review**: Automated security review in PR workflow
+- **Testing**: Integrate with security testing framework
+- **Deployment**: Final security gate before production
+
+## Severity Classification
+
+This skill classifies findings into five levels (Semgrep rules themselves typically report `ERROR`, `WARNING`, or `INFO`):
+
+- **CRITICAL**: Exploitable vulnerabilities (SQLi, RCE, Auth bypass)
+- **HIGH**: Significant security risks (XSS, CSRF, sensitive data exposure)
+- **MEDIUM**: Security weaknesses (weak crypto, missing validation)
+- **LOW**: Code quality issues with security implications
+- **INFO**: Security best practice recommendations
+
+## Performance Optimization
+
+For large codebases:
+
+```bash
+# Use --jobs for parallel scanning
+semgrep --config auto --jobs 4
+
+# Exclude vendor/test code
+semgrep --config auto --exclude "vendor/" --exclude "test/"
+
+# Use lightweight rulesets for faster feedback
+semgrep --config "p/owasp-top-ten" --exclude-rule "generic.*"
+```
+
+## Troubleshooting
+
+### Issue: Too Many False Positives
+
+**Solution**:
+- Use `--exclude-rule` to disable noisy rules
+- Create a `.semgrepignore` file to exclude paths (vendored, generated, or test code) that produce noise
+- Tune rules using `--severity` filtering
+- Add `# nosemgrep` comments for confirmed false positives (with justification)
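+
+Rule-level tuning can also live in the rule file itself. A minimal sketch, assuming a
+hypothetical `dangerous_function` rule (the rule ID, function name, and paths are
+illustrative, not taken from a published ruleset):
+
+```yaml
+rules:
+  - id: example-tuned-rule
+    patterns:
+      - pattern: dangerous_function($ARG)
+      # Skip calls whose argument is a string literal and so cannot carry user input
+      - pattern-not: dangerous_function("...")
+    message: dangerous_function called with a non-literal argument
+    severity: WARNING
+    languages: [python]
+    # Keep the rule out of test code, where findings are usually noise
+    paths:
+      exclude:
+        - "tests/"
+        - "*_test.py"
+```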
+
+### Issue: Scan Taking Too Long
+
+**Solution**:
+- Use `--exclude` for vendor/generated code
+- Increase `--jobs` for parallel processing
+- Use targeted rulesets instead of `--config=auto`
+- Run incremental scans with `--baseline-commit`
+
+### Issue: Missing Vulnerabilities
+
+**Solution**:
+- Use comprehensive rulesets: `p/security-audit` or `p/owasp-top-ten`
+- Consult `references/rule_library.md` for specialized rules
+- Create custom rules for organization-specific patterns
+- Combine with dynamic analysis (DAST) and dependency scanning
+
+## Advanced Usage
+
+### Creating Custom Rules
+
+See `references/rule_library.md` for guidance on writing effective Semgrep rules.
+Use `assets/rule_template.yaml` as a starting point.
+
+Example rule structure:
+```yaml
+rules:
+  - id: custom-sql-injection
+    patterns:
+      - pattern: execute($QUERY)
+      - pattern-inside: |
+          $QUERY = $USER_INPUT + ...
+          ...
+    message: Potential SQL injection from user input concatenation
+    severity: ERROR
+    languages: [python]
+    metadata:
+      cwe: "CWE-89"
+      owasp: "A03:2021-Injection"
+```
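+
+Where concatenation patterns prove too narrow, Semgrep's taint mode can track data from a
+source to a sink instead. The sketch below rests on assumptions: it treats Flask query
+parameters as the source and any `execute` call as the sink, neither of which comes from
+this skill's bundled rules:
+
+```yaml
+rules:
+  - id: example-sql-injection-taint
+    mode: taint
+    pattern-sources:
+      # Assumed source: Flask query-string parameters
+      - pattern: flask.request.args.get(...)
+    pattern-sinks:
+      - patterns:
+          - pattern: $CURSOR.execute($QUERY, ...)
+          # Only the query text is treated as a sink, not the bound parameters
+          - focus-metavariable: $QUERY
+    message: User input reaches a SQL query without parameterization
+    severity: ERROR
+    languages: [python]
+    metadata:
+      cwe: "CWE-89"
+      owasp: "A03:2021-Injection"
+```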
+
+### OWASP Top 10 Coverage
+
+This skill provides detection for all OWASP Top 10 2021 categories.
+See `references/owasp_cwe_mapping.md` for complete coverage matrix.
+
+## Best Practices
+
+1. **Baseline First**: Establish security baseline before enforcing gates
+2. **Progressive Rollout**: Start with HIGH/CRITICAL, expand to MEDIUM over time
+3. **Developer Training**: Educate team on common vulnerabilities and fixes
+4. **Rule Maintenance**: Regularly update rulesets and tune for your stack
+5. **Metrics Tracking**: Monitor vulnerability trends, MTTR, and false positive rate
+6. **Defense in Depth**: Combine with DAST, SCA, and manual code review
+
+## References
+
+- [Semgrep Documentation](https://semgrep.dev/docs/)
+- [Semgrep Rule Registry](https://semgrep.dev/explore)
+- [OWASP Top 10 2021](https://owasp.org/Top10/)
+- [CWE Top 25](https://cwe.mitre.org/top25/)
+- [SANS Top 25](https://www.sans.org/top25-software-errors/)
--- a/skills/appsec/sast-semgrep/assets/ci_config_examples/github-actions.yml
+++ b/skills/appsec/sast-semgrep/assets/ci_config_examples/github-actions.yml
@@ -0,0 +1,141 @@
+# GitHub Actions - Semgrep Security Scanning
+# Save as .github/workflows/semgrep.yml
+
+name: Semgrep Security Scan
+
+on:
+  # Scan on push to main/master
+  push:
+    branches:
+      - main
+      - master
+  # Scan pull requests
+  pull_request:
+    branches:
+      - main
+      - master
+  # Manual trigger
+  workflow_dispatch:
+  # Schedule daily scans
+  schedule:
+    - cron: '0 0 * * *'  # Run at midnight UTC
+
+jobs:
+  semgrep:
+    name: SAST Security Scan
+    runs-on: ubuntu-latest
+
+    # Required for uploading results to GitHub Security
+    permissions:
+      security-events: write
+      actions: read
+      contents: read
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Run Semgrep
+        uses: semgrep/semgrep-action@v1
+        with:
+          # Ruleset to use
+          config: >-
+            p/security-audit
+            p/owasp-top-ten
+            p/cwe-top-25
+
+          # Publish findings to Semgrep Cloud (optional; requires these secrets)
+          publishToken: ${{ secrets.SEMGREP_APP_TOKEN }}
+          publishDeployment: ${{ secrets.SEMGREP_DEPLOYMENT_ID }}
+
+          # Write semgrep.sarif so the upload steps below have a file to publish
+          generateSarif: "1"
+
+      - name: Upload SARIF to GitHub Security
+        if: always()
+        uses: github/codeql-action/upload-sarif@v3
+        with:
+          sarif_file: semgrep.sarif
+
+      - name: Upload scan results as artifact
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: semgrep-results
+          path: semgrep.sarif
+
+# Alternative: Simpler configuration without Semgrep Cloud (save as its own workflow file)
+---
+name: Semgrep Security Scan (Simple)
+
+on:
+  pull_request:
+    branches: [main, master]
+  push:
+    branches: [main, master]
+
+jobs:
+  semgrep:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install Semgrep
+        run: pip install semgrep
+
+      - name: Run Semgrep Scan
+        run: |
+          semgrep --config="p/security-audit" \
+                   --config="p/owasp-top-ten" \
+                   --sarif \
+                   --output=semgrep-results.sarif \
+                   --severity=ERROR \
+                   --severity=WARNING
+
+      - name: Upload SARIF results
+        if: always()
+        uses: github/codeql-action/upload-sarif@v3
+        with:
+          sarif_file: semgrep-results.sarif
+
+# PR-specific: Only report findings introduced by the PR (save as its own workflow file)
+---
+name: Semgrep PR Scan
+
+on:
+  pull_request:
+
+jobs:
+  semgrep-diff:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0  # Fetch full history for diff
+
+      - name: Install Semgrep
+        run: pip install semgrep
+
+      - name: Scan changed files only
+        run: |
+          semgrep --config="p/security-audit" \
+                   --baseline-commit="${{ github.event.pull_request.base.sha }}" \
+                   --json \
+                   --output=results.json
+
+      - name: Check for findings
+        run: |
+          FINDINGS=$(jq '.results | length' results.json)
+          echo "Found $FINDINGS security issues"
+          if [ "$FINDINGS" -gt 0 ]; then
+            echo "❌ Security issues detected!"
+            jq '.results[] | "[\(.extra.severity)] \(.check_id) - \(.path):\(.start.line)"' results.json
+            exit 1
+          else
+            echo "✅ No security issues found"
+          fi
--- a/skills/appsec/sast-semgrep/assets/ci_config_examples/gitlab-ci.yml
+++ b/skills/appsec/sast-semgrep/assets/ci_config_examples/gitlab-ci.yml
@@ -0,0 +1,106 @@
+# GitLab CI - Semgrep Security Scanning
+# Add to .gitlab-ci.yml
+
+stages:
+  - test
+  - security
+
+# Basic Semgrep scan
+semgrep-scan:
+  stage: security
+  image: semgrep/semgrep:latest
+  script:
+    - semgrep --config="p/security-audit"
+              --config="p/owasp-top-ten"
+              --gitlab-sast
+              --output=gl-sast-report.json
+  artifacts:
+    reports:
+      sast: gl-sast-report.json
+    paths:
+      - gl-sast-report.json
+    expire_in: 1 week
+  rules:
+    - if: $CI_MERGE_REQUEST_ID  # Run on MRs
+    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH  # Run on default branch
+
+# Advanced: Fail on HIGH severity findings
+semgrep-strict:
+  stage: security
+  image: python:3.11-slim
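+  before_script:
+    # python:3.11-slim ships without jq, which the script below needs
+    - apt-get update && apt-get install -y --no-install-recommends jq
+    - pip install semgrep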
+  script:
+    - |
+      semgrep --config="p/security-audit" \
+               --severity=ERROR \
+               --json \
+               --output=results.json
+
+      CRITICAL=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' results.json)
+      echo "Found $CRITICAL critical findings"
+
+      if [ "$CRITICAL" -gt 0 ]; then
+        echo "❌ Critical security issues detected!"
+        jq '.results[] | select(.extra.severity == "ERROR")' results.json
+        exit 1
+      fi
+  artifacts:
+    paths:
+      - results.json
+    expire_in: 1 week
+    when: always
+  allow_failure: false
+
+# Differential scanning - only new findings in MR
+semgrep-diff:
+  stage: security
+  image: semgrep/semgrep:latest
+  script:
+    - git fetch origin $CI_MERGE_REQUEST_TARGET_BRANCH_NAME
+    - |
+      semgrep --config="p/security-audit" \
+               --baseline-commit="origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME" \
+               --gitlab-sast \
+               --output=gl-sast-report.json
+  artifacts:
+    reports:
+      sast: gl-sast-report.json
+  rules:
+    - if: $CI_MERGE_REQUEST_ID
+
+# Scheduled full scan (daily)
+semgrep-scheduled:
+  stage: security
+  image: semgrep/semgrep:latest
+  script:
+    - |
+      semgrep --config="p/security-audit" \
+               --config="p/owasp-top-ten" \
+               --config="p/cwe-top-25" \
+               --json \
+               --output=full-scan-results.json
+  artifacts:
+    paths:
+      - full-scan-results.json
+    expire_in: 30 days
+  rules:
+    - if: $CI_PIPELINE_SOURCE == "schedule"
+
+# Custom rules integration
+semgrep-custom:
+  stage: security
+  image: semgrep/semgrep:latest
+  script:
+    - |
+      semgrep --config="p/owasp-top-ten" \
+               --config="custom-rules/security.yaml" \
+               --gitlab-sast \
+               --output=gl-sast-report.json
+  artifacts:
+    reports:
+      sast: gl-sast-report.json
+  rules:
+    - if: $CI_MERGE_REQUEST_ID
+      exists:
+        - custom-rules/security.yaml
--- a/skills/appsec/sast-semgrep/assets/ci_config_examples/jenkins.groovy
+++ b/skills/appsec/sast-semgrep/assets/ci_config_examples/jenkins.groovy
@@ -0,0 +1,190 @@
+// Jenkinsfile - Semgrep Security Scanning
+// Basic pipeline with Semgrep security gate
+
+pipeline {
+    agent any
+
+    environment {
+        SEMGREP_VERSION = '1.50.0'  // Pin to specific version
+    }
+
+    stages {
+        stage('Checkout') {
+            steps {
+                checkout scm
+            }
+        }
+
+        stage('Security Scan') {
+            steps {
+                script {
+                    // Install Semgrep
+                    sh 'pip3 install semgrep==${SEMGREP_VERSION}'
+
+                    // Run Semgrep scan
+                    sh '''
+                        semgrep --config="p/security-audit" \
+                                --config="p/owasp-top-ten" \
+                                --json \
+                                --output=semgrep-results.json \
+                                --severity=ERROR \
+                                --severity=WARNING
+                    '''
+                }
+            }
+        }
+
+        stage('Process Results') {
+            steps {
+                script {
+                    // Parse results
+                    def results = readJSON file: 'semgrep-results.json'
+                    def findings = results.results.size()
+                    def critical = results.results.findAll {
+                        it.extra.severity == 'ERROR'
+                    }.size()
+
+                    echo "Total findings: ${findings}"
+                    echo "Critical findings: ${critical}"
+
+                    // Fail build if critical findings
+                    if (critical > 0) {
+                        error("❌ Critical security vulnerabilities detected!")
+                    }
+                }
+            }
+        }
+    }
+
+    post {
+        always {
+            // Archive scan results
+            archiveArtifacts artifacts: 'semgrep-results.json',
+                           fingerprint: true
+
+            // Publish results (if using warnings-ng plugin)
+            // recordIssues(
+            //     tools: [semgrep(pattern: 'semgrep-results.json')],
+            //     qualityGates: [[threshold: 1, type: 'TOTAL', unstable: false]]
+            // )
+        }
+        failure {
+            echo '❌ Security scan failed - review findings'
+        }
+        success {
+            echo '✅ No critical security issues detected'
+        }
+    }
+}
+
+// Advanced: Differential scanning for PRs
+pipeline {
+    agent any
+
+    environment {
+        TARGET_BRANCH = "${env.CHANGE_TARGET ?: 'main'}"
+    }
+
+    stages {
+        stage('Checkout') {
+            steps {
+                checkout scm
+
+                script {
+                    // Fetch target branch for comparison
+                    sh """
+                        git fetch origin ${TARGET_BRANCH}:${TARGET_BRANCH}
+                    """
+                }
+            }
+        }
+
+        stage('Differential Scan') {
+            when {
+                changeRequest()  // Only for pull requests
+            }
+            steps {
+                sh """
+                    pip3 install semgrep
+
+                    semgrep --config="p/security-audit" \
+                            --baseline-commit="${TARGET_BRANCH}" \
+                            --json \
+                            --output=semgrep-diff.json
+                """
+
+                script {
+                    def results = readJSON file: 'semgrep-diff.json'
+                    def newFindings = results.results.size()
+
+                    if (newFindings > 0) {
+                        echo "❌ ${newFindings} new security issues introduced"
+                        error("Fix security issues before merging")
+                    } else {
+                        echo "✅ No new security issues"
+                    }
+                }
+            }
+        }
+
+        stage('Full Scan') {
+            when {
+                branch 'main'  // Full scan on main branch
+            }
+            steps {
+                sh """
+                    pip3 install semgrep
+
+                    semgrep --config="p/security-audit" \
+                            --config="p/owasp-top-ten" \
+                            --config="p/cwe-top-25" \
+                            --json \
+                            --output=semgrep-full.json
+                """
+            }
+        }
+    }
+
+    post {
+        always {
+            archiveArtifacts artifacts: 'semgrep-*.json',
+                           allowEmptyArchive: true
+        }
+    }
+}
+
+// With custom rules
+pipeline {
+    agent any
+
+    stages {
+        stage('Security Scan with Custom Rules') {
+            steps {
+                sh """
+                    pip3 install semgrep
+
+                    # Run with both official and custom rules
+                    semgrep --config="p/owasp-top-ten" \
+                            --config="custom-rules/" \
+                            --json \
+                            --output=results.json
+                """
+
+                script {
+                    // Print a per-severity summary of the findings written to results.json
+                    sh """
+                        python3 -c "
+import json
+with open('results.json') as f:
+    results = json.load(f)
+    findings = results['results']
+    print(f'Security Scan Complete:')
+    print(f'  Total Findings: {len(findings)}')
+    for severity in ['ERROR', 'WARNING', 'INFO']:
+        count = len([f for f in findings if f.get('extra', {}).get('severity') == severity])
+        print(f'  {severity}: {count}')
+"
+                    """
+                }
+            }
+        }
+    }
+}
--- a/skills/appsec/sast-semgrep/assets/rule_template.yaml
+++ b/skills/appsec/sast-semgrep/assets/rule_template.yaml
@@ -0,0 +1,120 @@
+rules:
+  - id: custom-rule-template
+    # Pattern matching - choose one or combine multiple
+    pattern: dangerous_function($ARG)
+    # OR use pattern combinations:
+    # patterns:
+    #   - pattern: execute($QUERY)
+    #   - pattern-inside: |
+    #       $QUERY = $USER_INPUT + ...
+    #       ...
+    #   - pattern-not: execute("SAFE_QUERY")
+
+    # Message shown when rule matches
+    message: |
+      Potential security vulnerability detected.
+      Explain the risk and provide remediation guidance.
+
+    # Severity level
+    severity: ERROR  # ERROR, WARNING, or INFO
+
+    # Supported languages
+    languages: [python]  # python, javascript, java, go, etc.
+
+    # Metadata for categorization and tracking
+    metadata:
+      category: security
+      technology: [web-app]
+      cwe:
+        - "CWE-XXX: Vulnerability Name"
+      owasp:
+        - "AXX:2021-Category Name"
+      confidence: HIGH  # HIGH, MEDIUM, LOW
+      likelihood: MEDIUM  # How likely is exploitation
+      impact: HIGH  # Potential security impact
+      references:
+        - https://owasp.org/...
+        - https://cwe.mitre.org/data/definitions/XXX.html
+      subcategory:
+        - vuln-type  # e.g., sqli, xss, command-injection
+
+    # Optional: Autofix suggestion
+    # fix: |
+    #   safe_function($ARG)
+
+    # Optional: Path filtering
+    # paths:
+    #   include:
+    #     - "src/"
+    #   exclude:
+    #     - "*/tests/*"
+    #     - "*/test_*.py"
+
+# Example: SQL Injection Detection
+  - id: example-sql-injection
+    patterns:
+      - pattern-either:
+          - pattern: cursor.execute(f"... {$VAR} ...")
+          - pattern: cursor.execute("..." + $VAR + "...")
+      - pattern-not: cursor.execute("...", ...)
+    message: |
+      SQL injection vulnerability detected. User input is concatenated into SQL query.
+
+      Remediation:
+      - Use parameterized queries: cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
+      - Use ORM methods that automatically parameterize queries
+    severity: ERROR
+    languages: [python]
+    metadata:
+      category: security
+      cwe: ["CWE-89: SQL Injection"]
+      owasp: ["A03:2021-Injection"]
+      confidence: HIGH
+      likelihood: HIGH
+      impact: HIGH
+      references:
+        - https://owasp.org/Top10/A03_2021-Injection/
+
+# Example: Hard-coded Secret Detection
+  - id: example-hardcoded-secret
+    pattern-regex: |-
+      (password|passwd|pwd|secret|token|api[_-]?key)\s*=\s*['"][^'"]{8,}['"]
+    message: |
+      Potential hard-coded secret detected.
+
+      Remediation:
+      - Use environment variables: os.getenv('API_KEY')
+      - Use secrets management: AWS Secrets Manager, HashiCorp Vault
+      - Never commit secrets to version control
+    severity: WARNING
+    languages: [python, javascript, java, go]
+    metadata:
+      category: security
+      cwe: ["CWE-798: Use of Hard-coded Credentials"]
+      owasp: ["A07:2021-Identification-and-Authentication-Failures"]
+      confidence: MEDIUM
+
+# Example: Insecure Deserialization
+  - id: example-unsafe-deserialization
+    pattern-either:
+      - pattern: pickle.loads($DATA)
+      - pattern: pickle.load($FILE)
+    message: |
+      Unsafe deserialization using pickle. Attackers can execute arbitrary code.
+
+      Remediation:
+      - Use JSON for serialization: json.loads(data)
+      - If pickle is required, validate and sanitize data source
+      - Never deserialize data from untrusted sources
+    severity: ERROR
+    languages: [python]
+    metadata:
+      category: security
+      cwe: ["CWE-502: Deserialization of Untrusted Data"]
+      owasp: ["A08:2021-Software-and-Data-Integrity-Failures"]
+      confidence: HIGH
+      likelihood: HIGH
+      impact: HIGH
--- a/skills/appsec/sast-semgrep/assets/semgrep_config.yaml
+++ b/skills/appsec/sast-semgrep/assets/semgrep_config.yaml
@@ -0,0 +1,80 @@
+# Recommended Semgrep settings (reference only: Semgrep does not load CLI options from a
+# project file, so pass these as flags, e.g. --exclude, --max-target-bytes, --timeout)
+
+# Rules to run
+rules: p/security-audit
+
+# Alternative: Specify multiple rulesets
+# rules:
+#   - p/owasp-top-ten
+#   - p/cwe-top-25
+#   - path/to/custom-rules.yaml
+
+# Paths to exclude from scanning
+exclude:
+  - "*/node_modules/*"
+  - "*/vendor/*"
+  - "*/.venv/*"
+  - "*/venv/*"
+  - "*/dist/*"
+  - "*/build/*"
+  - "*/.git/*"
+  - "*/tests/*"
+  - "*/test/*"
+  - "*_test.go"
+  - "test_*.py"
+  - "*.test.js"
+  - "*.spec.js"
+  - "*.min.js"
+  - "*.bundle.js"
+
+# Paths to include (optional - scans all by default)
+# include:
+#   - "src/"
+#   - "app/"
+#   - "lib/"
+
+# Maximum file size to scan (in bytes)
+max_target_bytes: 1000000  # 1MB
+
+# Timeout for each file (in seconds)
+timeout: 30
+
+# Number of jobs for parallel scanning
+# jobs: 4
+
+# Metrics and telemetry (disable for privacy)
+metrics: off
+
+# Autofix mode (use with caution)
+# autofix: false
+
+# Output format
+# Can be: text, json, sarif, gitlab-sast, junit-xml, emacs, vim
+# Set via CLI: semgrep --config=<this-file> --json
+# output_format: text
+
+# Severity thresholds
+# Only report findings at or above this severity
+# Can be: ERROR, WARNING, INFO
+# min_severity: WARNING
+
+# Scan statistics
+# Show timing and performance stats
+# time: false
+# Show stats after scanning
+# verbose: false
+
+# CI/CD specific settings
+# These are typically set via CLI or CI environment
+
+# Fail on findings
+# Set exit code 1 if findings are detected
+# error: true
+
+# Baseline commit for diff scanning
+# baseline_commit: origin/main
+
+# SARIF output settings (for GitHub Security, etc.)
+# sarif:
+#   output: semgrep-results.sarif
--- a/skills/appsec/sast-semgrep/references/owasp_cwe_mapping.md
+++ b/skills/appsec/sast-semgrep/references/owasp_cwe_mapping.md
@@ -0,0 +1,300 @@
+# OWASP Top 10 to CWE Mapping with Semgrep Rules
+
+## Table of Contents
+
+- [A01:2021 - Broken Access Control](#a012021---broken-access-control)
+- [A02:2021 - Cryptographic Failures](#a022021---cryptographic-failures)
+- [A03:2021 - Injection](#a032021---injection)
+- [A04:2021 - Insecure Design](#a042021---insecure-design)
+- [A05:2021 - Security Misconfiguration](#a052021---security-misconfiguration)
+- [A06:2021 - Vulnerable and Outdated Components](#a062021---vulnerable-and-outdated-components)
+- [A07:2021 - Identification and Authentication Failures](#a072021---identification-and-authentication-failures)
+- [A08:2021 - Software and Data Integrity Failures](#a082021---software-and-data-integrity-failures)
+- [A09:2021 - Security Logging and Monitoring Failures](#a092021---security-logging-and-monitoring-failures)
+- [A10:2021 - Server-Side Request Forgery (SSRF)](#a102021---server-side-request-forgery-ssrf)
+
+## A01:2021 - Broken Access Control
+
+### CWE Mappings
+- CWE-22: Path Traversal
+- CWE-23: Relative Path Traversal
+- CWE-35: Path Traversal: '.../...//'
+- CWE-352: Cross-Site Request Forgery (CSRF)
+- CWE-285: Improper Authorization
+- CWE-639: Authorization Bypass Through User-Controlled Key
+- CWE-862: Missing Authorization
+
+### Semgrep Rules
+```bash
+# Path traversal detection
+semgrep --config "r/python.lang.security.audit.path-traversal"
+
+# Hard-coded secrets (authorization checks are app-specific and usually need custom rules)
+semgrep --config "r/generic.secrets.security.detected-generic-secret"
+
+# CSRF protection
+semgrep --config "r/javascript.express.security.audit.express-check-csurf-middleware-usage"
+```
+
+### Detection Patterns
+- Unrestricted file access using user input
+- Missing or improper authorization checks
+- Insecure direct object references (IDOR)
+- Elevation of privilege vulnerabilities
+
+## A02:2021 - Cryptographic Failures
+
+### CWE Mappings
+- CWE-259: Use of Hard-coded Password
+- CWE-326: Inadequate Encryption Strength
+- CWE-327: Use of Broken/Risky Crypto Algorithm
+- CWE-328: Reversible One-Way Hash
+- CWE-330: Use of Insufficiently Random Values
+- CWE-780: Use of RSA Without OAEP
+
+### Semgrep Rules
+```bash
+# Weak crypto algorithms
+semgrep --config "p/crypto"
+
+# Hard-coded secrets
+semgrep --config "p/secrets"
+
+# Insecure random
+semgrep --config "r/python.lang.security.audit.insecure-random"
+```
+
+### Detection Patterns
+- Use of MD5, SHA1 for cryptographic purposes
+- Hard-coded passwords, API keys, tokens
+- Weak encryption algorithms (DES, RC4)
+- Insecure random number generation
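+
+The first detection pattern above can be sketched as a custom rule. This is an illustrative
+example, not a rule from the Semgrep Registry; the rule ID and message are assumptions, and
+it flags every `hashlib.md5` or `hashlib.sha1` call regardless of purpose:
+
+```yaml
+rules:
+  - id: example-weak-hash
+    pattern-either:
+      - pattern: hashlib.md5(...)
+      - pattern: hashlib.sha1(...)
+    message: MD5 and SHA-1 are unsuitable for security purposes; prefer SHA-256 or stronger
+    severity: WARNING
+    languages: [python]
+    metadata:
+      cwe: ["CWE-327: Use of Broken/Risky Crypto Algorithm"]
+      owasp: ["A02:2021-Cryptographic-Failures"]
+```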
+
+## A03:2021 - Injection
+
+### CWE Mappings
+- CWE-79: Cross-site Scripting (XSS)
+- CWE-89: SQL Injection
+- CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (eval injection)
+- CWE-917: Expression Language Injection
+- CWE-943: Improper Neutralization of Special Elements in Data Query Logic
+
+### Semgrep Rules
+```bash
+# SQL Injection
+semgrep --config "r/python.django.security.injection.sql"
+semgrep --config "r/javascript.sequelize.security.audit.sequelize-injection"
+
+# XSS
+semgrep --config "r/javascript.express.security.audit.xss"
+semgrep --config "r/python.flask.security.audit.template-xss"
+
+# Command Injection
+semgrep --config "r/python.lang.security.audit.dangerous-subprocess-use"
+
+# Code Injection
+semgrep --config "r/python.lang.security.audit.exec-used"
+semgrep --config "r/javascript.lang.security.audit.eval-detected"
+```
+
+### Detection Patterns
+- Unsafe SQL query construction
+- Unescaped user input in HTML context
+- OS command execution with user input
+- Use of eval() or similar dynamic code execution
+
+## A04:2021 - Insecure Design
+
+### CWE Mappings
+- CWE-209: Generation of Error Message with Sensitive Information
+- CWE-256: Unprotected Storage of Credentials
+- CWE-501: Trust Boundary Violation
+- CWE-522: Insufficiently Protected Credentials
+
+### Semgrep Rules
+```bash
+# Information disclosure
+semgrep --config "r/python.flask.security.audit.debug-enabled"
+
+# Missing security controls
+semgrep --config "p/security-audit"
+```
+
+### Detection Patterns
+- Debug mode enabled in production
+- Verbose error messages exposing internals
+- Missing rate limiting
+- Insecure default configurations
+
+## A05:2021 - Security Misconfiguration
+
+### CWE Mappings
+- CWE-16: Configuration
+- CWE-611: Improper Restriction of XML External Entity Reference
+- CWE-614: Sensitive Cookie in HTTPS Session Without 'Secure' Attribute
+- CWE-756: Missing Custom Error Page
+- CWE-776: Improper Restriction of Recursive Entity References in DTDs
+
+### Semgrep Rules
+```bash
+# XXE vulnerabilities
+semgrep --config "r/python.lang.security.audit.avoid-lxml-in-xml-parsing"
+
+# Insecure cookie settings
+semgrep --config "r/javascript.express.security.audit.express-cookie-settings"
+
+# CORS misconfiguration
+semgrep --config "r/javascript.express.security.audit.express-cors-misconfiguration"
+```
+
+### Detection Patterns
+- XML External Entity (XXE) vulnerabilities
+- Insecure cookie flags (missing Secure, HttpOnly, SameSite)
+- Open CORS policies
+- Unnecessary features enabled
+
+## A06:2021 - Vulnerable and Outdated Components
+
+### CWE Mappings
+- CWE-1035: Using Components with Known Vulnerabilities
+- CWE-1104: Use of Unmaintained Third Party Components
+
+### Semgrep Rules
+```bash
+# Known vulnerable dependencies
+semgrep --config "p/supply-chain"
+
+# Broad OWASP Top 10 ruleset
+semgrep --config "p/owasp-top-ten"
+```
+
+### Detection Patterns
+- Outdated library versions
+- Dependencies with known CVEs
+- Use of deprecated/unmaintained packages
+- Insecure package imports
+
+## A07:2021 - Identification and Authentication Failures
+
+### CWE Mappings
+- CWE-287: Improper Authentication
+- CWE-288: Authentication Bypass Using Alternate Path/Channel
+- CWE-306: Missing Authentication for Critical Function
+- CWE-307: Improper Restriction of Excessive Authentication Attempts
+- CWE-521: Weak Password Requirements
+- CWE-798: Use of Hard-coded Credentials
+- CWE-916: Use of Password Hash With Insufficient Computational Effort
+
+### Semgrep Rules
+```bash
+# Weak password hashing
+semgrep --config "r/python.lang.security.audit.hashlib-md5-used"
+
+# JWT authentication issues
+semgrep --config "p/jwt"
+
+# Session management
+semgrep --config "r/javascript.express.security.audit.express-session-misconfiguration"
+```
+
+### Detection Patterns
+- Weak password hashing (MD5, SHA1 without salt)
+- Missing multi-factor authentication
+- Predictable session identifiers
+- Credential stuffing vulnerabilities
+
+## A08:2021 - Software and Data Integrity Failures
+
+### CWE Mappings
+- CWE-345: Insufficient Verification of Data Authenticity
+- CWE-502: Deserialization of Untrusted Data
+- CWE-829: Inclusion of Functionality from Untrusted Control Sphere
+- CWE-915: Improperly Controlled Modification of Dynamically-Determined Object Attributes
+
+### Semgrep Rules
+```bash
+# Unsafe deserialization
+semgrep --config "r/python.lang.security.audit.unsafe-pickle"
+semgrep --config "r/javascript.lang.security.audit.unsafe-deserialization"
+
+# Prototype pollution
+semgrep --config "r/javascript.lang.security.audit.prototype-pollution"
+```
+
+### Detection Patterns
+- Unsafe deserialization (pickle, YAML, JSON)
+- Missing integrity checks on updates
+- Prototype pollution in JavaScript
+- Unsafe code loading from external sources
+
+## A09:2021 - Security Logging and Monitoring Failures
+
+### CWE Mappings
+- CWE-117: Improper Output Neutralization for Logs
+- CWE-223: Omission of Security-relevant Information
+- CWE-532: Information Exposure Through Log Files
+- CWE-778: Insufficient Logging
+
+### Semgrep Rules
+```bash
+# Log injection
+semgrep --config "r/python.lang.security.audit.logging-unsanitized-input"
+
+# Sensitive data in logs
+semgrep --config "p/secrets"
+```
+
+### Detection Patterns
+- Log injection vulnerabilities
+- Sensitive data logged (passwords, tokens)
+- Missing security event logging
+- Insufficient audit trails
+
+## A10:2021 - Server-Side Request Forgery (SSRF)
+
+### CWE Mappings
+- CWE-918: Server-Side Request Forgery (SSRF)
+
+### Semgrep Rules
+```bash
+# SSRF detection
+semgrep --config "r/python.requests.security.audit.requests-http-request"
+semgrep --config "r/javascript.lang.security.audit.detect-unsafe-url"
+```
+
+### Detection Patterns
+- Unvalidated URL fetching
+- Internal network access via user input
+- Missing URL validation
+- Bypassing access controls via SSRF
+
+## Using This Mapping
+
+### Scan for Specific OWASP Category
+
+```bash
+# Example: Scan for Injection vulnerabilities (A03)
+semgrep --config "r/python.django.security.injection.sql" \
+        --config "r/python.lang.security.audit.exec-used" \
+        /path/to/code
+```
+
+### Comprehensive OWASP Top 10 Scan
+
+```bash
+semgrep --config="p/owasp-top-ten" /path/to/code
+```
+
+### Filter by CWE
+
+```bash
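+# Scan, then keep findings whose CWE metadata mentions CWE-89.
+# metadata.cwe may be a string or a list (often "CWE-89: <name>"), so match by regex.
+semgrep --config="p/security-audit" --json /path/to/code | \
+  jq '.results[] | select((.extra.metadata.cwe // "") | tostring | test("CWE-89\\b"))'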
+```
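+
+When a report covers many CWEs, a per-CWE summary can be easier to triage than a single filter. The sketch below is a minimal example, assuming the report was saved with `semgrep --json` and that rules keep their CWE mapping under `extra.metadata.cwe` as either a string or a list. The function names `cwe_ids` and `count_by_cwe` are illustrative and not part of Semgrep.
+
+```python
+import json
+import re
+from collections import Counter
+
+
+def cwe_ids(result):
+    """Return the CWE IDs (e.g. "CWE-89") attached to one Semgrep finding."""
+    cwe = result.get("extra", {}).get("metadata", {}).get("cwe", [])
+    if isinstance(cwe, str):  # some rules store a single string instead of a list
+        cwe = [cwe]
+    ids = []
+    for entry in cwe:
+        # Entries often look like "CWE-89: <description>"; keep only the ID
+        ids.extend(re.findall(r"CWE-\d+", str(entry)))
+    return ids
+
+
+def count_by_cwe(report):
+    """Count findings per CWE ID in a parsed Semgrep JSON report."""
+    counts = Counter()
+    for result in report.get("results", []):
+        counts.update(set(cwe_ids(result)))  # count each finding once per CWE
+    return counts
+
+
+def load_report(path):
+    """Load a report written with: semgrep --json --output report.json ..."""
+    with open(path, encoding="utf-8") as handle:
+        return json.load(handle)
+```
+
+Calling `count_by_cwe(load_report("report.json")).most_common()` would list the CWEs with the most findings first; `report.json` is a placeholder file name.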
+
+## References
+
+- [OWASP Top 10 2021](https://owasp.org/Top10/)
+- [CWE/SANS Top 25](https://cwe.mitre.org/top25/)
+- [Semgrep Rule Registry](https://semgrep.dev/explore)
--- a/skills/appsec/sast-semgrep/references/remediation_guide.md
+++ b/skills/appsec/sast-semgrep/references/remediation_guide.md
@@ -0,0 +1,471 @@
+# Vulnerability Remediation Guide
+
+Security remediation patterns organized by vulnerability category.
+
+## Table of Contents
+
+- [SQL Injection](#sql-injection)
+- [Cross-Site Scripting (XSS)](#cross-site-scripting-xss)
+- [Command Injection](#command-injection)
+- [Path Traversal](#path-traversal)
+- [Insecure Deserialization](#insecure-deserialization)
+- [Weak Cryptography](#weak-cryptography)
+- [Authentication & Session Management](#authentication--session-management)
+- [CSRF](#csrf)
+- [SSRF](#ssrf)
+- [XXE](#xxe)
+
+## SQL Injection
+
+### Vulnerability Pattern
+```python
+# VULNERABLE
+query = f"SELECT * FROM users WHERE id = {user_id}"
+cursor.execute(query)
+```
+
+### Secure Remediation
+```python
+# SECURE: Use parameterized queries
+query = "SELECT * FROM users WHERE id = %s"
+cursor.execute(query, (user_id,))
+
+# Or use ORM
+user = User.objects.get(id=user_id)
+```
+
+### Framework-Specific Solutions
+
+**Django:**
+```python
+# Use Django ORM (safe by default)
+User.objects.filter(email=user_email)
+
+# For raw SQL, use parameterized queries
+User.objects.raw('SELECT * FROM myapp_user WHERE email = %s', [user_email])
+```
+
+**Node.js (Sequelize):**
+```javascript
+// Use parameterized queries
+User.findAll({
+  where: { email: userEmail }
+});
+
+// Or use replacements
+sequelize.query(
+  'SELECT * FROM users WHERE email = :email',
+  { replacements: { email: userEmail } }
+);
+```
+
+**Java (JDBC):**
+```java
+// Use PreparedStatement
+String query = "SELECT * FROM users WHERE id = ?";
+PreparedStatement stmt = conn.prepareStatement(query);
+stmt.setInt(1, userId);
+ResultSet rs = stmt.executeQuery();
+```
+
+## Cross-Site Scripting (XSS)
+
+### Vulnerability Pattern
+```javascript
+// VULNERABLE
+element.innerHTML = userInput;
+document.write(userInput);
+```
+
+### Secure Remediation
+```javascript
+// SECURE: Use textContent for text
+element.textContent = userInput;
+
+// Or properly escape HTML
+element.innerHTML = escapeHtml(userInput);
+
+function escapeHtml(unsafe) {
+  return unsafe
+    .replace(/&/g, "&amp;")
+    .replace(/</g, "&lt;")
+    .replace(/>/g, "&gt;")
+    .replace(/"/g, "&quot;")
+    .replace(/'/g, "&#039;");
+}
+```
+
+### Framework-Specific Solutions
+
+**React:**
+```javascript
+// React auto-escapes by default
+<div>{userInput}</div>
+
+// For HTML content, sanitize first
+import DOMPurify from 'dompurify';
+<div dangerouslySetInnerHTML={{__html: DOMPurify.sanitize(userInput)}} />
+```
+
+**Flask/Jinja2:**
+```python
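+# Templates auto-escape by default. In the template:
+#   {{ user_input }}
+
+# For HTML content, sanitize in the view and pass the result to the template
+from markupsafe import Markup
+import bleach
+safe_html = Markup(bleach.clean(user_input))
+# In the template: {{ safe_html }}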
+```
+
+**Django:**
+```django
+{# Auto-escaped by default #}
+{{ user_input }}
+
+{# Mark as safe only after sanitization (sanitized_html is cleaned in the view) #}
+{{ sanitized_html|safe }}
+```
+
+## Command Injection
+
+### Vulnerability Pattern
+```python
+# VULNERABLE
+os.system(f"ping {user_host}")
+subprocess.call(f"ls {user_directory}", shell=True)
+```
+
+### Secure Remediation
+```python
+# SECURE: Use subprocess with list arguments
+import subprocess
+subprocess.run(['ping', '-c', '1', user_host],
+               capture_output=True, check=True)
+
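+# Validate input against allowlist
+import re
+# fullmatch rejects trailing newlines; the first character may not be "-",
+# so the value cannot be parsed as a ping option
+if not re.fullmatch(r'[a-zA-Z0-9][a-zA-Z0-9.-]*', user_host):
+    raise ValueError("Invalid hostname")
+subprocess.run(['ping', '-c', '1', user_host], check=True)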
+```
+
+**Node.js:**
+```javascript
+// VULNERABLE
+exec(`ls ${userDir}`);
+
+// SECURE
+const { execFile } = require('child_process');
+execFile('ls', [userDir], (error, stdout) => {
+  // Handle output
+});
+```
+
+## Path Traversal
+
+### Vulnerability Pattern
+```python
+# VULNERABLE
+file_path = os.path.join('/uploads', user_filename)
+with open(file_path) as f:
+    return f.read()
+```
+
+### Secure Remediation
+```python
+# SECURE: Validate and normalize path
+import os
+from pathlib import Path
+
+def safe_join(directory, user_path):
+    # Normalize and resolve path
+    base_dir = Path(directory).resolve()
+    file_path = (base_dir / user_path).resolve()
+
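+    # Ensure it's within base directory. Compare path components, not string
+    # prefixes: "/uploads_evil" starts with "/uploads" but lies outside it.
+    if file_path != base_dir and base_dir not in file_path.parents:
+        raise ValueError("Path traversal detected")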
+
+    return file_path
+
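+# Hypothetical handler: wraps the read so that `return` is valid
+def read_upload(user_filename):
+    try:
+        safe_path = safe_join('/uploads', user_filename)
+        with open(safe_path) as f:
+            return f.read()
+    except ValueError:
+        return "Invalid filename"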
+```
+
+## Insecure Deserialization
+
+### Vulnerability Pattern
+```python
+# VULNERABLE
+import pickle
+data = pickle.loads(user_data)
+```
+
+### Secure Remediation
+```python
+# SECURE: Use safe formats like JSON
+import json
+data = json.loads(user_data)
+
+# If you must deserialize, validate and restrict
+import yaml
+data = yaml.safe_load(user_data)  # Use safe_load, not load
+```
+
+**Node.js:**
+```javascript
+// VULNERABLE
+const data = eval(userInput);
+const obj = Function(userInput)();
+
+// SECURE
+const data = JSON.parse(userInput);
+
+// For complex objects, use schema validation
+const Joi = require('joi');
+const schema = Joi.object({
+  name: Joi.string().required(),
+  email: Joi.string().email().required()
+});
+const { value, error } = schema.validate(JSON.parse(userInput));
+```
+
+## Weak Cryptography
+
+### Vulnerability Pattern
+```python
+# VULNERABLE
+import hashlib
+password_hash = hashlib.md5(password.encode()).hexdigest()
+```
+
+### Secure Remediation
+```python
+# SECURE: Use bcrypt or argon2
+import bcrypt
+
+# Hashing
+password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt())
+
+# Verification
+if bcrypt.checkpw(password.encode(), stored_hash):
+    print("Password correct")
+
+# Or use argon2
+from argon2 import PasswordHasher
+ph = PasswordHasher()
+password_hash = ph.hash(password)  # Avoid shadowing the built-in hash()
+ph.verify(password_hash, password)  # Raises VerifyMismatchError on mismatch
+```
+
+**Encryption:**
+```python
+# VULNERABLE
+from Crypto.Cipher import DES
+cipher = DES.new(key, DES.MODE_ECB)
+
+# SECURE: Use AES-GCM
+from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+import os
+
+key = AESGCM.generate_key(bit_length=256)
+aesgcm = AESGCM(key)
+nonce = os.urandom(12)  # Must never repeat for the same key
+ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
+```
+
+## Authentication & Session Management
+
+### Vulnerability Pattern
+```javascript
+// VULNERABLE
+app.use(session({
+  secret: 'weak-secret',
+  cookie: { secure: false }
+}));
+```
+
+### Secure Remediation
+```javascript
+// SECURE
+const session = require('express-session');
+app.use(session({
+  secret: process.env.SESSION_SECRET, // Strong random secret
+  resave: false,
+  saveUninitialized: false,
+  cookie: {
+    secure: true,      // HTTPS only
+    httpOnly: true,    // No JavaScript access
+    sameSite: 'strict', // CSRF protection
+    maxAge: 3600000    // 1 hour
+  }
+}));
+```
+
+**Password Requirements:**
+```python
+# Implement strong password policy
+import re
+
+def validate_password(password):
+    if len(password) < 12:
+        return False
+    if not re.search(r'[A-Z]', password):
+        return False
+    if not re.search(r'[a-z]', password):
+        return False
+    if not re.search(r'[0-9]', password):
+        return False
+    if not re.search(r'[!@#$%^&*(),.?":{}|<>]', password):
+        return False
+    return True
+```
+
+## CSRF
+
+### Vulnerability Pattern
+```python
+# VULNERABLE: No CSRF protection
+@app.route('/transfer', methods=['POST'])
+def transfer():
+    amount = request.form['amount']
+    to_account = request.form['to']
+    # Process transfer
+```
+
+### Secure Remediation
+```python
+# SECURE: Use CSRF tokens
+from flask_wtf.csrf import CSRFProtect
+csrf = CSRFProtect(app)
+
+@app.route('/transfer', methods=['POST'])
+def transfer():
+    # CSRF token automatically validated by CSRFProtect
+    # (do not add @csrf.exempt: it disables the check for this route)
+    amount = request.form['amount']
+    to_account = request.form['to']
+```
+
+**Express.js:**
+```javascript
+// Note: the csurf package is deprecated on npm; prefer a maintained CSRF library for new code
+const csrf = require('csurf');
+const csrfProtection = csrf({ cookie: true });
+
+app.post('/transfer', csrfProtection, (req, res) => {
+  // CSRF token validated
+  const { amount, to } = req.body;
+});
+```
+
+## SSRF
+
+### Vulnerability Pattern
+```python
+# VULNERABLE
+import requests
+url = request.args.get('url')
+response = requests.get(url)
+```
+
+### Secure Remediation
+```python
+# SECURE: Validate URLs and use allowlist
+import requests
+from urllib.parse import urlparse
+
+ALLOWED_DOMAINS = ['api.example.com', 'cdn.example.com']
+
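+def safe_fetch(url):
+    import ipaddress
+    parsed = urlparse(url)
+
+    # Check protocol
+    if parsed.scheme not in ['http', 'https']:
+        raise ValueError("Invalid protocol")
+
+    # Check host against allowlist (hostname excludes port and credentials)
+    host = parsed.hostname
+    if host not in ALLOWED_DOMAINS:
+        raise ValueError("Domain not allowed")
+
+    # Block internal IPs. Parse separately so the rejection below is not
+    # swallowed by the "not an IP" handler.
+    try:
+        ip = ipaddress.ip_address(host)
+    except ValueError:
+        ip = None  # Not an IP literal, continue
+    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
+        raise ValueError("Private IP not allowed")
+
+    # Do not follow redirects to locations that were never validated
+    return requests.get(url, timeout=5, allow_redirects=False)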
+```
+
+## XXE
+
+### Vulnerability Pattern
+```python
+# VULNERABLE
+from lxml import etree
+tree = etree.parse(user_xml)
+```
+
+### Secure Remediation
+```python
+# SECURE: Disable external entities
+from lxml import etree
+
+parser = etree.XMLParser(
+    resolve_entities=False,
+    no_network=True,
+    dtd_validation=False
+)
+tree = etree.parse(user_xml, parser)
+
+# Or use defusedxml
+from defusedxml import ElementTree
+tree = ElementTree.parse(user_xml)
+```
+
+**Node.js:**
+```javascript
+// Use secure XML parser
+const libxmljs = require('libxmljs');
+const xml = libxmljs.parseXml(userXml, {
+  noent: false,  // Disable entity expansion
+  dtdload: false,
+  dtdvalid: false
+});
+```
+
+## General Security Principles
+
+1. **Input Validation**: Validate all user input against expected format
+2. **Output Encoding**: Encode output based on context (HTML, URL, SQL, etc.)
+3. **Least Privilege**: Grant minimum necessary permissions
+4. **Defense in Depth**: Use multiple layers of security controls
+5. **Fail Securely**: Ensure failures don't expose sensitive data
+6. **Secure Defaults**: Use secure configuration by default
+7. **Keep Dependencies Updated**: Regularly update libraries and frameworks
+
+## Testing Remediation
+
+After applying fixes:
+
+1. **Verify with Semgrep**: Re-scan to ensure vulnerability is resolved
+   ```bash
+   semgrep --config <ruleset> fixed_file.py
+   ```
+
+2. **Manual Testing**: Attempt to exploit the vulnerability
+3. **Code Review**: Have peer review the fix
+4. **Integration Tests**: Add tests to prevent regression
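+
+As an illustration of step 4, the sketch below shows what a regression test for a path traversal fix might look like. It assumes a `safe_join` helper modeled on the Path Traversal section, here written with a component-wise containment check; the helper body, the test class name, and the file names are assumptions made for this example.
+
+```python
+import tempfile
+import unittest
+from pathlib import Path
+
+
+def safe_join(directory, user_path):
+    """Resolve user_path under directory, rejecting anything that escapes it."""
+    base_dir = Path(directory).resolve()
+    file_path = (base_dir / user_path).resolve()
+    if file_path != base_dir and base_dir not in file_path.parents:
+        raise ValueError("Path traversal detected")
+    return file_path
+
+
+class SafeJoinRegressionTest(unittest.TestCase):
+    def setUp(self):
+        # Work in a throwaway directory so the test never touches real uploads
+        self.tmp = tempfile.TemporaryDirectory()
+        self.base = Path(self.tmp.name) / "uploads"
+        self.base.mkdir()
+
+    def tearDown(self):
+        self.tmp.cleanup()
+
+    def test_allows_file_inside_base(self):
+        self.assertEqual(safe_join(self.base, "report.txt"),
+                         self.base.resolve() / "report.txt")
+
+    def test_rejects_parent_traversal(self):
+        with self.assertRaises(ValueError):
+            safe_join(self.base, "../secret.txt")
+
+    def test_rejects_sibling_with_shared_prefix(self):
+        # "uploads_evil" shares a string prefix with "uploads" but is outside it
+        with self.assertRaises(ValueError):
+            safe_join(self.base, "../uploads_evil/x.txt")
+
+    def test_rejects_absolute_path(self):
+        with self.assertRaises(ValueError):
+            safe_join(self.base, "/etc/passwd")
+```
+
+Each test encodes one exploit attempt from manual testing, so a later refactor that reintroduces the flaw fails the suite. The tests can be run with `python -m unittest` from the directory containing the file.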
+
+## References
+
+- [OWASP Cheat Sheet Series](https://cheatsheetseries.owasp.org/)
+- [CWE Mitigations](https://cwe.mitre.org/)
+- [Semgrep Autofix](https://semgrep.dev/docs/writing-rules/autofix/)
--- a/skills/appsec/sast-semgrep/references/rule_library.md
+++ b/skills/appsec/sast-semgrep/references/rule_library.md
@@ -0,0 +1,425 @@
+# Semgrep Rule Library
+
+Curated collection of useful Semgrep rulesets and custom rule writing guidance.
+
+## Table of Contents
+
+- [Official Rulesets](#official-rulesets)
+- [Language-Specific Rules](#language-specific-rules)
+- [Framework-Specific Rules](#framework-specific-rules)
+- [Custom Rule Writing](#custom-rule-writing)
+- [Rule Testing](#rule-testing)
+
+## Official Rulesets
+
+### Comprehensive Rulesets
+
+| Ruleset | Config | Description | Use Case |
+|---------|--------|-------------|----------|
+| Auto | `auto` | Automatically selected rules based on detected languages | Quick scans, baseline |
+| Security Audit | `p/security-audit` | Comprehensive security rules across languages | Deep security review |
+| OWASP Top 10 | `p/owasp-top-ten` | OWASP Top 10 2021 coverage | Compliance, security gates |
+| CWE Top 25 | `p/cwe-top-25` | SANS/CWE Top 25 dangerous errors | Critical vulnerability detection |
+| CI | `p/ci` | Fast, low false-positive rules for CI/CD | Pull request gates |
+| Default | `p/default` | Balanced security and quality rules | General purpose scanning |
+
+### Specialized Rulesets
+
+| Ruleset | Config | Focus Area |
+|---------|--------|------------|
+| Secrets | `p/secrets` | Hard-coded credentials, API keys |
+| Cryptography | `p/crypto` | Weak crypto, hashing issues |
+| Supply Chain | `p/supply-chain` | Dependency vulnerabilities |
+| JWT | `p/jwt` | JSON Web Token security |
+| SQL Injection | `p/sql-injection` | SQL injection patterns |
+| XSS | `p/xss` | Cross-site scripting |
+| Command Injection | `p/command-injection` | OS command injection |
+
+## Language-Specific Rules
+
+### Python
+
+```bash
+# Django security
+semgrep --config "p/django"
+
+# Flask security
+semgrep --config "r/python.flask.security"
+
+# General Python security
+semgrep --config "r/python.lang.security"
+
+# Specific vulnerabilities
+semgrep --config "r/python.lang.security.audit.exec-used"
+semgrep --config "r/python.lang.security.audit.unsafe-pickle"
+semgrep --config "r/python.lang.security.audit.dangerous-subprocess-use"
+```
+
+**Key Python Rules:**
+- `python.django.security.injection.sql.sql-injection-db-cursor-execute`
+- `python.flask.security.xss.audit.template-xss`
+- `python.lang.security.audit.exec-used`
+- `python.lang.security.audit.dangerous-os-module-methods`
+- `python.lang.security.audit.hashlib-md5-used`
+
+### JavaScript/TypeScript
+
+```bash
+# Express.js security
+semgrep --config "p/express"
+
+# React security
+semgrep --config "p/react"
+
+# Node.js security
+semgrep --config "r/javascript.lang.security"
+
+# Specific vulnerabilities
+semgrep --config "r/javascript.lang.security.audit.eval-detected"
+semgrep --config "r/javascript.lang.security.audit.unsafe-exec"
+```
+
+**Key JavaScript Rules:**
+- `javascript.express.security.audit.xss.mustache.var-in-href`
+- `javascript.lang.security.audit.eval-detected`
+- `javascript.lang.security.audit.path-traversal`
+- `javascript.sequelize.security.audit.sequelize-injection-express`
+
+### Java
+
+```bash
+# Spring security
+semgrep --config "p/spring"
+
+# General Java security
+semgrep --config "r/java.lang.security"
+
+# Specific frameworks
+semgrep --config "r/java.spring.security"
+```
+
+**Key Java Rules:**
+- `java.lang.security.audit.sqli.jdbc-sqli`
+- `java.lang.security.audit.xxe.xmlinputfactory-xxe`
+- `java.spring.security.audit.spring-cookie-missing-httponly`
+
+### Go
+
+```bash
+# Go security rules
+semgrep --config "r/go.lang.security"
+
+# Specific vulnerabilities
+semgrep --config "r/go.lang.security.audit.net.use-of-tls-with-go-sql-driver"
+semgrep --config "r/go.lang.security.audit.crypto.use_of_weak_crypto"
+```
+
+### PHP
+
+```bash
+# PHP security
+semgrep --config "p/php"
+
+# Laravel security
+semgrep --config "r/php.laravel.security"
+
+# Specific vulnerabilities
+semgrep --config "r/php.lang.security.audit.sqli"
+semgrep --config "r/php.lang.security.audit.dangerous-exec"
+```
+
+## Framework-Specific Rules
+
+### Web Frameworks
+
+**Django:**
+```bash
+semgrep --config "p/django"
+# Covers: SQL injection, XSS, CSRF, auth issues
+```
+
+**Flask:**
+```bash
+semgrep --config "r/python.flask.security"
+# Covers: XSS, debug mode, secure cookies
+```
+
+**Express.js:**
+```bash
+semgrep --config "p/express"
+# Covers: XSS, CSRF, session config, CORS
+```
+
+**Spring Boot:**
+```bash
+semgrep --config "p/spring"
+# Covers: SQL injection, XXE, auth, SSRF
+```
+
+### Cloud & Infrastructure
+
+**Terraform:**
+```bash
+semgrep --config "r/terraform.lang.security"
+# Covers: S3 buckets, security groups, encryption
+```
+
+**Kubernetes:**
+```bash
+semgrep --config "r/yaml.kubernetes.security"
+# Covers: privileged containers, secrets, rbac
+```
+
+**Docker:**
+```bash
+semgrep --config "r/dockerfile.security"
+# Covers: unsafe base images, secrets, root user
+```
+
+## Custom Rule Writing
+
+### Rule Anatomy
+
+```yaml
+rules:
+  - id: custom-rule-id
+    pattern: execute($SQL)
+    message: Potential security issue detected
+    severity: WARNING
+    languages: [python]
+    metadata:
+      category: security
+      cwe: "CWE-89"
+      owasp: "A03:2021-Injection"
+      confidence: HIGH
+```
+
+### Pattern Types
+
+**1. Basic Pattern**
+```yaml
+pattern: dangerous_function($ARG)
+```
+
+**2. Pattern-Inside (Context)**
+```yaml
+patterns:
+  - pattern: execute($QUERY)
+  - pattern-inside: |
+      $QUERY = $USER_INPUT + ...
+      ...
+```
+
+**3. Pattern-Not (Exclusion)**
+```yaml
+patterns:
+  - pattern: execute($QUERY)
+  - pattern-not: execute("SELECT * FROM safe_table")
+```
+
+**4. Pattern-Either (OR logic)**
+```yaml
+pattern-either:
+  - pattern: eval($ARG)
+  - pattern: exec($ARG)
+```
+
+**5. Metavariable Comparison**
+```yaml
+patterns:
+  - pattern: crypto.encrypt($DATA, $KEY)
+  - metavariable-comparison:
+      metavariable: $KEY
+      comparison: len($KEY) < 16
+```
+
+### Example Custom Rules
+
+**Detect Hard-coded AWS Keys:**
+```yaml
+rules:
+  - id: hardcoded-aws-key
+    patterns:
+      - pattern-regex: 'AKIA[0-9A-Z]{16}'
+    message: Hard-coded AWS access key detected
+    severity: ERROR
+    languages: [python, javascript, java, go]
+    metadata:
+      category: security
+      cwe: "CWE-798"
+      confidence: HIGH
+```
+
+**Detect Unsafe File Operations:**
+```yaml
+rules:
+  - id: unsafe-file-read
+    patterns:
+      - pattern: open($PATH, ...)
+      - pattern-inside: |
+          def $FUNC(..., $USER_INPUT, ...):
+            ...
+            $PATH = ... + $USER_INPUT + ...
+            ...
+    message: File path constructed from user input (path traversal risk)
+    severity: WARNING
+    languages: [python]
+    metadata:
+      cwe: "CWE-22"
+      owasp: "A01:2021-Broken-Access-Control"
+```
+
+**Detect Missing CSRF Protection:**
+```yaml
+rules:
+  - id: flask-missing-csrf
+    patterns:
+      - pattern: |
+          @app.route($PATH, methods=[..., "POST", ...])
+          def $FUNC(...):
+            ...
+      - pattern-not-inside: |
+          @csrf.exempt
+          ...
+      - pattern-not-inside: |
+          csrf_token = ...
+          ...
+    message: POST route without CSRF protection
+    severity: ERROR
+    languages: [python]
+    metadata:
+      cwe: "CWE-352"
+      owasp: "A01:2021-Broken-Access-Control"
+```
+
+**Detect Insecure Random:**
+```yaml
+rules:
+  - id: insecure-random-for-crypto
+    patterns:
+      - pattern-either:
+          - pattern: random.random()
+          - pattern: random.randint(...)
+      - pattern-inside: |
+          def $FUNC(...):
+            ...
+      - metavariable-regex:
+          metavariable: $FUNC
+          regex: .*_token$
+    message: Using insecure random for security token
+    severity: ERROR
+    languages: [python]
+    metadata:
+      cwe: "CWE-330"
+      fix: "Use secrets module: secrets.token_bytes(32)"
+```
+
+### Rule Metadata Best Practices
+
+Include comprehensive metadata:
+```yaml
+metadata:
+  category: security          # Type of issue
+  cwe: "CWE-XXX"             # CWE mapping
+  owasp: "AXX:2021-Name"     # OWASP category
+  confidence: HIGH|MEDIUM|LOW # Detection confidence
+  likelihood: HIGH|MEDIUM|LOW # Exploitation likelihood
+  impact: HIGH|MEDIUM|LOW     # Security impact
+  subcategory: [vuln-type]   # More specific categorization
+  source-rule: url           # If adapted from elsewhere
+  references:
+    - https://example.com/docs
+```
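+
+One way to keep metadata consistent across a custom rule set is to check it before rules are merged. The sketch below is a minimal example that works on a rules file already parsed into a dictionary (for instance with a YAML loader). The `REQUIRED_METADATA` tuple is an assumed team policy, not a Semgrep requirement, and `missing_metadata` is an illustrative name.
+
+```python
+# Assumed policy: every custom rule must carry these metadata keys
+REQUIRED_METADATA = ("category", "cwe", "owasp", "confidence")
+
+
+def missing_metadata(rules_doc, required=REQUIRED_METADATA):
+    """Map each rule id to the required metadata keys that the rule lacks."""
+    problems = {}
+    for rule in rules_doc.get("rules", []):
+        metadata = rule.get("metadata") or {}
+        missing = [key for key in required if key not in metadata]
+        if missing:
+            problems[rule.get("id", "<no id>")] = missing
+    return problems
+
+
+# Example input: the parsed form of a rules file with two rules
+example_rules = {
+    "rules": [
+        {
+            "id": "hardcoded-aws-key",
+            "metadata": {"category": "security", "cwe": "CWE-798", "confidence": "HIGH"},
+        },
+        {
+            "id": "custom-rule-id",
+            "metadata": {
+                "category": "security",
+                "cwe": "CWE-89",
+                "owasp": "A03:2021-Injection",
+                "confidence": "HIGH",
+            },
+        },
+    ]
+}
+
+for rule_id, keys in missing_metadata(example_rules).items():
+    print(f"{rule_id}: missing {', '.join(keys)}")
+```
+
+A check like this could run in CI next to `semgrep --validate`, which verifies rule syntax but does not enforce a metadata policy.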
+
+## Rule Testing
+
+### Test File Structure
+```
+custom-rules/
+├── rules.yaml          # Your custom rules
+└── tests/
+    ├── test-sqli.py   # Test cases
+    └── test-xss.js    # Test cases
+```
+
+### Writing Tests
+
+```python
+# tests/test-sqli.py
+
+# ruleid: custom-sql-injection
+cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
+
+# ok: custom-sql-injection
+cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
+```
+
+### Running Tests
+
+```bash
+# Test custom rules
+semgrep --config rules.yaml --test tests/
+
+# Validate rule syntax
+semgrep --validate --config rules.yaml
+```
+
+## Rule Performance Optimization
+
+### 1. Use Specific Patterns
+```yaml
+# SLOW
+pattern: $X
+
+# FAST
+pattern: dangerous_function($X)
+```
+
+### 2. Limit Language Scope
+```yaml
+# Only scan relevant languages
+languages: [python, javascript]
+```
+
+### 3. Use Pattern-Inside Wisely
+```yaml
+# Narrow down context early
+patterns:
+  - pattern-inside: |
+      def handle_request(...):
+        ...
+  - pattern: execute($QUERY)
+```
+
+### 4. Exclude Test Files
+```yaml
+paths:
+  exclude:
+    - "*/test_*.py"
+    - "*/tests/*"
+    - "*_test.go"
+```
+
+## Community Rules
+
+Explore community-contributed rules:
+
+```bash
+# Browse rules by technology
+semgrep --config "r/python.django"
+semgrep --config "r/javascript.react"
+semgrep --config "r/go.gorilla"
+
+# Browse by vulnerability type
+semgrep --config "r/generic.secrets"
+semgrep --config "r/generic.html-templates"
+```
+
+**Useful Community Rulesets:**
+- `r/python.aws-lambda.security` - AWS Lambda security
+- `r/terraform.aws.security` - AWS Terraform
+- `r/dockerfile.best-practice` - Docker best practices
+- `r/yaml.github-actions.security` - GitHub Actions security
+
+## References
+
+- [Semgrep Rule Syntax](https://semgrep.dev/docs/writing-rules/rule-syntax/)
+- [Semgrep Registry](https://semgrep.dev/explore)
+- [Pattern Examples](https://semgrep.dev/docs/writing-rules/pattern-examples/)
+- [Rule Writing Tutorial](https://semgrep.dev/learn)