Initial commit

2025-11-30 08:50:28 +08:00
commit 86a246cf0e
7 changed files with 583 additions and 0 deletions
--- a/skills/haveibeenpwned/SKILL.md
+++ b/skills/haveibeenpwned/SKILL.md
@@ -0,0 +1,479 @@
+---
+name: haveibeenpwned
+description: HaveIBeenPwned API Documentation - Check if email accounts or passwords have been compromised in data breaches
+---
+
+# Have I Been Pwned API Skill
+
+Expert assistance for integrating the Have I Been Pwned (HIBP) API v3 to check for compromised accounts, passwords, and data breaches. This skill provides comprehensive guidance for building security tools, breach notification systems, and password validation features.
+
+## When to Use This Skill
+
+This skill should be triggered when:
+- **Checking if emails/accounts appear in data breaches** - "check if this email was pwned"
+- **Validating password security** - "check if password is in breach database"
+- **Building breach notification systems** - "notify users about compromised accounts"
+- **Implementing password validation** - "prevent users from choosing pwned passwords"
+- **Querying stealer logs** - "check if credentials were stolen by malware"
+- **Integrating HIBP into authentication flows** - "add breach checking to login"
+- **Monitoring domains for compromised emails** - "track breaches affecting our domain"
+- **Working with the HIBP API** - any questions about authentication, rate limits, or endpoints
+
+## Quick Reference
+
+### 1. Basic Account Breach Check
+
+```python
+import requests
+from urllib.parse import quote
+
+def check_account_breaches(email, api_key):
+    """Check if an account appears in any breaches"""
+    headers = {
+        'hibp-api-key': api_key,
+        'user-agent': 'MyApp/1.0'
+    }
+
+    # URL-encode the account so characters such as '+' are safe in the path
+    url = f'https://haveibeenpwned.com/api/v3/breachedaccount/{quote(email, safe="")}'
+    response = requests.get(url, headers=headers, timeout=10)
+
+    if response.status_code == 200:
+        return response.json()  # List of breach objects (names only by default)
+    elif response.status_code == 404:
+        return []  # Account not found in breaches
+    else:
+        response.raise_for_status()
+
+# Usage
+breaches = check_account_breaches('user@example.com', 'your-api-key')
+print(f"Found in {len(breaches)} breaches")
+```
+
+### 2. Password Breach Check (k-Anonymity)
+
+```python
+import hashlib
+import requests
+
+def check_password_pwned(password):
+    """Check if password appears in breaches using k-anonymity"""
+    # Hash password with SHA-1
+    sha1_hash = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()
+    prefix = sha1_hash[:5]
+    suffix = sha1_hash[5:]
+
+    # Query API with first 5 characters only
+    url = f'https://api.pwnedpasswords.com/range/{prefix}'
+    response = requests.get(url, timeout=10)
+    response.raise_for_status()  # Do not parse an error body as hash suffixes
+
+    # Parse response for matching suffix
+    hashes = (line.split(':') for line in response.text.splitlines())
+    for hash_suffix, count in hashes:
+        if hash_suffix == suffix:
+            return int(count)  # Times password appears in breaches
+    return 0  # Password not found
+
+# Usage
+count = check_password_pwned('password123')
+if count > 0:
+    print(f"⚠️ Password found {count} times in breaches!")
+```
+
+### 3. Get All Breaches in System
+
+```python
+import requests
+
+def get_all_breaches(domain=None):
+    """Retrieve all breaches, optionally filtered by domain"""
+    url = 'https://haveibeenpwned.com/api/v3/breaches'
+    params = {'domain': domain} if domain else {}
+
+    headers = {'user-agent': 'MyApp/1.0'}
+    response = requests.get(url, headers=headers, params=params, timeout=10)
+    response.raise_for_status()  # Fail loudly instead of parsing an error body
+
+    return response.json()
+
+# Usage - no authentication required
+breaches = get_all_breaches()
+print(f"Total breaches: {len(breaches)}")
+
+# Filter by domain
+adobe_breaches = get_all_breaches(domain='adobe.com')
+```
+
+### 4. Monitor for New Breaches
+
+```python
+import requests
+import time
+
+def monitor_latest_breach(check_interval=3600):
+    """Poll for new breaches every hour"""
+    last_breach_name = None
+
+    while True:
+        url = 'https://haveibeenpwned.com/api/v3/latestbreach'
+        headers = {'user-agent': 'MyApp/1.0'}
+        response = requests.get(url, headers=headers)
+
+        if response.status_code == 200:
+            breach = response.json()
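+            if breach['Name'] != last_breach_name:
+                # The first poll only records a baseline; later changes are new
+                if last_breach_name is not None:
+                    print(f"🆕 New breach: {breach['Title']}")
+                    print(f"   Accounts affected: {breach['PwnCount']:,}")
+                last_breach_name = breach['Name']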
+
+        time.sleep(check_interval)
+```
+
+### 5. Domain-Wide Breach Search
+
+```python
+import requests
+
+def search_domain_breaches(domain, api_key):
+    """Search for all breached emails in a verified domain"""
+    headers = {
+        'hibp-api-key': api_key,
+        'user-agent': 'MyApp/1.0'
+    }
+
+    url = f'https://haveibeenpwned.com/api/v3/breacheddomain/{domain}'
+    response = requests.get(url, headers=headers)
+
+    if response.status_code == 200:
+        results = response.json()
+        # Returns: {"alias1": ["Adobe"], "alias2": ["Adobe", "Gawker"]}
+        total_affected = len(results)
+        print(f"Found {total_affected} compromised accounts")
+        return results
+    else:
+        response.raise_for_status()
+```
+
+### 6. Check Pastes for Account
+
+```python
+import requests
+from urllib.parse import quote
+
+def check_pastes(email, api_key):
+    """Check if email appears in any pastes"""
+    headers = {
+        'hibp-api-key': api_key,
+        'user-agent': 'MyApp/1.0'
+    }
+
+    # URL-encode the address before placing it in the path
+    url = f'https://haveibeenpwned.com/api/v3/pasteaccount/{quote(email, safe="")}'
+    response = requests.get(url, headers=headers, timeout=10)
+
+    if response.status_code == 200:
+        pastes = response.json()
+        for paste in pastes:
+            print(f"{paste['Source']}: {paste['Title']}")
+            print(f"  Date: {paste['Date']}")
+            print(f"  Emails found: {paste['EmailCount']}")
+        return pastes
+    elif response.status_code == 404:
+        return []  # No pastes found
+    else:
+        response.raise_for_status()  # Surface 401, 403, 429 instead of returning None
+```
+
+### 7. Enhanced Password Check with Padding
+
+```python
+import hashlib
+import requests
+
+def check_password_secure(password):
+    """Check password with padding to prevent inference attacks"""
+    sha1_hash = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()
+    prefix = sha1_hash[:5]
+    suffix = sha1_hash[5:]
+
+    headers = {'Add-Padding': 'true'}
+    url = f'https://api.pwnedpasswords.com/range/{prefix}'
+    response = requests.get(url, headers=headers, timeout=10)
+    response.raise_for_status()  # Do not parse an error body as hash suffixes
+
+    # Parse response, ignore padded entries (count=0)
+    for line in response.text.splitlines():
+        hash_suffix, count = line.split(':')
+        if hash_suffix == suffix and int(count) > 0:
+            return int(count)
+    return 0
+```
+
+### 8. Handle Rate Limiting
+
+```python
+import requests
+import time
+
+def api_call_with_retry(url, headers, max_retries=3):
+    """Make API call with automatic retry on rate limit"""
+    for attempt in range(max_retries):
+        response = requests.get(url, headers=headers)
+
+        if response.status_code == 429:
+            # Rate limited - wait and retry
+            retry_after = int(response.headers.get('retry-after', 2))
+            print(f"Rate limited, waiting {retry_after}s...")
+            time.sleep(retry_after)
+            continue
+
+        return response
+
+    raise Exception("Max retries exceeded")
+```
+
+### 9. Check Subscription Status
+
+```python
+import requests
+
+def get_subscription_info(api_key):
+    """Retrieve API subscription details and limits"""
+    headers = {
+        'hibp-api-key': api_key,
+        'user-agent': 'MyApp/1.0'
+    }
+
+    url = 'https://haveibeenpwned.com/api/v3/subscription/status'
+    response = requests.get(url, headers=headers)
+
+    if response.status_code == 200:
+        info = response.json()
+        print(f"Plan: {info['SubscriptionName']}")
+        print(f"Rate limit: {info['Rpm']} requests/minute")
+        print(f"Valid until: {info['SubscribedUntil']}")
+        return info
+    response.raise_for_status()  # Surface 401, 403, 429 instead of returning None
+```
+
+### 10. Stealer Logs Search
+
+```python
+import requests
+from urllib.parse import quote
+
+def check_stealer_logs(email, api_key):
+    """Check if credentials appear in info stealer malware logs"""
+    headers = {
+        'hibp-api-key': api_key,
+        'user-agent': 'MyApp/1.0'
+    }
+
+    # URL-encode the address before placing it in the path
+    url = f'https://haveibeenpwned.com/api/v3/stealerlogsbyemail/{quote(email, safe="")}'
+    response = requests.get(url, headers=headers, timeout=10)
+
+    if response.status_code == 200:
+        domains = response.json()  # List of website domains
+        print(f"Credentials found for {len(domains)} websites")
+        return domains
+    elif response.status_code == 404:
+        return []  # Not found in stealer logs
+    else:
+        response.raise_for_status()  # Surface 401, 403, 429 instead of returning None
+
+# Requires Pwned 5+ subscription
+```
+
+## Key Concepts
+
+### Authentication
+- **API Key Format**: 32-character hexadecimal string
+- **Header**: `hibp-api-key: {your-key}`
+- **User-Agent Required**: Must set valid user-agent header (returns 403 if missing)
+- **Test Key**: `00000000000000000000000000000000` for integration testing
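
The authentication requirements above can be gathered into one helper. This is a minimal sketch: `build_headers` is an illustrative name rather than part of the HIBP API, and the 32-hex-character check simply mirrors the key format stated above.

```python
import re

def build_headers(api_key, user_agent='MyApp/1.0'):
    """Return headers for authenticated HIBP calls (hypothetical helper)."""
    # Key format as described above: 32 hexadecimal characters (assumed exact)
    if not re.fullmatch(r'[0-9a-fA-F]{32}', api_key):
        raise ValueError('API key should be a 32-character hexadecimal string')
    # Requests without a user-agent are rejected with 403
    if not user_agent.strip():
        raise ValueError('A non-empty user-agent is required')
    return {'hibp-api-key': api_key, 'user-agent': user_agent}

# The documented test key passes the format check
headers = build_headers('00000000000000000000000000000000')
```

Validating locally turns a malformed key or an empty user-agent into an immediate error instead of a 401 or 403 from the API.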
+
+### k-Anonymity Model
+The Pwned Passwords API uses **k-anonymity** to protect user privacy:
+1. Client hashes password locally with SHA-1
+2. Sends only **first 5 characters** of hash to API
+3. API returns ~800 matching hash suffixes
+4. Client checks locally if full hash matches
+
+This ensures the actual password **never leaves your system**.
+
+### Rate Limiting
+- **Varies by subscription tier**: Pwned 5 = 1,000 requests/minute
+- **HTTP 429 response** when exceeded with `retry-after` header
+- **Pwned Passwords API**: No rate limit
+- **Best practice**: On a 429, wait the number of seconds given in `retry-after` before retrying; fall back to exponential backoff only if the header is missing
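
One way to combine the `retry-after` header with a backoff fallback is sketched below. `retry_delay`, its `base` and `cap` parameters, and their default values are assumptions chosen for illustration, not values taken from the HIBP documentation.

```python
def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Prefer the server's own hint when it is a usable number
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Header present but not numeric; use backoff instead
    # Exponential backoff: base, 2*base, 4*base, ... limited to `cap`
    return min(cap, base * 2 ** attempt)

# Header value wins when present; otherwise the delay doubles per attempt
delays = [retry_delay(0), retry_delay(3), retry_delay(2, '5')]
```

In a retry loop this would replace the fixed default in example #8, for instance `time.sleep(retry_delay(attempt, response.headers.get('retry-after')))`.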
+
+### Breach Model Attributes
+Key fields in breach objects:
+- **Name**: Unique identifier (e.g., "Adobe")
+- **Title**: Human-readable name
+- **BreachDate**: When breach occurred (ISO 8601)
+- **PwnCount**: Total compromised accounts
+- **DataClasses**: Types of data exposed (emails, passwords, etc.)
+- **IsVerified**: Breach authenticity confirmed
+- **IsSensitive**: Excluded from public searches
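
The fields above map naturally onto a small typed record. The sketch below assumes `BreachDate` arrives as a date-only ISO 8601 string (`YYYY-MM-DD`); the `Breach` class, `parse_breach`, and the sample values are illustrative and are not real API output.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass(frozen=True)
class Breach:
    name: str
    title: str
    breach_date: date
    pwn_count: int
    data_classes: List[str]
    is_verified: bool
    is_sensitive: bool

def parse_breach(raw):
    """Build a Breach from one decoded breach object (a dict)."""
    return Breach(
        name=raw['Name'],
        title=raw['Title'],
        # Assumes a date-only ISO 8601 value such as '2020-01-15'
        breach_date=date.fromisoformat(raw['BreachDate']),
        pwn_count=raw['PwnCount'],
        data_classes=list(raw.get('DataClasses', [])),
        is_verified=raw.get('IsVerified', False),
        is_sensitive=raw.get('IsSensitive', False),
    )

# Made-up sample for illustration only
sample = {
    'Name': 'ExampleBreach',
    'Title': 'Example Breach',
    'BreachDate': '2020-01-15',
    'PwnCount': 12345,
    'DataClasses': ['Email addresses', 'Passwords'],
    'IsVerified': True,
    'IsSensitive': False,
}
breach = parse_breach(sample)
```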
+
+### Response Codes
+| Code | Meaning |
+|------|---------|
+| 200 | Success - data found |
+| 401 | Unauthorized (invalid API key) |
+| 403 | Forbidden (missing user-agent) |
+| 404 | Not found (account not in breaches) |
+| 429 | Rate limit exceeded |
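
The table can be turned into a lookup so that calling code branches on a named outcome rather than a bare number. `Outcome` and `interpret_status` are hypothetical names, and any code not listed in the table is simply reported as unexpected.

```python
from enum import Enum

class Outcome(Enum):
    FOUND = 'found'
    NOT_FOUND = 'not_found'
    INVALID_KEY = 'invalid_key'
    MISSING_USER_AGENT = 'missing_user_agent'
    RATE_LIMITED = 'rate_limited'
    UNEXPECTED = 'unexpected'

# One entry per row of the response code table
_OUTCOMES = {
    200: Outcome.FOUND,
    401: Outcome.INVALID_KEY,
    403: Outcome.MISSING_USER_AGENT,
    404: Outcome.NOT_FOUND,
    429: Outcome.RATE_LIMITED,
}

def interpret_status(status_code):
    """Map an HTTP status code to an Outcome (hypothetical helper)."""
    return _OUTCOMES.get(status_code, Outcome.UNEXPECTED)

outcome = interpret_status(404)
```

Treating 404 as its own outcome keeps "not in any breach" separate from genuine failures such as 401 or 429.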
+
+## Reference Files
+
+This skill includes comprehensive API documentation in `references/`:
+
+- **other.md** - Complete HIBP API v3 reference with all endpoints, authentication, and usage examples
+
+The reference file contains:
+- **All API endpoints** - Breaches, pastes, passwords, stealer logs
+- **Request/response formats** - Headers, parameters, JSON structures
+- **Authentication details** - API key setup and usage
+- **Rate limiting information** - Subscription tiers and retry strategies
+- **Test accounts** - Pre-configured test data for integration
+- **Code examples** - Real-world implementation patterns
+
+Use `view` to read the reference file when you need detailed information about specific endpoints or advanced features.
+
+## Working with This Skill
+
+### For Beginners
+Start by understanding the core concepts:
+1. **Password checking** - Use Pwned Passwords API (no authentication required)
+2. **Account breaches** - Requires API key from haveibeenpwned.com
+3. **k-Anonymity** - Learn how password hashing protects privacy
+
+Begin with Quick Reference examples #1 (breach check) and #2 (password check).
+
+### For Integration Projects
+Focus on:
+1. **Authentication setup** - Get API key and configure headers
+2. **Rate limiting** - Implement retry logic (example #8)
+3. **Error handling** - Handle 404, 401, 429 responses properly
+4. **User experience** - Provide clear messaging about breach exposure
+
+Review Quick Reference examples #5 (domain search) and #9 (subscription info).
+
+### For Production Systems
+Consider:
+1. **Caching** - Store breach results to reduce API calls
+2. **Background processing** - Check breaches asynchronously
+3. **Monitoring** - Track new breaches with latest breach endpoint (example #4)
+4. **Privacy** - Never log passwords, use k-anonymity model
+5. **Compliance** - Follow attribution requirements (CC BY 4.0)
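
For the caching item in the list above, a small in-memory cache with a time-to-live is one possible starting point. `TTLCache`, the one-hour default, and the injectable `clock` are assumptions made for this sketch; a production system would more likely use a shared store.

```python
import time

class TTLCache:
    """Keep fetched results for a fixed number of seconds."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: no API call
        value = fetch(key)       # missing or expired: call through
        self._entries[key] = (now, value)
        return value

cache = TTLCache(ttl_seconds=3600)
# Example wiring (not executed here because it needs network access):
# cache.get_or_fetch(email, lambda e: check_account_breaches(e, api_key))
```

Because an empty list is stored like any other value, accounts with no breaches are also served from the cache until the entry expires.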
+
+### For Security Tools
+Advanced patterns:
+1. **Stealer logs** - Check malware-stolen credentials (example #10)
+2. **Domain monitoring** - Track all compromised accounts in your organization
+3. **Paste monitoring** - Alert on email exposure in public pastes (example #6)
+4. **Padding** - Use response padding to prevent inference attacks (example #7)
+
+## Common Patterns
+
+### Pattern 1: Sign-up Password Validation
+```python
+# Prevent users from choosing compromised passwords
+def validate_signup_password(password):
+    count = check_password_pwned(password)  # Defined in Quick Reference #2
+    if count > 0:
+        return False, f"This password has appeared {count} times in known data breaches"
+    return True, "Password not found in known breaches"
+```
+
+### Pattern 2: Breach Notification System
+```python
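+# Notify users when their account appears in a new breach
+# get_latest_breach, query_users_in_breach and send_notification are
+# application-specific placeholders that are not defined in this skill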
+def notify_affected_users():
+    latest = get_latest_breach()
+    affected_users = query_users_in_breach(latest['Name'])
+    for user in affected_users:
+        send_notification(user, latest)
+```
+
+### Pattern 3: Compliance Check
+```python
+# Verify all domain accounts for compliance reporting
+from datetime import datetime, timezone
+
+def domain_security_audit(domain, api_key):
+    breached = search_domain_breaches(domain, api_key)  # Quick Reference #5
+    report = {
+        'total_accounts': len(breached),  # Number of breached aliases returned
+        'affected_accounts': breached,
+        'timestamp': datetime.now(timezone.utc)
+    }
+    return report
+```
+
+## API Endpoints Summary
+
+### Authenticated Endpoints (Require API Key)
+- `GET /breachedaccount/{account}` - Check account breaches
+- `GET /pasteaccount/{account}` - Check pastes
+- `GET /breacheddomain/{domain}` - Domain-wide search
+- `GET /subscribeddomains` - List verified domains
+- `GET /subscription/status` - Check subscription
+- `GET /stealerlogsbyemail/{email}` - Stealer logs by email
+- `GET /stealerlogsbywebsitedomain/{domain}` - Stealer logs by site
+- `GET /stealerlogsbyemaildomain/{domain}` - Stealer logs by email domain
+
+### Public Endpoints (No Authentication)
+- `GET /breaches` - All breaches in system
+- `GET /breach/{name}` - Single breach details
+- `GET /latestbreach` - Most recent breach
+- `GET /dataclasses` - List of data types
+- `GET https://api.pwnedpasswords.com/range/{prefix}` - Password check
+
+## Testing
+
+### Test Accounts
+Use these on domain `hibp-integration-tests.com`:
+- `account-exists@` - Has breaches and pastes
+- `multiple-breaches@` - Three different breaches
+- `spam-list-only@` - Only spam-flagged breach
+- `stealer-log@` - In stealer logs
+- `opt-out@` - No results (opted out)
+
+### Test API Key
+Use `00000000000000000000000000000000` for integration testing.
+
+## Best Practices
+
+1. **Always set User-Agent** - Required header, returns 403 without it
+2. **Use HTTPS only** - API requires TLS 1.2+
+3. **Implement retry logic** - Handle 429 rate limits gracefully
+4. **Cache breach data** - Reduce API calls for frequently checked accounts
+5. **Never log passwords** - Use k-anonymity model, hash locally
+6. **Provide attribution** - Link to haveibeenpwned.com (CC BY 4.0 license)
+7. **Handle 404 gracefully** - "Not found" is good news for users
+8. **Use padding for passwords** - Add `Add-Padding: true` header
+
+## Resources
+
+### Official Links
+- API Documentation: https://haveibeenpwned.com/API/v3
+- Get API Key: https://haveibeenpwned.com/API/Key
+- Dashboard: https://haveibeenpwned.com/DomainSearch
+
+### Community Tools
+- **PwnedPasswordsDownloader** (GitHub) - Download full password database
+- Integration libraries available for Python, JavaScript, Go, C#, and more
+
+## Acceptable Use
+
+**Permitted:**
+- Security tools and breach notifications
+- Password validation in authentication systems
+- Compliance and security audits
+- Educational and research purposes
+
+**Prohibited:**
+- Targeting or harming breach victims
+- Denial-of-service attacks
+- Circumventing security measures
+- Misrepresenting data source
+- Automating undocumented APIs
+
+Violations may result in API key revocation or IP blocking.
+
+## Notes
+
+- Breach data licensed under **Creative Commons Attribution 4.0**
+- Pwned Passwords has no licensing requirements
+- CORS only supported for unauthenticated endpoints
+- Never expose API keys in client-side code
+- Service tracks **917+ breaches** as of API documentation date