Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:50:28 +08:00
commit 86a246cf0e
7 changed files with 583 additions and 0 deletions

View File

@@ -0,0 +1,479 @@
---
name: haveibeenpwned
description: HaveIBeenPwned API Documentation - Check if email accounts or passwords have been compromised in data breaches
---
# Have I Been Pwned API Skill
Expert assistance for integrating the Have I Been Pwned (HIBP) API v3 to check for compromised accounts, passwords, and data breaches. This skill provides comprehensive guidance for building security tools, breach notification systems, and password validation features.
## When to Use This Skill
This skill should be triggered when:
- **Checking if emails/accounts appear in data breaches** - "check if this email was pwned"
- **Validating password security** - "check if password is in breach database"
- **Building breach notification systems** - "notify users about compromised accounts"
- **Implementing password validation** - "prevent users from choosing pwned passwords"
- **Querying stealer logs** - "check if credentials were stolen by malware"
- **Integrating HIBP into authentication flows** - "add breach checking to login"
- **Monitoring domains for compromised emails** - "track breaches affecting our domain"
- **Working with the HIBP API** - any questions about authentication, rate limits, or endpoints
## Quick Reference
### 1. Basic Account Breach Check
```python
import requests
def check_account_breaches(email, api_key):
"""Check if an account appears in any breaches"""
headers = {
'hibp-api-key': api_key,
'user-agent': 'MyApp/1.0'
}
url = f'https://haveibeenpwned.com/api/v3/breachedaccount/{email}'
response = requests.get(url, headers=headers)
if response.status_code == 200:
return response.json() # List of breach objects
elif response.status_code == 404:
return [] # Account not found in breaches
else:
response.raise_for_status()
# Usage
breaches = check_account_breaches('user@example.com', 'your-api-key')
print(f"Found in {len(breaches)} breaches")
```
### 2. Password Breach Check (k-Anonymity)
```python
import hashlib
import requests
def check_password_pwned(password):
"""Check if password appears in breaches using k-anonymity"""
# Hash password with SHA-1
sha1_hash = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()
prefix = sha1_hash[:5]
suffix = sha1_hash[5:]
# Query API with first 5 characters only
url = f'https://api.pwnedpasswords.com/range/{prefix}'
response = requests.get(url)
# Parse response for matching suffix
hashes = (line.split(':') for line in response.text.splitlines())
for hash_suffix, count in hashes:
if hash_suffix == suffix:
return int(count) # Times password appears in breaches
return 0 # Password not found
# Usage
count = check_password_pwned('password123')
if count > 0:
print(f"⚠️ Password found {count} times in breaches!")
```
### 3. Get All Breaches in System
```python
import requests
def get_all_breaches(domain=None):
"""Retrieve all breaches, optionally filtered by domain"""
url = 'https://haveibeenpwned.com/api/v3/breaches'
params = {'domain': domain} if domain else {}
headers = {'user-agent': 'MyApp/1.0'}
response = requests.get(url, headers=headers, params=params)
return response.json()
# Usage - no authentication required
breaches = get_all_breaches()
print(f"Total breaches: {len(breaches)}")
# Filter by domain
adobe_breaches = get_all_breaches(domain='adobe.com')
```
### 4. Monitor for New Breaches
```python
import requests
import time
def monitor_latest_breach(check_interval=3600):
"""Poll for new breaches every hour"""
last_breach_name = None
while True:
url = 'https://haveibeenpwned.com/api/v3/latestbreach'
headers = {'user-agent': 'MyApp/1.0'}
response = requests.get(url, headers=headers)
if response.status_code == 200:
breach = response.json()
if breach['Name'] != last_breach_name:
print(f"🆕 New breach: {breach['Title']}")
print(f" Accounts affected: {breach['PwnCount']:,}")
last_breach_name = breach['Name']
time.sleep(check_interval)
```
### 5. Domain-Wide Breach Search
```python
import requests
def search_domain_breaches(domain, api_key):
"""Search for all breached emails in a verified domain"""
headers = {
'hibp-api-key': api_key,
'user-agent': 'MyApp/1.0'
}
url = f'https://haveibeenpwned.com/api/v3/breacheddomain/{domain}'
response = requests.get(url, headers=headers)
if response.status_code == 200:
results = response.json()
# Returns: {"alias1": ["Adobe"], "alias2": ["Adobe", "Gawker"]}
total_affected = len(results)
print(f"Found {total_affected} compromised accounts")
return results
else:
response.raise_for_status()
```
### 6. Check Pastes for Account
```python
import requests
def check_pastes(email, api_key):
"""Check if email appears in any pastes"""
headers = {
'hibp-api-key': api_key,
'user-agent': 'MyApp/1.0'
}
url = f'https://haveibeenpwned.com/api/v3/pasteaccount/{email}'
response = requests.get(url, headers=headers)
if response.status_code == 200:
pastes = response.json()
for paste in pastes:
print(f"{paste['Source']}: {paste['Title']}")
print(f" Date: {paste['Date']}")
print(f" Emails found: {paste['EmailCount']}")
return pastes
elif response.status_code == 404:
return [] # No pastes found
```
### 7. Enhanced Password Check with Padding
```python
import hashlib
import requests
def check_password_secure(password):
"""Check password with padding to prevent inference attacks"""
sha1_hash = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()
prefix = sha1_hash[:5]
suffix = sha1_hash[5:]
headers = {'Add-Padding': 'true'}
url = f'https://api.pwnedpasswords.com/range/{prefix}'
response = requests.get(url, headers=headers)
# Parse response, ignore padded entries (count=0)
for line in response.text.splitlines():
hash_suffix, count = line.split(':')
if hash_suffix == suffix and int(count) > 0:
return int(count)
return 0
```
### 8. Handle Rate Limiting
```python
import requests
import time
def api_call_with_retry(url, headers, max_retries=3):
"""Make API call with automatic retry on rate limit"""
for attempt in range(max_retries):
response = requests.get(url, headers=headers)
if response.status_code == 429:
# Rate limited - wait and retry
retry_after = int(response.headers.get('retry-after', 2))
print(f"Rate limited, waiting {retry_after}s...")
time.sleep(retry_after)
continue
return response
raise Exception("Max retries exceeded")
```
### 9. Check Subscription Status
```python
import requests
def get_subscription_info(api_key):
"""Retrieve API subscription details and limits"""
headers = {
'hibp-api-key': api_key,
'user-agent': 'MyApp/1.0'
}
url = 'https://haveibeenpwned.com/api/v3/subscription/status'
response = requests.get(url, headers=headers)
if response.status_code == 200:
info = response.json()
print(f"Plan: {info['SubscriptionName']}")
print(f"Rate limit: {info['Rpm']} requests/minute")
print(f"Valid until: {info['SubscribedUntil']}")
return info
```
### 10. Stealer Logs Search
```python
import requests
def check_stealer_logs(email, api_key):
"""Check if credentials appear in info stealer malware logs"""
headers = {
'hibp-api-key': api_key,
'user-agent': 'MyApp/1.0'
}
url = f'https://haveibeenpwned.com/api/v3/stealerlogsbyemail/{email}'
response = requests.get(url, headers=headers)
if response.status_code == 200:
domains = response.json() # List of website domains
print(f"Credentials found for {len(domains)} websites")
return domains
elif response.status_code == 404:
return [] # Not found in stealer logs
# Requires Pwned 5+ subscription
```
## Key Concepts
### Authentication
- **API Key Format**: 32-character hexadecimal string
- **Header**: `hibp-api-key: {your-key}`
- **User-Agent Required**: Must set valid user-agent header (returns 403 if missing)
- **Test Key**: `00000000000000000000000000000000` for integration testing
### k-Anonymity Model
The Pwned Passwords API uses **k-anonymity** to protect user privacy:
1. Client hashes password locally with SHA-1
2. Sends only **first 5 characters** of hash to API
3. API returns ~800 matching hash suffixes
4. Client checks locally if full hash matches
This ensures the actual password **never leaves your system**.
### Rate Limiting
- **Varies by subscription tier**: Pwned 5 = 1,000 requests/minute
- **HTTP 429 response** when exceeded with `retry-after` header
- **Pwned Passwords API**: No rate limit
- **Best practice**: Implement exponential backoff on 429 responses
### Breach Model Attributes
Key fields in breach objects:
- **Name**: Unique identifier (e.g., "Adobe")
- **Title**: Human-readable name
- **BreachDate**: When breach occurred (ISO 8601)
- **PwnCount**: Total compromised accounts
- **DataClasses**: Types of data exposed (emails, passwords, etc.)
- **IsVerified**: Breach authenticity confirmed
- **IsSensitive**: Excluded from public searches
### Response Codes
| Code | Meaning |
|------|---------|
| 200 | Success - data found |
| 404 | Not found (account not in breaches) |
| 401 | Unauthorized (invalid API key) |
| 403 | Forbidden (missing user-agent) |
| 429 | Rate limit exceeded |
## Reference Files
This skill includes comprehensive API documentation in `references/`:
- **other.md** - Complete HIBP API v3 reference with all endpoints, authentication, and usage examples
The reference file contains:
- **All API endpoints** - Breaches, pastes, passwords, stealer logs
- **Request/response formats** - Headers, parameters, JSON structures
- **Authentication details** - API key setup and usage
- **Rate limiting information** - Subscription tiers and retry strategies
- **Test accounts** - Pre-configured test data for integration
- **Code examples** - Real-world implementation patterns
Use `view` to read the reference file when you need detailed information about specific endpoints or advanced features.
## Working with This Skill
### For Beginners
Start by understanding the core concepts:
1. **Password checking** - Use Pwned Passwords API (no authentication required)
2. **Account breaches** - Requires API key from haveibeenpwned.com
3. **k-Anonymity** - Learn how password hashing protects privacy
Begin with Quick Reference examples #1 (breach check) and #2 (password check).
### For Integration Projects
Focus on:
1. **Authentication setup** - Get API key and configure headers
2. **Rate limiting** - Implement retry logic (example #8)
3. **Error handling** - Handle 404, 401, 429 responses properly
4. **User experience** - Provide clear messaging about breach exposure
Review Quick Reference examples #5 (domain search) and #9 (subscription info).
### For Production Systems
Consider:
1. **Caching** - Store breach results to reduce API calls
2. **Background processing** - Check breaches asynchronously
3. **Monitoring** - Track new breaches with latest breach endpoint (example #4)
4. **Privacy** - Never log passwords, use k-anonymity model
5. **Compliance** - Follow attribution requirements (CC BY 4.0)
### For Security Tools
Advanced patterns:
1. **Stealer logs** - Check malware-stolen credentials (example #10)
2. **Domain monitoring** - Track all compromised accounts in your organization
3. **Paste monitoring** - Alert on email exposure in public pastes (example #6)
4. **Padding** - Use response padding to prevent inference attacks (example #7)
## Common Patterns
### Pattern 1: Sign-up Password Validation
```python
# Prevent users from choosing compromised passwords
def validate_signup_password(password):
count = check_password_pwned(password)
if count > 0:
return False, f"This password appears in {count} data breaches"
return True, "Password is secure"
```
### Pattern 2: Breach Notification System
```python
# Notify users when their account appears in new breach
def notify_affected_users():
latest = get_latest_breach()
affected_users = query_users_in_breach(latest['Name'])
for user in affected_users:
send_notification(user, latest)
```
### Pattern 3: Compliance Check
```python
# Verify all domain accounts for compliance reporting
def domain_security_audit(domain, api_key):
breached = search_domain_breaches(domain, api_key)
report = {
'total_accounts': len(breached),
'affected_accounts': breached,
'timestamp': datetime.now()
}
return report
```
## API Endpoints Summary
### Authenticated Endpoints (Require API Key)
- `GET /breachedaccount/{account}` - Check account breaches
- `GET /pasteaccount/{account}` - Check pastes
- `GET /breacheddomain/{domain}` - Domain-wide search
- `GET /subscribeddomains` - List verified domains
- `GET /subscription/status` - Check subscription
- `GET /stealerlogsbyemail/{email}` - Stealer logs by email
- `GET /stealerlogsbywebsitedomain/{domain}` - Stealer logs by site
- `GET /stealerlogsbyemaildomain/{domain}` - Stealer logs by email domain
### Public Endpoints (No Authentication)
- `GET /breaches` - All breaches in system
- `GET /breach/{name}` - Single breach details
- `GET /latestbreach` - Most recent breach
- `GET /dataclasses` - List of data types
- `GET https://api.pwnedpasswords.com/range/{prefix}` - Password check
## Testing
### Test Accounts
Use these on domain `hibp-integration-tests.com`:
- `account-exists@` - Has breaches and pastes
- `multiple-breaches@` - Three different breaches
- `spam-list-only@` - Only spam-flagged breach
- `stealer-log@` - In stealer logs
- `opt-out@` - No results (opted out)
### Test API Key
Use `00000000000000000000000000000000` for integration testing.
## Best Practices
1. **Always set User-Agent** - Required header, returns 403 without it
2. **Use HTTPS only** - API requires TLS 1.2+
3. **Implement retry logic** - Handle 429 rate limits gracefully
4. **Cache breach data** - Reduce API calls for frequently checked accounts
5. **Never log passwords** - Use k-anonymity model, hash locally
6. **Provide attribution** - Link to haveibeenpwned.com (CC BY 4.0 license)
7. **Handle 404 gracefully** - "Not found" is good news for users
8. **Use padding for passwords** - Add `Add-Padding: true` header
## Resources
### Official Links
- API Documentation: https://haveibeenpwned.com/API/v3
- Get API Key: https://haveibeenpwned.com/API/Key
- Dashboard: https://haveibeenpwned.com/DomainSearch
### Community Tools
- **PwnedPasswordsDownloader** (GitHub) - Download full password database
- Integration libraries available for Python, JavaScript, Go, C#, and more
## Acceptable Use
**Permitted:**
- Security tools and breach notifications
- Password validation in authentication systems
- Compliance and security audits
- Educational and research purposes
**Prohibited:**
- Targeting or harming breach victims
- Denial-of-service attacks
- Circumventing security measures
- Misrepresenting data source
- Automating undocumented APIs
Violations may result in API key revocation or IP blocking.
## Notes
- Breach data licensed under **Creative Commons Attribution 4.0**
- Pwned Passwords has no licensing requirements
- CORS only supported for unauthenticated endpoints
- Never expose API keys in client-side code
- Service tracks **917+ breaches** as of API documentation date