Initial commit
This commit is contained in:
371
skills/openalex-database/references/api_guide.md
Normal file
371
skills/openalex-database/references/api_guide.md
Normal file
@@ -0,0 +1,371 @@
|
||||
# OpenAlex API Complete Guide
|
||||
|
||||
## Base Information
|
||||
|
||||
**Base URL:** `https://api.openalex.org`
|
||||
**Authentication:** None required
|
||||
**Rate Limits:**
|
||||
- Default: 1 request/second, 100k requests/day
|
||||
- Polite pool (with email): 10 requests/second, 100k requests/day
|
||||
|
||||
## Critical Best Practices
|
||||
|
||||
### ✅ DO: Use `?sample` parameter for random sampling
|
||||
```
|
||||
https://api.openalex.org/works?sample=20&seed=123
|
||||
```
|
||||
For large samples (10k+), use multiple seeds and deduplicate.
|
||||
|
||||
### ❌ DON'T: Use random page numbers for sampling
|
||||
Incorrect: `?page=5`, `?page=17` - This biases results!
|
||||
|
||||
### ✅ DO: Use two-step lookup for entity filtering
|
||||
```
|
||||
1. Find entity ID: /authors?search=einstein
|
||||
2. Use ID: /works?filter=authorships.author.id:A5023888391
|
||||
```
|
||||
|
||||
### ❌ DON'T: Filter by entity names directly
|
||||
Incorrect: `/works?filter=author_name:Einstein` - Names are ambiguous!
|
||||
|
||||
### ✅ DO: Use maximum page size for bulk extraction
|
||||
```
|
||||
?per-page=200
|
||||
```
|
||||
This is 8x faster than default (25).
|
||||
|
||||
### ❌ DON'T: Use default page sizes
|
||||
Default is only 25 results per page.
|
||||
|
||||
### ✅ DO: Use OR filter (pipe |) for batch lookups
|
||||
```
|
||||
/works?filter=doi:10.1/abc|10.2/def|10.3/ghi
|
||||
```
|
||||
Up to 50 values per filter.
|
||||
|
||||
### ❌ DON'T: Make sequential API calls for lists
|
||||
Making 100 separate calls when you can batch them is inefficient.
|
||||
|
||||
### ✅ DO: Implement exponential backoff for retries
|
||||
```python
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
response = requests.get(url)
|
||||
if response.status_code == 200:
|
||||
return response.json()
|
||||
except:
|
||||
wait_time = 2 ** attempt
|
||||
time.sleep(wait_time)
|
||||
```
|
||||
|
||||
### ✅ DO: Add email for 10x rate limit boost
|
||||
```
|
||||
?mailto=yourname@example.edu
|
||||
```
|
||||
Increases from 1 req/sec → 10 req/sec.
|
||||
|
||||
## Entity Endpoints
|
||||
|
||||
- `/works` - 240M+ scholarly documents
|
||||
- `/authors` - Researcher profiles
|
||||
- `/sources` - Journals, repositories, conferences
|
||||
- `/institutions` - Universities, research organizations
|
||||
- `/topics` - Subject classifications (3-level hierarchy)
|
||||
- `/publishers` - Publishing organizations
|
||||
- `/funders` - Funding agencies
|
||||
- `/text` - Tag your own text with topics/keywords (POST)
|
||||
|
||||
## Essential Query Parameters
|
||||
|
||||
| Parameter | Description | Example |
|
||||
|-----------|-------------|---------|
|
||||
| `filter=` | Filter results | `?filter=publication_year:2020` |
|
||||
| `search=` | Full-text search | `?search=machine+learning` |
|
||||
| `sort=` | Sort results | `?sort=cited_by_count:desc` |
|
||||
| `per-page=` | Results per page (max 200) | `?per-page=200` |
|
||||
| `page=` | Page number | `?page=2` |
|
||||
| `sample=` | Random results | `?sample=50&seed=42` |
|
||||
| `select=` | Limit fields | `?select=id,title` |
|
||||
| `group_by=` | Aggregate by field | `?group_by=publication_year` |
|
||||
| `mailto=` | Email for polite pool | `?mailto=you@example.edu` |
|
||||
|
||||
## Filter Syntax
|
||||
|
||||
### Basic Filtering
|
||||
```
|
||||
Single filter: ?filter=publication_year:2020
|
||||
Multiple (AND): ?filter=publication_year:2020,is_oa:true
|
||||
Values (OR): ?filter=type:journal-article|book
|
||||
Negation: ?filter=type:!journal-article
|
||||
```
|
||||
|
||||
### Comparison Operators
|
||||
```
|
||||
Greater than: ?filter=cited_by_count:>100
|
||||
Less than: ?filter=publication_year:<2020
|
||||
Range: ?filter=publication_year:2020-2023
|
||||
```
|
||||
|
||||
### Multiple Values in Same Attribute
|
||||
```
|
||||
Repeat filter: ?filter=institutions.country_code:us,institutions.country_code:gb
|
||||
Use + symbol: ?filter=institutions.country_code:us+gb
|
||||
```
|
||||
Both mean: "works with author from US AND author from GB"
|
||||
|
||||
### OR Queries
|
||||
```
|
||||
Any of these: ?filter=institutions.country_code:us|gb|ca
|
||||
Batch IDs: ?filter=doi:10.1/abc|10.2/def
|
||||
```
|
||||
Up to 50 values with pipes.
|
||||
|
||||
## Common Query Patterns
|
||||
|
||||
### Get Random Sample
|
||||
```bash
|
||||
# Small sample
|
||||
https://api.openalex.org/works?sample=20&seed=42
|
||||
|
||||
# Large sample (10k+) - make multiple requests
|
||||
https://api.openalex.org/works?sample=1000&seed=1
|
||||
https://api.openalex.org/works?sample=1000&seed=2
|
||||
# Then deduplicate by ID
|
||||
```
|
||||
|
||||
### Search Works
|
||||
```bash
|
||||
# Simple search
|
||||
https://api.openalex.org/works?search=machine+learning
|
||||
|
||||
# Search specific field
|
||||
https://api.openalex.org/works?filter=title.search:CRISPR
|
||||
|
||||
# Search + filter
|
||||
https://api.openalex.org/works?search=climate&filter=publication_year:2023
|
||||
```
|
||||
|
||||
### Find Works by Author (Two-Step)
|
||||
```bash
|
||||
# Step 1: Get author ID
|
||||
https://api.openalex.org/authors?search=Heather+Piwowar
|
||||
# Returns: "id": "https://openalex.org/A5023888391"
|
||||
|
||||
# Step 2: Get their works
|
||||
https://api.openalex.org/works?filter=authorships.author.id:A5023888391
|
||||
```
|
||||
|
||||
### Find Works by Institution (Two-Step)
|
||||
```bash
|
||||
# Step 1: Get institution ID
|
||||
https://api.openalex.org/institutions?search=MIT
|
||||
# Returns: "id": "https://openalex.org/I136199984"
|
||||
|
||||
# Step 2: Get their works
|
||||
https://api.openalex.org/works?filter=authorships.institutions.id:I136199984
|
||||
```
|
||||
|
||||
### Highly Cited Recent Papers
|
||||
```bash
|
||||
https://api.openalex.org/works?filter=publication_year:>2020&sort=cited_by_count:desc&per-page=200
|
||||
```
|
||||
|
||||
### Open Access Works
|
||||
```bash
|
||||
# All OA
|
||||
https://api.openalex.org/works?filter=is_oa:true
|
||||
|
||||
# Gold OA only
|
||||
https://api.openalex.org/works?filter=open_access.oa_status:gold
|
||||
```
|
||||
|
||||
### Multiple Criteria
|
||||
```bash
|
||||
# Recent OA works about COVID from top institutions
|
||||
https://api.openalex.org/works?filter=publication_year:2022,is_oa:true,title.search:covid,authorships.institutions.id:I136199984|I27837315
|
||||
```
|
||||
|
||||
### Bulk DOI Lookup
|
||||
```bash
|
||||
# Get specific works by DOI (up to 50 per request)
|
||||
https://api.openalex.org/works?filter=doi:https://doi.org/10.1371/journal.pone.0266781|https://doi.org/10.1371/journal.pone.0267149&per-page=50
|
||||
```
|
||||
|
||||
### Aggregate Data
|
||||
```bash
|
||||
# Top topics
|
||||
https://api.openalex.org/works?group_by=topics.id
|
||||
|
||||
# Papers per year
|
||||
https://api.openalex.org/works?group_by=publication_year
|
||||
|
||||
# Most prolific institutions
|
||||
https://api.openalex.org/works?group_by=authorships.institutions.id
|
||||
```
|
||||
|
||||
### Pagination
|
||||
```bash
|
||||
# First page
|
||||
https://api.openalex.org/works?filter=publication_year:2023&per-page=200
|
||||
|
||||
# Next pages
|
||||
https://api.openalex.org/works?filter=publication_year:2023&per-page=200&page=2
|
||||
```
|
||||
|
||||
## Response Structure
|
||||
|
||||
### List Endpoints
|
||||
```json
|
||||
{
|
||||
"meta": {
|
||||
"count": 240523418,
|
||||
"db_response_time_ms": 42,
|
||||
"page": 1,
|
||||
"per_page": 25
|
||||
},
|
||||
"results": [
|
||||
{ /* entity object */ }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Single Entity
|
||||
```
|
||||
https://api.openalex.org/works/W2741809807
|
||||
→ Returns Work object directly (no meta/results wrapper)
|
||||
```
|
||||
|
||||
### Group By
|
||||
```json
|
||||
{
|
||||
"meta": { "count": 100 },
|
||||
"group_by": [
|
||||
{
|
||||
"key": "https://openalex.org/T10001",
|
||||
"key_display_name": "Artificial Intelligence",
|
||||
"count": 15234
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Works Filters (Most Common)
|
||||
|
||||
| Filter | Description | Example |
|
||||
|--------|-------------|---------|
|
||||
| `authorships.author.id` | Author's OpenAlex ID | `A5023888391` |
|
||||
| `authorships.institutions.id` | Institution's ID | `I136199984` |
|
||||
| `cited_by_count` | Citation count | `>100` |
|
||||
| `is_oa` | Is open access | `true/false` |
|
||||
| `publication_year` | Year published | `2020`, `>2020`, `2018-2022` |
|
||||
| `primary_location.source.id` | Source (journal) ID | `S137773608` |
|
||||
| `topics.id` | Topic ID | `T10001` |
|
||||
| `type` | Document type | `article`, `book`, `dataset` |
|
||||
| `has_doi` | Has DOI | `true/false` |
|
||||
| `has_fulltext` | Has fulltext | `true/false` |
|
||||
|
||||
## Authors Filters
|
||||
|
||||
| Filter | Description |
|
||||
|--------|-------------|
|
||||
| `last_known_institution.id` | Current/last institution |
|
||||
| `works_count` | Number of works |
|
||||
| `cited_by_count` | Total citations |
|
||||
| `orcid` | ORCID identifier |
|
||||
|
||||
## External ID Support
|
||||
|
||||
### Works
|
||||
```
|
||||
DOI: /works/https://doi.org/10.7717/peerj.4375
|
||||
PMID: /works/pmid:29844763
|
||||
```
|
||||
|
||||
### Authors
|
||||
```
|
||||
ORCID: /authors/https://orcid.org/0000-0003-1613-5981
|
||||
```
|
||||
|
||||
### Institutions
|
||||
```
|
||||
ROR: /institutions/https://ror.org/02y3ad647
|
||||
```
|
||||
|
||||
### Sources
|
||||
```
|
||||
ISSN: /sources/issn:0028-0836
|
||||
```
|
||||
|
||||
## Performance Tips
|
||||
|
||||
1. **Use maximum page size**: `?per-page=200` (8x fewer calls)
|
||||
2. **Batch ID lookups**: Use pipe operator for up to 50 IDs
|
||||
3. **Select only needed fields**: `?select=id,title,publication_year`
|
||||
4. **Use concurrent requests**: With rate limiting (10 req/sec with email)
|
||||
5. **Add email**: `?mailto=you@example.edu` for 10x speed boost
|
||||
|
||||
## Error Handling
|
||||
|
||||
### HTTP Status Codes
|
||||
- `200` - Success
|
||||
- `400` - Bad request (check filter syntax)
|
||||
- `403` - Rate limit exceeded (implement backoff)
|
||||
- `404` - Entity doesn't exist
|
||||
- `500` - Server error (retry with backoff)
|
||||
|
||||
### Exponential Backoff
|
||||
```python
|
||||
def fetch_with_retry(url, max_retries=5):
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
response = requests.get(url, timeout=30)
|
||||
if response.status_code == 200:
|
||||
return response.json()
|
||||
elif response.status_code in [403, 500, 502, 503, 504]:
|
||||
wait_time = 2 ** attempt
|
||||
time.sleep(wait_time)
|
||||
else:
|
||||
response.raise_for_status()
|
||||
except requests.exceptions.Timeout:
|
||||
if attempt < max_retries - 1:
|
||||
time.sleep(2 ** attempt)
|
||||
else:
|
||||
raise
|
||||
raise Exception(f"Failed after {max_retries} retries")
|
||||
```
|
||||
|
||||
## Rate Limiting
|
||||
|
||||
### Without Email (Default Pool)
|
||||
- 1 request/second
|
||||
- 100,000 requests/day
|
||||
|
||||
### With Email (Polite Pool)
|
||||
- 10 requests/second
|
||||
- 100,000 requests/day
|
||||
- **Always use for production**
|
||||
|
||||
### Concurrent Request Strategy
|
||||
1. Track requests per second globally
|
||||
2. Use semaphore or rate limiter across threads
|
||||
3. Monitor for 403 responses
|
||||
4. Back off if limits hit
|
||||
|
||||
## Common Mistakes to Avoid
|
||||
|
||||
1. ❌ Using page numbers for sampling → ✅ Use `?sample=`
|
||||
2. ❌ Filtering by entity names → ✅ Get IDs first
|
||||
3. ❌ Default page size → ✅ Use `per-page=200`
|
||||
4. ❌ Sequential ID lookups → ✅ Batch with pipe operator
|
||||
5. ❌ No error handling → ✅ Implement retry with backoff
|
||||
6. ❌ Ignoring rate limits → ✅ Global rate limiting
|
||||
7. ❌ Not including email → ✅ Add `mailto=`
|
||||
8. ❌ Fetching all fields → ✅ Use `select=`
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- Full documentation: https://docs.openalex.org
|
||||
- API Overview: https://docs.openalex.org/how-to-use-the-api/api-overview
|
||||
- Entity schemas: https://docs.openalex.org/api-entities
|
||||
- Help: https://openalex.org/help
|
||||
- User group: https://groups.google.com/g/openalex-users
|
||||
381
skills/openalex-database/references/common_queries.md
Normal file
381
skills/openalex-database/references/common_queries.md
Normal file
@@ -0,0 +1,381 @@
|
||||
# Common OpenAlex Query Examples
|
||||
|
||||
This document provides practical examples for common research queries using OpenAlex.
|
||||
|
||||
## Finding Papers by Author
|
||||
|
||||
**User query**: "Find papers by Albert Einstein"
|
||||
|
||||
**Approach**: Two-step pattern
|
||||
1. Search for author to get ID
|
||||
2. Filter works by author ID
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
from scripts.openalex_client import OpenAlexClient
|
||||
from scripts.query_helpers import find_author_works
|
||||
|
||||
client = OpenAlexClient(email="your-email@example.edu")
|
||||
works = find_author_works("Albert Einstein", client, limit=100)
|
||||
|
||||
for work in works:
|
||||
print(f"{work['title']} ({work['publication_year']})")
|
||||
```
|
||||
|
||||
## Finding Papers from an Institution
|
||||
|
||||
**User query**: "What papers has MIT published in the last year?"
|
||||
|
||||
**Approach**: Two-step pattern with date filter
|
||||
1. Search for institution to get ID
|
||||
2. Filter works by institution ID and year
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
from scripts.query_helpers import find_institution_works
|
||||
|
||||
works = find_institution_works("MIT", client, limit=200)
|
||||
|
||||
# Filter for recent papers
|
||||
import datetime
|
||||
current_year = datetime.datetime.now().year
|
||||
recent_works = [w for w in works if w['publication_year'] == current_year]
|
||||
```
|
||||
|
||||
## Highly Cited Papers on a Topic
|
||||
|
||||
**User query**: "Find the most cited papers on CRISPR from the last 5 years"
|
||||
|
||||
**Approach**: Search + filter + sort
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
works = client.search_works(
|
||||
search="CRISPR",
|
||||
filter_params={
|
||||
"publication_year": ">2019"
|
||||
},
|
||||
sort="cited_by_count:desc",
|
||||
per_page=100
|
||||
)
|
||||
|
||||
for work in works['results']:
|
||||
title = work['title']
|
||||
citations = work['cited_by_count']
|
||||
year = work['publication_year']
|
||||
print(f"{title} ({year}): {citations} citations")
|
||||
```
|
||||
|
||||
## Open Access Papers on a Topic
|
||||
|
||||
**User query**: "Find open access papers about climate change"
|
||||
|
||||
**Approach**: Search + OA filter
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
from scripts.query_helpers import get_open_access_papers
|
||||
|
||||
papers = get_open_access_papers(
|
||||
search_term="climate change",
|
||||
client=client,
|
||||
oa_status="any", # or "gold", "green", "hybrid", "bronze"
|
||||
limit=200
|
||||
)
|
||||
|
||||
for paper in papers:
|
||||
print(f"{paper['title']}")
|
||||
print(f" OA Status: {paper['open_access']['oa_status']}")
|
||||
print(f" URL: {paper['open_access']['oa_url']}")
|
||||
```
|
||||
|
||||
## Publication Trends Analysis
|
||||
|
||||
**User query**: "Show me publication trends for machine learning over the years"
|
||||
|
||||
**Approach**: Use group_by to aggregate by year
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
from scripts.query_helpers import get_publication_trends
|
||||
|
||||
trends = get_publication_trends(
|
||||
search_term="machine learning",
|
||||
client=client
|
||||
)
|
||||
|
||||
# Sort by year
|
||||
trends_sorted = sorted(trends, key=lambda x: x['key'])
|
||||
|
||||
for trend in trends_sorted[-10:]: # Last 10 years
|
||||
year = trend['key']
|
||||
count = trend['count']
|
||||
print(f"{year}: {count} publications")
|
||||
```
|
||||
|
||||
## Analyzing Research Output
|
||||
|
||||
**User query**: "Analyze the research output of Stanford University from 2020-2024"
|
||||
|
||||
**Approach**: Multiple aggregations for comprehensive analysis
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
from scripts.query_helpers import analyze_research_output
|
||||
|
||||
analysis = analyze_research_output(
|
||||
entity_type='institution',
|
||||
entity_name='Stanford University',
|
||||
client=client,
|
||||
years='2020-2024'
|
||||
)
|
||||
|
||||
print(f"Institution: {analysis['entity_name']}")
|
||||
print(f"Total works: {analysis['total_works']}")
|
||||
print(f"Open access: {analysis['open_access_percentage']}%")
|
||||
print("\nTop topics:")
|
||||
for topic in analysis['top_topics'][:5]:
|
||||
print(f" - {topic['key_display_name']}: {topic['count']} works")
|
||||
```
|
||||
|
||||
## Finding Papers by DOI (Batch)
|
||||
|
||||
**User query**: "Get information for these 10 DOIs: ..."
|
||||
|
||||
**Approach**: Batch lookup with pipe separator
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
dois = [
|
||||
"https://doi.org/10.1371/journal.pone.0266781",
|
||||
"https://doi.org/10.1371/journal.pone.0267149",
|
||||
"https://doi.org/10.1038/s41586-021-03819-2",
|
||||
# ... up to 50 DOIs
|
||||
]
|
||||
|
||||
works = client.batch_lookup(
|
||||
entity_type='works',
|
||||
ids=dois,
|
||||
id_field='doi'
|
||||
)
|
||||
|
||||
for work in works:
|
||||
print(f"{work['title']} - {work['publication_year']}")
|
||||
```
|
||||
|
||||
## Random Sample of Papers
|
||||
|
||||
**User query**: "Give me 50 random papers from 2023"
|
||||
|
||||
**Approach**: Use sample parameter with seed for reproducibility
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
works = client.sample_works(
|
||||
sample_size=50,
|
||||
seed=42, # For reproducibility
|
||||
filter_params={
|
||||
"publication_year": "2023",
|
||||
"is_oa": "true"
|
||||
}
|
||||
)
|
||||
|
||||
print(f"Got {len(works)} random papers from 2023")
|
||||
```
|
||||
|
||||
## Papers from Multiple Institutions
|
||||
|
||||
**User query**: "Find papers with authors from both MIT and Stanford"
|
||||
|
||||
**Approach**: Use + operator for AND within same attribute
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
# First, get institution IDs
|
||||
mit_response = client._make_request(
|
||||
'/institutions',
|
||||
params={'search': 'MIT', 'per-page': 1}
|
||||
)
|
||||
mit_id = mit_response['results'][0]['id'].split('/')[-1]
|
||||
|
||||
stanford_response = client._make_request(
|
||||
'/institutions',
|
||||
params={'search': 'Stanford', 'per-page': 1}
|
||||
)
|
||||
stanford_id = stanford_response['results'][0]['id'].split('/')[-1]
|
||||
|
||||
# Find works with authors from both institutions
|
||||
works = client.search_works(
|
||||
filter_params={
|
||||
"authorships.institutions.id": f"{mit_id}+{stanford_id}"
|
||||
},
|
||||
per_page=100
|
||||
)
|
||||
|
||||
print(f"Found {works['meta']['count']} collaborative papers")
|
||||
```
|
||||
|
||||
## Papers in a Specific Journal
|
||||
|
||||
**User query**: "Get all papers from Nature published in 2023"
|
||||
|
||||
**Approach**: Two-step - find journal ID, then filter works
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
# Step 1: Find journal source ID
|
||||
source_response = client._make_request(
|
||||
'/sources',
|
||||
params={'search': 'Nature', 'per-page': 1}
|
||||
)
|
||||
source = source_response['results'][0]
|
||||
source_id = source['id'].split('/')[-1]
|
||||
|
||||
print(f"Found journal: {source['display_name']} (ID: {source_id})")
|
||||
|
||||
# Step 2: Get works from that source
|
||||
works = client.search_works(
|
||||
filter_params={
|
||||
"primary_location.source.id": source_id,
|
||||
"publication_year": "2023"
|
||||
},
|
||||
per_page=200
|
||||
)
|
||||
|
||||
print(f"Found {works['meta']['count']} papers from Nature in 2023")
|
||||
```
|
||||
|
||||
## Topic Analysis by Institution
|
||||
|
||||
**User query**: "What topics does MIT research most?"
|
||||
|
||||
**Approach**: Filter by institution, group by topics
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
# Get MIT ID
|
||||
inst_response = client._make_request(
|
||||
'/institutions',
|
||||
params={'search': 'MIT', 'per-page': 1}
|
||||
)
|
||||
mit_id = inst_response['results'][0]['id'].split('/')[-1]
|
||||
|
||||
# Group by topics
|
||||
topics = client.group_by(
|
||||
entity_type='works',
|
||||
group_field='topics.id',
|
||||
filter_params={
|
||||
"authorships.institutions.id": mit_id,
|
||||
"publication_year": ">2020"
|
||||
}
|
||||
)
|
||||
|
||||
print("Top research topics at MIT (2020+):")
|
||||
for i, topic in enumerate(topics[:10], 1):
|
||||
print(f"{i}. {topic['key_display_name']}: {topic['count']} works")
|
||||
```
|
||||
|
||||
## Citation Analysis
|
||||
|
||||
**User query**: "Find papers that cite this specific DOI"
|
||||
|
||||
**Approach**: Get work by DOI, then use cited_by_api_url
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
# Get the work
|
||||
doi = "https://doi.org/10.1038/s41586-021-03819-2"
|
||||
work = client.get_entity('works', doi)
|
||||
|
||||
# Get papers that cite it
|
||||
cited_by_url = work['cited_by_api_url']
|
||||
|
||||
# Extract just the query part and use it
|
||||
import requests
|
||||
response = requests.get(cited_by_url, params={'mailto': client.email})
|
||||
citing_works = response.json()
|
||||
|
||||
print(f"{work['title']}")
|
||||
print(f"Total citations: {work['cited_by_count']}")
|
||||
print(f"\nRecent citing papers:")
|
||||
for citing_work in citing_works['results'][:5]:
|
||||
print(f" - {citing_work['title']} ({citing_work['publication_year']})")
|
||||
```
|
||||
|
||||
## Large-Scale Data Extraction
|
||||
|
||||
**User query**: "Get all papers on quantum computing from the last 3 years"
|
||||
|
||||
**Approach**: Paginate through all results
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
all_papers = client.paginate_all(
|
||||
endpoint='/works',
|
||||
params={
|
||||
'search': 'quantum computing',
|
||||
'filter': 'publication_year:2022-2024'
|
||||
},
|
||||
max_results=10000 # Limit to prevent excessive API calls
|
||||
)
|
||||
|
||||
print(f"Retrieved {len(all_papers)} papers")
|
||||
|
||||
# Save to CSV
|
||||
import csv
|
||||
with open('quantum_papers.csv', 'w', newline='') as f:
|
||||
writer = csv.writer(f)
|
||||
writer.writerow(['Title', 'Year', 'Citations', 'DOI', 'OA Status'])
|
||||
|
||||
for paper in all_papers:
|
||||
writer.writerow([
|
||||
paper['title'],
|
||||
paper['publication_year'],
|
||||
paper['cited_by_count'],
|
||||
paper.get('doi', 'N/A'),
|
||||
paper['open_access']['oa_status']
|
||||
])
|
||||
```
|
||||
|
||||
## Complex Multi-Filter Query
|
||||
|
||||
**User query**: "Find recent, highly-cited, open access papers on AI from top institutions"
|
||||
|
||||
**Approach**: Combine multiple filters
|
||||
|
||||
**Python example**:
|
||||
```python
|
||||
# Get IDs for top institutions
|
||||
top_institutions = ['MIT', 'Stanford', 'Oxford']
|
||||
inst_ids = []
|
||||
|
||||
for inst_name in top_institutions:
|
||||
response = client._make_request(
|
||||
'/institutions',
|
||||
params={'search': inst_name, 'per-page': 1}
|
||||
)
|
||||
if response['results']:
|
||||
inst_id = response['results'][0]['id'].split('/')[-1]
|
||||
inst_ids.append(inst_id)
|
||||
|
||||
# Combine with pipe for OR
|
||||
inst_filter = '|'.join(inst_ids)
|
||||
|
||||
# Complex query
|
||||
works = client.search_works(
|
||||
search="artificial intelligence",
|
||||
filter_params={
|
||||
"publication_year": ">2022",
|
||||
"cited_by_count": ">50",
|
||||
"is_oa": "true",
|
||||
"authorships.institutions.id": inst_filter
|
||||
},
|
||||
sort="cited_by_count:desc",
|
||||
per_page=200
|
||||
)
|
||||
|
||||
print(f"Found {works['meta']['count']} papers matching criteria")
|
||||
for work in works['results'][:10]:
|
||||
print(f"{work['title']}")
|
||||
print(f" Citations: {work['cited_by_count']}, Year: {work['publication_year']}")
|
||||
```
|
||||
Reference in New Issue
Block a user