# Common OpenAlex Query Examples

This document provides practical examples for common research queries using OpenAlex.

## Finding Papers by Author

**User query**: "Find papers by Albert Einstein"

**Approach**: Two-step pattern

1. Search for author to get ID
2. Filter works by author ID

**Python example**:

```python
from scripts.openalex_client import OpenAlexClient
from scripts.query_helpers import find_author_works

client = OpenAlexClient(email="your-email@example.edu")
works = find_author_works("Albert Einstein", client, limit=100)

for work in works:
    print(f"{work['title']} ({work['publication_year']})")
```

## Finding Papers from an Institution

**User query**: "What papers has MIT published in the last year?"

**Approach**: Two-step pattern with date filter

1. Search for institution to get ID
2. Filter works by institution ID and year

**Python example**:

```python
import datetime

from scripts.query_helpers import find_institution_works

works = find_institution_works("MIT", client, limit=200)

# Filter for recent papers
current_year = datetime.datetime.now().year
recent_works = [w for w in works if w['publication_year'] == current_year]
```

## Highly Cited Papers on a Topic

**User query**: "Find the most cited papers on CRISPR from the last 5 years"

**Approach**: Search + filter + sort

**Python example**:

```python
works = client.search_works(
    search="CRISPR",
    filter_params={
        "publication_year": ">2019"
    },
    sort="cited_by_count:desc",
    per_page=100
)

for work in works['results']:
    title = work['title']
    citations = work['cited_by_count']
    year = work['publication_year']
    print(f"{title} ({year}): {citations} citations")
```

## Open Access Papers on a Topic

**User query**: "Find open access papers about climate change"

**Approach**: Search + OA filter

**Python example**:

```python
from scripts.query_helpers import get_open_access_papers

papers = get_open_access_papers(
    search_term="climate change",
    client=client,
    oa_status="any",  # or "gold", "green", "hybrid", "bronze"
    limit=200
)

for paper in papers:
    print(f"{paper['title']}")
    print(f"  OA Status: {paper['open_access']['oa_status']}")
    print(f"  URL: {paper['open_access']['oa_url']}")
```

## Publication Trends Analysis

**User query**: "Show me publication trends for machine learning over the years"

**Approach**: Use group_by to aggregate by year

**Python example**:

```python
from scripts.query_helpers import get_publication_trends

trends = get_publication_trends(
    search_term="machine learning",
    client=client
)

# Sort by year
trends_sorted = sorted(trends, key=lambda x: x['key'])

for trend in trends_sorted[-10:]:  # Last 10 years
    year = trend['key']
    count = trend['count']
    print(f"{year}: {count} publications")
```

## Analyzing Research Output

**User query**: "Analyze the research output of Stanford University from 2020-2024"

**Approach**: Multiple aggregations for comprehensive analysis

**Python example**:

```python
from scripts.query_helpers import analyze_research_output

analysis = analyze_research_output(
    entity_type='institution',
    entity_name='Stanford University',
    client=client,
    years='2020-2024'
)

print(f"Institution: {analysis['entity_name']}")
print(f"Total works: {analysis['total_works']}")
print(f"Open access: {analysis['open_access_percentage']}%")
print("\nTop topics:")
for topic in analysis['top_topics'][:5]:
    print(f"  - {topic['key_display_name']}: {topic['count']} works")
```

## Finding Papers by DOI (Batch)

**User query**: "Get information for these 10 DOIs: ..."

**Approach**: Batch lookup with pipe separator

**Python example**:

```python
dois = [
    "https://doi.org/10.1371/journal.pone.0266781",
    "https://doi.org/10.1371/journal.pone.0267149",
    "https://doi.org/10.1038/s41586-021-03819-2",
    # ... up to 50 DOIs
]

works = client.batch_lookup(
    entity_type='works',
    ids=dois,
    id_field='doi'
)

for work in works:
    print(f"{work['title']} - {work['publication_year']}")
```

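Under the hood, the pipe separator means the batch can also be expressed as one raw API call: values joined with `|` inside a single `doi:` filter act as OR, capped at 50 per request. A minimal sketch (the request line is left commented; `batch_lookup` above wraps this for you):

```python
dois = [
    "https://doi.org/10.1371/journal.pone.0266781",
    "https://doi.org/10.1371/journal.pone.0267149",
]

# Pipe-joined values act as OR within a single filter attribute.
doi_filter = "doi:" + "|".join(dois)

params = {"filter": doi_filter, "per-page": 50}
# works = requests.get("https://api.openalex.org/works", params=params).json()["results"]
```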
## Random Sample of Papers

**User query**: "Give me 50 random papers from 2023"

**Approach**: Use sample parameter with seed for reproducibility

**Python example**:

```python
works = client.sample_works(
    sample_size=50,
    seed=42,  # For reproducibility
    filter_params={
        "publication_year": "2023",
        "is_oa": "true"  # optional: restrict the sample to open access papers
    }
)

print(f"Got {len(works)} random papers from 2023")
```

## Papers from Multiple Institutions

**User query**: "Find papers with authors from both MIT and Stanford"

**Approach**: Use + operator for AND within same attribute

**Python example**:

```python
# First, get institution IDs
mit_response = client._make_request(
    '/institutions',
    params={'search': 'MIT', 'per-page': 1}
)
mit_id = mit_response['results'][0]['id'].split('/')[-1]

stanford_response = client._make_request(
    '/institutions',
    params={'search': 'Stanford', 'per-page': 1}
)
stanford_id = stanford_response['results'][0]['id'].split('/')[-1]

# Find works with authors from both institutions
works = client.search_works(
    filter_params={
        "authorships.institutions.id": f"{mit_id}+{stanford_id}"
    },
    per_page=100
)

print(f"Found {works['meta']['count']} collaborative papers")
```

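For reference, the two join operators behave differently within one filter attribute: `+` requires every value to match (AND), while `|` accepts any of them (OR). A quick sketch with placeholder institution IDs (not real OpenAlex values):

```python
mit_id, stanford_id = "I00000001", "I00000002"  # placeholder IDs for illustration

both = f"{mit_id}+{stanford_id}"    # AND: works with authors from both institutions
either = f"{mit_id}|{stanford_id}"  # OR: works with authors from either institution

# e.g. filter_params={"authorships.institutions.id": both}
```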
## Papers in a Specific Journal

**User query**: "Get all papers from Nature published in 2023"

**Approach**: Two-step: find the journal's source ID, then filter works

**Python example**:

```python
# Step 1: Find journal source ID
source_response = client._make_request(
    '/sources',
    params={'search': 'Nature', 'per-page': 1}
)
source = source_response['results'][0]
source_id = source['id'].split('/')[-1]

print(f"Found journal: {source['display_name']} (ID: {source_id})")

# Step 2: Get works from that source
works = client.search_works(
    filter_params={
        "primary_location.source.id": source_id,
        "publication_year": "2023"
    },
    per_page=200
)

print(f"Found {works['meta']['count']} papers from Nature in 2023")
```

## Topic Analysis by Institution

**User query**: "What topics does MIT research most?"

**Approach**: Filter by institution, group by topics

**Python example**:

```python
# Get MIT ID
inst_response = client._make_request(
    '/institutions',
    params={'search': 'MIT', 'per-page': 1}
)
mit_id = inst_response['results'][0]['id'].split('/')[-1]

# Group by topics
topics = client.group_by(
    entity_type='works',
    group_field='topics.id',
    filter_params={
        "authorships.institutions.id": mit_id,
        "publication_year": ">2020"
    }
)

print("Top research topics at MIT (2020+):")
for i, topic in enumerate(topics[:10], 1):
    print(f"{i}. {topic['key_display_name']}: {topic['count']} works")
```

## Citation Analysis

**User query**: "Find papers that cite this specific DOI"

**Approach**: Get work by DOI, then use cited_by_api_url

**Python example**:

```python
import requests

# Get the work
doi = "https://doi.org/10.1038/s41586-021-03819-2"
work = client.get_entity('works', doi)

# Get papers that cite it
cited_by_url = work['cited_by_api_url']

# Request the ready-made citing-works URL directly
response = requests.get(cited_by_url, params={'mailto': client.email})
citing_works = response.json()

print(f"{work['title']}")
print(f"Total citations: {work['cited_by_count']}")
print("\nRecent citing papers:")
for citing_work in citing_works['results'][:5]:
    print(f"  - {citing_work['title']} ({citing_work['publication_year']})")
```

## Large-Scale Data Extraction

**User query**: "Get all papers on quantum computing from the last 3 years"

**Approach**: Paginate through all results

**Python example**:

```python
import csv

all_papers = client.paginate_all(
    endpoint='/works',
    params={
        'search': 'quantum computing',
        'filter': 'publication_year:2022-2024'
    },
    max_results=10000  # Limit to prevent excessive API calls
)

print(f"Retrieved {len(all_papers)} papers")

# Save to CSV
with open('quantum_papers.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Year', 'Citations', 'DOI', 'OA Status'])

    for paper in all_papers:
        writer.writerow([
            paper['title'],
            paper['publication_year'],
            paper['cited_by_count'],
            paper.get('doi', 'N/A'),
            paper['open_access']['oa_status']
        ])
```

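If a `paginate_all` helper is not available, the same walk can be sketched directly against the API using OpenAlex cursor pagination: the first request passes `cursor=*`, and each response's `meta.next_cursor` feeds the next request until it is absent. A minimal sketch (`page_params` and `fetch_all` are hypothetical helpers, not part of the client above):

```python
OPENALEX_WORKS = "https://api.openalex.org/works"

def page_params(base_params, cursor, per_page=200, mailto=None):
    """Build the query parameters for one cursor page."""
    p = dict(base_params)
    p["per-page"] = per_page
    p["cursor"] = cursor
    if mailto:
        p["mailto"] = mailto
    return p

def fetch_all(base_params, mailto=None, max_results=10000):
    """Follow meta.next_cursor until the result set is exhausted (sketch)."""
    import requests  # deferred so the sketch can be read without the dependency
    results, cursor = [], "*"  # cursor pagination starts at '*'
    while cursor and len(results) < max_results:
        page = requests.get(
            OPENALEX_WORKS,
            params=page_params(base_params, cursor, mailto=mailto)
        ).json()
        results.extend(page["results"])
        cursor = page["meta"].get("next_cursor")  # absent/None on the last page
    return results
```

Cursor pagination is preferable to `page=` offsets for large extracts, since offsets are capped while the cursor can walk an entire result set.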
## Complex Multi-Filter Query

**User query**: "Find recent, highly-cited, open access papers on AI from top institutions"

**Approach**: Combine multiple filters

**Python example**:

```python
# Get IDs for top institutions
top_institutions = ['MIT', 'Stanford', 'Oxford']
inst_ids = []

for inst_name in top_institutions:
    response = client._make_request(
        '/institutions',
        params={'search': inst_name, 'per-page': 1}
    )
    if response['results']:
        inst_id = response['results'][0]['id'].split('/')[-1]
        inst_ids.append(inst_id)

# Combine with pipe for OR
inst_filter = '|'.join(inst_ids)

# Complex query
works = client.search_works(
    search="artificial intelligence",
    filter_params={
        "publication_year": ">2022",
        "cited_by_count": ">50",
        "is_oa": "true",
        "authorships.institutions.id": inst_filter
    },
    sort="cited_by_count:desc",
    per_page=200
)

print(f"Found {works['meta']['count']} papers matching criteria")
for work in works['results'][:10]:
    print(f"{work['title']}")
    print(f"  Citations: {work['cited_by_count']}, Year: {work['publication_year']}")
```