Common OpenAlex Query Examples
This document provides practical examples for common research queries using OpenAlex.
Finding Papers by Author
User query: "Find papers by Albert Einstein"
Approach: Two-step pattern
- Search for author to get ID
- Filter works by author ID
Python example:
from scripts.openalex_client import OpenAlexClient
from scripts.query_helpers import find_author_works
client = OpenAlexClient(email="your-email@example.edu")
works = find_author_works("Albert Einstein", client, limit=100)
for work in works:
    print(f"{work['title']} ({work['publication_year']})")
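The two-step pattern above can also be expressed directly against the REST API. The sketch below only builds the two request URLs and makes no network calls. The helper names `author_search_url` and `author_works_url` are hypothetical (they are not part of `scripts.query_helpers`), and any author ID passed in is assumed to come from the first request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def author_search_url(name: str, email: str) -> str:
    """Step 1: URL that searches authors by name and returns the top hit."""
    query = urlencode({"search": name, "per-page": 1, "mailto": email})
    return f"{BASE_URL}/authors?{query}"


def author_works_url(author_id: str, email: str, per_page: int = 100) -> str:
    """Step 2: URL that lists works filtered by an OpenAlex author ID."""
    # Accept either a full ID URL or the short form (e.g. "A...")
    short_id = author_id.rstrip("/").split("/")[-1]
    query = urlencode({
        "filter": f"authorships.author.id:{short_id}",
        "per-page": per_page,
        "mailto": email,
    })
    return f"{BASE_URL}/works?{query}"
```

The `id` field of the first search result would be passed to `author_works_url`; how to pick among several authors with the same name is left to the caller.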
Finding Papers from an Institution
User query: "What papers has MIT published in the last year?"
Approach: Two-step pattern with date filter
- Search for institution to get ID
- Filter works by institution ID and year
Python example:
from scripts.query_helpers import find_institution_works
works = find_institution_works("MIT", client, limit=200)
# Keep only this year's papers. This filter runs client-side, so it only
# sees the works fetched above (at most 200).
import datetime
current_year = datetime.datetime.now().year
recent_works = [w for w in works if w['publication_year'] == current_year]
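If the year restriction should be applied by the API rather than after fetching, the institution ID and year can be combined into one filter string. This is a minimal sketch: `build_filter` and `institution_year_filter` are hypothetical helpers, and the institution ID is assumed to come from the search step.

```python
import datetime
from typing import Optional


def build_filter(filter_params: dict) -> str:
    """Join {field: value} pairs into a comma-separated (AND) filter string."""
    return ",".join(f"{field}:{value}" for field, value in filter_params.items())


def institution_year_filter(institution_id: str, year: Optional[int] = None) -> str:
    """Filter string for one institution's works in one publication year."""
    if year is None:
        year = datetime.date.today().year  # default to the current year
    return build_filter({
        "authorships.institutions.id": institution_id,
        "publication_year": year,
    })
```

The resulting string would be sent as the `filter` query parameter of a `/works` request.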
Highly Cited Papers on a Topic
User query: "Find the most cited papers on CRISPR from the last 5 years"
Approach: Search + filter + sort
Python example:
works = client.search_works(
    search="CRISPR",
    filter_params={
        "publication_year": ">2019"
    },
    sort="cited_by_count:desc",
    per_page=100
)
for work in works['results']:
    title = work['title']
    citations = work['cited_by_count']
    year = work['publication_year']
    print(f"{title} ({year}): {citations} citations")
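The `">2019"` cutoff above is hardcoded, so "the last 5 years" drifts as time passes. One way to derive it from the current date is sketched below. `last_n_years_filter` is a hypothetical helper, and it assumes "last n years" means the current calendar year plus the n - 1 before it.

```python
import datetime
from typing import Optional


def last_n_years_filter(n: int, today: Optional[datetime.date] = None) -> str:
    """Return a publication_year filter value covering the last n calendar years."""
    if n < 1:
        raise ValueError("n must be at least 1")
    today = today or datetime.date.today()
    # ">Y" matches years strictly greater than Y
    return f">{today.year - n}"
```

The returned value could replace the literal in `filter_params={"publication_year": ...}`.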
Open Access Papers on a Topic
User query: "Find open access papers about climate change"
Approach: Search + OA filter
Python example:
from scripts.query_helpers import get_open_access_papers
papers = get_open_access_papers(
    search_term="climate change",
    client=client,
    oa_status="any",  # or "gold", "green", "hybrid", "bronze"
    limit=200
)
for paper in papers:
    print(f"{paper['title']}")
    print(f"  OA Status: {paper['open_access']['oa_status']}")
    print(f"  URL: {paper['open_access']['oa_url']}")
Publication Trends Analysis
User query: "Show me publication trends for machine learning over the years"
Approach: Use group_by to aggregate by year
Python example:
from scripts.query_helpers import get_publication_trends
trends = get_publication_trends(
    search_term="machine learning",
    client=client
)
# Sort by year
trends_sorted = sorted(trends, key=lambda x: x['key'])
for trend in trends_sorted[-10:]:  # Last 10 years
    year = trend['key']
    count = trend['count']
    print(f"{year}: {count} publications")
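Raw counts can be easier to read alongside the year-over-year change. The sketch below post-processes a list shaped like the `trends` result above (dicts with `key` and `count`). `year_over_year` is a hypothetical helper, and it assumes the keys are years, skipping any key that is not numeric.

```python
def year_over_year(trends):
    """Return (year, count, percent_change) tuples sorted by year.

    percent_change is None for the first year and whenever the previous
    count is zero.
    """
    numeric = [t for t in trends if str(t["key"]).isdigit()]
    ordered = sorted(numeric, key=lambda t: int(t["key"]))
    rows = []
    previous = None
    for t in ordered:
        count = t["count"]
        change = None
        if previous:  # skips the first row and avoids division by zero
            change = round(100 * (count - previous) / previous, 1)
        rows.append((int(t["key"]), count, change))
        previous = count
    return rows
```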
Analyzing Research Output
User query: "Analyze the research output of Stanford University from 2020-2024"
Approach: Multiple aggregations for comprehensive analysis
Python example:
from scripts.query_helpers import analyze_research_output
analysis = analyze_research_output(
    entity_type='institution',
    entity_name='Stanford University',
    client=client,
    years='2020-2024'
)
print(f"Institution: {analysis['entity_name']}")
print(f"Total works: {analysis['total_works']}")
print(f"Open access: {analysis['open_access_percentage']}%")
print("\nTop topics:")
for topic in analysis['top_topics'][:5]:
    print(f"  - {topic['key_display_name']}: {topic['count']} works")
Finding Papers by DOI (Batch)
User query: "Get information for these 10 DOIs: ..."
Approach: Batch lookup with pipe separator
Python example:
dois = [
    "https://doi.org/10.1371/journal.pone.0266781",
    "https://doi.org/10.1371/journal.pone.0267149",
    "https://doi.org/10.1038/s41586-021-03819-2",
    # ... up to 50 DOIs
]
works = client.batch_lookup(
    entity_type='works',
    ids=dois,
    id_field='doi'
)
for work in works:
    print(f"{work['title']} - {work['publication_year']}")
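For lists longer than the 50-DOI limit noted above, the DOIs can be split into batches and each batch joined with the pipe separator. This is a sketch of that idea only: `doi_filter_batches` is a hypothetical helper, and `client.batch_lookup` may already handle batching internally.

```python
def doi_filter_batches(dois, batch_size=50):
    """Yield 'doi:a|b|c' filter strings with at most batch_size DOIs each."""
    for start in range(0, len(dois), batch_size):
        batch = dois[start:start + batch_size]
        yield "doi:" + "|".join(batch)
```

Each yielded string would be used as the `filter` parameter of one `/works` request.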
Random Sample of Papers
User query: "Give me 50 random papers from 2023"
Approach: Use sample parameter with seed for reproducibility
Python example:
works = client.sample_works(
    sample_size=50,
    seed=42,  # For reproducibility
    filter_params={
        "publication_year": "2023"
    }
)
print(f"Got {len(works)} random papers from 2023")
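At the API level, sampling is controlled by the `sample` and `seed` query parameters. The sketch below shows one way the request parameters might be assembled. `sample_params` is a hypothetical helper, and how `client.sample_works` builds its request internally is not shown in this document.

```python
def sample_params(sample_size, seed=None, filter_params=None):
    """Build query parameters for a random sample of works."""
    params = {
        "sample": sample_size,
        # One page holds at most 200 results; larger samples need paging
        "per-page": min(sample_size, 200),
    }
    if seed is not None:
        params["seed"] = seed  # same seed, same sample
    if filter_params:
        params["filter"] = ",".join(
            f"{field}:{value}" for field, value in filter_params.items()
        )
    return params
```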
Papers from Multiple Institutions
User query: "Find papers with authors from both MIT and Stanford"
Approach: Use + operator for AND within same attribute
Python example:
# First, get institution IDs
mit_response = client._make_request(
    '/institutions',
    params={'search': 'MIT', 'per-page': 1}
)
mit_id = mit_response['results'][0]['id'].split('/')[-1]
stanford_response = client._make_request(
    '/institutions',
    params={'search': 'Stanford', 'per-page': 1}
)
stanford_id = stanford_response['results'][0]['id'].split('/')[-1]
# Find works with authors from both institutions
works = client.search_works(
    filter_params={
        "authorships.institutions.id": f"{mit_id}+{stanford_id}"
    },
    per_page=100
)
print(f"Found {works['meta']['count']} collaborative papers")
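The only difference between "both institutions" and "either institution" is the separator inside the filter value. The sketch below makes that contrast explicit; `all_of` and `any_of` are hypothetical helper names and the IDs are placeholders.

```python
def all_of(ids):
    """AND: a work must match every ID (values joined with '+')."""
    return "+".join(ids)


def any_of(ids):
    """OR: a work may match any one of the IDs (values joined with '|')."""
    return "|".join(ids)


# Placeholder IDs, not real institutions
ids = ["I0000000001", "I0000000002"]
both_filter = {"authorships.institutions.id": all_of(ids)}
either_filter = {"authorships.institutions.id": any_of(ids)}
```

Either dictionary could be passed as `filter_params` in the same way as the example above.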
Papers in a Specific Journal
User query: "Get all papers from Nature published in 2023"
Approach: Two-step - find journal ID, then filter works
Python example:
# Step 1: Find journal source ID
source_response = client._make_request(
    '/sources',
    params={'search': 'Nature', 'per-page': 1}
)
source = source_response['results'][0]
source_id = source['id'].split('/')[-1]
print(f"Found journal: {source['display_name']} (ID: {source_id})")
# Step 2: Get works from that source
works = client.search_works(
    filter_params={
        "primary_location.source.id": source_id,
        "publication_year": "2023"
    },
    per_page=200
)
print(f"Found {works['meta']['count']} papers from Nature in 2023")
Topic Analysis by Institution
User query: "What topics does MIT research most?"
Approach: Filter by institution, group by topics
Python example:
# Get MIT ID
inst_response = client._make_request(
    '/institutions',
    params={'search': 'MIT', 'per-page': 1}
)
mit_id = inst_response['results'][0]['id'].split('/')[-1]
# Group by topics
topics = client.group_by(
    entity_type='works',
    group_field='topics.id',
    filter_params={
        "authorships.institutions.id": mit_id,
        "publication_year": ">2019"  # 2020 onward; ">2020" would start at 2021
    }
)
print("Top research topics at MIT (2020+):")
for i, topic in enumerate(topics[:10], 1):
    print(f"{i}. {topic['key_display_name']}: {topic['count']} works")
Citation Analysis
User query: "Find papers that cite this specific DOI"
Approach: Get work by DOI, then use cited_by_api_url
Python example:
import requests
# Get the work
doi = "https://doi.org/10.1038/s41586-021-03819-2"
work = client.get_entity('works', doi)
# cited_by_api_url is a ready-made API query for the works that cite this one
cited_by_url = work['cited_by_api_url']
# Request it directly, newest citing works first
response = requests.get(
    cited_by_url,
    params={'mailto': client.email, 'sort': 'publication_date:desc'}
)
response.raise_for_status()
citing_works = response.json()
print(f"{work['title']}")
print(f"Total citations: {work['cited_by_count']}")
print("\nRecent citing papers:")
for citing_work in citing_works['results'][:5]:
    print(f"  - {citing_work['title']} ({citing_work['publication_year']})")
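The same lookup can be written as an ordinary filter, since `cited_by_api_url` is itself a `/works` query using the `cites` filter. The sketch below builds the equivalent parameters; `short_id` and `citing_works_params` are hypothetical helpers, and the work ID in any call is assumed to come from the fetched work's `id` field.

```python
def short_id(openalex_id: str) -> str:
    """Reduce a full OpenAlex ID URL to its short form (e.g. 'W...')."""
    return openalex_id.rstrip("/").split("/")[-1]


def citing_works_params(work: dict, email: str, per_page: int = 25) -> dict:
    """Query parameters for listing the works that cite the given work."""
    return {
        "filter": f"cites:{short_id(work['id'])}",
        "per-page": per_page,
        "mailto": email,
    }
```

Because these are plain `/works` parameters, other filters (for example a `publication_year` range) can be appended to the `filter` value with a comma.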
Large-Scale Data Extraction
User query: "Get all papers on quantum computing from the last 3 years"
Approach: Paginate through all results
Python example:
import csv
all_papers = client.paginate_all(
    endpoint='/works',
    params={
        'search': 'quantum computing',
        'filter': 'publication_year:2022-2024'
    },
    max_results=10000  # Limit to prevent excessive API calls
)
print(f"Retrieved {len(all_papers)} papers")
# Save to CSV
with open('quantum_papers.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Year', 'Citations', 'DOI', 'OA Status'])
    for paper in all_papers:
        writer.writerow([
            paper['title'],
            paper['publication_year'],
            paper['cited_by_count'],
            paper.get('doi') or 'N/A',  # 'doi' can be present but null
            paper['open_access']['oa_status']
        ])
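For large result sets, OpenAlex supports cursor paging: the first request sends `cursor=*`, and each response's `meta.next_cursor` is sent with the next request until it comes back empty. The sketch below shows that loop with the HTTP call injected as a function, so it is not tied to any particular client. `iterate_with_cursor` is a hypothetical helper, and whether `client.paginate_all` works this way internally is an assumption.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(
    fetch_page: Callable[[Dict], Dict],
    params: Dict,
    max_results: int = 10000,
) -> Iterator[Dict]:
    """Yield results page by page, following meta.next_cursor."""
    cursor = "*"  # "*" requests the first page
    yielded = 0
    while cursor and yielded < max_results:
        page = fetch_page({**params, "cursor": cursor})
        results = page.get("results", [])
        if not results:
            break
        for item in results:
            if yielded >= max_results:
                return
            yield item
            yielded += 1
        # A missing or null next_cursor ends the loop
        cursor = page.get("meta", {}).get("next_cursor")
```

Assuming `client._make_request` returns the parsed JSON response, as the other examples suggest, `fetch_page` could be `lambda p: client._make_request('/works', params=p)`.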
Complex Multi-Filter Query
User query: "Find recent, highly-cited, open access papers on AI from top institutions"
Approach: Combine multiple filters
Python example:
# Get IDs for top institutions
top_institutions = ['MIT', 'Stanford', 'Oxford']
inst_ids = []
for inst_name in top_institutions:
    response = client._make_request(
        '/institutions',
        params={'search': inst_name, 'per-page': 1}
    )
    if response['results']:
        inst_id = response['results'][0]['id'].split('/')[-1]
        inst_ids.append(inst_id)
# Combine with pipe for OR
inst_filter = '|'.join(inst_ids)
# Complex query
works = client.search_works(
    search="artificial intelligence",
    filter_params={
        "publication_year": ">2022",
        "cited_by_count": ">50",
        "is_oa": "true",
        "authorships.institutions.id": inst_filter
    },
    sort="cited_by_count:desc",
    per_page=200
)
print(f"Found {works['meta']['count']} papers matching criteria")
for work in works['results'][:10]:
    print(f"{work['title']}")
    print(f"  Citations: {work['cited_by_count']}, Year: {work['publication_year']}")