# Common OpenAlex Query Examples

This document provides practical examples for common research queries using OpenAlex.

## Finding Papers by Author

**User query**: "Find papers by Albert Einstein"

**Approach**: Two-step pattern
1. Search for the author to get their ID
2. Filter works by author ID

**Python example**:
```python
from scripts.openalex_client import OpenAlexClient
from scripts.query_helpers import find_author_works

client = OpenAlexClient(email="your-email@example.edu")

works = find_author_works("Albert Einstein", client, limit=100)

for work in works:
    print(f"{work['title']} ({work['publication_year']})")
```

## Finding Papers from an Institution

**User query**: "What papers has MIT published in the last year?"

**Approach**: Two-step pattern with date filter
1. Search for the institution to get its ID
2. Fetch works by institution ID, then keep those from the current year

**Python example**:
```python
import datetime

from scripts.query_helpers import find_institution_works

# Reuses the `client` created in the first example
works = find_institution_works("MIT", client, limit=200)

# Keep only works published in the current calendar year
current_year = datetime.datetime.now().year
recent_works = [w for w in works if w['publication_year'] == current_year]
```

## Highly Cited Papers on a Topic

**User query**: "Find the most cited papers on CRISPR from the last 5 years"

**Approach**: Search + filter + sort

**Python example**:
```python
works = client.search_works(
    search="CRISPR",
    filter_params={
        # The year cutoff is hard-coded; adjust it to cover the last 5 years
        "publication_year": ">2019"
    },
    sort="cited_by_count:desc",
    per_page=100
)

for work in works['results']:
    title = work['title']
    citations = work['cited_by_count']
    year = work['publication_year']
    print(f"{title} ({year}): {citations} citations")
```

## Open Access Papers on a Topic

**User query**: "Find open access papers about climate change"

**Approach**: Search + OA filter

**Python example**:
```python
from scripts.query_helpers import get_open_access_papers

papers = get_open_access_papers(
    search_term="climate change",
    client=client,
    oa_status="any",  # or "gold", "green", "hybrid", "bronze"
    limit=200
)

for paper in papers:
    print(paper['title'])
    print(f"  OA Status: {paper['open_access']['oa_status']}")
    print(f"  URL: {paper['open_access']['oa_url']}")
```

## Publication Trends Analysis

**User query**: "Show me publication trends for machine learning over the years"

**Approach**: Use group_by to aggregate by year

**Python example**:
```python
from scripts.query_helpers import get_publication_trends

trends = get_publication_trends(
    search_term="machine learning",
    client=client
)

# Sort by year
trends_sorted = sorted(trends, key=lambda x: x['key'])

for trend in trends_sorted[-10:]:  # The 10 most recent years in the data
    year = trend['key']
    count = trend['count']
    print(f"{year}: {count} publications")
```

## Analyzing Research Output

**User query**: "Analyze the research output of Stanford University from 2020-2024"

**Approach**: Multiple aggregations for comprehensive analysis

**Python example**:
```python
from scripts.query_helpers import analyze_research_output

analysis = analyze_research_output(
    entity_type='institution',
    entity_name='Stanford University',
    client=client,
    years='2020-2024'
)

print(f"Institution: {analysis['entity_name']}")
print(f"Total works: {analysis['total_works']}")
print(f"Open access: {analysis['open_access_percentage']}%")
print("\nTop topics:")
for topic in analysis['top_topics'][:5]:
    print(f"  - {topic['key_display_name']}: {topic['count']} works")
```

## Finding Papers by DOI (Batch)

**User query**: "Get information for these 10 DOIs: ..."

**Approach**: Batch lookup with pipe separator

**Python example**:
```python
dois = [
    "https://doi.org/10.1371/journal.pone.0266781",
    "https://doi.org/10.1371/journal.pone.0267149",
    "https://doi.org/10.1038/s41586-021-03819-2",
    # ... up to 50 DOIs
]

works = client.batch_lookup(
    entity_type='works',
    ids=dois,
    id_field='doi'
)

for work in works:
    print(f"{work['title']} - {work['publication_year']}")
```

## Random Sample of Papers

**User query**: "Give me 50 random papers from 2023"

**Approach**: Use sample parameter with seed for reproducibility

**Python example**:
```python
works = client.sample_works(
    sample_size=50,
    seed=42,  # For reproducibility
    filter_params={
        "publication_year": "2023",
        # Optional extra restriction that is not part of the user query;
        # remove this line to sample from all 2023 works
        "is_oa": "true"
    }
)

print(f"Got {len(works)} random papers from 2023")
```

## Papers from Multiple Institutions

**User query**: "Find papers with authors from both MIT and Stanford"

**Approach**: Use the + operator for AND within the same attribute

**Python example**:
```python
# First, get institution IDs (the top search hit is assumed to be the right one)
mit_response = client._make_request(
    '/institutions',
    params={'search': 'MIT', 'per-page': 1}
)
mit_id = mit_response['results'][0]['id'].split('/')[-1]

stanford_response = client._make_request(
    '/institutions',
    params={'search': 'Stanford', 'per-page': 1}
)
stanford_id = stanford_response['results'][0]['id'].split('/')[-1]

# Find works with authors from both institutions
works = client.search_works(
    filter_params={
        "authorships.institutions.id": f"{mit_id}+{stanford_id}"
    },
    per_page=100
)

print(f"Found {works['meta']['count']} collaborative papers")
```

## Papers in a Specific Journal

**User query**: "Get all papers from Nature published in 2023"

**Approach**: Two-step pattern: find the journal's source ID, then filter works

**Python example**:
```python
# Step 1: Find journal source ID
# The top search hit is used; check the printed name, since several
# sources have "Nature" in their title
source_response = client._make_request(
    '/sources',
    params={'search': 'Nature', 'per-page': 1}
)
source = source_response['results'][0]
source_id = source['id'].split('/')[-1]

print(f"Found journal: {source['display_name']} (ID: {source_id})")

# Step 2: Get works from that source
# This returns only the first page of up to 200 results;
# meta.count reports the total number of matches
works = client.search_works(
    filter_params={
        "primary_location.source.id": source_id,
        "publication_year": "2023"
    },
    per_page=200
)

print(f"Found {works['meta']['count']} papers from Nature in 2023")
```

## Topic Analysis by Institution

**User query**: "What topics does MIT research most?"

**Approach**: Filter by institution, group by topics

**Python example**:
```python
# Get MIT ID
inst_response = client._make_request(
    '/institutions',
    params={'search': 'MIT', 'per-page': 1}
)
mit_id = inst_response['results'][0]['id'].split('/')[-1]

# Group by topics
topics = client.group_by(
    entity_type='works',
    group_field='topics.id',
    filter_params={
        "authorships.institutions.id": mit_id,
        "publication_year": ">2020"  # Strictly after 2020, i.e. 2021 onward
    }
)

print("Top research topics at MIT (2021 onward):")
for i, topic in enumerate(topics[:10], 1):
    print(f"{i}. {topic['key_display_name']}: {topic['count']} works")
```

## Citation Analysis

**User query**: "Find papers that cite this specific DOI"

**Approach**: Get work by DOI, then use cited_by_api_url

**Python example**:
```python
import requests

# Get the work
doi = "https://doi.org/10.1038/s41586-021-03819-2"
work = client.get_entity('works', doi)

# cited_by_api_url is a complete API URL, so it can be requested directly
cited_by_url = work['cited_by_api_url']

response = requests.get(cited_by_url, params={'mailto': client.email})
response.raise_for_status()
citing_works = response.json()

print(work['title'])
print(f"Total citations: {work['cited_by_count']}")
# Results arrive in the API's default order, not sorted by date
print("\nCiting papers (first 5):")
for citing_work in citing_works['results'][:5]:
    print(f"  - {citing_work['title']} ({citing_work['publication_year']})")
```

## Large-Scale Data Extraction

**User query**: "Get all papers on quantum computing from the last 3 years"

**Approach**: Paginate through all results

**Python example**:
```python
import csv

all_papers = client.paginate_all(
    endpoint='/works',
    params={
        'search': 'quantum computing',
        'filter': 'publication_year:2022-2024'
    },
    max_results=10000  # Limit to prevent excessive API calls
)

print(f"Retrieved {len(all_papers)} papers")

# Save to CSV
with open('quantum_papers.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Year', 'Citations', 'DOI', 'OA Status'])

    for paper in all_papers:
        writer.writerow([
            paper['title'],
            paper['publication_year'],
            paper['cited_by_count'],
            # The doi key can be present with a null value, so .get's default is not enough
            paper.get('doi') or 'N/A',
            paper['open_access']['oa_status']
        ])
```

## Complex Multi-Filter Query

**User query**: "Find recent, highly-cited, open access papers on AI from top institutions"

**Approach**: Combine multiple filters

**Python example**:
```python
# Get IDs for top institutions
top_institutions = ['MIT', 'Stanford', 'Oxford']
inst_ids = []

for inst_name in top_institutions:
    response = client._make_request(
        '/institutions',
        params={'search': inst_name, 'per-page': 1}
    )
    if response['results']:
        inst_id = response['results'][0]['id'].split('/')[-1]
        inst_ids.append(inst_id)

# Combine with pipe for OR
inst_filter = '|'.join(inst_ids)

# Complex query
works = client.search_works(
    search="artificial intelligence",
    filter_params={
        "publication_year": ">2022",
        "cited_by_count": ">50",
        "is_oa": "true",
        "authorships.institutions.id": inst_filter
    },
    sort="cited_by_count:desc",
    per_page=200
)

print(f"Found {works['meta']['count']} papers matching criteria")
for work in works['results'][:10]:
    print(work['title'])
    print(f"  Citations: {work['cited_by_count']}, Year: {work['publication_year']}")
```
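
The `filter_params` dictionary above presumably ends up as a single OpenAlex `filter` query-string value, where comma-separated `attribute:value` pairs are ANDed together and a pipe ORs several values of one attribute. The sketch below shows one way that serialization could look. `build_filter_string` is a hypothetical helper written for illustration, not part of `OpenAlexClient`, and the institution IDs are placeholders rather than real identifiers.

```python
from typing import List, Mapping, Sequence, Union

# Hypothetical type alias: values may be scalars or a list of alternatives
FilterValue = Union[str, int, bool, Sequence[str]]


def build_filter_string(filter_params: Mapping[str, FilterValue]) -> str:
    """Serialize a filter dict into OpenAlex `filter` syntax (illustrative only)."""
    parts: List[str] = []
    for key, value in filter_params.items():
        if isinstance(value, bool):
            # The API expects lowercase true/false
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Several values for one attribute are ORed with a pipe
            value = "|".join(str(v) for v in value)
        parts.append(f"{key}:{value}")
    # Different attributes are ANDed with a comma
    return ",".join(parts)


filter_string = build_filter_string({
    "publication_year": ">2022",
    "cited_by_count": ">50",
    "is_oa": True,
    # Placeholder IDs, not real institutions
    "authorships.institutions.id": ["I0000000001", "I0000000002"],
})
print(filter_string)
# publication_year:>2022,cited_by_count:>50,is_oa:true,authorships.institutions.id:I0000000001|I0000000002
```

Seeing the final string can help when debugging a query that returns unexpected counts, since the same value can be pasted into a raw `/works?filter=...` request.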