Commit 13a19b7

Author: Sandra Mierz

Openaire (#18)
* query openaire for person-works
* query openaire for work-projects

1 parent caa3347 commit 13a19b7

5 files changed: +427 -3 lines changed
README.md

Lines changed: 6 additions & 2 deletions
@@ -3,7 +3,7 @@
 [![DOI](https://zenodo.org/badge/447263093.svg)](https://zenodo.org/badge/latestdoi/447263093)
 [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Project-TAPIR/pidgraph-notebooks/main)
 
-A collection of Jupyter notebooks with examples of querying different PID providers like [ORCID](https://orcid.org/), [ROR](https://ror.readme.io/), [Crossref](https://www.crossref.org/) and PID graphs like the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) and [OpenAlex](https://openalex.org/about) for connected objects.
+A collection of Jupyter notebooks with examples of querying different PID providers like [ORCID](https://orcid.org/), [ROR](https://ror.readme.io/), [Crossref](https://www.crossref.org/) and PID graphs like the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/), [OpenAlex](https://openalex.org/about) and [OpenAIRE](https://www.openaire.eu/) for connected objects.
 
 Currently included connections:
 * organization-organization
@@ -17,7 +17,11 @@ Currently included connections:
 * person-works
   * input: ORCID
   * output: list of works authored/created by the person, each identified by their DOI
-  * data sources: Crossref, FREYA PID Graph, OpenAlex, ORCID
+  * data sources: Crossref, FREYA PID Graph, OpenAlex, ORCID, OpenAIRE
+* work-projects
+  * input: DOI
+  * output: list of projects the work was produced in, each identified by their OpenAIRE project ID
+  * data sources: OpenAIRE
 
 
 Please navigate into the respective folder to see the list of available notebooks.


person-works/README.md

Lines changed: 2 additions & 1 deletion
@@ -6,4 +6,5 @@ Currently available PID Graphs:
 * [Crossref](https://www.crossref.org/)
 * [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/)
 * [OpenAlex](https://openalex.org/about)
-* [ORCID](https://orcid.org/)
+* [ORCID](https://orcid.org/)
+* [OpenAIRE](https://www.openaire.eu/)

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Query OpenAIRE for publications authored by a person\n",
+    "This notebook queries the [OpenAIRE HTTP API](https://graph.openaire.eu/develop/api.html) via its `/publications` endpoint for publications authored by a person. It takes an ORCID iD as input, which is used to filter for publications where one of the creators' `orcid` fields matches the given ORCID iD. From the resulting list of publications we output all DOIs.\n",
+    "\n",
+    "*Note:\n",
+    "The API has several different endpoints for research outputs: they are divided into publications, research data, software metadata and other research products, so to get a full picture of a person's research output, you would have to query all of these endpoints and union their results.*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Prerequisites:\n",
+    "import requests  # dependency for making HTTP calls\n",
+    "from benedict import benedict  # dependency for dealing with json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "The input for this notebook is an ORCID iD, e.g. '`0000-0003-2499-7741`'."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# input parameter\n",
+    "example_orcid_id=\"0000-0003-2499-7741\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We use it to query the OpenAIRE HTTP API for publications whose metadata specify the ORCID iD in one of the creators' `orcid` fields. Since the API uses pagination, we need to loop through all pages to get the complete result set."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# OpenAIRE endpoint to query for publications\n",
+    "OPENAIRE_API_PUBLICATIONS = \"https://api.openaire.eu/search/publications\"\n",
+    "\n",
+    "# query OpenAIRE for all publications that are connected to orcid\n",
+    "def query_openaire_for_person2publications(orcid_id):\n",
+    "    page = 1\n",
+    "    max_page = 1\n",
+    "\n",
+    "    while page <= max_page:\n",
+    "        params = {'orcid': orcid_id, 'page': page, 'format': \"json\"}\n",
+    "        response = requests.get(url=OPENAIRE_API_PUBLICATIONS,\n",
+    "                                params=params)\n",
+    "        response.raise_for_status()\n",
+    "        result=response.json()\n",
+    "\n",
+    "        # calculate max page number in first loop\n",
+    "        if max_page == 1:\n",
+    "            max_page = determine_max_page(result)\n",
+    "        page = page + 1\n",
+    "        yield result\n",
+    "\n",
+    "# calculate max number of result pages\n",
+    "def determine_max_page(response_data):\n",
+    "    response_dict = benedict.from_json(response_data)\n",
+    "    items_total = response_dict.get('response.header.total.$')\n",
+    "    items_per_page = response_dict.get('response.header.size.$')\n",
+    "    max_page_ceil = items_total // items_per_page + bool(items_total % items_per_page)\n",
+    "    return max_page_ceil\n",
+    "\n",
+    "\n",
+    "# ---- example execution\n",
+    "list_of_pages=query_openaire_for_person2publications(example_orcid_id)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "From the resulting list of publications we extract and print out each title and DOI. \n",
+    "\n",
+    "*Note: publications that do not have a DOI assigned will not be printed.*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Number of publications found: 6\n",
+      "\n",
+      "10.15488/11463, Roadmap to FAIR Research Information in Open Infrastructures\n",
+      "10.1515/bd.2006.40.4.466, Informationsvermittlung: Personalisiertes Lernen in der Bibliothek: das Düsseldorfer Online-Tutorial (DOT) Informationskompetenz\n",
+      "10.1080/00048623.2006.10755322, Teaching Information Literacy with the Lerninformationssystem\n",
+      "10.3389/frma.2021.694307, Enhancing Knowledge Graph Extraction and Validation From Scholarly Publications Using Bibliographic Metadata\n",
+      "10.3897/rio.7.e66264, OPTIMETA – Strengthening the Open Access publishing system through open citations and spatiotemporal metadata\n",
+      "10.1016/j.procs.2019.01.074, The Research Core Dataset (KDSF) in the Linked Data context\n"
+     ]
+    }
+   ],
+   "source": [
+    "# from the result pages, extract the data about each publication\n",
+    "def extract_publications_from_page(page):\n",
+    "    return [pub for pub in benedict.from_json(page).get('response.results.result') or []]\n",
+    "\n",
+    "# extract DOI from publication\n",
+    "def extract_doi(pub):\n",
+    "    oaf_result=benedict.from_json(pub).get('metadata.oaf:entity.oaf:result')\n",
+    "\n",
+    "    # unfortunately the json data is inconsistently modeled:\n",
+    "    # if there is one pid/title for a publication, it is a json object\n",
+    "    # if there are multiple pids/titles for a publication, they form a json list\n",
+    "    pids=oaf_result.get('pid') or []\n",
+    "    is_doi = lambda pid: pid.get('@classid')==\"doi\"\n",
+    "    if isinstance(pids, list):\n",
+    "        dois=[pid['$'] for pid in pids if is_doi(pid)]\n",
+    "    else:\n",
+    "        dois= [pids['$']] if is_doi(pids) else []\n",
+    "    doi=dois[0] if dois else None  # pick the first one\n",
+    "    \n",
+    "    titles=oaf_result.get('title') or []\n",
+    "    is_main_title = lambda title: title.get('@classid')==\"main title\"\n",
+    "    if isinstance(titles, list):\n",
+    "        main_titles=[title['$'] for title in titles if is_main_title(title)]\n",
+    "    else:\n",
+    "        main_titles=[titles['$']] if is_main_title(titles) else []\n",
+    "    title=main_titles[0] if main_titles else None  # pick the first one\n",
+    "\n",
+    "    return doi, title\n",
+    "\n",
+    "\n",
+    "#--- example execution\n",
+    "for page in list_of_pages or []:\n",
+    "    publications=extract_publications_from_page(page)\n",
+    "    print(f\"Number of publications found: {len(publications)}\\n\")\n",
+    "    for pub in publications:\n",
+    "        doi,title = extract_doi(pub)\n",
+    "        if doi:\n",
+    "            print(f\"{doi}, {title}\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
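
The notebook in this diff relies on two small techniques: computing the number of result pages by ceiling division, and coping with fields that arrive either as a single JSON object or as a list. The following is a minimal offline sketch of both, not the code from the commit; the helper names `page_count`, `as_list` and `first_doi` are hypothetical, and the sample records are hand-written to mimic the two shapes the notebook comments describe.

```python
def page_count(items_total, items_per_page):
    """Ceiling division: number of pages needed to hold all items."""
    return items_total // items_per_page + bool(items_total % items_per_page)


def as_list(value):
    """Wrap a single JSON object in a list; pass lists through; map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def first_doi(pids):
    """Return the first value whose '@classid' is 'doi', or None if there is none."""
    dois = [pid['$'] for pid in as_list(pids) if pid.get('@classid') == 'doi']
    return dois[0] if dois else None


# Hypothetical sample data (assumption): one pid as an object, several as a list.
single_pid = {'@classid': 'doi', '$': '10.15488/11463'}
many_pids = [{'@classid': 'pmid', '$': '12345'},
             {'@classid': 'doi', '$': '10.3389/frma.2021.694307'}]

print(page_count(6, 10))      # 6 items at 10 per page fit on one page
print(first_doi(single_pid))
print(first_doi(many_pids))
print(first_doi(None))
```

Normalizing to a list up front removes the need for separate `isinstance` branches for pids and titles, at the cost of one extra helper.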

work-projects/README.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+## work-projects
+
+A Jupyter notebook showing an example of using a persistent identifier for a publication (DOI)
+as input for retrieving the project a work was produced in (identified by its OpenAIRE project ID).
+
+* [OpenAIRE](https://www.openaire.eu/)
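
The work-projects notebook itself is not shown on this page, so the following is only a rough sketch of how a DOI-to-projects lookup might be structured. It makes several assumptions that the commit does not confirm: that the publications endpoint accepts a `doi` query parameter, and that project relations sit under `rels.rel` with a `to` object whose `@type` is `project` and whose `$` holds the OpenAIRE project ID. The function names and the sample record are hypothetical placeholders, and no request is sent.

```python
# Endpoint taken from the person-works notebook in this commit.
OPENAIRE_API_PUBLICATIONS = "https://api.openaire.eu/search/publications"


def build_params(doi):
    """Query parameters for a DOI lookup ('doi' as a parameter is an assumption)."""
    return {'doi': doi, 'format': 'json'}


def as_list(value):
    """Wrap a single JSON object in a list; pass lists through; map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def extract_project_ids(oaf_result):
    """Collect project IDs from the assumed 'rels.rel' layout of one result."""
    rels = as_list((oaf_result.get('rels') or {}).get('rel'))
    return [rel['to']['$'] for rel in rels
            if (rel.get('to') or {}).get('@type') == 'project']


# Hypothetical record with placeholder values, not real OpenAIRE data.
sample_result = {
    'rels': {
        'rel': [
            {'to': {'@type': 'project', '$': 'placeholder-project-id'}},
            {'to': {'@type': 'organization', '$': 'placeholder-org-id'}},
        ]
    }
}

print(build_params('10.1234/example'))
print(extract_project_ids(sample_result))
```

In a real run, the dictionary from `build_params` would be passed as `params` to `requests.get`, as the person-works notebook does, and the actual response layout should be checked against the API documentation before relying on these key names.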
