A Linked Data Application for Harmonizing Heterogeneous Biomedical Information

Capuano, N
2022-01-01

Abstract

In the biomedical field, there is an ever-increasing number of large, fragmented, and isolated data sources stored in databases and ontologies that use heterogeneous formats and poorly integrated schemas. Researchers and healthcare professionals find it extremely difficult to master this huge amount of data and extract relevant information. In this work, we propose a linked data approach, based on multilayer networks and Semantic Web standards, capable of integrating and harmonizing several biomedical datasets with different schemas and semi-structured data through a multi-model database providing polyglot persistence. The chosen domain is the analysis and aggregation of available data on neuroendocrine neoplasms (NENs), a relatively rare type of neoplasm. The integrated information comes from twelve public datasets available in heterogeneous schemas and formats, including RDF, CSV, TSV, SQL, OWL, and OBO. The proposed integrated model consists of six interconnected layers representing, respectively, information on the disease, the related phenotypic alterations, the affected genes, the related biological processes, molecular functions, the involved human tissues, and drugs and compounds that show documented interactions with them. The defined schema extends an existing three-layer model covering a subset of the mentioned aspects. A client-server application was also developed to browse and search for information on the integrated model. The main challenges of this work concern the complexity of the biomedical domain, the syntactic and semantic heterogeneity of the datasets, and the organization of the integrated model. Unlike related work, the model is organized with multilayer networks into a manageable, stratified structure; the original datasets are left unchanged, and their data are transformed "on the fly" to respond to user requests.
Use this identifier to cite or link to this document: https://hdl.handle.net/11563/160228