Semantic Annotation of Data Processing Pipelines in Scientific Publications

Sepideh Mesbah, Kyriakos Fragkeskos, Christoph Lofi, Alessandro Bozzon, Geert-Jan Houben

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

11 Citations (Scopus)


Data processing pipelines are a core object of interest for data scientists and practitioners operating in a variety of data-related application domains. To effectively capitalise on the experience gained in the creation and adoption of such pipelines, the need arises for mechanisms able to capture knowledge about datasets of interest, data processing methods designed to achieve a given goal, and the performance achieved when applying such methods to the considered datasets. However, due to its distributed and often unstructured nature, this knowledge is not easily accessible. In this paper, we use (scientific) publications as a source of knowledge about data processing pipelines. We describe a method designed to classify sentences according to the nature of the contained information (i.e. scientific objective, dataset, method, software, result), and to extract relevant named entities. The extracted information is then semantically annotated and published as linked data in open knowledge repositories according to the DMS ontology for data processing metadata. To demonstrate the effectiveness and performance of our approach, we present the results of a quantitative and qualitative analysis performed on four different conference series.
Original language: English
Title of host publication: The Semantic Web
Subtitle of host publication: 14th International Conference, ESWC 2017, Proceedings Part 1
Editors: Eva Blomqvist, Diana Maynard, Aldo Gangemi, Rinke Hoekstra, Pascal Hitzler, Olaf Hartig
Place of Publication: Cham
Number of pages: 16
ISBN (Electronic): 978-3-319-58068-5
ISBN (Print): 978-3-319-58067-8
Publication status: Published - 16 May 2017
Event: Extended Semantic Web Conference - Portorož, Slovenia
Duration: 28 May 2017 to 1 Jun 2017
Conference number: 14

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743


Conference: Extended Semantic Web Conference
Abbreviated title: ESWC 2017