Describing Data Processing Pipelines in Scientific Publications for Big Data Injection

Research output: Chapter in Book/Conference proceedings/Edited volume; Conference contribution; Scientific; peer-review

2 Citations (Scopus)


The rise of Big Data analytics has been a disruptive game changer for many application domains, allowing insights and knowledge extracted from external big data sets to be integrated into domain-specific applications and systems. The effective "injection" of external Big Data demands an understanding of the properties of available data sets, and expertise in the available and most suitable methods for data collection, enrichment and analysis. A prominent knowledge source is scientific literature, where data processing pipelines are described, discussed, and evaluated. Such knowledge is, however, not readily accessible, due to its distributed and unstructured nature. In this paper, we propose a novel ontology aimed at modeling properties of data processing pipelines, and their related artifacts, as described in scientific publications. The ontology is the result of a requirements analysis that involved experts from both academia and industry. We showcase the effectiveness of our ontology by manually applying it to a collection of publications describing data processing methods.
Original language: English
Title of host publication: SWM'17 Proceedings of the 1st Workshop on Scholarly Web Mining
Place of Publication: Cambridge, UK
Publisher: Association for Computing Machinery (ACM)
Number of pages: 8
ISBN (Electronic): 978-1-4503-5240-6
Publication status: Published - 2017
Event: Workshop on Scholarly Web Mining - Cambridge Guild Hall, Cambridge, United Kingdom
Duration: 10 Feb 2017 to 10 Feb 2017


Workshop: Workshop on Scholarly Web Mining
Abbreviated title: SWM
Country/Territory: United Kingdom
Other: in conjunction with WSDM2017: The International Conference on Web Search and Data Mining located in Cambridge


  • Ontology
  • Document structure
  • Digital Libraries and archives

