Describing Data Processing Pipelines in Scientific Publications for Big Data Injection

Sepideh Mesbah; Alessandro Bozzon; Christoph Lofi; Geert-Jan Houben

doi:10.1145/3057148.3057149

Web Information Systems

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

2 Citations (Scopus)

Abstract

The rise of Big Data analytics has been a disruptive game changer for many application domains, allowing the integration into domain-specific applications and systems of insights and knowledge extracted from external big data sets. The effective "injection" of external Big Data demands an understanding of the properties of available data sets, and expertise on the available and most suitable methods for data collection, enrichment and analysis. A prominent knowledge source is scientific literature, where data processing pipelines are described, discussed, and evaluated. Such knowledge is however not readily accessible, due to its distributed and unstructured nature. In this paper, we propose a novel ontology aimed at modeling properties of data processing pipelines, and their related artifacts, as described in scientific publications. The ontology is the result of a requirement analysis that involved experts from both academia and industry. We showcase the effectiveness of our ontology by manually applying it to a collection of publications describing data processing methods.

Original language	English
Title of host publication	SWM'17 Proceedings of the 1st Workshop on Scholarly Web Mining
Place of Publication	Cambridge, UK
Publisher	Association for Computing Machinery (ACM)
Pages	1-8
Number of pages	8
ISBN (Electronic)	978-1-4503-5240-6
DOIs	https://doi.org/10.1145/3057148.3057149
Publication status	Published - 2017
Event	Workshop on Scholarly Web Mining - Cambridge Guild Hall, Cambridge, United Kingdom. Duration: 10 Feb 2017 → 10 Feb 2017. https://ornlcda.github.io/SWM2017/

Workshop

Workshop	Workshop on Scholarly Web Mining
Abbreviated title	SWM
Country/Territory	United Kingdom
City	Cambridge
Period	10/02/17 → 10/02/17
Other	in conjunction with WSDM2017: The International Conference on Web Search and Data Mining located in Cambridge
Internet address	https://ornlcda.github.io/SWM2017/

Keywords

Ontology
Document structure
Digital Libraries and archives

Access to Document

10.1145/3057148.3057149

Cite this

@inproceedings{71c4c6f4a5b54b28a4b5fe59b8adb674,

title = "Describing Data Processing Pipelines in Scientific Publications for Big Data Injection",

abstract = "The rise of Big Data analytics has been a disruptive game changer for many application domains, allowing the integration into domain-specific applications and systems of insights and knowledge extracted from external big data sets. The effective ``injection'' of external Big Data demands an understanding of the properties of available data sets, and expertise on the available and most suitable methods for data collection, enrichment and analysis. A prominent knowledge source is scientific literature, where data processing pipelines are described, discussed, and evaluated. Such knowledge is however not readily accessible, due to its distributed and unstructured nature. In this paper, we propose a novel ontology aimed at modeling properties of data processing pipelines, and their related artifacts, as described in scientific publications. The ontology is the result of a requirement analysis that involved experts from both academia and industry. We showcase the effectiveness of our ontology by manually applying it to a collection of publications describing data processing methods.",

keywords = "Ontology, Document structure, Digital Libraries and archives",

author = "Sepideh Mesbah and Alessandro Bozzon and Christoph Lofi and Geert-Jan Houben",

year = "2017",

doi = "10.1145/3057148.3057149",

language = "English",

pages = "1--8",

booktitle = "SWM'17 Proceedings of the 1st Workshop on Scholarly Web Mining",

publisher = "Association for Computing Machinery (ACM)",

address = "United States",

note = "Workshop on Scholarly Web Mining, SWM ; Conference date: 10-02-2017 Through 10-02-2017",

url = "https://ornlcda.github.io/SWM2017/",

}

Mesbah, S, Bozzon, A, Lofi, C & Houben, G-J 2017, Describing Data Processing Pipelines in Scientific Publications for Big Data Injection. in SWM'17 Proceedings of the 1st Workshop on Scholarly Web Mining. Association for Computing Machinery (ACM), Cambridge, UK, pp. 1-8, Workshop on Scholarly Web Mining, Cambridge, United Kingdom, 10/02/17. https://doi.org/10.1145/3057148.3057149

Describing Data Processing Pipelines in Scientific Publications for Big Data Injection. / Mesbah, Sepideh; Bozzon, Alessandro; Lofi, Christoph et al.
SWM'17 Proceedings of the 1st Workshop on Scholarly Web Mining. Cambridge, UK: Association for Computing Machinery (ACM), 2017. p. 1-8.

TY - GEN

T1 - Describing Data Processing Pipelines in Scientific Publications for Big Data Injection

AU - Mesbah, Sepideh

AU - Bozzon, Alessandro

AU - Lofi, Christoph

AU - Houben, Geert-Jan

PY - 2017

Y1 - 2017

N2 - The rise of Big Data analytics has been a disruptive game changer for many application domains, allowing the integration into domain-specific applications and systems of insights and knowledge extracted from external big data sets. The effective ``injection'' of external Big Data demands an understanding of the properties of available data sets, and expertise on the available and most suitable methods for data collection, enrichment and analysis. A prominent knowledge source is scientific literature, where data processing pipelines are described, discussed, and evaluated. Such knowledge is however not readily accessible, due to its distributed and unstructured nature. In this paper, we propose a novel ontology aimed at modeling properties of data processing pipelines, and their related artifacts, as described in scientific publications. The ontology is the result of a requirement analysis that involved experts from both academia and industry. We showcase the effectiveness of our ontology by manually applying it to a collection of publications describing data processing methods.

AB - The rise of Big Data analytics has been a disruptive game changer for many application domains, allowing the integration into domain-specific applications and systems of insights and knowledge extracted from external big data sets. The effective ``injection'' of external Big Data demands an understanding of the properties of available data sets, and expertise on the available and most suitable methods for data collection, enrichment and analysis. A prominent knowledge source is scientific literature, where data processing pipelines are described, discussed, and evaluated. Such knowledge is however not readily accessible, due to its distributed and unstructured nature. In this paper, we propose a novel ontology aimed at modeling properties of data processing pipelines, and their related artifacts, as described in scientific publications. The ontology is the result of a requirement analysis that involved experts from both academia and industry. We showcase the effectiveness of our ontology by manually applying it to a collection of publications describing data processing methods.

KW - Ontology

KW - Document structure

KW - Digital Libraries and archives

UR - https://ornlcda.github.io/SWM2017/

U2 - 10.1145/3057148.3057149

DO - 10.1145/3057148.3057149

M3 - Conference contribution

SP - 1

EP - 8

BT - SWM'17 Proceedings of the 1st Workshop on Scholarly Web Mining

PB - Association for Computing Machinery (ACM)

CY - Cambridge, UK

T2 - Workshop on Scholarly Web Mining

Y2 - 10 February 2017 through 10 February 2017

ER -
