Big data software analytics with Apache Spark

Georgios Gousios

doi:10.1145/3183440.3183458

Software Engineering

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

10 Citations (Scopus)

Abstract

At the beginning of every research effort, researchers in empirical software engineering have to go through the process of extracting data from raw data sources and transforming it into the inputs their tools expect. This step is time-consuming and error-prone, while the produced artifacts (code, intermediate datasets) are usually not of scientific value. In recent years, Apache Spark has emerged as a solid foundation for data science and has taken the big data analytics domain by storm. We believe that the primitives exposed by Apache Spark can help software engineering researchers create and share reproducible, high-performance data analysis pipelines. In our technical briefing, we discuss, through a hands-on case study, how researchers can profit from Apache Spark.

Original language	English
Title of host publication	Proceedings of the 40th International Conference on Software Engineering, ICSE '18
Subtitle of host publication	Companion Proceedings
Place of Publication	New York, NY
Publisher	Association for Computing Machinery (ACM)
Pages	542-543
Number of pages	2
Volume	Part F137351
ISBN (Electronic)	978-1-4503-5663-3
DOIs	https://doi.org/10.1145/3183440.3183458
Publication status	Published - 2018
Event	ICSE 2018: 40th International Conference on Software Engineering, Gothenburg, Sweden; Duration: 27 May 2018 → 3 Jun 2018; Conference number: 40; https://www.icse2018.org/

Conference

Conference	ICSE 2018
Country/Territory	Sweden
City	Gothenburg
Period	27/05/18 → 03/06/18
Internet address	https://www.icse2018.org/

Keywords

Apache Spark
Big data
Data analytics

Access to Document

10.1145/3183440.3183458

Cite this

@inproceedings{b074ba6632c0415eb694ad0f632aa27c,
title = "Big data software analytics with Apache Spark",
abstract = "At the beginning of every research effort, researchers in empirical software engineering have to go through the processes of extracting data from raw data sources and transforming them to what their tools expect as inputs. This step is time consuming and error prone, while the produced artifacts (code, intermediate datasets) are usually not of scientific value. In the recent years, Apache Spark has emerged as a solid foundation for data science and has taken the big data analytics domain by storm. We believe that the primitives exposed by Apache Spark can help software engineering researchers create and share reproducible, high-performance data analysis pipelines. In our technical briefing, we discuss how researchers can profit from Apache Spark, through a hands-on case study.",
keywords = "Apache Spark, Big data, Data analytics",
author = "Georgios Gousios",
year = "2018",
doi = "10.1145/3183440.3183458",
language = "English",
volume = "Part F137351",
pages = "542--543",
booktitle = "Proceedings of the 40th International Conference on Software Engineering, ICSE '18",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",
note = "ICSE 2018 : 40th International Conference on Software Engineering ; Conference date: 27-05-2018 Through 03-06-2018",
url = "https://www.icse2018.org/",
}

Big data software analytics with Apache Spark. / Gousios, Georgios.
Proceedings of the 40th International Conference on Software Engineering, ICSE '18: Companion Proceedings. Vol. Part F137351. New York, NY: Association for Computing Machinery (ACM), 2018. p. 542-543.

TY - GEN
T1 - Big data software analytics with Apache Spark
AU - Gousios, Georgios
N1 - Conference code: 40
PY - 2018
Y1 - 2018
N2 - At the beginning of every research effort, researchers in empirical software engineering have to go through the processes of extracting data from raw data sources and transforming them to what their tools expect as inputs. This step is time consuming and error prone, while the produced artifacts (code, intermediate datasets) are usually not of scientific value. In the recent years, Apache Spark has emerged as a solid foundation for data science and has taken the big data analytics domain by storm. We believe that the primitives exposed by Apache Spark can help software engineering researchers create and share reproducible, high-performance data analysis pipelines. In our technical briefing, we discuss how researchers can profit from Apache Spark, through a hands-on case study.
AB - At the beginning of every research effort, researchers in empirical software engineering have to go through the processes of extracting data from raw data sources and transforming them to what their tools expect as inputs. This step is time consuming and error prone, while the produced artifacts (code, intermediate datasets) are usually not of scientific value. In the recent years, Apache Spark has emerged as a solid foundation for data science and has taken the big data analytics domain by storm. We believe that the primitives exposed by Apache Spark can help software engineering researchers create and share reproducible, high-performance data analysis pipelines. In our technical briefing, we discuss how researchers can profit from Apache Spark, through a hands-on case study.
KW - Apache Spark
KW - Big data
KW - Data analytics
UR - http://www.scopus.com/inward/record.url?scp=85049675827&partnerID=8YFLogxK
U2 - 10.1145/3183440.3183458
DO - 10.1145/3183440.3183458
M3 - Conference contribution
AN - SCOPUS:85049675827
VL - Part F137351
SP - 542
EP - 543
BT - Proceedings of the 40th International Conference on Software Engineering, ICSE '18
PB - Association for Computing Machinery (ACM)
CY - New York, NY
T2 - ICSE 2018
Y2 - 27 May 2018 through 3 June 2018
ER -
