Big data software analytics with Apache Spark

Georgios Gousios*

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

10 Citations (Scopus)


At the beginning of every research effort, researchers in empirical software engineering must extract data from raw sources and transform it into the inputs their tools expect. This step is time-consuming and error-prone, while the produced artifacts (code, intermediate datasets) are usually not of scientific value. In recent years, Apache Spark has emerged as a solid foundation for data science and has taken the big data analytics domain by storm. We believe that the primitives exposed by Apache Spark can help software engineering researchers create and share reproducible, high-performance data analysis pipelines. In our technical briefing, we discuss how researchers can profit from Apache Spark through a hands-on case study.

Original language: English
Title of host publication: Proceedings of the 40th International Conference on Software Engineering, ICSE '18
Subtitle of host publication: Companion Proceedings
Place of Publication: New York, NY
Publisher: Association for Computing Machinery (ACM)
Number of pages: 2
Volume: Part F137351
ISBN (Electronic): 978-1-4503-5663-3
Publication status: Published - 2018
Event: ICSE 2018: 40th International Conference on Software Engineering - Gothenburg, Sweden
Duration: 27 May 2018 to 3 Jun 2018
Conference number: 40


Conference: ICSE 2018


  • Apache Spark
  • Big data
  • Data analytics

