A Dataset of Enterprise-Driven Open Source Software

Diomidis Spinellis, Zoe Kotti, Konstantinos Kravvaritis, Georgios Theodorou, Panos Louridas

Research output: Chapter in Book/Conference proceedings/Edited volume, Conference contribution, Scientific, peer-review

4 Citations (Scopus)

Abstract

We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns and also to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17 264 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.
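The abstract's premise, that employees commit using their employer's email address, can be illustrated with a minimal Python sketch. The domain set, the subdomain-matching rule, the 0.5 threshold and every function name below are hypothetical illustrations; the paper's own three heuristics and mined domain lists are not reproduced here. The sketch also computes the two shares behind the "pulling more than their weight" observation: insiders do so when their share of a project's commits exceeds their share of its authors.

```python
from collections import Counter

# Hypothetical stand-in for the mined enterprise domain list.
ENTERPRISE_DOMAINS = frozenset({"example-corp.com", "acme.example"})


def email_domain(email: str) -> str:
    """Return the lower-cased text after the last '@' of an address."""
    return email.rsplit("@", 1)[-1].lower()


def is_insider(email: str, domains=ENTERPRISE_DOMAINS) -> bool:
    """True if the address is on an enterprise domain or one of its subdomains."""
    dom = email_domain(email)
    return any(dom == d or dom.endswith("." + d) for d in domains)


def insider_stats(commit_emails, domains=ENTERPRISE_DOMAINS):
    """Return (insider share of authors, insider share of commits).

    commit_emails holds one author email address per commit.
    """
    if not commit_emails:
        return 0.0, 0.0
    commits_per_author = Counter(e.lower() for e in commit_emails)
    insiders = [a for a in commits_per_author if is_insider(a, domains)]
    author_share = len(insiders) / len(commits_per_author)
    insider_commits = sum(commits_per_author[a] for a in insiders)
    commit_share = insider_commits / sum(commits_per_author.values())
    return author_share, commit_share


def looks_enterprise_driven(commit_emails, threshold=0.5, domains=ENTERPRISE_DOMAINS):
    """Hypothetical heuristic: at least `threshold` of commits come from insiders."""
    _, commit_share = insider_stats(commit_emails, domains)
    return commit_share >= threshold


if __name__ == "__main__":
    emails = (["ann@example-corp.com"] * 6 + ["bob@dev.example-corp.com"] * 2
              + ["carl@gmail.com", "dana@users.noreply.github.com"])
    print(insider_stats(emails))            # (0.5, 0.8)
    print(looks_enterprise_driven(emails))  # True
```

In this toy project two of four authors are insiders but they make eight of ten commits, so the insiders' commit share (0.8) exceeds their author share (0.5). Real email data is noisier (personal addresses, GitHub no-reply addresses, shared domains), which is one reason a single threshold rule like this is only an illustration.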

Original language: English
Title of host publication: Proceedings - 2020 IEEE/ACM 17th International Conference on Mining Software Repositories, MSR 2020
Publisher: Association for Computing Machinery (ACM)
Pages: 533-537
Number of pages: 5
ISBN (Electronic): 9781450379571
Publication status: Published - 29 Jun 2020
Externally published: Yes
Event: 17th IEEE/ACM International Conference on Mining Software Repositories, MSR 2020, co-located with the 42nd International Conference on Software Engineering, ICSE 2020 - Virtual, Online, Korea, Republic of
Duration: 29 Jun 2020 to 30 Jun 2020

Publication series

Name: Proceedings - 2020 IEEE/ACM 17th International Conference on Mining Software Repositories, MSR 2020

Conference

Conference: 17th IEEE/ACM International Conference on Mining Software Repositories, MSR 2020, co-located with the 42nd International Conference on Software Engineering, ICSE 2020
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 29/06/20 to 30/06/20

Keywords

  • dataset
  • EDGAR
  • Fortune Global 500
  • open source software in business
  • SEC 10-K
  • SEC 20-F
  • software ecosystems
  • Software engineering economics
