Data underlying the BSc project: "An analysis of Java release practices on GitHub"

  • V.C. Roest (Creator)

Dataset

Description

This dataset contains the following inside a tar.zst file:

  • A list of all Java repositories on GitHub, in CSV format
  • The POM.xml file from each of those repositories, if there was one at the root of the repo
  • A sample of 500 000 repositories, for which:
      • Each repository has been searched recursively for POM.xml files
      • An 'effective' POM.xml has been created for each repository that has a POM.xml file
      • GitHub workflow files, if they exist, have been collected for each repository that has distribution repositories configured
  • A report.json file that contains aggregate information about the sample
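
As an illustration of what "distribution repositories configured" can mean for a POM.xml file, the following is a minimal sketch and not the scraper's actual logic. It assumes the POM declares the standard Maven namespace and looks for `repository` and `snapshotRepository` entries under `distributionManagement`. The function name `distribution_repositories` and the example POM with its URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; POMs that omit it are not matched by this sketch.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def distribution_repositories(pom_xml: str) -> list[str]:
    """Return the URLs declared under <distributionManagement> in a POM.

    Hypothetical helper for illustration; it only inspects the POM text it is
    given and does not resolve parent POMs or properties.
    """
    root = ET.fromstring(pom_xml)
    urls = []
    for tag in ("repository", "snapshotRepository"):
        for repo in root.findall(f"m:distributionManagement/m:{tag}", POM_NS):
            url = repo.findtext("m:url", default="", namespaces=POM_NS)
            urls.append(url.strip())
    return urls


# Made-up POM used only to exercise the helper.
EXAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <distributionManagement>
    <repository>
      <id>releases</id>
      <url>https://repo.example.org/releases</url>
    </repository>
  </distributionManagement>
</project>"""

if __name__ == "__main__":
    print(distribution_repositories(EXAMPLE_POM))
```

Because distribution settings can be inherited from a parent POM, a check like this is more informative when run on an 'effective' POM.xml than on the file as committed.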


The scraper written to retrieve this data is also included.


This dataset was created for a Computer Science Bachelor Research Project titled "An analysis of Java release practices on GitHub" by Vivian Roest.
Date made available: 29 Jan 2024
Publisher: TU Delft - 4TU.ResearchData
Date of data production: 2024 -
