Description
This dataset contains the following inside a tar.zst file:
- A list of all Java repositories on GitHub, in CSV format
- The POM.xml file from each of those repositories, if there was one at the root of the repo
- A sample of 500 000 repositories that have been searched recursively for POM.xml files:
  - For those that have a POM.xml file, an 'effective' POM.xml has been created
  - For those that have distribution repositories configured, the GitHub workflow files, if they exist
- A report.json file that contains aggregate information about the sample
The scraper written to retrieve this data is also included.
This dataset was created for a Computer Science Bachelor Research Project titled "An analysis of Java release practices on GitHub" by Vivian Roest.
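As a rough way to get an overview of the archive contents, the sketch below counts files per category. It rests on several assumptions that the description does not confirm: that the archive has first been decompressed to a plain .tar file (for example with the `zstd` command-line tool), that POM files are named `pom.xml` (case-insensitively), and that workflow files sit under a `.github/workflows/` directory. The function names `classify` and `summarize` are hypothetical and are not part of the included scraper.

```python
import tarfile
from collections import Counter


def classify(name: str) -> str:
    """Map an archive member path to a coarse category (assumed naming scheme)."""
    lower = name.lower()
    base = lower.rsplit("/", 1)[-1]
    if base == "pom.xml":
        return "pom"
    # Assumption: workflow files keep their usual GitHub location in the archive.
    if "/.github/workflows/" in f"/{lower}" and base.endswith((".yml", ".yaml")):
        return "workflow"
    if base == "report.json":
        return "report"
    if base.endswith(".csv"):
        return "csv"
    return "other"


def summarize(tar_path: str) -> Counter:
    """Count regular files per category in an uncompressed tar archive."""
    counts = Counter()
    # Mode "r|" reads the tar sequentially as a stream, without seeking.
    with tarfile.open(tar_path, mode="r|") as archive:
        for member in archive:
            if member.isfile():
                counts[classify(member.name)] += 1
    return counts
```

Calling `summarize("dataset.tar")`, where `dataset.tar` is a placeholder for the decompressed archive, returns a `Counter` keyed by `pom`, `workflow`, `report`, `csv`, and `other`. If the actual layout differs from the assumptions above, `classify` would need to be adjusted.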
Date made available | 29 Jan 2024 |
---|---|
Publisher | TU Delft - 4TU.ResearchData |
Date of data production | 2024 - |