TY - JOUR
T1 - Pull Request Decisions Explained
T2 - An Empirical Overview
AU - Zhang, Xunhui
AU - Yu, Yue
AU - Georgios, Gousios
AU - Rastogi, Ayushi
PY - 2022
Y1 - 2022
N2 - Context: The pull-based development model is widely used in open source projects, leading to the emergence of trends in distributed software development. One aspect that has garnered significant attention concerning pull request decisions is the identification of explanatory factors. Objective: This study builds on a decade of research on pull request decisions and provides further insights. We empirically investigate how factors influence pull request decisions and the scenarios that change the influence of such factors. Method: We identify factors influencing pull request decisions on GitHub through a systematic literature review and infer them by mining archival data. We collect a total of 3,347,937 pull requests with 95 features from 11,230 diverse projects on GitHub. Using these data, we explore the relations among the factors and build mixed effects logistic regression models to empirically explain pull request decisions. Results: Our study shows that a small number of factors explain pull request decisions, with that concerning whether the integrator is the same as or different from the submitter being the most important factor. We also note that the influence of factors on pull request decisions change with a change in context; e.g., the area hotness of pull request is important only in the early stage of project development, however it becomes unimportant for pull request decisions as projects become mature.
AB - Context: The pull-based development model is widely used in open source projects, leading to the emergence of trends in distributed software development. One aspect that has garnered significant attention concerning pull request decisions is the identification of explanatory factors. Objective: This study builds on a decade of research on pull request decisions and provides further insights. We empirically investigate how factors influence pull request decisions and the scenarios that change the influence of such factors. Method: We identify factors influencing pull request decisions on GitHub through a systematic literature review and infer them by mining archival data. We collect a total of 3,347,937 pull requests with 95 features from 11,230 diverse projects on GitHub. Using these data, we explore the relations among the factors and build mixed effects logistic regression models to empirically explain pull request decisions. Results: Our study shows that a small number of factors explain pull request decisions, with that concerning whether the integrator is the same as or different from the submitter being the most important factor. We also note that the influence of factors on pull request decisions change with a change in context; e.g., the area hotness of pull request is important only in the early stage of project development, however it becomes unimportant for pull request decisions as projects become mature.
KW - pull-based development
KW - pull request decision
KW - distributed software development
KW - GitHub
UR - http://www.scopus.com/inward/record.url?scp=85127759401&partnerID=8YFLogxK
U2 - 10.1109/TSE.2022.3165056
DO - 10.1109/TSE.2022.3165056
M3 - Article
AN - SCOPUS:85127759401
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
SN - 0098-5589
M1 - 9749844
ER -