TY - JOUR
T1 - Predicting the age of researchers using bibliometric data
AU - Nane, Gabriela F.
AU - Larivière, Vincent
AU - Costas, Rodrigo
N1 - Accepted Author Manuscript
PY - 2017/08//
Y1 - 2017/08//
N2 - The age of researchers is a critical factor in studying the bibliometric characteristics of the scholars who produce new knowledge. In bibliometric studies, the age of scientific authors is generally missing; however, the year of first publication is frequently used as a proxy for the age of researchers. In this article, we investigate which bibliometric factors are most important for predicting the age of researchers (birth and PhD age). Using a dataset of 3574 researchers from Québec for whom Web of Science publications, year of birth and year of PhD are known, our analysis falls under the linear regression setting and focuses on the predictive power of various regression models rather than on data fitting, also considering a breakdown by field. The year of first publication proves to be the best linear predictor of the age of researchers. When using simple linear regression models, predicting birth and PhD years results in errors of about 3.7 years and 3.9 years, respectively. Including other bibliometric data marginally improves the predictive power of the regression models. A validation analysis for the field breakdown shows that the average length of the prediction intervals varies from 2.5 years for Basic Medical Sciences (for birth years) up to almost 10 years for Education (for PhD years). The average models perform significantly better than the models using individual observations. Nonetheless, the high variability of the data and the uncertainty inherited by the models advise caution when using linear regression models to predict the age of researchers.
AB - The age of researchers is a critical factor in studying the bibliometric characteristics of the scholars who produce new knowledge. In bibliometric studies, the age of scientific authors is generally missing; however, the year of first publication is frequently used as a proxy for the age of researchers. In this article, we investigate which bibliometric factors are most important for predicting the age of researchers (birth and PhD age). Using a dataset of 3574 researchers from Québec for whom Web of Science publications, year of birth and year of PhD are known, our analysis falls under the linear regression setting and focuses on the predictive power of various regression models rather than on data fitting, also considering a breakdown by field. The year of first publication proves to be the best linear predictor of the age of researchers. When using simple linear regression models, predicting birth and PhD years results in errors of about 3.7 years and 3.9 years, respectively. Including other bibliometric data marginally improves the predictive power of the regression models. A validation analysis for the field breakdown shows that the average length of the prediction intervals varies from 2.5 years for Basic Medical Sciences (for birth years) up to almost 10 years for Education (for PhD years). The average models perform significantly better than the models using individual observations. Nonetheless, the high variability of the data and the uncertainty inherited by the models advise caution when using linear regression models to predict the age of researchers.
UR - http://www.scopus.com/inward/record.url?scp=85020698838&partnerID=8YFLogxK
UR - http://resolver.tudelft.nl/uuid:3c5330d6-661f-48c4-8be5-cd66f8c880b0
U2 - 10.1016/j.joi.2017.05.002
DO - 10.1016/j.joi.2017.05.002
M3 - Article
AN - SCOPUS:85020698838
VL - 11
SP - 713
EP - 729
JO - Journal of Informetrics
JF - Journal of Informetrics
SN - 1751-1577
IS - 3
ER -