Question generation (QG) systems aim to generate natural language questions that are relevant to a given piece of text and can usually be answered by considering this text alone. Prior work has identified a range of shortcomings (including semantic drift and exposure bias) and has therefore turned to reinforcement learning to improve the effectiveness of question generation. As part of this effort, different reward functions have been proposed. Because these reward functions have typically been investigated empirically in different experimental settings (different datasets, models, and parameters), we lack a common framework in which to compare them fairly. In this paper, we first categorize existing rewards systematically. We then provide a fair empirical evaluation of different reward functions (including three we propose here for QG) in a common framework. We find rewards that model answerability to be the most effective.