An important aspect of human emotion perception is the use of contextual information to understand others' feelings even when their behavior is not very expressive or is emotionally ambiguous. For technology to detect affect successfully, it must mimic this human ability when analyzing audiovisual input. Databases on which machine learning algorithms are trained should therefore capture the context of social interactions as well as the behavior expressed in them. However, there is no consensus on what constitutes relevant context in such databases. In this article, we make two contributions towards overcoming this challenge: (a) we identify two principal sources of context for emotion perception based on psychological theory, and (b) we provide an overview of how each of these has been considered in published databases covering social interactions. Our results show that, across the reviewed databases, researchers have taken into account a similar set of contextual features reflecting the sources of context identified in psychological theory. However, within individual databases, these features are not yet systematically varied. This is problematic because it prevents the databases from being used directly as resources for modeling context-sensitive affect detection. Based on our findings, we suggest improvements for the future development of affective databases.