Abstract
We address the complex problem of associating several wearable devices with the spatio-temporal region of their wearers in video during crowded mingling events using only acceleration and proximity. This is a particularly important first step for multi-sensor behavior analysis using video and wearable technologies, where the privacy of the participants must be maintained. Most state-of-the-art works that use these two modalities perform the association manually, which becomes practically infeasible as the number of people in the scene increases. We propose an automatic association method based on a hierarchical linear assignment optimization, which exploits the spatial context of the scene. Moreover, we present extensive experiments on matching from 2 to more than 69 acceleration and video streams, showing significant improvements over a random baseline in a real-world crowded mingling scenario. We also show the effectiveness of our method for incomplete or missing streams (up to a certain limit) and analyze the trade-off between the length of the streams and the number of participants. Finally, we provide an analysis of failure cases, showing that a deep understanding of the social actions within the context of the event is necessary to further improve performance on this intriguing task.
Original language | English |
---|---|
Pages (from-to) | 1867-1879 |
Number of pages | 13 |
Journal | IEEE Transactions on Multimedia |
Volume | 21 |
Issue number | 7 |
DOIs | |
Publication status | Published - 2019 |
Bibliographical note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Keywords
- acceleration
- association
- computer vision
- mingling
- wearable sensor