Running event visualization using videos from multiple cameras

Yeshwanth Napolean; Priadi T. Wibowo; Jan C. Van Gemert

doi:10.1145/3347318.3355528

Pattern Recognition and Bioinformatics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

5 Citations (Scopus)

Abstract

Visualizing the trajectories of multiple runners with videos collected at different points in a race could be useful for sports performance analysis. The videos and the trajectories can also aid in athlete health monitoring. While the runners’ unique IDs and their appearance are distinct, the task is not straightforward because the video data does not contain explicit information as to which runners appear in each of the videos. There is no direct supervision of the model in tracking athletes, only filtering steps to remove irrelevant detections. Other factors of concern include occlusion of runners and harsh illumination. To this end, we identify two methods for runner identification at different points of the event, for determining their trajectory. One is scene text detection, which recognizes the runners by detecting a unique ‘bib number’ attached to their clothes, and the other is person re-identification, which detects the runners based on their appearance. We train our method without ground truth, but to evaluate the proposed methods we create a ground truth database which consists of video and frame interval information where the runners appear. The videos in the dataset were recorded by nine cameras at different locations during a marathon event. This data is annotated with bib numbers of runners appearing in each video. The bib numbers of runners known to occur in the frame are used to filter irrelevant text and numbers detected. Except for this filtering step, no supervisory signal is used. The experimental evidence shows that the scene text recognition method achieves an F1-score of 74. Combining the two methods, that is, using samples collected by the text spotter to train the re-identification model, yields a higher F1-score of 85.8. Re-training the person re-identification model with identified inliers yields a slight improvement in performance (F1-score of 87.8). This combination of text recognition and person re-identification can be used in conjunction with video metadata to visualize running events.
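
The filtering step and the F1-score mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `filter_detections` and `f1_score`, the exact-string matching rule, and the sample bib numbers are all assumptions made for illustration. The sketch returns F1 as a fraction between 0 and 1.

```python
from typing import Iterable, List


def filter_detections(detected_texts: Iterable[str], known_bibs: Iterable[str]) -> List[str]:
    """Keep only detected strings that exactly match a known bib number.

    Exact matching is an assumption of this sketch; the paper may use a
    different matching rule.
    """
    known = set(known_bibs)
    return [text for text in detected_texts if text in known]


def f1_score(predicted: Iterable[str], actual: Iterable[str]) -> float:
    """Harmonic mean of precision and recall over sets of runner IDs."""
    predicted_set, actual_set = set(predicted), set(actual)
    true_positives = len(predicted_set & actual_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(actual_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical text-spotter output for one video, plus the annotated bibs.
detected = ["1234", "FINISH", "42", "5678", "2019"]
annotated_bibs = ["1234", "5678", "9012"]

kept = filter_detections(detected, annotated_bibs)
print(kept)                            # ['1234', '5678']
print(f1_score(kept, annotated_bibs))  # approximately 0.8
```

In this toy case the filter discards banner text and stray numbers, so precision is 1.0, while the undetected runner "9012" lowers recall to 2/3.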

Original language	English
Title of host publication	MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019
Place of Publication	New York
Publisher	Association for Computing Machinery (ACM)
Pages	82-90
Number of pages	9
ISBN (Electronic)	9781450369114
DOIs	https://doi.org/10.1145/3347318.3355528
Publication status	Published - 15 Oct 2019
Event	2nd ACM International Workshop on Multimedia Content Analysis in Sports, MMSports 2019, co-located with ACM Multimedia 2019 - Nice, France Duration: 25 Oct 2019 → 25 Oct 2019

Publication series

Name	MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019

Conference

Conference	2nd ACM International Workshop on Multimedia Content Analysis in Sports, MMSports 2019, co-located with ACM Multimedia 2019
Country/Territory	France
City	Nice
Period	25/10/19 → 25/10/19

Keywords

Person re-identification
Runners
Text recognition
Visualization

Access to Document

10.1145/3347318.3355528

Cite this

Napolean, Y., Wibowo, P. T., & Van Gemert, J. C. (2019). Running event visualization using videos from multiple cameras. In MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019 (pp. 82-90). (MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019). Association for Computing Machinery (ACM). https://doi.org/10.1145/3347318.3355528

Napolean, Yeshwanth ; Wibowo, Priadi T. ; Van Gemert, Jan C. / Running event visualization using videos from multiple cameras. MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019. New York : Association for Computing Machinery (ACM), 2019. pp. 82-90 (MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019).

@inproceedings{11116a97f4c94671bb96400a39273b75,

title = "Running event visualization using videos from multiple cameras",

abstract = "Visualizing the trajectories of multiple runners with videos collected at different points in a race could be useful for sports performance analysis. The videos and the trajectories can also aid in athlete health monitoring. While the runners{\textquoteright} unique IDs and their appearance are distinct, the task is not straightforward because the video data does not contain explicit information as to which runners appear in each of the videos. There is no direct supervision of the model in tracking athletes, only filtering steps to remove irrelevant detections. Other factors of concern include occlusion of runners and harsh illumination. To this end, we identify two methods for runner identification at different points of the event, for determining their trajectory. One is scene text detection, which recognizes the runners by detecting a unique {\textquoteleft}bib number{\textquoteright} attached to their clothes, and the other is person re-identification, which detects the runners based on their appearance. We train our method without ground truth, but to evaluate the proposed methods we create a ground truth database which consists of video and frame interval information where the runners appear. The videos in the dataset were recorded by nine cameras at different locations during a marathon event. This data is annotated with bib numbers of runners appearing in each video. The bib numbers of runners known to occur in the frame are used to filter irrelevant text and numbers detected. Except for this filtering step, no supervisory signal is used. The experimental evidence shows that the scene text recognition method achieves an F1-score of 74. Combining the two methods, that is, using samples collected by the text spotter to train the re-identification model, yields a higher F1-score of 85.8. Re-training the person re-identification model with identified inliers yields a slight improvement in performance (F1-score of 87.8). This combination of text recognition and person re-identification can be used in conjunction with video metadata to visualize running events.",

keywords = "Person re-identification, Runners, Text recognition, Visualization",

author = "Yeshwanth Napolean and Wibowo, {Priadi T.} and {Van Gemert}, {Jan C.}",

year = "2019",

month = oct,

day = "15",

doi = "10.1145/3347318.3355528",

language = "English",

series = "MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019",

publisher = "Association for Computing Machinery (ACM)",

pages = "82--90",

booktitle = "MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019",

address = "United States",

note = "2nd ACM International Workshop on Multimedia Content Analysis in Sports, MMSports 2019, co-located with ACM Multimedia 2019 ; Conference date: 25-10-2019 Through 25-10-2019",

}

Napolean, Y, Wibowo, PT & Van Gemert, JC 2019, Running event visualization using videos from multiple cameras. in MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019. MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019, Association for Computing Machinery (ACM), New York, pp. 82-90, 2nd ACM International Workshop on Multimedia Content Analysis in Sports, MMSports 2019, co-located with ACM Multimedia 2019, Nice, France, 25/10/19. https://doi.org/10.1145/3347318.3355528

Running event visualization using videos from multiple cameras. / Napolean, Yeshwanth; Wibowo, Priadi T.; Van Gemert, Jan C.
MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019. New York: Association for Computing Machinery (ACM), 2019. p. 82-90 (MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019).

TY - GEN

T1 - Running event visualization using videos from multiple cameras

AU - Napolean, Yeshwanth

AU - Wibowo, Priadi T.

AU - Van Gemert, Jan C.

PY - 2019/10/15

Y1 - 2019/10/15

AB - Visualizing the trajectories of multiple runners with videos collected at different points in a race could be useful for sports performance analysis. The videos and the trajectories can also aid in athlete health monitoring. While the runners’ unique IDs and their appearance are distinct, the task is not straightforward because the video data does not contain explicit information as to which runners appear in each of the videos. There is no direct supervision of the model in tracking athletes, only filtering steps to remove irrelevant detections. Other factors of concern include occlusion of runners and harsh illumination. To this end, we identify two methods for runner identification at different points of the event, for determining their trajectory. One is scene text detection, which recognizes the runners by detecting a unique ‘bib number’ attached to their clothes, and the other is person re-identification, which detects the runners based on their appearance. We train our method without ground truth, but to evaluate the proposed methods we create a ground truth database which consists of video and frame interval information where the runners appear. The videos in the dataset were recorded by nine cameras at different locations during a marathon event. This data is annotated with bib numbers of runners appearing in each video. The bib numbers of runners known to occur in the frame are used to filter irrelevant text and numbers detected. Except for this filtering step, no supervisory signal is used. The experimental evidence shows that the scene text recognition method achieves an F1-score of 74. Combining the two methods, that is, using samples collected by the text spotter to train the re-identification model, yields a higher F1-score of 85.8. Re-training the person re-identification model with identified inliers yields a slight improvement in performance (F1-score of 87.8). This combination of text recognition and person re-identification can be used in conjunction with video metadata to visualize running events.

KW - Person re-identification

KW - Runners

KW - Text recognition

KW - Visualization

UR - http://www.scopus.com/inward/record.url?scp=85075729589&partnerID=8YFLogxK

U2 - 10.1145/3347318.3355528

DO - 10.1145/3347318.3355528

M3 - Conference contribution

T3 - MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019

SP - 82

EP - 90

BT - MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019

PB - Association for Computing Machinery (ACM)

CY - New York

T2 - 2nd ACM International Workshop on Multimedia Content Analysis in Sports, MMSports 2019, co-located with ACM Multimedia 2019

Y2 - 25 October 2019 through 25 October 2019

ER -

Napolean Y, Wibowo PT, Van Gemert JC. Running event visualization using videos from multiple cameras. In MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019. New York: Association for Computing Machinery (ACM). 2019. p. 82-90. (MMSports 2019 - Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, co-located with MM 2019). doi: 10.1145/3347318.3355528
