MoReSo: A DNN Framework Expediting Content-based Video Image Retrieval (CBVIR)

Sinian Li, Doruk Barokas Profeta, Justin Dauwels

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

With the exponential growth of video data, individuals, particularly scholars in the fields of history and sociology, are increasingly reliant on video materials. However, locating specific frames within videos remains laborious and time-consuming. Advanced machine learning-assisted video processing techniques have emerged, including text-based video search, video summarization, real-time object detection, and person re-identification. Distinct from these tasks, the main challenge in retrieving video frames based on given visual content is how to pinpoint instance occurrences efficiently and accurately. To expedite the process while maintaining retrieval performance, we propose a two-stage approach combining KeyFrame Extraction (KFE) and Content-based Image Retrieval (CBIR), underpinned by a DNN-empowered framework called MoReSo. Our innovations include 1) the integration of improved statistical features with dynamic clustering in the KFE stage and 2) the development of the MoReSo framework, which consists of MobileNet and ResNet backbones with an SOA layer to jointly represent video frames, achieving a 2.67x increase in efficiency compared to existing solutions. Our framework is evaluated on two kinds of data: the annotated EHM Historical Database provided by digital history researchers and the widely used Oxford and Paris image retrieval benchmark datasets. The experimental results show that the proposed framework and scheme outperform other models in the CBVIR task. We make our code available for further exploration through our GitHub repository, which contains the implementation of our model and a CBVIR system with a GUI prototype.
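
The two-stage idea described in the abstract (first reduce the video to keyframes, then run image retrieval only over those keyframes) can be illustrated with a minimal sketch. This is not the MoReSo implementation: intensity histograms stand in for the paper's improved statistical features, a greedy threshold-based grouping stands in for its dynamic clustering, and normalised pixel vectors stand in for the MobileNet/ResNet/SOA representation. All function names, parameters, and the synthetic video below are assumptions made purely for illustration.

```python
import numpy as np


def frame_features(frames, bins=16):
    """Normalised intensity histogram per frame (stand-in statistical feature)."""
    feats = []
    for frame in frames:
        hist, _ = np.histogram(frame, bins=bins, range=(0.0, 1.0))
        feats.append(hist / max(hist.sum(), 1))
    return np.asarray(feats, dtype=np.float64)


def extract_keyframes(frames, threshold=0.3):
    """Stage 1 (assumed variant): greedy sequential clustering of frames.

    A new cluster starts when a frame's histogram is more than `threshold`
    (L1 distance) away from the running centroid of the current cluster.
    The frame closest to each cluster centroid is kept as its keyframe.
    """
    feats = frame_features(frames)
    clusters = [[0]]
    centroid = feats[0].copy()
    for i in range(1, len(feats)):
        if np.abs(feats[i] - centroid).sum() > threshold:
            clusters.append([i])
            centroid = feats[i].copy()
        else:
            clusters[-1].append(i)
            centroid = feats[clusters[-1]].mean(axis=0)
    keyframes = []
    for members in clusters:
        center = feats[members].mean(axis=0)
        dists = np.abs(feats[members] - center).sum(axis=1)
        keyframes.append(members[int(np.argmin(dists))])
    return keyframes


def embed(frames):
    """Placeholder embedding: L2-normalised pixels (a DNN backbone would go here)."""
    flat = frames.reshape(len(frames), -1).astype(np.float64)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / np.maximum(norms, 1e-12)


def retrieve(query, frames, keyframe_ids, top_k=3):
    """Stage 2: rank keyframes by cosine similarity to the query image."""
    database = embed(frames[keyframe_ids])
    q = embed(query[None])[0]
    scores = database @ q
    order = np.argsort(-scores)[:top_k]
    return [(int(keyframe_ids[j]), float(scores[j])) for j in order]


# Synthetic 15-frame "video" with three visually distinct shots of 5 frames each.
rng = np.random.default_rng(0)
video = rng.uniform(0.0, 0.01, size=(15, 8, 8))
video[0:5, :4, :4] += 0.30   # shot 1: patch in the top-left corner
video[5:10, :4, 4:] += 0.53  # shot 2: patch in the top-right corner
video[10:15, 4:, :] += 0.90  # shot 3: bright bottom half

keys = extract_keyframes(video)
hits = retrieve(video[7], video, keys, top_k=1)
print("keyframes:", keys)
print("best match for frame 7:", hits)
```

The point of the sketch is the cost structure rather than the specific features: the retrieval stage compares the query against one representative per cluster instead of every frame, which is where a two-stage scheme of this kind saves time.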

Original language: English
Title of host publication: 32nd European Signal Processing Conference, EUSIPCO 2024 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 551-555
Number of pages: 5
ISBN (Electronic): 9789464593617
DOIs
Publication status: Published - 2024
Event: 32nd European Signal Processing Conference, EUSIPCO 2024 - Lyon, France
Duration: 26 Aug 2024 to 30 Aug 2024
https://eusipcolyon.sciencesconf.org/

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491

Conference

Conference: 32nd European Signal Processing Conference, EUSIPCO 2024
Abbreviated title: EUSIPCO 2024
Country/Territory: France
City: Lyon
Period: 26/08/24 to 30/08/24
Internet address

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • Content-Based Image Retrieval
  • Content-Based Video Image Retrieval
  • Image Retrieval from Video
  • Key Frame Extraction
