Photo2Video: Semantic-Aware Deep Learning-Based Video Generation from Still Content

Paula Viana; Maria Teresa Andrade; Pedro Carvalho; Luis Vilaça; Inês N. Teixeira; Tiago Costa; Pieter Jonker

doi:10.3390/jimaging8030068

Corresponding author: Paula Viana

Biomechatronics & Human-Machine Control

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. It is true that ML is already popular for content creation, but the progress achieved so far addresses essentially textual content or the identification and selection of specific types of content. A wealth of possibilities are yet to be explored by bringing the use of ML into the multimedia creative process, allowing the knowledge inferred by the former to influence automatically how new multimedia content is created. The work presented in this article provides contributions in three distinct ways towards this goal: firstly, it proposes a methodology to re-train popular neural network models in identifying new thematic concepts in static visual content and attaching meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically called upon to apply such effects in a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this automated workflow by offering to the user the possibility of manipulating the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip, taking into account the information acquired. The final result strongly contrasts with current standard approaches of creating random movements, by implementing an intelligent content- and context-aware video.

Original language	English
Article number	68
Number of pages	14
Journal	Journal of Imaging
Volume	8
Issue number	3
DOIs	https://doi.org/10.3390/jimaging8030068
Publication status	Published - 2022

Keywords

Automated content creation
Context awareness
Deep learning
RoI
Semantic awareness
Storytelling

Access to Document

10.3390/jimaging8030068

jimaging-08-00068: Final published version, 6.86 MB, Licence: CC BY

Cite this

@article{d440d3fca01045508ffe65c136162079,

title = "Photo2Video: Semantic-Aware Deep Learning-Based Video Generation from Still Content",

abstract = "Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. It is true that ML is already popular for content creation, but the progress achieved so far addresses essentially textual content or the identification and selection of specific types of content. A wealth of possibilities are yet to be explored by bringing the use of ML into the multimedia creative process, allowing the knowledge inferred by the former to influence automatically how new multimedia content is created. The work presented in this article provides contributions in three distinct ways towards this goal: firstly, it proposes a methodology to re-train popular neural network models in identifying new thematic concepts in static visual content and attaching meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically called upon to apply such effects in a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this automated workflow by offering to the user the possibility of manipulating the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip, taking into account the information acquired. The final result strongly contrasts with current standard approaches of creating random movements, by implementing an intelligent content-and context-aware video.",

keywords = "Automated content creation, Context awareness, Deep learning, RoI, Semantic awareness, Storytelling",

author = "Paula Viana and Andrade, {Maria Teresa} and Pedro Carvalho and Luis Vila{\c c}a and Teixeira, {In{\^e}s N.} and Tiago Costa and Pieter Jonker",

year = "2022",

doi = "10.3390/jimaging8030068",

language = "English",

volume = "8",

journal = "Journal of Imaging",

issn = "2313-433X",

publisher = "Multidisciplinary Digital Publishing Institute",

number = "3",

}

TY - JOUR

T1 - Photo2Video

T2 - Semantic-Aware Deep Learning-Based Video Generation from Still Content

AU - Viana, Paula

AU - Andrade, Maria Teresa

AU - Carvalho, Pedro

AU - Vilaça, Luis

AU - Teixeira, Inês N.

AU - Costa, Tiago

AU - Jonker, Pieter

PY - 2022

Y1 - 2022

N2 - Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. It is true that ML is already popular for content creation, but the progress achieved so far addresses essentially textual content or the identification and selection of specific types of content. A wealth of possibilities are yet to be explored by bringing the use of ML into the multimedia creative process, allowing the knowledge inferred by the former to influence automatically how new multimedia content is created. The work presented in this article provides contributions in three distinct ways towards this goal: firstly, it proposes a methodology to re-train popular neural network models in identifying new thematic concepts in static visual content and attaching meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically called upon to apply such effects in a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this automated workflow by offering to the user the possibility of manipulating the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip, taking into account the information acquired. The final result strongly contrasts with current standard approaches of creating random movements, by implementing an intelligent content-and context-aware video.

AB - Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. It is true that ML is already popular for content creation, but the progress achieved so far addresses essentially textual content or the identification and selection of specific types of content. A wealth of possibilities are yet to be explored by bringing the use of ML into the multimedia creative process, allowing the knowledge inferred by the former to influence automatically how new multimedia content is created. The work presented in this article provides contributions in three distinct ways towards this goal: firstly, it proposes a methodology to re-train popular neural network models in identifying new thematic concepts in static visual content and attaching meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically called upon to apply such effects in a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this automated workflow by offering to the user the possibility of manipulating the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip, taking into account the information acquired. The final result strongly contrasts with current standard approaches of creating random movements, by implementing an intelligent content-and context-aware video.

KW - Automated content creation

KW - Context awareness

KW - Deep learning

KW - RoI

KW - Semantic awareness

KW - Storytelling

UR - http://www.scopus.com/inward/record.url?scp=85126644407&partnerID=8YFLogxK

U2 - 10.3390/jimaging8030068

DO - 10.3390/jimaging8030068

M3 - Article

AN - SCOPUS:85126644407

SN - 2313-433X

VL - 8

JO - Journal of Imaging

JF - Journal of Imaging

IS - 3

M1 - 68

ER -
