A flexible framework for key audio effects detection and auditory context inference

R Cai; L Lu; A Hanjalic; LH Cai

Multimedia Computing

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Key audio effects are those special effects that play critical roles in human's perception of an auditory context in audiovisual materials. Based on key audio effects, high-level semantic inference can be carried out to facilitate various content-based analysis applications, such as highlight extraction and video summarization. In this paper, a flexible framework is proposed for key audio effect detection in a continuous audio stream, as well as for the semantic inference of an auditory context. In the proposed framework, key audio effects and the background sounds are comprehensively modeled with hidden Markov models, and a Grammar Network is proposed to connect various models to fully explore the transitions among them. Moreover, a set of new spectral features are employed to improve the representation of each audio effect and the discrimination among various effects. The framework is convenient to add or remove target audio effects in various applications. Based on the obtained key effect sequence, a Bayesian network-based approach is proposed to further discover the high-level semantics of an auditory context by integrating prior knowledge and statistical learning. Evaluations on 12 h of audio data indicate that the proposed framework can achieve satisfying results, both on key audio effect detection and auditory context inference.

Original language	Undefined/Unknown
Pages (from-to)	1026-1039
Number of pages	14
Journal	IEEE Transactions on Speech and Audio Processing
Volume	14
Issue number	3
Publication status	Published - 2006

Bibliographical note

Journal is now called: IEEE Transactions on Audio, Speech and Language Processing, 1558-7916

Keywords

academic journal papers
CWTS 0.75 <= JFIS < 2.00

Cite this

@article{2410d9e5f715462b81af83474bd98f07,

title = "A flexible framework for key audio effects detection and auditory context inference",

abstract = "Key audio effects are those special effects that play critical roles in human's perception of an auditory context in audiovisual materials. Based on key audio effects, high-level semantic inference can be carried out to facilitate various content-based analysis applications, such as highlight extraction and video summarization. In this paper, a flexible framework is proposed for key audio effect detection in a continuous audio stream, as well as for the semantic inference of an auditory context. In the proposed framework, key audio effects and the background sounds are comprehensively modeled with hidden Markov models, and a Grammar Network is proposed to connect various models to fully explore the transitions among them. Moreover, a set of new spectral features are employed to improve the representation of each audio effect and the discrimination among various effects. The framework is convenient to add or remove target audio effects in various applications. Based on the obtained key effect sequence, a Bayesian network-based approach is proposed to further discover the high-level semantics of an auditory context by integrating prior knowledge and statistical learning. Evaluations on 12 h of audio data indicate that the proposed framework can achieve satisfying results, both on key audio effect detection and auditory context inference.",

keywords = "academic journal papers, CWTS 0.75 <= JFIS < 2.00",

author = "R Cai and L Lu and A Hanjalic and LH Cai",

note = "Journal is now called: IEEE Transactions on Audio, Speech and Language Processing, 1558-7916",

year = "2006",

language = "Undefined/Unknown",

volume = "14",

pages = "1026--1039",

journal = "IEEE Transactions on Speech and Audio Processing",

issn = "1063-6676",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

number = "3",

}

TY - JOUR

T1 - A flexible framework for key audio effects detection and auditory context inference

AU - Cai, R

AU - Lu, L

AU - Hanjalic, A

AU - Cai, LH

N1 - Journal is now called: IEEE Transactions on Audio, Speech and Language Processing, 1558-7916

PY - 2006

Y1 - 2006

AB - Key audio effects are those special effects that play critical roles in human's perception of an auditory context in audiovisual materials. Based on key audio effects, high-level semantic inference can be carried out to facilitate various content-based analysis applications, such as highlight extraction and video summarization. In this paper, a flexible framework is proposed for key audio effect detection in a continuous audio stream, as well as for the semantic inference of an auditory context. In the proposed framework, key audio effects and the background sounds are comprehensively modeled with hidden Markov models, and a Grammar Network is proposed to connect various models to fully explore the transitions among them. Moreover, a set of new spectral features are employed to improve the representation of each audio effect and the discrimination among various effects. The framework is convenient to add or remove target audio effects in various applications. Based on the obtained key effect sequence, a Bayesian network-based approach is proposed to further discover the high-level semantics of an auditory context by integrating prior knowledge and statistical learning. Evaluations on 12 h of audio data indicate that the proposed framework can achieve satisfying results, both on key audio effect detection and auditory context inference.

KW - academic journal papers

KW - CWTS 0.75 <= JFIS < 2.00

UR - http://ieeexplore.ieee.org/iel5/10376/33958/01621215.pdf?tp=&arnumber=1621215&isnumber=33958

M3 - Article

SN - 1063-6676

VL - 14

SP - 1026

EP - 1039

JO - IEEE Transactions on Speech and Audio Processing

JF - IEEE Transactions on Speech and Audio Processing

IS - 3

ER -
