Abstract
This paper presents a design concept for speech-based mobile applications that is built around a narrative storyline. Its main contribution is the idea of conceptualizing speech-based mobile multimedia tagging and retrieval applications as a story that develops through the user's interaction with characters representing elements of the system. The aim of this paper is to encourage and support the research community in exploring this concept further and developing it into mature systems that allow for the accumulation of, and access to, large quantities of speech-annotated images. We provide two resources intended to facilitate such work. First, we describe two applications, together referred to as the ‘Verbals Mobile System’, that we have developed on the basis of this design concept and implemented on Android platform 2.2 (API level 8) using Google's Speech Recognition service, the Text-to-Speech Engine and the Flickr API. The code for these applications has been made publicly available to encourage further extension. Second, we distill our practical findings into a discussion of technology limitations and guidelines for the design of speech-based mobile applications, to help researchers build on our work while avoiding known pitfalls.
Original language | English |
---|---|
Title of host publication | Proceedings of the First Workshop on Speech, Language and Audio in Multimedia (SLAM) |
Editors | Guillaume Gravier, Frédéric Béchet |
Publisher | CEUR-WS |
Pages | 90-95 |
Number of pages | 6 |
Publication status | Published - 2013 |
Publication series
Name | CEUR Workshop Proceedings |
---|---|
Volume | 1012 |
ISSN (Electronic) | 1613-0073 |