TY - GEN
T1 - Facet Embeddings for Explorative Analytics in Digital Libraries
AU - Mesbah, Sepideh
AU - Fragkeskos, Kyriakos
AU - Lofi, Christoph
AU - Bozzon, Alessandro
AU - Houben, Geert Jan
PY - 2017
Y1 - 2017
AB - With the increasing number of scientific publications in digital libraries, it is crucial to capture “deep meta-data” to facilitate more effective search and discovery, such as search by topic, research method, or data sets used in a publication. Such meta-data can also help to better understand and visualize the evolution of research topics or research venues over time. The automatic generation of meaningful deep meta-data from natural-language documents is challenged by the unstructured and often ambiguous nature of publications’ content. In this paper, we propose a domain-aware topic modeling technique called Facet Embedding, which can generate such deep meta-data efficiently. We automatically extract a set of terms according to the key facets relevant to a specific domain (i.e., scientific objective, data sets, methods, or software used, and results obtained), relying only on limited manual training. We then cluster and subsume similar facet terms into facet topics according to their semantic similarity. To showcase the effectiveness and performance of our approach, we present the results of a quantitative and qualitative analysis performed on ten different conference series in a Digital Library setting, focusing on effectiveness for document search as well as for visualizing scientific trends.
UR - http://www.scopus.com/inward/record.url?scp=85029580458&partnerID=8YFLogxK
DO - 10.1007/978-3-319-67008-9_8
M3 - Conference contribution
AN - SCOPUS:85029580458
SN - 978-3-319-67007-2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 86
EP - 99
BT - Research and Advanced Technology for Digital Libraries
A2 - Kamps, Jaap
A2 - Tsakonas, Giannis
A2 - Manolopoulos, Yannis
A2 - Iliadis, Lazaros
A2 - Karydis, Ioannis
PB - Springer
T2 - 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017
Y2 - 18 September 2017 through 21 September 2017
ER -