TY - GEN
T1 - Finding similar RSS news articles using correlation-based phrase matching
AU - Pera, Maria Soledad
AU - Ng, Yiu Kai
PY - 2007
Y1 - 2007
N2 - Traditional phrase matching approaches, which can discover documents containing exactly the same phrases, fail to detect documents including phrases that are semantically relevant, but not exact matches. We propose a correlation-based phrase matching (CPM) model that can detect RSS news articles which contain not only phrases that are exactly the same but also phrases that are semantically relevant, which dictate the degrees of similarity of any two articles. As the number of RSS news feeds continues to increase over the Internet, our CPM approach becomes more significant, since it minimizes the workload of the user who is otherwise required to scan through a huge number of news articles to find related articles of interest, which is a tedious and often impossible task. Experimental results show that our CPM model on matching bigrams and trigrams outperforms other phrase, including keyword, matching approaches.
AB - Traditional phrase matching approaches, which can discover documents containing exactly the same phrases, fail to detect documents including phrases that are semantically relevant, but not exact matches. We propose a correlation-based phrase matching (CPM) model that can detect RSS news articles which contain not only phrases that are exactly the same but also phrases that are semantically relevant, which dictate the degrees of similarity of any two articles. As the number of RSS news feeds continues to increase over the Internet, our CPM approach becomes more significant, since it minimizes the workload of the user who is otherwise required to scan through a huge number of news articles to find related articles of interest, which is a tedious and often impossible task. Experimental results show that our CPM model on matching bigrams and trigrams outperforms other phrase, including keyword, matching approaches.
UR - http://www.scopus.com/inward/record.url?scp=38149007032&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-76719-0_34
DO - 10.1007/978-3-540-76719-0_34
M3 - Conference contribution
AN - SCOPUS:38149007032
SN - 9783540767183
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 336
EP - 348
BT - Knowledge Science, Engineering and Management - Second International Conference, KSEM 2007, Proceedings
PB - Springer
T2 - 2nd International Conference on Knowledge Science, Engineering and Management, KSEM 2007
Y2 - 28 November 2007 through 30 November 2007
ER -