Finding similar RSS news articles using correlation-based phrase matching

Maria Soledad Pera*, Yiu Kai Ng

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

1 Citation (Scopus)

Abstract

Traditional phrase matching approaches, which can discover documents containing exactly the same phrases, fail to detect documents that include phrases which are semantically relevant but not exact matches. We propose a correlation-based phrase matching (CPM) model that detects RSS news articles containing phrases that are not only exactly the same but also semantically relevant; these phrases determine the degree of similarity of any two articles. As the number of RSS news feeds continues to increase over the Internet, our CPM approach becomes more significant, since it minimizes the workload of the user, who is otherwise required to scan through a huge number of news articles to find related articles of interest, which is a tedious and often impossible task. Experimental results show that our CPM model, applied to matching bigrams and trigrams, outperforms other phrase matching approaches, including keyword matching.

Original language: English
Title of host publication: Knowledge Science, Engineering and Management - Second International Conference, KSEM 2007, Proceedings
Publisher: Springer
Pages: 336-348
Number of pages: 13
ISBN (Print): 9783540767183
DOIs
Publication status: Published - 2007
Externally published: Yes
Event: 2nd International Conference on Knowledge Science, Engineering and Management, KSEM 2007 - Melbourne, Australia
Duration: 28 Nov 2007 → 30 Nov 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4798 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Knowledge Science, Engineering and Management, KSEM 2007
Country/Territory: Australia
City: Melbourne
Period: 28/11/07 → 30/11/07
