Utilizing phrase-similarity measures for detecting and clustering informative RSS news articles

Maria Soledad Pera, Yiu Kai Ng*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

17 Citations (Scopus)

Abstract

As the number of RSS news feeds on the Internet continues to increase, it becomes necessary to minimize the workload of users, who would otherwise have to scan through a huge number of news articles to find related articles of interest, a tedious and often impossible task. To solve this problem, we present a novel approach, called InFRSS, which consists of a correlation-based phrase matching (CPM) model and a fuzzy compatibility clustering (FCC) model. CPM can detect RSS news articles that contain identical as well as semantically similar phrases, and it determines the degree of similarity between any two articles. FCC identifies and clusters non-redundant, closely related RSS news articles based on their degrees of similarity and a fuzzy compatibility relation. Experimental results show that (i) our CPM model, when matching bigrams and trigrams in RSS news articles, outperforms other phrase/keyword-matching approaches and (ii) our FCC model generates high-quality clusters and outperforms other well-known clustering techniques.
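The two stages described in the abstract can be illustrated with a small Python sketch. It is not the authors' CPM or FCC. The word-correlation table (`WORD_CORR`), the product rule in `phrase_corr`, the fuzzy-set combination 1 - Π(1 - corr) in `directed_sim`, and the averaging that makes the relation symmetric are all hypothetical choices made for illustration. For the clustering stage, the sketch uses one textbook way of working with a fuzzy compatibility relation (reflexive and symmetric): take its max-min transitive closure, which yields a fuzzy equivalence relation, then cut it at a threshold α so that the α-cut is a crisp partition of the articles. The redundancy check mentioned in the abstract is not modeled.

```python
# Illustrative sketch only: the correlation values and scoring formulas below
# are assumptions, not the CPM/FCC definitions from the paper.

# Hypothetical word-correlation factors in [0, 1], invented for this example.
WORD_CORR = {
    frozenset({"stock", "share"}): 0.8,
    frozenset({"market", "exchange"}): 0.6,
    frozenset({"fall", "drop"}): 0.9,
}


def word_corr(a: str, b: str) -> float:
    """Identical words correlate fully; unlisted pairs are treated as unrelated."""
    if a == b:
        return 1.0
    return WORD_CORR.get(frozenset({a, b}), 0.0)


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-word phrases (n=2 gives bigrams, n=3 trigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_corr(p: tuple[str, ...], q: tuple[str, ...]) -> float:
    """Assumed rule: product of position-wise word correlations (exact match -> 1.0)."""
    score = 1.0
    for a, b in zip(p, q):
        score *= word_corr(a, b)
    return score


def directed_sim(a: list[str], b: list[str], n: int) -> float:
    """Mean degree to which each n-gram of `a` is matched somewhere in `b`,
    combining candidate matches fuzzy-set style as 1 - prod(1 - corr)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    total = 0.0
    for p in grams_a:
        miss = 1.0
        for q in grams_b:
            miss *= 1.0 - phrase_corr(p, q)
        total += 1.0 - miss
    return total / len(grams_a)


def compatibility_relation(articles: list[list[str]], n: int = 2) -> list[list[float]]:
    """Reflexive, symmetric fuzzy relation over articles (a compatibility relation)."""
    k = len(articles)
    r = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            s = (directed_sim(articles[i], articles[j], n)
                 + directed_sim(articles[j], articles[i], n)) / 2
            r[i][j] = r[j][i] = s
    return r


def transitive_closure(r: list[list[float]]) -> list[list[float]]:
    """Max-min transitive closure: repeat r := max(r, r o r) until it is stable."""
    k = len(r)
    while True:
        nxt = [[max(r[i][j], max(min(r[i][m], r[m][j]) for m in range(k)))
                for j in range(k)] for i in range(k)]
        if nxt == r:
            return r
        r = nxt


def alpha_cut_clusters(r: list[list[float]], alpha: float) -> list[list[int]]:
    """Partition items by the alpha-cut of the closure (an equivalence relation)."""
    closure = transitive_closure(r)
    clusters, seen = [], set()
    for i in range(len(closure)):
        if i not in seen:
            members = [j for j in range(len(closure)) if closure[i][j] >= alpha]
            seen.update(members)
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    articles = [
        ["stock", "market", "fall"],
        ["share", "exchange", "drop"],
        ["storm", "hits", "coast"],
        ["storm", "hits", "coast", "again"],
    ]
    r = compatibility_relation(articles)
    print(alpha_cut_clusters(r, 0.5))  # [[0, 1], [2, 3]]
```

With α = 0.5, the two finance articles, which share only semantically similar bigrams, fall into one cluster, and the two storm reports, which share exact bigrams, fall into another. Raising α to 0.6 splits the finance pair, whose averaged similarity is 0.51, while the storm pair (about 0.83) stays together.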

Original language: English
Pages (from-to): 331-350
Number of pages: 20
Journal: Integrated Computer-Aided Engineering
Volume: 15
Issue number: 4
Publication status: Published - 2008
Externally published: Yes

Keywords

  • Clustering
  • Fuzzy-set IR model
  • Information search
  • Phrase matching
