TY - GEN
T1 - Classifying sentence-based summaries of web documents
AU - Pera, Maria Soledad
AU - Ng, Yiu Kai
PY - 2009
Y1 - 2009
N2 - Text classification categorizes Web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consuming, and users are still required to spend a considerable amount of time scanning through the classified Web documents to identify the ones that satisfy their information needs. To solve this problem, we first introduce CorSum, an extractive single-document summarization approach that is simple yet effective, since it relies only on word similarity to generate high-quality summaries. We then train a Naïve Bayes classifier on CorSum-generated summaries and evaluate both the classification accuracy achieved using the summaries and the resulting speed-up of the classification process. Experimental results on the DUC-2002 and 20 Newsgroups datasets show that CorSum outperforms other extractive summarization methods, and that classification time is significantly reduced using CorSum-generated summaries with comparable accuracy. More importantly, browsing summaries, instead of entire documents, that have been classified into topic-oriented categories facilitates the information-searching process on the Web.
AB - Text classification categorizes Web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consuming, and users are still required to spend a considerable amount of time scanning through the classified Web documents to identify the ones that satisfy their information needs. To solve this problem, we first introduce CorSum, an extractive single-document summarization approach that is simple yet effective, since it relies only on word similarity to generate high-quality summaries. We then train a Naïve Bayes classifier on CorSum-generated summaries and evaluate both the classification accuracy achieved using the summaries and the resulting speed-up of the classification process. Experimental results on the DUC-2002 and 20 Newsgroups datasets show that CorSum outperforms other extractive summarization methods, and that classification time is significantly reduced using CorSum-generated summaries with comparable accuracy. More importantly, browsing summaries, instead of entire documents, that have been classified into topic-oriented categories facilitates the information-searching process on the Web.
UR - http://www.scopus.com/inward/record.url?scp=77949524365&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2009.101
DO - 10.1109/ICTAI.2009.101
M3 - Conference contribution
AN - SCOPUS:77949524365
SN - 9781424456192
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 433
EP - 440
BT - ICTAI 2009 - 21st IEEE International Conference on Tools with Artificial Intelligence
T2 - 21st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2009
Y2 - 2 November 2009 through 5 November 2009
ER -