Su, Tze-Wei and Khor, Hao-Ming and Tan, Ian K. T. (2011) Utilizing word matching for duplicate article removal : a study using Malaysian online news feed. In: Symposium on Information & Computer Sciences (1st).
Abstract
Users of feed aggregators know that duplicated articles occasionally appear in the feeds they subscribe to. Reading every article, only to stumble upon items already read, can be time consuming. Our work determines the effectiveness of using basic word matching to remove duplicated items and show only the most relevant item, thus saving readers' time. The method described in this paper removes duplicates using word matching heuristics with an appropriate matching percentage. The duplicated items are then ranked so that only the highest-ranked article is displayed. Ranking uses the number of search items found in the titles of the news feeds; the article with the highest count is considered the highest ranked. Using Malaysian online news feeds, we found that a matching percentage of 40% minimizes duplicates.
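The approach outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact matching formula, the grouping strategy, or the tokenization, so everything below is an assumption made for illustration: the matching percentage is taken as the share of the shorter article's distinct words that also occur in the other article, articles are grouped greedily against the first member of each group, and the names `words`, `match_percentage`, `title_score`, and `deduplicate` are hypothetical. Only the 40% threshold and the idea of ranking by search items found in titles come from the abstract.

```python
import re


def words(text):
    """Return the set of lowercase alphanumeric words in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_percentage(a, b):
    """Assumed measure: percentage of the shorter text's distinct
    words that also appear in the other text."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return 100.0 * len(wa & wb) / min(len(wa), len(wb))


def title_score(title, search_terms):
    """Number of search terms that occur as words in the title."""
    return len(words(title) & {t.lower() for t in search_terms})


def deduplicate(articles, search_terms, threshold=40.0):
    """Group (title, body) pairs whose bodies match at or above the
    threshold, then keep the best-ranked title from each group.

    Greedy grouping against the first article of each group is an
    assumption, not the paper's stated procedure.
    """
    groups = []
    for article in articles:
        for group in groups:
            if match_percentage(article[1], group[0][1]) >= threshold:
                group.append(article)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([article])
    # max() keeps the earliest article when scores tie.
    return [max(g, key=lambda a: title_score(a[0], search_terms))
            for g in groups]
```

Calling `deduplicate(articles, search_terms)` with a list of `(title, body)` tuples returns one article per group of suspected duplicates. The threshold is left as a parameter because the 40% figure was reported for Malaysian online news feeds and may not transfer to other sources.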
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | Authors are affiliated with the Faculty of Information Technology, Multimedia University, Cyberjaya. |
Uncontrolled Keywords: | duplicate removal; word matching; RSS feeds; news; ranking |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Divisions: | Others > Non Sunway Academics |
Depositing User: | Administrator Admin |
Date Deposited: | 17 Oct 2012 03:41 |
Last Modified: | 17 Oct 2012 03:41 |
URI: | http://eprints.sunway.edu.my/id/eprint/116 |