Mining templates from search result records of search engines

Metasearch engines, comparison-shopping systems, and Deep Web crawling applications need to extract the search result records embedded in the result pages that search engines return in response to user queries. The search result records from a given search engine are usually formatted according to a template. Precisely identifying this template greatly helps in correctly extracting and annotating the data units within each record. In this paper, we propose a graph model to represent the record template and develop a domain-independent statistical method to automatically mine the record template for any search engine from sample search result records. Our approach can identify both template tags (HTML tags) and template texts (non-tag texts), and it explicitly addresses the mismatches between the tag structures and the data structures of search result records. Our experimental results indicate that this approach is very effective.

Categories and Subject Descriptors H.3.5 [Information Storage and Retrieval]:...
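
To make the distinction between template tags and template texts concrete, here is a minimal Python sketch. It is not the graph model or the statistical method proposed in the paper; it substitutes a much simpler heuristic, assuming that a token appearing in nearly every sample record belongs to the template. The function names, the `min_support` parameter, and the sample records are all hypothetical.

```python
import re
from collections import Counter

# A token is either a whole HTML tag or a run of non-space, non-tag characters.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")
# Used to drop attributes so that <a href="/p/1"> and <a href="/p/2"> compare equal.
TAG_RE = re.compile(r"<(/?)\s*(\w+)[^>]*>")


def tokenize(record_html):
    """Split one record into normalized tags and plain text tokens."""
    tokens = []
    for tok in TOKEN_RE.findall(record_html):
        if tok.startswith("<"):
            tok = TAG_RE.sub(r"<\1\2>", tok).lower()
        tokens.append(tok)
    return tokens


def mine_template_tokens(records, min_support=0.8):
    """Return (template_tags, template_texts) as two sets.

    Assumption (not from the paper): a token is part of the template
    if it occurs in at least `min_support` of the sample records.
    """
    counts = Counter()
    for record in records:
        counts.update(set(tokenize(record)))  # count each token once per record
    threshold = min_support * len(records)
    frequent = {tok for tok, n in counts.items() if n >= threshold}
    tags = {tok for tok in frequent if tok.startswith("<")}
    return tags, frequent - tags


# Hypothetical sample records from one imaginary search engine.
samples = [
    '<b>Title:</b> <a href="/p/1">Red Widget</a> Price: $10',
    '<b>Title:</b> <a href="/p/2">Blue Gadget</a> Price: $25',
    '<b>Title:</b> <a href="/p/3">Green Gizmo</a> Price: $7',
]
template_tags, template_texts = mine_template_tokens(samples)
print(sorted(template_tags))
print(sorted(template_texts))
```

A frequency threshold like this ignores token order and nesting, and it does nothing about the mismatches between tag structure and data structure that the abstract mentions, so it should be read only as an illustration of the vocabulary.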

Hongkun Zhao, Weiyi Meng, Clement T. Yu

Data Mining | KDD 2007 | Record Template | Search Engine | Template Tags

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2007
Where	KDD
Authors	Hongkun Zhao, Weiyi Meng, Clement T. Yu
