An Unsupervised Approach for Product Record Normalization across Different Web Sites

An unsupervised probabilistic learning framework for normalizing product records across different retailer Web sites is presented. Our framework decomposes the problem into two tasks. The first task extracts attribute values of products from different sites and normalizes them into appropriate reference attributes. This task is challenging because the set of reference attributes is unknown in advance and because layout formats differ from one Web site to another. The second task, product record normalization, identifies product records that refer to the same reference product, based on the results of the first task. We develop a graphical model for the generation of text fragments in Web pages to accomplish the two tasks. One characteristic of our model is that the product attributes to be extracted need not be specified in advance, and an unlimited number of previously unseen product attributes can be handled. We compar...
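
To make the two-task decomposition concrete, here is a minimal toy sketch in Python. It is not the paper's graphical model: it swaps in simple token-overlap (Jaccard) heuristics for the probabilistic inference, and every name in it (`label_similarity`, `normalize_attributes`, `same_product`, the thresholds, the sample records) is a hypothetical chosen for illustration. It only shows the shape of the pipeline: reference attributes are discovered from the data rather than listed in advance, and record matching then operates on the normalized attributes.

```python
import re


def tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def label_similarity(a, b):
    """Jaccard overlap between the token sets of two attribute labels."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def normalize_attributes(records, threshold=0.5):
    """Task 1 (toy stand-in): map each site-specific attribute label to a
    reference attribute. Reference attributes are not given in advance;
    a label that matches no existing reference becomes a new one."""
    references = []  # reference attributes discovered so far
    mapping = {}     # site-specific label -> reference attribute
    for record in records:
        for label in record:
            if label in mapping:
                continue
            best = max(references,
                       key=lambda ref: label_similarity(ref, label),
                       default=None)
            if best is not None and label_similarity(best, label) >= threshold:
                mapping[label] = best
            else:
                references.append(label)
                mapping[label] = label
    return mapping


def same_product(rec_a, rec_b, mapping, threshold=0.5):
    """Task 2 (toy stand-in): decide whether two records refer to the same
    product by comparing their normalized (attribute, value) pairs."""
    a = {(mapping[k], v.lower()) for k, v in rec_a.items()}
    b = {(mapping[k], v.lower()) for k, v in rec_b.items()}
    union = a | b
    return bool(union) and len(a & b) / len(union) >= threshold


# Hypothetical records from different retailer sites (invented sample data).
site_a = {"Brand": "Canon", "Resolution": "10 MP", "Optical Zoom": "3x"}
site_b = {"brand name": "Canon", "resolution": "10 MP", "zoom (optical)": "3x"}
site_c = {"Brand": "Nikon", "Resolution": "12 MP", "Optical Zoom": "3x"}

mapping = normalize_attributes([site_a, site_b, site_c])
print(mapping["brand name"])                  # reference attribute for site B's label
print(same_product(site_a, site_b, mapping))  # records A and B
print(same_product(site_a, site_c, mapping))  # records A and C
```

The greedy threshold matching here is only a placeholder for the first task; the abstract describes a graphical model over text fragments, which this sketch does not attempt to reproduce.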

Tak-Lam Wong, Tik-Shun Wong, Wai Lam

AAAI 2008 | Intelligent Agents | Product Records | Reference Attributes | Web Sites

» Mining Web Sites Using Wrapper Induction Named Entities and Postprocessing

» Web based information for product ranking in ebusiness a fuzzy approach

» 2D Conditional Random Fields for Web information extraction

Post Info

Added	02 Oct 2010
Updated	02 Oct 2010
Type	Conference
Year	2008
Where	AAAI
Authors	Tak-Lam Wong, Tik-Shun Wong, Wai Lam
