While much of the data on the web is unstructured in nature, there is also a significant amount of embedded structured data, such as product information on e-commerce sites or sto...
An unsupervised probabilistic learning framework for normalizing product records across different retailer Web sites is presented. Our framework decomposes the problem into two ta...
As a good complement to page content, anchor texts have been extensively used, and proven to be useful, in commercial search engines. However, anchor texts have been assumed to be...
Zhicheng Dou, Ruihua Song, Jian-Yun Nie, Ji-Rong W...
The paper introduces an approach that organizes retrieval results semantically and displays them spatially for browsing. Latent Semantic Analysis as well as cluster techniques are...
Being the client's first interface, call centres worldwide contain a huge amount of information of all kind under the form of conversational speech. If accessible, this infor...
Martine Garnier-Rizet, Gilles Adda, Frederik Caill...