Large collections of documents containing various types of multimedia are made available on the WWW. Unfortunately, due to the unstructured nature of Internet environments it is ha...
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...
Multiple-dimensional, i.e., polyadic, data exist in many applications, such as personalized recommendation and multiple-dimensional data summarization. Analyzing all the dimensions...
This paper presents a novel prototype hierarchy based clustering (PHC) framework for the organization of web collections. It simultaneously solves the problem of categorizing web ...
In many Web search applications, similarities between objects of one type (say, queries) can be affected by the similarities between their interrelated objects of another type (sa...