As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retri...
The Web has grown beyond anybody’s imagination. While significant research has been devoted to understanding aspects of the Web from the perspective of the documents that compr...
There is an increase in the number of data sources that can be queried across the WWW. Such sources typically support HTML forms-based interfaces and search engines query collecti...
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant inf...
Recently, the extraordinary growth in the World Wide Web has been sweeping through business and industry. Many companies have developed or integrated their mission-critical applica...
Chien-Hung Liu, David Chenho Kung, Pei Hsia, Chih-...