Browsing and searching for documents in large, online enterprise document repositories are common activities. While internet search produces satisfying results for most user queri...
Andreas Girgensohn, Frank M. Shipman III, Francine...
Code duplication, plausibly caused by copying source code and slightly modifying it, is often observed in large systems. Clone detection and documentation have been investigated b...
Magdalena Balazinska, Ettore Merlo, Michel Dagenai...
In this paper we study the problem of collecting training samples for building enterprise taxonomies. We develop a computer-aided tool named InfoAnalyzer, which can effectively as...
A new dictionary-based text categorization approach is proposed to classify the chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-re...
Chunyan Liang, Li Guo, Zhaojie Xia, Feng-Guang Nie...
The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents ...
Eric J. Glover, Kostas Tsioutsiouliklis, Steve Law...