Abstract—It is well known that for finite-sized networks, onestep retrieval in the autoassociative Willshaw net is a suboptimal way to extract the information stored in the syna...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Abstract. We focus on two recently proposed algorithms in the family of “boosting”-based learners for automated text classification, AdaBoost.MH and AdaBoost.MHKR . While the ...
Pio Nardiello, Fabrizio Sebastiani, Alessandro Spe...
It is well known that anchor text plays a critical role in a variety of search tasks performed over hypertextual domains, including enterprise search, wiki search, and web search....
Donald Metzler, Jasmine Novak, Hang Cui, Srihari R...
We describe an adaptive method for extracting records from web pages. Our algorithm combines a weighted tree matching metric with clustering for obtaining data extraction patterns...