We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
Background: Health and disease of organisms are reflected in their phenotypes. Often, a genetic component to a disease is discovered only after clearly defining its phenotype. In ...
Philip Groth, Bertram Weiss, Hans-Dieter Pohlenz, ...
Due to their capability for expressing semantics and relationships among data objects, semi-structured documents have become a common way of representing domain knowledge. Compari...
Henry Tan, Tharam S. Dillon, Fedja Hadzic, Elizabe...
— Many information retrieval and machine learning methods have not evolved in order to be applied to the Web. Two main problems in applying some machine learning techniques for W...
Having access to large data sets for the purpose of predictive data mining does not guarantee good models, even when the size of the training data is virtually unlimited. Instead,...