Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...
Text classification systems for biomedical literature aim to select articles relevant to a specific issue from large corpora. Most systems with acceptable accuracy are based o...
Hypertext categorization is the task of automatically assigning category labels to hypertext units. Like text categorization, it falls in the area of function lea...
CASPUR allows many Italian academic institutions located in the Centre-South of Italy to access more than 7 million articles through a digital library platform. We anal...
We propose three heuristics to determine the country of origin of a person or institution via text-based IE from the Web. We evaluate all methods on a collection of music artists ...
Markus Schedl, Klaus Seyerlehner, Dominik Schnitze...