Scholarly entities, such as articles, journals, authors and institutions, are now mostly ranked according to expert opinion and citation data. The Andrew W. Mellon Foundation fund...
Johan Bollen, Marko A. Rodriguez, Herbert Van de S...
The paper describes some innovations related to the ongoing work on the GSA prototype, an integrated information retrieval agent. In order to improve the original system effective...
Giovambattista Ianni, Francesco Ricca, Francesco C...
First generation Web-content encodes information in handwritten (HTML) Web pages. Second generation Web content generates HTML pages on demand, e.g. by filling in templates with c...
Jacco van Ossenbruggen, Joost Geurts, Frank Cornel...
We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...
Address standardization is a very challenging task in data cleansing. To provide better customer relationship management and business intelligence for customer-oriented cooperates...