This paper describes DUTIR at TREC 2007 Blog Track. In data preprocessing, a non English language list created from the corpus was used to remove the non English blogs, blog templ...
Rui Song, Qin Tang, Daming Shi 0002, Hongfei Lin, ...
Abstract. In Data Warehouse (DW) scenarios, ETL (Extraction, Transformation, Loading) processes are responsible for the extraction of data from heterogeneous operational data sourc...
In this paper, we consider the problem of extracting structured data from web pages taking into account both the content of individual attributes as well as the structure of pages...
The concept of pattern, meant as an interesting knowledge artifact extracted from data, is considered to be a an effective answer to the advanced analysis requirements emerging in ...
In this paper we present a keyphrase extraction system that can extract potential phrases from a single document in an unsupervised, domain-independent way. We extract word n-grams...
Nirmala Pudota, Antonina Dattolo, Andrea Baruzzo, ...