Sciweavers

24 search results - page 2 / 5
» Automatic Extraction of Textual Elements from News Web Pages
WWW
2004
ACM
14 years 8 months ago
Automatic web news extraction using tree edit distance
The Web poses itself as the largest data repository ever available in the history of humankind. Major efforts have been made in order to provide efficient access to relevant infor...
Davi de Castro Reis, Paulo Braz Golgher, Altigran ...
LREC
2008
13 years 9 months ago
Automatic Identification of Temporal Information in Tourism Web Pages
This paper presents our work on the detection of temporal information in web pages. The pages examined within the scope of this study were taken from the tourism sector and the te...
Stéphanie Weiser, Philippe Laublet, Jean-Lu...
HICSS
2008
IEEE
14 years 1 month ago
Using Visual Features for Fine-Grained Genre Classification of Web Pages
The field of automatic genre classification has primarily focused on extracting textual features from documents. The goal of this research is to investigate whether visual feature...
Ryan Levering, Michal Cutler, Lei Yu
KDD
1999
ACM
13 years 11 months ago
Text Mining: Finding Nuggets in Mountains of Textual Data
Text mining applies the same analytical functions of data mining to the domain of textual information, relying on sophisticated text analysis techniques that distill information from f...
Jochen Dörre, Peter Gerstl, Roland Seiffert
LREC
2010
13 years 9 months ago
BlogBuster: A Tool for Extracting Corpora from the Blogosphere
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
Georgios Petasis, Dimitrios Petasis