Sciweavers

Identifying Content Blocks from Web Documents
WWW
2006
ACM
14 years 9 months ago
Robust web content extraction
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
WWW
2006
ACM
14 years 9 months ago
Using graph matching techniques to wrap data from PDF documents
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...
Tamir Hassan, Robert Baumgartner
WSDM
2010
ACM
14 years 6 months ago
Learning Similarity Metrics for Event Identification in Social Media
Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for users looking to share their experiences and interests on the Web. These sites host ...
Hila Becker, Mor Naaman, Luis Gravano
WWW
2004
ACM
14 years 9 months ago
Learning block importance models for web pages
Some previous works show that a web page can be partitioned to multiple segments or blocks, and usually the importance of those blocks in a page is not equivalent. Also, it is pro...
Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
BXML
2003
13 years 10 months ago
An XML-based Component Architecture for Personalized Adaptive Web Applications
Developing personalized applications for the ubiquitous Web assumes to create content that can be automatically adapted to both different presentation platforms and user preferen...
Zoltán Fiala, Michael Hinz, Frank Wehner