Sciweavers

309 search results - page 44 / 62
» An Analysis of Web Documents Retrieved and Viewed
Sort
View
WWW
2002
ACM
14 years 8 months ago
A machine learning based approach for table detection on the web
Table is a commonly used presentation scheme, especially for describing relational information. However, table understanding remains an open problem. In this paper, we consider th...
Yalin Wang, Jianying Hu
WWW
2010
ACM
14 years 2 months ago
Not so creepy crawler: easy crawler generation with standard xml queries
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
WSDM
2010
ACM
265views Data Mining» more  WSDM 2010»
14 years 5 months ago
Data-oriented Content Query System: Searching for Data into Text on the Web
As the Web provides rich data embedded in the immense contents inside pages, we witness many ad-hoc efforts for exploiting fine granularity information across Web text, such as We...
Kevin Chen-Chuan Chang, Mianwei Zhou, Tao Cheng
WWW
2005
ACM
14 years 8 months ago
Improving Web search efficiency via a locality based static pruning method
The unarguably fast, and continuous, growth of the volume of indexed (and indexable) documents on the Web poses a great challenge for search engines. This is true regarding not on...
Edleno Silva de Moura, Célia Francisca dos ...