Sciweavers

WWW
2009
ACM
14 years 2 months ago
Network-aware forward caching
This paper proposes and evaluates a Network Aware Forward Caching approach for determining the optimal deployment strategy of forward caches to a network. A key advantage of this ...
Jeffrey Erman, Alexandre Gerber, Mohammad Taghi Ha...
WWW
2009
ACM
14 years 2 months ago
Building term suggestion relational graphs from collective intelligence
This paper proposes an effective approach to provide relevant search terms for conceptual Web search. A ‘Semantic Term Suggestion’ function has been included so that users can f...
Jyh-Ren Shieh, Yung-Huan Hsieh, Yang-Ting Yeh, Tse...
WWW
2009
ACM
14 years 2 months ago
News article extraction with template-independent wrapper
We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
WWW
2009
ACM
14 years 2 months ago
Threshold selection for web-page classification with highly skewed class distribution
We propose a novel cost-efficient approach to threshold selection for binary web-page classification problems with imbalanced class distributions. In many binary-classification ta...
Xiaofeng He, Lei Duan, Yiping Zhou, Byron Dom
WWW
2009
ACM
14 years 8 months ago
Smart Miner: a new framework for mining large scale web usage data
In this paper, we propose a novel framework called SmartMiner for the web usage mining problem, which uses link information for producing accurate user sessions and frequent navigation...
Murat Ali Bayir, Ismail Hakki Toroslu, Ahmet Cosar...
WWW
2009
ACM
14 years 8 months ago
Mining multilingual topics from Wikipedia
In this paper, we try to leverage a large-scale and multilingual knowledge base, Wikipedia, to help effectively analyze and organize Web information written in different languages...
Xiaochuan Ni, Jian-Tao Sun, Jian Hu, Zheng Chen
WWW
2009
ACM
14 years 8 months ago
Estimating web site readability using content extraction
Nowadays, information is primarily searched on the WWW. From a user perspective, readability is an important criterion for measuring the accessibility and thereby the quality ...
Thomas Gottron, Ludger Martin
WWW
2009
ACM
14 years 8 months ago
C-SPARQL: SPARQL for continuous querying
C-SPARQL is an extension of SPARQL to support continuous queries over RDF data streams. Supporting streams in RDF format guarantees interoperability and opens up important applica...
Davide Francesco Barbieri, Daniele Braga, Stefano ...
WWW
2009
ACM
14 years 8 months ago
Purely URL-based topic classification
Given only the URL of a web page, can we identify its topic? This is the question that we examine in this paper. Usually, web pages are classified using their content [7], but a U...
Eda Baykan, Monika Rauch Henzinger, Ludmila Marian...
WWW
2009
ACM
14 years 8 months ago
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth