Sciweavers

WWW
2007
ACM
GigaHash: scalable minimal perfect hashing for billions of urls
A minimal perfect hash function maps a static set of n keys onto the range of integers {0, 1, 2, ..., n - 1}. We present a scalable, high-performance algorithm based on random graphs for ...
Kumar Chellapilla, Anton Mityagin, Denis Xavier Ch...
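The GigaHash abstract gives only the definition. To make it concrete, here is a minimal Python sketch of the classic random-graph construction of Czech, Havas and Majewski. It is not GigaHash's own algorithm, which the truncated abstract does not describe. Each key becomes an edge between two hashed vertices in a graph with m > 2n vertices. If that graph is acyclic, vertex labels g can be chosen so that (g[u] + g[v]) mod n equals the key's position, which gives a minimal perfect (and order-preserving) hash. The class name `MinimalPerfectHash`, the salted BLAKE2b vertex hashing, and the size factor c = 2.1 are assumptions of this sketch, not details from the paper.

```python
import hashlib
from typing import List, Optional, Sequence, Tuple


def _vertex(key: str, seed: int, m: int) -> int:
    """Hash `key` with a seed-dependent salt into a vertex index in [0, m)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


def _assign(adj: List[List[Tuple[int, int]]], n: int) -> Optional[List[int]]:
    """Label vertices so g[u] + g[v] == edge id (mod n); None if a cycle exists."""
    g = [0] * len(adj)
    visited = [False] * len(adj)
    for root in range(len(adj)):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, -1)]  # (vertex, id of the tree edge that reached it)
        while stack:
            u, parent_edge = stack.pop()
            for v, i in adj[u]:
                if i == parent_edge:
                    continue
                if visited[v]:
                    return None  # second path to v (or a self-loop): cycle
                visited[v] = True
                g[v] = (i - g[u]) % n  # forces g[u] + g[v] == i (mod n)
                stack.append((v, i))
    return g


class MinimalPerfectHash:
    """Hypothetical order-preserving MPHF: the i-th key in `keys` hashes to i."""

    def __init__(self, keys: Sequence[str], c: float = 2.1, max_tries: int = 100):
        self.n = len(keys)
        self.m = int(c * self.n) + 1  # assumed c > 2 keeps cycles unlikely
        for attempt in range(max_tries):
            # Fresh pair of hash seeds per attempt; retry until the graph is acyclic.
            self.s1, self.s2 = 2 * attempt, 2 * attempt + 1
            adj: List[List[Tuple[int, int]]] = [[] for _ in range(self.m)]
            for i, key in enumerate(keys):
                u = _vertex(key, self.s1, self.m)
                v = _vertex(key, self.s2, self.m)
                adj[u].append((v, i))
                adj[v].append((u, i))
            g = _assign(adj, self.n)
            if g is not None:
                self.g = g
                return
        raise RuntimeError("no acyclic graph found; try a larger c")

    def __call__(self, key: str) -> int:
        u = _vertex(key, self.s1, self.m)
        v = _vertex(key, self.s2, self.m)
        return (self.g[u] + self.g[v]) % self.n


urls = ["http://a.example/", "http://b.example/", "http://c.example/"]
h = MinimalPerfectHash(urls)
print([h(u) for u in urls])  # [0, 1, 2]
```

Keys must be distinct: a duplicate produces a repeated edge, i.e. a cycle, so construction fails. Like any minimal perfect hash function, this one does not detect non-members; a key outside the set still maps to some index in {0, ..., n - 1}, so a URL table built on it would store the keys alongside to verify lookups.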
WWW
2007
ACM
A search-based Chinese word segmentation method
In this paper, we propose a novel Chinese word segmentation method which leverages the huge collection of Web documents and search technology. It simultaneously solves ambiguous phra...
Xin-Jing Wang, Yong Qin, Wen Liu
WWW
2007
ACM
Investigating behavioral variability in web search
Understanding the extent to which people's search behaviors differ in terms of the interaction flow and information targeted is important in designing interfaces to help World...
Ryen W. White, Steven M. Drucker
WWW
2007
ACM
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
WWW
2007
ACM
Robust web page segmentation for mobile terminal using content-distances and page layout information
The demand for browsing information from general Web pages using a mobile phone is increasing. However, since the majority of Web pages on the Internet are optimized for browsing f...
Gen Hattori, Keiichiro Hoashi, Kazunori Matsumoto,...
WWW
2007
ACM
U-REST: an unsupervised record extraction system
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
Yuan Kui Shen, David R. Karger
WWW
2007
ACM
EPCI: extracting potentially copyright infringement texts from the web
In this paper, we propose EPCI, a new system that extracts potentially copyright-infringing texts from the Web. EPCI extracts them in the following way: (1) generating a set...
Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu H...
WWW
2007
ACM
Yago: a core of semantic knowledge
We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities a...
Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weiku...
WWW
2007
ACM
Supervised rank aggregation
This paper is concerned with rank aggregation, the task of combining the ranking results of individual rankers in meta-search. Previously, rank aggregation was performed mainly by...
Yu-Ting Liu, Tie-Yan Liu, Tao Qin, Zhiming Ma, Han...
WWW
2007
ACM
Towards multi-granularity multi-facet e-book retrieval
Generally speaking, digital libraries have multiple granularities of semantic units: book, chapter, page, paragraph and word. However, there are two limitations of current eBook r...
Chong Huang, YongHong Tian, Zhi Zhou, Tiejun Huang