Internet Technology | Sciweavers


WWW
2007
ACM


GigaHash: scalable minimal perfect hashing for billions of urls

16 years 7 months ago

Download www2007.org

A minimal perfect function maps a static set of n keys onto the range of integers {0, 1, 2, ..., n - 1}. We present a scalable high performance algorithm based on random graphs for ...

Kumar Chellapilla, Anton Mityagin, Denis Xavier Ch...
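
To make the definition in the abstract above concrete, here is a minimal sketch of a minimal perfect hash over a tiny static key set. It is not the GigaHash algorithm (which the abstract says is based on random graphs); it simply brute-forces a seed until a salted hash happens to be a bijection onto {0, ..., n - 1}, which is only practical for a handful of keys. The names `slot`, `find_seed`, and the example URLs are hypothetical and chosen for illustration.

```python
import hashlib


def slot(key: str, seed: int, n: int) -> int:
    """Map a key to a slot in [0, n) using a seed-salted SHA-256 digest."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n


def find_seed(keys: list[str], max_tries: int = 1_000_000) -> int:
    """Brute-force a seed for which slot() is collision-free on `keys`.

    Illustrative only: the expected number of tries grows roughly like
    n**n / n!, so this does not scale beyond very small key sets.
    """
    n = len(keys)
    for seed in range(max_tries):
        if len({slot(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no collision-free seed found within max_tries")


# Hypothetical static key set (n = 5).
urls = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
    "http://example.com/d",
    "http://example.com/e",
]
seed = find_seed(urls)
slots = {u: slot(u, seed, len(urls)) for u in urls}
```

Once a seed is found, every key in the static set has its own slot and no slot is left unused, which is what separates a minimal perfect function from an ordinary hash table with collisions or empty buckets.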


WWW
2007
ACM


A search-based Chinese word segmentation method

16 years 7 months ago

Download www2007.org

In this paper, we propose a novel Chinese word segmentation method which leverages the huge deposit of Web documents and search technology. It simultaneously solves ambiguous phra...

Xin-Jing Wang, Yong Qin, Wen Liu


WWW
2007
ACM


Investigating behavioral variability in web search

16 years 7 months ago

Download www2007.org

Understanding the extent to which people's search behaviors differ in terms of the interaction flow and information targeted is important in designing interfaces to help World...

Ryen W. White, Steven M. Drucker


WWW
2007
ACM


Towards domain-independent information extraction from web tables

16 years 7 months ago

Download www2007.org

Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...

Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...


WWW
2007
ACM


Robust web page segmentation for mobile terminal using content-distances and page layout information

16 years 7 months ago

Download www2007.org

The demand for browsing information from general Web pages using a mobile phone is increasing. However, since the majority of Web pages on the Internet are optimized for browsing f...

Gen Hattori, Keiichiro Hoashi, Kazunori Matsumoto,...


WWW
2007
ACM


U-REST: an unsupervised record extraction system

16 years 7 months ago

Download people.csail.mit.edu

In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...

Yuan Kui Shen, David R. Karger


WWW
2007
ACM


EPCI: extracting potentially copyright infringement texts from the web

16 years 7 months ago

Download www2007.org

In this paper, we propose a new system extracting potentially copyright infringement texts from the Web, called EPCI. EPCI extracts them in the following way: (1) generating a set...

Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu H...


WWW
2007
ACM


Yago: a core of semantic knowledge

16 years 7 months ago

Download www2007.org

We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities a...

Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weiku...


WWW
2007
ACM


Supervised rank aggregation

16 years 7 months ago

Download www2007.org

This paper is concerned with rank aggregation, the task of combining the ranking results of individual rankers at meta-search. Previously, rank aggregation was performed mainly by...

Yu-Ting Liu, Tie-Yan Liu, Tao Qin, Zhiming Ma, Han...


WWW
2007
ACM


Towards multi-granularity multi-facet e-book retrieval

16 years 7 months ago

Download www2007.org

Generally speaking, digital libraries have multiple granularities of semantic units: book, chapter, page, paragraph and word. However, there are two limitations of current eBook r...

Chong Huang, YongHong Tian, Zhi Zhou, Tiejun Huang
