In web search, recency ranking refers to ranking documents by relevance while also taking freshness into account. In this paper, we propose a retrieval system which automatically detect...
Anlei Dong, Yi Chang, Zhaohui Zheng, Gilad Mishne,...
The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Bicycling is an affordable, environmentally friendly alternative transportation mode to motorized travel. A common task performed by bikers is to find good routes in an area, whe...
In recent years, the recognition of Farsi and Arabic handwriting is drawing increasing attention. This paper describes the result of the ICDAR 2009 competition for handwritten Far...