Sciweavers

» Postal Address Detection from Web Documents
ICDAR
2009
IEEE
13 years 5 months ago
Identification of Very Similar Filled-in Forms with a Reject Option
In this work, a technique for the reliable identification of very similar filled-in forms, with a reject option, is proposed. The method is based on the automatic detecti...
Joaquim Arlandis, Juan Carlos Pérez-Cortes,...
VLDB
2003
ACM
14 years 8 months ago
THESUS: Organizing Web document collections based on link semantics
Abstract. The requirements for effective search and management of the WWW are stronger than ever. Currently, Web documents are classified based on their content, not taking into acco...
Maria Halkidi, Benjamin Nguyen, Iraklis Varlamis, ...
ICDE
2005
IEEE
14 years 1 months ago
WEBVIGIL: Monitoring Multiple Web Pages and Presentation of XML Pages
In the case of large-scale distributed environments such as the Internet, users are interested in monitoring changes to a particular web page (XML or HTML). There are many instanc...
Shravan Chamakura, Alpa Sachde, Sharma Chakravarth...
HT
2003
ACM
14 years 1 months ago
Enhanced web document summarization using hyperlinks
This paper addresses the issue of Web document summarization. As the textual content of Web documents is often scarce or irrelevant and existing summarization techniques are based on ...
Jean-Yves Delort, Bernadette Bouchon-Meunier, Mari...
COLING
2010
13 years 2 months ago
Large Scale Parallel Document Mining for Machine Translation
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...