Sciweavers

Web Document Modeling

WWW
2010
ACM
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han

ACSW
2004
Discovering Parallel Text from the World Wide Web
A parallel corpus is a rich linguistic resource for various multilingual text-management tasks, including cross-lingual text retrieval, multilingual computational linguistics and mul...
Jisong Chen, Rowena Chau, Chung-Hsing Yeh

EUSFLAT
2003
Fuzzy clustering for indexing in the GAMBAL information retrieval system
GAMBAL is an information retrieval system for indexing and accessing web pages; it includes graphical interfaces that make searching and accessing web pages easier. In particular, the interfa...
Vicenç Torra, Sergi Lanau, Sadaaki Miyamoto

ICDAR
2003
IEEE
Web Page Summarization for Handheld Devices: A Natural Language Approach
Summarization of web pages is an interesting topic from both academic and commercial points of view. Academically, it is challenging to create a summary of a document (e.g. a w...
Hassan Alam, Rachmat Hartono, Aman Kumar, Ahmad Fu...

AGENTS
1999
Springer
Adaptive Web Site Agents
We discuss the design of a class of agents that we call adaptive web site agents. The goal of such an agent is to help a user find information at a particular web site, adapting i...
Michael J. Pazzani, Daniel Billsus