Sciweavers

» Template-Based Information Mining from HTML Documents
BTW
2003
Springer
14 years 19 days ago
XPath-Aware Chunking of XML-Documents
Dissemination systems are used to route information received from many publishers individually to multiple subscribers. The core of a dissemination system consists of an efficient...
Wolfgang Lehner, Florian Irmert
KDD
2003
ACM
14 years 7 months ago
Similarity analysis on government regulations
Government regulations are semi-structured text documents that are often voluminous, heavily cross-referenced between provisions and even ambiguous. Multiple sources of regulation...
Gloria T. Lau, Kincho H. Law, Gio Wiederhold
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
CORR
2010
Springer
13 years 7 months ago
Text Classification using the Concept of Association Rule of Data Mining
As the amount of online text increases, the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of...
Chowdhury Mofizur Rahman, Ferdous Ahmed Sohel, Par...
ACL
2009
13 years 5 months ago
MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval
In this paper, we introduce a multilingual access and retrieval system with enhanced query translation and multilingual document retrieval, by mining bilingual terminologies and a...
Lianhau Lee, AiTi Aw, Thuy Vu, Sharifah Aljunied M...