Sciweavers

WWW
2004
ACM
14 years 8 months ago
Managing versions of web documents in a transaction-time web server
This paper presents a transaction-time HTTP server, called TTApache, that supports document versioning. A document often consists of a main file formatted in HTML or XML and severa...
Curtis E. Dyreson, Hui-ling Lin, Yingxia Wang
KDD
2004
ACM
14 years 8 months ago
Boosting for Text Classification with Semantic Features
Current text classification systems typically use term stems for representing document content. Semantic Web technologies allow the usage of features on a higher semantic...
Stephan Bloehdorn, Andreas Hotho
KES
2006
Springer
13 years 7 months ago
Integrated Document Browsing and Data Acquisition for Building Large Ontologies
Named entities (e.g., "Kofi Annan", "Coca-Cola", "Second World War") are ubiquitous in web pages and other types of document and often provide a simpl...
Felix Weigel, Klaus U. Schulz, Levin Brunner, Edua...
CORR
2007
Springer
13 years 7 months ago
Dirac Notation, Fock Space and Riemann Metric Tensor in Information Retrieval Models
Using Dirac notation as a powerful tool, we investigate the three classical Information Retrieval (IR) models and some of their extensions. We show that almost all such models can be...
Xing M. Wang
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han