Sciweavers

232 search results - page 29 / 47
» Query-related data extraction of hidden web documents
Sort
View
TOIS
2008
145views more  TOIS 2008»
13 years 7 months ago
Classification-aware hidden-web text database selection
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over multip...
Panagiotis G. Ipeirotis, Luis Gravano
PVLDB
2008
141views more  PVLDB 2008»
13 years 7 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
PAKDD
2009
ACM
116views Data Mining» more  PAKDD 2009»
14 years 2 months ago
Scalable Web Mining with Newistic
Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...
Ovidiu Dan, Horatiu Mocian
KDD
2004
ACM
160views Data Mining» more  KDD 2004»
14 years 8 months ago
Boosting for Text Classification with Semantic Features
Abstract. Current text classification systems typically use term stems for representing document content. Semantic Web technologies allow the usage of features on a higher semantic...
Stephan Bloehdorn, Andreas Hotho
KDD
2005
ACM
194views Data Mining» more  KDD 2005»
14 years 8 months ago
Web object indexing using domain knowledge
Web object is defined to represent any meaningful object embedded in web pages (e.g. images, music) or pointed to by hyperlinks (e.g. downloadable files). Users usually search for...
Muyuan Wang, Zhiwei Li, Lie Lu, Wei-Ying Ma, Naiya...