Sciweavers

139 search results - page 10 / 28
» An Approach to Identify Duplicated Web Pages
Sort
View
COLCOM
2008
IEEE
13 years 9 months ago
Web Canary: A Virtualized Web Browser to Support Large-Scale Silent Collaboration in Detecting Malicious Web Sites
Abstract. Malicious Web content poses a serious threat to the Internet, organizations and users. Current approaches to detecting malicious Web content employ high-powered honey cli...
Jiang Wang, Anup K. Ghosh, Yih Huang
CN
1999
242views more  CN 1999»
13 years 7 months ago
Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource d...
Soumen Chakrabarti, Martin van den Berg, Byron Dom
CIKM
2003
Springer
14 years 28 days ago
Extracting unstructured data from template generated web documents
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...
AIRWEB
2009
Springer
14 years 2 months ago
Looking into the past to better classify web spam
Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However,...
Na Dai, Brian D. Davison, Xiaoguang Qi
WEBDB
2007
Springer
133views Database» more  WEBDB 2007»
14 years 1 months ago
EntityAuthority: Semantically Enriched Graph-Based Authority Propagation
This paper pursues the recently emerging paradigm of searching for entities that are embedded in Web pages. We utilize informationextraction techniques to identify entity candidat...
Julia Stoyanovich, Srikanta J. Bedathur, Klaus Ber...