Sciweavers

348 search results for "Adversarial Information Retrieval on the Web (AIRWeb 2007)", page 65 of 70
WWW 2008 (ACM)
A larger scale study of robots.txt
A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site...
Santanu Kolay
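
The robots exclusion protocol referenced in this abstract can be exercised directly from Python's standard library; a minimal sketch follows, with invented rules, crawler names, and URLs used purely for illustration.

    from urllib import robotparser

    # Hypothetical robots.txt rules: block /private/ for every crawler,
    # but let "GoodBot" fetch anything.
    rules = """
    User-agent: *
    Disallow: /private/

    User-agent: GoodBot
    Disallow:
    """.splitlines()

    rp = robotparser.RobotFileParser()
    rp.parse(rules)
    print(rp.can_fetch("SomeCrawler", "https://example.com/private/page.html"))  # False
    print(rp.can_fetch("GoodBot", "https://example.com/private/page.html"))      # True
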
VLDB 2007 (ACM)
Efficient Keyword Search over Virtual XML Views
Emerging applications such as personalized portals, enterprise search and web integration systems often require keyword search over semi-structured views. However, traditional inf...
Feng Shao, Lin Guo, Chavdar Botev, Anand Bhaskar, ...
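
For context, one naive alternative to the paper's approach is to materialize the view and scan it, which can be sketched in a few lines of Python. The XML snippet, tag names, and query terms below are invented, and this is not the paper's algorithm, which answers keyword queries over virtual, unmaterialized views.

    import xml.etree.ElementTree as ET

    # A materialized toy "view" of semi-structured data; element names and
    # content are invented for illustration.
    view = ET.fromstring("""
    <portal>
      <item><title>Enterprise search basics</title><body>keyword ranking</body></item>
      <item><title>Web integration</title><body>semi-structured data sources</body></item>
    </portal>
    """)

    def keyword_search(root, keywords):
        """Return <item> elements whose subtree text contains every keyword."""
        hits = []
        for elem in root.iter("item"):
            text = " ".join(elem.itertext()).lower()
            if all(k.lower() in text for k in keywords):
                hits.append(elem)
        return hits

    for hit in keyword_search(view, ["enterprise", "ranking"]):
        print(hit.findtext("title"))
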
DASFAA 2007 (IEEE)
Using Redundant Bit Vectors for Near-Duplicate Image Detection
Images are amongst the most widely proliferated form of digital information due to affordable imaging technologies and the Web. In such an environment, the use of digital watermar...
Jun Jie Foo, Ranjan Sinha
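
The technique named in the title can be illustrated with a toy index in the spirit of redundant bit vectors: each dimension is cut into bins, every database point is registered in all bins its tolerance interval touches (the redundancy), and a query reads one bit vector per dimension and combines them with bitwise AND. The sketch below assumes feature vectors already extracted and scaled to [0, 1], an L-infinity match criterion, and Python integers as bitsets; it is not the authors' implementation.

    import numpy as np

    def build_index(points, bins=8, eps=0.05):
        n, d = points.shape
        index = [[0] * bins for _ in range(d)]        # one int bitset per (dim, bin)
        for i, p in enumerate(points):
            for j in range(d):
                lo = int(np.clip((p[j] - eps) * bins, 0, bins - 1))
                hi = int(np.clip((p[j] + eps) * bins, 0, bins - 1))
                for b in range(lo, hi + 1):
                    index[j][b] |= 1 << i             # point i can match queries in bin b
        return index

    def query(index, points, q, bins=8, eps=0.05):
        candidates = (1 << len(points)) - 1           # start with every point
        for j in range(len(q)):
            b = int(np.clip(q[j] * bins, 0, bins - 1))
            candidates &= index[j][b]                 # prune points implausible in dim j
        # Verify the surviving candidates with an exact check.
        return [i for i in range(len(points))
                if (candidates >> i) & 1 and np.max(np.abs(points[i] - q)) <= eps]

    rng = np.random.default_rng(0)
    db = rng.random((1000, 16))                       # 1000 stand-in image signatures
    probe = np.clip(db[42] + rng.normal(0, 0.01, 16), 0, 1)   # a slightly altered copy
    print(query(build_index(db), db, probe))          # expected to report index 42
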
WWW 2007 (ACM)
GigaHash: scalable minimal perfect hashing for billions of urls
A minimal perfect hash function maps a static set of n keys onto the range of integers {0, 1, 2, ..., n - 1}. We present a scalable high-performance algorithm based on random graphs for ...
Kumar Chellapilla, Anton Mityagin, Denis Xavier Ch...
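
The defining property above, n keys mapped bijectively onto {0, 1, ..., n - 1}, can be illustrated with the classic construction over an acyclic random graph, the family of techniques the abstract alludes to. The sketch below is that textbook construction in miniature, not the scalable GigaHash algorithm itself; the hash function and parameters are placeholders.

    import random

    def _h(key, seed, m):
        """Simple seeded string hash; a stand-in for a proper hash family."""
        v = seed
        for ch in key:
            v = (v * 1000003 + ord(ch)) & 0xFFFFFFFF
        return v % m

    def build_mphf(keys, c=2.1, max_tries=100):
        """Minimal perfect hash via an acyclic random graph (CHM-style)."""
        n = len(keys)
        m = max(int(c * n) + 1, 3)                   # graph vertices; c > 2 keeps it acyclic w.h.p.
        for _ in range(max_tries):
            s1, s2 = random.getrandbits(32), random.getrandbits(32)
            if any(_h(k, s1, m) == _h(k, s2, m) for k in keys):
                continue                             # self-loop: this seeding cannot work
            adj = [[] for _ in range(m)]             # vertex -> list of (neighbour, key index)
            for i, k in enumerate(keys):
                u, v = _h(k, s1, m), _h(k, s2, m)
                adj[u].append((v, i))
                adj[v].append((u, i))
            g, seen, acyclic = [0] * m, [False] * m, True
            for root in range(m):
                if seen[root]:
                    continue
                seen[root] = True
                stack = [(root, -1)]
                while stack and acyclic:
                    u, in_edge = stack.pop()
                    for v, i in adj[u]:
                        if i == in_edge:
                            continue                 # skip the edge we arrived on
                        if seen[v]:
                            acyclic = False          # found a cycle: reseed and retry
                            break
                        seen[v] = True
                        g[v] = (i - g[u]) % n        # ensures g[u] + g[v] == i (mod n)
                        stack.append((v, i))
                if not acyclic:
                    break
            if acyclic:
                return lambda key: (g[_h(key, s1, m)] + g[_h(key, s2, m)]) % n
        raise RuntimeError("could not build an acyclic graph; increase c")

    urls = ["http://a.example/", "http://b.example/x", "http://c.example/y?z=1"]
    mph = build_mphf(urls)
    print(sorted(mph(u) for u in urls))              # a permutation of 0, 1, 2
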
ICAIL 2007 (ACM)
Essential deduplication functions for transactional databases in law firms
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Jack G. Conrad, Edward L. Raymond
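
Duplicate detection of the kind described here is commonly built on document fingerprints; a minimal sketch using hashed word shingles and Jaccard similarity is shown below. The shingle size and sample texts are illustrative choices, not the deduplication functions evaluated in the paper.

    import hashlib

    def shingles(text, k=4):
        """Set of hashed k-word shingles for a document."""
        words = text.lower().split()
        return {hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest()
                for i in range(max(len(words) - k + 1, 1))}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0

    doc1 = "This agreement is entered into by and between the parties hereto."
    doc2 = "This agreement is entered into by and between the undersigned parties."
    sim = jaccard(shingles(doc1), shingles(doc2))
    print(sim)                       # compare against a tuned similarity threshold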