In a higher-level task such as the clustering of web search results or word sense disambiguation, knowledge of all the distinct concepts that an ambiguous word can express would...
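As a minimal sketch of what such a sense inventory looks like, the snippet below enumerates the distinct WordNet senses of an ambiguous word using NLTK; the word "bank" is an illustrative choice, and the one-off `nltk.download('wordnet')` setup step is assumed.

```python
# A minimal sketch of enumerating the distinct senses (concepts) of an
# ambiguous word, using NLTK's WordNet interface as the sense inventory.
# Setup assumed: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def distinct_senses(word: str):
    """Return (sense name, gloss) pairs for every WordNet sense of `word`."""
    return [(s.name(), s.definition()) for s in wn.synsets(word)]

for name, gloss in distinct_senses("bank"):
    print(f"{name}: {gloss}")
# e.g. bank.n.01: sloping land (especially the slope beside a body of water)
#      depository_financial_institution.n.01: a financial institution ...
```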
A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site...
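A polite crawler consults these rules before fetching any page. The sketch below uses Python's standard-library robots.txt parser; the site and paths are placeholders, not real rules.

```python
# Honoring the robots exclusion protocol before crawling, using Python's
# standard-library parser. The domain, paths, and user-agent string here
# are placeholders for illustration.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's rules

# Check each candidate URL against the rules for this crawler's user agent.
for url in ("https://example.com/", "https://example.com/private/page"):
    if rp.can_fetch("MyCrawler", url):
        print("allowed:", url)
    else:
        print("disallowed:", url)
```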
Form mapping is the key problem that must be solved to gain access to the hidden web. Currently available solutions for fully automatic mapping are not ready for commercial...
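As a toy sketch of one common mapping heuristic (not the method of any particular system), the code below aligns the fields of a known source form to an unseen target form by string similarity between field labels; real mappers combine many more signals, such as field types, option lists, and page layout, and all field names here are invented for illustration.

```python
# A toy form-mapping heuristic: align each field of a "source" query form
# to the most similar field of a "target" form by label similarity.
# Field names are illustrative only; real systems use many more signals.
from difflib import SequenceMatcher

source_fields = ["author", "title", "year"]
target_fields = ["book title", "writer", "publication year", "isbn"]

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Greedily map each source field to its best-scoring target field.
for src in source_fields:
    best = max(target_fields, key=lambda tgt: similarity(src, tgt))
    print(f"{src} -> {best} (score {similarity(src, best):.2f})")
```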
Most of the faster community extraction algorithms are based on the Clauset, Newman, and Moore (CNM) algorithm, which has been employed for networks of up to 500,000 nodes. The modification...
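For reference, NetworkX ships an implementation of the CNM greedy modularity-maximization algorithm; the sketch below runs it on a small synthetic graph whose two communities are obvious by construction.

```python
# Running CNM-style greedy modularity maximization via NetworkX's
# implementation of the Clauset-Newman-Moore algorithm.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# A small synthetic graph: two triangles joined by a single bridge edge.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2),   # triangle A
                  (3, 4), (4, 5), (3, 5),   # triangle B
                  (2, 3)])                  # bridge edge

communities = greedy_modularity_communities(G)
for i, c in enumerate(communities):
    print(f"community {i}: {sorted(c)}")
# Expected: {0, 1, 2} and {3, 4, 5}
```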
Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the fa...
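One simple way to quantify how much two crawl snapshots disagree (a crude proxy for such errors, not necessarily the measure any particular paper uses) is the Jaccard overlap of their node and edge sets, sketched below on two tiny illustrative graphs.

```python
# Comparing two web-graph snapshots with a simple vertex/edge overlap
# (Jaccard) similarity, as a crude proxy for detecting crawl errors
# between consecutive snapshots. The graphs are illustrative only.
import networkx as nx

def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 1.0 if both are empty."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def snapshot_similarity(g1: nx.DiGraph, g2: nx.DiGraph) -> float:
    """Average of node-set and edge-set Jaccard overlap."""
    nodes = jaccard(set(g1.nodes), set(g2.nodes))
    edges = jaccard(set(g1.edges), set(g2.edges))
    return (nodes + edges) / 2

day1 = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a")])
day2 = nx.DiGraph([("a", "b")])  # page c and its links lost to a crawl error

print(f"similarity: {snapshot_similarity(day1, day2):.2f}")  # 0.50
```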