In this paper, we propose a new system extracting potentially copyright infringement texts from the Web, called EPCI. EPCI extracts them in the following way: (1) generating a set...
Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu H...
Search results generated by searchable databases are served dynamically and far larger than the static documents on the Web. These results pages have been referred to as the Deep ...
Yasuhiro Yamada, Nick Craswell, Tetsuya Nakatoh, S...
Several Web sites deliver a large number of pages, each publishing data about one instance of some real world entity, such as an athlete, a stock quote, a book. Even though it is ...
Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, ...
Abstract. We present WBext (Web Browser extended), a web browser extended with client-side mining capabilities. WBext learns sophisticated user interests and browsing habits by tai...
Querying the Web today can be a frustrating activity because the results delivered by syntactically oriented search engines often do not match the intentions of the user. The DARP...
Grit Denker, Jerry R. Hobbs, David L. Martin, Srin...