In this paper, we describe our work in progress in the scope of information retrieval exploiting the spatial data extracted from web documents. We discuss problems of a search for ...
Stefan Dlugolinsky, Michal Laclavik, Ladislav Hluc...
Existing HTML markup is used only to indicate the structure and layout of documents, not the document semantics. As a result, web documents are difficult to be semantically p...
Indexing quality has an overwhelming effect on the retrieval effectiveness of search engines. In the past few years, it has become one of the major challenges in the search engines are...
Given the increasing traffic on the World Wide Web (Web), it is difficult for a single popular Web server to handle the demand from its many clients. By clustering a group of Web ...
Web documents present new challenges to conventional Information Retrieval (IR) technologies. This paper describes how these challenges are faced in FameIR, a multilingual multime...
The user-observed latency of retrieving Web documents is one of the limiting factors when using the Internet as an information data source. Prefetching has become an important technique ...
Replicating Web documents at a worldwide scale can help reduce user-perceived latency and wide-area network traffic. This paper presents the design of Globule, a platform that aut...
Complex web information structures prevent search engines from providing satisfactory context-sensitive retrieval. We see that in order to overcome this obstacle, it is essential t...
In spite of the wide use of the Internet, it is difficult to develop an evaluation of web documents that reflects users’ needs. Many automatic ranking systems have ...
Cloning is extremely likely to occur in web sites, much more so than in other software. While some clones exist for valid reasons, or are too small to eliminate, cloning percentag...