Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...
Scalable execution of continuous queries over massive data streams often requires splitting input streams into parallel sub-streams over which query operators are executed in paral...
— Massive data analysis on large clusters presents new opportunities and challenges for query optimization. Data partitioning is crucial to performance in this environment. Howev...
— Commercial tuple extraction systems have enjoyed some success to extract tuples by regarding HTML pages as tree structures and exploiting XPath queries to find attributes of t...
Most operations of the relational algebra or SQL - like projection with duplicate elimination, join, ordering, group by and aggregations - are efficiently processed using a sorted...