In this paper, we propose an online algorithm, called FQT-Stream (Frequent Query Trees of Streams), to mine the set of all frequent tree patterns over a continuous XML data strea...
An important issue arising from large-scale data integration is how to efficiently select the top-K ranking answers from multiple sources while minimizing the transmission cost. T...
An important issue arising from Peer-to-Peer applications is how to accurately and efficiently retrieve a set of K best matching data objects from different sources while minimizi...
In the AllRight project, we are developing an algorithm for unsupervised table detection and segmentation that uses the visual rendition of a Web page rather than the HTML code. O...
Web pages include extraneous material that may be viewed as undesirable by a user. Increasingly many Web sites also require users to register to access either all or portions of t...
Contextual search refers to proactively capturing the information need of a user by automatically augmenting the user query with information extracted from the search context; for...
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
The use of Semantic Web Service (SWS) technologies has been suggested to enable more dynamic B2B integration of heterogeneous systems and partners. We present how we add semantic...
Applications and services that access Web data are becoming increasingly useful and widespread. Current mainstream Web query languages such as XQuery, XSLT, or SPARQL, howe...
This paper describes an experimental system in which customized high-performance XML parsers are prepared using parser generation and compilation techniques. Parsing is integrated...
Margaret Gaitatzes Kostoulas, Morris Matsa, Noah M...