XML is becoming a prevalent format for data exchange. Many XML documents have complex schemas that are not always known, and can vary widely between information sources and applica...
Eugene Agichtein, C. T. Howard Ho, Vanja Josifovsk...
In this paper we present a method for automatically segmenting unformatted text records into structured elements. Several useful data sources today are human-generated as continuo...
Vinayak R. Borkar, Kaustubh Deshmukh, Sunita Saraw...
Abstract. We consider the problem of clustering data into k ≥ 2 clusters given complex relations -- going beyond pairwise -- between the data points. The complex n-wise relations ar...
Many users and applications require the integration of semi-structured data from autonomous, heterogeneous Web sources. Over the last few years mediator systems have emerged that use d...
Abstract. Decision-makers in critical fields such as medicine and finance make use of a wide range of information available over the Internet. Mediation, a data integration techn...