Abstract. New methods of data collection, in particular the wide range of sensors and sensor networks that are being constructed, with the ability to collect real-time data streams...
Record deduplication is the task of merging database records that refer to the same underlying entity. In relational databases, accurate deduplication for records of one type is o...
There exist many interrelated information sources on the Internet, which can be categorized as structured (databases) or semistructured (documents). A key challenge is to integrat...
Entity Recognition (ER) is a key component of relation extraction systems and many other natural-language processing applications. Unfortunately, most ER systems are restricted to...
A variety of heterogeneous data sources is available in the field of molecular biology. Our focus is on biological sequence data, i.e., data maintained in collections like EM...