Many applications require analyzing vast amounts of textual data, but the size and inherent noise of such data can make processing very challenging. One approach to these issues i...
David G. Underhill, Luke McDowell, David J. Marche...
Tagged data is rapidly becoming more available on the World Wide Web. Web sites which populate tagging services offer a good way for Internet users to share their knowledge. An in...
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...
We investigate how to organize a large collection of geotagged photos, working with a dataset of about 35 million images collected from Flickr. Our approach combines content analy...
David J. Crandall, Lars Backstrom, Daniel P. Hutte...
—Data fusion is the process of integrating multiple sources of information such that their combination yields better results than if the data sources are used individually. This ...