Powerful SIMD instructions in modern processors offer an opportunity for greater search performance. In this paper, we apply these instructions to decoding search engine posting ...
Alexander A. Stepanov, Anil R. Gangolli, Daniel E....
Various semi-supervised learning methods have been proposed recently to solve the long-standing shortage problem of manually labeled data in sentiment classification. However, mos...
In most of the cases, scientists depend on previous literature which is relevant to their research fields for developing new ideas. However, it is not wise, nor possible, to trac...
Rui Yan, Jie Tang, Xiaobing Liu, Dongdong Shan, Xi...
A continuous top-k query retrieves the k most preferred objects in a data stream according to a given preference function. These queries are important for a broad spectrum of appl...
Avani Shastri, Di Yang, Elke A. Rundensteiner, Mat...
Entity matching (EM) is the task of identifying records that refer to the same real-world entity from different data sources. While EM is widely used in data integration and data...
A key component of BM25 contributing to its success is its sub-linear term frequency (TF) normalization formula. The scale and shape of this TF normalization component is controll...
The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on an even redistribution of data between map and reduce tasks. In the...
The study of information flow analyzes the principles and mechanisms of social information distribution. It is becoming an extremely important research topic in social network re...
Hongliang Fei, Ruoyi Jiang, Yuhao Yang, Bo Luo, Ju...
NoSQL databases focus on analytical processing of large scale datasets, offering increased scalability over commodity hardware. One of their strongest features is elasticity, whi...
Ioannis Konstantinou, Evangelos Angelou, Christina...