Identification of transliterations is aimed at enriching multilingual lexicons and improving performance in various Natural Language Processing (NLP) applications including Cross ...
Finding good representations of text documents is crucial in information retrieval and classification systems. Today the most popular document representation is based on a vector ...
What makes template content on the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statist...
ion and Refinement Hyunyoung Kil Wonhong Nam Dongwon Lee The Pennsylvania State University, University Park, PA 16802, USA {hykil, wnam, dongwon}@psu.edu The behavioral descriptio...
Many real-life datasets have skewed distributions of events, where the probability of observing a few events far exceeds that of the others. In this paper, we observed that in skewed datasets...