Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Named entity (NE) recognition is a task in which proper nouns and numerical information in a document are detected and classified into categories such as person, organization, loc...
We introduce and discuss the application of statistical physics concepts in the context of on-line machine learning processes. The consideration of typical properties of very large...
The voice activity detectors (VADs) based on statistical models have shown impressive performances especially when fairly precise statistical models are employed. Moreover, the ac...
This paper describes a framework for defining domain specific Feature Functions in a user friendly form to be used in a Maximum Entropy Markov Model (MEMM) for the Named Entity Re...