Large-scale learning is often realistic only in a semi-supervised setting, where a small set of labeled examples is available together with a large collection of unlabeled data. In...
The hidden Markov model (HMM) is frequently used for Pinyin-to-Chinese conversion, but it captures only the dependency on the preceding character. Higher-order Markov models can brin...
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
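
As a rough illustration of the two fingerprinting ideas named above, the following Python sketch builds word shingles and a simhash-style fingerprint. It is a minimal sketch under stated assumptions, not the method of the abstract: the function names, the shingle width `k=3`, the 64-bit fingerprint size, and the use of MD5 as the feature hash are all illustrative choices. The shingle comparison here computes exact Jaccard similarity over full shingle sets rather than the sampled sketches used in practice, and the simhash uses the common unweighted bit-voting form.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word windows) of a text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle, or none if the text is empty.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def simhash(features):
    """64-bit simhash-style fingerprint: each feature votes +1/-1 per bit."""
    bits = 64
    counts = [0] * bits
    for feature in features:
        # Assumption: the first 8 bytes of MD5 serve as a 64-bit feature hash.
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # a bit is set when the positive votes win
            fingerprint |= 1 << i
    return fingerprint


def hamming(x, y):
    """Number of bit positions in which two fingerprints differ."""
    return bin(x ^ y).count("1")


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox jumped over the lazy dog"
    sa, sb = shingles(doc_a), shingles(doc_b)
    print("Jaccard similarity:", jaccard(sa, sb))
    print("Simhash Hamming distance:", hamming(simhash(sa), simhash(sb)))
```

In this sketch, two documents would be treated as near-duplicates when their Jaccard similarity is high or the Hamming distance between their fingerprints is small; the thresholds are left to the caller and would need tuning on real data.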