Limit theorems (including a Berry-Esseen bound) are derived for the number of comparisons taken by the Boyer-Moore algorithm for finding the occurrences of a given pattern in a ra...
Given two strings (a text t of length n and a pattern p) and a natural number w, window subsequence problems consist in deciding whether p occurs as a subsequence of t and/or findi...
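As a rough illustration of the problem setting only, the sketch below checks whether p occurs as a subsequence inside a window of w consecutive characters of t, by brute force over all windows. The abstract is truncated, so this reading of "window" is an assumption, and the function names `is_subsequence` and `windows_containing` are hypothetical. It is a naive baseline, not the method of the paper.

```python
def is_subsequence(p: str, s: str) -> bool:
    """Return True if p can be obtained from s by deleting characters."""
    it = iter(s)
    # `ch in it` advances the iterator, so matches must appear in order.
    return all(ch in it for ch in p)


def windows_containing(t: str, p: str, w: int) -> list[int]:
    """Start indices i such that p is a subsequence of t[i:i+w].

    Assumption: a window is a block of w consecutive characters of t.
    Brute force: O(n * w) character comparisons overall.
    """
    last_start = len(t) - w
    return [i for i in range(last_start + 1) if is_subsequence(p, t[i:i + w])]


if __name__ == "__main__":
    hits = windows_containing("abcabc", "ac", 3)
    print("p occurs in some window:", bool(hits))
    print("window start positions:", hits)
```

The decision version is answered by whether the returned list is non-empty; the counting version by its length.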
Semantic text portions (STPs) have recently become popular in the field of Web mining. An STP is a text portion in the original page that is semantically related to the anchor pointing...
Text classification poses some specific challenges. One such challenge is its high dimensionality, where each document (data point) contains only a small subset of the features. In this pap...
Naive Bayes is often used as a baseline in text classification because it is fast and easy to implement. Its severe assumptions make such efficiency possible but also adversely af...
Jason D. Rennie, Lawrence Shih, Jaime Teevan, Davi...
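To make concrete why Naive Bayes is considered fast and easy to implement, here is a minimal sketch of a standard multinomial Naive Bayes text classifier with add-alpha smoothing. It is the textbook baseline, not the corrected variant the authors go on to propose; the function names `train_nb` and `predict_nb` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, alpha=1.0):
    """Collect counts from docs, a list of (tokens, label) pairs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # token counts per class
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            # Independence assumption: per-token log likelihoods simply add.
            likelihood = (word_counts[label][tok] + alpha) / (total + alpha * len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    toy = [
        (["cheap", "pills", "cheap"], "spam"),
        (["meeting", "agenda"], "ham"),
        (["cheap", "offer"], "spam"),
        (["project", "meeting"], "ham"),
    ]
    model = train_nb(toy)
    print(predict_nb(model, ["cheap", "offer"]))
    print(predict_nb(model, ["meeting"]))
```

Training is a single counting pass and prediction is a sum of logarithms, which is where the efficiency comes from; the per-token independence assumption in the inner loop is the kind of severe assumption the abstract refers to.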