Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-...
Most decision tree algorithms base their splitting decisions on a piecewise constant model. Often these splitting algorithms are extrapolated to trees with non-constant models at ...
David S. Vogel, Ognian Asparouhov, Tobias Scheffer
In this paper, we consider the problem of identifying and segmenting topically cohesive regions in the URL tree of a large website. Each page of the website is assumed to have a t...
People display adaptive language behaviors in face-to-face conversations, but will computer users do the same during HCI? We report an experiment (N=20) demonstrating that users...
Jamie Pearson, Jiang Hu, Holly P. Branigan, Martin...