Web spam is behavior that attempts to deceive search engine ranking algorithms. TrustRank is a recent algorithm that can combat web spam. However, TrustRank is vulnerable in the s...
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
To take the first step beyond keyword-based search toward entity-based search, suitable token spans ("spots") on documents must be identified as references to real-world...
Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, ...
Although Web search engines are targeted towards helping people find new information, people regularly use them to re-find Web pages they have seen before. Researchers have noted ...
Recent years have seen the rise of subject-themed digital libraries, such as the NSDL pathways and the Digital Library for Earth System Education (DLESE). These libraries often ne...
Steven Bethard, Soumya Ghosh, James H. Martin, Tam...