Sciweavers

WWW
2003
ACM
14 years 8 months ago
DOM-based content extraction of HTML documents
Web pages often contain clutter (such as pop-up ads, unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction o...
Suhit Gupta, Gail E. Kaiser, David Neistadt, Peter...