Sciweavers

Search results for "Crawling the web for structured documents" (563 results, page 59 of 113)
WWW
2005
ACM
Automatically learning document taxonomies for hierarchical classification
While several hierarchical classification methods have been applied to web content, such techniques invariably rely on a pre-defined taxonomy of documents. We propose a new techni...
Kunal Punera, Suju Rajan, Joydeep Ghosh
DIS
2001
Springer
Eliminating Useless Parts in Semi-structured Documents Using Alternation Counts
We propose a preprocessing method for Web mining which, given semi-structured documents with the same structure and style, distinguishes useless parts and non-useless parts in each...
Daisuke Ikeda, Yasuhiro Yamada, Sachio Hirokawa
WEBDB
2004
Springer
Best-Match Querying from Document-Centric XML
On the Web, there is a pervasive use of XML to give lightweight semantics to textual collections. Such document-centric XML collections require a query language that can gracefully...
Jaap Kamps, Maarten Marx, Maarten de Rijke, Bö...
SIGDOC
2006
ACM
Taming the inaccessible web
Visually impaired users are hindered in their efforts to access the largest repository of electronic information in the world, namely the World Wide Web (Web). A visually impaired...
Simon Harper, Sean Bechhofer, Darren Lunn
KDD
2005
ACM
Web object indexing using domain knowledge
A Web object is defined as any meaningful object embedded in web pages (e.g., images, music) or pointed to by hyperlinks (e.g., downloadable files). Users usually search for...
Muyuan Wang, Zhiwei Li, Lie Lu, Wei-Ying Ma, Naiya...