
CIKM
1999
Springer


Word Segmentation and Recognition for Web Document Framework


Download www.scs.ryerson.ca

It is observed that a better approach to understanding Web information is to base it on the document framework, which consists mainly of (i) the title and the URL name of the page, (ii) the titles and the URL names of the Web pages that it points to, (iii) the alternative information source for the embedded Web objects, and (iv) its linkage to other Web pages of the same document. Investigation reveals that a high percentage of words inside the document framework are "compound words" that cannot be resolved with ordinary dictionaries. They might be abbreviations or acronyms, or concatenations of several (partial) words. To recover the content hierarchy of Web documents, we propose a new word segmentation and recognition mechanism to understand the information derived from the Web document framework. A maximal bi-directional matching algorithm with heuristic rules is used to resolve ambiguous segmentation and meaning in compound words. An adaptive training process is furthe...
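The abstract names a maximal bi-directional matching algorithm but does not spell out its heuristic rules or training step. As a rough illustration only, the sketch below shows the generic textbook idea: segment a compound word greedily from the left and from the right against a lexicon, then keep the result with fewer unknown pieces. The function names, the toy lexicon, and the tie-breaking rule are all assumptions made for this example and are not taken from the paper.

```python
def forward_max_match(text, lexicon, max_len):
    """Greedy left-to-right scan: take the longest lexicon entry at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len):
    """Greedy right-to-left scan: take the longest lexicon entry ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def segment(text, lexicon):
    """Run both scans and keep the one with fewer unknown pieces, then fewer pieces.

    This scoring rule is a hypothetical stand-in for the paper's heuristics.
    Ties go to the forward result.
    """
    text = text.lower()
    max_len = max(map(len, lexicon), default=1)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)

    def cost(tokens):
        unknown = sum(1 for t in tokens if t not in lexicon)
        return (unknown, len(tokens))

    return min(fwd, bwd, key=cost)


if __name__ == "__main__":
    toy_lexicon = {"news", "new", "site"}  # illustrative entries only
    print(segment("newsite", toy_lexicon))
```

With this toy lexicon the forward scan commits to "news" and leaves the unmatched remainder "ite", while the backward scan finds "new" + "site", so the backward result is kept. This is the kind of ambiguity that comparing both directions is meant to catch.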

Chi-Hung Chi, Chen Ding, Andrew Lim


CIKM 1999 | Compound Words | Document Framework | Information Management | Web Document Framework


Related Content

» A Novel Two Stage Evaluation Methodology for Word Segmentation Techniques

» Word shape recognition for image-based document retrieval

» Handwritten Word Recognition Using Conditional Random Fields

» Stochastic Segment Modeling for Offline Handwriting Recognition

» Holistic Word Recognition for Handwritten Historical Documents

» Recognition-Based Segmentation Algorithm for On-Line Arabic Handwriting

» A Web-Based Demo to Interactive Multimodal Transcription of Historic Text Images

» Italic or Roman Word Style Recognition without A Priori Knowledge for Old Printed Document...

» Lexicon-based offline recognition of Amharic words in unconstrained handwritten text

Post Info

Added	03 Aug 2010
Updated	03 Aug 2010
Type	Conference
Year	1999
Where	CIKM
Authors	Chi-Hung Chi, Chen Ding, Andrew Lim
