Document Ranking by Layout Relevance

This paper describes the development of a new document ranking system based on layout similarity. The user has a need represented by a set of "wanted" documents, and the system ranks documents in the collection according to this need. Rather than performing complete document analysis, the system extracts text lines, and models layouts as relationships between pairs of these lines. This paper explores three novel feature sets to support scoring in large document collections. First, pairs of lines are used to form quadrilaterals, which are represented by their turning functions. A non-Euclidean distance is used to measure similarity. Second, the quadrilaterals are represented by 5D Euclidean vectors, and third, each line is represented by a 5D Euclidean vector. We compare the classification performance and computation speed of these three feature sets using a large database of diverse documents including forms, academic papers and handwritten pages in English and Arabic. The approac...
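To make the first feature set more concrete, here is a minimal Python sketch of a turning function for a quadrilateral built from two text lines. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the quadrilateral is constructed or which non-Euclidean distance is used, so `quad_from_lines` (joining the endpoints of two line segments) and `turning_distance` (a plain L2 distance between the two step functions, with a fixed starting vertex and no rotation alignment) are hypothetical choices.

```python
import bisect
import math


def quad_from_lines(line_a, line_b):
    """Hypothetical construction: join the endpoints of two text-line
    segments ((x1, y1), (x2, y2)) into a four-vertex polygon."""
    (a_start, a_end), (b_start, b_end) = line_a, line_b
    return [a_start, a_end, b_end, b_start]


def turning_function(polygon):
    """Return (breaks, angles): the cumulative edge heading as a step
    function of arc length normalized to [0, 1]."""
    n = len(polygon)
    edges = [
        (polygon[(i + 1) % n][0] - polygon[i][0],
         polygon[(i + 1) % n][1] - polygon[i][1])
        for i in range(n)
    ]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)
    headings = [math.atan2(dy, dx) for dx, dy in edges]

    angles = [headings[0]]
    for i in range(1, n):
        turn = headings[i] - headings[i - 1]
        # Wrap the turn at each vertex into [-pi, pi).
        turn = (turn + math.pi) % (2 * math.pi) - math.pi
        angles.append(angles[-1] + turn)

    breaks, s = [], 0.0
    for length in lengths:
        s += length / perimeter
        breaks.append(s)  # normalized arc length where edge i ends
    return breaks, angles


def _value_at(breaks, angles, s):
    """Evaluate the step function at normalized arc length s."""
    i = bisect.bisect_right(breaks, s)
    return angles[min(i, len(angles) - 1)]


def turning_distance(poly_a, poly_b):
    """Assumed distance: L2 norm of the difference of the two turning
    functions over [0, 1], starting vertices taken as given."""
    breaks_a, angles_a = turning_function(poly_a)
    breaks_b, angles_b = turning_function(poly_b)
    total, prev = 0.0, 0.0
    for cut in sorted(set(breaks_a + breaks_b)):
        mid = 0.5 * (prev + cut)
        diff = _value_at(breaks_a, angles_a, mid) - _value_at(breaks_b, angles_b, mid)
        total += diff * diff * (cut - prev)
        prev = cut
    return math.sqrt(total)


# Two pairs of horizontal "text lines": one pair forms a square,
# the other a 2-by-1 rectangle.
square = quad_from_lines(((0, 0), (1, 0)), ((0, 1), (1, 1)))
wide = quad_from_lines(((0, 0), (2, 0)), ((0, 1), (2, 1)))
print(turning_distance(square, square))
print(turning_distance(square, wide))
```

Because arc length is normalized by the perimeter, this representation does not change when a quadrilateral is uniformly scaled, which is one reason turning functions are attractive for comparing layouts at different resolutions.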

May Huang, Daniel DeMenthon, David S. Doermann, Ly

5D Euclidean Vector | Document | Document Analysis | ICDAR 2005 | Text Lines

» SAMetaMatch relevant document discovery through document metadata and indexing

» Leveraging Temporal Dynamics of Document Content in Relevance Ranking

» Learning to rank relevant and novel documents through user feedback

» Finding relevant documents using top ranking sentences an evaluation of two alternative sc...

» Expert Search Evaluation by Supporting Documents

» Expected reciprocal rank for graded relevance

» Document Layout Substructure Discovery

» Instability of Relevance-Ranked Results Using Latent Semantic Indexing for Web Search

Post Info

Added	24 Jun 2010
Updated	24 Jun 2010
Type	Conference
Year	2005
Where	ICDAR
Authors	May Huang, Daniel DeMenthon, David S. Doermann, Lynn Golebiowski, Booz Allen Hamilton
