
LAWEB
2003
IEEE

Syntactic Similarity of Web Documents

Download www.cwr.cl

This paper presents and compares two methods for evaluating the syntactic similarity between documents. The first method uses a Patricia tree constructed from the original document; the similarity is computed by searching for the text of each candidate document in the tree. The second method uses the concept of shingles to obtain a similarity measure for every document pair: each shingle from the original document is inserted into a hash table, in which the shingles of each candidate document are then searched. Given an original document and a set of candidates, both methods find the documents that have some similarity relationship with the original document. Experimental results were obtained by using a plagiarized-document generator system on 900 documents collected from the Web. Considering the arithmetic average of the absolute differences between the expected and obtained similarity, the algorithm that uses shingles obtained a performance of ½¿± and the algorithm that uses Patricia tree a perf...
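
The shingle-based method and the evaluation measure described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions, since the abstract does not specify them: shingles are taken to be windows of `w` consecutive words (with `w=3`), similarity is taken to be the fraction of a candidate's shingles found in the original, and the function names `shingles`, `similarity`, and `mean_abs_difference` are hypothetical. It is not the authors' implementation.

```python
def shingles(text, w=3):
    """Return the set of w-word shingles (contiguous word windows) in text.

    Assumption: word-level shingles of width w; the abstract gives no width.
    """
    words = text.lower().split()
    if len(words) < w:
        # A text shorter than one window yields a single short shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def similarity(original, candidate, w=3):
    """Fraction of the candidate's shingles that also occur in the original.

    Assumption: this ratio stands in for the paper's similarity measure.
    """
    table = shingles(original, w)   # a Python set acts as the hash table
    cand = shingles(candidate, w)
    if not cand:
        return 0.0
    hits = sum(1 for s in cand if s in table)  # one hash lookup per shingle
    return hits / len(cand)


def mean_abs_difference(expected, obtained):
    """Arithmetic average of |expected - obtained| over paired similarities."""
    return sum(abs(e - o) for e, o in zip(expected, obtained)) / len(expected)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    candidate = "the quick brown fox sleeps"
    print(similarity(original, candidate))
```

Storing the original's shingles in a set keeps each candidate lookup at expected constant time, so a candidate is scored in time roughly linear in its length.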

Álvaro R. Pereira Jr., Nivio Ziviani

Candidate Document | Human Computer Interaction | Internet Technology | LAWEB 2003 | Original Document | Patricia Tree

Related Content

» Estimating Resemblance of MIDI Documents

» Postal Address Detection from Web Documents

» Finding Syntactic Similarities Between XML Documents

» Detection of Duplication in Documents and WebPages Based Documents Syntactical Structures ...

» An adaptive fast and safe XML parser based on byte sequences memorization

» Applying syntactic similarity algorithms for enterprise information management

» Syntactic Folding and its Application to the Information Extraction from Web Pages

» Document Structure Integrity A Robust Basis for Crosssite Scripting Defense

» Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Lar...

Post Info

Added	05 Jul 2010
Updated	05 Jul 2010
Type	Conference
Year	2003
Where	LAWEB
Authors	Álvaro R. Pereira Jr., Nivio Ziviani