Parallel and Distributed Document Overlap Detection on the Web


The proliferation of digital libraries and the availability of electronic documents on the Internet have created new challenges for computer science researchers and professionals. Documents are easily copied and redistributed, or used to create plagiarised assignments and conference papers. This paper presents a new two-stage approach for identifying overlapping documents. The first stage identifies a set of candidate documents, which the second stage compares using a matching engine. The matching engine's algorithm is based on suffix trees and modifies the known matching statistics algorithm. Parallel and distributed approaches are discussed for both stages, and performance results are presented.
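To make the second stage more concrete, the following is a minimal sketch of the textbook matching statistics idea: for each position in a candidate document, find the length of the longest substring starting there that also occurs in a reference document. It is not the paper's modified algorithm. It swaps the suffix tree for a naive, uncompressed suffix trie (quadratic in the reference length) purely for brevity, and the names build_suffix_trie, matching_statistics, overlap_ratio and the min_len threshold are hypothetical, introduced only for illustration.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` into a nested-dict trie.

    Assumption: a naive O(n^2) trie stands in for the suffix tree
    mentioned in the abstract; it is only practical for short inputs.
    """
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def matching_statistics(candidate, trie):
    """For each position i in `candidate`, return the length of the longest
    substring starting at i that also occurs in the indexed reference text."""
    stats = []
    for i in range(len(candidate)):
        node, length = trie, 0
        for ch in candidate[i:]:
            if ch not in node:
                break
            node = node[ch]
            length += 1
        stats.append(length)
    return stats


def overlap_ratio(stats, min_len=4):
    """Fraction of candidate positions covered by a match of at least
    `min_len` characters (hypothetical scoring rule, not from the paper)."""
    covered = [False] * len(stats)
    for i, length in enumerate(stats):
        if length >= min_len:
            for j in range(i, i + length):
                covered[j] = True
    return sum(covered) / len(covered) if covered else 0.0


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    candidate = "a quick brown fox leaps over a lazy dog"
    stats = matching_statistics(candidate, build_suffix_trie(reference))
    print(stats)
    print(overlap_ratio(stats, min_len=6))
```

In a two-stage setting, a function like overlap_ratio would only be applied to the candidate documents selected by the first stage, since building an index per reference document is the expensive step.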

Krisztián Monostori, Arkady B. Zaslavsky, Heinz W. Schmidt


Applied Computing | Documents | Electronic Documents | Libraries Plus Availability | PARA 2000 |


» Distributed Text Retrieval From Overlapping Collections

» Signature Extraction for Overlap Detection in Documents

» Large Scale Parallel Document Mining for Machine Translation

» Dynamically Selecting Distribution Strategies for Web Documents According to Access Patter...

» Efficient overlap and content reuse detection in blogs and online news articles

» OntoMiner bootstrapping ontologies from overlapping domain specific web sites

» Document Distribution Algorithm for Load Balancing on an Extensible Web Server Architectur...

» Indexing and Retrieval of Scientific Literature

Post Info

Added	25 Aug 2010
Updated	25 Aug 2010
Type	Conference
Year	2000
Where	PARA
Authors	Krisztián Monostori, Arkady B. Zaslavsky, Heinz W. Schmidt
