Sciweavers


HICSS
2016
IEEE


Text-Based Document Similarity Matching Using Sdtext


Download people.cs.georgetown.edu

Abstract: Forensics examiners frequently try to identify duplicate files during an investigation. They might do so to identify known files of interest, or to allow more rapid review of documents that appear to be similar. Current forensic tools for detecting duplicate files operate over the low-level bits of the file, typically using hashing. While this can be a fast and effective method in many cases, it can fail due to differences in file format. We introduce sdtext, a tool developed to identify similar files based on their textual contents, which is robust to changes in format. We show that sdtext is far more accurate than existing tools in matching files that contain the same text in different formats.
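
The contrast the abstract draws between bit-level hashing and text-based matching can be illustrated with a small sketch. This is not sdtext's algorithm, which the listing does not describe. It assumes a toy tag-stripping extractor (`extract_text`) and swaps in Jaccard similarity over word shingles as a generic text-similarity measure. All function names here are hypothetical.

```python
import hashlib
import re


def file_hash(data: bytes) -> str:
    """Bit-level fingerprint: changes whenever any byte of the file changes."""
    return hashlib.sha256(data).hexdigest()


def extract_text(data: bytes) -> str:
    """Toy extractor (assumption): decode as UTF-8 and strip angle-bracket tags.
    A real tool would need a format-aware extractor for each file type."""
    text = data.decode("utf-8", errors="ignore")
    return re.sub(r"<[^>]+>", " ", text)


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences from the lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def text_similarity(d1: bytes, d2: bytes, k: int = 3) -> float:
    """Similarity of two files based only on their extracted text."""
    return jaccard(shingles(extract_text(d1), k), shingles(extract_text(d2), k))


# The same sentence stored as plain text and wrapped in HTML markup.
plain = b"Forensics examiners frequently try to identify duplicate files."
html = (b"<html><body><p>Forensics examiners frequently try to identify "
        b"<b>duplicate</b> files.</p></body></html>")

print(file_hash(plain) == file_hash(html))  # byte-level comparison
print(text_similarity(plain, html))         # text-level comparison
```

In this sketch the two byte strings differ, so their hashes differ, while the extracted word sequences are identical, so the shingle sets coincide. Real documents in binary formats would need proper text extraction before any such comparison.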

Clay Shields



Post Info

Added	03 Apr 2016
Updated	03 Apr 2016
Type	Journal
Year	2016
Where	HICSS
Authors	Clay Shields
