Forensic examiners frequently need to identify duplicate files during an investigation, either to locate known files of interest or to speed up the review of documents that appear similar. Current forensic tools detect duplicates by operating on the low-level bytes of a file, typically by hashing. Although this is fast and effective in many cases, it fails when the same content is stored in different file formats. We introduce sdtext, a tool that identifies similar files based on their textual content and is therefore robust to changes in format. We show that sdtext is far more accurate than existing tools at matching files that contain the same text in different formats.
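The sketch below illustrates the failure mode described above and the general idea behind a text-based comparison. It does not show how sdtext works, which is not described here. It assumes a word-shingle Jaccard similarity, a common near-duplicate technique chosen only for illustration, and uses HTML as a stand-in for "a different format". Real formats such as PDF or DOCX would need format-specific text extractors. The helper names (`html_to_text`, `text_similarity`, and so on) are hypothetical.

```python
import hashlib
import re
from html.parser import HTMLParser


def byte_hash(data: bytes) -> str:
    """Byte-level fingerprint, as used by conventional duplicate detection."""
    return hashlib.sha256(data).hexdigest()


class _TextExtractor(HTMLParser):
    """Collects only the character data of an HTML document, dropping tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(markup: str) -> str:
    # Hypothetical extractor for one format; other formats would need their own.
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def normalize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric word tokens, discarding punctuation and spacing.
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens: list[str], k: int = 3) -> set[tuple[str, ...]]:
    # Overlapping k-word sequences; empty if there are fewer than k tokens.
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def text_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' word shingles (assumed technique)."""
    sa, sb = shingles(normalize(a), k), shingles(normalize(b), k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    plain = "The quick brown fox jumps over the lazy dog."
    html = "<html><body><p>The quick brown fox</p><p>jumps over the <b>lazy</b> dog.</p></body></html>"

    # The byte hashes differ even though both files carry the same text.
    print(byte_hash(plain.encode()) == byte_hash(html.encode()))
    # Comparing the extracted text instead finds them identical.
    print(text_similarity(plain, html_to_text(html)))
```

Running this prints `False` and then `1.0`: the byte-level hashes do not match, while the text-level comparison treats the plain-text and HTML versions as the same document.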