Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization