Intelligent access to information requires semantic integration of structured databases with unstructured textual resources. While the semantic integration problem has been widely studied in the database domain on structured data, it has been neither fully recognized nor studied on unstructured or semi-structured textual resources. This paper presents a first step towards this goal by studying semantic integration in natural language texts: identifying whether different mentions of real-world entities, within and across documents, actually represent the same concept. We present a machine learning study of this problem using two approaches. The first is discriminative: a pairwise local classifier is trained in a supervised way to determine whether two given mentions represent the same real-world entity. This is followed, potentially, by a global clustering algorithm that uses the classifier as its similarity metric. Our second approach is a global generative model, at the heart of which...
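
To make the discriminative approach concrete, the following is a minimal sketch of how a pairwise scorer can drive a global clustering step. It is an illustration under stated assumptions, not the method evaluated in the paper: `pairwise_score` is a hypothetical stand-in (token overlap) for the trained classifier, and the clustering rule (single-link merging above a hypothetical `threshold`) is one simple choice among many, since the text does not specify the clustering algorithm.

```python
from itertools import combinations


def pairwise_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a trained pairwise classifier.

    Returns the Jaccard overlap of the lowercased tokens of two mentions.
    A real system would return a learned score or probability instead.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_mentions(mentions, score=pairwise_score, threshold=0.3):
    """Single-link clustering that uses `score` as the similarity metric.

    Any two mentions whose score reaches `threshold` end up in the same
    cluster, directly or through a chain of such pairs (assumed rule).
    """
    parent = list(range(len(mentions)))  # union-find forest over indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if score(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


mentions = [
    "John F. Kennedy",
    "John Kennedy",
    "Kennedy",
    "Jimmy Carter",
    "President Carter",
]
for cluster in cluster_mentions(mentions):
    print(cluster)
```

On this toy input the three Kennedy mentions fall into one cluster and the two Carter mentions into another. A surface-overlap scorer like this one would also merge a mention such as "David Kennedy" into the first cluster, which is the kind of ambiguity a supervised pairwise classifier is meant to resolve.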