: The management and handling of big data is a major challenge in the area of life science. Beside the data storage, information retrieval methods have to be adapted to huge data amounts as well. Therefore we present an approach to improve search results in life science by recommendations based on semantic information. In detail we determine relationships between documents by searching for shared database IDs as well as ontology identifiers. We have established a pipeline based on Hadoop allowing a distributed computation of large amounts of textual data. A comparison with the widely used cosine similarity has been performed. Its results are presented in this work as well.