CLEF-IP 2010: Prior Art Retrieval Using the Different Sections in Patent Documents

15 years 7 months ago

Download clef2010.org

In this paper we describe our participation in the 2010 CLEF-IP Prior Art Retrieval task, where we examined the impact of information in different sections of patent documents, namely the title, abstract, claims, description and IPC-R sections, on the retrieval and re-ranking of patent documents. Using a standard bag-of-words approach in Lemur, we found that the IPC-R sections are the most informative for patent retrieval. We then re-ranked the retrieved documents using a Logistic Regression Model trained on the retrieved documents in the training set. We found indications that the information contained in the text sections of the patent document can contribute to a better ranking of the retrieved documents. The official results show that, among the nine groups that participated in the Prior Art Retrieval task, we ranked eighth in terms of both Mean Average Precision (MAP) and Recall. Categories and Subject Descriptors H.3 [Information Storage and Retrieva...
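The re-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes that each retrieved candidate is represented by one retrieval score per section (title, abstract, claims, description, IPC-R) and that a logistic regression model is fitted on candidates labelled as relevant or not relevant. The feature layout, the toy scores, the document identifiers and the hand-written gradient-descent trainer are all assumptions made for illustration; the abstract does not specify the features or the implementation the authors used.

```python
import math

# Hypothetical feature layout: one retrieval score per patent section.
SECTIONS = ["title", "abstract", "claims", "description", "ipcr"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, feats):
    """Estimated probability that a candidate is relevant prior art."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, feats)))


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias with plain batch gradient descent on log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rerank(candidates, weights, bias):
    """Sort (doc_id, features) pairs by predicted relevance, best first."""
    scored = [(predict(weights, bias, feats), doc_id) for doc_id, feats in candidates]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy training data (invented): per-section scores and relevance labels.
train_X = [
    [0.2, 0.3, 0.8, 0.6, 1.0],
    [0.1, 0.4, 0.7, 0.5, 0.9],
    [0.6, 0.5, 0.2, 0.3, 0.1],
    [0.5, 0.2, 0.1, 0.4, 0.0],
    [0.3, 0.6, 0.6, 0.7, 0.8],
    [0.7, 0.3, 0.3, 0.2, 0.2],
]
train_y = [1, 1, 0, 0, 1, 0]

weights, bias = train_logistic_regression(train_X, train_y)

# Candidates retrieved for a new topic (hypothetical identifiers and scores).
candidates = [
    ("EP-A", [0.6, 0.4, 0.2, 0.3, 0.1]),
    ("EP-B", [0.2, 0.3, 0.7, 0.6, 0.9]),
    ("EP-C", [0.4, 0.4, 0.4, 0.4, 0.5]),
]
ranked = rerank(candidates, weights, bias)
print(ranked)
```

In this sketch the learned weights indicate how much each section's score contributes to the final ordering, which is one way to inspect whether the text sections add information beyond the IPC-R score.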

Eva D'hondt, Suzan Verberne
