Statistical methods for PP attachment fall into two classes according to the training material used: first, unsupervised methods trained on raw text corpora and second, supervised methods trained on manually disambiguated examples. Usually supervised methods win over unsupervised methods with regard to attachment accuracy. But what if only small sets of manually disambiguated material are available? We show that in this case it is advantageous to intertwine unsupervised and supervised methods into one disambiguation algorithm that outperforms both methods used alone.1