A labeled sequence data set related to a given biological property is often biased and therefore does not fully capture the diversity of that property in nature. To reduce this sampling bias, we propose a data mining procedure for detecting relevant sequences that are underrepresented in the labeled data. The procedure is intended to help domain experts enlarge their knowledge qualitatively and cost-effectively through an in-depth study of a small number of statistically underrepresented and functionally interesting sequences. It consists of: (i) learning a class-conditional distribution model for each class of labeled data; (ii) applying the models to select statistically underrepresented unlabeled sequences; and (iii) automatically evaluating their interestingness. The approach is illustrated on the important problem of enlarging the data set of confirmed disordered proteins. The obtained results demonstrate the promise of the proposed approach for an efficient reductio...
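
Steps (i) and (ii) of the procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the form of the class-conditional model, so the sketch assumes a simple smoothed amino-acid composition model per class, and it flags an unlabeled sequence as underrepresented when its per-residue log-likelihood is low under every class model. The function names, the toy sequences, the class labels, and the threshold are all hypothetical. Step (iii), the interestingness evaluation, is left out because the abstract gives no detail on it.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def fit_composition_model(sequences, pseudocount=1.0):
    """Step (i), assumed form: smoothed residue frequencies for one class."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq if ch in ALPHABET)
    total = sum(counts.values()) + pseudocount * len(ALPHABET)
    return {a: (counts[a] + pseudocount) / total for a in ALPHABET}


def mean_log_likelihood(seq, model):
    """Per-residue log-likelihood, so sequences of different lengths compare."""
    residues = [ch for ch in seq if ch in model]
    if not residues:
        return float("-inf")
    return sum(math.log(model[ch]) for ch in residues) / len(residues)


def select_underrepresented(unlabeled, class_models, threshold):
    """Step (ii): keep sequences that score poorly under every class model."""
    selected = []
    for seq in unlabeled:
        best = max(mean_log_likelihood(seq, m) for m in class_models.values())
        if best < threshold:
            selected.append((seq, best))
    # Least well explained sequences first.
    return sorted(selected, key=lambda item: item[1])


# Toy data (hypothetical): two labeled classes and three unlabeled sequences.
labeled = {
    "disordered": ["SSSSEEKKSSPP", "EEKKSSSPEEKS"],
    "ordered": ["LLVVIIAAFFLL", "VVIILLAFWVIL"],
}
models = {label: fit_composition_model(seqs) for label, seqs in labeled.items()}
unlabeled = ["SSEEKKSS", "LLVVIIAL", "WWCCHHMM"]
candidates = select_underrepresented(unlabeled, models, threshold=-3.0)
print([seq for seq, _ in candidates])  # ['WWCCHHMM']
```

In this toy run only the third sequence is selected, since its residues are rare in both labeled classes. In practice the threshold would need to be calibrated, for example against the score distribution of the labeled data, and a richer sequence model would likely replace the composition model assumed here.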