Many methods for analyzing biological problems are constrained by problem size. The ability to distinguish between relevant and irrelevant features of a problem may allow a problem to be reduced in size sufficiently to make it tractable. The issue of learning in the presence of large numbers of irrelevant features is an important one in machine learning, and recently, several methods have been proposed to address this issue. A combination of machine learning approaches and statistical analysis methods can be used to identify a set of relevant attributes for currently intractable biological problems. We call our framework F/I/E (Focus-Induce-Extract). As an example of this methodology, this paper reports on the identification of the features of mutations in collagen that are likely to be relevant in the bone disease Osteogenesis imperfecta.
Lawrence Hunter, Teri E. Klein