Active inference seeks to maximize classification performance while minimizing the amount of data that must be labeled ex ante. This task is particularly relevant for relational data, where statistical dependencies among instances can be exploited to improve classification accuracy. We show that efficient methods for indexing network structure can be used to select high-value nodes for labeling, and that selection based on a network structure index substantially outperforms both random selection and selection based on simple measures of local structure. We demonstrate the relative effectiveness of this approach through experiments with a relational neighbor classifier on a variety of real and synthetic data sets, and we explore the characteristics a data set must have for the approach to perform well.

Categories and Subject Descriptors
H.2.8 [Database Applications]: data mining; I.2.6 [Artificial Intelligence]: ...
Matthew J. Rattigan, Marc Maier, David Jensen, Bin
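
The select-then-classify loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the network structure index is not specified here, so a greedy neighborhood-coverage heuristic (`select_by_coverage`, a hypothetical name) stands in for it. The toy graph, the class labels, and all function names are assumptions made for illustration only. The classifier is a simple iterated majority vote over labeled neighbors, in the spirit of a relational neighbor classifier.

```python
from collections import Counter

# Toy undirected graph (an assumption): a dense region 0-5 and a sparser region 6-9.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 5), (2, 5),
         (6, 7), (6, 8), (6, 9), (7, 9), (5, 9)]
TRUTH = {n: "A" if n <= 5 else "B" for n in range(10)}

def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def select_by_degree(adj, k):
    """Baseline: the k highest-degree nodes (a simple local-structure measure)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]

def select_by_coverage(adj, k):
    """Hypothetical stand-in for structure-aware selection (NOT a network
    structure index): greedily pick the node whose closed neighborhood
    covers the most nodes not yet covered by earlier picks."""
    covered, chosen = set(), []
    candidates = sorted(adj)  # sorted so ties resolve to the smallest node id
    for _ in range(min(k, len(candidates))):
        best = max((n for n in candidates if n not in chosen),
                   key=lambda n: len(({n} | adj[n]) - covered))
        chosen.append(best)
        covered |= {best} | adj[best]
    return chosen

def relational_neighbor_classify(adj, known, max_iters=10):
    """Give each unlabeled node the majority class of its currently labeled
    neighbors, repeating so that labels propagate outward from the seeds."""
    labels = dict(known)
    for _ in range(max_iters):
        updates = {}
        for node in sorted(adj):
            if node in known:
                continue  # seed labels are never overwritten
            votes = Counter(labels[nb] for nb in adj[node] if nb in labels)
            if votes:
                updates[node] = max(sorted(votes), key=votes.get)
        if all(labels.get(n) == c for n, c in updates.items()):
            break  # nothing changed, so further passes cannot change anything
        labels.update(updates)
    return labels

def accuracy(pred, truth, known):
    """Fraction of nodes outside the labeled seed set that are predicted correctly."""
    test = [n for n in truth if n not in known]
    return sum(pred.get(n) == truth[n] for n in test) / len(test)

adj = build_adjacency(EDGES)
results = {}
for name, selector in [("degree", select_by_degree), ("coverage", select_by_coverage)]:
    seeds = selector(adj, 2)
    known = {n: TRUTH[n] for n in seeds}  # "label" the selected nodes
    pred = relational_neighbor_classify(adj, known)
    results[name] = (seeds, accuracy(pred, TRUTH, known))
    print(name, seeds, results[name][1])
```

On this toy graph the degree baseline spends both labels inside the same dense region, whereas the coverage heuristic places one label in each region. This is only a constructed example; behavior on real data would depend on the data set characteristics the abstract refers to.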