We study the optimal rates of convergence for estimating a prior distribution over a VC class from a sequence of independent data sets, each labeled by an independent target function sampled from the prior. Specifically, we derive upper and lower bounds on the optimal rates under a smoothness condition on the true prior, with the number of samples per data set equal to the VC dimension. These results have implications for the improvements achievable via transfer learning.

This research is supported in part by NSF grant IIS-1065251 and a Google Core AI grant.
Liu Yang, Steve Hanneke, Jaime G. Carbonell