We consider the following setting: a decision maker must make a decision based on reported data points with binary labels. Subsets of the data points are controlled by different selfish agents, who might misreport the labels in order to sway the decision in their favor. We design mechanisms (both deterministic and randomized) that reach an approximately optimal decision and are strategyproof, i.e., agents are best off reporting their labels truthfully. We then recast our results into a classical machine learning classification framework, where the decision maker must make a decision (choose between the constant positive hypothesis and the constant negative hypothesis) based only on a sampled subset of the agents' points.
Reshef Meir, Ariel D. Procaccia, Jeffrey S. Rosenschein
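As a concrete illustration of the setting (and not of the specific mechanisms analyzed in the paper), the sketch below assumes each agent reports labels in {+1, -1} for the points it controls, prefers the constant hypothesis agreeing with more of its own true labels, and that ties are broken toward +1. It implements a simple "majority of per-agent majorities" rule; the function and variable names are hypothetical. Because the rule reduces each agent's influence to a single bit (its own per-agent majority), a misreport can only flip that bit, which never benefits the misreporting agent.

```python
# Illustrative sketch only: a "majority of per-agent majorities" rule for the
# constant-hypothesis setting described in the abstract. The names and data
# layout are hypothetical, not taken from the paper.

from typing import List


def agent_majority(labels: List[int]) -> int:
    """Return the constant label (+1 or -1) agreeing with most of this agent's reports."""
    return 1 if sum(labels) >= 0 else -1


def majority_of_majorities(reports: List[List[int]]) -> int:
    """Choose the constant hypothesis preferred by a majority of agents.

    Each agent's influence is reduced to a single bit (its per-agent majority),
    so misreporting can only flip that bit, which never helps the agent.
    """
    votes = [agent_majority(r) for r in reports]
    return 1 if sum(votes) >= 0 else -1


if __name__ == "__main__":
    # Three agents, each controlling a few points with labels in {+1, -1}.
    true_labels = [
        [1, 1, -1],        # agent 0 prefers the constant +1 hypothesis
        [-1, -1, -1, 1],   # agent 1 prefers the constant -1 hypothesis
        [1, 1, 1],         # agent 2 prefers the constant +1 hypothesis
    ]
    print("truthful outcome:", majority_of_majorities(true_labels))  # +1

    # Agent 1 misreports all of its points as -1; its per-agent majority is
    # unchanged, so the outcome (and hence its utility) cannot improve.
    manipulated = [true_labels[0], [-1, -1, -1, -1], true_labels[2]]
    print("after misreport:", majority_of_majorities(manipulated))   # still +1
```

The example only demonstrates strategyproofness of this particular aggregation rule; the approximation guarantees and the sampling-based learning-theoretic results summarized above require the analysis developed in the paper itself.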