Recently developed dual techniques allow us to evaluate a given sub-optimal dynamic portfolio policy by using the policy to construct an upper bound on the optimal value function. Moreover, when the policy is in fact optimal, the upper bound coincides with the optimal value function. Since a lower bound is easily constructed by simulating the given policy, the gap between the lower and upper bounds can be used to assess the quality of the policy. One difficulty that arises when computing the upper bound, however, is that we need to know the sub-optimal policy’s value function and its partial derivatives with respect to all state variables. If these quantities are not available analytically, then an alternative upper bound can still be computed, but it is less satisfying from a theoretical perspective. In this paper we show how path-wise Monte Carlo estimators together with the cross-path regression approach can be used to estimate the sub-optimal value functio...
Martin B. Haugh, Ashish Jain
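
The remark in the abstract that a lower bound is obtained by simulating the given policy can be illustrated with a minimal sketch. It is not the model or the code of the paper: it assumes a toy setting with one risky asset following geometric Brownian motion, a constant risk-free rate, CRRA utility of terminal wealth, and a constant-proportion policy. All function names and parameter values are hypothetical. In this toy setting the expected utility of the policy is also known in closed form, which is used here only as a sanity check on the simulation.

```python
import numpy as np


def crra_utility(wealth, gamma):
    """CRRA utility U(w) = w^(1 - gamma) / (1 - gamma), for gamma != 1."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def crra_value(pi, w0=1.0, r=0.02, mu=0.08, sigma=0.2, gamma=3.0, horizon=1.0):
    """Closed-form expected utility of a constant-proportion policy.

    Valid only in the assumed toy model (GBM asset, continuous rebalancing).
    """
    growth = r + pi * (mu - r) - 0.5 * gamma * (pi * sigma) ** 2
    scale = w0 ** (1.0 - gamma) / (1.0 - gamma)
    return scale * np.exp((1.0 - gamma) * growth * horizon)


def simulate_lower_bound(pi, w0=1.0, r=0.02, mu=0.08, sigma=0.2, gamma=3.0,
                         horizon=1.0, n_steps=12, n_paths=100_000, seed=0):
    """Monte Carlo estimate of the policy's expected utility.

    The expected utility of any feasible policy is a lower bound on the
    optimal value function. Returns (estimate, standard error).
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    # Log-wealth increments under a constant fraction pi in the risky asset.
    drift = (r + pi * (mu - r) - 0.5 * (pi * sigma) ** 2) * dt
    shocks = rng.standard_normal((n_paths, n_steps))
    increments = drift + pi * sigma * np.sqrt(dt) * shocks
    terminal_wealth = w0 * np.exp(increments.sum(axis=1))
    utilities = crra_utility(terminal_wealth, gamma)
    return utilities.mean(), utilities.std(ddof=1) / np.sqrt(n_paths)


if __name__ == "__main__":
    # A deliberately sub-optimal policy; the optimum in this toy model is
    # pi* = (mu - r) / (gamma * sigma^2).
    estimate, stderr = simulate_lower_bound(pi=0.2)
    optimal_pi = (0.08 - 0.02) / (3.0 * 0.2 ** 2)
    print("simulated lower bound:", estimate, "+/-", stderr)
    print("closed-form policy value:", crra_value(0.2))
    print("closed-form optimal value:", crra_value(optimal_pi))
```

In realistic problems the optimal value is unknown, so the closed-form comparison is unavailable; a dual upper bound takes its place, and the gap between the simulated lower bound and that upper bound measures the quality of the policy.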