Companies such as Zara and World Co. have recently implemented novel product development processes and supply chain architectures that let them make more product design and assortment decisions during the selling season, when actual demand information becomes available. How should such retail firms modify their product assortment over time in order to maximize overall profits for a given selling season? Focusing on a stylized version of this problem, we study a finite horizon multiarmed bandit model with several plays per stage and Bayesian learning. Our analysis involves the Lagrangian relaxation of weakly coupled dynamic programs (DPs), new results contributing to the emerging theory of DP duality, and various approximations. It yields a closed-form dynamic index policy that captures the key exploration vs. exploitation trade-off, together with associated suboptimality bounds. While in numerical experiments its performance proves comparable to that of other closed-form heuristics described in the li...
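The stylized setting above (a finite season, several products offered per period, Bayesian learning about demand) can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the paper's index policy. It assumes that each product has Poisson demand with an unknown rate, that the retailer holds a Gamma prior over that rate (so learning is the conjugate Gamma-Poisson update), that demand is independent across products, that each sale earns unit profit, and that a fixed number of display slots is available per period. The `index` function is a hypothetical stand-in, defined as the posterior mean plus an exploration bonus that shrinks as the season runs out. It is included only to show where an exploration vs. exploitation trade-off enters. All names and parameters are illustrative.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GammaPosterior:
    """Gamma(shape, rate) belief over a product's unknown Poisson demand rate."""
    shape: float
    rate: float

    def mean(self) -> float:
        return self.shape / self.rate

    def std(self) -> float:
        return math.sqrt(self.shape) / self.rate

    def update(self, sales: int) -> None:
        # Conjugate Gamma-Poisson update after observing one period of sales.
        self.shape += sales
        self.rate += 1.0


def index(post: GammaPosterior, periods_left: int, horizon: int, bonus: float) -> float:
    # HYPOTHETICAL index (not the paper's): posterior mean plus an exploration
    # bonus that decays as fewer periods remain to exploit what is learned.
    return post.mean() + bonus * post.std() * periods_left / horizon


def choose_assortment(posts, n_slots, periods_left, horizon, bonus):
    """Offer the n_slots products with the highest index (several plays per stage)."""
    scores = [index(p, periods_left, horizon, bonus) for p in posts]
    ranked = sorted(range(len(posts)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_slots])


def poisson(lam: float, rng: random.Random) -> int:
    # Knuth's multiplication method; adequate for small demand rates.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(true_rates, priors, n_slots, horizon, bonus=1.0, seed=0):
    """Run one season; returns total sales (= profit under unit margins) and final beliefs."""
    rng = random.Random(seed)
    posts = [GammaPosterior(a, b) for a, b in priors]
    total_sales = 0
    for t in range(horizon):
        chosen = choose_assortment(posts, n_slots, horizon - t, horizon, bonus)
        for i in chosen:
            sales = poisson(true_rates[i], rng)
            posts[i].update(sales)  # learning happens only for products on offer
            total_sales += sales
    return total_sales, posts
```

For example, `simulate([2.0, 5.0, 3.0, 1.0], [(1.0, 0.5)] * 4, n_slots=2, horizon=20)` runs a 20-period season with four candidate products and two slots. Setting `bonus=0.0` gives a purely myopic (exploit-only) policy for comparison. Note that beliefs about a product are updated only while it is in the assortment, which is what makes exploration costly.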