: This paper addresses the dynamic pricing problem of a single-item, make-to-stock production system. Demand arrives according to Poisson processes with changeable arrival rate dep...
—We propose a dynamic spectrum access scheme where secondary users recommend “good” channels to each other and access accordingly. We formulate the problem as an average rewa...
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on ...
We consider the problem of several users transmitting packets to a base station, and study an optimal scheduling formulation involving three communication layers, namely, the mediu...
Abstract--Incremental-redundancy hybrid automatic repeatrequest (IR-HARQ) schemes are proposed in several wireless standards for increased throughput-efficiency and greater reliabi...
Ashok K. Karmokar, Dejan V. Djonin, Vijay K. Bharg...
In this paper, we study the dynamic production location decisions of a manufacturer of a certain branded product. Considering brand-image as a form of goodwill, we extend the well...
Gila E. Fruchter, Eugene D. Jaffe, Israel D. Neben...
We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
We present an anytime algorithm which computes policies for decision problems represented as multi-stage influence diagrams. Our algorithm constructs policies incrementally, start...
In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning aft...