We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to general...
We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision ...
Abstract. In recent security architectures, it is possible that the security policy is not evaluated in a centralized way but requires negotiation between the subject who is reques...
— The objectives of this paper are twofold. First, we introduce a novel policy language, called CIM-SPL (Simple Policy Language for CIM) that complies with the CIM (Common Inform...
Dakshi Agrawal, Seraphin B. Calo, Kang-Won Lee, Jo...
Abstract. Access control mechanisms are used to control which principals (such as users or processes) have access to which resources based on access control policies. To ensure the...
JeeHyun Hwang, Tao Xie, Vincent C. Hu, Mine Altuna...
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature conv...
Studies have shown that users have great difficulty specifying their security and privacy policies in a variety of application domains. While machine learning techniques have succ...
Patrick Gage Kelley, Paul Hankes Drielsma, Norman ...
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
The appropriate production and inventory control policy is a key factor for modern enterprises’ success in competitive environment. In the food industry, most of food manufactur...
Computing a good policy in stochastic uncertain environments with unknown dynamics and reward model parameters is a challenging task. In a number of domains, ranging from space ro...