Sciweavers

473 search results - page 82 / 95
» Optimal policy switching algorithms for reinforcement learni...
COLT
2010
Springer
13 years 5 months ago
Best Arm Identification in Multi-Armed Bandits
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optim...
Jean-Yves Audibert, Sébastien Bubeck, R&eac...
ECML
2007
Springer
14 years 1 months ago
Safe Q-Learning on Complete History Spaces
In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observat...
Stephan Timmer, Martin Riedmiller
PE
2011
Springer
13 years 2 months ago
Energy-aware routing in the Cognitive Packet Network
An energy aware routing protocol (EARP) is proposed to minimise a performance metric that combines the total consumed power in the network and the QoS that is specified for the ...
Toktam Mahmoodi
EMO
2005
Springer
14 years 1 months ago
Multiobjective Water Pinch Analysis of the Cuernavaca City Water Distribution Network
Water systems often allow efficient water uses via water reuse and/or recirculation. Defining the network layout connecting water-using processes is a complex problem which involv...
Carlos E. Mariano-Romero, Víctor Alcocer-Ya...
ATAL
2007
Springer
14 years 1 months ago
Multiagent learning in adaptive dynamic systems
Classically, an approach to the multiagent policy learning supposed that the agents, via interactions and/or by using preliminary knowledge about the reward functions of all playe...
Andriy Burkov, Brahim Chaib-draa