Sciweavers

» Newton's method and its use in optimization
PKDD
2010
Springer
Data Mining
13 years 5 months ago
Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration
Abstract. We present an implementation of model-based online reinforcement learning (RL) for continuous domains with deterministic transitions that is specifically designed to achi...
Tobias Jung, Peter Stone
ICMCS
2009
IEEE
Multimedia
13 years 5 months ago
Scalability of HTTP pacing with intelligent bursting
While streaming protocols like RTSP/RTP have continued to evolve, HTTP has remained a primary method for Web-based video retrieval. The ubiquity and simplicity of HTTP make it a...
Kevin J. Ma, Radim Bartos, Swapnil Bhatia
CDC
2010
IEEE
Control Systems
13 years 2 months ago
Online Convex Programming and regularization in adaptive control
Online Convex Programming (OCP) is a recently developed model of sequential decision-making in the presence of time-varying uncertainty. In this framework, a decision-maker selects ...
Maxim Raginsky, Alexander Rakhlin, Serdar Yük...
NECO
2011
13 years 2 months ago
Least-Squares Independent Component Analysis
Accurately evaluating statistical independence among random variables is a key element of Independent Component Analysis (ICA). In this paper, we employ a squared-loss variant of ...
Taiji Suzuki, Masashi Sugiyama
CSL
2010
Springer
13 years 7 months ago
Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems
This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on...
Blaise Thomson, Steve Young