Sciweavers

451 search results - page 37 / 91
» Performance evaluation with temporal rewards
ATAL
2010
Springer
13 years 8 months ago
Closing the learning-planning loop with predictive state representations
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of ...
Byron Boots, Sajid M. Siddiqi, Geoffrey J. Gordon
CVPR
2009
IEEE
14 years 2 months ago
Contextualizing histogram
In this paper, we investigate how to incorporate spatial and/or temporal contextual information into classical histogram features with the aim of boosting visual classification p...
Bingbing Ni, Shuicheng Yan, Ashraf A. Kassim
ATAL
2010
Springer
13 years 8 months ago
PAC-MDP learning with knowledge-based admissible models
PAC-MDP algorithms approach the exploration-exploitation problem of reinforcement learning agents in an effective way which guarantees that with high probability, the algorithm pe...
Marek Grzes, Daniel Kudenko
AIPS
2009
13 years 8 months ago
Using Distance Estimates in Heuristic Search
This paper explores the use of an oft-ignored information source in heuristic search: a search-distance-to-go estimate. Operators frequently have different costs and cost-to-go is...
Jordan Tyler Thayer, Wheeler Ruml
NCA
2008
IEEE
13 years 7 months ago
Neurodynamic programming: a case study of the traveling salesman problem
The paper focuses on the study of solving the large-scale traveling salesman problem (TSP) based on neurodynamic programming. From this perspective, two methods, temporal differenc...
Jia Ma, Tao Yang, Zeng-Guang Hou, Min Tan, Derong ...