UCB1
1985
Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules, 1985
1995
Proc. Natl. Acad. Sci. USA
Vol. 92, pp. 8584-8585, September 1995
Statistics
Sequential choice from several populations
MICHAEL N. KATEHAKIS AND HERBERT ROBBINS
Rutgers University, New Brunswick, NJ 08903
Contributed by Herbert Robbins, May 4, 1995
ABSTRACT We consider the problem of sampling sequentially
from two or more populations in such a way as to
maximize the expected sum of outcomes in the long run.
Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem
Rajeev Agrawal
Advances in Applied Probability
Vol. 27, No. 4 (Dec., 1995), pp. 1054-1078
2010
Jouini, W., Ernst, D., Moy, C. and Palicot, J., 2010, May. Upper confidence bound based decision making strategies and dynamic spectrum access. In 2010 IEEE International Conference on Communications (pp. 1-5). IEEE.
We suggest that Upper Confidence
Bound (UCB) algorithms could be useful to design decision
making strategies for SUs to exploit intelligently the spectrum
resources based on their past observations.
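As a rough illustration of what an upper-confidence-bound decision rule does with past observations, here is a minimal Python sketch. It assumes the commonly stated UCB1 index (sample mean plus sqrt(2 ln t / n_i)) and Bernoulli rewards. The function names `ucb1_select` and `run_ucb1` and the reward probabilities are hypothetical, chosen only for this example, and are not taken from any of the papers listed above.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Pick the arm to play at round t (1-based).

    counts[i] is how many times arm i has been played so far,
    sums[i] is the total reward observed from arm i.
    """
    # Play every arm once before the index is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    def index(arm):
        mean = sums[arm] / counts[arm]
        # Exploration bonus: shrinks as the arm is sampled more often.
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=index)


def run_ucb1(probs, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms (assumed reward model for this sketch)."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    sums = [0.0] * len(probs)
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Two hypothetical arms; the second one pays off more often.
    print(run_ucb1([0.2, 0.8], horizon=2000))
```

With a large gap between the arms, most plays should go to the better arm, while the worse arm is still sampled occasionally because its bonus grows slowly with ln t.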
---
This page is auto-translated from /nishio/UCB1. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.