UCB1 - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

UCB1

1985

Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules, 1985

1995

Proc. Natl. Acad. Sci. USA

Vol. 92, pp. 8584-8585, September 1995

Statistics

Sequential choice from several populations

MICHAEL N. KATEHAKIS AND HERBERT ROBBINS

Rutgers University, New Brunswick, NJ 08903

Contributed by Herbert Robbins, May 4, 1995

ABSTRACT We consider the problem of sampling sequentially from two or more populations in such a way as to maximize the expected sum of outcomes in the long run.

Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem

Rajeev Agrawal

Advances in Applied Probability

Vol. 27, No. 4 (Dec., 1995), pp. 1054-1078

2010

Jouini, W., Ernst, D., Moy, C. and Palicot, J., 2010, May. Upper confidence bound based decision making strategies and dynamic spectrum access. In 2010 IEEE International Conference on Communications (pp. 1-5). IEEE.

We suggest that Upper Confidence Bound (UCB) algorithms could be useful to design decision making strategies for SUs to exploit intelligently the spectrum resources based on their past observations.
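As a rough illustration of what "deciding based on past observations" means for an upper-confidence-bound rule, here is a minimal sketch of the index commonly called UCB1: each arm's sample mean plus a bonus of sqrt(2 ln n / n_j), where n is the total number of plays so far and n_j is the number of plays of arm j. The Bernoulli arms, the function names `ucb1_index` and `run_ucb1`, and the parameters are assumptions made for this example only; none of them come from the papers listed on this page.

```python
import math
import random


def ucb1_index(mean, pulls, total_pulls):
    """Sample mean plus an exploration bonus that shrinks as the arm is pulled more."""
    return mean + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def run_ucb1(success_probs, horizon, seed=0):
    """Play hypothetical Bernoulli arms for `horizon` rounds; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(success_probs)
    pulls = [0] * k   # number of times each arm has been played
    sums = [0.0] * k  # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Play every arm once first so each sample mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound,
            # using the t - 1 plays observed so far.
            arm = max(
                range(k),
                key=lambda j: ucb1_index(sums[j] / pulls[j], pulls[j], t - 1),
            )
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls
```

Calling `run_ucb1([0.2, 0.8], 2000)` returns how often each of the two hypothetical arms was played. The bonus term keeps rarely played arms from being abandoned too early, while the better arm ends up with most of the plays.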

---

This page is auto-translated from /nishio/UCB1 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.