9fc872a2a6e4025
http://nhiro.org.s3.amazonaws.com/c/2/c2ab3b5705a192847f753567bab3b3ff.jpg https://gyazo.com/c2ab3b5705a192847f753567bab3b3ff
(OCR text)
Algorithm
Algorithm 1: Adam, our proposed algorithm for stochastic optimization. See section 2 for details, and for a slightly more efficient (but less clear) order of computation. gₜ² indicates the elementwise square gₜ ⊙ gₜ. Good default settings for the tested machine learning problems are α = 0.001, β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸. All operations on vectors are element-wise. With β₁ᵗ and β₂ᵗ we denote β₁ and β₂ to the power t.
Require: α: Stepsize
Require: β₁, β₂ ∈ [0, 1): Exponential decay rates for the moment estimates
Require: f(θ): Stochastic objective function with parameters θ
Require: θ₀: Initial parameter vector
m₀ ← 0 (Initialize 1st moment vector)
v₀ ← 0 (Initialize 2nd moment vector)
t ← 0 (Initialize timestep)
while θₜ not converged do
  t ← t + 1
  gₜ ← ∇θ fₜ(θₜ₋₁) (Get gradients w.r.t. stochastic objective at timestep t)
  mₜ ← β₁ · mₜ₋₁ + (1 − β₁) · gₜ (Update biased first moment estimate)
  vₜ ← β₂ · vₜ₋₁ + (1 − β₂) · gₜ² (Update biased second raw moment estimate)
  m̂ₜ ← mₜ / (1 − β₁ᵗ) (Compute bias-corrected first moment estimate)
  v̂ₜ ← vₜ / (1 − β₂ᵗ) (Compute bias-corrected second raw moment estimate)
  θₜ ← θₜ₋₁ − α · m̂ₜ / (√v̂ₜ + ε) (Update parameters)
end while
return θₜ (Resulting parameters)
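The pseudocode above translates almost line-for-line into NumPy. A minimal sketch, assuming a caller-supplied `grad` function for the stochastic gradient and a fixed step count in place of the paper's convergence check (the function name and arguments here are illustrative, not from the paper):

```python
import numpy as np

def adam(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999,
         eps=1e-8, num_steps=2000):
    """Sketch of Adam following Algorithm 1.

    grad: function returning the (stochastic) gradient at theta.
    """
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # initialize 1st moment vector
    v = np.zeros_like(theta)  # initialize 2nd moment vector
    for t in range(1, num_steps + 1):
        g = grad(theta)                       # gradient at timestep t
        m = beta1 * m + (1 - beta1) * g       # biased first moment estimate
        v = beta2 * v + (1 - beta2) * g**2    # biased second raw moment estimate
        m_hat = m / (1 - beta1**t)            # bias-corrected first moment
        v_hat = v / (1 - beta2**t)            # bias-corrected second moment
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Usage: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
result = adam(lambda th: 2 * th, [1.0, -2.0], alpha=0.1)
```

Note that all updates are element-wise, matching the caption, so the same code works for a parameter vector of any shape.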