9fc872a2a6e4025
http://nhiro.org.s3.amazonaws.com/c/2/c2ab3b5705a192847f753567bab3b3ff.jpg https://gyazo.com/c2ab3b5705a192847f753567bab3b3ff
(OCR text)
Algorithm
Algorithm 1: Adam, our proposed algorithm for stochastic optimization. See section 2 for details, and for a slightly more efficient (but less clear) order of computation. gₜ² indicates the elementwise square gₜ ⊙ gₜ. Good default settings for the tested machine learning problems are α = 0.001, β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸. All operations on vectors are element-wise. With β₁ᵗ and β₂ᵗ we denote β₁ and β₂ to the power t.
Require: α: Stepsize
Require: β₁, β₂ ∈ [0, 1): Exponential decay rates for the moment estimates
Require: f(θ): Stochastic objective function with parameters θ
Require: θ₀: Initial parameter vector
m₀ ← 0 (Initialize 1st moment vector)
v₀ ← 0 (Initialize 2nd moment vector)
t ← 0 (Initialize timestep)
while θₜ not converged do
  t ← t + 1
  gₜ ← ∇θ fₜ(θₜ₋₁) (Get gradients w.r.t. stochastic objective at timestep t)
  mₜ ← β₁ · mₜ₋₁ + (1 − β₁) · gₜ (Update biased first moment estimate)
  vₜ ← β₂ · vₜ₋₁ + (1 − β₂) · gₜ² (Update biased second raw moment estimate)
  m̂ₜ ← mₜ / (1 − β₁ᵗ) (Compute bias-corrected first moment estimate)
  v̂ₜ ← vₜ / (1 − β₂ᵗ) (Compute bias-corrected second raw moment estimate)
  θₜ ← θₜ₋₁ − α · m̂ₜ / (√v̂ₜ + ε) (Update parameters)
end while
return θₜ (Resulting parameters)
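The pseudocode above translates almost line-for-line into NumPy. A minimal sketch, assuming a caller-supplied `grad` function for the stochastic gradient and a fixed step count in place of the paper's convergence check (the function name and arguments here are illustrative, not from the paper):

```python
import numpy as np

def adam(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999,
         eps=1e-8, num_steps=2000):
    """Sketch of Adam following Algorithm 1.

    grad: function returning the (stochastic) gradient at theta.
    """
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # initialize 1st moment vector
    v = np.zeros_like(theta)  # initialize 2nd moment vector
    for t in range(1, num_steps + 1):
        g = grad(theta)                       # gradient at timestep t
        m = beta1 * m + (1 - beta1) * g       # biased first moment estimate
        v = beta2 * v + (1 - beta2) * g**2    # biased second raw moment estimate
        m_hat = m / (1 - beta1**t)            # bias-corrected first moment
        v_hat = v / (1 - beta2**t)            # bias-corrected second moment
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Usage: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
result = adam(lambda th: 2 * th, [1.0, -2.0], alpha=0.1)
```

Note that all updates are element-wise, matching the caption, so the same code works for a parameter vector of any shape.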