Scalable and Sustainable Deep Learning via Randomized Hashing
Author
Abstract
Current deep learning architectures are growing larger in order to learn from complex datasets. These architectures require giant matrix multiplication operations to train millions of parameters. Conversely, there is another growing trend to bring deep learning to low-power, embedded devices. The matrix operations, associated with both training and testing of deep networks, are very expensive from a computational and energy standpoint. We present a novel hashing based technique to drastically reduce the amount of computation needed to train and test deep networks. Our approach combines recent ideas from adaptive dropouts and randomized hashing for maximum inner product search to select the nodes with the highest activation efficiently. Our new algorithm for deep learning reduces the overall computational cost of forward and back-propagation by operating on significantly fewer (sparse) nodes. As a consequence, our algorithm uses only 5% of the total multiplications, while keeping on average within 1% of the accuracy of the original model. A unique property of the proposed hashing based back-propagation is that the updates are always sparse. Due to the sparse gradient updates, our algorithm is ideally suited for asynchronous and parallel training leading to near linear speedup with increasing number of cores. We demonstrate the scalability and sustainability (energy efficiency) of our proposed algorithm via rigorous experimental evaluations on several real datasets.
Notes
Computational cost and power consumption are major challenges in deep learning.
Prior work has tried to reduce computation by shrinking the bottleneck matrix multiplications.
Most of that work relies on low-dimensional (low-rank) matrices or lower-accuracy updates.
Proposed method
Proposes a neural network that is trained and tested via a hashing-based indexing approach.
Leverages the rich theory of randomized sub-linear algorithms from the database literature, combined with adaptive dropout, to cut computation and memory overhead while retaining accuracy; neurons are indexed into hash tables using locality-sensitive hashing (LSH).
Querying these hash tables selects the neurons with the highest activations, so activations do not have to be computed for every neuron, saving computation time.
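As a concrete illustration, here is a minimal Python/NumPy sketch of that idea (not the authors' implementation): neuron weight vectors are indexed into hash tables with signed-random-projection (SimHash-style) LSH, and a layer input is hashed to retrieve only the colliding neurons. The table count L, bit count K, and helper names (build_tables, active_set) are assumptions made for illustration.

import numpy as np

K, L = 8, 4          # hash bits per table, number of tables (assumed values)
rng = np.random.default_rng(0)

def simhash(projections, x):
    # Signed random projections -> K-bit bucket id.
    bits = (projections @ x) > 0
    return int("".join("1" if b else "0" for b in bits), 2)

def build_tables(W):
    # Index every neuron's weight vector into L hash tables.
    d = W.shape[1]
    tables = []
    for _ in range(L):
        P = rng.standard_normal((K, d))
        buckets = {}
        for neuron_id, w in enumerate(W):
            buckets.setdefault(simhash(P, w), []).append(neuron_id)
        tables.append((P, buckets))
    return tables

def active_set(tables, x):
    # Query with the layer input: the union of collisions approximates the
    # neurons whose weight vectors have the largest inner product with x.
    ids = set()
    for P, buckets in tables:
        ids.update(buckets.get(simhash(P, x), []))
    return sorted(ids)

W = rng.standard_normal((1024, 128))   # 1024 neurons, 128-d input (toy sizes)
x = rng.standard_normal(128)
act = active_set(build_tables(W), x)
sparse_out = W[act] @ x                # activations computed only for selected neurons

Because the hash collisions concentrate on high-inner-product neurons, the dense matrix-vector product over all 1024 neurons is replaced by a product over the much smaller active set.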
Moreover, because the active set of neurons is a random sparse set, gradient updates rarely overwrite one another.
These sparse, non-conflicting updates make the algorithm well suited to asynchronous, parallel training.
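A minimal sketch of the sparse-update consequence, assuming plain SGD on one layer: only the rows of the weight matrix belonging to the active neurons are read and written, so parallel workers that draw different random active sets rarely touch the same rows. The function name sparse_sgd_step and the learning rate are illustrative, not from the paper.

import numpy as np

def sparse_sgd_step(W, x, grad_active, active_ids, lr=0.01):
    # grad_active: gradient w.r.t. the pre-activations of the active neurons only.
    # Only len(active_ids) rows of W are updated; all other rows stay untouched,
    # which is why lock-free asynchronous workers rarely conflict.
    W[active_ids] -= lr * np.outer(grad_active, x)
    return W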