SGXTuner: Performance Enhancement of Intel SGX Applications via Stochastic Optimization

#LayerX_Newsletter 2021-03-26

Title: SGXTuner: Performance Enhancement of Intel SGX Applications via Stochastic Optimization

Authors: Giovanni Mazzeo, Sergei Arnautov, Christof Fetzer, and Luigi Romano

Journal: IEEE Transactions on Dependable and Secure Computing

Link: https://ieeexplore.ieee.org/document/9372883

TL;DR

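Proposes SGXTuner, a tool (more precisely, an architecture) that automatically tunes the parameters of an SGX-extended libc to optimize the performance of software running on SGX

The parameter space is searched with simulated annealing

cipepser.icon I wonder whether this can be framed as a multimodal optimization problem? (Judging from the later sections, there are many parameters, so the search space looks high-dimensional.)

Optimizes six parameters of sgx-musl, the libc used by SCONE

SCONE is a platform for running a wide range of applications efficiently and securely on Intel SGX

Note: this work comes from the SCONE team

Implemented in Rust and published on GitHub

Experiments on Memcached, Redis, and the Apache Web server show that performance actually improves

For each application below, the first range is the improvement over a domain expert's parameter settings and the second is the improvement over a random configuration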

Memcached: 11-12%, 45-39%

Apache Web server: 14-37%, 38-51%

Redis: 7%-5%, 13%-10%

Unlike Memcached and Apache, Redis is single-threaded, so its improvement is smaller

Background

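SGX's security comes at the cost of performance

The cause is the following two kinds of overhead

i) context switches between the secure world and the insecure world

ii) paging when the protected memory size is exceeded

Mitigations for recently discovered side-channel attacks (Spectre and Meltdown) have made things even slower

These attacks stem from the CPU's speculative execution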

Intel's microcode updates mitigate these attacks by restricting speculative execution

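As a result, performance dropped (by up to 25%)

Optimizing the performance of SGX applications is therefore important, but also very complex

It is complex because optimization requires setting the parameters of the SGX-extended libc library appropriately

This parameter set governs performance-critical aspects such as compiler optimizations, dynamic memory allocation, and synchronization between threads inside and outside the enclave

cipepser.icon I see, so things like the EPC size setting and the TCS bound become parameters. Makes sense.

However, the number of possible parameter configurations is enormous

Manual optimization would therefore take even an expert years, which is impractical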

Contributions

Architecture

https://gyazo.com/0b42ba5bfb67a6fdca80cf4d0ddc8140

The Control layer is where decisions are taken about the specific job, and in particular on:

i) configurations of parameters to be tested

ii) the assignment of a selected configuration to a Target application to be executed in a specific SGX Node taken from the pool

iii) the dispatch of a Workload generator to a specific node for the execution against the previously-chosen SGX node.

The Application layer, instead, encloses the applications driven by the Control layer to perform the tuning activity.

More precisely, these are: i) the sensitive Target application (e.g., NGINX, Apache, Memcached, or MySQL)

and ii) the Workload generator, a benchmarking tool (e.g., wrk, memaslap, or twemperf) that is chosen based on the target application.

The Infrastructure layer is used for the deployment of the Target and Workload applications on the physical machine nodes.
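The three Control-layer decisions above (which configurations to test, which SGX node runs the Target, which node runs the Workload generator) can be pictured as a small job planner. This is only a rough sketch under assumptions: the `Job` type, the `plan_jobs` function, the round-robin assignment, and the node names are all hypothetical and are not taken from SGXTuner itself.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Job:
    config: dict         # parameter configuration to be tested
    target_node: str     # SGX node that runs the Target application
    workload_node: str   # node that runs the Workload generator


def plan_jobs(configs, sgx_nodes, workload_nodes):
    """Pair each candidate configuration with an SGX node and a workload node.

    Round-robin assignment is an assumption made for this sketch.
    """
    targets = cycle(sgx_nodes)
    loaders = cycle(workload_nodes)
    return [Job(cfg, next(targets), next(loaders)) for cfg in configs]


# Hypothetical pool: two SGX nodes and one load-generating node.
jobs = plan_jobs(
    [{"ETHREADS": 1}, {"ETHREADS": 2}, {"ETHREADS": 4}],
    sgx_nodes=["sgx-0", "sgx-1"],
    workload_nodes=["load-0"],
)
for job in jobs:
    print(job)
```

In a real deployment the Infrastructure layer would take each planned job and start the two applications on the chosen machines; that part is omitted here.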

Parameters

This work targets SCONE's sgx-musl as the SGX-extended libc

i) ETHREADS and STHREADS

The number of OS threads running inside the enclave, and the number of system-call threads running outside the enclave

ii) ESPINS and SSPINS

Respectively, how many times a thread tries to dequeue an element from the request and response queues

iii) ESLEEP and SSLEEP

Respectively, how long a thread inside or outside the enclave sleeps after a certain number of failed dequeue attempts (to avoid wasting CPU)
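To make the search space concrete, the six parameters can be modeled as one configuration record plus a "neighbor" move that nudges a single parameter. This is a minimal sketch, not SGXTuner's code: the value ranges in `BOUNDS` and the step size are made-up placeholders, since these notes do not give the real bounds or units.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical search ranges; the real bounds are not given in these notes.
BOUNDS = {
    "ethreads": (1, 8),
    "sthreads": (1, 8),
    "espins": (1, 10_000),
    "sspins": (1, 10_000),
    "esleep": (1, 10_000),
    "ssleep": (1, 10_000),
}


@dataclass(frozen=True)
class Config:
    ethreads: int
    sthreads: int
    espins: int
    sspins: int
    esleep: int
    ssleep: int


def random_config(rng):
    """Draw every parameter uniformly from its (assumed) range."""
    return Config(**{name: rng.randint(lo, hi) for name, (lo, hi) in BOUNDS.items()})


def neighbor(cfg, rng):
    """Return a copy of cfg with one randomly chosen parameter nudged and clamped."""
    name = rng.choice(list(BOUNDS))
    lo, hi = BOUNDS[name]
    step = max(1, (hi - lo) // 10)  # move by at most about 10% of the range
    value = getattr(cfg, name) + rng.randint(-step, step)
    return replace(cfg, **{name: min(hi, max(lo, value))})


rng = random.Random(0)
cfg = random_config(rng)
print(cfg)
print(neighbor(cfg, rng))
```

A neighbor function of this kind is what a simulated annealing solver would call to propose the next configuration to benchmark.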

Algorithm

Aside: simulated annealing

https://gyazo.com/f244c1176d84e67efcc87fa9879d85fe

Source: 大人になってからの再学習, 2012-08-25, "焼きなまし法" (Simulated Annealing)
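For readers new to the technique, here is a generic simulated annealing loop on a toy multimodal function. It is a textbook-style sketch, not the paper's solver: the geometric cooling schedule, the default temperatures, and `toy_cost` are all assumptions. In the SGXTuner setting the cost would presumably come from a benchmark run rather than from a formula.

```python
import math
import random


def simulated_annealing(cost, initial, neighbor, rng,
                        t_start=10.0, t_end=0.01, cooling=0.95, steps_per_temp=20):
    """Minimize cost(state) and return (best_state, best_cost)."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept a worse state with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # geometric cooling (assumed schedule)
    return best, best_cost


def toy_cost(x):
    # Many local minima (near multiples of 2*pi), global minimum at x = 0.
    return x * x / 100.0 + 3.0 * (1.0 - math.cos(x))


def toy_neighbor(x, rng):
    return x + rng.uniform(-1.0, 1.0)


best, best_cost = simulated_annealing(toy_cost, 20.0, toy_neighbor, random.Random(0))
print(best, best_cost)
```

Accepting occasional regressions at high temperature is what lets the search climb out of a local minimum, which matters if the tuning problem is multimodal.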

The following four variants of simulated annealing are used

SEQSA: standard simulated annealing solver

searches for the best solution in a sequential way.

SPISA: parallelized version of the simulated annealing

the different worker machines explore in parallel a specific set of neighborhoods composed of independent configurations and periodically exchange information.

MIPS: additional parallelized version of the solver

starts from different initial parameter configurations and executes multiple independent workers, which don't need to exchange information except for the final comparison of worker results.

PRSA: "Parallel Recombinative Simulated Annealing" algorithm

combination of the Genetic Crossover algorithm and Simulated Annealing
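The three parallel variants differ mainly in when workers share information. The toy sketch below contrasts them under heavy simplification: workers are simulated one after another in a single process, `cost` is a made-up stand-in for a benchmark run, and the exchange rule in `spisa` and the acceptance rule in `prsa` are assumptions chosen for brevity, not the paper's exact procedures.

```python
import math
import random


def cost(cfg):
    # Toy stand-in for a benchmark run (lower is better); optimum is all 5s.
    return sum((v - 5) ** 2 for v in cfg)


def neighbor(cfg, rng):
    i = rng.randrange(len(cfg))
    return cfg[:i] + (cfg[i] + rng.choice((-1, 1)),) + cfg[i + 1:]


def accept(delta, t, rng):
    # Metropolis rule: take improvements, sometimes take regressions.
    return delta <= 0 or rng.random() < math.exp(-delta / t)


def anneal(cfg, t, t_end, cooling, rng):
    # Plain annealing from cfg while cooling from t to t_end; returns the final state.
    while t > t_end:
        cand = neighbor(cfg, rng)
        if accept(cost(cand) - cost(cfg), t, rng):
            cfg = cand
        t *= cooling
    return cfg


def mips(starts, rngs):
    # MIPS-like: independent runs from different starts, compared only at the end.
    return min((anneal(s, 5.0, 0.05, 0.95, r) for s, r in zip(starts, rngs)), key=cost)


def spisa(start, rngs, rounds=10):
    # SPISA-like: workers anneal a short segment, then all adopt the best state so far.
    shared, t = start, 5.0
    for _ in range(rounds):
        results = [anneal(shared, t, t * 0.6, 0.95, r) for r in rngs]
        shared = min(results + [shared], key=cost)
        t *= 0.6
    return shared


def prsa(pop, rng, t=5.0, t_end=0.05, cooling=0.9):
    # PRSA-like: crossover plus mutation make children; each child replaces
    # its parent only if the annealing-style acceptance test passes.
    pop = list(pop)
    while t > t_end:
        rng.shuffle(pop)
        for i in range(0, len(pop) - 1, 2):
            cut = rng.randrange(1, len(pop[i]))
            kids = (pop[i][:cut] + pop[i + 1][cut:], pop[i + 1][:cut] + pop[i][cut:])
            for k, kid in enumerate(kids):
                kid = neighbor(kid, rng)
                if accept(cost(kid) - cost(pop[i + k]), t, rng):
                    pop[i + k] = kid
        t *= cooling
    return min(pop, key=cost)  # best member of the final population


starts = [(0,) * 6, (10,) * 6, (0, 10, 0, 10, 0, 10), (3,) * 6]
rngs = [random.Random(i) for i in range(4)]
print("MIPS :", mips(starts, rngs))
print("SPISA:", spisa((0,) * 6, rngs))
print("PRSA :", prsa(starts, random.Random(0)))
```

In a real setup each worker would run on its own machine, and every `cost` call would be a full benchmark, so how often workers synchronize directly affects wall-clock tuning time.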

Results

Optimization progress

https://gyazo.com/554e7cf0b448aefff1e12c42a7b70aa0

Parameters obtained by each algorithm

https://gyazo.com/69fcee4498a21ec4254eb10a7904c338

Parameter weights

nrryuya.icon > It is interesting that this shows how much each setting matters for each application

https://gyazo.com/3191f79c2296865fb8aa99980236fd6b

For single-threaded applications such as Redis, ETHREADS, STHREADS, and SSPINS have the largest impact

For multi-threaded applications such as Memcached and Apache, STHREADS, ESLEEP, and ESPINS have the largest impact

Notes

Applying the same approach to application-related and compiler-related parameters could yield further performance improvements