AWS article_Runtime and computing cost for Oxford Nanopore basecalling

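One-sentence summary

You do not need the top-tier GPU instance to basecall cost-effectively. Running smaller instances in parallel looks even better (in terms of wall-clock time).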

Original text

code:txt

When looking at the computing cost for basecalling per gigabase of DNA sequence it turns out that it is not always required to use high performance, multi-GPU instance types to perform basecalling at low cost. The first five ranks in the tables below include small instance types that are optimized for cost effective machine learning inference: a g5.xlarge instance can perform basecalling at equivalent cost compared to a p4d.24xlarge instance running Dorado v0.3.0 and no modification calling.

code:txt

Table 1 – Runtime and cost for basecalling the CliveOME 5mC dataset with Dorado across different EC2 instance types and calling without and with methylated bases per gigabase of DNA sequence and for a whole human genome (WHG) at 30X coverage (96 gigabases). The instance types are ranked by cost per WHG without modification calling (column with red border). Lowest cost ranked first. Numbers are for On-Demand pricing in the us-west-2 AWS Region. Cost effective basecalling is possible with smaller instance types such as the g5.xlarge, g5.2xlarge and g4dn.xlarge. *WHG = whole human genome at 30X coverage.

code:txt

Table 2 – Runtime and cost for basecalling the CliveOME 5mC dataset with Guppy. With Guppy the lowest cost is achieved with smaller instance types such as the g5.xlarge, g5.2xlarge, g5.12xlarge and g4dn.xlarge. These instance types rank before the most performant instance type, the p4d.24xlarge. Fields with “n/a” indicate test runs that failed. The causes of these failures could not be established before publication of this blog post and is still being investigated.

code:txt

Of course, the runtime for individual EC2 instances is much longer when basecalling is performed sequentially with smaller instance types. However, with the architecture presented in the “Architecture” section basecalling jobs can be executed as parallel batch jobs. The AWS Batch service allows horizontal scaling of basecalling to hundreds or thousands of instances. The same throughput as a p4d.24xlarge instance can be achieved with parallel execution across smaller instances, but at a lower cost.

For example, when parallelizing basecalling with Dorado and 5mCG calling across 25 g5.xlarge instances, basecalling of a whole human genome (30x coverage) can be performed with a runtime similar to a single p4d.24xlarge at a 12% lower cost — $21.28 vs. $24.14 for WHG with 5mCG calling. The ability to run Oxford Nanopore basecallers on ubiquitous smaller EC2 instance types and more fine-grained control of load balancing jobs across GPUs means that further cost savings can be achieved through utilization of Amazon EC2 Spot Instances.
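The 12% figure quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two dollar amounts and the instance count given in the text; the function name `relative_saving` is an illustrative assumption, not something from the article.

```python
def relative_saving(baseline_cost: float, alternative_cost: float) -> float:
    """Return the fractional saving of the alternative relative to the baseline."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - alternative_cost) / baseline_cost


# Figures quoted in the text for a WHG with 5mCG calling (On-Demand, us-west-2).
P4D_COST_USD = 24.14       # single p4d.24xlarge
G5_FLEET_COST_USD = 21.28  # 25 x g5.xlarge running in parallel
G5_INSTANCE_COUNT = 25

saving = relative_saving(P4D_COST_USD, G5_FLEET_COST_USD)
# Average share of the fleet cost carried by each g5.xlarge instance.
cost_per_g5_instance = G5_FLEET_COST_USD / G5_INSTANCE_COUNT

print(f"saving: {saving:.1%}")
print(f"cost per g5.xlarge instance: ${cost_per_g5_instance:.2f}")
```

The saving comes out slightly below 12%, which is consistent with the rounded figure in the text.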

Translation (Japanese)

code:txt

DNA配列のギガベースあたりのベースコールの計算コストを見ると、低コストでベースコールを実行するためには、必ずしも高性能なマルチGPUインスタンスタイプを使用する必要はないことがわかります。以下の表の上位5位には、コスト効率の良い機械学習推論に最適化された小さなインスタンスタイプが含まれています。Dorado v0.3.0を実行し、修飾塩基コールを行わない場合、g5.xlargeインスタンスはp4d.24xlargeインスタンスと同等のコストでベースコールを実行できます。

https://scrapbox.io/files/65c5a81631297200251e93cd.png

code:txt

表1 - CliveOME 5mCデータセットを、異なるEC2インスタンスタイプ上のDoradoで、メチル化塩基のコールなし／ありでベースコールした場合のランタイムとコスト。DNA配列のギガベースあたり、および30Xカバレッジのヒト全ゲノム（WHG、96ギガベース）あたりの値を示す。インスタンスタイプは、修飾塩基コールなしのWHGあたりのコストでランク付けされている（赤枠の列）。コストの低い順にランク付けされています。数字はus-west-2のAWSリージョンにおけるオンデマンド価格。g5.xlarge、g5.2xlarge、g4dn.xlargeのような小さなインスタンスタイプでも、費用対効果の高いベースコールが可能です。*WHG = 30Xカバレッジのヒト全ゲノム。

https://scrapbox.io/files/65c5a81be32feb002588843f.png

code:txt

表2 - CliveOME 5mCデータセットをGuppyでベースコールした場合のランタイムとコスト。Guppyではg5.xlarge、g5.2xlarge、g5.12xlarge、g4dn.xlargeのような小さなインスタンスタイプで最も低いコストが達成される。これらのインスタンスタイプは最もパフォーマンスの高いインスタンスタイプであるp4d.24xlargeより上位にランクされています。「n/a」のフィールドは、失敗したテスト実行を示す。これらの失敗の原因は、このブログ記事の公開前には特定できず、現在も調査中である。

code:txt

もちろん、小さいインスタンスタイプでベースコールを順次実行する場合、個々のEC2インスタンスの実行時間ははるかに長くなる。しかし、「アーキテクチャ」のセクションで紹介したアーキテクチャでは、ベースコールジョブを並列バッチジョブとして実行できる。AWS Batchサービスにより、ベースコールを数百、数千のインスタンスに水平スケーリングすることができます。p4d.24xlargeインスタンスと同じスループットを、より小さなインスタンス間での並列実行で、より低コストで達成できる。

例えば、25個のg5.xlargeインスタンス間でDoradoと5mCGコールによるベースコールを並列化した場合、ヒト全ゲノム（30倍カバレッジ）のベースコールを、1個のp4d.24xlargeと同様のランタイムで、12%低いコスト（5mCGコールによるWHGの24.14ドルに対して21.28ドル）で実行することができます。ユビキタスな小型EC2インスタンスタイプでOxford Nanopore basecallersを実行し、GPU間でジョブのロードバランシングをよりきめ細かく制御できることは、Amazon EC2スポットインスタンスの活用によってさらなるコスト削減が達成できることを意味します。

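Notes

With Dorado, a small GPU instance (g5.xlarge) can basecall at a cost equivalent to a p4d.24xlarge, so a high-end multi-GPU instance is not required.

Table 1: Comparison of computing cost for Dorado

Software: Dorado

Cost: performance per dollar (cost per gigabase and per WHG)

Runtime

*WHG = whole human genome at 30X coverage.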
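The relationship between the per-gigabase and per-WHG columns can be sketched as a simple product, assuming cost scales linearly with the amount of sequence. The 96 gigabases per WHG comes from the Table 1 caption; the hourly price and runtime used in the example call are hypothetical placeholders, not values from the article's tables.

```python
WHG_GIGABASES = 96  # whole human genome at 30X coverage, per the Table 1 caption


def cost_per_gigabase(hourly_price_usd: float, hours_per_gigabase: float) -> float:
    """On-Demand cost of basecalling one gigabase on a single instance."""
    return hourly_price_usd * hours_per_gigabase


def cost_per_whg(hourly_price_usd: float, hours_per_gigabase: float,
                 gigabases: int = WHG_GIGABASES) -> float:
    """Cost of a whole genome, assuming cost grows linearly with gigabases."""
    return cost_per_gigabase(hourly_price_usd, hours_per_gigabase) * gigabases


def runtime_per_whg_hours(hours_per_gigabase: float,
                          gigabases: int = WHG_GIGABASES) -> float:
    """Sequential runtime for a whole genome on one instance."""
    return hours_per_gigabase * gigabases


# Hypothetical inputs for illustration only (NOT taken from Table 1 or Table 2).
example_price_usd_per_hour = 1.00
example_hours_per_gigabase = 0.25

print(cost_per_whg(example_price_usd_per_hour, example_hours_per_gigabase))
print(runtime_per_whg_hours(example_hours_per_gigabase))
```

Ranking instance types by `cost_per_whg` rather than by runtime is what lets the small instances come out ahead in the tables.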

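Table 2: Comparison of computing cost for Guppy

Software: Guppy

Cost: performance per dollar (cost per gigabase and per WHG)

Runtime

Some test runs failed (shown as "n/a").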

Summary

Run basecalling in parallel to increase throughput.

Batch jobs can run on GPU instances too.

See AWS article_Architecture for details.
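As a rough illustration of the parallel approach, the sketch below splits a list of input files into one chunk per worker and estimates the ideal wall-clock time. It is not the article's implementation: the file names, the helper names, and the assumption of perfect scaling with no scheduling or startup overhead are all hypothetical, and how each chunk is handed to AWS Batch is left out.

```python
from typing import List, Sequence


def split_round_robin(items: Sequence[str], n_workers: int) -> List[List[str]]:
    """Distribute items across n_workers chunks in round-robin order."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_workers)]
    for index, item in enumerate(items):
        chunks[index % n_workers].append(item)
    return chunks


def ideal_wall_clock_hours(total_instance_hours: float, n_workers: int) -> float:
    """Wall-clock time under perfect scaling (an optimistic assumption)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return total_instance_hours / n_workers


# Hypothetical input files; real runs would list the actual raw signal files.
input_files = [f"reads_{i:03d}.pod5" for i in range(10)]
chunks = split_round_robin(input_files, n_workers=4)

for worker_id, chunk in enumerate(chunks):
    print(worker_id, chunk)

# Example: 100 instance-hours of sequential work spread over 25 workers.
print(ideal_wall_clock_hours(100.0, 25))
```

Under this model the total instance-hours, and therefore the On-Demand cost, stay the same while the wall-clock time shrinks with the number of workers.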