MLOps Notes - masumi::open-note

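I read several posts related to Hidden Technical Debt in Machine Learning Systems.

I also looked through the article on the "social implementation" of machine learning and mercari's design patterns.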

Anti-patterns in development

Depending on the Data Science Unicorn (Super hero)

Black box scenario

Technical Debt and Machine learning Systems

Boundary erosion

Entanglement

Hidden Feedback loops

Dependency Debt

Unstable Data dependencies

Underutilized Data Dependencies

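The first two are project-level anti-patterns: splitting the work so finely that nobody can grasp the whole picture is bad, and so is depending too heavily on a single person.

Also, because an ML system depends on data, code dependencies tend to spread wider than originally expected, and feedback loops make component boundaries even blurrier. And because the system depends on data that is consumed elsewhere or that changes over time, it becomes unstable and hard to measure.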

Anti-patterns in deployment

Lack of ML lifecycle management

I probably need to try KubeFlow and AWS Sagemaker.

To correct this, my suggestion would be for the Data Science managers to ask the team to follow a framework, like KubeFlow, KubeFlow Pipelines, AWS Sagemaker or GCP’s CloudAI.

Data Validation

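Since the system is built on data, we should be strict about the data. I probably need to learn about TFX's data validation.

It provides the three capabilities below. For schema inference, I wonder whether it could infer things like URL-encoded query parameters, even if that takes a plugin. It would be nice if it could.

Computing descriptive statistics

Inferring a schema

Detecting data anomalies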
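The three steps above can be sketched in plain Python. This is not the TFX or TensorFlow Data Validation API: the function names (`flatten_query`, `describe`, `infer_schema`, `find_anomalies`) and the sample records are hypothetical. Decoding the query string with `urllib.parse.parse_qsl` before inference is just one possible way to approach the URL-encoded parameter wish above, not something the library is known to do.

```python
from collections import Counter
from urllib.parse import parse_qsl


def flatten_query(record, field="query"):
    """Expand a URL-encoded query string into separate 'field.key' columns."""
    flat = {k: v for k, v in record.items() if k != field}
    for key, value in parse_qsl(record.get(field, "")):
        flat[f"{field}.{key}"] = value
    return flat


def describe(records):
    """Descriptive statistics: per-column presence count and value counts."""
    stats = {}
    for record in records:
        for name, value in record.items():
            column = stats.setdefault(name, {"count": 0, "values": Counter()})
            column["count"] += 1
            column["values"][value] += 1
    return stats


def infer_schema(stats, total):
    """Infer a schema: a column is required if every record contained it."""
    return {
        name: {"required": column["count"] == total,
               "domain": set(column["values"])}
        for name, column in stats.items()
    }


def find_anomalies(records, schema):
    """Detect anomalies: missing required columns, unknown columns or values."""
    anomalies = []
    for index, record in enumerate(records):
        for name, rule in schema.items():
            if rule["required"] and name not in record:
                anomalies.append((index, name, "missing required column"))
        for name, value in record.items():
            if name not in schema:
                anomalies.append((index, name, "column not in schema"))
            elif value not in schema[name]["domain"]:
                anomalies.append((index, name, "value outside inferred domain"))
    return anomalies


# Hypothetical training records with a URL-encoded query column.
training = [flatten_query(r) for r in [
    {"path": "/search", "query": "q=shoes&sort=price"},
    {"path": "/search", "query": "q=bags&sort=new"},
]]
schema = infer_schema(describe(training), total=len(training))

# Hypothetical serving record with a new sort value and a new parameter.
serving = [flatten_query({"path": "/search", "query": "q=shoes&sort=random&page=2"})]
anomalies = find_anomalies(serving, schema)
```

Here the serving record is flagged for an unseen `sort` value and for the `page` parameter that the inferred schema does not know. Treating every column as a closed set of values is a simplification: a free-text column such as `query.q` would need a looser rule in practice.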

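Training-serving skew

This is the problem where trends shift while the service is running and accuracy drops. It is important to monitor for such shifts in operation, for example with TFX and TFMA. The advice is to read this.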

A discrepancy between how you handle data in the training and serving pipelines.

A change in the data between when you train and when you serve.

A feedback loop between your model and your algorithm.
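The second cause above, a change in the data between training and serving, can be watched with a simple distribution comparison. The sketch below computes the L-infinity distance between the normalized value frequencies of one categorical feature. The feature values and the 0.1 threshold are made-up assumptions, and this is a hand-rolled illustration rather than the TFX or TFMA API.

```python
from collections import Counter


def frequencies(values):
    """Normalized frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def l_infinity_distance(train_values, serving_values):
    """Largest absolute difference in frequency over all observed values."""
    train = frequencies(train_values)
    serving = frequencies(serving_values)
    keys = set(train) | set(serving)
    return max(abs(train.get(k, 0.0) - serving.get(k, 0.0)) for k in keys)


# Hypothetical values of a "device" feature at training and serving time.
train_device = ["ios"] * 50 + ["android"] * 50
serving_device = ["ios"] * 30 + ["android"] * 50 + ["web"] * 20

distance = l_infinity_distance(train_device, serving_device)
THRESHOLD = 0.1  # hypothetical alerting threshold
skew_detected = distance > THRESHOLD
```

In this sample the share of `ios` drops from 0.5 to 0.3 and `web` appears with a share of 0.2, so the distance is about 0.2 and the check fires. A real setup would run this per feature on a schedule and tune the threshold per feature.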

Other guidelines

Don't be afraid of starting with heuristics

First define objective and metrics

Choose ML over complex heuristics

Keep the first model simple and focus on infrastructure

Test the infrastructure independently from ML

Turn heuristics into features

Be mindful of ML model life (freshness of model)

Canary test before exporting model

Use proxy objectives instead of direct ones

Start with interpretable models

Always keep in mind the policy layer filtering (pre and post), that is, the data filtering (data cleansing) policy

Plan to launch and iterate

Make the first layer features understandable for humans

Measure the delta between models

Utilitarian performance trumps predictive performance

Be wary of long-term and short-term behavior; training-serving skew.

Use same code for serving and training

Train and validate on different data: if you train a model on data from June, test it on data from July.

Avoid feedback loops with positional features
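Two of the guidelines above, using the same code for serving and training and testing on data that comes after the training period, can be sketched together. The `Event` type, the `extract_features` function, the bucketing rule, and the sample dates are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    day: date
    price: float
    title: str
    label: int


def extract_features(price, title):
    """Single feature function shared by the training and serving paths."""
    return {
        "price_bucket": min(int(price // 1000), 9),  # cap at bucket 9
        "title_length": len(title),
    }


def split_by_time(events, cutoff):
    """Train on events before the cutoff, test on events from the cutoff on."""
    train = [e for e in events if e.day < cutoff]
    test = [e for e in events if e.day >= cutoff]
    return train, test


events = [
    Event(date(2020, 6, 3), 1200.0, "used camera", 1),
    Event(date(2020, 6, 20), 300.0, "book", 0),
    Event(date(2020, 7, 5), 15000.0, "laptop", 1),
]
# June events go to training, July events to testing.
train, test = split_by_time(events, cutoff=date(2020, 7, 1))

# Training path: rows are built with extract_features.
train_rows = [(extract_features(e.price, e.title), e.label) for e in train]


def handle_request(price, title):
    """Serving path: calls the same function, so feature logic lives in one place."""
    return extract_features(price, title)
```

Keeping the feature logic in one function means a change such as a new bucket boundary reaches both paths at once, which removes one common source of training-serving skew.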

Ref

Machine learning in Production: Anti-patterns

Hidden Technical Debt in Machine Learning Systems

Rules of Machine Learning: Best Practices for ML Engineering

mercari ml-system-design-pattern

GCP certified Data Engineer (GCP 認定データエンジニア)

What it means to "socially implement" machine learning (機械学習を「社会実装」するということ)