Oslo and M

At the Professional Developers Conference (PDC) in 2008, Microsoft introduced a modeling framework code-named “Oslo”, which included a modeling language called “M” and a Domain Specific Language (DSL) definition language called “MGrammar”. Remnants of this technology can still be found on the interwebs, for example on Channel 9. Oslo introduced capabilities to model data used and exposed by services (such as Active Directory), and to map those models onto underlying execution platforms (such as SQL), using DSLs and the M modeling language, which featured a structural type system.

Reaqtor’s type system for entities, which are used to represent events, was indirectly inspired by the type system in M. This type system, sometimes referred to as the “entity data model”, was originally used in a distributed graph database effort (see later), and ultimately ended up being the data model used by Reaqtor (see the Nuqleon.DataModel assemblies). Today, structural typing has been popularized by type systems layered on top of dynamic programming languages such as JavaScript, most notably TypeScript.

Upon the demise of the technology around 2012, the Oslo team transitioned to the Server and Tools Business, where projects were started to define the future of data processing, taking into account emerging NoSQL and Big Data trends such as MongoDB and Hadoop. The goal of the new organization was to sketch out the future of data at Microsoft, in the midst of the transition from on-prem to cloud-based data solutions (e.g. SQL Azure). SQL Server already had a history of adapting to a variety of other data and programming models, starting with the SQL Server 2005 “Yukon” release (cf. XML support, FILESTREAM data types for blobs, ServiceBroker queue management, SQLCLR for running custom code in the database, spatial data types, Notification Services for real-time notification, etc.). In parallel, efforts on DryadLINQ provided a possible direction for the Big Data strategy at Microsoft. Some of this work ultimately led to the creation of predecessors to HDInsight and, much later, Cosmos DB.

The Cloud Programmability Team chimed in on these conversations, trying to push its LINQ to Everything agenda. One working group was nicknamed the “data refinery” group. There, we explored new programming models to build large-scale data transformation and analysis services using formal languages based on LINQ. The idea was to leverage the semantics of various well-understood operators to build distributed query execution plans that can glue together data stores with different “Volume, Velocity, Variety” characteristics. To provide a connection to existing assets in the SQL Server organization, our working group prototyped a query builder on top of the SQL Server Integration Services (SSIS) designer, emitting LINQ expression trees on the back-end. Imagine a split-view editor with a data flow designer on the top, a toolbox with data sources, sinks, and operators on the left, and a (C#-based) code editor at the bottom, representing the logic drawn in the designer.

Nodes in the designer represented either data sources (exposed as either IQueryable<T> or IQbservable<T> collection types, for example SQL Server tables, Twitter streams, etc.), data sinks (using a first variant of IQbserver<T>, for example SQL Server tables, queues, etc.), general-purpose query operators (such as filtering, projections, etc.), or specialized query operators (e.g. machine learning modules). The execution planner evaluated which data sources supported execution of which query operators, and used the volume and velocity metrics of such transformations to weigh candidate execution plans. A good example of this strategy is predicate pushdown, but our approach was generalized to any query operator. The resulting plan effectively consisted of sending (sub)expression trees to the various data sources, leaving the remainder of the computation to a data processing node.
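To make the pushdown idea concrete, here is a minimal sketch of how a planner might split a filter predicate into a part that is sent to a data source and a remainder that stays on the processing node. It is not the planner described above: the names `PushdownSketch`, `SourceSupports`, `Plan`, and `Tweet` are hypothetical, and the rule that a source only understands simple comparisons is an assumption made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical event type used only for this example.
public sealed class Tweet
{
    public int Retweets { get; set; }
    public string Text { get; set; } = "";
}

// Hypothetical helper; not part of Reaqtor, Rx, or SSIS.
public static class PushdownSketch
{
    // Flattens a predicate of the form `a && b && c` into [a, b, c].
    public static IEnumerable<Expression> Conjuncts(Expression e)
    {
        if (e is BinaryExpression b && b.NodeType == ExpressionType.AndAlso)
        {
            return Conjuncts(b.Left).Concat(Conjuncts(b.Right));
        }
        return new[] { e };
    }

    // Assumption for illustration: the data source can evaluate simple
    // comparisons, but nothing else (for example, no method calls).
    public static bool SourceSupports(Expression e)
    {
        switch (e.NodeType)
        {
            case ExpressionType.Equal:
            case ExpressionType.NotEqual:
            case ExpressionType.LessThan:
            case ExpressionType.LessThanOrEqual:
            case ExpressionType.GreaterThan:
            case ExpressionType.GreaterThanOrEqual:
                return true;
            default:
                return false;
        }
    }

    // Partitions the conjuncts into those pushed down to the source and
    // those left for the data processing node.
    public static (List<Expression> Pushed, List<Expression> Remainder) Plan<T>(
        Expression<Func<T, bool>> predicate)
    {
        var pushed = new List<Expression>();
        var remainder = new List<Expression>();
        foreach (var conjunct in Conjuncts(predicate.Body))
        {
            (SourceSupports(conjunct) ? pushed : remainder).Add(conjunct);
        }
        return (pushed, remainder);
    }
}
```

For a predicate such as `t => t.Retweets > 100 && t.Text.Contains("rx")`, the comparison ends up in `Pushed` and the `Contains` call in `Remainder`. A real planner would also weigh cost metrics and handle operators other than filters, as the text describes.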

This was very different from SSIS in many ways. First, it supported both pull-based (iterative) and push-based (streaming) data acquisition, dealing with the velocity mismatch of various data sources. Second, it supported mapping of data models to deal with the variety mismatch of various data sources (not everything is relational, e.g. document databases). Third, it supported truly distributed execution, unlike SSIS where everything ran on a single compute node. In support of the volume axis, it therefore shipped compute to data rather than the opposite.
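The difference between pull-based and push-based acquisition can be sketched with the standard .NET interfaces `IEnumerable<T>` and `IObservable<T>`. The classes `PushSource` and `ListObserver` below are hypothetical stand-ins written for this example; they are not taken from the prototype or from Rx.

```csharp
using System;
using System.Collections.Generic;

public static class PullVersusPush
{
    // Pull: the consumer asks for the next element when it is ready,
    // for example when reading rows from a table.
    public static IEnumerable<int> PullSource()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static List<int> ConsumePull()
    {
        var seen = new List<int>();
        foreach (var x in PullSource()) seen.Add(x);
        return seen;
    }

    // Push: the producer calls the consumer whenever an element arrives,
    // for example when events come in on a stream.
    public static List<int> ConsumePush()
    {
        var seen = new List<int>();
        var source = new PushSource();
        using (source.Subscribe(new ListObserver(seen)))
        {
            source.Publish(1);
            source.Publish(2);
            source.Publish(3);
        }
        source.Publish(4); // Not observed: the subscription was disposed.
        return seen;
    }
}

// Hypothetical minimal push source.
public sealed class PushSource : IObservable<int>
{
    private readonly List<IObserver<int>> _observers = new List<IObserver<int>>();

    public IDisposable Subscribe(IObserver<int> observer)
    {
        _observers.Add(observer);
        return new Unsubscriber(() => _observers.Remove(observer));
    }

    public void Publish(int value)
    {
        // Copy the list so observers may unsubscribe while being notified.
        foreach (var o in _observers.ToArray()) o.OnNext(value);
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly Action _dispose;
        public Unsubscriber(Action dispose) => _dispose = dispose;
        public void Dispose() => _dispose();
    }
}

// Hypothetical observer that records every value it receives.
public sealed class ListObserver : IObserver<int>
{
    private readonly List<int> _sink;
    public ListObserver(List<int> sink) => _sink = sink;
    public void OnNext(int value) => _sink.Add(value);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
```

In the pull case the consumer drives the pace; in the push case the producer does, which is why a planner that combines both kinds of sources has to account for their differing velocity.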

The concept of delegation in Reaqtor, whereby parts of reactive query expressions can be pushed down or delegated to other reactive services, is a child of this technology. We’ll discuss this concept in more detail when talking about IReactiveProcessing (IRP) further on.

One last thing worth mentioning in this context is a first demonstration of distributed reactive programming using the IQbservable<T> support in Rx, given during an all-hands meeting with the extended SQL Server organization after the absorption of the Oslo/M teams in 2012. To illustrate our “LINQ to Everything” ambitions to ship code to data, whether it’s a traditional store or an event stream producer, we built a group chat server with real-time language translation for users. The client code looked roughly as follows:

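code:C#
 // Channel shared by all chat participants; messages on it are in English.
 IQubject<Message> ladyGagaChannel;
 // Handle to the subscription created in Connect.
 IQubscription subscription;
 void Connect(string language)
 {
     ladyGagaChannel = chatServer.GetSubject<Message>("#ladygaga");
     // Translate incoming messages into the user's language. The subscription
     // is stored so that Disconnect can dispose it.
     subscription = ladyGagaChannel
         .Select(m => new Message { Sender = m.Sender, Text = Translate(m.Text, "EN-US", language) })
         .Subscribe(m => txtChat.AppendLine(m.ToString()));
 }
 void Disconnect()
 {
     subscription.Dispose();
 }
 void SendMessage(string message, string language)
 {
     // Translate outgoing messages from the user's language into English.
     ladyGagaChannel.OnNext(new Message { Sender = CurrentUser, Text = Translate(message, language, "EN-US") });
 }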

This was the first internal demo of shipping LINQ query expression trees to a service for remote execution as a standing query. The system lacked various concepts that are present in today’s IRP programming model used by Reaqtor, such as identifiers for remote artifacts, asynchronous programming patterns, state checkpointing to support failover of the service, reliable messaging, splitting of queries across different compute nodes, metadata querying, etc. However, the essence of Reaqtor is identical to this proof of concept, which was built by Bart De Smet and Wes Dyer over a few nights for an all-hands meeting.

Work to enable this proof of concept included building an expression tree serializer. Prior work on LINQ to Everything had focused on a general-purpose framework to support transpilation of expression trees to domain-specific query languages. In this case, we wanted to send the expression tree as-is between a client and a service which were both running .NET. The initial expression tree serializer implementation worked well for the purposes of the POC. Later, during the early days of Reaqtor, Bart made a more powerful expression tree serializer called “Bonsai”, which we’ll discuss later.
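To illustrate what serializing an expression tree involves, the following is a toy sketch that walks a small subset of `System.Linq.Expressions` nodes and writes them out as text. It is neither the POC serializer nor Bonsai: the class name `ToySerializer` and the parenthesized output format are invented for this example, and anything beyond lambdas, parameters, constants, member access, and binary operators is rejected.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical toy serializer; the output format is made up for illustration.
public static class ToySerializer
{
    public static string Serialize(Expression e)
    {
        switch (e)
        {
            case LambdaExpression lambda:
                var names = string.Join(" ", lambda.Parameters.Select(prm => prm.Name));
                return $"(lambda ({names}) {Serialize(lambda.Body)})";
            case ParameterExpression parameter:
                return parameter.Name ?? "_";
            case ConstantExpression constant:
                // Only meaningful for simple literals; captured variables
                // (closures) are not handled by this sketch.
                return Convert.ToString(constant.Value, CultureInfo.InvariantCulture) ?? "null";
            case MemberExpression member when member.Expression != null:
                return $"(. {Serialize(member.Expression)} {member.Member.Name})";
            case BinaryExpression binary:
                return $"({binary.NodeType} {Serialize(binary.Left)} {Serialize(binary.Right)})";
            default:
                throw new NotSupportedException(e.NodeType.ToString());
        }
    }
}
```

Because the lambda is assigned to an `Expression<Func<...>>` rather than a `Func<...>`, the compiler hands over a data structure instead of compiled code. For example, serializing `Expression<Func<int, bool>> f = x => x > 5;` yields `(lambda (x) (GreaterThan x 5))`. A serializer intended for real use also has to deal with types, methods, and closures, and needs a matching deserializer on the service side.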