ページモデルの起源
ページモデル(The Page Model)では,すべての処理が$ r, w, c, aにコンパイルされる.
Concurrency Controlを理論付けるにあたって,あらゆるトランザクションがこの4要素の集合に還元される,という事実はとても重要だ.
が,なぜこのような定義になったのか?ということについて厳密に説明している文献が見当たらないのが気になった.
そこでちょっとサーベイした.
結論として On Concurrency Control By Multiple Versions という論文の整理が簡潔だった.得られた知見は トランザクションのモデル にまとめた.
以下はサーベイ中のメモ.
起源を探る
Transactional Information Systems p57 Bibliographic Notesによると,
The methodology for devising a transaction model as described in Section 2.2 essentially follows Vossen(1995).
The major credit for the (read/write) page model of transactions belongs to the 1998 Turing Award winner Jim Gray;
see, especially, Gray(1978, 1981) and Eswaran et al.(1976).
Vossen 1995: https://link.springer.com/content/pdf/10.1007%2FBFb0015267.pdf
これはTransactional Information Systemsの著者Gottfried Vossen.
ここでは厳密にRead/Write modelが定義されている.
まさか1995までこのあたりの定式化はされていなかったのだろうか?
Serializabilityを定式化するにはこのnotationは必ず必要なので,それはないはずだ.
Jim Gray 1978: https://link.springer.com/chapter/10.1007/3-540-08755-9_9
Gray 1981: https://jimgray.azurewebsites.net/papers/theTransactionConcept.pdf
Eswaran et al 1976: http://daslab.seas.harvard.edu/reading-group/papers/astrahan-1976.pdf
At the logical level, such classic situations as the “lost update” problem must be handled to insure that two concurrent transactions do not read the same value and then try to write back an incremented value.
System Rに関するGray, Eswaranらの論文は基本的に「Logical Levelでは全クエリをRead/Writeにコンパイルする」ことを前提としている,ので,このあたりが確かに major credit かもしれない.
しかし,理論的な定式化はあきらかに不十分だ.
さらに同じParagraphの続きを読むと:
The page model transaction concept became the subject of intensive theoretical studies, in particular in the work of Christos Papadimitriou and, independently, Philip Bernstein et al. around 1980; see, for example, Papadimitriou(1979, 1986) as well as Bernstein et al.(1979, 1987).
Papadimitriou 79: The Serializability of Concurrent Database Updates
Sec 2. Definitions and Notationで記述がある.
$ h = R1(x)R2W1(x, y)R3(x)W2(x, y) W3(y)
といった書き方になっている.少なくともR/Wに全命令をコンパイルする,ことは自明のよう.
We can think of each transaction T~ as starting with an instantaneous reading of the
values in the variables m S(R,), performing a possibly lengthy local computation, and then
instantaneously recording the results in a different set S(W,) of variables. We do not look
into the details of the exact nature of the local computation.
個々の命令はR/Wにコンパイルされるが,それが現実にどのような処理であるかは議論しない.
Bernstein, et al. 79: Formal Aspects of Serializability in Database Concurrency Control
現代まで残っているPage model notationがそのまま使われている.
まあ著者Bernsteinだから当たり前か? この論文-> Bernstein本 -> Weikum本という流れで伝播しているだろう.
$ readset_i, writeset_iといった形でRead/Write Setを定義しているのが面白い.
Each transaction $ T_i can execute one read operation $ R_i and one write operation $ W_i.
We define two functions,
, writeset: $ \{T_i | i = 1, 2, ...\}\to 2^D, which specify, for each transaction, the portion of the database read and written.
ここでいう$ Dというのは,
A database $ D is a set of distinct data items $ \{x_1, ..., x_m\}.
のこと.$ 2^Dってなんだろう? #分からないこと
冪集合のことを指す.
The actual computation performed by a transaction can be quite complex and difficult to analyze.Therefore, it is common practice to ignore the exact nature of the computation and treat the computation as a collection of function symbols open to arbitarary interpretations.
これと同じような文章はConcurrency ControlのNotation/Definitionを行うほぼすべての論文で見る.
つまり,「具体的に何が起きているかは問題にせず,R/Wにコンパイルされるものとする」.
足元が不安定な感じがある・・・けどまあいいのか.
Papadimitriou 1986: Theory of Database Concurrency Control
Bernstein 1987: Bernstein本のこと.
起源を探る2
OCCの生みの親Kungの79年の論文An Optimality Theory of Concurrency Control for Databasesでは,トランザクションを$ T_i,T_j...T_n,各命令(step)を$ T_{ij},T_{il},...T_{im}としている.read/write stepは定義されていない.
A log models the interleaved execution of a set of transactions. It corresponds to a "schedule" in 7 and a "history" in 11.
7: Gray et al. 76. The Notions of Consistency and Predicate Locks in a Database System
A transaction is a sequence1: $ T = (( T, a_i , e_j))^n_{i=1}of n steps where T is the transaction name, a is the action at step i and ei is the entity acted upon at step i
という表記でトランザクションを扱っている.内容は2PLに関するもの.
$ T_2 = ((T_2 , lock, e_k), (T_2 , read, e_k), (T_2 , write, e_k), (T_2 , unlock, e_k))
と表記される.つまり$ aにあたる action は自由であり,read/write stepのdefinitionがない.
11: 発見できず
起源を探る3
もうひとつの名著Concurrency Control And Recovery In Database Systems(Bernstein本)をたどる.p41:
Virtually all rigorous treatments of concurrency control use some form of the history model of executions.
See Bernstein, Shipman, Wong 79, Gray et al. 75, Paradimitriou 79, and Stearns, Lewis, Rosenkrantz 76.
An extensive treatment of serializability theory apperas in Paradimitriou 86.
Gray et al 75: GRANULARITY OF LOCKS IN A SHARED DATA BASE
ロックの粒度(Granularity)に関する話...に見える.関係ない?
Stearns, Lewis, Rosenkrantz 76: CONCURRENCY CONTROL FOR DATABASE SYSTEMS
From its external behavior, the process can be regarded as a sequence of read and write requests followed by a terminate request.
ここでdata operationとtermination operationにcomipleされることをハッキリ言っている.
さらに詳しい定義(要件)もある
Intermediate versions:
At each point in the running of a process, the concurrency control associates with certain entities, an intermediate version of that entity for that process. When a process is started, no entities have intermediate versions.
After the process performs a read or write request, the latest version it saw or created is its current intermediate version.
Granting a Write Request:
Make the version of the entity supplied by the process the intermediate version of that entity for that process.
Granting a Read Request:
(a) If it is the first request for that entity by that process, supply a valid version of that entity to that process, and make that version the intermediate version of that entity for that process.
The entity and its version is said to be part of the database substate seen by the running of the process.
(b) If it is not the first request for that entity by that process, supply the intermediate version of the requested entity to the process.