The Theory of Database Concurrency Control

Christos H. Papadimitriouの本

読書会のメモ

1. Introduction

この章はヒキと見せかけて定義を垂れ流しているので注意して読まなければならない．

メモりながら拾っていく

1.1 Databases and Concurrency

DatabaseとDatabase Management System

PhysicalとLogical data structure

entitiy: 他の文献ではdata itemとかtupleとか呼ばれる最小構成単位

state: a state is an assignment of values to the entities.

integrity constraints: real-world restriction

consistent state: integrity constraintsを満たすstate

state transition: stateが変化すること(つまりcurrent stateとかthe latest stateというのがある)

すべてのentitiyの更新がatomicにできないので，inconsistent stateを作るtransitionは当然ありうる．

つまり一つのトランザクションが複数のstate を作ってtransitionする

# まだtransactionというtermは出てきてなかった．a program generates multiple statesぐらいか

integrity constraintの一種としてserializabilityが(この後)出てくる．

integrity constraintを守るアルゴリズムがconcurrency control.

1.2 The Model

entities: $ E = \{x_1, x_2, ...\}．denotes $ x, y, ...

entitiesはinfinitely many だが at any instant, only a finite number of entities can be relevant.

entitiyはsingle atomic stepで更新される

partitionであったり，physicalにどういう構造をとっていようと，atomicに更新できる最小単位ならentityとする

domain: $ D_x. an enumerable set. each entity $ x \in E has associated with $ D_x.

Cartesian product of domainsを a database stateという．

i.e., a mapping from each entity $ x to a value in $ D_x.

The integrity constraints $ C: subset of cartesian product.

database step: programがdatabaseにcommunicateするときの単位

read: assign the current value of the entity

write: change the current value of the entity

change! insertはどういう扱い？

いわゆるinitial transaction($ T_0)がある？

あると考えると，$ Eの要素数は常に固定となる

だからinifinitely manyなんだけれど，at any instantではfinite number?

transaction: sequence of database steps resulting from the execution of a program.

transactionにcontrol structureはない．straight-line programである．

ちょっとおもしろい

A consequence is that there can be no "hidden restrictions" on inter-transaction behavior.

For example, if correctness requires that steps from two "transactions" are executed in some predifned order, then these two "transactions" are in fact a single transaction.

Definition 1.1

A transaction $ A is a finite sequence of steps $ A = a_1a_2 ... a_n.

...

With each step $ a_j of $ A we associate an action $ ACTION(a_j) \in \{W, R\} and an entity $ ENTITY(a_j) \in E.

...

If $ ACTION(a_j) = W and $ ENTITY(a_j) = x, then the step writes $ x.

WIP.ちょっと引用しきれない

Definition 1.2

#TBW

InterpretationとSemanticsの話( #TBW )

Schedules

schedule:

An interleaved execution of several transactions

Formally, a schedule of the transactions $ A_1, A_2, ..., A_k is a sequence of steps in the shuffle $ A_1 * A_2 * ... A*k of the transactions (recall that the shuffle of two or more sequences consisting of distinct elements is the set of all sequences that have the given sequences as subsequences, and contain no other elements.

Definition 1.3

A schedule, together with its interpretation, is tantamount to a real computation that acts on the database.

The way to formalize this is the following: Suppose that $ s is a schedule, $ I an interpretation, and $ X an initial state, that is, a value for each entity. We say that every database step $ a of $ s computes a value $ a_s(I,X).

In particular, if $ a is $ R(x), then $ a_s(I,X) is defined to be equal to $ w_s(I,X), where $ w is the latest $ W(x) step before $ ain the schedule; if no such write step exists, then $ a_s(I,X) is the value of entity $ x in the state $ X.

If $ a is a $ W(x) step, then let $ r^1, ..., r^k be the read steps in the same transation that preceed $ a.

Then, $ a_s(I,X) is defined as $ f_a(r^1_s(I,X), ..., r^k_s(I,X)).

The final state resulting from the execution of $ s, denoted $ s(I,X), is the state that has as value for each entity $ x the value computed by the last $ W(x) step of $ s; if no such step exists, then naturally entity $ x has in $ s(I,X) the same value as in $ X. []

エルブランセマンティクス(Herbrand Semantics)と同じようなことをいっている

Blind write: k=0の$ W(x) step

つまり，関数としてXもIも使わなくなる

過去のr/wに依存して読み書きされるというInterpretationの原則がBWには当てはまらない

correct

Recall that each transaction is correct, that is, if run in isolation it is guaranteed to map consistent database states to consistent database states.

Call a schedule correct if it has the same property. (after all, a transaction is a special case of a schedule!).

transactionをcorrect(consistent to consistent)に記述できて，全トランザクションがそうで，isolatedなら，scheduleはcorrect

よってserial scheduleはcorrect

consistentとはなにか？というのはユーザ(アプリケーション)の責務

each transaction is correct でなければそもそもcorrectにならない．当たり前だがプログラマにとっては難しい

Schedulers

The scheduler receives as input a stream of user requests for the execution of database steps, originating from different transactions (Figure1.1).

The scheduler grants or delays the arriving requets.

The output of the scheduler is another schedule of the same transactions;

it is executed by the rest of the database system.

4章でまじめに定義される.

Schedulerがcorrectかどうかは議論の余地はもうない

たとえばserialになるようにdelayするschedulerがあれば議論終了

しかし性能のためにはconcurrencyを高め，parallelismを上げていきたい

schedulerのparallelismをmathmaticalに定義するのを５章でやる

先にちょっと読んだ(p143)

Let $ Hdenote the set of all possible schedules, and let $ S be a scheduler.

Certain schedules in $ H have the following property: When they are submitted, step by step, as inputs of the scheduler $ S, none of their steps is delayed by the scheduler--that is, none fails the test of $ S.

Call this subset of $ H the concurrency of $ S, denoted $ C(S).

This set is our measure of the performance of $ S. If two schedulers $ S and $ S' satisfy $ C(S) \subset C(S'), then we say that $ S' "has more concurrency than $ S."

より多くのscheduleをacceptできるschedulerがconcurrencyが高い．

違う．より多くのscheduleを delayなしで acceptできるschedulerが concurrencyが高い．

schedulerの定義が，input scheduleにdelayを混ぜてoutput scheduleを作るというものになってる．

うーん，そうではなくて，「あるscheduleを実行するときのparallelism」もまた指標が要ると思うけどな

Appendix

メモしきれないので #TBW

inefficient == exponential function

3SAT(3充足性決定問題)

計算複雑性理論における重要な問題の一種として、連言標準形の論理式を「真」にする各変項の真偽の組合せを問う問題がある。これを充足可能性問題(SAT)という。変項が3種類の 3-SAT はNP完全問題（3変項以上のSATは全てNP完全）だが、2-SAT は多項式時間で解く事が出来る。

https://ja.wikipedia.org/wiki/連言標準形

３つのliteralsからなる和積形を考える．充足可能かどうか．

パタへネ本で積和形

あるinstance(literalsの入力)が真であるかはすぐ考えられる( <=> 3SAT in NP)が，すべてのinstanceについて真/偽か言い切るにはexponentialになる( <=> not P)と考えられる(polynominal timeで3SATが解けないという証明はない. P = NP problem)

3SAT is NP-Complete(S.A.Cook, 1971)

A->Bがpolynomialで還元可能ならばAとBのclassは変わらない

すべてのNP problemは3SATから還元可能

2. Correctness

Necessarily, the concurrency control must be application independent,...

we do not even know the integrity constarint

Usually what we know about the integrity constraints is that application programs preserve them!

Certainly that any serial schedule is correct ...

2.2 Final State Serializability

augumented schedule: given schedule $ sに$ T_0, T_\inftyを加えたもの

Ex 3.1

メモ

トランザクションの定義は命令列なので，全順序がある

エルブランセマンティクス(Herbrand Semantics)とか，この本におけるInterpretation($ I)は，「あるwrite step $ w_i(x_i)は，そのトランザクション$ T_iのすべてのread stepに依存する」という定義で，これがMVSR, VSR, FSRの根底にある概念．

that is, MVSR,VSR, FSRのvalidationは，write stepについて考えるとき，そのTxがread stepで何を読んだか？だけを気にする．

つまり同一txn内の r, wの順序はpreserveされるが，w, wの順序についてはpreserveされない

まあ，当たり前か

「xが書けたらyを書く」みたいな順序をプログラマが想定していると破綻するか？

2.3 The Critique of Final state serializability

FSRでは不十分

いくつかFSRのalternatives/expansionを考える．以下のSemanticsをエルブランのreplace/extensionあるいはcritiqueとして考慮する

Commutativity:

Identity: 恒等写像のようなトランザクション．r(x) w(y)がただのコピーみたいな．これは順序の制約をゆるくできる

Integrity constraints: Final stateの値に対する制約さえあればよい？ e.g. unsigned

値に対する制約はエルブランにはないので．新たなsemanticsとして持ち出してもいいし，FSRに加えてもいいし

dead steps / transaction: final stateに影響を与えていないstep/txnを除去したserial scheduleとequivalentならOKとする

deadの定義は，p28参照．read only transactionなら全step dead.

これFSRの定義とどう違うのか？

FSRの定義から「both schedules consist of the same transactions」を抜いてる

Infinite schedules: finite stateがserializableなprotocolだとしてもinfinite scheduleには使えないよね

Strict Serializability: step orderとserialization orderのinversionが起こってはいけない，という制約

serialization orderに何かしらの制約を与えるという点では Linearizabilityに似ている

というかCombining serializability and Linearizability yields strict serializabilityとPeter Bailisも言うてる

Incorrect Views: いわゆる Anomalyがあるという話

2.4 View Serializability

Def of VSR, Theorem, Proof

Weikumと同じなのでメモ飛ばす

Def 2.5 a free schema

a finite tree with predicates and steps

it is the control skeletons of loop-free programs

乱数がfree schemaに入っていたら？

出た値は初期値Xに入っている(x \in E)派

入らない派

2.5 The complexitiy of View Serializability

久しぶりに参加した

p40