Local Differential Privacy by programming DP

https://programming-dp.com/notebooks/ch13.html

Randomized Response

コインをflip

表であれば質問に正しく答える

裏であればもう一度コインをflip

表であればYes, 裏であればNo と答える

上記のrandomized responseアルゴリズムは、ε=log(3)=1.09でε-differential privacyを満たす (?!)

code:rand_resp_sales.py

def rand_resp_sales(response):

truthful_response = response == 'Sales'

# first coin flip

if np.random.randint(0, 2) == 0:

# answer truthfully

return truthful_response

else:

# answer randomly (second coin flip)

return np.random.randint(0, 2) == 0

salesで働く200人がrandomized responseに答えると結果は以下のようになる。

code:res_rand_resp.py

pd.Series(rand_resp_sales('Sales') for i in range(200)).value_counts()

True 151

False 49

dtype: int64

実際の、US Census data setを使って、それぞれに“is your occupation ‘Sales’?”のレスポンスを上記でエンコードする。

実際にデプロイされるシステムでは質問の答えを直接集めるのではなく、ローカルで rand_resp_salesを実行しdata curatorにrandomized responseをsubmitする。

code:us_censu_rand_resp.py

responses = [rand_resp_sales(r) for r in adult'Occupation']

pd.Series(responses).value_counts()

False 22550

True 10011

dtype: int64

このデータセットにおける参加者の大多数はsalesではないので結果の傾向は合っている。

それでは、どうやってこのレスポンス結果からデータセットの実際のsalespeopleの人数を推測できうるか。結果は、6500人程度増えてしまっている。

code:true_salespeople.py

len(adult[adult'Occupation' == 'Sales'])

3650

randomized responseアルゴリズムより、1回目が裏で2回目が表のケースでfake trueなので全体の1/4の確率で

ただし、1回目が裏で2回目が表でも本当にsalespeopleである人もいるので、２つのグループに分割すると考えて、

3895.5 を推定。

百分率誤差は6.726027397260275

Unary Encoding

yes/no questionではなく、ヒストグラムのローカル差分プライバシーを計算したいケース。

Optimizing Locally Differentially Private Protocolsで紹介されているヒストグラム系のローカル差分プライバシーで最もシンプルな手法

GoogleのRAPPORシステムとベースとなっているアプローチ。

以下の３つのアルゴリズムに分けられる。

encode

perturb

aggregate

encode

one-hot encoding

code:encode.py

def encode(response):

return 1 if d == response else 0 for d in domain

encode('Sales')

0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0

perturb

差分プライバシーを満たしつつresponse vectorのbitsをflipする

εによって決定するパラメタp, qを用いて、以下の確率でflip

https://gyazo.com/899a837d48cdca2d26f307b6cbdd71ff

code:perturb.py

def perturb(encoded_response):

return perturb_bit(b) for b in encoded_response

def perturb_bit(bit):

p = .75

q = .25

sample = np.random.random()

if bit == 1:

if sample <= p:

return 1

else:

return 0

elif bit == 0:

if sample <= q:

return 1

else:

return 0

perturb(encode('Sales'))

result: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0]

εは以下のように定義

https://gyazo.com/313e1f3f65011b1a4a4c34431ef93009

p=0.75, q=0.25の場合、ε=2.1972245773362196

aggregation

もし、perturbしていない場合はone-hot encodingしたベクトルを要素ごとに足していくので、

code:1.py

counts = np.sum([encode(r) for r in adult'Occupation'], axis=0)

list(zip(domain, counts))

code:re

[('Adm-clerical', 3770),

('Exec-managerial', 4066),

('Handlers-cleaners', 1370),

('Prof-specialty', 4140),

('Other-service', 3295),

('Sales', 3650),

('Craft-repair', 4099),

('Transport-moving', 1597),

('Farming-fishing', 994),

('Machine-op-inspct', 2002),

('Tech-support', 928),

('Protective-serv', 649),

('Armed-Forces', 9),

('Priv-house-serv', 149)]

perturbすることでfakeが加わるので、

code:2.py

counts = np.sum([perturb(encode(r)) for r in adult'Occupation'], axis=0)

list(zip(domain, counts))

code:res

[('Adm-clerical', 10036),

('Exec-managerial', 10227),

('Handlers-cleaners', 8815),

('Prof-specialty', 10160),

('Other-service', 10017),

('Sales', 10073),

('Craft-repair', 10250),

('Transport-moving', 8977),

('Farming-fishing', 8722),

('Machine-op-inspct', 9203),

('Tech-support', 8674),

('Protective-serv', 8374),

('Armed-Forces', 8109),

('Priv-house-serv', 8286)]

パラメタp, q, レスポンス数n を考慮にいれ集計する

https://gyazo.com/6cedaf241a69d6654a08b219769a43a7

code:3.py

def aggregate(responses):

p = .75

q = .25

sums = np.sum(responses, axis=0)

n = len(responses)

return (v - n*q) / (p-q) for v in sums

responses = [perturb(encode(r)) for r in adult'Occupation']

counts = aggregate(responses)

list(zip(domain, counts))

[('Adm-clerical', 3903.5),

('Exec-managerial', 4139.5),

('Handlers-cleaners', 1411.5),

('Prof-specialty', 4111.5),

('Other-service', 3197.5),

('Sales', 3695.5),

('Craft-repair', 4031.5),

('Transport-moving', 1593.5),

('Farming-fishing', 1081.5),

('Machine-op-inspct', 1839.5),

('Tech-support', 975.5),

('Protective-serv', 693.5),

('Armed-Forces', 95.5),

('Priv-house-serv', 43.5)]