BigQueryにWooCommerce Membershipsのデータを読み込む
date_created datetime read-only
The date (in Atom format) when the user membership object was created, in the local timezone. This does not necessarily match with the start date.
date_created_gmt datetime read-only
The date (in Atom format) when the user membership object was created, in UTC. This does not necessarily match with the start date.
とあるのだけど、
code:members.json
[
...
{
...
"date_created": "2020-06-05T22:55:52+00:00",
"date_created_gmt": "2020-06-05T13:55:52+00:00",
...
},
...
]
となっている。_gmtじゃない方(ここではJST)にも+00:00が付いてしまっている(09:00)が正しい。
これを、(スクリプトなどで是正する処理を挟んず)BigQueryのスキーマ定義などだけでうまく取り込めるのか検証したい。 取り込みたいデータ:
code:gs://example.org/members.json
[
{
"id": 293,
"customer_id": 76,
"plan_id": 166,
"status": "active",
"order_id": null,
"product_id": null,
"date_created": "2020-06-05T22:55:53+00:00",
"date_created_gmt": "2020-06-05T13:55:53+00:00",
"start_date": "2020-06-06T09:00:00+00:00",
"start_date_gmt": "2020-06-06T00:00:00+00:00",
"end_date": null,
"end_date_gmt": null,
"paused_date": null,
"paused_date_gmt": null,
"cancelled_date": null,
"cancelled_date_gmt": null,
"meta_data": [],
"_links": {
"self": [
{
}
],
"collection": [
{
}
],
"customer": [
{
}
]
}
},
{
"id": 289,
"customer_id": 72,
"plan_id": 166,
"status": "active",
"order_id": null,
"product_id": null,
"date_created": "2020-06-05T22:55:52+00:00",
"date_created_gmt": "2020-06-05T13:55:52+00:00",
"start_date": "2020-06-06T09:00:00+00:00",
"start_date_gmt": "2020-06-06T00:00:00+00:00",
"end_date": null,
"end_date_gmt": null,
"paused_date": null,
"paused_date_gmt": null,
"cancelled_date": null,
"cancelled_date_gmt": null,
"meta_data": [],
"_links": {
"self": [
{
}
],
"collection": [
{
}
],
"customer": [
{
}
]
}
}
]
二オブジェクトだけの小さいファイルをGCSに置いておく。
BigQueryデータセットの作成
code:shell
% bq mk sample_members
Dataset 'epub-searcher:sample_members' successfully created.
取り合えず乱暴に
code:shell
% bq load --source_format=NEWLINE_DELIMITED_JSON --replace --autodetect sample_members.members ./members.json
Upload complete.
Waiting on bqjob_r17d023cc44b8b36e_00000174c857152f_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'epub-searcher:bqjob_r17d023cc44b8b36e_00000174c857152f_1': Error while reading data, error message: Failed to parse
JSON: No object found when new array is started.; BeginArray returned false
そもそもline delimited JSONじゃないとだめだから無理だ。
というわけで、日次データの間違いとか関係なくフォーマットを変更しないといけないので、その際に直せばいい。