Athena
Steps
1. Specify an S3 bucket where query results will be stored.
2. Create a table for running queries against the access logs stored in S3.
A table for the access logs can be created with a query like the following.
code: sql
-- サンプルログに合わせたテーブル定義
CREATE EXTERNAL TABLE alb_logs (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string,
tid string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
-- RegexSerDe needs an input.regex matching the ALB access log format;
-- the pattern below follows the AWS documentation, so verify it against your logs
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]*) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\\s]+?)\" \"([^\\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?'
)
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/'
TBLPROPERTIES ('has_encrypted_data'='false');
3. Run a query.
code: sql
SELECT *
FROM alb_logs
WHERE type IS NOT NULL;
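With the table in place, narrower queries keep the amount of data scanned (and therefore the cost) down. As a further illustration, a hypothetical query counting error responses per status code, using only columns from the table definition above:
code: sql
-- Count 5xx responses by ELB status code
SELECT elb_status_code, count(*) AS requests
FROM alb_logs
WHERE elb_status_code >= 500
GROUP BY elb_status_code
ORDER BY requests DESC;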
Partitioning
Logs are produced in large volumes every day, so you should narrow the range each query scans to keep costs down; partitioning the table is the way to do this.
How to partition
1. First, create a partitioned ALB log table.
code: sql
CREATE EXTERNAL TABLE alb_logs_partitioned (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string,
tid string
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
-- RegexSerDe needs an input.regex matching the ALB access log format;
-- the pattern below follows the AWS documentation, so verify it against your logs
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]*) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\\s]+?)\" \"([^\\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?'
)
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/'
TBLPROPERTIES ('has_encrypted_data'='false');
2. Add a partition. ALB writes logs under a yyyy/MM/dd prefix, so each partition's LOCATION points at one day's prefix.
code: sql
ALTER TABLE alb_logs_partitioned ADD IF NOT EXISTS
PARTITION (year='2025', month='03', day='27')
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/2025/03/27/';
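Backfilling one day at a time gets tedious. The Hive-style DDL that Athena accepts allows several PARTITION clauses in a single ALTER TABLE; the bucket and account in the paths below are placeholders:
code: sql
-- Register several days' prefixes in one statement
ALTER TABLE alb_logs_partitioned ADD IF NOT EXISTS
PARTITION (year='2025', month='03', day='28')
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/2025/03/28/'
PARTITION (year='2025', month='03', day='29')
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/2025/03/29/';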
Run a query against the partitioned table (note it targets alb_logs_partitioned, and the partition columns in the WHERE clause limit what gets scanned).
code: sql
SELECT *
FROM alb_logs_partitioned
WHERE type IS NOT NULL
AND year = '2025' AND month = '03' AND day = '27';
How to list the existing partitions
code: sql
SHOW PARTITIONS alb_logs_partitioned;
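Adding partitions by hand every day does not scale. As an alternative sketch, Athena's partition projection can derive partitions from the S3 path so no ALTER TABLE is needed; the property names follow that feature, while the date range and the single projected day column are assumptions for illustration:
code: sql
-- Sketch: replace year/month/day with one projected date column.
-- In CREATE TABLE use: PARTITIONED BY (day string)
-- and these table properties instead of manual ADD PARTITION:
TBLPROPERTIES (
'has_encrypted_data' = 'false',
'projection.enabled' = 'true',
'projection.day.type' = 'date',
'projection.day.range' = '2025/01/01,NOW',
'projection.day.format' = 'yyyy/MM/dd',
'projection.day.interval' = '1',
'projection.day.interval.unit' = 'DAYS',
'storage.location.template' = 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/${day}'
)
Queries then filter with WHERE day = '2025/03/27', and Athena resolves the S3 location without any partition metadata in the catalog.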