Athena
Steps
1. Specify an S3 bucket where query results will be stored.
2. Create a table for running queries against the access logs stored in S3.
A table for the access logs can be created with a query like the following.
code: sql
-- サンプルログに合わせたテーブル定義
CREATE EXTERNAL TABLE alb_logs (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string,
tid string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
-- RegexSerDe needs an input.regex matching the ALB access log format;
-- the pattern below follows the AWS documentation, so verify it against your logs
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]*) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\\s]+?)\" \"([^\\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?'
)
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/'
TBLPROPERTIES ('has_encrypted_data'='false');
3. Run a query.
code: sql
SELECT *
FROM alb_logs
WHERE type IS NOT NULL;
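With the table in place, narrower queries keep the amount of data scanned (and therefore the cost) down. As a further illustration, a hypothetical query counting error responses per status code, using only columns from the table definition above:
code: sql
-- Count 5xx responses by ELB status code
SELECT elb_status_code, count(*) AS requests
FROM alb_logs
WHERE elb_status_code >= 500
GROUP BY elb_status_code
ORDER BY requests DESC;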
Partitioning
Logs are produced in large volumes every day, so you should narrow the range each query scans to keep costs down; partitioning the table is the way to do this.
How to partition
1. First, create a partitioned ALB log table.
code: sql
CREATE EXTERNAL TABLE alb_logs_partitioned (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string,
tid string
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
-- RegexSerDe needs an input.regex matching the ALB access log format;
-- the pattern below follows the AWS documentation, so verify it against your logs
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]*) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\\s]+?)\" \"([^\\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?'
)
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/'
TBLPROPERTIES ('has_encrypted_data'='false');
2. Add a partition. ALB writes logs under a yyyy/MM/dd prefix, so each partition's LOCATION points at one day's prefix.
code: sql
ALTER TABLE alb_logs_partitioned ADD IF NOT EXISTS
PARTITION (year='2025', month='03', day='27')
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/2025/03/27/';
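Backfilling one day at a time gets tedious. The Hive-style DDL that Athena accepts allows several PARTITION clauses in a single ALTER TABLE; the bucket and account in the paths below are placeholders:
code: sql
-- Register several days' prefixes in one statement
ALTER TABLE alb_logs_partitioned ADD IF NOT EXISTS
PARTITION (year='2025', month='03', day='28')
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/2025/03/28/'
PARTITION (year='2025', month='03', day='29')
LOCATION 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/2025/03/29/';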
Run a query against the partitioned table (note it targets alb_logs_partitioned, and the partition columns in the WHERE clause limit what gets scanned).
code: sql
SELECT *
FROM alb_logs_partitioned
WHERE type IS NOT NULL
AND year = '2025' AND month = '03' AND day = '27';
How to list the existing partitions
code: sql
SHOW PARTITIONS alb_logs_partitioned;
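Adding partitions by hand every day does not scale. As an alternative sketch, Athena's partition projection can derive partitions from the S3 path so no ALTER TABLE is needed; the property names follow that feature, while the date range and the single projected day column are assumptions for illustration:
code: sql
-- Sketch: replace year/month/day with one projected date column.
-- In CREATE TABLE use: PARTITIONED BY (day string)
-- and these table properties instead of manual ADD PARTITION:
TBLPROPERTIES (
'has_encrypted_data' = 'false',
'projection.enabled' = 'true',
'projection.day.type' = 'date',
'projection.day.range' = '2025/01/01,NOW',
'projection.day.format' = 'yyyy/MM/dd',
'projection.day.interval' = '1',
'projection.day.interval.unit' = 'DAYS',
'storage.location.template' = 's3://your-alb-logs-bucket/AWSLogs/your-account-id/elasticloadbalancing/your-region/${day}'
)
Queries then filter with WHERE day = '2025/03/27', and Athena resolves the S3 location without any partition metadata in the catalog.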