itakura-2023-12-15 Progress report

#2023 #itakura2023

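What I did

To extract data from the input video and the ground-truth data in the same way, I inspected the internals of the decoder. I also changed the code so that the same sampling, rescaling, and cropping are applied to both the input and the ground truth.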

DR(eye)VE.py

・Decoding and frame sampling

To sample the same frames from both sources, the frame start and end are set before calling the decoder function.

code:dreyeve.py

# Fix start and end up front so that both decode calls use the same window.
# sampling_rate is assumed to be defined earlier in the same method.
frames_length = video_container.streams.video[0].frames
fps = float(video_container.streams.video[0].average_rate)
num_frames = self.cfg.DATA.NUM_FRAMES
target_fps = self.cfg.DATA.TARGET_FPS
clip_size = sampling_rate * num_frames / target_fps * fps
delta = max(frames_length - clip_size, 0)
start = random.uniform(0, delta)
end = start + clip_size - 1

Added the start and end arguments to the function being called

# Decode video. Meta info is used to perform selective decoding.
frames = decoder.decode(
    video_container,
    sampling_rate,
    self.cfg.DATA.NUM_FRAMES,
    temporal_sample_index,
    self.cfg.TEST.NUM_ENSEMBLE_VIEWS,
    video_meta=self._video_meta[index],
    target_fps=self.cfg.DATA.TARGET_FPS,
    backend=self.cfg.DATA.DECODING_BACKEND,
    max_spatial_scale=min_scale,
    start=start,
    end=end,
)

The ground-truth gaze data is loaded in the same way

# Decode map. Meta info is used to perform selective decoding.
sal_maps = decoder.decode(
    map_container,
    sampling_rate,
    self.cfg.DATA.NUM_FRAMES,
    temporal_sample_index,
    self.cfg.TEST.NUM_ENSEMBLE_VIEWS,
    video_meta=self._video_meta[index],
    target_fps=self.cfg.DATA.TARGET_FPS,
    backend=self.cfg.DATA.DECODING_BACKEND,
    max_spatial_scale=min_scale,
    start=start,
    end=end,
)
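The window computation above can be isolated into a small helper, which makes it easier to check that the video and the gaze maps receive identical start and end values. This is a minimal sketch: the function name `sample_clip_window` and the example values (7500 frames, 25 fps, 8 frames, sampling rate 8) are assumptions for illustration, not taken from the actual code or config.

```python
import random


def sample_clip_window(frames_length, fps, num_frames, sampling_rate,
                       target_fps, rng=random):
    """Return (start, end) frame positions of one randomly placed clip."""
    # Clip length measured in frames of the source video.
    clip_size = sampling_rate * num_frames / target_fps * fps
    # Room left for shifting the clip; 0 if the video is shorter than the clip.
    delta = max(frames_length - clip_size, 0)
    start = rng.uniform(0, delta)
    end = start + clip_size - 1
    return start, end


# Draw the window once, then pass the same values to both decode calls.
# All numbers below are illustrative assumptions.
start, end = sample_clip_window(
    frames_length=7500, fps=25.0, num_frames=8, sampling_rate=8, target_fps=25
)
```

Because the random draw happens only once, decoding the video and the maps with these values cannot drift apart, which would happen if each decode call sampled its own window.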

・Rescaling and cropping

code:dreyeve.py

# Decide the scale size once so that frames and maps share it

size = int(round(np.random.uniform(min_scale, max_scale)))

Cropping and sampling of the input and the ground truth

Added a size argument to the function being called

Changed to crop from the center: spatial_sample_index = 1

frames = utils.spatial_sampling(
    frames,
    spatial_idx=spatial_sample_index,
    min_scale=min_scale,
    max_scale=max_scale,
    crop_size=crop_size,
    size=size,
    random_horizontal_flip=self.cfg.DATA.RANDOM_FLIP,
    inverse_uniform_sampling=self.cfg.DATA.INV_UNIFORM_SAMPLE,
)

# Crop the maps

# Perform color normalization.
sal_maps = utils.tensor_normalize(
    sal_maps, self.cfg.DATA.MEAN, self.cfg.DATA.STD
)
# T H W C -> C T H W.
sal_maps = sal_maps.permute(3, 0, 1, 2)
# Perform data augmentation.
sal_maps = utils.spatial_sampling(
    sal_maps,
    spatial_idx=spatial_sample_index,
    min_scale=min_scale,
    max_scale=max_scale,
    crop_size=crop_size,
    size=size,
    random_horizontal_flip=self.cfg.DATA.RANDOM_FLIP,
    inverse_uniform_sampling=self.cfg.DATA.INV_UNIFORM_SAMPLE,
)
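The reason for passing the same crop position to both calls can be seen in a toy example: if the crop box is computed once and applied to the frame and the map alike, every cropped pixel in the map still refers to the same location as the corresponding frame pixel. The helpers `center_crop_box` and `crop` below are hypothetical and work on plain nested lists so the sketch runs without PyTorch; they are not the functions in utils.py.

```python
import math


def center_crop_box(height, width, crop_size):
    """Return the top-left corner (y0, x0) of a centered square crop."""
    y0 = int(math.ceil((height - crop_size) / 2))
    x0 = int(math.ceil((width - crop_size) / 2))
    return y0, x0


def crop(image, y0, x0, crop_size):
    """Crop a 2D list of pixel rows to crop_size x crop_size."""
    return [row[x0:x0 + crop_size] for row in image[y0:y0 + crop_size]]


# Toy 4x6 "frame" and "map" whose pixels record their own coordinates.
frame = [[("frame", y, x) for x in range(6)] for y in range(4)]
sal_map = [[("map", y, x) for x in range(6)] for y in range(4)]

# Compute the box once and reuse it for both inputs.
y0, x0 = center_crop_box(height=4, width=6, crop_size=2)
frame_crop = crop(frame, y0, x0, 2)
map_crop = crop(sal_map, y0, x0, 2)
```

After cropping, the coordinates stored in `frame_crop` and `map_crop` agree element by element, which is the property the shared arguments are meant to guarantee.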

・Changed to crop from the center (spatial_sample_index = 1)

Resized the ground-truth gaze data to 14×14

code:dreyeve.py

sal_maps = torch.nn.functional.interpolate(
    sal_maps,
    size=(14, 14),
    mode="bilinear",
    align_corners=False,
)

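The values of the three channels differed slightly, so the map of the last frame is reduced to a single channel with a luminance-weighted sum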

code:dreyeve.py

sal_map = sal_maps[:, -1, :, :]  # 3, 14, 14
sal_map = 0.299 * sal_map[0, :, :] + 0.587 * sal_map[1, :, :] + 0.114 * sal_map[2, :, :]
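The weights 0.299, 0.587, and 0.114 are the ITU-R BT.601 luma coefficients and sum to 1, so a pixel whose three channels are equal keeps its value. The sketch below reproduces the same reduction on nested lists indexed as [channel][time][row][col]. The function name is hypothetical, and the channel order R, G, B is an assumption; if the decoder returns B, G, R, the first and last weights would need to be swapped.

```python
def last_frame_luminance(sal_maps):
    """Reduce the last frame of a 3-channel clip to one luminance map.

    sal_maps is indexed as [channel][time][row][col]; channel order is
    assumed to be R, G, B.
    """
    # Take the last time step of each channel.
    r, g, b = (channel[-1] for channel in sal_maps)
    return [
        [0.299 * r[y][x] + 0.587 * g[y][x] + 0.114 * b[y][x]
         for x in range(len(r[0]))]
        for y in range(len(r))
    ]


# Toy clip: 3 channels, 2 frames, 2x2 pixels; value = channel index + frame index.
toy = [[[[c + t] * 2 for _ in range(2)] for t in range(2)] for c in range(3)]
gray = last_frame_luminance(toy)
```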

utils.py

Added horizontal flipping to def spatial_sampling for the case where the crop position is specified as well (may not be needed)

Error
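One point to watch if the flip is kept: a random flip decided separately inside each call could mirror the frames but not the maps, or the reverse. A minimal sketch of one way to avoid that is to decide the flip once and apply it to both inputs together. The function `maybe_flip_pair` is hypothetical and uses plain nested lists in place of tensors; it is not part of utils.py.

```python
import random


def maybe_flip_pair(frame, sal_map, prob=0.5, rng=random):
    """Mirror both 2D inputs left-right together, or leave both untouched."""
    # A single random draw controls both inputs, so they cannot disagree.
    if rng.random() < prob:
        frame = [row[::-1] for row in frame]
        sal_map = [row[::-1] for row in sal_map]
    return frame, sal_map


frame = [[1, 2, 3], [4, 5, 6]]
sal_map = [[0, 0, 9], [0, 0, 0]]
flipped_frame, flipped_map = maybe_flip_pair(frame, sal_map, prob=1.0)
```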

loss_fun = losses.get_loss_func(cfg.MODEL.LOSS_FUNC)(reduction="mean")

TypeError: __init__() got an unexpected keyword argument 'reduction'
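The message indicates that the class returned by the lookup has an `__init__` that does not accept a `reduction` keyword. The sketch below reproduces that situation with plain Python classes. The class names and the registry are hypothetical stand-ins and not the real contents of losses.py; whether the actual cause is a custom loss class registered there is an assumption to verify.

```python
class LossWithoutReduction:
    """Hypothetical loss whose constructor takes no keyword arguments."""

    def __init__(self):
        pass


class LossWithReduction:
    """Hypothetical squared-error loss that accepts reduction like torch losses do."""

    def __init__(self, reduction="mean"):
        if reduction not in ("mean", "sum", "none"):
            raise ValueError(f"unsupported reduction: {reduction}")
        self.reduction = reduction

    def __call__(self, preds, targets):
        errors = [(p - t) ** 2 for p, t in zip(preds, targets)]
        if self.reduction == "none":
            return errors
        total = sum(errors)
        return total / len(errors) if self.reduction == "mean" else total


# Stand-in for a name-to-class lookup; returns the class, not an instance.
_LOSSES = {"without": LossWithoutReduction, "with": LossWithReduction}


def get_loss_func(name):
    return _LOSSES[name]


try:
    get_loss_func("without")(reduction="mean")
    raised = False
except TypeError:
    raised = True  # same kind of failure as in the traceback above

loss_fun = get_loss_func("with")(reduction="mean")
value = loss_fun([1.0, 3.0], [0.0, 1.0])
```

If the configured loss is a custom class, giving its `__init__` a `reduction` parameter as in the second class is one possible fix.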

Next steps

Continue along the same lines