English Paper Notes - 🍣Notes by YuWd (和田唯我)🍣

English Paper Notes

Introduction

Naming

We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta’s LLaMA 7B model

Stanford Alpaca

Taking inspiration

We take inspiration from NLP, where the next token prediction task is used for foundation model pre-training and to solve diverse downstream tasks via prompt engineering.

Segment Anything

intro

Therefore, it is still an open question whether Transformer architecture is suitable to model graphs and how to make it work in graph representation learning.

https://arxiv.org/pdf/2106.05234.pdf

Opening

We hereby derive a new class of models, namely data–controlled Neural ODEs.

https://arxiv.org/pdf/2002.08071.pdf

Enumeration

In summary, the contributions of this work are threefold:

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

The main challenges are twofold: (1) to incorporate those very different data types in training, e.g., part, semantic, instance, panoptic, person, medical image, aerial image, etc.; (2) to design a generalizable training scheme that differs from conventional multi-task learning, which is flexible on task definition and is capable of handling out-of-domain tasks.

SegGPT: Segmenting Everything In Context

Results

We observe Hyena to display characteristic few-shot capabilities of standard Transformers, with some tasks e.g., MultiRC seeing a lift of more than 20% accuracy over zero-shot when the model is provided additional prompts as context.

Hyena

Equations

where the first term is the supervised loss calculated using the labeled data, while the second term is the semisupervised loss calculated based on the unlabeled data

SemiCDNet: A Semisupervised Convolutional Neural Network for Change Detection in High Resolution Remote-Sensing Images
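The quoted sentence is the kind of text that follows a two-term loss. A minimal sketch of such an equation, assuming hypothetical symbols ($\mathcal{L}_{\mathrm{sup}}$, $\mathcal{L}_{\mathrm{semi}}$, a weight $\lambda$, and labeled/unlabeled sets $(X_l, Y_l)$ and $X_u$) that are for illustration only and are not taken from the SemiCDNet paper:

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{sup}}(X_l, Y_l) + \lambda \, \mathcal{L}_{\mathrm{semi}}(X_u)
```

With this form, "the first term" and "the second term" in the sentence map directly onto the two summands.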

Difficulties and how to address them

Difficulty

Formidable challenges exist in assembling partially annotated datasets

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Hidden drawback

A pitfall of this approach is its O(n + m) computational complexity.

https://arxiv.org/pdf/2302.04181.pdf

Requirements

A key design desiderata for S5 was matching the computational complexity of S4 for both online generation and offline recurrence

Addressing a sub-problem

In particular, reconstructions of the same scene at different times may vary significantly due to variations in imaging conditions. To alleviate this issue, we employ a dual thresholding scheme where we compare between subsampled and original point clouds to detect changes.

City-scale Scene Change Detection using Point Clouds

Addressing <something weak>

To tackle this problem, specific segmentation losses have been proposed to cater for deficient segmentation supervision, including ...

Laborious

However, a specific dataset for this task, which is usually labor-intensive and time-consuming,

Weakly Supervised Silhouette-based Semantic Scene Change Detection

Task characteristics

Determined by / depends on

The success of this plan hinges on three components: task, model, and data

Segment Anything

Avoiding

The above change detection works often require accurate image registration, which can be difficult to achieve under scene changes or illumination variations. We circumvent these challenges by generating 3D point clouds from the input and registering the point clouds instead.

City-scale Scene Change Detection using Point Clouds

Points reconstructed from SfM may vary between reconstructions due to many factors (e.g. illumination), which lead to false positives during comparison. To circumvent this problem, we employ a dual thresholding scheme

City-scale Scene Change Detection using Point Clouds

Proposed method

use

We deploy two techniques to speed up the FFT-based convolution for sequences shorter than 8K: kernel fusion and block FFT.
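For context on the phrase "FFT-based convolution", the basic operation can be sketched as below. This is plain NumPy and does not implement the kernel fusion or block FFT techniques the quote mentions; the function name `fft_conv` and the sample arrays are hypothetical.

```python
import numpy as np


def fft_conv(u, k):
    """Linear convolution of two 1-D arrays computed via the FFT."""
    # Length of the full linear convolution.
    n = len(u) + len(k) - 1
    # Zero-pad both inputs to n so the circular convolution implied by
    # the FFT product equals the linear convolution.
    U = np.fft.rfft(u, n=n)
    K = np.fft.rfft(k, n=n)
    # Pointwise product in the frequency domain, then back to time domain.
    return np.fft.irfft(U * K, n=n)


u = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([0.5, -1.0, 0.25])

direct = np.convolve(u, k)  # reference result from direct convolution
fast = fft_conv(u, k)       # same result via the FFT
```

The two results should agree up to floating-point error.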

Off-the-shelf

We pre-processed document images with an off-the-shelf OCR toolkit to obtain textual content and corresponding 2D position information

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Facilitating

To facilitate learning for rearrangement, we propose the RoomR dataset that provides a challenging testbed in visually rich interactive environments

Figures and tables

Introducing figures and tables smoothly

Figure 1 illustrates our GraphRNN approach, where the main idea is that we decompose graph generation into a process that generates a sequence of nodes (via a graph-level RNN), and another process that then generates a sequence of edges for each newly added node (via an edge-level RNN).

GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models

https://www.cs.ucr.edu/~eamonn/Keogh_SIGKDD09_tutorial.pdf