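Internals of type casts in C

How do casts in C work internally?

Simply reading a 4-byte variable's storage as if it were 8 bytes does not seem like a workable implementation

Does the cast allocate memory?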

A program that casts an int

code:other.c

#include <stdio.h>

int main(void)

{

int n = 5;

printf("%lld\n", (long long)n);

return 0;

}

Emit the assembly with gcc -S (the listing below is in Intel syntax, so -masm=intel was presumably passed as well)

code:other.asm

.file "other.c"

.intel_syntax noprefix

.text

.section .rodata

.LC0:

.string "%lld\n"

.text

.globl main

.type main, @function

main:

.LFB0:

.cfi_startproc

endbr64

push rbp

.cfi_def_cfa_offset 16

.cfi_offset 6, -16

mov rbp, rsp

.cfi_def_cfa_register 6

sub rsp, 16

mov DWORD PTR -4[rbp], 5

mov eax, DWORD PTR -4[rbp]

cdqe

mov rsi, rax

lea rax, .LC0[rip]

mov rdi, rax

mov eax, 0

call printf@PLT

mov eax, 0

leave

.cfi_def_cfa 7, 8

ret

.cfi_endproc

.LFE0:

.size main, .-main

.ident "GCC: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0"

.section .note.GNU-stack,"",@progbits

.section .note.gnu.property,"a"

.align 8

.long 1f - 0f

.long 4f - 1f

.long 5

.string "GNU"

.align 8

.long 0xc0000002

.long 3f - 2f

.long 0x3

.align 8

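What do lines such as .align 8 mean?

What can be inferred from lines such as .cfi_endproc?

Asked ChatGPT

It explained that these are assembly language directives

.cfi_endproc is also a directive

A directive can be thought of as an instruction to the assembler

The important point is that these lines are written as instructions to the assembler, not as machine instructions for the CPU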

gccでアセンブリ出力したときに.cfi_startprocなどが邪魔なので抑制する方法 - Qiita

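How to suppress the .cfi directives

For what the cfi directives are, see assembly - What are CFI directives in Gnu Assembler (GAS) used for? - Stack Overflow

The way a section is defined looked different from nasm t6o_o6t.icon

In nasm it is section .text

Creating the text section with a directive such as .text turned out to be the same after all!!

In this assembly it is .section .text

A difference between assembly dialects

I was told that a line containing only .text is essentially syntactic sugar for .section .text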

assembly - Understanding gcc -S output - Stack Overflow

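According to this question and its answers and comments,

LFB (function begin) marks the start of a function

LFE (function end) marks the end of a function

Assembly may be a tool for writing machine code more easily t6o_o6t.icon

It feels like an executable could be written by hand in assembly t6o_o6t.icon

For example

A string can be defined with .string

.note.GNU-stack is a section, so sections can be written out by hand as well

Writing my own C compiler would probably teach me a lot about compiler output t6o_o6t.icon

The part that seems relevant is the following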

code:asm-part.asm

push rbp

mov rbp, rsp

sub rsp, 16

mov DWORD PTR -4[rbp], 5

mov eax, DWORD PTR -4[rbp]

cdqe

mov rsi, rax

lea rax, .LC0[rip]

mov rdi, rax

mov eax, 0

call printf@PLT

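The instructions before cdqe appear to set up the stack frame and the local variable

They push the old rbp (the saved rbp), set the frame base equal to rsp, and then grow the stack by lowering rsp

With the frame base pointer in place, the local variable can be stored

cf. Local variables are accessed at addresses relative to $bp

I wanted to see what eax holds at the point of cdqe

but after running disas I realized it is the initial value of the variable n, which is 5

gdb disas output:

with my own comments added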

code:gdb.asm

Dump of assembler code for function main:

0x0000555555555149 <+0>: endbr64

0x000055555555514d <+4>: push rbp

0x000055555555514e <+5>: mov rbp,rsp

=> 0x0000555555555151 <+8>: sub rsp,0x10

0x0000555555555155 <+12>: mov DWORD PTR [rbp-0x4],0x5 # variable n

0x000055555555515c <+19>: mov eax,DWORD PTR [rbp-0x4] # load the value of n into eax

0x000055555555515f <+22>: cdqe

0x0000555555555161 <+24>: mov rsi,rax

0x0000555555555164 <+27>: lea rax,[rip+0xe99] # 0x555555556004

0x000055555555516b <+34>: mov rdi,rax

0x000055555555516e <+37>: mov eax,0x0

0x0000555555555173 <+42>: call 0x555555555050 <printf@plt>

0x0000555555555178 <+47>: mov eax,0x0

0x000055555555517d <+52>: leave

0x000055555555517e <+53>: ret

At this point I checked the x86_64 calling convention

x64 での呼び出し規則 | Microsoft Learn

fastcall

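Is this really the right one?

In nasmでprintf関数を使う - Qiita the format string is passed in rdi, which I don't understand (under the Microsoft convention the first argument would go in rcx)

What printf needs is an address, and under the ABI above an address is passed the same way as an integer

The ABI can differ from one OS to another, so one cannot say that x86_64 as such uses this calling convention

When thinking about an ABI, consider the OS rather than only the architecture

Check the OS and ABI with readelf

$ readelf -a ./other

This time the program was compiled to a file named other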

code:readelf.txt

OS/ABI: UNIX - System V

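About the ABI

According to this, the ABI described at C言語の型キャストの内部#6516d3cf8458750000c83763 (the Microsoft Learn page linked above) is the Microsoft x64 ABI, which is not the one this binary uses

Refer to the System V ABI instead

The Parameter Passing section explains it

Arguments

int, long long, and pointers are all classified as INTEGER class

If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used

From this, rdi holds the first argument (the address of "%lld\n") and rsi holds the second argument (n = 5)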

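Huh? t6o_o6t.icon

No instruction that obviously performs the cast showed up

Does mov rsi, rax simply make the value 64 bits wide?

Remove the cast from the source code and compile again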

code:gdb.asm

Dump of assembler code for function main:

0x0000555555555149 <+0>: endbr64

0x000055555555514d <+4>: push rbp

0x000055555555514e <+5>: mov rbp,rsp

=> 0x0000555555555151 <+8>: sub rsp,0x10

0x0000555555555155 <+12>: mov DWORD PTR [rbp-0x4],0x5

0x000055555555515c <+19>: mov eax,DWORD PTR [rbp-0x4]

0x000055555555515f <+22>: mov esi,eax

0x0000555555555161 <+24>: lea rax,[rip+0xe9c] # 0x555555556004

0x0000555555555168 <+31>: mov rdi,rax

0x000055555555516b <+34>: mov eax,0x0

0x0000555555555170 <+39>: call 0x555555555050 <printf@plt>

0x0000555555555175 <+44>: mov eax,0x0

0x000055555555517a <+49>: leave

0x000055555555517b <+50>: ret

cdqe disappeared

I had guessed it was some kind of processor protection mechanism, but is it related to the bit width as well? t6o_o6t.icon

Refer to the Intel SDM

Convert Doubleword to Quadword

In 64-bit mode, the default operation size is the size of the destination register. Use of the REX.W prefix promotes this instruction (CDQE when promoted) to operate on 64-bit operands. In which case, CDQE copies the sign (bit 31) of the doubleword in the EAX register into the high 32 bits of RAX.

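After reading the surrounding explanation, I concluded that this instruction sign-extends a signed value held in a register

That is, it widens the value while preserving its sign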

Double the size of the source operand by means of sign extension. The CBW (convert byte to word) instruction copies the sign (bit 7) in the source operand into every bit in the AH register. The CWDE (convert word to doubleword) instruction copies the sign (bit 15) of the word in the AX register into the high 16 bits of the EAX register.

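In the version without the cast, the esi register was used instead of rsi

My guess is as follows t6o_o6t.icon

With the cast

1. eax, which holds n, is widened into rax by the CDQE instruction with its sign preserved

2. From then on n is referenced through rax (64 bits)

Without the cast

n is referenced through eax (32 bits)

In other words, the cast from int to long long in this program was implemented internally as a sign extension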