Unicode
Unicode Character Encoding Model
Hierarchy of concepts: abstract → concrete
(Abstract) Character Repertoire
= ja: 文字レパートリー
set
Coded Character Set
= ja: 符号化文字集合
list, vector, mapping
so despite the name, it is not really a set
not necessarily injective
(Character) Encoding Form
= ja: 符号化形式
multi-words, multi-bytes
(Character) Encoding Scheme
= ja: エンコーディング・スキーム
octets
also accounts for serialization and endianness
ref. Unicodeの基礎知識 ("Unicode basics")
ref. UTR#17: Unicode Character Encoding Model
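A minimal sketch of the lower three layers for a single character, assuming UTF-16 as the encoding form. The helper `utf16_code_units` is a hypothetical name written for this note (it is not a standard library function) and it does not validate lone surrogates.

```python
# Walk one character through the layers of the model.
ch = "😀"  # a character outside the BMP

# Coded Character Set: abstract character -> code point (an integer)
code_point = ord(ch)  # 0x1F600


def utf16_code_units(cp):
    """Encoding form: code point -> list of 16-bit code units (UTF-16).

    Hypothetical helper for illustration; no surrogate validation.
    """
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


units = utf16_code_units(code_point)  # [0xD83D, 0xDE00]

# Encoding scheme: code units -> octets, with the byte order fixed
be = b"".join(u.to_bytes(2, "big") for u in units)
le = b"".join(u.to_bytes(2, "little") for u in units)

# The built-in codecs agree with the manual serialization.
assert be == ch.encode("utf-16-be")
assert le == ch.encode("utf-16-le")
```

The code units are the same in both cases; only the serialization to octets differs, which is the step the encoding scheme adds on top of the encoding form.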
Trivia
U+ derives from ⊎ (U+228E MULTISET UNION)
ref. codepoint - Why is 'U+' used to designate a Unicode code point? - Stack Overflow, answer
Unicode Mail List Archive: Re: Origin of the U+nnnn notation
What is little-known generally is that the "U+" convention itself was an ASCII-fied compromise for what the Unicode designers *really* wanted to use for the Unicode hexadecimal prefix, which was U+228E MULTISET UNION (whose glyph is a union sign with a plus sign in it).
ref. https://util.unicode.org/UnicodeJsps/character.jsp?a=228E
cf. direct sum of multisets, disjoint union
ref. https://ja.wikipedia.org/wiki/多重集合#多重集合の演算と重複度函数
called a multiset union, but it is not the set-theoretic union
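A small sketch of the distinction, using `collections.Counter` as a stand-in for multisets. Reading ⊎ as the additive sum (multiplicities add) is an assumption about what this note means; the sample multisets `a` and `b` are made up for illustration.

```python
from collections import Counter

# Two multisets, stored as element -> multiplicity
a = Counter({"x": 2, "y": 1})
b = Counter({"x": 1, "z": 3})

# Sum: multiplicities add (the reading of ⊎ assumed here)
multiset_sum = a + b      # x: 3, y: 1, z: 3

# Max-based union: multiplicities take the maximum
multiset_union = a | b    # x: 2, y: 1, z: 3

# Plain set union: multiplicities are discarded
set_union = set(a) | set(b)
```

Only the sum keeps every occurrence from both operands, which is why it behaves like a disjoint union rather than an ordinary union.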
Links
Unicode lookup tool
Unicode - Wikipedia