Unicode - wint

Unicode

Unicode character encoding model

Hierarchy of concepts: abstract → concrete

(Abstract) Character Repertoire

= ja: 文字レパートリー

set

Coded Character Set

= ja: 符号化文字集合

list, vector, mapping

So despite the name, it isn't really a set. (wint)

The mapping is not necessarily injective

(Character) Encoding Form

= ja: 符号化形式

multi-word, multi-byte (sequences of code units)

(Character) Encoding Scheme

= ja: エンコーディング・スキーム

octets

Also takes serialization and endianness into account

ref. Unicodeの基礎知識 ("Unicode basics", in Japanese)

ref. UTR#17: Unicode Character Encoding Model
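A minimal Python sketch of the layers above, walking one character from code point to code units to octets. The helper names (`code_point`, `utf16_code_units`, `octets`) are made up for this note and are not part of any library. The Å / ANGSTROM SIGN pair is only one possible reading of "not necessarily injective": the same abstract character shows up at two code points. That interpretation is an assumption, not something the refs above are quoted as saying.

```python
import unicodedata


def code_point(ch: str) -> int:
    """Coded character set level: abstract character -> code point."""
    return ord(ch)


def utf16_code_units(ch: str) -> list[int]:
    """Encoding form level: code point -> sequence of 16-bit code units."""
    raw = ch.encode("utf-16-be")  # big-endian is used here only to split into units
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def octets(ch: str, scheme: str) -> bytes:
    """Encoding scheme level: code units -> octets, byte order included."""
    return ch.encode(scheme)


grinning = "\U0001F600"  # GRINNING FACE, outside the BMP

print(f"U+{code_point(grinning):04X}")                   # U+1F600
print([hex(u) for u in utf16_code_units(grinning)])      # ['0xd83d', '0xde00']
print(octets(grinning, "utf-16-be").hex(" "))            # d8 3d de 00
print(octets(grinning, "utf-16-le").hex(" "))            # 3d d8 00 de
print(octets(grinning, "utf-8").hex(" "))                # f0 9f 98 80

# Assumed illustration of non-injectivity: two code points, one abstract character.
a_ring = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE
angstrom = "\u212B"  # ANGSTROM SIGN
print(a_ring == angstrom)                                # False
print(unicodedata.normalize("NFC", angstrom) == a_ring)  # True
```

The same two code units (0xD83D, 0xDE00) serialize to different octet sequences under UTF-16BE and UTF-16LE, which is exactly the part the encoding scheme adds on top of the encoding form.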

Trivia

The U+ prefix derives from ⊎ (U+228E MULTISET UNION)

ref. codepoint - Why is 'U+' used to designate a Unicode code point? - Stack Overflow, answer

Unicode Mail List Archive: Re: Origin of the U+nnnn notation

What is little-known generally is that the "U+" convention itself was an ASCII-fied compromise for what the Unicode designers *really* wanted to use for the Unicode hexadecimal prefix, which was U+228E MULTISET UNION (whose glyph is a union sign with a plus sign in it).

ref. https://util.unicode.org/UnicodeJsps/character.jsp?a=228E

cf. direct sum of multisets, disjoint union

ref. https://ja.wikipedia.org/wiki/多重集合#多重集合の演算と重複度函数

It is called a multiset union, but it is not the same thing as a set union
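A small sketch of the difference, using Python's `collections.Counter` as a stand-in for multisets. It assumes ⊎ is read as the multiset sum (multiplicities are added), which is how the linked Wikipedia page is understood here. The variable names are made up for illustration.

```python
from collections import Counter
import unicodedata

# The character itself, and the usual U+ notation for its code point.
symbol = "\u228E"
print(unicodedata.name(symbol))   # MULTISET UNION
print(f"U+{ord(symbol):04X}")     # U+228E

a = Counter({"x": 2, "y": 1})
b = Counter({"x": 1, "z": 1})

# Multiset sum (assumed meaning of ⊎): multiplicities add up.
multiset_sum = a + b              # x: 3, y: 1, z: 1

# Max-based multiset union: take the larger multiplicity of each element.
multiset_max = a | b              # x: 2, y: 1, z: 1

# Plain set union: multiplicities are ignored entirely.
set_union = set(a) | set(b)       # {'x', 'y', 'z'}

print(multiset_sum["x"], multiset_max["x"], "x" in set_union)  # 3 2 True
```

Only the sum keeps track of the fact that "x" came from both sides, which is why it behaves like a disjoint union rather than an ordinary union.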

Links

Unicode lookup tool

Unicode - Wikipedia