html2sb - くたくたじゅうよん

html2sb

2024-08-18 15:05:44 Has a bug

It ends up doing something like document.body.appendChild(document), which throws DOMException: Failed to execute 'appendChild' on 'Node': Nodes of type '#document' may not be inserted inside nodes of type 'BODY'.

Will fix it eventually
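
One possible direction for the fix, as a rough sketch only. It assumes the failing value is the Document returned by DOMParser, and that appending its body instead would be acceptable. toMountable is a hypothetical helper that does not exist in script.js, and the stand-in objects are there only so the snippet runs outside a browser.

```javascript
const DOCUMENT_NODE = 9; // numeric value of Node.DOCUMENT_NODE

// Hypothetical helper (not in script.js): pick a node that may be appended to document.body.
// A Document cannot be inserted into an element, but its <body> element can.
const toMountable = dom => (dom.nodeType === DOCUMENT_NODE ? dom.body : dom);

// Stand-in objects so the sketch runs without a browser.
const fakeBody = {nodeType: 1, nodeName: 'BODY'};
const fakeDocument = {nodeType: DOCUMENT_NODE, body: fakeBody};

console.log(toMountable(fakeDocument) === fakeBody); // true
console.log(toMountable(fakeBody) === fakeBody);     // true
```

In parse(), the idea would be to append toMountable(dom) and read innerText from that same node. This has not been checked against the real page.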

Moved → /customize/html2sb

/icons/hr.icon

A UserScript that converts HTML to Scrapbox notation

Usage

code:js

import {parse} from '/api/code/takker/html2sb/script.js';

const html = 'text in HTML format';

const text = parse({html: html, baseURL: 'https://scrapbox.io'});

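Arguments

One of the following: html with baseURL, or dom with selector

html: the HTML text to convert

baseURL: the base URL of the HTML text

dom: the DOM of the HTML text to convert

selector: a selector for the range to convert

config (optional): how the HTML is converted

The format is as follows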

code:ts

type Config = {
  selector: string;
  text?: (textContent: string, element: HTMLElement) => string;
  replacer?: (element: HTMLElement) => HTMLElement;
}[];

The HTML elements matched by selector are converted by replacer

If only the textContent needs to change, use text

If neither conversion is specified, the elements matching selector are emptied
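
A small sketch of the three cases above, using plain objects in place of real elements. makeElement and applyRule are hypothetical names for illustration only, and the h4 rule is a made-up config entry. The order (replacer, then text, then empty) follows the loop in parse().

```javascript
// Hypothetical stand-in for a DOM element: just enough surface for this sketch.
const makeElement = textContent => ({
  textContent,
  replacedWith: null,
  replaceWith(node) { this.replacedWith = node; },
});

// Apply one config rule to one element: replacer first, then text, otherwise empty it.
function applyRule({text, replacer}, element) {
  if (replacer) {
    element.replaceWith(replacer(element));
  } else if (text) {
    element.textContent = text(element.textContent, element);
  } else {
    element.textContent = '';
  }
  return element;
}

const config = [
  {selector: 'h4', text: t => `[* ${t}]`},  // only the text is rewritten
  {selector: 'script'},                      // no conversion given, so the element is emptied
];

console.log(applyRule(config[0], makeElement('Title')).textContent); // [* Title]
console.log(JSON.stringify(applyRule(config[1], makeElement('alert(1)')).textContent)); // ""
```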

2021-08-20

14:23:53 Migrating to git

/icons/github.icon takker99/html2scrapbox

Struggling to write the tests

deno

2021-03-13

02:48:53 Made it possible to pass a DOM directly

02:29:59 Made it possible to specify a base URL

2021-01-27 07:51:38 Changed the processor spec

Renamed it to replacer

The return value now replaces the element

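Implementation

Based on /scrasobox/WebからコピペしたらSB記法に変換する, with a few modifications

Removed unnecessary braces inside forEach

Do not use DOMParser

Removed sessionStorage

Check what happens without it

Known issues

Converting something like code string ends up with Scrapbox notation inside the code

Bullet list conversion fails

Things I want to implement

/icons/done.icon Parse table notation

If there is a <caption>, use it as the table name

Otherwise use *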

/icons/hr.icon

Utilities

code:script.js

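// Trim each line and join the lines with no separator
const format = text => text.split(/\n/).map(l => l.trim()).join('');

// Replace characters that would break a Scrapbox link ([, ] and line breaks) with spaces
const ng = text => text.trim().replace(/[\[\]\n]/g, ' ');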
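
A quick look at what format does to multi-line text, combined with a heading rule in the style of the default config. h2 here is just a local name for the sketch. Note that the lines are joined with no space in between.

```javascript
// Same helper as in script.js: trim every line, then join without a separator.
const format = text => text.split(/\n/).map(l => l.trim()).join('');

// Heading rule in the style of the default config (h2 maps to three asterisks).
const h2 = headline => `[*** ${format(headline)}]`;

console.log(h2('  Getting started '));      // [*** Getting started]
console.log(h2('  Getting\n   started '));  // [*** Gettingstarted]
```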

Default parsing configuration

code:script.js

export const defaultConfig = [
  {
    selector: 'pre',
    // Indent each line so it stays inside the Scrapbox code block
    text: code => 'code:code.*\n' + code.split('\n').map(l => ` ${l}`).join('\n'),
  },
  {
    selector: 'h3',
    text: headline => `[** ${format(headline)}]`,
  },
  {
    selector: 'h2',
    text: headline => `[*** ${format(headline)}]`,
  },
  {
    selector: 'h1',
    text: headline => `[**** ${format(headline)}]`,
  },
  {
    selector: 'table',
    replacer: table => {
      // Use <caption> as the table name, or * when there is none
      const title = table.getElementsByTagName('caption')?.[0]?.textContent ?? '*';
      const body = [...table.getElementsByTagName('tr')]
        .map(row => ` ${[...row.querySelectorAll('th, td')]
          .map(column => column.textContent).join('\t')}`)
        .join('\n');
      const pre = document.createElement('pre');
      pre.textContent = `table:${title}\n${body}`;
      return pre;
    },
  },
  {
    selector: 'em, i',
    text: text => `[/ ${format(text)}]`,
  },
  {
    selector: 'strong, b',
    text: text => `[* ${format(text)}]`,
  },
  {
    // Image wrapped in a link
    selector: 'a[href] img[src]',
    replacer: img => {
      const a = img.closest('a');
      return document.createTextNode(`[${img.src.trim()}#.png ${decodeURIComponent(a.href)}]`);
    },
  },
  {
    selector: 'a[href]',
    text: (_, a) => `[${decodeURIComponent(a.href)} ${ng(a.text).trim()}]`,
  },
  {
    selector: 'img[src]',
    replacer: img => document.createTextNode(`[${img.src.trim()}#.png]`),
  },
  {
    // Embedded YouTube player: link to the watch page instead
    selector: 'iframe[src*="//www.youtube.com/embed/"]',
    replacer: iframe =>
      document.createTextNode(`[https://www.youtube.com/watch?v=${iframe.src.split('/embed/')[1].split('?')[0]}]`),
  },
  {
    selector: 'code',
    text: code => `\`${code}\``,
  },
];
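
The string handling in the iframe and table rules can be tried without a DOM. A minimal sketch: toWatchURL and toTable are hypothetical helpers that take plain strings and arrays instead of elements, and VIDEO_ID is a placeholder.

```javascript
// Hypothetical helper: the same string operations the iframe rule applies to iframe.src.
const toWatchURL = src =>
  `https://www.youtube.com/watch?v=${src.split('/embed/')[1].split('?')[0]}`;

// Hypothetical helper: builds the table text from arrays of cell text instead of <tr>/<td>.
const toTable = (rows, caption) => {
  const title = caption ?? '*'; // fall back to * when there is no caption
  const body = rows.map(row => ` ${row.join('\t')}`).join('\n');
  return `table:${title}\n${body}`;
};

console.log(toWatchURL('https://www.youtube.com/embed/VIDEO_ID?rel=0'));
// https://www.youtube.com/watch?v=VIDEO_ID

console.log(JSON.stringify(toTable([['name', 'value'], ['a', '1']])));
// "table:*\n name\tvalue\n a\t1"
```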

code:script.js

export function parse({html, baseURL, dom: dom_, selector, config = defaultConfig} = {}) {
  let dom = null;
  if (dom_) {
    dom = dom_.querySelector(selector);
  } else {
    dom = new DOMParser().parseFromString(html, 'text/html');
    dom.head.insertAdjacentHTML('beforeend', `<base href="${baseURL}">`);
  }

  // Convert the HTML according to config
  for (const {selector, text, replacer} of config) {
    if (replacer) {
      dom.querySelectorAll(selector).forEach(element => element.replaceWith(replacer(element)));
      continue;
    }
    if (text) {
      dom.querySelectorAll(selector)
        .forEach(element => element.textContent = text(element.textContent, element));
      continue;
    }
    dom.querySelectorAll(selector).forEach(element => element.textContent = '');
  }

  // Convert bullet lists
  let depth = -1;
  const li = node => {
    depth++;
    node.querySelectorAll('li').forEach(n => li(n));
    return node.innerHTML = '@sp@'.repeat(depth--) + node.innerHTML;
  };
  li(dom);

  // Materialize the DOM and copy it out as plain text
  // NOTE: when dom is a Document (the html branch above), this appendChild call
  // throws the DOMException described at the top of this page
  document.body.appendChild(dom);
  const range = document.createRange();
  range.selectNode(dom);
  const text = dom.innerText;
  document.body.removeChild(dom); // clean up

  return text.replace(/(\s*\n){3,}/g, '\n\n')
    .replace(/@sp@/gi, ' '); // the bullet-list indentation is restored here
}
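
The last step of parse() is plain string processing, so it can be tried on its own. A minimal sketch: finalize is a hypothetical name for the two replace() calls, and the input string is made up. The @sp@ markers are presumably used because ordinary leading spaces would not survive the trip through innerText.

```javascript
// Hypothetical helper mirroring the final two replace() calls in parse().
const finalize = text => text
  .replace(/(\s*\n){3,}/g, '\n\n')  // collapse three or more line breaks into one blank line
  .replace(/@sp@/gi, ' ');          // turn each marker into one space of indentation

const raw = '@sp@parent\n@sp@@sp@child\n\n\n\nnext';
console.log(JSON.stringify(finalize(raw)));
// " parent\n  child\n\nnext"
```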

#2024-08-18 15:06:01

#2021-03-13 02:30:11

#2021-01-27 07:56:21

#2020-12-23 23:04:15