isbn_bookmarklet

Following a comment I received, I am revising "Also import the product description when creating a page from Amazon with a bookmarklet".

https://twitter.com/yoshinon/status/1058814885197271040

Thank you!!

I really appreciate it.

Would it be hard to make this include the publisher and the ISBN as well?

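First, the ISBN handling.

I expected that extracting it from page elements would be tedious, but then I remembered the Booklog bookmarklet.

It gets the ASIN (Amazon's product identification number) with getElementById('ASIN').

But I did not recall any such element being visible on the page.

I looked at the page source and searched for id="ASIN".

There appears to be a hidden element.

It differs slightly between the print edition and the Kindle edition.

Print edition

https://gyazo.com/52a249ec47376f072b546498ee3d25f8

Kindle edition

https://gyazo.com/eba4f7fa7b9cbeafa5c801b02d6f3b10

Both, however, have an element for the ASIN. The print page gives it an id, while the Kindle page gives it a name. Branching on which one exists is enough.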

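code:getASIN.js
 // Print pages expose the hidden field by id; its value is labeled as an ISBN.
 var asin = document.getElementById('ASIN');
 var a;
 if (asin) {
   a = 'ISBN:' + asin.value;
 } else {
   // Kindle pages expose the field by name instead.
   asin = document.getElementsByName('ASIN.0')[0];
   a = 'ASIN:' + asin.value;
 }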

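That part is easy.

The publisher name is the problem. The hidden elements above carry no publisher information.

In the source, the publisher appears only in the meta keywords tag or in the following part of the page.

https://gyazo.com/e0f961082f287d3791f585d03d31ae7c

And this element has no dedicated id, class, or name.

Print edition

https://gyazo.com/5ad7d52ef2fbe4dc338745c70df2ad59

Kindle edition

https://gyazo.com/d5a9fd01a6481617c4ab23a656f9dab1

So it looks like I need to start from some enclosing element that can be found by id and pull the text out with a regular expression or similar.

There are several class="bucket" elements, and this one seems to be the first. The structure differs from the print edition, though, so I expect this to be fiddly.

Get the innerText of the enclosing element (searching the whole page would get confusing if the word 出版社 (publisher) also appears in the product description, for instance) and extract text of the form 出版社:hogehoge(hogehoge) from it.

Start with the print edition.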

code:getDetailText.js
 var detail = document.getElementById('detail_bullets_id');
 var detailtext = detail.innerText;

match should do for the extraction.

var result = detailtext.match(/出版社:.+/);

https://gyazo.com/1a6584dc3356deaba2fbe90d3312120a

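Taking only the part before the parentheses yields the publisher name.

Since the publication date may be wanted too, I will work on this whole string for now.

In that case, is it better to capture with groups?

For example, var result = detailtext.match(/出版社:(.+)(\(.+\))/);

result[1] then holds the publisher name and result[2] the publication date, parentheses included.

To allow for turning the publisher name into a link, keep 出版社: outside the capture group.

There are several ways to link the publication date, such as [2018] or [2018/10], so I am setting that aside for now.

That settles the print edition.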
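To make the grouping concrete, here is a minimal sketch of the capture-group approach run against a made-up snippet of detail text. The function name parsePublisherLine and the sample lines are assumptions for illustration; only the 出版社: line format comes from the notes above. The lazy (.+?) with \s* trimming is a small variation on the pattern, not what the bookmarklet itself uses.

```javascript
// Hypothetical sample of what detail.innerText might contain on a print page.
const sampleDetailText = [
  '単行本: 256ページ',
  '出版社: シーアンドアール研究所 (2018/7/27)',
  '言語: 日本語',
].join('\n');

// Returns the publisher name and the parenthesized date,
// or null when the text has no 出版社: line.
function parsePublisherLine(detailText) {
  // Group 1: publisher name (label excluded). Group 2: date with its parentheses.
  const m = detailText.match(/出版社:\s*(.+?)\s*(\(.+\))/);
  if (!m) return null;
  return { publisher: m[1], date: m[2] };
}

console.log(parsePublisherLine(sampleDetailText));
```

Returning null for a missing line keeps the "no publisher" case explicit, so the caller can decide what to write instead.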

On the Kindle edition the product description sits inside an iframe, so branch on that.

code:iframeprocess.js
 // detail is null when detail_bullets_id is missing, which is treated as the Kindle case.
 if (!detail) {
   var subdoc = document.getElementById("product-description-iframe").contentWindow.document;
   detail = subdoc.getElementById("productDetailsTable");
 }

It works for now.

https://gyazo.com/4c73914deb8173aa799c7b977f0c10a6

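There remains the question of where to place this kind of metadata and in what order.

Group it at the top?

Or group it at the bottom?

Then there is how to build the date link.

I want to write the code so that users can customize it as easily as possible.

Also, there is clearly duplicated logic now, which I would like to clean up.

Decide at one point whether the page is a Kindle edition and do all of that handling together.

Or consolidate only the iframe-related handling.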
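One way the iframe handling could be consolidated is a single lookup helper that tries the main document first and then the iframe. This is only a sketch: findInPageOrFrame and fakeDoc are hypothetical names, and the document is passed in as a parameter so the function can be tried outside a browser. The element ids are the ones used in these notes.

```javascript
// Look an element up in the main document first, then inside an iframe.
function findInPageOrFrame(doc, mainId, frameId, innerId) {
  var el = doc.getElementById(mainId);
  if (el) return el;
  var frame = doc.getElementById(frameId);
  if (!frame || !frame.contentWindow) return null; // neither place has it
  return frame.contentWindow.document.getElementById(innerId);
}

// Tiny stand-in for document, for illustration only.
function fakeDoc(elements) {
  return { getElementById: function (id) { return elements[id] || null; } };
}

// A Kindle-like page: the details live inside the iframe.
var kindleLike = fakeDoc({
  'product-description-iframe': {
    contentWindow: {
      document: fakeDoc({ productDetailsTable: { innerText: 'details in frame' } })
    }
  }
});

var detail = findInPageOrFrame(
  kindleLike, 'detail_bullets_id', 'product-description-iframe', 'productDetailsTable');
console.log(detail ? detail.innerText : '(not found)');
```

In the bookmarklet itself the first argument would simply be document. Because the helper returns null instead of throwing, a page that has neither the element nor the iframe can be handled by the caller.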

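Problems with self-published books

Some include an edition number.

https://gyazo.com/2a27c4af0288498e11a03365666178f7

Some have no publisher name at all.

https://gyazo.com/eecf1329a6477ab27819153de56aab8a

I need handling that distinguishes these cases.

When there is no publisher name, leave it blank.

This is easy.

What about when the publisher name contains an edition number?

I would like real examples of how these are written, but I could only find my own books.

For now I will base it on those.

Assume that an extracted publisher name containing a ;, as in 倉下忠憲; 1版, is one that includes an edition number.

Use regex.test(targetText) for the check.

倉下忠憲; 1版→[倉下忠憲]; 1版

倉下忠憲→[倉下忠憲]

Mostly done.

For the publication date I also prepared three variants: no link, link the year only, and link the year and month.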
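The two rules above can be sketched as small pure functions. The names linkPublisher and linkDate and the mode strings are assumptions for illustration; the ; test and the replacement patterns follow what the notes describe, with the Scrapbox link brackets written out.

```javascript
// Wrap the publisher name in a Scrapbox link, keeping any "; edition" suffix outside it.
function linkPublisher(name) {
  if (!name) return '';                       // no publisher: leave blank
  if (/;/.test(name)) return '[' + name.replace(/;/, '];');
  return '[' + name + ']';
}

// date is expected in the form "(YYYY/M/D)".
// mode: 'none' leaves it as is, 'year' links the year, 'month' links year/month.
function linkDate(date, mode) {
  if (mode === 'year') return date.replace(/\((\d+)\//, '([$1]/');
  if (mode === 'month') return date.replace(/\((\d+\/\d+)\//, '([$1]/');
  return date;
}

console.log(linkPublisher('倉下忠憲; 1版'));
console.log(linkPublisher('倉下忠憲'));
console.log(linkDate('(2018/7/27)', 'year'));
console.log(linkDate('(2018/7/27)', 'month'));
```

Keeping the link style behind a mode argument is one way to make the date format easy to customize without editing the regular expressions.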

2019/5/2

var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText);

Left as above, the Kindle edition title ends with whitespace, so I changed it to the following.

var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText.trim());

2021/1/4

I received by email a proposed fix for the part that inserts an absurd number of line breaks when a book has multiple authors.

Many thanks.

code:after.js
 var pub = []; // author information
 var c = document.getElementsByClassName('author');
 for (var g = 0; g < c.length; g++) {
   var at = c[g].innerText.replace(/\r?\n/g, '').replace(/,/, ''); // the line-break replace is the addition
   console.log(c[g].innerText);
   console.log(at);
   var pu = at.match(/\(.+\)/); // role, e.g. (著)
   var ct = at.replace(/\(.+\)/, '').replace(/ /g, ''); // name without role or spaces
   pub.push(pu + ' [' + ct + ']');
 }
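The same per-author steps can be tried in isolation on a string. This is a sketch under assumptions: formatAuthor is a hypothetical name, and the sample input only imitates the kind of innerText, with stray line breaks and a trailing comma, that the fix is meant to handle. Unlike the loop above, it returns an empty role instead of the string "null" when no parenthesized role is found.

```javascript
// Turn one author element's text into "(role) [name]".
function formatAuthor(rawText) {
  // Drop line breaks first, then the first comma separating authors.
  var flat = rawText.replace(/\r?\n/g, '').replace(/,/, '');
  var role = flat.match(/\(.+\)/);                       // e.g. "(著)"
  var name = flat.replace(/\(.+\)/, '').replace(/ /g, ''); // name without role or spaces
  return (role ? role[0] : '') + ' [' + name + ']';
}

// Hypothetical raw text with the extra line breaks described above.
console.log(formatAuthor('倉下 忠憲\n\n(著),\n'));
```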

Old version

Use this one to register directly as a bookmarklet (it may fail because of its length).

(Revised 2020/3/8)

code:script_min.js
 javascript:(function(){var e=document.getElementById("productTitle");e||(e=document.getElementById("ebooksProductTitle"));if(e=window.prompt('Scrap "Amazon" to your scrapbox.',e.innerText.trim())){e="\u300e"+e+"\u300f";var c=document.getElementById("ASIN");c?c="ISBN:"+c.value:(c=document.getElementsByName("ASIN.0")[0],c="ASIN:"+c.value);var a=document.getElementById("detail_bullets_id");if(!a){var b=document.getElementById("product-description-iframe").contentWindow.document;a=b.getElementById("productDetailsTable")}(a=
 a.innerText.match(/(\u51fa\u7248\u793e:.+)(\(.+\))/))?(a[1]=a[1].replace(/:/,":["),a[1]=a[1].match(/;/)?a[1].replace(/;/,"];"):a[1]+"]",a[2]=a[2].replace(/\((\d+\/\d+)\//,"([$1]/")+" "):a=["","",""];b=document.getElementById("productDescription");!b&&document.getElementById("product-description-iframe")&&(b=document.getElementById("product-description-iframe").contentWindow.document,b=b.getElementById("productDescription"));if(b){var d=b.getElementsByTagName("p")[0];d||(d=b.getElementsByClassName("productDescriptionWrapper")[0]);
 b=d.innerText.replace(/\n/g,"\n>")}else b="";(d=document.getElementById("imageBlockContainer"))||(d=document.getElementById("ebooksImageBlockContainer"));d=d.getElementsByTagName("img")[0].getAttribute("src");var h=[],k=document.getElementsByClassName("author");for(g=0;g<k.length;g++){var f=k[g].innerText.replace(/,/,""),l=f.match(/\(.+\)/);f=f.replace(/\(.+\)/,"").replace(/ /g,"");h.push(l+" ["+f+"]")}c="["+d+" "+window.location.href+"]\n"+h.join(" ")+"\n"+a[1]+a[2]+c+"\n>"+b+"\n#\u672c\n";c=encodeURIComponent(c);
 window.open("https://scrapbox.io/hokoxjouhou/"+encodeURIComponent(e.trim())+"?body="+c)}})();

code:script_min_old.js
 javascript:(function(){var p=document.getElementById("productTitle");if (!p) var p=document.getElementById("ebooksProductTitle");var title=window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText);if (!title) return;title='『'+title+'』';var asin=document.getElementById('ASIN');if(asin){var a='ISBN:' + asin.value;}else{var asin=document.getElementsByName('ASIN.0')[0],a='ASIN:' + asin.value;}var detail=document.getElementById('detail_bullets_id');if (!detail) {var subdoc=document.getElementById("product-description-iframe").contentWindow.document;var detail=subdoc.getElementById("productDetailsTable");}var detailtext=detail.innerText;var pubdata=detailtext.match(/(出版社:.+)(\(.+\))/);if (pubdata){pubdata[1]=pubdata[1].replace(/:/,':[');pubdata[1]=(pubdata[1].match(/;/)?pubdata[1].replace(/;/,'];'):pubdata[1] + ']');pubdata[2]=pubdata[2].replace(/\((\d+\/\d+)\//, '([$1]/') + ' ';}else{var pubdata=['','',''];}var d=document.getElementById("productDescription");if (!d) {var subdoc=document.getElementById("product-description-iframe").contentWindow.document;var d=subdoc.getElementById("productDescription");}var d1=d.getElementsByTagName("p")[0];if (!d1) var d1=d.getElementsByClassName("productDescriptionWrapper")[0];var d2=d1.innerText.replace(/\n/g,'\n>');var imagecontainer=document.getElementById("imageBlockContainer");if (!imagecontainer) var imagecontainer=document.getElementById("ebooksImageBlockContainer");var image=imagecontainer.getElementsByTagName("img")[0];var imageurl=image.getAttribute("src");var pub=[];var c=document.getElementsByClassName('author');for (g=0;g < c.length;g++){var at=c[g].innerText.replace(/,/,'');var pu=at.match(/\(.+\)/);var ct=at.replace(/\(.+\)/,'').replace(/ /g,'');pub.push(pu + ' [' + ct + ']');}var lines='['+imageurl+' '+window.location.href+']\n' + pub.join(' ')+'\n'+pubdata[1]+pubdata[2]+a+'\n>'+d2+'\n#本\n';var body=encodeURIComponent(lines);window.open('https://scrapbox.io/hokoxjouhou/'+encodeURIComponent(title.trim())+'?body='+body)})();

Latest version (2021/1/4). The source code is below (replace the URL near the end with your own project).

code:script.js
 javascript:(function(){
   var p = document.getElementById("productTitle"); // book title
   if (!p) p = document.getElementById("ebooksProductTitle");
   var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText.trim());
   if (!title) return;
   title = '『' + title + '』';
   var asin = document.getElementById('ASIN'); // ASIN
   var a;
   if (asin) {
     a = 'ISBN:' + asin.value;
   } else {
     asin = document.getElementsByName('ASIN.0')[0];
     a = 'ASIN:' + asin.value;
   }
   var detail = document.getElementById('detailBullets_feature_div'); // publisher and publication date
   if (!detail) {
     var subdoc = document.getElementById("product-description-iframe").contentWindow.document;
     detail = subdoc.getElementById("productDetailsTable");
   }
   var detailtext = detail.innerText;
   var pubdata = detailtext.match(/(出版社 : .+)(\(.+\))/); // [1] 出版社:シーアンドアール研究所, [2] (2018/7/27)
   if (pubdata) {
     pubdata[1] = pubdata[1].replace(/:/, ':['); // delete these two lines to leave the publisher name unlinked
     pubdata[1] = (pubdata[1].match(/;/) ? pubdata[1].replace(/;/, '];') : pubdata[1] + ']');
     //pubdata[2] = pubdata[2] + ' '; // no link
     //pubdata[2] = pubdata[2].replace(/\((\d+)\//, '([$1]/') + ' '; // link the year
     pubdata[2] = pubdata[2].replace(/\((\d+\/\d+)\//, '([$1]/') + ' '; // link the year and month
   } else {
     pubdata = ['', '', ''];
   }
   var isbookDesc_iframe = document.getElementById("bookDesc_iframe") != null;
   if (isbookDesc_iframe) {
     var decsdoc = document.getElementById("bookDesc_iframe").contentWindow.document; // product description
     var d = decsdoc.getElementById("iframeContent");
     if (d) { // the description exists
       var d1 = d.innerText.replace(/\n/g, '\n>');
     } else {
       var d1 = ""; // the description is empty
     }
   } else {
     var d1 = ""; // no description iframe on the page
   }
   var image = document.getElementById("imgBlkFront"); // cover image
   if (!image) image = document.getElementById("ebooksImgBlkFront");
   var imageurl = image.getAttribute("src");
   var pub = []; // author information
   var c = document.getElementsByClassName('author');
   for (var g = 0; g < c.length; g++) {
     var at = c[g].innerText.replace(/\r?\n/g, '').replace(/,/, '');
     var pu = at.match(/\(.+\)/);
     var ct = at.replace(/\(.+\)/, '').replace(/ /g, '');
     pub.push(pu + ' [' + ct + ']');
   }
   var lines = '[' + imageurl + ' ' + window.location.href + ']\n' + pub.join(' ') + '\n' + pubdata[1] + pubdata[2] + a + '\n>' + d1 + '\n#書籍名\n'; // page content; reorder the parts here to change the page layout
   var body = encodeURIComponent(lines);
   window.open('https://scrapbox.io/hokoxjouhou/' + encodeURIComponent(title.trim()) + '?body=' + body);
 })();
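Since the order of the parts in lines decides the page layout, one possible way to make that customizable is to assemble the body from named parts. This is a sketch, not the script's actual structure: buildBody, buildScrapboxUrl, the part names, and every placeholder value are assumptions for illustration. Only the URL shape with ?body= and the project name hokoxjouhou come from the script above.

```javascript
// Join the named parts in the given order, one per line.
function buildBody(parts, order) {
  return order.map(function (key) { return parts[key]; }).join('\n') + '\n';
}

// Build the page-creation URL, encoding the title and body as the script does.
function buildScrapboxUrl(project, title, body) {
  return 'https://scrapbox.io/' + encodeURIComponent(project) + '/' +
    encodeURIComponent(title.trim()) + '?body=' + encodeURIComponent(body);
}

// Placeholder values only; the real script fills these from the Amazon page.
var parts = {
  image: '[https://example.com/cover.jpg https://example.com/item]',
  authors: '(著) [AUTHOR]',
  meta: '出版社:[PUBLISHER](2018/7/27) ISBN:XXXXXXXXXX',
  description: '>DESCRIPTION',
  tag: '#本'
};

// Metadata near the top; move 'meta' later in the array to put it at the bottom.
var metaOnTop = buildBody(parts, ['image', 'meta', 'authors', 'description', 'tag']);
var url = buildScrapboxUrl('hokoxjouhou', '『sample』 ', metaOnTop);
console.log(url);
```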

changelog

2020/8/26

Changed the element id used to get the publisher name to detailBullets_feature_div.

The code has become very long, so it might be better to load the JavaScript from a Scrapbox code block.
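A rough sketch of what such a loader could look like follows. It assumes that Scrapbox serves a code block as raw text at /api/code/project/page/filename, which should be checked against the Scrapbox help before relying on it. It also assumes the target page allows injecting an external script, which a site's content security policy may prevent. scrapboxCodeUrl and loaderBookmarklet are hypothetical names, and the page name isbn_bookmarklet is taken from this page's title.

```javascript
// Build the raw-code URL for a code block (assumed endpoint shape).
function scrapboxCodeUrl(project, page, file) {
  return 'https://scrapbox.io/api/code/' +
    [project, page, file].map(encodeURIComponent).join('/');
}

// Build a short bookmarklet that injects a <script> tag pointing at that URL.
function loaderBookmarklet(project, page, file) {
  var src = scrapboxCodeUrl(project, page, file);
  return 'javascript:(function(){var s=document.createElement("script");s.src=' +
    JSON.stringify(src) + ';document.body.appendChild(s);})();';
}

console.log(loaderBookmarklet('hokoxjouhou', 'isbn_bookmarklet', 'script.js'));
```

The bookmarklet stays short no matter how long the script grows, at the cost of depending on the project being readable from the page where it runs.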

2019/3/15

Amazon's Kindle edition pages had subtly changed how the title is marked up, so I adapted to that.

https://gyazo.com/de3e63167743847301383d30cdc07dce

var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerHTML);

was changed to

var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText);

2019/6/18 Confirmed that information cannot be picked up from some pages

For example, nothing can be imported from the following Amazon page (the code stops partway through).

https://www.amazon.co.jp/再生産-〔教育・社会・文化〕-ブルデュー・ライブラリー-ピエール・ブルデュー/dp/4938661241/ref=pd_sim_14_2/355-4284581-4471750?_encoding=UTF8&pd_rd_i=4938661241&pd_rd_r=1eaf7cec-9102-11e9-b268-0bb7725c95a0&pd_rd_w=rLS14&pd_rd_wg=PyPwE&pf_rd_p=b88353e4-7ed3-4da1-bc65-341dfa3a88ce&pf_rd_r=Z5AGWYCZJ4MCETFCVBA7&psc=1&refRID=Z5AGWYCZJ4MCETFCVBA7

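Probably only used copies are in stock, so class names and the like differ.

productTitle exists

id="ASIN" exists too

id="detail_bullets_id" exists too

productDescription is missing

id="imageBlockContainer" exists too

productDescription was the cause.

The code assumed that a page without productDescription was an ebook page and handled it accordingly, but it did not separate out the case of a print edition that has no product description.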

2020/4/13

Problem: the 「試し読み」 (sample preview) image gets imported.

The img elements carry ids, so use those.

imgBlkFront

ebooksImgBlkFront

Searching by these ids is enough.

code:old.js
 var imagecontainer = document.getElementById("imageBlockContainer"); // cover image
 if (!imagecontainer) imagecontainer = document.getElementById("ebooksImageBlockContainer");
 var image = imagecontainer.getElementsByTagName("img")[0];
 var imageurl = image.getAttribute("src");

code:new.js
 var image = document.getElementById("imgBlkFront"); // cover image
 if (!image) image = document.getElementById("ebooksImgBlkFront");
 var imageurl = image.getAttribute("src");

2020/4/19

Fix the product description not being retrieved.

Old code

code:old.js
 var d = document.getElementById("productDescription"); // product description
 if (!d) {
   if (document.getElementById("product-description-iframe")) { // if this is a Kindle edition
     var subdoc = document.getElementById("product-description-iframe").contentWindow.document;
     d = subdoc.getElementById("productDescription");
   }
 }
 if (d) { // the description exists
   var d1 = d.getElementsByTagName("p")[0];
   if (!d1) d1 = d.getElementsByClassName("productDescriptionWrapper")[0];
   var d2 = d1.innerText.replace(/\n/g, '\n>');
 } else {
   var d2 = ""; // the description is empty
 }

New code

code:new.js
 var decsdoc = document.getElementById("bookDesc_iframe").contentWindow.document; // product description
 var d = decsdoc.getElementById("iframeContent");
 if (d) { // the description exists
   var d1 = d.innerText.replace(/\n/g, '\n>');
 } else {
   var d1 = ""; // the description is empty
 }

2020/5/9

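Marketplace items that Amazon itself does not have in stock are not imported properly.

Something must be empty. The price looks suspicious, but let's see.

Where does imgBlkFront sit?

On marketplace pages, var decsdoc = document.getElementById("bookDesc_iframe").contentWindow.document; (the product description handling) throws an error, so I now branch on whether the frame exists. Getting the description when there is no frame is very tedious, so I am skipping it for now.

(Idea)

Get the h2 elements (there will be several).

Go through them in order looking for one whose innerText matches 「商品の内容」.

Once found, save its contents.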
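The idea could be sketched roughly as follows. Where the description text sits relative to the heading is not known from these notes, so reading it from the heading's parent element is purely an assumption, as are the function name findSectionText and the stand-in document used to try it out.

```javascript
// Scan the h2 elements in order and return the text of the section
// whose heading matches headingText, or '' when none matches.
function findSectionText(doc, headingText) {
  var headings = doc.getElementsByTagName('h2');
  for (var i = 0; i < headings.length; i++) {
    if (headings[i].innerText.trim() === headingText) {
      // Assumption: the body text shares a parent element with the heading.
      var section = headings[i].parentNode;
      return section ? section.innerText : '';
    }
  }
  return '';
}

// Stand-in for document with two sections, for illustration only.
var fakePage = {
  getElementsByTagName: function (tag) {
    if (tag !== 'h2') return [];
    return [
      { innerText: '登録情報', parentNode: { innerText: '登録情報\nDETAILS' } },
      { innerText: ' 商品の内容 ', parentNode: { innerText: '商品の内容\nDESCRIPTION' } }
    ];
  }
};

console.log(findSectionText(fakePage, '商品の内容'));
```

The returned text still includes the heading line itself, so a real version would probably strip the first line before quoting it into the page.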