URL frequency count
I want to see how many times each URL (domain) was accessed
Patterns
file:///directory/file.ext
http://domain/
https://domain/
For file URLs, use the directory part as the key
For everything else, use only the leading domain
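The rule above can also be sketched with the standard library's `urllib.parse.urlsplit` instead of manual splitting. This is only an illustration: `extract_key` is a hypothetical helper name, and unlike the implementation in this note it returns the directory as a path (`/directory`) rather than joining the segments with underscores.

```python
from urllib.parse import urlsplit


def extract_key(url):
    """Return the directory part for file: URLs, otherwise the domain.

    Hypothetical helper, not part of the original note.
    """
    parts = urlsplit(url)
    if parts.scheme == 'file':
        # Drop the last path segment (the file name) and keep the directory.
        directory, _, _ = parts.path.rpartition('/')
        return directory
    # http / https: netloc holds the domain (plus the port, if one is given).
    return parts.netloc


if __name__ == '__main__':
    for url in ('file:///directory/file.ext', 'http://domain/', 'https://domain/'):
        print(url, '->', extract_key(url))
```

Note that `netloc` keeps a port number if the URL has one, so `http://domain:8080/` and `http://domain/` would be counted as separate keys in this sketch.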
Implementation
code:py
 def count_per_url_about_firefoxurls(s):
     # Util is assumed to be the author's own helper module
     # (string <-> list of lines); it is not defined in this note.
     lines = Util.str2lines(s)
     counter = {}
     for line in lines:
         isblank = len(line.strip())==0
         if isblank:
             continue
         key = None
         isfile = line.startswith('file:///')
         if isfile:
             # file:///directory/file.ext -> key is the directory part
             _, filepath = line.split('///', 1)
             elements = filepath.split('/')
             directory = elements[:-1]
             directory_with_underscore_delimiter = '_'.join(directory)
             key = directory_with_underscore_delimiter
         else:
             # http or https -> key is the leading domain
             _, url_without_protocol = line.split('//', 1)
             elements = url_without_protocol.split('/')
             domain = elements[0]
             key = domain
         print(key)
         notfound = not key in counter
         if notfound:
             counter[key] = 0
         counter[key] += 1
     # Convert the dict to a list of (key, count) pairs so it can be sorted
     ls = []
     for key in counter:
         element = (key, counter[key])
         ls.append(element)
     asc_ls = sorted(ls, key=lambda elm: elm[1])
     desc_ls = reversed(asc_ls)
     newlines = []
     for element in desc_ls:
         domain, count = element
         line = '{} {}'.format(count, domain)
         newlines.append(line)
     newstr = Util.lines2str(newlines)
     return newstr
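Split each line and pick out the part that is needed
Do the counting with a dict
To sort in descending order: convert to a list, sort, and then reverse, since sorted() is ascending by default (passing reverse=True to sorted() would also work)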
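The three steps above (count with a dict, convert to a list, sort in descending order) can be collapsed with `collections.Counter`, whose `most_common()` already returns pairs ordered from the highest count down. A minimal sketch, assuming the keys have already been extracted; `count_keys` is a hypothetical name and plain `'\n'.join` stands in for the note's `Util.lines2str`.

```python
from collections import Counter


def count_keys(keys):
    """Count keys and return 'count key' lines, most frequent first.

    Hypothetical helper, not part of the original note.
    """
    counter = Counter(keys)
    # most_common() yields (key, count) pairs in descending order of count.
    lines = ['{} {}'.format(count, key) for key, count in counter.most_common()]
    return '\n'.join(lines)


if __name__ == '__main__':
    print(count_keys(['domain-a', 'domain-b', 'domain-a']))
```

Keys with equal counts come out in the order they were first seen, which can differ from the order produced by sorting and then reversing.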