pEnglish2023-12-20 - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

pEnglish2023-12-20

I practiced English conversation until I used up the GPT quota yesterday before sleep. Reviewing the chat logs in the morning and organizing them is a good practice for improving my English. It generates English contents and serves as a way to express my thoughts in English.

ChatGPT and English Conversation 2023-12-20

Observed cases of conflict in the process of title translation

num collision 54

https://gist.github.com/nishio/2a732b8b5ad9291bfaaf7603f362254b (v3)

Noooooooo, I don't want to be identified with this.

[Subjectivity]: [idiosyncrasy] / [independence]

[Common Sense]: [what is usual] / [natural]

I think not.

[Community]: [Community] / [community]

hmm

Two cases left, difficult to do, more to come tomorrow.

[Diversity]: [diversity] / [pluralistic]

[Empathy]: [empathy] / [the dharma-body (mind, body and spirit)]

To be done on 2023-12-21.

Resolution of the remaining two cases

✅

Start translation

Maybe the long one is an error.

Put on failure list and continue.

2023-12-21

Translation system, I'm wondering if it's not necessary to export JSON since it's crawling to get links anyway? I'm starting to think so.

Problem of not preserving indentation

A key feature of your translation is the preservation of formatting, especially spaces and tabs at the beginning of lines. These are crucial for indicating indentation in bullet lists and must be accurately replicated in your translations.

Translation in GPT 3.5, up to the 59th case done.

And as expected, an error in an overly long article.

code::

20%|███████████████████▎ | 59/300 04:55<20:05, 5.00s/it

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens. However, your messages resulted in 6138 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

python -m tasks.translate.from_jsonl 9.23s user 2.91s system 4% cpu 5:01.91 total

that's so

Proposal to go through the article that is too long and work through it anyway.

I'll just go through and translate what I can.

The page with the error only records the nature of the error and aims to run to the end for now.

Execution started.

See you tomorrow for the rest of the story.

A note for my future self

Once completed, look at the time and number of pages processed

You can see how many pages are logged as errors

Check the fees charged.

I think most of the errors are pages that are too long.

What to do with long pages needs to be considered

There are cases that do not need to be translated and cases that should be translated exactly.

Simply a case of doing it with a context wide API.

Cases of split translation

...

The appropriate way to cut out the need is determined after the need is identified.

Should pages that are too long be cut out?

On the other hand, there is a need to keep the original page

Is there a theory that PDF is the way to go?

2023-12-23

I checked the morning of 2023-12-22 and it was at 38%, but apparently it hung at that point.

38%|█... | 6668/17569 [11:16:08<7:32:13, 2.49s/it]

I just looked at it and it wasn't progressing. Hmmm... failure.

Hangs silently without error

https://gyazo.com/0c2e7da99d0c4b33464d1289050df692

2023-12-24

47%|█... | 8310/17569 [00:23<00:39, 234.63it/s]

return self._sslobj.read(len)

I'm blocking it on READ.

---

This page is auto-translated from /nishio/pEnglish2023-12-20 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.