pEnglish2023-12-20
I practiced English conversation until I used up the GPT quota yesterday before sleep. Reviewing the chat logs in the morning and organizing them is a good practice for improving my English. It generates English contents and serves as a way to express my thoughts in English.
Observed cases of conflict in the process of title translation
num collision 54
Noooooooo, I don't want to be identified with this.
[Subjectivity]: [idiosyncrasy] / [independence]
[Common Sense]: [what is usual] / [natural]
I think not.
[Community]: [Community] / [community]
hmm
Two cases left, difficult to do, more to come tomorrow.
[Diversity]: [diversity] / [pluralistic]
[Empathy]: [empathy] / [the dharma-body (mind, body and spirit)]
To be done on 2023-12-21.
Resolution of the remaining two cases
✅
Start translation
Maybe the long one is an error.
Put on failure list and continue.
2023-12-21
Translation system, I'm wondering if it's not necessary to export JSON since it's crawling to get links anyway? I'm starting to think so.
Problem of not preserving indentation
A key feature of your translation is the preservation of formatting, especially spaces and tabs at the beginning of lines. These are crucial for indicating indentation in bullet lists and must be accurately replicated in your translations.
Translation in GPT 3.5, up to the 59th case done.
And as expected, an error in an overly long article.
code::
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens. However, your messages resulted in 6138 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
python -m tasks.translate.from_jsonl 9.23s user 2.91s system 4% cpu 5:01.91 total
that's so
Proposal to go through the article that is too long and work through it anyway.
I'll just go through and translate what I can.
The page with the error only records the nature of the error and aims to run to the end for now.
Execution started.
See you tomorrow for the rest of the story.
A note for my future self
Once completed, look at the time and number of pages processed
You can see how many pages are logged as errors
Check the fees charged.
I think most of the errors are pages that are too long.
What to do with long pages needs to be considered
There are cases that do not need to be translated and cases that should be translated exactly.
Simply a case of doing it with a context wide API.
Cases of split translation
...
Should pages that are too long be cut out?
On the other hand, there is a need to keep the original page
Is there a theory that PDF is the way to go?
2023-12-23
I checked the morning of 2023-12-22 and it was at 38%, but apparently it hung at that point.
38%|█... | 6668/17569 [11:16:08<7:32:13, 2.49s/it]
I just looked at it and it wasn't progressing. Hmmm... failure.
Hangs silently without error
https://gyazo.com/0c2e7da99d0c4b33464d1289050df692
2023-12-24
47%|█... | 8310/17569 [00:23<00:39, 234.63it/s]
return self._sslobj.read(len)
I'm blocking it on READ.
---
This page is auto-translated from /nishio/pEnglish2023-12-20 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.