Whisper2024-03-25
python whisper.py 0.37s user 0.18s system 0% cpu 10:00.79 total
The second half of the recording was just noise, so it was effectively two hours of audio.
https://gyazo.com/93f3a6c6c1ab03e26ee499dee3a285a7
Trying it with the audio from a one-hour study session:
python whisper.py 0.34s user 0.14s system 0% cpu 3:27.50 total
https://gyazo.com/0237823730c2d31c4c1eda0f821e2430
Unlike the previous example, this one talks for almost the entire hour.
So one hour of audio took about 3.5 minutes to process and cost about 40 cents.
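As a rough check of those figures, here is a minimal sketch of the arithmetic. The per-minute price is an assumption (the published whisper-1 rate at the time, as far as I know) and is not stated on this page. The processing time is taken from the `time` output above.

```python
# Back-of-the-envelope check of the numbers above.
# ASSUMPTION: whisper-1 is billed at $0.006 per minute of audio.
PRICE_PER_MINUTE_USD = 0.006

audio_minutes = 60                    # one hour of audio
processing_seconds = 3 * 60 + 27.5    # "3:27.50 total" from the time output

cost_usd = audio_minutes * PRICE_PER_MINUTE_USD
speedup = (audio_minutes * 60) / processing_seconds

print(f"cost: ${cost_usd:.2f}")
print(f"about {speedup:.1f}x faster than real time")
```

Under that assumed rate the cost comes to about 36 cents, which is consistent with the roughly 40 cents mentioned above.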
I had Claude summarize the resulting transcriptions.
There is nothing difficult about the code.
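code:python
 from openai import OpenAI
 
 client = OpenAI()  # reads OPENAI_API_KEY from the environment
 
 # audio_path, indir, and audio are set elsewhere in the script
 with open(audio_path, "rb") as audio_file:
     transcription = client.audio.transcriptions.create(
         model="whisper-1", file=audio_file
     )
 print(transcription.text)
 # save the transcript next to the other outputs
 with open(f"whisper_out/{indir}_{audio}.txt", "w") as f:
     f.write(transcription.text)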
The most time-consuming part was installing ffmpeg, because the install started pulling in gcc and other dependencies.
$ brew install ffmpeg
Note that ffmpeg is only used to split audio files, so it is not needed if the audio file is smaller than 25MB (the upload limit of the API).
The audio of the hour-long study session above is 15MB, so if your recording app cuts recordings into one-hour files, you don't need ffmpeg.
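The size check and the split could be sketched roughly as follows. This is not the script used on this page: the helper names (`needs_split`, `build_split_command`, `split_if_needed`), the one-hour segment length, and the choice of ffmpeg's segment muxer are all assumptions for illustration.

```python
import os
import subprocess

# ASSUMPTION: "25MB" is read conservatively as 25,000,000 bytes.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def needs_split(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file is too large to upload in one piece."""
    return os.path.getsize(path) >= limit


def build_split_command(path: str, out_pattern: str, segment_seconds: int = 3600) -> list[str]:
    """Build an ffmpeg command that cuts the audio into fixed-length segments."""
    return [
        "ffmpeg", "-i", path,
        "-f", "segment",                        # ffmpeg's segment muxer
        "-segment_time", str(segment_seconds),  # target length of each piece
        "-c", "copy",                           # copy the stream, no re-encoding
        out_pattern,                            # e.g. "chunks/part_%03d.mp3"
    ]


def split_if_needed(path: str, out_pattern: str) -> bool:
    """Run ffmpeg only when the file exceeds the limit; return True if it ran."""
    if not needs_split(path):
        return False
    subprocess.run(build_split_command(path, out_pattern), check=True)
    return True
```

Whether a one-hour segment stays under the limit depends on the bitrate of the recording. At the roughly 15MB per hour seen above it would, but a higher-bitrate file may need a shorter `segment_seconds`.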
---
This page is auto-translated from /nishio/Whisper2024-03-25 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.