Best automatic audio/video transcription software?

Sphere · Jun 2, 2021

So I wanted to keep up with a certain YouTuber who makes 1-2+ hour videos and often repeats himself, on top of having a nasal voice. But it's useful information, and instead of watching 2-4h videos at 2x speed, I could read the videos as text in half an hour or less.

Just to be clear, I don't want to listen to an audio/video file and type the transcription manually; I want the software to do it for me.

What I tried is using Virtual Audio Cable to create a feedback loop (i.e. sending the output sound to a virtual microphone) and using the Voice Typing function in Google Docs. It worked for a while, but the piece of shit (Voice Typing function) stops while the person is still speaking and you have to click it again, which kind of defeats the purpose since I wanted to leave the videos transcribing while I sleep.

I'm looking for something open source, or paid software that's easy to Jolly Roger.

Bloitzhole · Jun 3, 2021

Nuance Dragon NaturallySpeaking probably. It's what some of my colleagues use, and I think it would suffice for your purposes. Some versions are certainly "floating around" out there. It's a professional piece of software that's been around for ages, used for dictation/speech to text. But automatic transcription is still not entirely accurate, even with expensive software; these programs tend to learn a voice profile and become more accurate over time.
This is one of the main reasons that transcription is still a job (every now and again I'll make ~3€ per audio minute transcribing and formatting interviews for academic purposes, and that's not an unusual rate).
If your YouTuber's audio is decent and he enunciates properly, it should work though.

Sphere · Jun 3, 2021

Thanks! I will check it out.

stares at error messages · Jun 3, 2021

1. youtube-dl to download the videos.
2. <https://pypi.org/project/SpeechRecognition/> to rip the audio to text. Only problem is it's not going to track different speakers. You might want to use AI software that identifies the different speakers in the video and then separates out their voices before sending them to speech-to-text.
3. You could use an audio classifier to try to match audio of who is speaking to a known (labelled) sample. <https://librosa.org/doc/latest/index.html> and <https://medium.com/@anonyomous.ut.grad.student/building-an-audio-classifier-f7c4603aa989>
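
To give an idea of what the speaker-matching part could look like, here's a very rough sketch and not a tested pipeline. It assumes librosa is installed and that you've cut a short labelled clip per speaker yourself. The function names (`mfcc_fingerprint`, `best_match`, `cosine`) are made up for illustration, and averaging MFCCs is a crude stand-in for a real trained classifier like the one in the Medium article.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(features, labelled):
    """Name of the labelled sample whose features are closest to `features`."""
    return max(labelled, key=lambda name: cosine(features, labelled[name]))


def mfcc_fingerprint(path, n_mfcc=20):
    """Average MFCC vector of an audio file (a very crude voice fingerprint)."""
    import librosa  # third-party: pip3 install librosa

    y, sr = librosa.load(path, sr=16000)  # resampled to 16 kHz, mono by default
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1).tolist()
```

You'd call `mfcc_fingerprint()` once per labelled clip to build a dict like `{"host": [...], "guest": [...]}`, then call it on each chunk of the video and pass the result to `best_match()`. Expect it to get confused by music, crosstalk and similar-sounding voices.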

Sphere · Jun 3, 2021

stares at error messages said:
1. youtube-dl to download the videos.
2. <https://pypi.org/project/SpeechRecognition/> to rip the audio to text. Only problem is it's not going to track different speakers. You might want to use AI software that identifies the different speakers in the video and then separates out their voices before sending them to speech-to-text.
3. You could use an audio classifier to try to match audio of who is speaking to a known (labelled) sample. <https://librosa.org/doc/latest/index.html> and <https://medium.com/@anonyomous.ut.grad.student/building-an-audio-classifier-f7c4603aa989>

For 2), do you have a version that's already compiled and ready to use in Windows? The installation instructions are too complicated and I don't understand them.

stares at error messages · Jun 4, 2021

Lou'sMysteriousBenefactor said:
For 2), do you have a version that's already compiled and ready to use in Windows? The installation instructions are too complicated and I don't understand them.

You're probably not a programmer and are instead looking for something that's ready out of the box. Parts 2. and 3. aren't really going to help you. It's Python 3, so you'd just install Python 3 and then run pip3 install SpeechRecognition at the command prompt (terminal) in Windows. You're probably better off with the other suggestion; this is all DIY.
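
If you do want to try the DIY route anyway, it could look roughly like the sketch below. Assumptions: youtube-dl and ffmpeg are installed and on your PATH, the audio ends up as a WAV file, and you're fine with the free online Google Web Speech recognizer that SpeechRecognition exposes as `recognize_google` (it needs an internet connection and can refuse requests). The helper names (`download_audio_cmd`, `chunk_offsets`, `transcribe`) and the 30-second chunk size are just my choices for illustration.

```python
import math
import sys


def download_audio_cmd(url, out_template="%(id)s.%(ext)s"):
    """Build a youtube-dl command that keeps only the audio track, as WAV."""
    return ["youtube-dl", "-x", "--audio-format", "wav", "-o", out_template, url]


def chunk_offsets(total_seconds, chunk_seconds=30):
    """Start offsets (in seconds) of consecutive chunks covering the file."""
    return [i * chunk_seconds for i in range(math.ceil(total_seconds / chunk_seconds))]


def transcribe(wav_path, chunk_seconds=30):
    """Transcribe a WAV file chunk by chunk; returns a list of (offset, text)."""
    import speech_recognition as sr  # third-party: pip3 install SpeechRecognition

    recognizer = sr.Recognizer()
    results = []
    with sr.AudioFile(wav_path) as source:
        for offset in chunk_offsets(source.DURATION, chunk_seconds):
            # Successive record() calls continue where the previous one stopped.
            audio = recognizer.record(source, duration=chunk_seconds)
            try:
                text = recognizer.recognize_google(audio)  # online recognizer
            except sr.UnknownValueError:
                text = ""  # nothing intelligible in this chunk
            results.append((offset, text))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for offset, text in transcribe(sys.argv[1]):
        print(f"[{offset // 60:02d}:{offset % 60:02d}] {text}")
```

Run the list from `download_audio_cmd(url)` with `subprocess.run(cmd, check=True)` to get the WAV, then run the script with the WAV path as its argument. Short chunks are used because the online recognizer tends to choke on long audio; a network failure will raise `sr.RequestError`, which this sketch doesn't handle.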
