Best automatic audio/video transcription software?

Sphere

kiwifarms.net
Joined
Dec 17, 2019
So I wanted to keep up with a certain YouTuber who makes 1-2+ hour videos and often repeats himself, on top of having a nasal voice. But it's useful information, and instead of watching 2-4h videos on 2x speed, I could read the videos as text in half an hour or less.

Just to be clear, I don't want to listen to an audio/video file and type the transcription myself; I want the software to do it for me.

What I tried is using Virtual Audio Cable to create a loopback (i.e. sending the output sound to a virtual microphone) and using the Voice Typing function in Google Docs. It worked for a while, but the piece of shit (Voice Typing function) stops while the person is still speaking and you have to click it again, which kind of defeats the purpose since I wanted to leave the videos transcribing while I sleep.

I'm looking for something open source or paid software easy to Jolly Roger.
 
Solution
Nuance Dragon NaturallySpeaking probably, it's what some of my colleagues use and I think for your purposes it would suffice. Some versions are certainly "floating around" out there. It's a professional piece of software that's been around for ages, used for dictation/speech to text. But automatic transcription, even with expensive software, is still not entirely accurate; these programs tend to learn a voice profile and become more accurate over time.
This is one of the main reasons that transcription is still a job (and every now and again I'll make ~3€ per audio minute of transcribing and formatting interviews for academic purposes - and that's not an unusual rate).
If your Youtuber's audio is decent and he enunciates properly, it should work though.
 
Solution
  1. youtube-dl to download the videos
  2. <https://pypi.org/project/SpeechRecognition/> to turn the audio into text. Only problem is it's not going to track different speakers. You might want to use AI software that identifies the different speakers in the video and then splits out their voices before sending to speech-to-text.
  3. You could use an audio classifier to try to match audio of who is speaking to a known (labelled) sample. <https://librosa.org/doc/latest/index.html> and <https://medium.com/@anonyomous.ut.grad.student/building-an-audio-classifier-f7c4603aa989>
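
Steps 1 and 2 could look roughly like the sketch below. This is only a sketch under a few assumptions: youtube-dl and ffmpeg are on your PATH, the audio is saved as WAV, and the free Google Web Speech backend (`recognize_google`) is good enough for your needs. The helper names (`download_audio_command`, `chunk_spans`, `transcribe_wav`) and the 30-second chunk size are made up for illustration. It does nothing about telling speakers apart.

```python
def download_audio_command(url, out_template="%(id)s.%(ext)s"):
    """Build a youtube-dl command that keeps only the audio track, as WAV."""
    return [
        "youtube-dl",
        "--extract-audio",        # drop the video stream (needs ffmpeg)
        "--audio-format", "wav",  # SpeechRecognition's AudioFile reads WAV/AIFF/FLAC
        "-o", out_template,
        url,
    ]


def chunk_spans(total_seconds, chunk_seconds=30.0):
    """Split a recording into (offset, duration) spans of at most chunk_seconds."""
    spans = []
    offset = 0.0
    while offset < total_seconds:
        spans.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return spans


def transcribe_wav(path, chunk_seconds=30.0):
    """Transcribe a WAV file chunk by chunk and return the joined text."""
    # Imported here so the helpers above work without the library installed.
    import speech_recognition as sr  # pip3 install SpeechRecognition

    recognizer = sr.Recognizer()
    pieces = []
    with sr.AudioFile(path) as source:
        for _offset, duration in chunk_spans(source.DURATION, chunk_seconds):
            # Each record() call continues where the previous one stopped.
            audio = recognizer.record(source, duration=duration)
            try:
                # Sends the chunk to Google's free web API (needs internet).
                pieces.append(recognizer.recognize_google(audio))
            except sr.UnknownValueError:
                # The recognizer could not make out any speech in this chunk.
                pieces.append("[unintelligible]")
    return " ".join(pieces)
```

To use it, you would run the download with something like `subprocess.run(download_audio_command(url), check=True)` and then call `transcribe_wav` on the resulting file. Network or quota failures raise `sr.RequestError`, which the sketch deliberately does not catch, so an overnight run would stop loudly instead of silently producing an empty transcript.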
 

For 2), do you have a version that's already compiled and ready to use in Windows? The installation instructions are too complicated and I don't understand them.
 
You're probably not a programmer and are instead looking for something that works out of the box, so steps 2 and 3 aren't really going to help you. It's Python 3, so you'd just install Python 3 and then run pip3 install SpeechRecognition at the command prompt (terminal) in Windows. You're probably better off with the other suggestion; this is all DIY.
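
If you do try the pip route, a quick way to check that the install worked is a few lines of Python like the sketch below. The `is_installed` helper is just an illustrative name. Note that the package is installed as SpeechRecognition but imported as speech_recognition.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The PyPI package "SpeechRecognition" is imported as "speech_recognition".
print("speech_recognition installed:", is_installed("speech_recognition"))
```

If it reports False, pip most likely installed the package into a different Python than the one you ran the script with.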
 