CART Problem Solving Series
Superscript and Subscript
Communicating Sans Steno
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: People claim that someday very soon human captioners will be replaced by automated speech recognition engines.
I've got a lot to say on this subject, but it's already late, so I'm going to leave most of the heavy-duty analysis for next week's post. For now, I just want to show you a few examples. I first posted this video in 2010:
It's actual classroom audio from MIT's OpenCourseWare project. The video is of me captioning it live, using Eclipse. After posting it, I wrote a post on this blog comparing my CART accuracy with the accuracy of YouTube's autocaptions, which had just been released at the time, with promises of increasing accuracy as time went on.
Here's the transcript of the original autocaptions from 2010:
for implants support and you know as I haven't said anything about biology those folks didn't really need to be educated and genetics biochemistry more about it so about the to solve those problems and that's because biology as it used to be was not a science that engineers could addressed very well because in order for engineers really analyze study quantitatively develop models at the bill technologies all for the parts there's a lot of requirements on the science that really biology that that's why %um %um the actual mechanisms a function work understood yes you could see that moving your arm requires certain force in of would where certain load we really didn't know what was going on down in the proteins and cells and tissues of the muscles on the books okay but still you could decide maybe an artificial %uh to do this %um %uh in the plan you really know the molecular compliments so how the world he actually manipulate the system he didn't even know what the molecules work they're really underlying yes you couldn't really do the chemistry on the biological all %uh it's very hard to quantify you can even though the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why they're never really was a biological engineering until very recently while he wasn't the science that was released soon right here in dallas so there for the world biomedical engineering bailey although the deposition prompted it just talked about that that necessarily require miles per se but that's changed the good news for you folks is biology this change it's now a science that engineers had it been that to very well
Here's the updated transcript from the new autocaptions produced by YouTube's updated and improved speech recognition engine, circa 2012:
for implants and so forth and you know as i haven't said anything about biology those folks didn't really need to be educated in genetics biochemistry molecular body cell bowed to solve those problem and that's because biology as it used to be was not a science that engineers could address very well because in order for engineers really analyze study quantitatively develop models at the bill technologies alter the parts there's a lot of requirements on the science that really biology that satisfy uh... the actual mechanisms a function work understood yes you can see that moving your arm requires certain force in whip where a certain load we really didn't know what was going on down in the proteins and self-interest use of the muscles in the box creek but still you could decide maybe an artificial do this satanic plan you really know the molecular compliments so how the world to be actually manipulate the system to continue to know what the molecules were that are really underlying s thank you could really do the chemistry and biological molecule uh... it's very hard to quantify since if you need to know the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why there never really was a biological engineering until very recently has filed he wasn't a science that was really suited wrench in your analysis synthesis so there for the world biomedical engineering mainly involved all these application props that i've just talked about but that necessarily require biology per se but that's changed good news for you folks is biology those changes in our science that engineers had unfair connect to very well
It's replaced "%uh" with a more appropriate "uh..." and it gets certain words correct that it got wrong in the original, but it also concocted brand-new phrases like "certain force in whip", "self-interest use of the muscles in the box creek", and "an artificial do this satanic plan" for "certain force and would bear", "cells and tissues of the muscles and the bones. Okay?" and "an artificial bone to do this. An implant", respectively.
Here's the actual transcript of the video:
Or implants and so forth. And you notice I haven't said anything about biology. Those folks didn't really need to be educated in genetics, biochemistry, molecular biology, cell biology, to solve those problems. And that's because biology, as it used to be, was not a science that engineers could address very well. Because in order for engineers to really analyze, study, quantitatively develop models, and to build technologies, alter the parts, there's a lot of requirements on the science that really biology didn't satisfy. The actual mechanisms of function weren't understood. Yes, you could see that moving your arm required a certain force, and would bear a certain load, but you really didn't know what was going on down in the proteins and cells and tissues of the muscles and the bones. Okay? But still you could design maybe an artificial bone to do this. An implant. You didn't really know the molecular components, so how in the world could you actually manipulate the system, if you didn't even know what the molecules were, that are really underlying this? Okay? You couldn't really do the chemistry on the biological molecules. It's very hard to quantify, since if you didn't even know the parts and the mechanisms, how could you get quantitative measurements for them, develop models? So there's good reason why there never really was a biological engineering until very recently, because biology wasn't a science that was really suited for engineering analysis and engineering synthesis, and so therefore the world of biomedical engineering mainly involved all these application problems that I just talked about, that didn't necessarily require biology per se. But that's changed. Okay? The good news for you folks is biology has changed. It's now a science that engineers can in fact connect to very well.
If you like, go read my original post on the difference between technical accuracy and semantic accuracy. In that post, I determined that, counting only words that the autotranscription got wrong or omitted (not penalizing for extra words added, unlike on steno certification exams), the technical accuracy rate of the autotranscription was 71.24% (213/299 words correct). Two years and much supposed improvement later, the new transcription's technical accuracy rate is... Drum roll, please...
78.60% (235/299 words correct)
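For anyone who wants to try this kind of count on their own transcripts, here's a rough sketch of the scoring rule described above: tally how many words of the real transcript show up, in order, in the machine transcript, and ignore any extra words the machine added. To be clear, this is not how the figures in this post were produced; the function names (`normalize`, `technical_accuracy`) are made up for illustration, and Python's `difflib` alignment is only a heuristic, so its tally can differ by a few words from a careful count by hand.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def technical_accuracy(reference, hypothesis):
    """Return (correct, total) for a machine transcript.

    correct = reference words that the hypothesis reproduces in order
    total   = number of words in the reference transcript
    Substituted and omitted words count as errors; extra words that
    appear only in the hypothesis are NOT penalized.
    """
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    # autojunk=False keeps difflib from discarding very common words
    # on long transcripts (an assumption about what we want here).
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct, len(ref)


if __name__ == "__main__":
    # One short phrase from the transcripts above, just as a demonstration.
    actual = "a certain force, and would bear a certain load"
    auto = "certain force in whip where a certain load"
    correct, total = technical_accuracy(actual, auto)
    # 5 of 9 reference words matched
    print(f"{correct}/{total} words correct ({100 * correct / total:.2f}%)")
```

Because added words are free under this rule, a transcript stuffed with extra garbage can still score well, which is one reason technical accuracy alone flatters the machine.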
Now, I think it's important to point out that this video is essentially ideal in practically every respect for an autocaptioning engine.
* The rate of speech is quite slow. Speech engines tend to fail when the rate gets above 180 WPM or so.
* The speaker has excellent diction. Mumbling and swallowed syllables can wreak havoc with a speech engine.
* The speaker has an American accent. All the most advanced speech engines are calibrated to American accents, because they're all produced by American companies. There are programs that claim to understand various dialects of non-American-accented English (e.g. Scottish, English, Australian), but they're still many generations behind the cutting edge, because they got such a late start in development.
* The speaker is male. Speech engines have a harder time understanding female voices than male ones.
* The speaker is using a high number of fairly long but not excessively uncommon words. Speech engines are better at understanding long words (like "synthesis", "artificial", or "biochemistry") than short ones (like "would" or "weren't"), because long words are phonologically more distinct from one another.
* The sound quality is excellent and there is no background noise or music in the video. Humans are able to listen through noise and pick out meaning from cacophony to a degree completely unmatched by any computer software. Even a speech engine that's performing quite well will fall completely to pieces when it's forced to listen through a small amount of static or a quiet instrumental soundtrack.
So if even a video like this can only attain a 78% technical accuracy rating, after two years of high-powered development on a captioning engine produced by one of the most technologically advanced companies in the world... Are you worried that it's going to supplant my 99.67% accuracy rating in another two years? Or ten years? Or twenty? And that's just talking about the technical accuracy; I haven't even begun to get into the semantic accuracy. I'll have more to say on this subject in the next installment.