Monday, May 14, 2012

Speech Recognition Part I

CART Problem Solving Series

Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV

CART PROBLEM: People claim that someday very soon human captioners will be replaced by automated speech recognition engines.

I've got a lot to say on this subject, but it's already late, so I'm going to leave most of the heavy-duty analysis for next week's post. For now, I just want to show you a few examples. I first posted this video in 2010:



It's actual classroom audio from MIT's OpenCourseWare project. The video is of me captioning it live, using Eclipse. After posting it, I wrote a post on this blog comparing my CART accuracy with the accuracy of YouTube's autocaptions, which had just been released at that time, with promises of increasing accuracy as time went on.

Here's the transcript of the original autocaptions from 2010:

for implants support and you know as I haven't said anything about biology those folks didn't really need to be educated and genetics biochemistry more about it so about the to solve those problems and that's because biology as it used to be was not a science that engineers could addressed very well because in order for engineers really analyze study quantitatively develop models at the bill technologies all for the parts there's a lot of requirements on the science that really biology that that's why %um %um the actual mechanisms a function work understood yes you could see that moving your arm requires certain force in of would where certain load we really didn't know what was going on down in the proteins and cells and tissues of the muscles on the books okay but still you could decide maybe an artificial %uh to do this %um %uh in the plan you really know the molecular compliments so how the world he actually manipulate the system he didn't even know what the molecules work they're really underlying yes you couldn't really do the chemistry on the biological all %uh it's very hard to quantify you can even though the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why they're never really was a biological engineering until very recently while he wasn't the science that was released soon right here in dallas so there for the world biomedical engineering bailey although the deposition prompted it just talked about that that necessarily require miles per se but that's changed the good news for you folks is biology this change it's now a science that engineers had it been that to very well

Here's the updated transcript from the new autocaptions produced by YouTube's updated and improved speech recognition engine, circa 2012:

for implants and so forth and you know as i haven't said anything about biology those folks didn't really need to be educated in genetics biochemistry molecular body cell bowed to solve those problem and that's because biology as it used to be was not a science that engineers could address very well because in order for engineers really analyze study quantitatively develop models at the bill technologies alter the parts there's a lot of requirements on the science that really biology that satisfy uh... the actual mechanisms a function work understood yes you can see that moving your arm requires certain force in whip where a certain load we really didn't know what was going on down in the proteins and self-interest use of the muscles in the box creek but still you could decide maybe an artificial do this satanic plan you really know the molecular compliments so how the world to be actually manipulate the system to continue to know what the molecules were that are really underlying s thank you could really do the chemistry and biological molecule uh... it's very hard to quantify since if you need to know the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why there never really was a biological engineering until very recently has filed he wasn't a science that was really suited wrench in your analysis synthesis so there for the world biomedical engineering mainly involved all these application props that i've just talked about but that necessarily require biology per se but that's changed good news for you folks is biology those changes in our science that engineers had unfair connect to very well

It's replaced "%uh" with a more appropriate "uh..." and it gets certain words correct that it got wrong in the original, but it also concocted brand-new phrases like "certain force in whip", "self-interest use of the muscles in the box creek", and "an artificial do this satanic plan" for "certain force, and would bear", "cells and tissues of the muscles and the bones. Okay?" and "an artificial bone to do this. An implant".

Here's the actual transcript of the video:

Or implants and so forth. And you notice I haven't said anything about biology. Those folks didn't really need to be educated in genetics, biochemistry, molecular biology, cell biology, to solve those problems. And that's because biology, as it used to be, was not a science that engineers could address very well. Because in order for engineers to really analyze, study, quantitatively develop models, and to build technologies, alter the parts, there's a lot of requirements on the science that really biology didn't satisfy. The actual mechanisms of function weren't understood. Yes, you could see that moving your arm required a certain force, and would bear a certain load, but you really didn't know what was going on down in the proteins and cells and tissues of the muscles and the bones. Okay? But still you could design maybe an artificial bone to do this. An implant. You didn't really know the molecular components, so how in the world could you actually manipulate the system, if you didn't even know what the molecules were, that are really underlying this? Okay? You couldn't really do the chemistry on the biological molecules. It's very hard to quantify, since if you didn't even know the parts and the mechanisms, how could you get quantitative measurements for them, develop models? So there's good reason why there never really was a biological engineering until very recently, because biology wasn't a science that was really suited for engineering analysis and engineering synthesis, and so therefore the world of biomedical engineering mainly involved all these application problems that I just talked about, that didn't necessarily require biology per se. But that's changed. Okay? The good news for you folks is biology has changed. It's now a science that engineers can in fact connect to very well.

If you like, go read my original post on the difference between technical accuracy and semantic accuracy. In that post, I determined that, counting only words that the autotranscription got wrong or omitted (not penalizing for extra words added, unlike on steno certification exams), the technical accuracy rate of the autotranscription was 71.24% (213/299 words correct). Two years and much supposed improvement later, the new transcription's technical accuracy rate is... Drum roll, please...

78.60% (235/299 words correct)
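For anyone who wants to try this kind of count on their own transcripts, here is a minimal sketch in Python. It is not necessarily how the figures above were tallied; it is just one rough way to automate the same idea. The function names `words` and `technical_accuracy` are made up for this illustration, and the alignment relies on the standard library's `difflib.SequenceMatcher`, which finds matching runs of words but does not guarantee the best possible alignment. Words in the actual transcript that the autocaptions matched in order count as correct; wrong or missing words count against the score; extra words the engine added are ignored.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def technical_accuracy(reference, hypothesis):
    """Return (correct, total) for a transcript pair.

    `reference` is the actual transcript and `hypothesis` is the
    autocaption output. A reference word counts as correct when the
    aligner matches it, in order, to the same word in the hypothesis.
    Extra words in the hypothesis are not penalized.
    """
    ref = words(reference)
    hyp = words(hypothesis)
    # autojunk=False stops difflib from ignoring very common words
    # in long transcripts.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct, len(ref)


# One sentence pair taken from the transcripts in this post.
reference = "But still you could design maybe an artificial bone to do this. An implant."
hypothesis = "but still you could decide maybe an artificial do this satanic plan"

correct, total = technical_accuracy(reference, hypothesis)
print(f"{correct}/{total} words correct = {correct / total:.2%}")
```

On this one sentence pair, the sketch counts 9 of the 14 words in the actual transcript as correct. An automatic alignment like this can disagree with a careful human count, especially where the engine's output drifts far from what was said, so treat its numbers as approximate.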

Now, I think it's important to point out that this video is essentially ideal in practically every respect for an autocaptioning engine.

* The rate of speech is quite slow. Speech engines tend to fail when the rate gets above 180 WPM or so.

* The speaker has excellent diction. Mumbling and swallowed syllables can wreak havoc with a speech engine.

* The speaker has an American accent. All the most advanced speech engines are calibrated to American accents, because they're all produced by American companies. There are programs that claim to understand various dialects of non-American-accented English (e.g. Scottish, English, Australian), but they're still many generations behind the cutting edge, because they got such a late start in development.

* The speaker is male. Speech engines have a harder time understanding female voices than male ones.

* The speaker is using a high number of fairly long but not excessively uncommon words. Speech engines are better at understanding long words (like "synthesis", "artificial", or "biochemistry") than short ones (like "would" or "weren't"), because they're phonologically more distinct from one another.

* The sound quality is excellent and there is no background noise or music in the video. Humans are able to listen through noise and pick out meaning from cacophony to a degree completely unmatched by any computer software. Even a speech engine that's performing quite well will fall completely to pieces when it's forced to listen through a small amount of static or a quiet instrumental soundtrack.

So if even a video like this can only attain a 78% technical accuracy rating, after two years of high-powered development from the captioning engine produced by one of the most technologically advanced companies in the world... Are you worried that it's going to supplant my 99.67% accuracy rating in another two years? Or ten years? Or 20? And that's just talking about the technical accuracy; I haven't even begun to get into the semantic accuracy. I'll have more to say on this subject in the next installment.

4 comments:

  1. Hello Mirabai Knight:
    Thank you for your postings regarding captioning. I am pursuing this position for personal and financial reasons, as a career change. I have been conducting research and finding out as much as I can about this exciting industry and the various types of captioning that are needed in today's society. It is fascinating.
    Thank you again and please feel free to send me any other supportive links or information that you feel would be beneficial for someone like me just starting out in this field (though I do have an aptitude for it. Also my previous employment endeavors will prove to be helpful too).
    Thank you again for sharing and Shine On.

    Kay R from Michigan
    ladyofspirit@gmail.com

  2. Totally agree with everything said here. Looking at YouTube's auto-captions, they fail for a number of reasons -- at 140 wpm, things get very shaky with YouTube (using quite possibly the best auto-captioning I have ever seen), if the speech is unclear, errors-aplenty. With maskers, they have a ceiling rate, as well. I look at our hits (from 175 to 200+ easy) and the multitude of accents we deal with at a college level, and nothing beats us tappity-tappers (grin). I am an official at the Sacramento Courthouse (that is, official American Sign Language interpreter). I pulled out the stenomachine some 20 years ago, when a boss saw I had gone to court reporting school - was in my 200's when I switched careers (before realtime) to interpret (ASL, a language I was first exposed to as a child); I work 32 hours at the courthouse as an interpreter and for a handful of universities and colleges online (remote) and in person. I so much enjoy what I am doing in my CART career, and see that the career will expand greatly in the next several years. Thanks Mirabai for all that you have done to bring about an interest in CART as a viable career for many people. In California, the CCRA is encouraging schools to have a separate track for students interested in CART, and we have so far developed a Certified CART Generalist certificate, working on a Certified CART Specialist exam and certificate.

    -- Mark Crossley, RID/CSC&SC:L, NAD/MstrV, CCRA/CCG
    Sacramento, California
    signlaw@gmail.com

  3. Good on ya, Mark! I think you're one of only two combination stenocaptioners/ASL interpreters that I'm aware of in the US (the other is Alan Peacock). Very cool!
