Tuesday, September 14, 2010

Technical Accuracy and Semantic Accuracy

It's been a few days since I posted the video of my CART demo. If you've seen it, its subject matter has probably gotten fairly hazy in your mind by this point. If you haven't seen it, don't watch it yet. For the record, the audio used in this excerpt was of very good quality: a single speaker with a standard American accent and clear diction, speaking at a rate of approximately 160 words per minute. First, read this transcript, created by YouTube's autocaptioning software.

"for implants support and you know as I haven't said anything about biology those folks didn't really need to be educated and genetics biochemistry more about it so about the to solve those problems and that's because biology as it used to be was not a science that engineers could addressed very well because in order for engineers really analyze study quantitatively develop models at the bill technologies all for the parts there's a lot of requirements on the science that really biology that that's why %um %um the actual mechanisms a function work understood yes you could see that moving your arm requires certain force in of would where certain load we really didn't know what was going on down in the proteins and cells and tissues of the muscles on the books okay but still you could decide maybe an artificial %uh to do this %um %uh in the plan you really know the molecular compliments so how the world he actually manipulate the system he didn't even know what the molecules work they're really underlying yes you couldn't really do the chemistry on the biological all %uh it's very hard to quantify you can even though the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why they're never really was a biological engineering until very recently while he wasn't the science that was released soon right here in dallas so there for the world biomedical engineering bailey although the deposition prompted it just talked about that that necessarily require miles per se but that's changed the good news for you folks is biology this change it's now a science that engineers had it been that to very well"

Okay. Now, based on the above, I'm sure you were able to get the very general gist of the subject the professor was talking about, but if you were a student, sitting in class at an expensive, prestigious private institute of technology, and the paragraph above was the only access you were given to what the professor was saying, how would you feel about it? What if I told you that "at the bill technologies" was actually "and to build technologies", "muscles on the books" was actually "muscles and the bones", "the world biomedical engineering bailey although the deposition prompted" was actually "the world of biomedical engineering mainly involved all these application problems", and "released soon right here in dallas" was actually "really suited for engineering analysis and engineering synthesis"?

Here's the actual transcript of the excerpt:

"Or implants and so forth. And you notice I haven't said anything about biology. Those folks didn't really need to be educated in genetics, biochemistry, molecular biology, cell biology, to solve those problems. And that's because biology, as it used to be, was not a science that engineers could address very well. Because in order for engineers to really analyze, study, quantitatively develop models, and to build technologies, alter the parts, there's a lot of requirements on the science that really biology didn't satisfy. The actual mechanisms of function weren't understood. Yes, you could see that moving your arm required a certain force, and would bear a certain load, but you really didn't know what was going on down in the proteins and cells and tissues of the muscles and the bones. Okay? But still you could design maybe an artificial bone to do this. An implant. You didn't really know the molecular components, so how in the world could you actually manipulate the system, if you didn't even know what the molecules were, that are really underlying this? Okay? You couldn't really do the chemistry on the biological molecules. It's very hard to quantify, since if you didn't even know the parts and the mechanisms, how could you get quantitative measurements for them, develop models? So there's good reason why there never really was a biological engineering until very recently, because biology wasn't a science that was really suited for engineering analysis and engineering synthesis, and so therefore the world of biomedical engineering mainly involved all these application problems that I just talked about, that didn't necessarily require biology per se. But that's changed. Okay? The good news for you folks is biology has changed. It's now a science that engineers can in fact connect to very well."

My unedited realtime CART output had one error (I wrote "that didn't really require biology" instead of "that didn't necessarily require biology"), giving me an accuracy rating of 99.67%. I would argue that it gave me a semantic accuracy rating of 100%, since "necessarily" and "really" are more or less synonyms. YouTube's autocaptioning, graded only on the words it got right (with no penalty for the extra wrong words it added), got 213 out of 299 words correct, for an accuracy rating of 71.24%. The big question is: What's its semantic accuracy rating?
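(If you're curious how percentages like these are computed, here's a rough sketch in Python -- not the exact procedure I used to grade these transcripts, and the function name is just for illustration. It aligns the two word sequences and counts how many reference words the hypothesis matched in order.)

    from difflib import SequenceMatcher

    def word_accuracy(reference, hypothesis):
        # Fraction of reference words the hypothesis matched, in order.
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        blocks = SequenceMatcher(a=ref, b=hyp).get_matching_blocks()
        hits = sum(block.size for block in blocks)
        return hits / len(ref)

    # The figures quoted above:
    print("%.2f%%" % (100 * 213 / 299))  # autocaptions: 213 of 299 words = 71.24%
    print("%.2f%%" % (100 * 299 / 300))  # CART: one error in ~300 words = 99.67%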

If all you had to go on was the autocaptions, how useful would you find them, and how much meaning could you extract from the 71% that was correct? Keep in mind that you wouldn't have any external guidance as to which parts were correct and which parts were erroneous. Would you rate this transcript as "worthless", "better than nothing", "pretty good", or "quite useful"? Do you feel that a 71% technical accuracy rating translates to 71% of meaning transmitted and understood? Or do you feel that the scattered and jumbled effect of the machine translation interferes with understanding more severely than the 29% error rate would suggest?

A few caveats: I'm not addressing voice writing here. Independent machine translation and the operation of voice recognition software by a purposefully dictating or respeaking human are two very different things. I also recognize that YouTube's autocaptions are not as advanced as those produced by other speaker-independent voice recognition software out there. The main point of this post is to look at the difference between technical accuracy and semantic accuracy. I'd like to do it again sometime with automated software that boasts a technical rating of 90% or more; then I think the difference between technical and semantic accuracy would stand out in even starker relief. Remember, 90% accuracy means one word out of every ten is incorrect. But this is the technology I've got available at the moment, so I welcome the input of everyone reading this blog. How useful do you find transcripts like this? Can you put a percentage on it? I'm really looking forward to reading the comments, so I hope a lot of people weigh in.

4 comments:

  1. This is an awesome blog. I am a closed captioner and was thinking about providing CART services because of the cut in captioning rates. I stumbled upon your blog via LinkedIn.

    I don't think the public realizes how poor a 90% translation rate really is. Anything below 98% is hardly usable, and a good captioner/CART provider should consistently deliver 99% and above.

    If a student is paying the huge tuition it costs to attend college, then as the parent footing that bill, I would be very upset to think this is what my hearing-impaired child had to use. I know this is a YouTube translation, but really, it's just not that uncommon to see poor captions. I almost hate to tell anyone I closed caption for television, because I know I am going to get the usual comment: "Boy, you should have seen those horrible captions on the (fill in the blank) show the other night."

    It gets old when you consider yourself a professional and yet find yourself, more often than not, trying to defend the indefensible. Sort of like the direction our country is headed. But that's a debate for another day.

    Thanks for this post. I will be tagging it on Facebook.

    Educating the public is a good starting point.

  2. Hi, Lorilyn! Thanks for the kind words. I absolutely agree that many people realize what the numbers really mean in terms of translation quality. 98% is far below an acceptable standard, but often people think 90% or 95% are "good enough", when really they're absolutely unreadable and even downright misleading.

    1. Obviously I meant "many people don't realize", not "many people realize".

  3. I'm a speech technology researcher in my grad school program, and I found your blog due to the Geek Feminism post about Plover (via Erin McKean).

    I'm actually writing my dissertation (due in a month! so I'm procrastinating here!) on using grammatical structure to try to improve the error metrics that speech recognizers use -- a big part of the problem you're identifying is that machine transcriptions are graded on word error rate, not on any semantic or even syntactically oriented measure. This stuff is fascinating to me.
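    (For anyone unfamiliar with the metric: word error rate is just word-level edit distance divided by the length of the reference. A quick sketch in Python -- the function name is mine, and the example words are borrowed from the post above:)

        def word_error_rate(reference, hypothesis):
            # Substitutions + deletions + insertions, divided by reference
            # length. Every error counts the same, no matter how much
            # meaning it destroys.
            ref, hyp = reference.split(), hypothesis.split()
            prev = list(range(len(hyp) + 1))
            for i, r in enumerate(ref, 1):
                curr = [i]
                for j, h in enumerate(hyp, 1):
                    curr.append(min(prev[j] + 1,               # deletion
                                    curr[j - 1] + 1,           # insertion
                                    prev[j - 1] + (r != h)))   # substitution or match
                prev = curr
            return prev[-1] / len(ref)

        print(word_error_rate("muscles and the bones", "muscles on the books"))  # 0.5

    Half the words are wrong, so WER says 50% -- but the metric can't tell you whether the half that survived is enough to recover the meaning.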

    Your performance, of course, is much, much better than the machine's, regardless of how you measure it, but I'm glad to see people who care about the details start to ask questions about the metrics (from the partially automated perspective, rather than the fully automated one).
