Tuesday, September 14, 2010

Technical Accuracy and Semantic Accuracy

It's been a few days since I posted the video of my CART demo. If you've seen it, its subject matter has probably gotten fairly hazy in your mind by this point. If you haven't seen it, don't watch it yet. For the record, the audio used in this excerpt was of very good quality: a single speaker with a standard American accent and clear diction, speaking at a rate of approximately 160 words per minute. First, read this transcript, created by YouTube's Autocaptioning software.

"for implants support and you know as I haven't said anything about biology those folks didn't really need to be educated and genetics biochemistry more about it so about the to solve those problems and that's because biology as it used to be was not a science that engineers could addressed very well because in order for engineers really analyze study quantitatively develop models at the bill technologies all for the parts there's a lot of requirements on the science that really biology that that's why %um %um the actual mechanisms a function work understood yes you could see that moving your arm requires certain force in of would where certain load we really didn't know what was going on down in the proteins and cells and tissues of the muscles on the books okay but still you could decide maybe an artificial %uh to do this %um %uh in the plan you really know the molecular compliments so how the world he actually manipulate the system he didn't even know what the molecules work they're really underlying yes you couldn't really do the chemistry on the biological all %uh it's very hard to quantify you can even though the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why they're never really was a biological engineering until very recently while he wasn't the science that was released soon right here in dallas so there for the world biomedical engineering bailey although the deposition prompted it just talked about that that necessarily require miles per se but that's changed the good news for you folks is biology this change it's now a science that engineers had it been that to very well"

Okay. Now, based on the above, I'm sure you were able to get the very general gist of the subject the professor was talking about, but if you were a student, sitting in class at an expensive, prestigious private institute of technology, and the paragraph above was the only access you were given to what the professor was saying, how would you feel about it? What if I told you that "at the bill technologies" was actually "and to build technologies", "muscles on the books" was actually "muscles and the bones", "the world biomedical engineering bailey although the deposition prompted" was actually "the world of biomedical engineering mainly involved all these application problems", and "released soon right here in dallas" was actually "really suited for engineering analysis and engineering synthesis"?

Here's the actual transcript of the excerpt:

"Or implants and so forth. And you notice I haven't said anything about biology. Those folks didn't really need to be educated in genetics, biochemistry, molecular biology, cell biology, to solve those problems. And that's because biology, as it used to be, was not a science that engineers could address very well. Because in order for engineers to really analyze, study, quantitatively develop models, and to build technologies, alter the parts, there's a lot of requirements on the science that really biology didn't satisfy. The actual mechanisms of function weren't understood. Yes, you could see that moving your arm required a certain force, and would bear a certain load, but you really didn't know what was going on down in the proteins and cells and tissues of the muscles and the bones. Okay? But still you could design maybe an artificial bone to do this. An implant. You didn't really know the molecular components, so how in the world could you actually manipulate the system, if you didn't even know what the molecules were, that are really underlying this? Okay? You couldn't really do the chemistry on the biological molecules. It's very hard to quantify, since if you didn't even know the parts and the mechanisms, how could you get quantitative measurements for them, develop models? So there's good reason why there never really was a biological engineering until very recently, because biology wasn't a science that was really suited for engineering analysis and engineering synthesis, and so therefore the world of biomedical engineering mainly involved all these application problems that I just talked about, that didn't necessarily require biology per se. But that's changed. Okay? The good news for you folks is biology has changed. It's now a science that engineers can in fact connect to very well."

My unedited realtime CART output had one error (I wrote "that didn't really require biology" instead of "that didn't necessarily require biology"), giving me an accuracy rating of 99.67%. I would argue that it gave me a semantic accuracy rating of 100%, since "necessarily" and "really" are more or less synonyms. YouTube's autocaptions, graded only on the words they got right (not penalizing them for extra wrong words added), got 213 out of 299 words correct, for an accuracy rating of 71.24%. The big question is: What's their semantic accuracy rating?
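For anyone who wants to check the arithmetic, here is a minimal sketch of how those two percentages fall out of the word counts. The function name is just illustrative, and the only inputs are the figures quoted in this post (299 reference words, one CART error, 213 correct autocaption words).

```python
def accuracy(correct_words: int, total_words: int) -> float:
    """Return the percentage of reference words transcribed correctly."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * correct_words / total_words


TOTAL_WORDS = 299  # length of the reference transcript, as counted in the post

cart = accuracy(TOTAL_WORDS - 1, TOTAL_WORDS)  # one wrong word out of 299
auto = accuracy(213, TOTAL_WORDS)              # 213 words correct out of 299

print(f"CART:         {cart:.2f}%")  # 99.67%
print(f"Autocaptions: {auto:.2f}%")  # 71.24%
```

Note that this counts only matched words, so it says nothing about extra words the software inserted, and nothing at all about meaning.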

If all you had to go on was the autocaptions, how useful would you find them, and how much meaning could you extract from the 71% that was correct? Keep in mind that you wouldn't have any external guidance as to which parts were correct and which parts were erroneous. Would you rate this transcript as "worthless", "better than nothing", "pretty good", or "quite useful"? Do you feel that a 71% technical accuracy rating translates to 71% of meaning transmitted and understood? Or do you feel that the scattered and jumbled effect of the machine translation interferes with understanding more severely than the 29% error rate would suggest?

A few caveats: I'm not addressing voice writing here. Independent machine translation and operation of voice recognition software by purposefully dictating or respeaking humans are two very different things. I also recognize that YouTube's autocaptions are not as advanced as those produced by other speaker-independent VR software out there. The main point of this post is to look at the difference between technical accuracy and semantic accuracy. I'd like to do it again sometime with automated software that boasts a technical rating of 90% or more; then I think the difference between technical and semantic accuracy would stand out in even starker relief. Remember, 90% accuracy means one word out of every ten is incorrect. But this is the technology I've got available at the moment, so I welcome the input of everyone reading this blog. How useful do you find transcripts like this? Can you put a percentage on it? I'm really looking forward to reading the comments, so I hope a lot of people weigh in.
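For readers who want to score the technical side on their own transcripts, the sketch below computes word error rate from a word-level edit distance. Unlike the 71.24% figure earlier in this post, it also penalizes extra inserted words. It is a rough illustration under stated assumptions, not the method YouTube or any captioning vendor uses: the function name is hypothetical, words are compared case-insensitively, and punctuation is not stripped.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of the phrase pairs quoted earlier in this post:
print(word_error_rate("muscles and the bones", "muscles on the books"))  # 0.5
```

Two of the four words are wrong, so this phrase scores 50%, yet the number gives no hint of how badly "on the books" misleads a reader. That gap is the difference between technical and semantic accuracy.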

Comments:

  1. This is an awesome blog. I am a closed captioner and was thinking about providing CART services because of the cut in captioning rates. I stumbled upon your blog from LinkedIn.

    I don't think the public realizes how poor a translation rate is at 90%. Anything below 98% is hardly usable, and a good captioner/CART provider should provide at 99% and above consistently.

    If a student is paying the huge tuition fee it costs to attend college, as the parent footing that bill, I would be very upset to think this is what my hearing-impaired child had to use. I know this is a YouTube translation, but really, it's just not that uncommon to see poor captions. I almost hate to tell anyone I closed caption for television because I know I am going to get the usual comment, "Boy, you should have seen those horrible captions on the (fill in blank) show the other night."

    It gets old when you consider yourself a professional and yet you find yourself more times than not trying to defend the indefensible. Sort of like the direction our country is headed. But that's a debate for another day.

    Thanks for this post. I will be tagging it on Facebook.

    Educating the public is a good starting point.

  2. Hi, Lorilyn! Thanks for the kind words. I absolutely agree that many people realize what the numbers really mean in terms of translation quality. 98% is far below an acceptable standard, but often people think 90% or 95% are "good enough", when really they're absolutely unreadable and even downright misleading.

    Replies
    1. Obviously I meant "many people don't realize", not "many people realize".

  3. I'm a speech technology researcher in my grad school program, and I found your blog due to the Geek Feminism post about Plover (via Erin McKean).

    I'm actually writing my dissertation (due in a month! so I'm procrastinating here!) on looking at grammatical structure to try to improve the error-metrics that speech recognizers use -- a big part of the problem you're identifying is that the machine transcriptions are graded on word error rate (not a semantic or even a syntactically-oriented measure). This stuff is fascinating to me.

    Your performance, of course, is much much better than the machine's, regardless of how you measure it, but I'm glad to see people who care about the details start to ask questions about the metrics (from the partially-automated perspective, rather than fully-automated).
