Tuesday, May 7, 2013

Thresholds and Tolerance

I'm not a fan of starting a blog post by quoting the definition of the topic in question; it's virtually always just a lazy attempt to co-opt some of the dictionary's presumed authority or credibility and doesn't add anything of substance to the author's argument. That said...

"Tolerance is the permissible limit or limits of variation in a measured value or physical property of a material, manufactured object, system, or service. [...] A variation beyond the tolerance [...] is said to be non-compliant, rejected, or exceeding the tolerance."

I'm quoting this definition because it refers to a specific technical meaning of an otherwise well-known word. Most people aren't familiar with the word "tolerance" used in this sense, but it's a useful concept not just in mechanical engineering but also in the provision of transcription services for Deaf and hard of hearing students and professionals. In my CART Problem Solving series, I addressed the popular misconception that a tolerance of 90% accuracy is acceptable. That misconception persists because most people think of 90 and 100 as rather large numbers that are pretty much equivalent to each other, even though language is such a fine-grained system that 100 words make up only about a paragraph of text, and a 90% accuracy rate (one wrong word in every ten) works out to an error in just about every sentence. I also talked about the ways in which human captioners are able to use lateral context clues to fill in the gaps left by non-ideal audio conditions, while automated speech recognition systems, once they're outside the territory of a perfectly amplified, perfectly enunciated standard American accent, go from almost adequate to laughably awful perilously quickly.
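To put numbers on the accuracy point, here is a back-of-the-envelope sketch in Python. The 15-word average sentence length is an illustrative assumption, not a measured figure, and `errors_in_span` is a hypothetical helper name.

```python
def errors_in_span(accuracy: float, word_count: float) -> float:
    """Expected number of wrong words in a span of text at a given word accuracy."""
    return (1.0 - accuracy) * word_count


# 90% accuracy over a 100-word paragraph.
errors_per_paragraph = errors_in_span(0.90, 100)
# Assumption: a typical sentence runs about 15 words (illustrative only).
errors_per_sentence = errors_in_span(0.90, 15)

print(f"about {errors_per_paragraph:.0f} wrong words per 100-word paragraph")
print(f"about {errors_per_sentence:.1f} wrong words per 15-word sentence")
```

That works out to roughly ten wrong words per paragraph and about one and a half per sentence, which is why "only" 10% adds up so fast.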

Tolerance enters the captioning sphere in other cases as well. Speed, for instance: if a professor's average rate of speech is 160 words per minute (quite a bit below the typical rate of speech, which tends to be between 180 and 220 WPM), a stenocaptioner (AKA a CART provider like me) with a speed of 240 words per minute will be able to achieve virtually 100% accuracy, because there's enough speed in reserve to catch and correct any errors immediately. A text expansion provider (using a system such as C-Print or TypeWell) may have a speed of 140 words per minute or so, which means that if the professor's rate stays completely steady all the way through, they will probably be able to capture a good 85% of what's spoken. Since they're human and not just a mindless speech recognition system, they will give preference to writing down important things (names, technical terms, relationships between concepts), and will try to make sure that the remaining 15% of speech that they're too slow to capture consists mainly of "Um", "Uh", "You know", repeated words, irrelevant asides, and inefficient phrasing that can be tightened up and paraphrased to use fewer keystrokes. In some cases, that will be enough: the professor's speed never rises above 160 WPM during the entire class, and there's plenty of chaff to ignore, leaving enough time to take down the important content even though the provider's writing speed is lower than the professor's average rate of speech. By contrast, the stenocaptioner will probably choose to leave out the "Um", "Uh", and "You know" sorts of filler words for clarity's sake, but will not omit repeated words or attempt to paraphrase the professor's wording, no matter how inefficient it might be.
Stenocaptioners are focused on providing a verbatim realtime stream, only omitting words that add absolutely no value to understanding, while text expansion providers are focused on tightening up whatever they hear so that it can be written in as few keystrokes as possible. So far, so good. This is a case where stenocaptioning and text expansion are more or less equivalent, and the difference lies mostly in whether the client wants the pure, unmediated words of their professors to interpret for themselves, or whether they'd rather have a condensed version of the information delivered in class, more along the lines of the bullet points on a PowerPoint slide.
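A rough way to see the steady-rate case is to treat each writer's top speed as a hard ceiling and divide it by the speaker's rate. This is a simplified sketch, not a model anyone uses for scheduling: it ignores pauses, lag, and how skillfully a provider condenses, and `verbatim_capture` is a hypothetical name.

```python
def verbatim_capture(speaker_wpm: float, writer_wpm: float) -> float:
    """Upper bound on the fraction of spoken words a writer can take down
    verbatim when the speaker talks at a steady rate."""
    return min(1.0, writer_wpm / speaker_wpm)


professor_wpm = 160
for label, writer_wpm in [("stenocaptioner", 240), ("text expansion", 140)]:
    share = verbatim_capture(professor_wpm, writer_wpm)
    print(f"{label}: {share:.1%} of words, {1 - share:.1%} to cut or condense")
```

At 140 WPM against a steady 160 WPM speaker, the ratio comes out to 87.5%, in line with the "good 85%" figure; the stenocaptioner's ratio caps at 100%, and the 80 WPM of spare speed is what leaves room to fix errors on the fly.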

Change any of the factors in play, and the results will be very different. For instance, say the professor's average rate of speech is still 160 words per minute, but that's because his rate is 135 WPM when he's writing formulas on the board (about half the class) and 185 WPM when he's explaining what the formulas mean (the other half of the class). Or it's 140 WPM for long stretches at a time, when he's lecturing on the information mandated by the syllabus, but it shoots up to 200 WPM for brief moments when he gets excited about a particular detail of whatever he's talking about. The stenocaptioner, whose top speed is 240 WPM, is still able to get 100% in all of these situations. The text expansion provider, on the other hand, will be able to handle the 135 WPM formula sections almost perfectly, but will start cutting or condensing words and phrases in the 185 WPM sections, and will be forced to leave out over a quarter of the material in the 200 WPM sections. If this particular professor has a tendency to repeat words, insert lots of filler words, pause between sentences to take a drink of water, or otherwise speak in a lightweight, inefficient way, the text expansion provider might be able to deliver a workable portion of the class's important material, because there will be enough less important material to cut while still leaving enough reserve speed to write down the good parts.
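The same ceiling idea can be applied segment by segment to show why an average of 160 WPM hides the trouble spots. This is a hedged sketch: the 50-minute class length and the `segment_capture` helper are hypothetical, and it assumes the writer can't bank spare time from slow stretches to catch up on fast ones.

```python
def segment_capture(segments, writer_wpm):
    """Per-segment and overall verbatim capture for a lecture made of
    (minutes, speaker_wpm) segments, assuming the writer never exceeds
    writer_wpm and cannot carry spare time between segments."""
    spoken = captured = 0.0
    per_segment = []
    for minutes, wpm in segments:
        written = min(wpm, writer_wpm) * minutes
        per_segment.append((wpm, written / (wpm * minutes)))
        spoken += wpm * minutes
        captured += written
    return per_segment, captured / spoken


# Hypothetical 50-minute class: half formulas at 135 WPM, half explanation at 185 WPM.
lecture = [(25, 135), (25, 185)]
for writer_wpm in (240, 140):
    per_segment, overall = segment_capture(lecture, writer_wpm)
    detail = ", ".join(f"{wpm} WPM: {share:.0%}" for wpm, share in per_segment)
    print(f"writer at {writer_wpm} WPM -> {detail}; overall {overall:.1%}")
```

With a 140 WPM writer, the 135 WPM formula stretches come through whole, the 185 WPM explanations drop to about 76%, and the class as a whole lands near 86%. A 200 WPM burst works out to 140/200 = 70%, i.e. 30% lost. Real writers can lag a few seconds behind and catch up, so very short bursts may cost somewhat less than this model suggests.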

If, on the other hand, the professor is an accomplished speaker who says precisely what she means in precisely the way she means it, if her lectures are a constant stream of dense technical jargon and precise, specific descriptions of how everything fits together, if there's no chaff or filler to cut out and no awkward repetitions to rephrase... the text expansion provider is all at sea. They've got to start cutting important material in favor of leaving in vital material, and that becomes a dangerous guessing game with the grade of the student they're transcribing for at stake. Text expansion services acknowledge this to a certain extent; they tend to say that CART is recommended when the material is technical or highly precise, as in the graduate and professional programs that I specialize in. And admittedly, there are some classes, some subjects, and some professors where a 140 WPM writing speed, as slow as it is when compared to a stenocaptioner's 240 WPM, is enough to deliver most of the important material given in the class.
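How much filler there is to throw away matters as much as raw speed. The sketch below is a simplification that assumes filler can be dropped at no cost and everything else has to be written out in full (real paraphrasing compresses further, which this ignores); `important_capture` and the `filler_share` values are hypothetical.

```python
def important_capture(speaker_wpm: float, writer_wpm: float,
                      filler_share: float) -> float:
    """Fraction of the non-filler content a condensing writer can keep,
    assuming filler (ums, repeats, asides) is dropped at no cost and
    everything else must be written out."""
    essential_wpm = speaker_wpm * (1.0 - filler_share)
    return min(1.0, writer_wpm / essential_wpm)


# Same 185 WPM stretch, three hypothetical levels of filler.
for filler_share in (0.25, 0.10, 0.0):
    share = important_capture(185, 140, filler_share)
    print(f"filler {filler_share:.0%}: keeps {share:.0%} of the substance")
```

For a 185 WPM stretch and a 140 WPM writer, a lecture that is a quarter filler leaves room to keep all of the substance, while a dense lecture with no filler leaves only about 76% of it.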

The question is: how do you tell which situation you're dealing with? If you're a disability director trying to decide between hiring a text expansion provider or a certified CART provider for a given student's schedule of classes, it may seem obvious to choose the former, since text expansion services are cheaper and more widely available. But have you audited the professors in all of the classes in question? Does their average speed always stay under that 160-180 WPM sweet spot? Is there enough extraneous speech to discard and paraphrase without losing important information? Are there ever spikes of higher speeds, and if there are, can you guarantee that none of that high-speed material will appear on the test? Have you checked to make sure that there won't be any guest lecturers or student presentations during the course of the semester? Guest experts, since they're not used to speaking to students, tend to speak at 200 to 220 WPM or higher. One that I transcribed a few years ago spoke at 280 WPM, and I found myself starting to do the same sort of paraphrasing and chaff cutting that my text expansion colleagues do as a matter of course. I think I managed a good 90% to 95% of the relevant material given in that lecture. But I didn't reach that paraphrasing threshold until I encountered a speaker at the high end of the rate-of-speech bell curve; for text expansion providers, it's the starting point. They don't have any speed in reserve, and if there's nothing extraneous to cut out, they start losing important material very quickly. Give them a 280 WPM speaker, and they're losing a full 50% of everything that's spoken.
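The audit questions above boil down to comparing each expected speaker against the writer's top speed. Here is a hedged sketch in that spirit; the speaker labels and the `audit` helper are hypothetical, and the rates are taken from the ranges mentioned in this post.

```python
from __future__ import annotations


def audit(writer_wpm: float, speakers: dict[str, float]) -> list[tuple[str, float, float]]:
    """For each speaker, report reserve speed (writer minus speaker; negative
    means a deficit) and the share of words that can't be written verbatim."""
    report = []
    for name, wpm in speakers.items():
        reserve = writer_wpm - wpm
        lost = max(0.0, 1.0 - writer_wpm / wpm)
        report.append((name, reserve, lost))
    return report


# Hypothetical semester: the regular professor plus the kinds of guests described above.
speakers = {"professor": 160, "guest expert": 220, "fast guest": 280}
for writer_wpm in (240, 140):
    print(f"writer at {writer_wpm} WPM")
    for name, reserve, lost in audit(writer_wpm, speakers):
        print(f"  {name}: reserve {reserve:+.0f} WPM, verbatim loss {lost:.0%}")
```

At 140 WPM, even the 160 WPM professor leaves a deficit, and the 280 WPM guest costs half of every word spoken. At 240 WPM, only the 280 WPM guest produces a verbatim shortfall (about 14%), which is roughly the gap that paraphrasing had to cover in the lecture described above.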

Of course, you could make the argument that most students without hearing loss don't take in 100% of every lecture. They might daydream or nod off, experience a moment of inattention, miss a word or two here or there while skimming through their notes from the class before. Even without getting every word of every lecture, many students do quite well. But where's the cutoff? How many words can you lose and still receive equal access? Which words can you leave out and which must you absolutely leave in? Who do you trust to make that call? It all comes down to tolerance.


  1. I really enjoyed reading this, Mirabai, as I do all of your articles! Thanks for referring us to this piece when @mytypewell and @NCRA were disagreeing over the definition and use of "realtime" (see https://www.facebook.com/TypeWell/posts/910360728994262).