I might return to the CART Problem Series in the future, but I think the 18 posts that I've made on the subject will do for now. At least for a little while, I'm going to go back to making posts on various and sundry subjects as they occur to me, rather than just framing them as problems and solutions.
I mentioned in my Communicating Sans Steno post that my dad has had significant hearing loss for as long as I can remember, but since he was largely in denial about it for most of that time, I didn't learn much about how hearing loss worked or what it was like to deal with in daily life. As a result, I'm ashamed to admit that I've been staggeringly insensitive to at least two hard of hearing people I've known over the years.
The first was when I was a teenager. Her parents were friends with my parents, so she and her brother used to come over to my house when we had parties, and sometimes I'd go to theirs. We went to different schools, so we only saw each other a few times a year, but we always had a good time when we got together. I didn't notice that she wore hearing aids until several years after we first met, when she mentioned that she'd won an essay scholarship for teenagers with hearing loss. Like the ignorant blunderer I was, I said, "Wait, you have hearing loss?" For the first time I noticed the aids. "But you wear hearing aids." "Yes," she replied patiently. "So... Why do you qualify for a scholarship if your hearing aids have already fixed the problem?" Like so many people, I'd assumed that if my eyeglasses were able to correct my severe myopia to normal vision, then hearing aids would be able to do the same thing for anything short of total deafness. I had no idea until almost 20 years later that amplification often doesn't improve clarity, that some frequencies can't be usefully amplified at all once the corresponding cochlear hair cells are permanently lost, that hearing is an extremely complex mechanism that doesn't have an easy or complete fix when any of its components malfunction. My misrefracting cornea could be completely compensated for by a piece of light-bending plastic. Even with hearing aids, my friend's hearing loss remained something she needed to reckon with.
I'd never noticed her misunderstanding me or asking me to repeat myself when we talked (the way my dad often did), and I didn't realize that the casual one-on-one conversations we had at parties were totally unlike her situation in the classroom, where she was learning new material, sat several feet away from the teacher (losing any ability to lipread, especially since the teacher faced the board most of the time), and was forced to work twice as hard as her classmates to get the same amount of information through her ears and into her brain. The fact that my friend had managed to do this all her life, getting excellent grades and becoming an extremely literate and eloquent writer, totally blew past me. I took it for granted; instead of congratulating my friend on her essay, I was rude and dismissive. I haven't seen my old friend since high school, but if I ever run into her again, I'll apologize and explain that I know a lot more now than I did then -- not that that's any excuse. If I had actually asked her to tell me more about the scholarship instead of assuming that it made no sense, I would have learned something that day, instead of having to wait 20 years to realize how much of a jerk I'd been.
The second incident is even more problematic, because I was in a position of authority. At my college, all sophomores are required to take a year of music theory, even though their degree (there's only one on offer) is in Liberal Arts. Music classes are led by professional instructors, but there are also weekly practicum classes, where students are supposed to try out what they've learned in small 4-to-5-person groups. Students with musical experience are chosen to lead those groups as work-study assignments, and because I'd played in the pit orchestra of a summer repertory theater, I got to be one of them. My job involved drilling the students in singing simple multipart songs and rounds, helping them to analyze counterpoint examples discussed in class, and answering any questions they had about the stuff they were studying. The emphasis was on getting an intellectual understanding of the music rather than on becoming accomplished performers, so it wasn't a problem that a few students in each practicum were tone deaf. Most of them just hadn't been exposed to much formal music training, and once I gave them a few exercises, their pitch discrimination and singing tended to improve quite a bit.
There was one student, though, who found both the music class and the practicum intensely frustrating. I noticed his hearing aids right away, because he'd decorated the earmolds in bright colors. He was forthcoming about his hearing loss, and explained that he got very little out of all the singing, analysis, and call-and-response pitch practice, because he couldn't hear any of it accurately enough to duplicate. Again, I assumed that the hearing aids should have solved the problem, and didn't understand what his issue was. He wasn't being graded on his accuracy in singing, and he wouldn't be penalized if he wasn't able to appreciate the aesthetic nuances of the songs. All he had to do was understand the mathematics of the music on the page, so that he could speak about it in class. The singing exercises were just intended to help build first-hand experience with hearing and repeating music in realtime. I figured that his hearing loss put him in the category of the "tone deaf" students, and treated him accordingly. I didn't realize that his problem, unlike theirs, wasn't a matter of understanding the notes on an intellectual level, and that, unlike them, he wasn't going to improve with practice. He couldn't hear the difference between pitches no matter how many times they were repeated, so he felt like he was being forced to bang his head against a wall every week in practicum. When he expressed his frustration to me, I thought he was being oversensitive, and just reassured him that it wouldn't affect his grade even if he didn't improve by the end of the semester. I didn't realize the emotional consequences of being asked, every week and in front of your peers, to do something you aren't physically able to do, over and over, and to fail every time. He eventually wound up transferring to another college, and I'm afraid that my inability to understand what he was telling me played into that decision.
Like any good essay writer, I've Googlestalked both of these people as research for this post, and today they're both extremely successful and well-respected in their fields. Obviously my ignorance didn't stop them from doing what they wanted to do. But when you add my ignorance to the ignorance of everyone else they had to deal with, how much more exhausting, frustrating, annoying, infuriating did it make their educational experiences, not to mention other parts of their lives? If I hadn't gone into CART, I never would have realized the mistakes I'd made in trusting my own assumptions instead of listening to their experiences. Now I do, and I'm mortified when I think of the way I behaved. There's no easy solution to this problem. One out of every seven people in this country has some degree of hearing loss, and yet so few people actually understand how it works. It'll take a lot to educate all 312 million people about the 45 million who are Deaf, deafened, or hard of hearing, but it badly needs to be done.
Friday, December 21, 2012
Thursday, December 6, 2012
CART Problem Solving: Speech Recognition Part IV
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
This video is only tangentially relevant to the post; I just found it adorable.
At long last, the final Speech Recognition installment.
CART PROBLEM: Speech recognition is almost always slower and less accurate than stenographic text entry, but there's a strong cultural push to use it, because it's perceived as cheaper and less complicated than hiring a qualified CART provider.
In the previous three posts, I discussed why speech recognition isn't yet capable of providing accurate text when presented with untrained multispeaker audio. I also spoke a bit about why the common assumption that it would only take a bit more development time and processing power to get to 100% accuracy is based on a misunderstanding of how language works and how speech recognition engines try to capture it.
Just because a lizard can play that bug-squishing iPhone game, it doesn't follow that upgrading the lizard to a cat will make it a champion at Dance Dance Revolution. A bigger speech corpus, faster computers, and even a neural-network pattern matching model still don't make up for the essential difference between human and mechanized speech recognition: Humans are able to make use of context and semantic cues; computers are not. Language is full of soundalike words and phrases, and imperfect audio is very much the rule and not the exception in most real-world situations. This means that humans will inevitably have the edge over computers in differentiating ambiguous sound patterns, and the improvements in speech recognition technology will follow an asymptotic trajectory, with each new improvement requiring vastly greater effort to achieve, and the final goal of accurate, fully independent transcription remaining nearly impossible outside of controlled settings with a narrow range of speakers and vocabulary.
But of course there's a huge difference between a professional voice writer and an untrained one, and an even greater difference between any kind of respeaking system and a speaker-independent speech transcription program. Despite widespread public perception, voice writing isn't actually any easier to do than CART; in most circumstances, it's quite a bit harder.
The supposedly short training period is voice writing's major selling point over steno (aside from the cost of equipment), but from what I can tell, that advantage doesn't hold up in practice. You can train someone to a moderate degree of accuracy very quickly; all they have to do is speak into the microphone slowly and clearly, and the software will get a fair number of words correct. For dictation or offline transcription, this can work well, assuming they have the stamina to speak consistently for long periods of time, because they can speak at a slow pace, stop, go back, and correct errors as they make them. Obviously, the closer a person's voice is to the standard paradigm (male, American, baritone), the better results they'll get. Many people with non-standard voices (such as this deaf female blogger) have a heck of a time getting software to understand them, even speaking as slowly and clearly as they can manage. But even for men with American accents, actual live realtime respeaking at CART levels of accuracy (ideally over 99% correct) is much, much harder than dictation.
* Short words are more difficult for the speech engine to recognize than multisyllabic words are, and are more likely to be ignored or mistranscribed.
* If the voice captioner does mostly direct-echo respeaking, meaning that they don't pronounce common words in nonstandard ways, they have to repeat multisyllabic words using the same number of syllables as in the original audio; if they try to "brief" long words by assigning a voice macro that lets them say the word in one syllable, they run up against the software's difficulty in dealing with monosyllabic words that I mentioned above.
* Because they're mostly saying words in the same amount of time as they were originally spoken (unlike in steno, where a multisyllabic word can be represented by a single split-second stroke), they don't have much "reserve speed" to make corrections if the audio is mistranscribed. They also have to verbally insert punctuation and use macros to differentiate between homonyms, which takes time and can be fatiguing.
* Compensating for the lack of reserve speed by speaking the words more quickly than they were originally spoken can be problematic, because the software is better able to transcribe words spoken with clearly delineated pauses between them, as opposed to words that are all run together.
* This means that if the software makes a mistake and the audio is fairly rapid, the voice captioner is forced to choose between taking time to delete the mistake and then catching up by paraphrasing the speaker, or keeping up with the speaker while letting the mistake stand.
* The skill of echoing previously spoken words aloud while listening to a steady stream of incoming words can be quite tricky, especially when the audio quality is less than perfect; unlike simultaneous writing and listening, simultaneous speaking and listening can cause cross-channel interference.
This doesn't even go into the potential changes in a person's voice brought about by fatigue, allergies, colds, or minor day-to-day variations, all of which can wreak havoc with even a well-trained voice engine.
Low or moderate accuracy offline voice writing = short training period; most people can do it.
Low or moderate accuracy realtime voice writing = somewhat longer training period; machine-compatible voice timbre and accent required.
CART-level accuracy realtime voice writing = extremely long training period; an enormous amount of talent and dedication required.
I want to emphasize again that none of this is meant to denigrate the real skill that well-trained voice writers have developed over their years of training. It's just to point out that while voice writer training seems on the surface to be easier and quicker than steno training, that's very seldom the case in practice, as long as appropriate accuracy standards (99% or better) are adhered to. The problem comes in when the people paying for accommodations, either due to a shortage of qualified steno or voice writers, or due to cost considerations, decide that 95% or lower accuracy is "good enough" and that deaf people should be able to "read through the mistakes".
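To put those accuracy standards in concrete terms: at a fairly typical lecture rate of around 180 words per minute (an assumption on my part, but not an unusual one), 95% accuracy works out to roughly nine errors every single minute, while 99% leaves fewer than two. That's the difference between a transcript a student can trust and one they have to constantly second-guess.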
So let's talk about some other potential competitors to CART services. These fall into two general categories: Offline transcription and text expansion. I think I'll leave text expansion for a future series of posts, since it's a fairly complex subject. Offline transcription is much simpler to address.
I've seen several press releases recently from companies bragging about contracts they've secured with universities, claiming to offer verbatim captioning at rock-bottom prices. The catch is that the captioning isn't live. No university or conference organizer I know of is foolhardy enough to set completely automated captions up on a large screen in front of the entire audience for everyone to see. The mistakes made by automated engines are far too frequent and hilarious to get away with. But they will, it seems, let lectures be captured by automated engines, then give the rough transcripts to either in-house editors (mostly graduate students) or employees of the lecture-capturing companies, to produce a clean transcript at speeds that are admittedly somewhat better than they used to be, back when making a transcript or synchronized caption file offline usually involved a qwerty typist starting from scratch.
I'm worried that this is starting to be perceived as an appropriate accommodation for students with hearing loss, because there's a crucial piece missing from the equation: Realtime access. Imagine a lecture hall filled with 250 students at a well-regarded American private university, sitting with laptops and notebooks and audio recorders, facing the PowerPoint screen, ready to learn. It's Monday morning. In walks the professor, who pulls up her slideshow and begins the lecture.
PROFESSOR: Tanulmányait a kolozsvári zenekonzervatóriumban, majd a budapesti Zeneakadémián végezte, Farkas Ferenc, Bárdos Lajos, Járdányi Pál és Veress Sándor tanítványaként. Tanulmányai elvégzése után népzenekutatással foglalkozott. Romániában ösztöndíjasként több száz erdélyi magyar népdalt gyűjtött.
After a few seconds, the students start looking at each other in confusion. They don't speak this language. What's going on? The professor continues speaking in this way for 50 minutes, then steps down from the podium and says, "The English translation of the last hour will be available within 48 hours. Please remember that there is a test on Wednesday."
These students are paying $50,000 or $60,000 a year to attend this school. They're outraged. Not only do they have less than 24 hours to study the transcript before the test, but they were unable to ask questions or to see the slides juxtaposed with the lecture material. Plus they just had to sit there for 50 minutes, bored and confused, without the slightest idea of what was going on. It wouldn't stand. The professor would be forced to conduct future lectures in English rather than Hungarian, or risk losing her job.

This is the state of affairs for deaf and hard of hearing students offered transcripts rather than live captioning. It deprives them of an equal opportunity for learning alongside their peers, and it forces them to waste hours of their lives in classes that they can't hear and therefore can't benefit from. I'm waiting for the day when the first student accommodated in this way sues their school for violating the Americans with Disabilities Act, and at that point the fast-turnaround transcript and captioning companies are going to be in a good deal of trouble.

There is the possibility of training realtime editors who might be able to keep up with the pace of mistakes and correct each error a few seconds after it's made, before the realtime is delivered to the student, but that adds yet another person into the workflow, reducing the savings the university was hoping to get when they laid off their CART providers. In some classes, a relatively untrained editor with a qwerty keyboard will be able to zap the errors and clean up the transcript in realtime, but in others -- where the professor doesn't speak Standard Male American (true for a significant and increasing number of professors in the US college system), or there's too much technical jargon, or the noise of the ventilation system interferes with the microphone, or any of a hundred other reasons -- the rate of errors made by the speech engine will outpace the corrections any human editor can make in realtime.
So what lies ahead? Yes, speech recognition engines will continue to improve. Voice writer training times might decrease somewhat, though fully accurate automated systems will stay out of reach. People don't realize that speech is an analogue system, like handwriting. Computer recognition of the printed word has improved dramatically in the past few decades, and even though transcripts produced via OCR still need to be edited, it's become a very useful technology. Recognition of handwriting has lagged far behind, because the whorls and squiggles of each handwritten letter vary drastically from individual to individual and from day to day. There's too much noise and too little unambiguous signal, apart from the meaning of the words themselves, which allows us to decipher in context whether the grocery list reads "buy toothpaste" or "butter the pasta". Human speech is much more like handwriting than it is like print. Steno allows us to produce clear digital signals that can be interpreted and translated with perfect accuracy by any computer with the appropriate lexicon. Speech is an inextricably analogue input system; there will always be fuzz and flutter.
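To make that digital-versus-analogue contrast concrete, here's a minimal sketch of the steno side of the equation. The dictionary entries and the translate function are invented for illustration (real steno software such as Plover handles multi-stroke outlines, prefixes, suffixes, and much more), but the core point stands: each stroke arrives as a discrete, unambiguous symbol, so translation is an exact lookup rather than a probabilistic guess.

```python
# Illustrative sketch only: a steno stroke is a discrete chord of keys, so
# translating it is a deterministic dictionary lookup, not a statistical guess.
# (These entries are made up for the example, not taken from a real dictionary.)
steno_dict = {
    "KAT": "cat",
    "TKOG": "dog",
    "-T": "the",
}

def translate(strokes):
    # Every chord either matches an entry exactly or is flagged as untranslated;
    # the same input always produces the same output, with no fuzz or flutter.
    return [steno_dict.get(stroke, f"<untranslated: {stroke}>") for stroke in strokes]

print(translate(["-T", "KAT"]))   # ['the', 'cat']
print(translate(["XRBGS"]))       # ['<untranslated: XRBGS>']
```

A speech engine never gets to work with input this clean; it has to guess which symbols were even intended before it can begin to look anything up.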
Monday, June 25, 2012
Sorry for the Radio Silence
Apologies again for the last several weeks of no posts. I had today's blog post all sketched out, but then a situation came up and I don't think I'll be able to actually write it today. I'm currently helping a family member through an ongoing crisis, and it's soaking up a lot of my posting time. Hopefully I'll be able to get back on track soon.
Monday, June 4, 2012
Taking a Mulligan
Hey, guys. I'm really sorry, but Speech Recognition IV is going to have to wait until next week. I've just got too much to do, preparing for my three-hour Steno Crash Course and Plover Programming Sprint at PyGotham this Friday and Saturday. In the meantime, I'll tide you over with a bit of speech recognition schadenfreude:
Gazpacho Soup Day for Siri
Monday, May 28, 2012
CART Problem Solving: Speech Recognition Part III
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
Apologies for the lack of captioning in the first few seconds of the video, but I had to post it. It's a fantastic illustration of not just how often automatic speech recognition gets things wrong, but how wrong it tends to get them. There's a whole series of these Caption Fail videos, but this is the most work-safe (and, in my opinion, funniest) of the lot.
See, because computers aren't able to make up for lossy audio by filling in the gaps using semantic and contextual clues, they make mistakes that a human transcriber would never make in a million years. On the one hand, you get "displaces" instead of "this place is". That's reasonable enough, and out of context a human might have made that mistake. But when a human hears "fwooosssssh" as a man tries to imitate the sound of the ocean with his mouth, the computer continues to try to read it as speech, and translates it as "question of." Not only was it not able to differentiate between words and sound effects, but "fwoooosh" doesn't sound anything like "question of." The algorithms that computers use to match similar sound patterns to each other are so alien to our way of thinking that, unlike mistakes made by humans, we can't even hope to read through them to figure out what the correct version should have been.
I promised you some illustrations to use when trying to explain why accurate speaker-independent automated speech recognition is not "just around the corner", despite the popular conception that it is. I think it's useful to consider your audience when trying to explain these. If you're talking to computer people, bringing in the parallels with OCR might be more effective than if you're talking to people who haven't used that sort of technology. If someone has never heard a beatboxer, my voice-writing-to-steno analogy comparing beatboxing to drumming won't mean much. Try to get an idea of the person's frame of reference first, and then construct your argument.
Belize, the Passport, and the Pixelated Photo
You know those procedural crime shows? Where the first 20 minutes is taken up with chasing seemingly disconnected clues, and then at last one of the detectives has a sudden flash of insight, puts all the pieces together, and knows where to find the suspect? Sometimes those shows do a very misleading thing. They'll get a blurry photo, something captured by a security camera or a helicopter, and the detective will say, "There! Zoom in right there!" Then the screen will shift and you'll get the same photo, except that the formerly blurry house will be nice and clear, and in the window... Is that a man with a gun? The detective will shout, "Zoom in on that flash of light there!" Again, the pixels will smear and redraw themselves, and look! Reflected in the man's glasses is another man in a Panama hat, wearing a mechanic's shirt with "Jerry" embroidered on the pocket and wielding a tire iron!
It's all very exciting, but it's also very wrong. If you take a blurry photograph and blow it up, you don't get a clearer view of its details; you just get a blurrier photograph with larger pixels. This is the essence of lossiness, but in an image rather than a sound. That visual information was lost when the photo was taken, and no amount of enhancement or sharpening tools will ever get it back. Unlike computers, humans are extremely good at inferring ways of filling in the gaps of lossy material, by using lateral clues. If the hard-won blurry photo of the criminal's coffee table just before he set his house on fire depicts a guide book, a passport, and a bottle of SPF 75 suntan lotion, a computer will either throw up its hands and say, "Not enough information found" or it will produce complete gibberish when trying to decipher its details. A human, on the other hand, will see that the letters on the guidebook, while extremely indistinct, seem to have approximately five spaces between them. The first letter is either an R, a P, or a B, and the last one is quite possibly an E. The passport shows that the criminal will be leaving the country, and the suntan lotion indicates that the location gets lots of sun. The savvy human detective paces around a bit and says -- I've got it! Belize! It's the only country that fits the pattern! Then they hop on the next flight and catch the criminal before the final credits.
The humans didn't need to fill in the gaps of the letters on the guide book in order to figure out the word it spelled, because they were able to use a host of non-textual clues to make up for the lossiness. Because computers programmed to recognize text aren't able to draw on all that subsidiary information, humans will always have an advantage in recognizing patterns, drawing inferences, and correcting errors caused by lossy input.
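For the technically inclined, the same point can be shown in a few lines of illustrative code (the "scenes" below are made-up arrays standing in for photos): many different sharp images collapse into the very same blurry capture, so no amount of "enhancement" can tell you which one the camera actually saw.

```python
import numpy as np

# Two different hypothetical 8x8 scenes built from a 4x4 tile and its transpose;
# they are genuinely different images.
tile = np.arange(16, dtype=float).reshape(4, 4)
scene_a = np.tile(tile, (2, 2))
scene_b = np.tile(tile.T, (2, 2))

def low_res_capture(img, factor=4):
    # Average each factor-by-factor block: roughly what a low-resolution
    # sensor records when it can't resolve the fine detail.
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

print(np.array_equal(scene_a, scene_b))                                    # False
print(np.array_equal(low_res_capture(scene_a), low_res_capture(scene_b)))  # True
# Both scenes produce an identical blurry capture. The detail that would have
# distinguished them was never recorded, so no algorithm can bring it back.
```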
Why Hasn't Microphone Technology Improved More in 100 Years?
This leads me to the subject of speech recognition, which is a much thornier problem than text recognition. The answer to the question is simple: It has. Listen to an old Edison wax disc record and compare it to CD-quality sound produced by today's audio engineers, and you can hardly claim that recording technology has been stagnant. But the question behind my question is this: With all this fantastic audio recording technology, why is it still close to impossible to get quality audio in a room full of more than a handful of people? Make sure every speaker passes around a hand mic, or wears a lapel mic, or goes up to talk into the lectern mic, and you're fine. Put the best and most expensive multi-directional microphone on the market in the center of a conference table with half a dozen people sitting around it, and you're sunk. Everyone sounds like they're underwater. The guy adjusting his tie near the microphone sounds like a freight train, while the guy speaking clearly and distinctly at the other end of the table sounds like he's gargling with marbles. Even $4,000 hearing aids have this problem. They're simply not as good as human ears (or, more accurately, the human brain) at filtering out meaningless room noises and selectively enhancing the audio of speakers at a distance. That's why onsite CART is often more accurate by several orders of magnitude than remote CART, no matter how much money is spent on microphones and AV equipment. When the bottleneck of sound input is a microphone, it's limited by its sensitivity, its distance from the speaker, and any interference between the two. That's the problem that still hasn't been solved, over a hundred years since the invention of recorded audio.
Having to transcribe terrible audio, guessing at omissions and listening to a five-second clip of fuzzy sound a dozen times before finally figuring out from context what it's actually about, has been a real lesson in empathy for me. The frustration I feel in my home office, clicking the foot pedal over and over and listening to imperfect audio many times over, is nothing compared to what my hard of hearing clients feel in the course of their lives every day. They don't have a foot pedal to rewind the last few seconds of conversation, and they're not even getting paid to do this endlessly unrewarding detective work. I suppose I should feel even more sorry for the poor computers, who are trying to deal with substandard audio but don't have the luxury of lateral thinking or contextual clues or the ability to differentiate between soundalike phrases semantically. I've often wanted to rent an hour of an audiologist's time and hook up the most popular commercial speech recognition software to their test system. I'd be very interested to see how it did. Of course, it could recognize all the tones perfectly well. It might even be all right at the individual words. But unlike a human with hearing loss, who usually does better at guessing words in the context of sentences than hearing them on their own, I bet you that the software would do considerably less well, and would probably come out with an audiogram that put it in the range of moderate to severe hearing loss, especially if any of the tests were given with simulated noise interference mixed into the audio feed. I could be wrong, of course; I haven't yet gotten a chance to actually do this. But I'd be very interested to find out.
Well, this has run rather longer than I thought it would. I guess I'm going to have to do Speech Recognition Part IV next week. 'Til then!
Monday, May 21, 2012
CART Problem Solving: Speech Recognition Part II
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: People don't understand why accurate automated speech recognition is incredibly hard for a computer to do.
I hear it all the time: "Hey, where can I buy that software you're using? It's so cool! I want it for my cocktail parties!" or "Wow, is your computer the one writing down what I'm saying? How much did that cost?" or "Oh, so you're the caption operator? Hey, what's that weird-looking machine for? Does it hook into the speech recognition software somehow?"
So many people think that completely automated speaker-independent speech recognition is already here. They think it's been here for years. Why? Well, people have had personal computers for several decades now, and even before computers were affordable, they could see people on television -- on Star Trek, most prominently, but in nearly every other science fiction show as well -- telling their computers to do things, asking them questions, and getting cogent, grammatical answers back. Why was this such a common trope in popular culture? Because typing is boring. It's bad television. Much better to turn exposition into a conversation than to force the viewers to read it off a screen. So in fiction, people have made themselves understood to computers by talking for a long, long time. In real life, they never have, and I think it's pretty plausible that they never will.
Don't get me wrong. I'm not denying the utility of voice recognition software for the purposes of dictation. It's very useful stuff, and it's improved the world -- especially the worlds of people with fine motor disabilities -- immeasurably. But the following statement, while true, turns out to be incredibly counter-intuitive to most people:
There is a huge qualitative difference between the voice of someone speaking to a computer and the voice of someone speaking to other humans.
People who have used voice recognition software with any success know that they need to make sure of several things if they want a clean transcript:
1. They need to speak directly into the microphone.
2. They need to articulate each word clearly.
3. They need to be on their guard for errors, so they can stop and correct them as they occur.
4. They need to eliminate any background interference.
5. The software needs to be trained specifically to their voice.
Even so, not everyone can produce speech that can be recognized by voice recognition software no matter how much training they do (see Speech Recognition Part I for more potential stumbling blocks), and they'll also find that if they try to record themselves speaking normally in typical conversation with other people and then feed that recording through the speech engine, their otherwise tolerable accuracy will drop alarmingly. People don't speak to computers the way they speak to other people. If they did, the computers would never have a chance.
Why is this? The answer is so obvious that many people have never thought about it before: Ordinary human speech is an incredibly lossy format, and we only understand each other as well as we do by making use of semantic, contextual, and gestural clues. But because so much of this takes place subconsciously, we never notice that we're filling in any of those gaps. Like the eye's blind spot, our brain smooths over any drops or jagged edges in our hearing by interpolating auxiliary information from our pattern-matching and language centers, and doesn't even tell us that it's done it.
What does it mean to say that human speech is lossy? People don't tend to talk in reverberant sound stages with crisp, clear diction and an agreed-upon common vocabulary. They mumble. They stutter. They trail off at the ends of sentences. They use unfamiliar words or make up neologisms. They turn their heads from one side to the other, greatly altering the pattern of sound that reaches your ears. A fire truck will zoom by beneath the window, drowning out half a sentence. But most of the time, unless it's really bad, you don't even notice. You're paying attention to the content of the speech, not just the sounds. An interesting demonstration of this is to try to listen to a few minutes of people speaking in a language you don't know and to try to pick out a particular word from the otherwise indiscernible flow. If it's several syllables long, or if it's pronounced in an accent similar to your own, you'll probably be able to do it. But if it's just one or two syllables, you'll have a very difficult time, much harder than if you were listening to the same conversation in your own language -- even if the audio quality was much worse than the other conversation, with tons of static interference and distortion -- and you were trying to latch on to a familiar word instead.
Humans can ignore an awful lot of random fluff and noise if they're able to utilize meaning in speech to compensate for errors in the sound of it. Without meaning, they're in the same state as computers: Reduced to approximations and probabilistic guessing.
Computers can't use these semantic clues to steer by, and they won't be able to until they've achieved real, actual artificial intelligence: independent consciousness. It's an open question whether that will ever happen (though I'm putting my money on nope), but it's certainly true that in 50 years of trying to achieve it, computer scientists have made little to no progress. What they have been able to do is to mimic certain abilities that humans have in a way that makes them look as if they're understanding meaning. If you say a word or phrase to a speech recognition engine, it'll be able to sort through vast networks of data, in which connections between sound patterns or words are linked by how common or prominent they are compared to the rest of the network. For example, if you said something that sounded like "wenyu dottanai", it would compare thousands of short speech snippets until it found several matches for sounds that were very close (though never completely identical) to what "wenyu" sounded like in your own individual voice, in your own individual accent. It would probably come up with "when you". "Dottanai", likewise, would go through the same treatment, and the vast majority of similar matches would come up "dot an i"; it's a very common phrase. In most circumstances, it would probably be a pretty good bet.
If you were using this engine to transcribe the optometry interview I transcribed this evening, though, the answer it came up with would be completely wrong. Because this optometrist wasn't talking about dotting an i or crossing a t. He was talking about measuring the optical centers of a patient's vision, which he does by marking dots on a lens over each pupil. It wouldn't be the computer's fault for not getting that; it wouldn't have been paying attention to the conversation. Computers just figure out probabilities for each new chunk of sound. On Google, "dot an eye" gets 11,500 results, compared to 340,000 for "dot an i". Mathematically, it was a pretty safe bet. Semantically, it was completely nonsensical.
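Here's a minimal sketch, in Python, of that purely statistical way of choosing a transcription. It's a deliberately hypothetical toy -- no real speech engine is anywhere near this simple -- and the counts are just the approximate Google result numbers quoted above. Given two candidate spellings for the same sound, it picks whichever one shows up more often, with no idea what the conversation is about.

    # Toy frequency-only disambiguation: choose the candidate phrase with the
    # higher usage count, regardless of what the sentence actually means.
    candidates = {
        "dot an i": 340000,   # approximate Google hit counts quoted above
        "dot an eye": 11500,
    }

    def pick_transcription(options):
        # Turn raw counts into a share of the total, then take the likeliest.
        total = sum(options.values())
        best = max(options, key=options.get)
        return best, options[best] / total

    best, share = pick_transcription(candidates)
    print(best, round(share, 3))
    # Prints: dot an i 0.967 -- mathematically safe, semantically nonsensical
    # for an optometrist marking dots on a lens.

Swap in knowledge that the conversation is about optical centers and pupils and you'd get the right answer, but that's exactly the semantic knowledge the engine doesn't have.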
It can be hard to convince people of this principle, because a lot of times they still want to believe in the Star Trek model of speech recognition rather than the actual real-life one. So I've come up with a few brief anecdotes and illustrations to help get the message across. It's awfully late, though, so I think I'll have to leave those for Speech Recognition Part III. Here's a teaser, if you're interested:
* Belize, the passport, and the pixelated photo.
* Why hasn't microphone technology improved more in 100 years?
* Why do OCR errors still exist in paper-to-digital text conversion?
* Your physics professor decided to do her lecture in Hungarian today, but don't worry; you'll get a printed translation in two days.
* Trusting big screen open captioning to an automated system is a mistake event organizers only make once.
* Determinism versus the black box.
* The Beatboxer model of voice writing.
Wednesday, May 16, 2012
Ergonomic update
I've been thinking about ergonomics a lot since my previous post on the subject. In my onsite work, I only spend a few hours at a time in any one position; classes range from 1 to 3 hours, but there are usually breaks every hour or so. Because I'm working from home this summer, though, I've been spending up to 4 hours at a time at my desk doing CART, and then several more hours doing transcript editing, transcription work, or miscellaneous administrative tasks. Unfortunately, it's made me realize how un-ergonomic my setup really is, and how vital it is not to succumb to the temptation to just plant myself in one place and not move from it until the end of the day. My back and shoulders have been warning me that I'd better mix something up soon, or they're really going to start complaining.
I've helped solve my leg fatigue by using foam blocks as foot rests, because they can be shifted around and rolled from back to front under my feet whenever my legs start tightening up. I can also change their height by resting them on their three different edges, and if I want them even higher, I can stack one on top of the other.
One thing that's helped with the planting problem has been to move from the desk to the couch for transcription work, as I mentioned in my previous post, but also to run off of battery power initially, so that when my laptop's battery dies about 1.5 hours later, I'm forced to get up and go into the office for the charger. It might sound silly, but if I don't create those distractions for myself, I have a tendency not to move until my work is done, which is a habit I need to figure out how to break.
Here's another thing I've done, which seems to help a fair amount during the actual remote CART work itself:
As I've said many times, I adore my split-keyboard setup. The only thing that sometimes bugged me, though, is that my desk chair is a little too deep, so in order to reach the keyboard I have to either lean forward (hard on the back), bolster the seat back with several pillows (they tend to slip around and aren't that comfortable), or tilt the tripod forward and the two halves of the steno machine up, which doesn't quite work, because the main arm of the tripod still tends to get in the way. Yesterday I hit on a new solution: I took the armature from my old Gemini 2 machine, put it on a second tripod, and then unscrewed my Infinity Ergonomic from its own armature, putting one half of it on the original tripod and one half on the new one. This allows me to put one tripod on either side of my desk chair, eliminating interference from the tripod's main arm. It's working quite well so far.
I've got a feeling there's one more piece to the puzzle, though. My current desk chair was $50. I bought it at Staples last year. It's really not ideal; there's no lumbar support, it doesn't go high enough, it wobbles a lot, and it's just generally uncomfortable. Every day I spend in it makes me resent it a little bit more. I'm seriously considering buying a fancier chair, but they can be amazingly expensive. Someone on one of the captioner forums I read recommended this one:
It looks great, doesn't it? Ball jointed lumbar support. Headrest. Tons of adjustable settings. But it's $500. Yikes. Do I really want to spend that much money for a chair? Are there cheaper but still ergonomic alternatives out there? If any of you have recommendations, I'd very much like to hear them.
Monday, May 14, 2012
Speech Recognition Part I
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: People claim that someday very soon human captioners will be replaced by automated speech recognition engines.
I've got a lot to say on this subject, but it's already late, so I'm going to leave most of the heavy duty analysis for next week's post. For now, I just want to show you a few examples. I first posted this video in 2010:
It's actual classroom audio from MIT's OpenCourseWare project. The video is of me captioning it live, using Eclipse. After posting it, I made a post on this blog about my CART accuracy versus the accuracy of YouTube's autocaptions, which at that time had just been released, with promises of increasing accuracy as time went on.
Here's the transcript of the original autocaptions from 2010:
for implants support and you know as I haven't said anything about biology those folks didn't really need to be educated and genetics biochemistry more about it so about the to solve those problems and that's because biology as it used to be was not a science that engineers could addressed very well because in order for engineers really analyze study quantitatively develop models at the bill technologies all for the parts there's a lot of requirements on the science that really biology that that's why %um %um the actual mechanisms a function work understood yes you could see that moving your arm requires certain force in of would where certain load we really didn't know what was going on down in the proteins and cells and tissues of the muscles on the books okay but still you could decide maybe an artificial %uh to do this %um %uh in the plan you really know the molecular compliments so how the world he actually manipulate the system he didn't even know what the molecules work they're really underlying yes you couldn't really do the chemistry on the biological all %uh it's very hard to quantify you can even though the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why they're never really was a biological engineering until very recently while he wasn't the science that was released soon right here in dallas so there for the world biomedical engineering bailey although the deposition prompted it just talked about that that necessarily require miles per se but that's changed the good news for you folks is biology this change it's now a science that engineers had it been that to very well
Here's the updated transcript from the new autocaptions produced by YouTube's updated and improved speech recognition engine, circa 2012:
for implants and so forth and you know as i haven't said anything about biology those folks didn't really need to be educated in genetics biochemistry molecular body cell bowed to solve those problem and that's because biology as it used to be was not a science that engineers could address very well because in order for engineers really analyze study quantitatively develop models at the bill technologies alter the parts there's a lot of requirements on the science that really biology that satisfy uh... the actual mechanisms a function work understood yes you can see that moving your arm requires certain force in whip where a certain load we really didn't know what was going on down in the proteins and self-interest use of the muscles in the box creek but still you could decide maybe an artificial do this satanic plan you really know the molecular compliments so how the world to be actually manipulate the system to continue to know what the molecules were that are really underlying s thank you could really do the chemistry and biological molecule uh... it's very hard to quantify since if you need to know the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why there never really was a biological engineering until very recently has filed he wasn't a science that was really suited wrench in your analysis synthesis so there for the world biomedical engineering mainly involved all these application props that i've just talked about but that necessarily require biology per se but that's changed good news for you folks is biology those changes in our science that engineers had unfair connect to very well
It's replaced "%uh" with a more appropriate "uh...", and it gets certain words right that the 2010 version got wrong, but it's also concocted brand-new phrases like "certain force in whip", "self-interest use of the muscles in the box creek", and "an artificial do this satanic plan" for "certain force, and would bear", "cells and tissues of the muscles and the bones. Okay?", and "an artificial bone to do this. An implant".
Here's the actual transcript of the video:
Or implants and so forth. And you notice I haven't said anything about biology. Those folks didn't really need to be educated in genetics, biochemistry, molecular biology, cell biology, to solve those problems. And that's because biology, as it used to be, was not a science that engineers could address very well. Because in order for engineers to really analyze, study, quantitatively develop models, and to build technologies, alter the parts, there's a lot of requirements on the science that really biology didn't satisfy. The actual mechanisms of function weren't understood. Yes, you could see that moving your arm required a certain force, and would bear a certain load, but you really didn't know what was going on down in the proteins and cells and tissues of the muscles and the bones. Okay? But still you could design maybe an artificial bone to do this. An implant. You didn't really know the molecular components, so how in the world could you actually manipulate the system, if you didn't even know what the molecules were, that are really underlying this? Okay? You couldn't really do the chemistry on the biological molecules. It's very hard to quantify, since if you didn't even know the parts and the mechanisms, how could you get quantitative measurements for them, develop models? So there's good reason why there never really was a biological engineering until very recently, because biology wasn't a science that was really suited for engineering analysis and engineering synthesis, and so therefore the world of biomedical engineering mainly involved all these application problems that I just talked about, that didn't necessarily require biology per se. But that's changed. Okay? The good news for you folks is biology has changed. It's now a science that engineers can in fact connect to very well.
If you like, go read my original post on the difference between technical accuracy and semantic accuracy. In that post, I determined that, counting only words that the autotranscription got wrong or omitted (not penalizing for extra words added, unlike on steno certification exams), the technical accuracy rate of the autotranscription was 71.24% (213/299 words correct). Two years and much supposed improvement later, the new transcription's technical accuracy rate is... Drum roll, please...
78.59% (235/299 words correct)
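For anyone who wants to check the arithmetic, here's the whole calculation in a few lines of Python. This is only the bookkeeping -- correct words divided by the 299 words of the reference transcript, with insertions ignored -- not the word-by-word comparison itself, which I did by hand.

    # Technical accuracy = correct words / words in the reference transcript.
    # Extra inserted words aren't penalized, matching the method described above.
    def technical_accuracy(correct_words, reference_words):
        return 100.0 * correct_words / reference_words

    print(technical_accuracy(213, 299))   # 71.237... -> the 71.24% from 2010
    print(technical_accuracy(235, 299))   # 78.595... -> the 78.59% from 2012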
Now, I think it's important to point out that this video is essentially ideal in practically every respect for an autocaptioning engine.
* The rate of speech is quite slow. Speech engines tend to fail when the rate gets above 180 WPM or so.
* The speaker has excellent diction. Mumbling and swallowed syllables can wreak havoc with a speech engine.
* The speaker has an American accent. All the most advanced speech engines are calibrated to American accents, because they're all produced by American companies. There are programs that claim to understand various dialects of non-American-accented English (e.g. Scottish, English, Australian), but they're still many generations behind the cutting edge, because they got such a late start in development.
* The speaker is male. Speech engines have a harder time understanding female voices than male ones.
* The speaker is using a high number of fairly long but not excessively uncommon words. Speech engines are better at understanding long words (like "synthesis", "artificial", or "biochemistry") than short ones (like "would" or "weren't"), because they're phonologically more distinct from one another.
* The sound quality is excellent and there is no background noise or music in the video. Humans are able to listen through noise and pick out meaning from cacophony to a degree completely unmatched by any computer software. Even a speech engine that's performing quite well will fall completely to pieces when it's forced to listen through a small amount of static or a quiet instrumental soundtrack.
So if even a video like this can only attain a 78% technical accuracy rating, after two years of high-powered development from the captioning engine produced by one of the most technologically advanced companies in the world... Are you worried that it's going to supplant my 99.67% accuracy rating in another two years? Or ten years? Or 20? And that's just talking about the technical accuracy; I haven't even begun to get into the semantic accuracy. I'll have more to say on this subject in the next installment.
Monday, May 7, 2012
CART Problem Solving: Ergonomics
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: Repetitive stress injuries can shorten a CART provider's career
Today I bought a 4-foot body pillow from Amazon. Why am I telling you this? What does it have to do with being a CART provider? Well, I've noticed recently that I've developed the habit of falling asleep with my arm underneath my head, and sometimes when I wake up my fingers tingle slightly. When it happened again this morning, I knew I needed to do something about it. If I have a body pillow to hang onto at night, I'll be less tempted to sleep on my arm, and hopefully that'll eliminate the worry that I might eventually start doing damage to the nerves in my arms and fingers overnight.
Ergonomics are no joke. Steno, by and large, is a much more ergonomic technology than qwerty typing (you can read my What Is Steno Good For? post about it), but anyone who does anything with their hands for several consecutive hours a day risks damaging them. When I started steno school, I was on a Stentura 400 SRT. 40 hours of qwerty typing every week for my day job at an offline captioning company, plus 10 hours at school on the Stentura, plus at least 10 to 15 additional hours practicing and doing weekend transcription work meant that my arms were screaming by the end of nearly every day. After a year, I knew I had to make a change, or my career would be over before it began. I bought a Gemini 2, and all the pain vanished in an instant. As soon as I felt a twinge, I'd make a slight adjustment to the angle and the fatigued muscles would get a rest while other muscles kicked in to relieve them. It was magical. I've since had a Revolution Grand and an Infinity Ergonomic (my current machine), and I haven't had any trouble since. I've been able to write up to 7 hours at a stretch without a break, and still the pain hasn't recurred. It's fantastic.
But I'm not here to sell you on the advantages of split-keyboard steno machines (though I'd encourage you to try one if you can: writing with both wrists parallel to the ground is an uncomfortable, unnatural position, but let your right hand tilt just a few degrees to the right and your left hand a few degrees to the left, and feel how much difference it makes to your whole posture -- you might be surprised). I want to list a few things that have helped me make my work life more ergonomic apart from the steno machine. If you've got more ideas, please feel free to write them in the comments. Working through pain out of a misguided macho idea of toughness isn't smart. It can worsen your accuracy and overall endurance, make you feel a subliminal resentment towards your work, and even cut short an otherwise flourishing career. Pay attention to what your body tells you and adjust your environment accordingly. It's never a wasted effort.
* If you use a laptop on your lap, the built-in trackpad is probably fine to use, but if you put it on a desk, consider getting an external mouse and keyboard. Desk heights that are comfortable for the eyes are almost always very bad for the arms, hands, and shoulders. If you find yourself with an aching or knotted-up neck, shoulder, or wrist after using a laptop on a desk, try an external mouse -- either with a mousepad on your lap or on a pull-out keyboard tray significantly lower than the level of the desk. It'll make a huge difference. You don't need an expensive docking station; I stuck an external USB hub to my glass desk with double-sided tape, and it always has a mouse, keyboard, scanner, and foot pedal plugged into it. The hub connects through a single USB lead, which I plug into the back of my computer whenever I sit down at my desk. It's much easier than manually connecting a mouse and keyboard each time.
* If you can, mix up your working positions. After a four-hour remote CART job at your desk, take your transcript editing to the couch, a comfortable chair, on the floor leaning against a wall, or even in bed. The more different positions you put yourself into over the course of the day, the less likely you are to freeze into any given one of them.
* If you use a backpack to carry your gear, like I do, always make sure to get one with chest and belly straps, and don't forget to buckle them each time you wear the bag. Otherwise your shoulders carry the lion's share of the weight, and they won't thank you for it that evening.
* This is probably only helpful for transcriptionists, but I've found since I started using Plover that my legs are less sore after a long session of transcription work, because I don't have to keep them poised to press the foot pedal whenever I need to rewind a section of audio; Plover allows me to send the rewind command right from the steno keyboard. Even if you don't have Plover, you can get a Kinesis Savant Elite single pedal rather than one of the big three-button floor pedals. It's lightweight enough that you can hold it under your armpit while writing rather than having to click it with your foot all the time. Don't laugh! I used it as an armpit pedal for a good two years, and it saved me from a lot of unnecessary leg pain.
* If you're CARTing a high speed job and your hands start cramping or getting sore, take advantage of the first available break to shake them out, roll your wrists back and forth, and then squeeze and release your fingers several times in a row. This helps to relax the muscles, restore blood flow, and subtly indicate to the person you're captioning that it might be nice if they slowed their pace just a touch. Not everyone picks up on it, but sometimes people get the hint.
Monday, April 30, 2012
CART Problem Solving: Test Nerves
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: Nerves can make it harder to pass steno tests
Like many of my steno brethren, I'm signed up to take a National Court Reporters Association certification test this Saturday. I've already passed the only test for CART providers (the CCP, which is five minutes of dictation at 180 words per minute at 96% realtime accuracy -- a very low bar to clear, considering that in my daily CART work I strive for at least 200 to 220 words per minute at 99.9% accuracy. To put that in perspective, the average paragraph is 100 words long, so a 96% accuracy rating means 4 errors per paragraph, whereas a 99.9% accuracy rating is one error every 10 paragraphs.) I also hold CBC and CRR certification, but the CBC is just a written test once you've passed your CCP, and the CRR is granted automatically when a stenographer holds either the CBC or CCP plus the RPR, which is a non-realtime test (after the five-minute take, testers are given an hour to clean up the transcript) at speeds ranging from 180 to 225 words per minute. For more information about NCRA certifications, check out my FAQ.
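If you want to see where those error figures come from, the arithmetic is as simple as it sounds. Here's a quick sketch, assuming the 100-word paragraph used above:

    # Expected errors over a given number of words at a given accuracy rate.
    def expected_errors(accuracy_pct, words):
        return round((1 - accuracy_pct / 100.0) * words, 2)

    print(expected_errors(96.0, 100))     # 4.0 errors in one 100-word paragraph
    print(expected_errors(99.9, 1000))    # 1.0 error across ten paragraphs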
I spoke a little bit about my experience with steno testing in my article How I Got Out of Steno School. In the fine print at the bottom of that article, I mention that I began my apprenticeship as a CART provider, subcontracting under an experienced provider, in 2007, but I didn't pass the CCP until 2009. Why? Well, a number of reasons. First, the CCP test is given only twice a year. Second, I tend to have terrible test nerves that fortunately have never kicked in at an actual CART job; they seem to be exclusively bound up with the test-taking process, for which I have to say I'm grateful. Beats the alternative by a mile, at least. So on the first two CCP tests I took, I crashed and burned out of sheer nerves. The third one I'm pretty sure I passed, but I was so light-headed at the thought that I'd done it that I didn't follow one of the formatting rules, and I failed on a technicality. The fourth one I finally passed officially, and then I decided to go for the next step up. Unfortunately the NCRA only offers one level of CART exam, so I had to set my sights on the court reporting exams. This was challenging, because I've never done court reporting, so I don't have any brief forms for common phrases like "to the best of your recollection" or "the preponderance of the evidence". They're so rarely encountered in my academic work that when they come up at all I just write them out.
I passed the RPR, the prerequisite for the RMR (currently the hardest skills test offered by the NCRA) on my first try; it was substantially easier than even the quite easy CCP, as I'd expected. The RMR has proven to be a tougher nut to crack. The 200 literary was a snap; I passed it on the first go with my eyes closed. (Actually, I recommend taking steno tests with your eyes closed. It cuts down on distractions.) The 240 Jury Charge and 260 Q&A have still stymied me up to this point, but I'm hoping to nab at least one of them (most likely the 240 Jury Charge) this weekend. The thing is, I have the speed. I'll be flying along at 240 WPM, note-perfect, easy as anything, and then all of a sudden my fingers will make a slight stumble, I'll misstroke a word, and everything derails. See, I'm a realtime writer by training and by inclination, and it goes against all my well-honed CART provider training to leave a misstroke on the screen. Whenever I see a wrong word during my day job, my top priority is to fix it so that the error doesn't confuse my clients. They should never have to read through my sloppy mistakes. If necessary, I'll cut out a non-essential word -- like "y'know" -- in order to make up time after fixing the error. My ultimate goal is 100% verbatim accuracy, but if the choice comes down to putting down a steno stroke (no matter how incorrectly it translates) for every single word in the transcript versus making my realtime feed as readable as possible, readability wins, no contest.
So it's tough for me to try to bend my brain back into steno school mode, where all that matters is whether I can read through the slop during the transcription phase of the test, where I can leave out punctuation willy-nilly, knowing that I'll put it in later (an abomination for any self-respecting CART provider), and where there's the ever-present threat of test nerves. Why should it be so much harder to recover from an error gracefully during a test than it is during an actual CART job? For one thing, test dictation is highly unnatural. Every word is spoken at precisely even intervals, like bullets out of a machine gun. Fall a beat behind and you've got to write precisely twice as fast to get back on top of the words without dropping anything. Actual speech isn't like that. Sometimes people put on bursts of speed where they're talking at 280 words per minute, but then they'll take a short pause to arrange their notes or take a sip of water, and I'll use those few seconds to write out the 8-to-10-word buffer I always keep in my verbal memory. When people speak, they slow down for emphasis, speed up when they get excited, slow down when they're thinking, speed up when they're reading, and it becomes this push-and-pull experience, like riding on a camel -- coasting through the fast sections and catching your breath on the slow ones. Steno tests, on the other hand, are like the mechanical rabbit at a dog track. You've got no choice but to sprint at top speed for the whole five minutes, with absolutely no room for variation in your pace.
I've been practicing RMR mp3s and gradually helping myself learn how to recover from errors without correcting them or getting thrown off my pace. A great web-based app called Beeminder has kept me honest about my practice sessions each day. The largest part of all this just consists of convincing myself and my muscle memory that I actually do have the speed for this stuff. When I'm able to relax my fingers and just stroke every word loosely and naturally, it feels remarkably slow, and I get it all down no problem. It's only when I make that first mistake and start tensing up, pounding the keys and flailing my arms like a T-Rex, that the audio suddenly feels like it's been flipped to double speed. There are just three vital things to remember: Keep writing, keep breathing, and don't look back.
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: Nerves can make it harder to pass steno tests
Like many of my steno brethren, I'm signed up to take a National Court Reporters Association Certification Test this Saturday. I've already passed the only test for CART providers (the CCP, which is five minutes of dictation at 180 words per minute at 96% realtime accuracy -- a very low bar to clear, considering that in my daily CART work I strive for at least 200 to 220 words per minute at 99.9% accuracy. To put that in perspective, the average paragraph is 100 words long, so a 96% accuracy rating means 4 errors per paragraph, whereas a 99.9% accuracy rating is one error every 10 paragraphs.) I also hold CBC and CRR certification, but the CBC is just a written test once you've passed your CCP, and the CRR is granted automatically when a stenographer holds either the CBC or CCP plus the RPR, which is a non-realtime test (after the five minute take, testers are given an hour to clean up the transcript) at speeds ranging between 180 to 225 words per minute. For more information about NCRA certifications, check out my FAQ.
I spoke a little bit about my experience with steno testing in my article How I Got Out of Steno School. In the fine print at the bottom of that article, I mention that I began my apprenticeship as a CART provider, subcontracting under an experienced provider, in 2007, while I didn't pass the CCP until 2009. Why? Well, a number of reasons. First, the CCP test is given only twice a year. Secondly, I tend to have terrible test nerves that fortunately have never kicked in at an actual CART job; they seem to be exclusively bound up with the test-taking process, for which I have to say I'm grateful. Beats the alternative by a mile, at least. So on the first two CCP tests I took, I crashed and burned out of sheer nerves. The third one I'm pretty sure I passed, but I was so light-headed at the thought that I'd done it I didn't follow one of the formatting rules and failed on a technicality. The fourth one I finally passed officially, and then I decided to go for the next step up. Unfortunately the NCRA only offers one level of CART exams, so I had to set my sights on the court reporting exams. This was challenging, because I've never done court reporting, so I don't have any brief forms for common phrases like "to the best of your recollection" or "the preponderance of the evidence". They're so rarely encountered in my academic work that when they come up at all I just write them out.
I passed the RPR, the prerequisite for the RMR (currently the hardest skills test offered by the NCRA) on my first try; it was substantially easier than even the quite easy CCP, as I'd expected. The RMR has proven to be a tougher nut to crack. The 200 literary was a snap; I passed it on the first go with my eyes closed. (Actually, I recommend taking steno tests with your eyes closed. It cuts down on distractions.) The 240 Jury Charge and 260 Q&A have still stymied me up to this point, but I'm hoping to nab at least one of them (most likely the 240 Jury Charge) this weekend. The thing is, I have the speed. I'll be flying along at 240 WPM, note-perfect, easy as anything, and then all of a sudden my fingers will make a slight stumble, I'll misstroke a word, and everything derails. See, I'm a realtime writer by training and by inclination, and it goes against all my well-honed CART provider training to leave a misstroke on the screen. Whenever I see a wrong word during my day job, my top priority is to fix it so that the error doesn't confuse my clients. They should never have to read through my sloppy mistakes. If necessary, I'll cut out a non-essential word -- like "y'know" -- in order to make up time after fixing the error. My ultimate goal is 100% verbatim accuracy, but if the choice comes down to putting down a steno stroke (no matter how incorrectly it translates) for every single word in the transcript versus making my realtime feed as readable as possible, readability wins, no contest.
So it's tough for me to bend my brain back into steno school mode, where all that matters is whether I can read through the slop during the transcription phase of the test, and where I can leave out punctuation willy-nilly, knowing that I'll put it in later (an abomination for any self-respecting CART provider) -- and then, of course, there's the ever-present threat of test nerves. Why should it be so much harder to recover from an error gracefully during a test than during an actual CART job? For one thing, test dictation is highly unnatural. Every word is spoken at precisely even intervals, like bullets out of a machine gun. Fall a beat behind and you've got to write precisely twice as fast to get back on top of the dictation without dropping anything. Actual speech isn't like that. Sometimes people put on bursts of speed where they're talking at 280 words per minute, but then they'll take a short pause to arrange their notes or take a sip of water, and I'll use those few seconds to write out the 8-to-10-word buffer I always keep in my verbal memory. When people speak, they slow down for emphasis, speed up when they get excited, slow down when they're thinking, speed up when they're reading, and it becomes a push-and-pull experience, like riding on a camel -- coasting through the fast sections and catching your breath on the slow ones. Steno tests, on the other hand, are like the mechanical rabbit at a dog track. You've got no choice but to sprint at top speed for the whole five minutes, with absolutely no room for variation in your pace.
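To put rough numbers on that catch-up problem, here's a toy calculation (entirely my own illustration, nothing official) of how long you have to sustain a faster burst to clear a backlog when the speaker never pauses:

    # Toy model: how long must you sustain a faster burst rate to clear a
    # backlog when the dictation never pauses? (Illustrative numbers only.)
    def seconds_to_catch_up(dictation_wpm, burst_wpm, words_behind):
        if burst_wpm <= dictation_wpm:
            return float("inf")          # without a pause, you never catch up
        # The backlog shrinks at (burst_wpm - dictation_wpm) words per minute.
        return 60.0 * words_behind / (burst_wpm - dictation_wpm)

    # Fall one second (about 4 words) behind on a 240 WPM take:
    print(seconds_to_catch_up(240, 260, 4))   # 12.0 -- twelve seconds at 260 WPM
    print(seconds_to_catch_up(240, 480, 4))   # 1.0  -- one second, if you can briefly double your speed

In real speech, of course, the next pause does that work for you, which is the whole point of the camel-versus-mechanical-rabbit comparison.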
I've been practicing with RMR mp3s and gradually teaching myself how to recover from errors without correcting them or getting thrown off my pace. A great web-based app called Beeminder has kept me honest about my practice sessions each day. The largest part of all this just consists of convincing myself and my muscle memory that I actually do have the speed for this stuff. When I'm able to relax my fingers and just stroke every word loosely and naturally, it feels remarkably slow, and I get it all down no problem. It's only when I make that first mistake and start tensing up, pounding the keys and flailing my arms like a T-Rex, that the audio suddenly feels like it's been flipped to double speed. There are just three vital things to remember: Keep writing, keep breathing, and don't look back.
Monday, April 23, 2012
CART Problem Solving: Summer
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: CART work is harder to find in summer than in the rest of the year.
Most full-time CART providers work in universities, and that work tends to be highly seasonal. Academic CART work is great because it means steady weekly hours over the course of a semester, but when the semester is over and the summer begins, it can be tricky to fill the gap until the school year starts up again in the fall. There's occasionally some conference captioning to be had, but it's hard to keep a roster of conference captioning clients on deck, since so many opportunities have to be turned down while the school year is busiest. There's also the lucky break of getting summer classes to caption, which I'm happy to say is going to be my solution to the problem this year. I just got word that my medical student passed her classes and will be moving on to her second semester. Unlike most universities, medical schools tend to run year-round, so even though my onsite work has dropped off drastically (I'll have only one weekly onsite class), I'll have plenty of work to keep me busy, which is a real treat. In past years it hasn't been so easy, and I have no guarantee of what next summer will bring, but I'll definitely enjoy it while it lasts. When steady summer academic work is thin on the ground, what are the potential alternatives? Here are a few I can think of:
* Get daily remote work (usually a mix of academic CART and employee CART, with occasional public meetings thrown in) from a big national company. These jobs are usually scheduled only a day or so in advance and there's a lot of competition for them, so you've got to be glued to your computer on Monday if you want to schedule anything for Tuesday. Even so, you'll usually have a few false starts before getting any assignments. These sorts of jobs can be good if you live somewhere with a relatively low cost of living, since rates for remote work and onsite work might be close to par even with the national firm's cut off the top, but if you live somewhere expensive like New York City, you'll be working at a significant discount. The jobs are also usually fairly short -- an hour or two per day, typically -- and the audio quality isn't always the best. But even an hour here and an hour there is a lot better than nothing, and at least there's always a nice variety of work. The biggest disadvantage to relying on this sort of work is that you always feel like you're hustling. You spend a lot of time camping out at your computer waiting for jobs to come in, and you can't plan out your schedule each week, since it changes from day to day.
* Get daily offline transcription work. This pays much less than remote CART work, but it has the advantage of flexible scheduling; you can get it in the morning, go for an afternoon walk in the park, and turn it in that evening, unlike the remote work, which has to be done at the particular hour appointed by the firm. Still, it takes an awful lot of transcription work to make ends meet, and again the audio quality can sometimes be a little dodgy.
* Do depositions. I think this is the option preferred by most of my colleagues, but I've personally never done a deposition, so I wouldn't even know where to start. I'm not a notary public and I've never prepared a legal transcript, so if you need advice on the details of this one, I'm not really the one to ask.
* Do quick-turnaround sports transcriptions. One of my colleagues does this sort of work, going around to various games and transcribing post-game interviews with athletes for journalists. I know nothing about sports, so this is definitely not the job for me, but it sounds like a fun summer interlude for people who are into that sort of thing.
* Live off savings. On the one hand, you get lots of free time to go out in the sunshine and enjoy the summer. On the other hand, watching that bank balance drip down day by day with nothing to fill it back up until September can be a pretty unnerving feeling. Still, if you plan ahead well enough and you've got a fairly solid margin for error (estimated taxes, unexpected expenses, overbudget vacations, late checks at the beginning of the school year), you'll be sitting pretty. I know a few CART providers who go this route, and while I don't think I'll be planning to kick back all summer any time soon, I do intend to take a few weeks of unpaid vacation this summer to see my brother in Seattle and my parents in Montana, which will definitely be a nice respite from captioning medical school.
* Mix and match. A combination of any or all of the above. A little here, a little there, and maybe some other paying piecework to spackle in the gaps. Anything I missed? What do you do?
Monday, April 16, 2012
CART Problem Solving: Lag
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: There's too much delay between when a word is written and when it appears onscreen.
If you read The Plover Blog, you'll have seen today's post about the updated Windows version. This means, among many other exciting things, that I can use Plover as a monitoring system for my steno output when my clients are sitting too far away to read comfortably from their screens. Unlike StreamText (which charges per minute) and Bridge (which, in my experience, is unreliable and prone to freezing the entire connection -- client computer and monitoring computer alike), Plover runs on a completely separate connection that has no effect on my client's connection. I can turn it off and on with impunity and never affect what my client views on their screen. This is a big deal. Plover's still not quite at the stage where it can completely take over from Eclipse, my proprietary steno software, but it's definitely advanced enough to be very useful as a monitoring system. One of its best features is that, unlike all the other steno software on the market, it has a length-based stroke buffer rather than a time-based stroke buffer, so there's absolutely no lag between when I write a word and when it appears in its active window.
Proprietary steno software is always written with court reporters in mind, and CART functionality tends to be put in as an afterthought. Lawyers don't care if there's a 1.5-second delay between when something is said and when it appears on the screen, and sometimes court reporters are grateful for the lag between what they see on their own screen with pending translation display turned on (though they still have to read through lots of ugly metacharacters) and what the lawyers see on their realtime screens, since it allows them to correct any errors they might have made before the lawyers can spot them. But CART is a very different business. In order for CART to be truly useful to our clients, the text output has to be as smooth and instantaneous as possible. There's inevitably going to be some delay built in; it takes time for the provider to hear and write the words at one end, and it takes time for the client to read and comprehend the words at the other end. But even a small additional delay can mean unnecessary frustration and embarrassment. When a professor asks one of my clients a question and they pause to let the words appear on the screen, each tiny fraction of a second can diminish the professor's estimation of my client's intelligence, competence, or attentiveness. It's not rational and it's not fair, but it's a fact, and I think it's important to minimize it as much as humanly possible.
Since I can't yet use Plover in all my daily work, what do I do? I certainly don't want to make my clients read through the metacharacters. So I turn pending translation display off, and at every pause in speech, no matter how tiny, I invoke the {FLUSH} command, which I write "TPHR-RB". This manually dumps the buffer and avoids the 1.5-second delay. I just looked at my dictionary statistics, and they tell me that I've written {FLUSH} over 1.1 million times and counting. 1.1 million wasted keystrokes. (Sometimes instead of {FLUSH} I'll write STPH-B, which roughly translates to "press the right arrow key" and accomplishes the same thing. I've written that about 105,000 times.) It's annoying to have to slam on the flush stroke after pretty much every sentence I write, but it's the only way to give my clients truly instant text output. I've asked the developers of Eclipse if they can include a pending translation display without the messy markup and line jumping, but they've told me it's impossible. I've just got to keep working on making Plover completely functional, and then I'll finally be able to throw away that silly superfluous flush command for good.
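For anyone curious what that buffer business actually looks like, here's a toy sketch (entirely my own illustration -- I don't know how Eclipse or Plover implement their buffers internally) of a time-based buffer that sits on each word until a 1.5-second timer expires, unless something forces it out, which is roughly what my flush stroke is doing all day long:

    import time

    # Toy model of a time-based translation buffer (not Eclipse's real code).
    class TimeBasedBuffer:
        def __init__(self, delay=1.5):
            self.delay = delay
            self.pending = []                      # (timestamp, word) pairs

        def write(self, word):
            self.pending.append((time.monotonic(), word))

        def due(self):
            """Words the timer is finally willing to release to the screen."""
            now = time.monotonic()
            released = [w for t, w in self.pending if now - t >= self.delay]
            self.pending = [(t, w) for t, w in self.pending if now - t < self.delay]
            return released

        def flush(self):
            """What a manual flush stroke does: release everything right now."""
            released = [w for _, w in self.pending]
            self.pending = []
            return released

    buf = TimeBasedBuffer()
    buf.write("Hello,")
    buf.write("class.")
    print(buf.due())      # [] -- nothing appears yet; the 1.5-second timer hasn't expired
    print(buf.flush())    # ['Hello,', 'class.'] -- the flush skips the wait

A length-based buffer, by contrast, only holds back the stroke or two that might still combine into a longer translation, so each word goes to the screen as soon as it can no longer change, and there's nothing left to flush.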
Monday, April 9, 2012
CART Problem Solving: Cash Flow
CART Problem Solving Series
Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV
CART PROBLEM: Payments for CART work can be unpredictable and irregular.
Today I stopped by my accountant's office, picked up my 2011 tax returns, wrote out the checks, and dropped all four of them (NYC freelancers have to pay a lot of separate tax agencies) in the mailbox. I've still got to pay estimated taxes for the first quarter of 2012, but it definitely feels good to have 2011 taken care of, and most of all, it's good to be able to write those checks and know that I have the money for them in my account. In previous years, I've sometimes had to transfer money for yearly or quarterly tax payments from my emergency savings account, which never feels good. This year I finally started following some cash management advice I first heard about back in 2010, and the resulting peace of mind is incalculable.
If you were reading this blog back then, you might remember my review of The Money Book for Freelancers.
I can't recommend this book highly enough. When I started providing CART in 2007, I had no idea what I was doing. I'd only ever had full-time W-2 jobs before: a regular paycheck at fixed intervals for roughly the same amount each time. As long as I made sure to set up my finances so that I didn't overspend my monthly take-home pay, I was fine. When I started freelancing, everything got a whole lot more complicated. I'd invoice for a two-week stint of work and not get paid for it until a month, two months, or sometimes even three months later. I stopped looking forward to winter holidays, spring break, and especially the summer months, because time I wasn't working meant time I wasn't paid for, so it felt less like a vacation and more like enforced unemployment. I always made ends meet, but sometimes it was a struggle to put anything aside, and as soon as the cash flow dried up, I'd have to dip into whatever savings I had to spackle over the gap. It was frustrating.
The Money Book gave me a few simple principles to live by, which I've been gradually trying to implement since 2010. The first and easiest was to set up an "overhead account". It felt weird, but I went to my bank, opened up a second checking account, and deposited enough to cover my monthly rent and health insurance. Then I tried to forget it existed. No matter what financial crisis might come up, I at least had one month's big fixed expenses covered. The first few months after I set it up, I emptied out the overhead account right before the beginning of the month, and filled it up as checks came in over the course of the next month, but I'm happy to say I haven't had to dip into the overhead account in over a year now, and it's just been sitting there as a nice chunk of security whenever I need it.
In my first post on The Money Book, I mentioned that I had been tracking all my cash purchases (personal and professional) with an app on my Blackberry and then manually importing them every few weeks into Mint. That didn't last long; the manual import process was just too tedious. I stopped tracking cash for a long time and only started again when Mint upgraded its Android app to allow on-the-fly cash tracking last September. I've been pretty good about it since then, though, and it's had two good effects. First, while I don't tend to carry much cash and use my debit card for most things over $20, I now know how much I spend on small purchases -- especially food, which is the most common thing I buy with cash. I've been able to incorporate my lunch money into my monthly food budget, which is helpful, and I no longer have a big mysterious lump of "Uncategorized" to deal with when I go over my spending in Mint. Second, I've got a bit of a junk food habit (my big weakness is potato chips, especially the Honey Mustard Lays), but most bodegas have a $10 debit card minimum, so if I want to pop in for a quick bite of something unhealthy I usually have to pay in cash. Now that I'm in the habit of tracking every cash purchase I make, I find myself spending less on junk food, because when I weigh the crunchy deliciousness against the hassle of taking my phone out and punching through a handful of menus just to register a $2 purchase, I sometimes decide it's not worth the trouble. Beyond junk food, tracking makes me generally more aware of where my cash is going and cuts down on unnecessary impulse buys, which is definitely a good thing.
I didn't start implementing the last and most important idea from The Money Book until the beginning of this year. They strongly recommend that you split each paycheck as soon as it clears and transfer a fixed amount into taxes, savings, and emergency accounts. (Also retirement, but I'm not quite there yet, though I just turned 31, so I've got to get on it pretty soon). I liked the idea as soon as I read about it, but I tended to get my money in irregular lump sums, once every month or so, and it seemed like as soon as I had paid the bills for one month, I had only a small amount left to stretch as long as it took for the next big check to come in. I didn't want to carve anything out of that check, for fear that the remainder wouldn't last me long enough, and I'd have to dip back into the accounts I put the money into, which I knew was a dangerous precedent to set. So I just kept all the money in my business account and prayed for three years that I'd have enough in there to cover taxes, not really knowing how much of the tax money I'd already spent or whether I'd make enough in the next quarter to pay taxes for the previous one. Add in the CART drought that tends to come with the summer season, and it was pretty nerve-wracking. I got by, but my savings accounts stayed flat for far too long. Human nature being what it is, I tended to spend a bit more than I should have when the checks came in all at once, leaving me stranded during the droughts because I hadn't put anything aside during the flush times.
What changed this year? Well, partly it was that I got sick of the uncertainty and decided this was the year to finally put the plan into action, but I have to admit that part of it was also striking a deal with one of the universities I work for. Rather than hiring me as a 1099 independent contractor, they put me on their books as a W-2 employee, with weekly paychecks and tax withholding. Otherwise there was no difference between my relationship with them and my relationship with the schools that pay me as a contractor; they didn't offer me benefits or expect exclusivity or anything like that. It just meant that I got less money up front, since they reserved some of what I earned for the IRS, but it also meant that I got paid weekly, regular as clockwork, and that bit of weekly security made me less anxious about the unpredictable ebb and flow of my other CART work. I started splitting each check I received -- including the paychecks that had already had their tax chunk taken out -- and putting each piece into its designated account. It was surprisingly easy to do, and I found that I had plenty of money left to live on. In fact, I didn't even really miss the difference. My emergency account plumped back up to where it had been before I took out deposit money for our apartment move last spring. My tax account filled steadily, greatly reducing my anxiety over whether I'd be able to pay the Feds this April. And best of all, my long-term hopes and dreams account finally budged from the small, sad figure it had been stuck at for two years, giving me even more incentive to keep working and planning for the future. This W-2 deal is pretty uncommon in CART work, and I'm not relying on it; if that university runs out of students one semester and I pick up the slack at another university with a more conventional 1099 deal, I think I'll still have the self-discipline to keep splitting the checks. It seems scary at first, but it winds up making things much easier to manage in the long run.
So, to sum up, my tips for a stress-free life as a freelance CART provider?
* Get a good accountant. My first year, I did a walk-in at H&R Block. They were brusque and didn't really understand my business. The next two years, I did my own taxes online with TurboTax. It was fine, but I realized I was probably overpaying quite a bit, and since freelancers tend to get audited more than other people, it made me nervous not to have anyone backing me up. Last year and this year I used an accountant who specializes in self-employed and freelance workers, and it's made all the difference. I'm paying far less, and I'm much less nervous that a misunderstanding or mistake will get me in big trouble.
* Keep on top of your clients. If they're habitually late payers, don't let it slide; the payment window will just get wider and wider as they try to see what they can get away with. Be polite but persistent in following up on late payments. Don't let them start treating the balance they owe you as a source of reliable, interest-free business loans.
* Keep at least one month of your most important fixed expenses in an overhead account, and try to touch it as little as possible. If you have to take money out of it, prioritize filling it up before spending any money that comes in on optional expenses. It takes a while to build up a real emergency account, but you can probably set aside one month's worth if you try. The peace of mind it'll give you is inexpressible.
* Get a good cash tracker, preferably on your phone so you can enter transactions at will, though a small notebook is fine too, if you don't mind transferring it to your money software by hand. Debit card purchases are great because they can be automatically categorized by most money software, but it's important not to let cash slip through the cracks.
* As soon as you can, get in the habit of splitting each check, as soon as it comes in, into separate accounts named Taxes, Emergency, and Savings. If you can, try to add Retirement in there too. Taxes should be between 20% and 30%, but the rest can start as small percentages and increase with time as you get more confident in the patterns of your cash flow (there's a rough sketch of the arithmetic right after this list). Keep these accounts at online banks with high interest rates that don't give you a debit card, so you're not tempted to spend them on everyday purchases. Track them with your money software so you can see the numbers going up with each check; it's the motivation you'll need to keep yourself going through the lean times.
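Here's the kind of split I'm talking about, as a quick sketch (the percentages are examples only -- your own numbers will depend on your tax situation and cash flow):

    # Example check-splitting arithmetic (illustrative percentages only;
    # pick your own based on your tax bracket and cash flow).
    SPLIT = {"Taxes": 0.25, "Emergency": 0.05, "Savings": 0.10}

    def split_check(amount):
        """How much of a check goes to each account, plus what's left to live on."""
        portions = {name: round(amount * pct, 2) for name, pct in SPLIT.items()}
        portions["Left to live on"] = round(amount - sum(portions.values()), 2)
        return portions

    print(split_check(2000.00))
    # {'Taxes': 500.0, 'Emergency': 100.0, 'Savings': 200.0, 'Left to live on': 1200.0}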
Again, if you want more information on managing an irregular cash flow, I really have to recommend The Money Book. It's taught me pretty much everything I know about balancing business revenue with personal expenses, and I'm much less nervous about money since I started following its advice. There's lots of stuff in there that I haven't even mentioned, so check it out. You won't be sorry.