StenoKnight CART Blog: 2012

Friday, December 21, 2012

Ignorance

I might return to the CART Problem Series in the future, but I think the 18 posts that I've made on the subject will do for now. At least for a little while, I'm going to go back to making posts on various and sundry subjects as they occur to me, rather than just framing them as problems and solutions.

I mentioned in my Communicating Sans Steno post that my dad has had significant hearing loss for as long as I can remember, but since he was largely in denial about it for most of that time, I didn't learn much about how hearing loss worked or what it was like to deal with in daily life, so I'm ashamed to admit that I've been staggeringly insensitive to at least two hard of hearing people I've known over the years.

The first was when I was a teenager. Her parents were friends with my parents, so she and her brother used to come over to my house when we had parties, and sometimes I'd go to theirs. We went to different schools, so we only saw each other a few times a year, but we always had a good time when we got together. I didn't notice that she wore hearing aids until several years after we first met, when she mentioned that she'd won an essay scholarship for teenagers with hearing loss. Like the ignorant blunderer I was, I said, "Wait, you have hearing loss?" For the first time I noticed the aids. "But you wear hearing aids." "Yes," she replied patiently. "So... Why do you qualify for a scholarship if your hearing aids have already fixed the problem?" Like so many people, I'd assumed that if my eyeglasses were able to correct my severe myopia to normal vision, then hearing aids would be able to do the same thing for anything short of total deafness. I had no idea until almost 20 years later that amplification often doesn't improve clarity, that some frequencies can be incapable of amplification due to permanent loss of specific cochlear hair cells, that hearing is an extremely complex mechanism that doesn't have an easy or complete fix when any of its components malfunction. My misrefracting cornea could be completely compensated for by a piece of light-bending plastic. Even with hearing aids, my friend's hearing loss remained something she needed to reckon with.

I'd never noticed her misunderstanding me or asking me to repeat myself when we talked (the way my dad often did), and I didn't realize that the casual one-on-one conversations we had at parties were totally unlike her situation in the classroom, where she was learning new material, sat several feet away from the teacher (losing any ability to lipread, especially since the teacher faced the board most of the time), and was forced to work twice as hard as her classmates to get the same amount of information through her ears and into her brain. The fact that my friend had managed to do this all her life, getting excellent grades and becoming an extremely literate and eloquent writer, totally blew past me. I took it for granted; instead of congratulating my friend on her essay, I was rude and dismissive. I haven't seen my old friend since high school, but if I ever run into her again, I'll apologize and explain that I know a lot more now than I did then -- not that that's any excuse. If I had actually asked her to tell me more about the scholarship instead of assuming that it made no sense, I would have learned something that day, instead of having to wait 20 years to realize how much of a jerk I'd been.

The second incident is even more problematic, because I was in a position of authority. At my college, all sophomores are required to take a year of music theory, even though their degree (there's only one on offer) is in Liberal Arts. Music classes are led by professional instructors, but there are also weekly practicum classes, where students are supposed to try out what they've learned in small 4-to-5-person groups. Students with musical experience are chosen to lead those groups as work-study assignments, and because I'd played in the pit orchestra of a summer repertory theater, I got to be one of them. My job involved drilling the students in singing simple multipart songs and rounds, helping them to analyze counterpoint examples discussed in class, and answering any questions they had about the stuff they were studying. The emphasis was on getting an intellectual understanding of the music rather than in becoming accomplished performers, so it wasn't a problem that a few students in each practicum were tone deaf. Most of them just hadn't been exposed to much formal music training, and once I gave them a few exercises, their pitch discrimination and singing tended to improve quite a bit.

There was one student, though, who found both the music class and the practicum intensely frustrating. I noticed his hearing aids right away, because he'd decorated the earmolds in bright colors. He was forthcoming about his hearing loss, and explained that he got very little out of all the singing, analysis, and call-and-response pitch practice, because he couldn't hear any of it accurately enough to duplicate. Again, I assumed that the hearing aids should have solved the problem, and didn't understand what his issue was. He wasn't being graded on his accuracy in singing, and he wouldn't be penalized if he wasn't able to appreciate the aesthetic nuances of the songs. All he had to do was understand the mathematics of the music on the page, so that he could speak about it in class. The singing exercises were just intended to help build first-hand experience with hearing and repeating music in realtime. I figured that his hearing loss put him in the category of the "tone deaf" students, and treated him accordingly. I didn't realize that, unlike them, the problem wasn't his ability to distinguish the notes on an intellectual level. Unlike them, he wasn't going to improve with practice. He couldn't hear the difference between pitches no matter how many times they were repeated, so he felt like he was being forced to bang his head against a wall every week in practicum. When he expressed his frustration to me, I thought he was being oversensitive, and just reassured him that it wouldn't affect his grade even if he didn't improve by the end of the semester. I didn't realize the emotional consequences of being asked to do something you weren't physically able to do every week in front of your peers, over and over, and failing every time. He eventually wound up transferring to another college, and I'm afraid that my inability to understand what he was telling me played into that decision.

Like any good essay writer, I've Googlestalked both of these people as research for this post, and today they're both extremely successful and well-respected in their fields. Obviously my ignorance didn't stop them from doing what they wanted to do. But when you add my ignorance to the ignorance of everyone else they had to deal with, how much more exhausting, frustrating, annoying, infuriating did it make their educational experiences, not to mention other parts of their lives? If I hadn't gone into CART, I never would have realized the mistakes I'd made in trusting my own assumptions instead of listening to their experiences. Now I do, and I'm mortified when I think of the way I behaved. There's no easy solution to this problem. One out of every seven people in this country have some degree of hearing loss, and yet so few people actually understand how it works. It'll take a lot to educate all 312 million people about the 45 million who are Deaf, deafened, or hard of hearing, but it badly needs to be done.

Thursday, December 6, 2012

CART Problem Solving: Speech Recognition Part IV

CART Problem Solving Series

Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV

This video is only tangentially relevant to the post; I just found it adorable.

At long last, the final Speech Recognition installment.

CART PROBLEM:Speech recognition is almost always slower and less accurate than stenographic text entry, but there's a strong cultural push to use it, because it's perceived as cheaper and less complicated than hiring a qualified CART provider.

In the previous three posts, I discussed why speech recognition isn't yet capable of providing accurate text when presented with untrained multispeaker audio. I also spoke a bit about why the common assumption that it would only take a bit more development time and processing power to get to 100% accuracy is based on a misunderstanding of how language works and how speech recognition engines try to capture it.

Just because a lizard can play that bug-squishing iPhone game, it doesn't follow that upgrading the lizard to a cat will make it a champion at Dance Dance Revolution. A bigger speech corpus, faster computers, and even a neural-network pattern matching model still doesn't make up for the essential difference between human and mechanized speech recognition: Humans are able to make use of context and semantic cues; computers are not. Language is full of soundalike words and phrases, and imperfect audio is very much the rule and not the exception in most real-world situations. This means that humans will inevitably have the edge over computers in differentiating ambiguous sound patterns, and the improvements in speech recognition technology will follow an asymptotic trajectory, with each new improvement requiring vastly greater effort to achieve, and the final goal of accurate independent transcription a nearly impossible one, except in controlled settings, with a narrow range of speakers and vocabulary.

But of course there's a huge difference between a professional voice writer and an untrained one, and an even greater difference between any kind of respeaking system and a speaker-independent speech transcription program. Despite widespread public perception, voice writing isn't actually any easier to do than CART, and in fact is usually quite a bit harder in most circumstances.

The supposedly short training period is voice writing's major selling point over steno (aside from the cost of equipment), but from what I can tell, it's not actually true. You can train someone to a moderate degree of accuracy very quickly; all they have to do is speak into the microphone slowly and clearly, and it'll get a fair amount of words correct. For dictation or offline transcription, this can work well, assuming they have the stamina to speak consistently for long periods of time, because they can speak at a slow pace, stop, go back, and correct errors as they make them. Obviously, the closer a person's voice is to the standard paradigm (male, American, baritone), the better results they'll get. Many people with non-standard voices (such as this deaf female blogger) have a heck of a time getting software to understand them, even speaking as slowly and clearly as they can manage. But even for men with American accents, actual live realtime respeaking at CART levels of accuracy (ideally over 99% correct) is much, much harder than dictation.

* Short words are more difficult for the speech engine to recognize than multisyllabic words are, and are more likely to be ignored or mistranscribed.

* If the voice captioner does mostly direct-echo respeaking, meaning that they don't pronounce common words in nonstandard ways, they have to repeat multisyllabic words using the same number of syllables as in the original audio; if they try to "brief" long words by assigning a voice macro that lets them say the word in one syllable, they run up against the software's difficulty in dealing with monosyllabic words that I mentioned above.

* Because they're mostly saying words in the same amount of time as they were originally spoken (unlike in steno, where a multisyllabic word can be represented by a single split-second stroke), they don't have much "reserve speed" to make corrections if the audio is mistranscribed. They also have to verbally insert punctuation and use macros to differentiate between homonyms, which takes time and can be fatiguing.

* Compensating for the lack of reserve speed by speaking the words more quickly than they were originally spoken can be problematic, because the software is better able to transcribe words spoken with clearly delineated spaces between them, as opposed to words that are all run together.

* This means that if the software makes a mistake and the audio is fairly rapid, the voice captioner is forced to choose between taking time to delete the mistake and then catching up by paraphrasing the speaker, or to keep up with the speaker while letting the mistake stand.

* The skill of echoing previously spoken words aloud while listening to a steady stream of incoming words can be quite tricky, especially when the audio quality is less than perfect; unlike simultaneous writing and listening, simultaneous speaking and listening can cause cross-channel interference.

This doesn't even go into the potential changes in a person's voice brought about by fatigue, allergies, colds, or minor day-to-day variations, all of which can wreak havoc with even a well-trained voice engine.

Low or moderate accuracy offline voice writing = short training period; most people can do it.

Low or moderate accuracy realtime voice writing = somewhat longer training period; machine-compatible voice timbre and accent required.

CART-level accuracy realtime voice writing = extremely long training period; an enormous amount of talent and dedication required.

I want to emphasize again that none of this is meant to denigrate the real skill that well-trained voice writers have developed over their years of training. It's just to point out that while voice writer training seems on the surface to be easier and quicker than steno training, that's very seldom the case in practice, as long as appropriate accuracy standards (99% or better) are adhered to. The problem comes in when the people paying for accommodations, either due to a shortage of qualified steno or voice writers, or due to cost considerations, decide that 95% or lower accuracy is "good enough" and that deaf people should be able to "read through the mistakes".

So let's talk about some other potential competitors to CART services. These fall into two general categories: Offline transcription and text expansion. I think I'll leave text expansion for a future series of posts, since it's a fairly complex subject. Offline transcription is much simpler to address.

I've seen several press releases recently from companies bragging about contracts they've secured with universities, that claim to offer verbatim captioning at rock bottom prices. The catch is that the captioning isn't live. No university or conference organizer I know of is foolhardy enough to set completely automated captions up on a large screen in front of the entire audience for everyone to see. The mistakes made by automated engines are far too frequent and hilarious to get away with. But they will, it seems, let lectures be captured by automated engines, then give the rough transcripts to either in-house editors (mostly graduate students) or employees of the lecture-capturing companies, to produce a clean transcript at speeds that are admittedly somewhat better than they used to be, back when making a transcript or synchronized caption file offline usually involved a qwerty typist starting from scratch.

I'm worried that this is starting to be perceived as an appropriate accommodation for students with hearing loss, because there's a crucial piece missing from the equation: Realtime access. Imagine a lecture hall filled with 250 students at a well-regarded American private university, sitting with laptops and notebooks and audio recorders, facing the PowerPoint screen, ready to learn. It's Monday morning. In walks the professor, who pulls up her slideshow and begins the lecture.

PROFESSOR: Tanulmányait a kolozsvári zenekonzervatóriumban, majd a budapesti Zeneakadémián végezte, Farkas Ferenc, Bárdos Lajos, Járdányi Pál és Veress Sándor tanítványaként. Tanulmányai elvégzése után népzenekutatással foglalkozott. Romániában ösztöndíjasként több száz erdélyi magyar népdalt gyűjtött.

After a few seconds, the students start looking at each other in confusion. They don't speak this language. What's going on? The professor continues speaking in this way for 50 minutes, then steps down from the podium and says, "The English translation of the last hour will be available within 48 hours. Please remember that there is a test on Wednesday."

These students are paying $50,000 or $60,0000 a year to attend this school. They're outraged. Not only do they have less than 24 hours to study the transcript before the test, but they were unable to ask questions or to see the slides juxtaposed with the lecture material. Plus they just had to sit there for 50 minutes, bored and confused, without the slightest idea of what was going on. It wouldn't stand. The professor would be forced to conduct future lectures in English rather than Hungarian, or risk losing her job. This is the state of affairs for deaf and hard of hearing students offered transcripts rather than live captioning. It deprives them of an equal opportunity for learning alongside their peers, and it forces them to waste hours of their life in classes that they can't hear and therefore can't benefit from. I'm waiting for the day when the first student accommodated in this way sues their school for violating the Americans with Disabilities Act, and at that point the fast-turnaround transcript and captioning companies are going to be in a good deal of trouble. There is the possibility of training realtime editors who might be able to keep up with the pace of mistakes and correct each error a few seconds after it's made before the realtime is delivered to the student, but that adds yet another person into the workflow, reducing the savings the university was hoping to get when they laid off their CART providers. In some classes, a relatively untrained editor with a qwerty keyboard will be able to zap the errors and clean up the transcript in realtime, but in others -- where the professor doesn't speak Standard Male American (true for a significant and increasing number of professors in the US college system), or there's too much technical jargon, or the noise of the ventilation system interferes with the microphone, or any of a hundred other reasons -- the rate of errors made by the speech engine will outpace the corrections any human editor can make in realtime.

So what lies ahead in the future? Yes, speech recognition engines will continue to improve. Voice writer training times might decrease somewhat, though fully accurate automated systems will stay out of reach. People don't realize that speech is an analogue system, like handwriting. Computer recognition of the printed word has improved dramatically in the past few decades, and even though transcripts produced via OCR still need to be edited, it's become a very useful technology. Recognition of handwriting has lagged far behind, because the whorls and squiggles of each handwritten letter varies drastically from individual to individual and from day to day. There's too much noise and too little unambiguous signal, apart from the meaning of the words themselves, which allows us to decipher in context whether the grocery list reads "buy toothpaste" or "butter the pasta". Human speech is much more like handwriting than it is like print. Steno allows us to produce clear digital signals that can be interpreted and translated with perfect accuracy by any computer with the appropriate lexicon. Speech is an inextricably analogue input system; there will always be fuzz and flutter.

Monday, June 25, 2012

Sorry for the Radio Silence

Apologies again for the last several weeks of no posts. I had today's blog post all sketched out, but then a situation came up and I don't think I'll be able to actually write it today. I'm currently helping a family member through an ongoing crisis, and it's soaking up a lot of my posting time. Hopefully I'll be able to get back on track soon.

Monday, June 4, 2012

Taking a Mulligan

Hey, guys. I'm really sorry, but Speech Recognition IV is going to have to wait until next week. I've just got too much to do, preparing for my three-hour Steno Crash Course and Plover Programming Sprint at PyGotham this Friday and Saturday. In the mean time, I'll tide over your desire for speech recognition schadenfreude with this:

Gazpacho Soup Day for Siri

Monday, May 28, 2012

CART Problem Solving: Speech Recognition Part III

CART Problem Solving Series

Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV

Apologies for the lack of captioning in the first few seconds of the video, but I had to post it. It's a fantastic illustration of not just how often automatic speech recognition gets things wrong, but how wrong it tends to get them. There's a whole series of these Caption Fail videos, but this is the most work safe (and, in my opinion, funniest) of the lot.

See, because computers aren't able to make up for lossy audio by filling in the gaps using semantic and contextual clues, they make mistakes that a human transcriber would never make in a million years. On the one hand, you get "displaces" instead of "this place is". That's reasonable enough, and out of context a human might have made that mistake. But when a human hears "fwooosssssh" as a man tries to imitate the sound of the ocean with his mouth, the computer continues to try to read it as speech, and translates it "question of." Not only was it not able to differentiate between words and sound effects, but "fwoooosh" doesn't sound anything like "question of." The algorithms that computers use to match similar sound patterns to each other are so alien to our way of thinking that, unlike mistakes made by humans, we can't even hope to read through them to figure out what the correct version should have been.

I promised you some illustrations to use when trying to explain why accurate speaker independent automated speech recognition is not "just around the corner", despite the popular conception that it is. I think it's useful to consider your audience when trying to explain these. If you're talking to computer people, bringing in the parallels with OCR might be more effective than if you're talking to people who haven't used that sort of technology. If someone has never heard a beatboxer, my voicewriting to steno analogy comparing beat boxing to drumming won't mean much. Try to get an idea of the person's frame of reference first, and then construct your argument.

Belize, the Passport, and the Pixelated Photo

You know those procedural crime shows? Where the first 20 minutes is taken up with chasing seemingly disconnected clues, and then at last one of the detectives has a sudden flash of insight, puts all the pieces together, and knows where to find the suspect? Sometimes those shows do a very misleading thing. They'll get a blurry photo, something captured by a security camera or a helicopter, and the detective will say, "There! Zoom in right there!" Then the screen will shift and you'll get the same photo, except that the formerly blurry house will be nice and clear, and in the window... Is that a man with a gun? The detective will shout, "Zoom in on that flash of light there!" Again, the pixels will smear and redraw themselves, and look! Reflected in the man's glasses is another man in a Panama hat, wearing a mechanic's shirt with "Jerry" embroidered on the pocket and wielding a tire iron!

It's all very exciting, but it's also very wrong. If you take a blurry photograph and blow it up, you don't get a clearer view of its details; you just get a blurrier photograph with larger pixels. This is the essence of lossiness, but in an image rather than a sound. That visual information was lost when the photo was taken, and no amount of enhancement or sharpening tools will ever get it back. Unlike computers, humans are extremely good at inferring ways of filling in the gaps of lossy material, by using lateral clues. If the hard-won blurry photo of the criminal's coffee table just before he set his house on fire depicts a guide book, a passport, and a bottle of SPF 75 suntan lotion, a computer will either throw up its hands and say, "Not enough information found" or it will produce complete gibberish when trying to decipher its details. A human, on the other hand, will see that the letters on the guidebook, while extremely indistinct, seem to have approximately five spaces between them. The first letter is either an R, a P, or a B, and the last one is quite possibly an E. The passport shows that the criminal will be leaving the country, and the suntan lotion indicates that the location gets lots of sun. The savvy human detective paces around a bit and says -- I've got it! Belize! It's the only country that fits the pattern! Then they hop on the next flight and catch the criminal before the final credits.

The humans didn't need to fill in the gaps of the letters on the guide book in order to figure out the word it spelled, because they were able to use a host of non-textual clues to make up for the lossiness. Because computers programmed to recognize text aren't able to draw on all that subsidiary information, humans will always have an advantage in recognizing patterns, drawing inferences, and correcting errors caused by lossy input.

Why Hasn't Microphone Technology Improved More in 100 Years?

This leads me to the subject of speech recognition, which is a much thornier problem than text recognition. The answer to the question is simple: It has. Listen to an old Edison wax disc record and compare it to CD-quality sound produced by today's audio engineers, and you can hardly claim that recording technology has been stagnant. But the question behind my question is this: With all this fantastic audio recording technology, why is it still close to impossible to get quality audio in a room full of more than a handful of people? Make sure every speaker passes around a hand mic, or wears a lapel mic, or goes up to talk into the lectern mic, and you're fine. Put the best and most expensive multi-directional microphone on the market in the center of the room and half a dozen people sitting around a conference table, and you're sunk. Everyone sounds like they're underwater. The guy adjusting his tie near the microphone sounds like a freight train, while the guy speaking clearly and distinctly at the other end of the table sounds like he's gargling with marbles. Even $4,000 hearing aids have this problem. They're simply not as good as human ears (or, more accurately, the human brain) at filtering out meaningless room noises and selectively enhancing the audio of speakers at a distance. That's why onsite CART is often more accurate by several orders of magnitude than remote CART, no matter how much money is spent on microphones and AV equipment. When the bottleneck of sound input is a microphone, it's limited by its sensitivity, its distance from the speaker, and any interference between the two. That's the problem that still hasn't been solved, over a hundred years since the invention of recorded audio.

Having to transcribe terrible audio, guessing at omissions and listening to a five-second clip of fuzzy sound a dozen times before finally figuring out from context what it's actually about has been a real lesson in empathy for me. The frustration I feel in my home office, clicking the foot pedal over and over and listening to imperfect audio many times over, is nothing compared to what my hard of hearing clients feel in the course of their lives every day. They don't have a foot pedal to rewind the last few seconds of conversation, and they're not even getting paid to do this endlessly unrewarding detective work. I suppose I should feel even more sorry for the poor computers, who are trying to deal with substandard audio but don't have the luxury of lateral thinking or contextual clues or the ability to differentiate between soundalike phrases semantically. I've often wanted to rent out an hour of an audiologist's time and hook up the most popular commercial speech recognition software to their test system. I'd be very interested to see how it did. Of course, it could recognize all the tones perfectly well. It might even be all right at the individual words. But unlike a human with hearing loss, who usually does better at guessing words in the contexts of sentences than hearing them on their own, I bet you that the software would do considerably less well, and would probably come out with an audiogram that put them in the range of moderate to severe hearing loss, especially if any of the tests were given with stimulated noise interference mixed into the audio feed. I could be wrong, of course; I haven't yet gotten a chance to actually do this. But I'd be very interested to find out.

Well, this has run rather longer than I thought it would. I guess I'm going to have to do Speech Recognition Part IV next week. 'Til then!

Monday, May 21, 2012

CART Problem Solving: Speech Recognition Part II

Wednesday, May 16, 2012

Ergonomic update

I've been thinking about ergonomics a lot since my previous post on the subject. In my onsite work, I only spend a few hours at a time in any one position; classes range from 1 to 3 hours, but there are usually breaks every hour or so. Because I'm working from home this summer, though, I've been spending up to 4 hours at a time at my desk doing CART, and then several more hours doing transcript editing, transcription work, or miscellaneous administrative tasks. Unfortunately, it's made me realize how un-ergonomic my setup really is, and how vital it is not to succumb to the temptation to just plant myself in one place and not move from it until the end of the day. My back and shoulders have been warning me that I'd better mix something up soon, or they're really going to start complaining.

I've helped solve my leg fatigue by using foam blocks as foot rests, because they can be shifted around and rolled from back to front under my feet whenever my legs start tightening up. I can also change their height by resting them on their three different edges, and if I want them even higher, I can stack one on top of the other.

One thing that's helped with the planting problem has been to move from the desk to the couch for transcription work, as I mentioned in my previous post, but also to run off of battery power initially, so that when my laptop's battery dies about 1.5 hours later, I'm forced to get up and go into the office for the charger. It might sound silly, but if I don't create those distractions for myself, I have a tendency not to move until my work is done, which is a habit I need to figure out how to break.

Here's another thing I've done, which seems to help a fair amount during the actual remote CART work itself:

As I've said many times, I adore my split-keyboard setup. The only thing that sometimes bugged me, though, is that my desk chair is a little too deep, so in order to reach the keyboard I have to either lean forward (hard on the back), bolster the seat back with several pillows (they tend to slip around and aren't that comfortable), or tilt the tripod forward and the two halves of the steno machine up, which doesn't quite work, because the main arm of the tripod still tends to get in the way. Yesterday I hit on a new solution: I took the armature from my old Gemini 2 machine, put it on a second tripod, and then unscrewed my Infinity Ergonomic from its own armature, putting one half of it on the original tripod and one half on the new one. This allows me to put one tripod on either side of my desk chair, eliminating interference from the tripod's main arm. It's working quite well so far.

I've got a feeling there's one more piece to the puzzle, though. My current desk chair was $50. I bought it at Staples last year. It's really not ideal; there's no lumbar support, it doesn't go high enough, it wobbles a lot, and it's just generally uncomfortable. Every day I spend in it makes me resent it a little bit more. I'm seriously considering buying a fancier chair, but they can be amazingly expensive. Someone on one of the captioner forums I read recommended this one:

It looks great, doesn't it? Ball jointed lumbar support. Headrest. Tons of adjustable settings. But it's $500. Yikes. Do I really want to spend that much money for a chair? Are there cheaper but still ergonomic alternatives out there? If any of you have recommendations, I'd very much like to hear them.

Monday, May 14, 2012

Speech Recognition Part I

CART Problem Solving Series

Sitting Apart
Handling Slides
Classroom Videos
Latin
Superscript and Subscript
Schlepping Gear
Late Hours
Expensive Machines
Communicating Sans Steno
Cash Flow
Lag
Summer
Test Nerves
Ergonomics
Speech Recognition, Part I
Speech Recognition, Part II
Speech Recognition, Part III
Speech Recognition, Part IV

CART PROBLEM: People claim that someday very soon human captioners will be replaced by automated speech recognition engines.

I've got a lot to say on this subject, but it's already late, so I'm going to leave most of the heavy duty analysis for next week's post. For now, I just want to show you a few examples. I first posted this video in 2010:

It's actual classroom audio from MIT's open courseware project. The video is of me captioning it live, using Eclipse. After posting it, I made a post on this blog about my CART accuracy versus the accuracy of YouTube's autocaptions, which at that time had just been released, with promises of increasing accuracy as the time went on.

Here's the transcript of the original autocaptions from 2010:

for implants support and you know as I haven't said anything about biology those folks didn't really need to be educated and genetics biochemistry more about it so about the to solve those problems and that's because biology as it used to be was not a science that engineers could addressed very well because in order for engineers really analyze study quantitatively develop models at the bill technologies all for the parts there's a lot of requirements on the science that really biology that that's why %um %um the actual mechanisms a function work understood yes you could see that moving your arm requires certain force in of would where certain load we really didn't know what was going on down in the proteins and cells and tissues of the muscles on the books okay but still you could decide maybe an artificial %uh to do this %um %uh in the plan you really know the molecular compliments so how the world he actually manipulate the system he didn't even know what the molecules work they're really underlying yes you couldn't really do the chemistry on the biological all %uh it's very hard to quantify you can even though the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why they're never really was a biological engineering until very recently while he wasn't the science that was released soon right here in dallas so there for the world biomedical engineering bailey although the deposition prompted it just talked about that that necessarily require miles per se but that's changed the good news for you folks is biology this change it's now a science that engineers had it been that to very well

Here's the updated transcript from the new autocaptions produced by YouTube's updated and improved speech recognition engine, circa 2012:

for implants and so forth and you know as i haven't said anything about biology those folks didn't really need to be educated in genetics biochemistry molecular body cell bowed to solve those problem and that's because biology as it used to be was not a science that engineers could address very well because in order for engineers really analyze study quantitatively develop models at the bill technologies alter the parts there's a lot of requirements on the science that really biology that satisfy uh... the actual mechanisms a function work understood yes you can see that moving your arm requires certain force in whip where a certain load we really didn't know what was going on down in the proteins and self-interest use of the muscles in the box creek but still you could decide maybe an artificial do this satanic plan you really know the molecular compliments so how the world to be actually manipulate the system to continue to know what the molecules were that are really underlying s thank you could really do the chemistry and biological molecule uh... it's very hard to quantify since if you need to know the parts of the mechanisms how could you get quantitative measurements for them develop models so there's good reason why there never really was a biological engineering until very recently has filed he wasn't a science that was really suited wrench in your analysis synthesis so there for the world biomedical engineering mainly involved all these application props that i've just talked about but that necessarily require biology per se but that's changed good news for you folks is biology those changes in our science that engineers had unfair connect to very well

It's replaced "%uh" with a more appropriate "uh..." and it gets certain words correct that it got wrong in the original, but it also concocted brand-new phrases like "certain force in whip", "self-interested use of the muscles in the box creek", and "an artificial do this satanic plan" for "certain force and would bear", "cells and tissues of the muscles and the bones. Okay?" and "an artificial bone to do this. An implant".

Here's the actual transcript of the video:

Or implants and so forth. And you notice I haven't said anything about biology. Those folks didn't really need to be educated in genetics, biochemistry, molecular biology, cell biology, to solve those problems. And that's because biology, as it used to be, was not a science that engineers could address very well. Because in order for engineers to really analyze, study, quantitatively develop models, and to build technologies, alter the parts, there's a lot of requirements on the science that really biology didn't satisfy. The actual mechanisms of function weren't understood. Yes, you could see that moving your arm required a certain force, and would bear a certain load, but you really didn't know what was going on down in the proteins and cells and tissues of the muscles and the bones. Okay? But still you could design maybe an artificial bone to do this. An implant. You didn't really know the molecular components, so how in the world could you actually manipulate the system, if you didn't even know what the molecules were, that are really underlying this? Okay? You couldn't really do the chemistry on the biological molecules. It's very hard to quantify, since if you didn't even know the parts and the mechanisms, how could you get quantitative measurements for them, develop models? So there's good reason why there never really was a biological engineering until very recently, because biology wasn't a science that was really suited for engineering analysis and engineering synthesis, and so therefore the world of biomedical engineering mainly involved all these application problems that I just talked about, that didn't necessarily require biology per se. But that's changed. Okay? The good news for you folks is biology has changed. It's now a science that engineers can in fact connect to very well.

If you like, go read my original post on the difference between technical accuracy and semantic accuracy. In that post, I determined that, counting only words that the autotranscription got wrong or omitted (not penalizing for extra words added, unlike on steno certification exams), the technical accuracy rate of the autotranscription was 71.24% (213/299 words correct). Two years and much supposed improvement later, the new transcription's technical accuracy rate is... Drum roll, please...

78.59% (235/299 words correct)

Now, I think it's important to point out that this video is essentially ideal in practically every respect for an autocaptioning engine.

* The rate of speech is quite slow. Speech engines tend to fail when the rate gets above 180 WPM or so.

* The speaker has excellent diction. Mumbling and swallowed syllables can wreak havoc with a speech engine.

* The speaker has an American accent. All the most advanced speech engines are calibrated to American accents, because they're all produced by American companies. There are programs that claim to understand various dialects of non-American-accented English (e.g. Scottish, English, Australian), but they're still many generations behind the cutting edge, because they got such a late start in development.

* The speaker is male. Speech engines have a harder time understanding female voices than male ones.

* The speaker is using a high number of fairly long but not excessively uncommon words. Speech engines are better at understanding long words (like "synthesis", "artificial", or "biochemistry") than short ones (like "would" or "weren't"), because they're phonologically more distinct from one another.

* The sound quality is excellent and there is no background noise or music in the video. Humans are able to listen through noise and pick out meaning from cacophony to a degree completely unmatched by any computer software. Even a speech engine that's performing quite well will fall completely to pieces when it's forced to listen through a small amount of static or a quiet instrumental soundtrack.

So if even a video like this can only attain a 78% technical accuracy rating, after two years of high-powered development from the captioning engine produced by one of the most technologically advanced companies in the world... Are you worried that it's going to supplant my 99.67% accuracy rating in another two years? Or ten years? Or 20? And that's just talking about the technical accuracy; I haven't even begun to get into the semantic accuracy. I'll have more to say on this subject in the next installment.

Monday, May 7, 2012

CART Problem Solving: Ergonomics

Monday, April 30, 2012

CART Problem Solving: Test Nerves

Monday, April 23, 2012

CART Problem Solving: Summer

Monday, April 16, 2012

CART Problem Solving: Lag

Monday, April 9, 2012

CART Problem Solving: Cash Flow

I can't recommend this book highly enough. When I started providing CART in 2007, I had no idea what I was doing. I'd only ever had full-time W-2 jobs before: A regular paycheck at fixed intervals for roughly the same amount each time. As long as I made sure to set up my finances so that I didn't overspend my monthly take-home pay, I was fine. When I started freelancing, everything got a whole lot more complicated. I'd invoice for a two-week stint of work and not get paid for it until a month or two months or sometimes even three months later. I stopped looking forward to winter holidays, spring break, and the summer months especially, because time I wasn't working meant time I wasn't paid for, so it felt less like a vacation and more like enforced unemployment. I always made ends meet, but sometimes it was a struggle to put anything aside, and as soon as the cash flow dried up, I'd have to dip into those savings to spackle over the gap. It was frustrating.

The Money Book gave me a few simple principles to live by, which I've been gradually trying to implement since 2010. The first and easiest was to set up an "overhead account". It felt weird, but I went to my bank, opened up a second checking account, and deposited enough to cover my monthly rent and health insurance. Then I tried to forget it existed. No matter what financial crisis might come up, I at least had one month's big fixed expenses covered. The first few months after I set it up, I emptied out the overhead account right before the beginning of the month, and filled it up as checks came in over the course of the next month, but I'm happy to say I haven't had to dip into the overhead account in over a year now, and it's just been sitting there as a nice chunk of security whenever I need it.

In my first post on The Money Book, I mentioned that I had been tracking all my cash purchases (personal and professional) with an app on my Blackberry and then manually importing those every few weeks into Mint. That didn't last long; the manual import process was just too tedious. I stopped tracking cash for a long time and actually only started again when Mint upgraded its Android app to allow on-the-fly cash tracking last September. I've been pretty good about it since then, though, and it's had two good effects: First, while I don't tend to carry much cash and use my debit card for most things over $20, I now know how much I spend on small purchases -- especially food, which is the most common thing I buy with cash. I've been able to incorporate my lunch money expenditures into my monthly food budget, which is helpful, and I now don't have a big mysterious lump of "Uncategorized" to deal with when I go over my spending in Mint. The second effect is also good. I've got a bit of a junk food habit (my big weakness is potato chips, especially the Honey Mustard Lays), but most bodegas have a $10 debit card minimum, so if I want to pop in for a quick bite of something unhealthy I usually have to pay in cash. But since I've been pretty good about getting in the habit of tracking any cash purchase I make, I've found myself spending less on junk food because when I weigh the crunchy deliciousness against the hassle of taking my phone out and punching through a handful of menus just to register a $2 purchase, I sometimes decide it's not worth the trouble. Beyond just junk food, it makes me generally more aware of where my cash is going and reduces unnecessary impulse buys, which is definitely a good thing.

I didn't start implementing the last and most important idea from The Money Book until the beginning of this year. They strongly recommend that you split each paycheck as soon as it clears and transfer a fixed amount into taxes, savings, and emergency accounts. (Also retirement, but I'm not quite there yet, though I just turned 31, so I've got to get on it pretty soon). I liked the idea as soon as I read about it, but I tended to get my money in irregular lump sums, once every month or so, and it seemed like as soon as I had paid the bills for one month, I had only a small amount left to stretch as long as it took for the next big check to come in. I didn't want to carve anything out of that check, for fear that the remainder wouldn't last me long enough, and I'd have to dip back into the accounts I put the money into, which I knew was a dangerous precedent to set. So I just kept all the money in my business account and prayed for three years that I'd have enough in there to cover taxes, not really knowing how much of the tax money I'd already spent or whether I'd make enough in the next quarter to pay taxes for the previous one. Add in the CART drought that tends to come with the summer season, and it was pretty nerve-wracking. I got by, but my savings accounts stayed flat for far too long. Human nature being what it is, I tended to spend a bit more than I should have when the checks came in all at once, leaving me stranded during the droughts because I hadn't put anything aside during the flush times.

What changed this year? Well, partly it was that I got sick of the uncertainty and decided this was the year to finally put the plan into action, but I have to admit that part of it was also striking a deal with one of the universities I work for. Rather than hiring me as a 1099 independent contractor, it put me on their books as a W-2 employee, with weekly paychecks and tax withholding. Otherwise there was no difference between my relationship with them and my relationship with the schools that pay me as a contractor; they didn't offer me benefits or expect exclusivity or anything like that. It just meant that I got less money up front, since they reserved some of what I earned for the IRS, but it also meant that I got paid weekly, regular as clockwork, and that bit of weekly security made me less anxious about the unpredictable ebb and flow of my other CART work. I started splitting each check I received -- including the paychecks that had already had their tax chunk taken out -- and putting each piece into its designated account. It was surprisingly easy to do, and I found that I had plenty of money left to live on. In fact, I didn't even really miss the difference. My emergency account plumped back up to where it had been before I took out deposit money for our apartment move last spring. My tax account filled steadily, greatly reducing my anxiety over whether I'd be able to pay the Feds this April. And best of all, my long-term hopes and dreams account finally budged from the small, sad figure it had been at for two years, giving me even more incentive to keep working and planning for the future. This W-2 deal is pretty uncommon in CART work, and I'm not relying on it; if that university runs out of students one semester and I pick up the slack at another university with a more conventional 1099 deal, I think I'll still have the self-discipline to keep splitting the checks. It seems scary at first, but it winds up making things much easier to manage in the long run.

So, to sum up, my tips for a stress-free life as a freelance CART provider?

* Get a good accountant. My first year, I did a walk-in at H&R Block. They were brusque and didn't really understand my business. The next two years, I did my own taxes online with TurboTax. It was fine, but I realized I was probably overpaying quite a bit, and since freelancers tend to get audited more than other people, it made me nervous not to have anyone backing me up. Last year and this year I used an accountant that specializes in self-employed and freelance workers, and it's made all the difference. I'm paying far less and I'm much less nervous that a misunderstanding or mistake will get me in big trouble.

* Keep on top of your clients. If they're habitually late payers, don't let it slide; the payment window will just get wider and wider as they try to see what they can get away with. Be polite but dedicated in following up on late payments. Don't let them see your running balance with them as a source of reliable interest-free business loans.

* Keep at least one month of your most important fixed expenses in an overhead account, and try to touch it as little as possible. If you have to take money out of it, prioritize filling it up before spending any money that comes in on optional expenses. It takes a while to build up a real emergency account, but you can probably set aside one month's worth if you try. The peace of mind it'll give you is inexpressible.

* Get a good cash tracker, preferably on your phone so you can enter transactions at will, though a small notebook is fine too, if you don't mind transferring it to your money software by hand. Debit card purchases are great because they can be automatically categorized by most money software, but it's important not to let cash slip through the cracks.

* As soon as you can, get in the habit of splitting each check when it comes in into separate accounts named Taxes, Emergency, and Savings. If you can, try to add Retirement in there too. Taxes should be between 20% and 30%, but the rest can start as small percentages and increase with time as you get more confident in the patterns of your cash flow. Keep these figures in online banks with high interest rates that don't give you a debit card, so you're not tempted to spend them on everyday purchases. Track them with your money software so you can see the numbers going up with each check; it's the motivation you'll need to keep yourself going through the lean times.

Again, if you want more information on managing an irregular cash flow, I really have to recommend The Money Book. It's taught me pretty much everything I know about balancing business revenue with personal expenses, and I'm much less nervous about money since I started following its advice. There's lots of stuff in there that I haven't even mentioned, so check it out. You won't be sorry.