Tuesday, May 21, 2013

Variables in Wireless Captioning

The end of the semester is looming, and with it I'm taking on more work outside of my ordinary daily academic CART schedule. Last week I did an awards ceremony and a graduation ceremony, and yesterday I captioned one of the monthly Songbook performances at the New York Public Library. All three of those events had one thing in common: Wireless captioning. In each case, I was given a partial script of the event, which I was able to feed line by line to the client's screen. Other parts were CARTed live, so I had my steno machine at the ready to switch off from line feeding when necessary. This necessitated a split screen view in Eclipse, which was very different from the clean, stripped-down view I like to use with my clients. In addition, two of the events had multiple viewers, seated at a distance from one another, and the event organizers didn't want me to project open captions to a big screen at the front of the venue. Wireless captioning to the rescue!


[Image: Samsung Galaxy Tab]

[Image: Microsoft Surface Pro]

I used my laptop to send the script and monitor my CART output, with pending translation display turned on to give me an extra 1.5 seconds of error correction, since the client wasn't reading my screen and wouldn't be forced to read Eclipse's confusing markup syntax. Then I used Streamtext to send the captions to web browsers on my Microsoft Surface and Samsung Galaxy Tab 2 (to replace my dear old Samsung Q1, now on its last legs and looking somewhat junky), as well as to the smartphones and iPads of any audience members who pointed their browsers to the caption feed's URL. According to the guy in the NYPL sound booth, there were about 15 people using their own equipment to view captions yesterday, which is probably a record at that particular event. Why Streamtext? Well, there are a few options for wireless captions, with pros and cons for each:

* Screen sharing apps such as ScreenLeap or Join.me.

* Peer-to-peer connections such as Teleview and Bridge.

* Free document collaboration services such as Etherpad or Google Docs.

* Instant messaging applications such as Google Talk.

At these particular events, I didn't want to share my screen, since it was split into two unsightly panes, and since screen sharing is usually restricted to specific devices, while I wanted the captions to be accessible to any number of audience members without having to hook up their equipment individually. Screen sharing is also heavier on bandwidth than simple text streaming, doesn't allow the caption viewer to scroll back and review captions they might have missed, and is prone to lag, especially as more devices are added to a single screen. Peer-to-peer connections such as Teleview and Bridge have a lot of potential, and many of my colleagues have used them, but I've been reluctant to rely on them after experiencing several problems with freezing, broken connections, and incompatibilities with institutional Wi-Fi. Since reliability is all-important when you're not on hand to troubleshoot every caption viewer's device, I prefer server-hosted text streaming services. That way, if the connection drops on the user's end, they just have to refresh their browser once their connection resumes, and the captions start streaming again as usual. If the connection drops on the captioner's end, the captioner has to reset the connection and then wait for users to refresh their browsers.

That's not ideal, but it's better than peer-to-peer services, which require both provider and users to go through a synchronized handshaking process whenever either party drops a connection. Instant messaging applications are similarly limited to predetermined lists of users, which wouldn't have worked for me in this situation (though they've proven helpful as a stopgap during one-on-one CART when the text streaming service has a sudden outage and I know the user's IM identity). Additionally, instant messaging requires the captioner to press "enter" after every line of text, which slows the rate of captioning, and it doesn't tend to support script feeding. Document collaboration services don't tend to support script feeding either, though they can be useful in live captioning situations that don't require it. However, the collaborative editing features often prove more of a hindrance than an asset, and the interface isn't always as clean and simple as I'd like.

So as of right now, Streamtext is my go-to service. It's server-hosted, reliable, supports line-by-line script sending, and can connect any number of users on several different devices by streaming the captions to a single URL accessible from nearly any web browser. It also offers font and color settings that can be preset by the captioner or adjusted by each user. The only real disadvantage is that, like most good things in life, it costs a pretty penny, starting at $6/hour and increasing from there, depending on how many users are connected at a time. At times, my Streamtext bill has exceeded $400/month, and while it's deductible as a business expense, it hasn't been much fun to pay that bill, knowing that I might have been able to get by with a cheaper or entirely free service instead. Still, I've been burned too often by inconsistent services to want to switch from Streamtext without an extremely compelling reason.

So what other variables are at play, besides the text streaming service? Well, if you're supplying devices to clients instead of requiring that they provide their own, you'll want to configure them properly. In my case, the Surface was easy; I just pointed Chrome at my all-purpose text streaming URL (http://stenoknight.com/nypl, since I first set it up for use at the New York Public Library, and have been using it for various other purposes ever since), which redirects to http://www.streamtext.net/player?event=nypl. That way, as long as I name my Streamtext event "nypl", users only have to type in my short Stenoknight.com URL instead of the long, awkward Streamtext one. I've put a link to the site on both my Surface and my Galaxy Tab 2 for quick and easy access. On the Galaxy Tab 2, I initially tried Streamtext with the default browser and then with Chrome for Android, but neither of them supported full-screen viewing, and I didn't like how much real estate was taken up by the address bar and browser UI, so I installed the Dolphin Browser, which supports simple toggling in and out of full-screen mode. The result is a clean, simple text-only interface on both tablets, with adjustable font sizes and seamless switching between portrait and landscape modes, to fit each client's preferences. One of the three events I captioned was held outdoors, so I cranked up the contrast on both devices, with large white text on a black background, to compensate as much as possible for glare.
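If you want to set up a similar short URL yourself, almost any web host's redirect or forwarding feature will do the job. Purely as an illustration, here's a minimal sketch in Python with Flask (the route name and port are hypothetical; only the Streamtext player URL and the "nypl" event name come from my actual setup) showing how a short, memorable URL can forward viewers to the long Streamtext player URL:

    # Hypothetical sketch: forward a short URL to the Streamtext player page.
    # Assumes a host that can run Flask; a one-line server redirect rule
    # (or an HTML meta refresh) accomplishes the same thing.
    from flask import Flask, redirect

    app = Flask(__name__)

    STREAMTEXT_EVENT = "nypl"  # must match the event name set up in Streamtext

    @app.route("/nypl")
    def caption_redirect():
        # Viewers type the short URL; their browser lands on the caption feed.
        return redirect(
            "http://www.streamtext.net/player?event=" + STREAMTEXT_EVENT,
            code=302,
        )

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)

The only thing that has to stay in sync is the event name: whatever the Streamtext job is called, the redirect has to point at the matching ?event= value.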

The last and ultimately most crucial decision was which internet connection would keep everything running smoothly. My preference when providing remote or wireless captioning is always to connect my captioning computer to a wired broadband Ethernet connection such as the one I use in my home office, but that's not possible in every venue. At all three recent wireless captioning events, I had access both to institutional Wi-Fi and to the connection offered by my 4G wireless modem/hotspot, but my decision on which to use varied wildly with the circumstances. At the awards ceremony and the library gig (both indoors), the institutional Wi-Fi was strong and steady, faster than my 4G modem and more responsive, with significantly less lag. At the outdoor graduation ceremony, however, the situation was reversed. The Wi-Fi signal was weak and patchy, dropping frequently and lagging badly. My 4G modem, on the other hand, had a strong signal throughout, and I quickly switched all my devices over to it from the Wi-Fi during setup. The only disadvantage there, of course, is that the 4G modem's hotspot has a limited range, and I was concerned that my client's connection would drop if the Galaxy Tab were brought up to the stage during the actual diploma-granting portion of the ceremony. My client decided to go without captioning for that part of the event, so it was never put to the test, but the range limitation is definitely something to keep in mind when choosing a hotspot-based internet solution over institutional Wi-Fi. I've heard of services such as Connectify, which claim to consolidate multiple internet connections as a sort of failsafe mechanism, but I haven't yet given them a try; definitely something to investigate now that the semester's wrapping up.

So those are some things to consider for onsite streaming to multiple wireless devices at public events. Please feel free to share your own tips and tools if you solve these problems differently! There's always something to learn in this business, and the technology is advancing all the time, so it's important to stay as up to date as possible. As more people start carrying smartphones and tablets, essentially providing their own caption-viewing devices, I foresee a boom in open-URL wireless captioning at public events, and we captioners will need to be able to offer it.

Tuesday, May 14, 2013

Former CART Client Wins NSF Fellowship!!

CART providers are bound by rules of confidentiality not to disclose the names or details of people they've captioned for, but in this case my client graciously allowed me to use her name and link to her information. A few years ago, I captioned several classes (including Latin, one of my all-time favorite subjects) for Navena Chaitoo, an undergraduate at Fordham University up in the Bronx. Now she's graduating, and yesterday she informed me that she won a Graduate Research Fellowship from the National Science Foundation to further her education in public policy and management at Carnegie Mellon University! From the article she sent me:

“I was diagnosed with a severe-to-profound hearing loss when I was about 5 years old, and at the time, my audiologists relied on the latest medical studies to determine that I would probably never graduate high school,” said the Brooklyn native. “Ultimately, my parents knew better and saw to it that I had all the accommodations necessary to offset my hearing loss, which allowed me to be as successful as I am today.”

[...]

Chaitoo will continue research she began at Fordham on the economic wellbeing of persons with disabilities in the United States, particularly the indirect as well as direct medical costs of persons with disabilities—a topic in which she has been personally invested.


Navena is only one of countless examples demonstrating how important accommodations can be, and how much can be achieved if they're put in place. The communication access came from CART providers like me and the other captioners who've worked with her, but the brilliance, insight, and dedication all came from her. This woman is amazing, and I'm honored to have played a part in her success. I know she'll just keep going up and up from here, and I'll definitely be watching to see the great things she does in the future.

Tuesday, May 7, 2013

Thresholds and Tolerance

I'm not a fan of starting a blog post by quoting the definition of the topic in question; it's virtually always just a lazy attempt to co-opt some of the dictionary's presumed authority or credibility and doesn't add anything of substance to the author's argument. That said...

"Tolerance is the permissible limit or limits of variation in a measured value or physical property of a material, manufactured object, system, or service. [...] A variation beyond the tolerance [...] is said to be non-compliant, rejected, or exceeding the tolerance."

I'm quoting this definition because it refers to a specific technical meaning of an otherwise well-known word. Most people aren't familiar with "tolerance" used in this sense, but it's a useful concept not just in mechanical engineering but in the provision of transcription services for Deaf and hard of hearing students and professionals. In my CART Problem Solving series, I addressed the popular misconception that a tolerance of 90% accuracy is acceptable, because most people think of 90 and 100 as rather large numbers that are pretty much equivalent to each other, even though language is such a fine-grained system that 100 words constitutes only about a paragraph of text, and a 10% error rate works out to an error in just about every sentence. I also talked about the ways in which human captioners are able to use lateral context clues to fill in the gaps left by non-ideal audio conditions, while automated speech recognition systems, outside of a perfectly amplified, perfectly enunciated standard American accent, go from almost adequate to laughably awful perilously quickly.
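To make that arithmetic concrete, here's the back-of-the-envelope version, with an assumed average sentence length of about 12 words (sentence lengths obviously vary; the 12-word figure is an illustration, not a measurement):

    # Back-of-the-envelope arithmetic behind the "90% is good enough" misconception.
    # The 12-word average sentence length is an assumption for illustration only.
    accuracy = 0.90
    words_per_paragraph = 100                # roughly one paragraph of text
    avg_words_per_sentence = 12

    errors_per_paragraph = (1 - accuracy) * words_per_paragraph             # 10 errors
    sentences_per_paragraph = words_per_paragraph / avg_words_per_sentence  # ~8 sentences
    errors_per_sentence = errors_per_paragraph / sentences_per_paragraph    # ~1.2

    print(errors_per_paragraph, round(errors_per_sentence, 1))  # 10.0 1.2

Ten errors per hundred words is roughly one error in every sentence you read.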

Tolerance enters the captioning sphere in other cases as well. Speed, for instance: if a professor's average rate of speech is 160 words per minute (quite a bit below the typical rate of speech, which tends to be between 180 and 220 WPM), a stenocaptioner (AKA a CART provider like me) with a speed of 240 words per minute will be able to achieve virtually 100% accuracy, because any errors can be immediately caught and corrected. A text expansion provider (using a system such as C-Print or TypeWell) may have a speed of 140 words per minute or so, which means that if the professor's rate stays completely steady all the way through, they will probably be able to capture a good 85% of what's spoken. Since they're human and not just a mindless speech recognition system, they will give preference to writing down important things (names, technical terms, relationships between concepts), and will try to make sure that the remaining 15% of speech that they're too slow to capture consists mainly of "Um", "Uh", "You know", repeated words, irrelevant asides, and inefficient phrasing that can be tightened up and paraphrased to use fewer keystrokes. In some cases, that will be enough. The professor's speed will never rise above 160 WPM throughout the entire class, and there will be plenty of chaff to ignore, leaving enough time to take down the important content, even though the provider's writing speed is lower than the professor's average rate of speech. By contrast, the stenocaptioner will probably choose to leave out the "Um", "Uh", and "You know" sorts of filler words for clarity's sake, but will not omit repeated words or attempt to paraphrase the professor's wording, no matter how inefficient it might be. Stenocaptioners are focused on providing a verbatim realtime stream, only omitting words that add absolutely no value to understanding, while text expansion providers are focused on tightening up whatever they hear so that it can be written in as few keystrokes as possible.

So far, so good. This is a case where stenocaptioning and text expansion are more or less equivalent, and the difference lies mostly in whether the client wants the pure, unmediated words of their professors to interpret for themselves, or whether they'd rather have a condensed version of the information delivered in class, more along the lines of the bullet points on a PowerPoint slide.

Change any of the factors in play, and the results will be very different. For instance, say the professor's average rate of speech is still 160 words per minute, but that's because his rate is 135 when he's writing formulas on the board (about half the class) and 185 when he's explaining what the formulas mean (the other half of the class). Or it's 140 for long stretches at a time, when he's lecturing on the information mandated by the syllabus, but it shoots up to 200 for brief moments, when he gets excited about a particular detail of whatever he's talking about. The stenocaptioner, whose top speed is 240 WPM, can still capture virtually 100% in all of these situations. The text expansion provider, on the other hand, will be able to handle the 135 WPM formula sections almost perfectly, but will start cutting or condensing words and phrases from the 185 WPM sections, and will be forced to leave out over a quarter of the material from the 200 WPM sections. If this particular professor has a tendency to repeat words, insert lots of filler words, pause between sentences to take a drink of water, or otherwise speak in a lightweight, inefficient way, the text expansion provider might still deliver a workable portion of the class's important material, because there will be enough less important stuff to cut out, leaving reserve speed to write down the good parts.
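To put rough numbers on those scenarios, here's a simple model I sketched up purely for illustration (not a tool either kind of provider actually uses): assume that in any stretch of a lecture, a provider can capture at most the ratio of their writing speed to the speaker's rate, and that every word counts equally, which of course ignores the triage both kinds of providers do in practice. The time splits per segment are assumptions chosen so each scenario averages out to roughly 160 WPM.

    # Illustrative model only: capture in each segment is capped by the ratio of
    # writing speed to speaking speed; every word is weighted equally, which
    # ignores the real-world triage of important vs. unimportant material.

    def coverage(writing_wpm, segments):
        """segments: list of (speaking_wpm, minutes) pairs for one lecture."""
        words_spoken = sum(wpm * mins for wpm, mins in segments)
        words_captured = sum(min(wpm, writing_wpm) * mins for wpm, mins in segments)
        return words_captured / words_spoken

    scenarios = {
        "steady 160 WPM": [(160, 60)],
        "135 WPM formulas / 185 WPM explanation": [(135, 30), (185, 30)],
        "140 WPM lecture with 200 WPM bursts": [(140, 40), (200, 20)],
    }

    for name, segments in scenarios.items():
        steno = coverage(240, segments)        # stenocaptioner at 240 WPM
        expansion = coverage(140, segments)    # text expansion provider at 140 WPM
        print(f"{name}: steno {steno:.0%}, text expansion {expansion:.0%}")

The whole-lecture averages look deceptively close, but per segment the picture is starker: during the 185 WPM stretches a 140 WPM writer tops out at about 75% of the words, during the 200 WPM bursts at 70%, and against a 280 WPM speaker at exactly half, before you even account for which words are the important ones.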

If, on the other hand, the professor is an accomplished speaker who says precisely what she means in precisely the way she means it, if her lectures are a constant stream of dense technical jargon and precise, specific descriptions of how everything fits together, if there's no chaff or filler to cut out and no awkward repetitions to rephrase... the text expansion provider is out to sea. They've got to start cutting important material in favor of keeping vital material, and that becomes a dangerous guessing game with the grade of the student they're transcribing for at stake. Text expansion services acknowledge this to a certain extent; they tend to recommend CART when the material is technical or highly precise, such as in the graduate and professional programs that I specialize in. And admittedly, there are some classes and some subjects and some professors where a 140 WPM typing speed, slow as it is compared to a stenocaptioner's 240 WPM, is enough to deliver most of the important material given in class.

The question is: How do you tell which situation you're dealing with? If you're a disability director trying to decide between hiring a text expansion provider or a certified CART provider for a given student's schedule of classes, it may seem obvious to choose the former, since text expansion services are cheaper and more widely available. But have you audited the professors in all of the classes in question? Does their average speed always stay under that 160-180 WPM sweet spot? Is there enough extraneous speech to discard and paraphrase without losing important information? Are there ever spikes of higher speeds, and if there are, can you guarantee that none of that high-speed material will appear on the test? Have you checked to make sure that there won't be any guest lecturers or student presentations during the course of the semester? Guest experts, since they're not used to speaking to students, tend to speak at 200 to 220 WPM or higher. One that I transcribed a few years ago spoke at 280 WPM, and I found myself starting to do the same sort of paraphrasing and chaff-cutting that my text expansion colleagues do as a matter of course. I think I managed a good 90% to 95% of the relevant material given in that lecture. But I didn't reach that paraphrasing threshold until I encountered a speaker at the high end of the rate-of-speech bell curve; for text expansion providers, it's their starting point. They don't have any speed in reserve, and if there's nothing extraneous to cut out, they start losing important material very quickly. Give them a 280 WPM speaker, and they're losing a full 50% of everything that's spoken.

Of course, you could make the argument that most students without hearing loss don't take in 100% of every lecture. They might daydream or nod off, experience a moment of inattention, miss a word or two here or there while skimming through their notes from the class before. Even without getting every word of every lecture, many students do quite well. But where's the cutoff? How many words can you lose and still receive equal access? Which words can you leave out and which must you absolutely leave in? Who do you trust to make that call? It all comes down to tolerance.