Challenging Reality: Voice to Text

Continuing our productivity series, which began with my writing splint and will go on to look at touch screens and styluses before concluding with trackballs, we now look at voice-to-text technology.

I have been using voice-to-text software since 1998, when the technology was very much in its infancy. Version 2 of Dragon Dictate by Nuance Software was my preferred program at that time; their other product was Dragon NaturallySpeaking, which is what I'm using now. Dragon Dictate was a bit less capable at the writing aspect, but had greater control over cursor movement and other command functions. It no longer exists, but NaturallySpeaking has integrated many of Dictate's functions.

At the time, NaturallySpeaking was designed to interpret full sentences, as it is now, but it was not terribly accurate. I even had a voice profile trained to dictate exam answers to a personal voice recorder that could later be plugged into the computer and played back, with the software turning my recording into text. I never trusted it because, even if the teacher or instructor could listen to the audio file, one inaccurately typed negative would have completely changed my answer. Now, one misspoken word or missed error can make for an undecipherable statement or a wonderful laughing fit, as my best friend will confirm. This is especially true when chatting quickly and inaccurately, possibly after the intake of beverages that may result in reduced enunciation.

At that time, it took at least four hours of training before the computer started to show any semblance of accuracy and consistency in recognizing what I was saying. The best microphones available at the time, such as the one pictured above, were quite expensive and paled in sound quality compared to the $17 USB headset I'm using right now. The computers of the time were slow: a measly 266 MHz Intel processor with a 6 GB hard drive and probably 256 MB of RAM first powered the software I used. The laptop I had the software installed on had a 200 MHz processor, a 2 GB hard drive and likely 128 MB of RAM. To say that dictation was smooth and effortless would be a lie. After each completed phrase or sentence, a pause was required to allow the autocorrect box to appear with its 10 options to choose from, one of which was hopefully what you actually said if the program got it wrong on the first try. It was tedious at times.

Now, except for pausing when I want to check how it interpreted the last thing I said, I can speak multiple sentences without stopping and get near-perfect accuracy. In fact, the more I say and the larger the words I use, the more accurate it is. More letters mean there are fewer words with those sounds and syllables, so the likelihood of it being accurate increases. It really does encourage sophisticated writing, which is not a problem for me.

Back then, dictating on a sluggish computer with ever-growing software was only marginally faster than standard typing. As with so many computer-related things, the data is always pushing the boundaries of the hardware. Video files keep getting bigger with the unbelievable 4K resolution that is now emerging, and video games keep requiring better video cards and sound cards, additional hard drive space and other hardware upgrades. Digital cameras continue to produce higher-megapixel files, which require more hard drive space (the megapixel race is over, so please don't be fooled any longer into thinking that more megapixels means a better camera, but I digress). What once seemed like far more computer than you would ever need becomes laughably inadequate for most current applications within five years.

There was a long period when I stopped using the software, which began when I decided to upgrade my computer. You see, the hundreds of hours I had spent refining and updating my voice file had caused it to grow to 8 MB. This was still the time of 56K modems and floppy disk drives. No media available to me could store that one voice file for transfer to the new computer, apart from 100 MB Zip drives, which were outside my budget. Not wanting to start from scratch, I stopped using the software for some time until I was encouraged to explore it again.
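For anyone curious about the numbers, here is a rough back-of-the-envelope sketch in Python. It rests on my own assumptions rather than anything measured: that "8 MB" means 8 × 1024 × 1024 bytes, that a floppy is the standard 3.5-inch "1.44 MB" disk (1,474,560 bytes), that a Zip disk holds 100 million bytes, and that a dial-up modem actually reaches a best-case 56,000 bits per second, which it rarely did. The function names are made up for illustration.

```python
import math

# Assumed capacities and speeds (illustrative, not measured values).
FLOPPY_BYTES = 1_474_560          # standard 3.5-inch "1.44 MB" floppy
ZIP_BYTES = 100 * 1_000_000       # 100 MB Zip disk, decimal megabytes
MODEM_BITS_PER_SECOND = 56_000    # best-case dial-up speed

def disks_needed(file_bytes: int, disk_bytes: int) -> int:
    """Number of disks required if the file were split across them."""
    return math.ceil(file_bytes / disk_bytes)

def transfer_minutes(file_bytes: int, bits_per_second: int) -> float:
    """Best-case transfer time in minutes, ignoring protocol overhead."""
    return file_bytes * 8 / bits_per_second / 60

voice_file = 8 * 1024 * 1024      # the 8 MB voice file, assumed binary megabytes

print("Floppies needed:", disks_needed(voice_file, FLOPPY_BYTES))
print("Zip disks needed:", disks_needed(voice_file, ZIP_BYTES))
print("Modem minutes:", round(transfer_minutes(voice_file, MODEM_BITS_PER_SECOND)))
```

Under those assumptions the file would not fit on any single floppy; it would have had to be split across six of them, while one Zip disk would have held it with room to spare.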

By then the training time had diminished vastly. The accuracy after a mere 30 minutes of training was nearly perfect and, with a few features disabled, the hardware could keep up with the software to make for fluid writing. Now I keep the program running all the time and even flip on the microphone for a quick comment online.

I won't deny that I feel a little bit of jealousy when I think about how effective Google Voice, and even Siri, can be for people who have never used them before. Without any training, and with only an Internet connection, these services are remarkably good at converting almost anyone's speech to accurate text. No hundreds of hours of training and refinement, just effective software. However, I have been enjoying the benefit of this technology a lot longer than most people, and even those who have had smartphones much longer than I have seem amazed when I use Google Voice effectively.

Where these new technologies fail is in their ability to be customized. Dragon NaturallySpeaking can accurately insert any and all punctuation, navigate throughout a body of text or a window, and even add phrases specific to your needs. Three of my more common trained phrases are D700 (my camera), Help-Portrait and Jay. Without training you are most likely to get "D 700", "help portrait" and "J". By training them I save myself multiple words on a regular basis. In the case of Help-Portrait, to get that result without training I would have to say "cap help no space hyphen no space cap portrait". Until you get your brain wrapped around that, it is a little bit confusing.
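If the spoken-command version is hard to picture, here is a toy sketch in Python of how a phrase like that could be turned into text. To be clear, this is not how Dragon NaturallySpeaking works internally; the function name and the three commands it understands ("cap", "no space" and "hyphen") are my own simplification for illustration only.

```python
def render_dictation(spoken: str) -> str:
    """Turn a spoken phrase with simple formatting commands into text.

    Hypothetical toy interpreter supporting only three commands:
      "cap"       capitalize the next word
      "no space"  do not put a space before the next piece of text
      "hyphen"    insert a "-" character
    """
    words = spoken.split()
    pieces = []               # output text, built up piece by piece
    capitalize_next = False
    suppress_space = False
    i = 0
    while i < len(words):
        word = words[i]
        if word == "cap":
            capitalize_next = True
            i += 1
            continue
        if word == "no" and i + 1 < len(words) and words[i + 1] == "space":
            suppress_space = True
            i += 2
            continue
        text = "-" if word == "hyphen" else word
        if capitalize_next:
            text = text.capitalize()
            capitalize_next = False
        if pieces and not suppress_space:
            pieces.append(" ")
        pieces.append(text)
        suppress_space = False
        i += 1
    return "".join(pieces)

print(render_dictation("cap help no space hyphen no space cap portrait"))
print(render_dictation("help portrait"))
```

The first call prints Help-Portrait and the second prints help portrait, which is the difference between nine spoken words and two once the phrase has been trained.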

Dragon NaturallySpeaking can also be told to assume you have an accent when you begin the initial training. This way it can accurately write those words that you may not pronounce perfectly if English is your second language or you come from somewhere where people tend to have a strong accent. And, as much as I mourn the rapid decline of proper English usage, there is even a profile or two identified as teen-speak, which can include and adapt to numerous slang terms and acronyms. There is a glimmer of hope, though, as the auto-punctuate function might just help a few people learn the appropriate places to insert dashes, commas or even a semicolon.

After 16 years of using this software, my brain is thoroughly trained to use it. When I was deciding between upgrading my current Dragon NaturallySpeaking and switching to Windows' built-in dictation software, I had a very difficult time using the variations of commands in the Windows version. I'm not saying the Windows software is not effective, but after using a certain program for that long it is well ingrained in my brain. My wife will attest that, on the occasions she replies to an e-mail or text for me, I often speak the punctuation when telling her what I would like my reply to say, just as I do with this software. The point is that if you do decide to give the software a try, do your research and choose one, because once your brain is trained, switching between programs will be like learning another language. I think I have done a fairly good job, when I think while using it, of switching between dictating with Dragon NaturallySpeaking and dictating on my phone with Google Voice.

Source: The software can be purchased online or in most stores that sell computers. The microphone I use was purchased from DealExtreme. When searching for a microphone, read reviews to see how effective it is for dictation, as some are better than others for that purpose.

Monday, December 23, 2013
