Thursday, December 15, 2011

Review: Siri Dictation Accuracy


How does Siri compare with Dragon Dictation?


Summary:

If you want to dictate text on an iOS device, there is no substitute for Siri -- especially not the free Dragon Dictation app. However, for Siri to work at its best you need to use a headset and, if you really want to do voice input efficiently, there is still no substitute for a proper computer.


Introduction
The headline feature of the new iPhone 4S is, of course, the built-in, voice-controlled virtual assistant, Siri. However, while most discussions of Siri have focused on “her” performance as an assistant, this review will focus purely on the speech recognition accuracy of the feature. This raises the question of whether the voice-based text entry feature of the 4S is technically part of Siri or something else but, since the voice entry feature is not available on any other iPhone, for the purposes of this review the dictation feature will simply be referred to as “Siri”.

The accuracy of Siri's speech recognition is of interest to a number of parties. First, many users who have not upgraded to an iPhone 4S may be wondering whether, assuming they can forgo the virtual assistant features, the free Dragon Dictation app might be almost as good a way of achieving quick and efficient bulk text entry on their older phone, without shelling out for a new 4S. For iPad users, of course, the Dragon app is still the only option. Furthermore, the accuracy of the speech recognition is of interest to 4S owners, even those who rarely use Siri for text entry, because accuracy is equally important when instructing Siri to carry out a task.

This review compares the performance of Siri for voice-based text entry with that of Dragon Dictation under a number of different circumstances.



Objectives

This experiment sought to identify at least preliminary answers to the following questions:
  • Is there any significant difference in performance between Siri and the Dragon Dictation app? (This is particularly important given that one option is essentially free and the other requires most users to purchase an entirely new phone.)
  • Indeed, is there any indication that the two in fact use the same back-end processing?
  • Is there any notable difference in performance between using the built-in microphone and Apple headset for speech input? (Also, how does the built-in microphone on the iPad 2 compare?)
  • How do these portable and network-dependent modes of speech input compare to a fully fledged PC processing the speech locally?
  • How feasible is the use of iPhone/iOS dictation software for large-scale text input using voice?



Experiment Design

When testing the accuracy of speech recognition, it is usually necessary to have a set of standard text that is read out using each system, and then to compare the transcription against the original and count the differences. Unfortunately, in practice this seems to give somewhat overinflated accuracy results compared with what might be expected when dictating “off the cuff”, probably because text that is read tends to be spoken more clearly than text the brain is still formulating as it is pronounced. However, this approach at least gives results that are relatively comparable.

As part of the experiment I also used the Dragon NaturallySpeaking software on a Windows PC, effectively as a benchmark of speech recognition performance. Because of this, I sought to avoid choosing text containing specialised terms that the computer software was already familiar with from my past usage, which would give it a significant advantage over Siri and the Dragon Dictation app. I also wanted to select text that did not contain a large number of names, as these would be more likely to trip up the software and thereby (somewhat artificially) inflate the error rate. Finally, I wanted to use text that had probably never been dictated to the software before and so would be new to it.

In the end I settled on the first part of the Steve Jobs Stanford commencement speech. The text contains very few names (essentially only "Macintosh", "Reed College" and "Hare Krishna") and is otherwise relatively straightforward, although there are a few slightly tricky words and phrases such as: “proportionally spaced fonts”, “serif and sans serif typefaces”, “drawer”, “calligraphed”, and “dorm room”. While it is possible that this text has been dictated to these systems before, it seems somewhat unlikely, and the results don't suggest it. Only the first seven paragraphs were used, totalling 727 words. This provided enough text to get a reasonable measure of recognition accuracy and is also probably close to the upper limit of what anyone would ever want to dictate on a mobile phone.

The results were analysed by recording the number of words from the original source text that were wrong in some way in the transcribed text. A separate count was kept of how many source words were omitted from the transcription entirely. If the transcription differed so slightly that a typical user probably wouldn't bother to correct it (for example, concatenating two words that could be written either separately or with a hyphen), the difference wasn't counted. However, all other errors that would require correction were included.
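For anyone wanting to replicate this, the tallies in this review were made by hand, but a script along the following lines could automate a similar count. This is only a rough sketch: Python's difflib word alignment stands in for manual judgement, and the leniency towards hyphenation and concatenation differences described above is not implemented.

import difflib

# Count misrecognised and omitted source words by aligning the source
# text with a transcription at the word level. A rough equivalent of
# the manual tally used in this review (hyphenation cases omitted).
def count_errors(source, transcript):
    src = source.lower().split()
    hyp = transcript.lower().split()
    wrong = 0    # source words transcribed as something else
    omitted = 0  # source words missing entirely from the transcript
    ops = difflib.SequenceMatcher(a=src, b=hyp).get_opcodes()
    for op, i1, i2, j1, j2 in ops:
        if op == "replace":
            wrong += i2 - i1
        elif op == "delete":
            omitted += i2 - i1
    return wrong, omitted

print(count_errors("truth be told I never graduated from college",
                   "truth be told I graduated for college"))  # (1, 1)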

A number of different combinations were tested. Obviously Siri was used to input text. This was put into the Notes application and then copied and pasted into a Word document in DocsToGo. Because the amount of text that Siri can transcribe at a time is limited, the source text was broken up into a series of chunks and each of these dictated consecutively. Direct Siri transcription into DocsToGo was attempted, but this failed after the first chunk of source text due to what appears to be a bug in the DocsToGo app. The Dragon Dictation app was also used, with the same chunks dictated consecutively as with Siri; when complete, this text was likewise copied and pasted into DocsToGo.

All of this transcription was performed on a 32 GB iPhone 4S running iOS 5.0.1 and connected to a fast Wi-Fi network. A sample run using the Dragon Dictation app was also performed on a 32 GB iPad 2 for comparison. Finally, the same text was dictated using the Dragon NaturallySpeaking program on a Windows XP Dell Latitude laptop (2.4 GHz Core 2, 4 GB of memory) with a Logitech USB headset. On the day of the test, Dragon NaturallySpeaking recorded an audio input quality value of 20, which is acceptable although not particularly high.

To compare the effectiveness of the inbuilt microphone with the Apple-supplied headset, both combinations were tried with Siri and with the Dragon Dictation app. Repeat runs were also performed with both Siri and Dragon Dictation (using the microphone) in order to identify any “learning” being performed by the software, as well as any improvement in the dictation itself over time. Background ambient noise was measured at approximately 30-35 dB and remained relatively consistent for each run. The experiment was conducted in a relatively quiet private office and represents probably about as close to optimal conditions as a user could reasonably expect.
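Incidentally, the chunking itself was purely mechanical and could be scripted along these lines. Note this is an illustrative sketch only: the 35-word chunk size is my own guess at a comfortable utterance length rather than any documented Siri limit, and the file name is hypothetical.

# Split the source text into dictation-sized chunks. The 35-word chunk
# size is an illustrative assumption, not a documented Siri limit.
def chunk_words(text, size=35):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

with open("jobs_stanford_excerpt.txt") as f:  # hypothetical file name
    for n, chunk in enumerate(chunk_words(f.read()), 1):
        print(f"--- Chunk {n} ({len(chunk.split())} words) ---")
        print(chunk)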



Results and Discussion

Given that the online tech press and various blogs have suggested that Apple is using Dragon technology to process Siri's speech input, my expectation was that the performance of the Dragon Dictation app would essentially mirror that of Siri for text input and transcription accuracy. In fact, the difference between the two was quite obvious.

The graph below shows the percentage accuracy of the various combinations tested. The first thing to note is that these absolute figures don't give a particularly useful picture of how well each combination performed. In particular, the magnitude of the differences in performance between the combinations is not well reflected by the percentages shown. It is therefore perhaps best to read this graph simply as a ranking of performance.




Not surprisingly, the benchmark system of Dragon NaturallySpeaking on the laptop performed best, while Dragon running on the iPad 2 using the internal microphone performed worst, perhaps indicating that the microphone on the iPad 2 is not as well optimised for capturing voice as the iPhone's. When comparing the microphone with the headset for either piece of software, the headset shows a clear advantage.

The graph also demonstrates that, whether using the microphone or headset, Siri outperforms Dragon. However, as stated previously, these absolute numbers don't give a meaningful indication of the actual magnitude of the difference in performance between the combinations tested. While 98% is clearly superior to 84%, even the smaller figure sounds very good, and a difference of 14 percentage points does not seem particularly significant. Furthermore, the difference between the laptop on 98% and Siri using a headset on 97% sounds negligible. A look at the raw results shows this is not the case.



The graph above, showing the raw number of word errors per system, gives a much better idea of performance. From this graph, we can now see that using the Dragon Dictation app on the iPad in fact resulted in more than nine times as many word errors as the NaturallySpeaking benchmark on the laptop. In terms of the effort required to edit and correct the text, this is night and day. Similarly, while the difference in percentage accuracy between NaturallySpeaking on the laptop and Siri with the headset was only 1 percentage point, the raw results show that Siri in fact had almost twice the number of errors.

However, the most important result from this graph is that Siri clearly outperforms the Dragon Dictation app in all comparable circumstances. For example, when using the headset, Dragon made exactly twice as many errors as Siri (50 compared with 25). When using the microphone, Siri also handily outperformed Dragon across both runs: Dragon made 84 errors compared with Siri's 65 on the first run, and 91 compared with 55 on the second. Over the experiment, then, Dragon's performance ranged from approximately 30% to 100% worse than Siri's. In fact, Dragon using the headset was only a little better than Siri using the inbuilt microphone. Given that each error necessitates a frustrating and time-consuming correction, this is a significant difference.
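As a quick sanity check, the raw counts convert to the rounded percentages in the first graph via accuracy = (727 - errors) / 727, and the same arithmetic yields the relative differences just described:

# Convert raw error counts into percentage accuracy over the 727-word
# source text, then compare error counts between systems directly.
TOTAL = 727

def accuracy(errors):
    return 100 * (TOTAL - errors) / TOTAL

print(round(accuracy(25), 1))  # Siri, headset   -> 96.6 (~97%)
print(round(accuracy(50), 1))  # Dragon, headset -> 93.1
print(round(50 / 25, 2))       # 2.0  -> 100% more errors for Dragon
print(round(84 / 65, 2))       # 1.29 -> ~30% more on the first mic run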

These results very strongly suggest that the speech recognition processing performed by Siri is in fact quite different to that used by the Dragon Dictation app and that Siri's processing is (perhaps surprisingly) clearly superior. It has been reported that Dragon has licensed its speech recognition technology to Apple for use in Siri, but it does appear that Apple has done something to significantly improve recognition performance. The alternative is that there is some technique for improving recognition that Dragon is deliberately not employing when processing the speech from its own software -- even though it has made the feature available to Apple. If so, this may be for reasons of economy, as additional processing would likely require greater back-end resources. However, it's not exactly a great advertisement for the desktop software the company is trying to sell -- even though that software actually performs very well relative to these mobile-based systems.

The other clear-cut result is the difference that using a headset makes to accuracy, regardless of which system is used. Siri, for example, produced between 2.2 and 2.6 times as many errors with the inbuilt microphone as with the headset. In practice, this is a substantial and important improvement. Dragon also improved, although the difference was smaller: between 1.7 and 1.8 times as many errors when the microphone was used.
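Those multiples fall straight out of the raw error counts:

# Ratio of microphone errors to headset errors for each system.
siri_mic, siri_headset = (65, 55), 25
dragon_mic, dragon_headset = (84, 91), 50

print([round(e / siri_headset, 2) for e in siri_mic])      # [2.6, 2.2]
print([round(e / dragon_headset, 2) for e in dragon_mic])  # [1.68, 1.82]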

Note that the order of the graph above may seem slightly odd, as the results from the respective systems are not grouped together. Instead, the order reflects the sequence in which the systems were actually tested. This order was selected to help identify any possible learning effects: for example, was Dragon disadvantaged by being tested first if, hypothetically, both systems were sharing the same speech recognition back-end and learning in effectively real time? However, repeat runs of the main systems under consideration (the iPhone using the inbuilt microphone with either Siri or Dragon) did not show any significant learning effects. Siri improved slightly, going from 65 to 55 errors between the two runs, but Dragon actually got slightly worse, going from 84 to 91 errors. Certainly the results suggest that Dragon is not learning and improving. Siri could be, but multiple repeat runs would be required to establish this, and it's entirely possible that Siri's improvement is simply due to more fluent dictation from repeated practice. If there is any improvement in Siri over time with respect to a particular set of text or phrases, it's probably quite small. Also note that no corrections were made to the transcribed text during the experiment, which would also be expected to limit any learning effects.

Also recorded was the number of words that were completely omitted from the transcription rather than being apparently misrecognised. The graph above shows the absolute number of such errors for each system in green, while the graph below shows these as a percentage of each system's total errors.




To get a meaningful picture of these results, both the absolute and the proportional numbers need to be considered. Perhaps most notable in both graphs is the high number of missed words affecting Siri when using the microphone. In both runs, Siri using the microphone missed 7 words, while the otherwise extremely poorly performing iPad 2 with Dragon missed only 6. Consequently, missed words represented a far higher proportion of total errors for Siri using the microphone than for any other system. (The laptop using NaturallySpeaking also had an unusually high proportion, but this probably reflects the very low total number of errors for the laptop benchmark more than anything else.) While Siri easily outperformed Dragon overall, it also clearly missed more words than Dragon across both microphone runs. For whatever reason, missed words are something of a weakness for Siri when using the microphone, even though its overall performance is still very good. This effect seems to disappear entirely when a headset is used. Interestingly, it also doesn't seem to apply to Dragon Dictation, again reinforcing that the processing used by the two systems is quite different.


Conclusions

While I began the experiment expecting to find that Dragon Dictation and Siri performed extremely similarly, perhaps indicating a commonality of speech recognition processing, the results were in fact very different. The first very clear lesson of this experiment is that, in terms of the number of corrections required, Siri easily beats Dragon under all comparable circumstances. While the degree of superiority may be only moderate (although still very obvious) when using the microphone, the results show that Dragon produced twice as many errors as Siri when a headset was used. If you intend to use your phone for any kind of extensive text input, it may well be worth spending the extra to upgrade if you don't already have an iPhone 4S. In most circumstances Siri is also far more convenient to use than Dragon Dictation for text input, as there is no need to load a separate app and then copy and paste the text into its final destination. There is no question: if you want to dictate text to your phone, get Siri.

The second lesson is that, if you are about to dictate anything more than a quite small amount of text, you should definitely pull out your headset. While Siri is still outperformed by NaturallySpeaking on a laptop, its error count more than halves when a headset is used. Headsets are obviously far less convenient than simply using the inbuilt mic, but in the long run the headset (assuming you have it with you) will save you time that would otherwise be spent making corrections. Dragon Dictation also benefits significantly from the headset, although not as much as Siri. Similarly, the poor performance of Dragon on the iPad suggests that the microphone on that device is simply not as good. Perhaps this is one of Apple's reasons for not bringing Siri to the iPad, at least not yet, although it is a disadvantage that can probably be avoided entirely by using a headset.

Finally, is Siri on an iPhone 4S an effective substitute for Dragon NaturallySpeaking on a full computer? If you're using a headset then, for moderate amounts of text, it's probably okay as an occasional substitute. However, you will still end up with almost twice the number of errors needing correction. Furthermore, unlike NaturallySpeaking on the computer, Siri only allows you to correct certain words that it thinks it may have misunderstood, and often it simply doesn't know when it has made a mistake. If Siri does in fact learn and improve, allowing users to correct all recognition errors will be an important step forward. The need to break input to Siri into chunks and then wait while each is processed and transcribed also means that inputting text this way is not particularly enjoyable. In contrast, a user can dictate continually to the NaturallySpeaking software, stopping only when they are ready. For inputting large amounts of text, a fully fledged computer is still vastly superior. Nonetheless, the performance of Siri (especially with a headset) remains extremely impressive and will no doubt continue to improve over time.

Tuesday, November 29, 2011

Review: Case-Mate Tough case for iPhone 4S


Paid: $35

Purchased: eBay

Summary:
Looks good, feels great and (mostly) provides excellent protection, but still with one or two minor flaws.






Case Philosophy

Even more so perhaps than the gadgets themselves, cases tend to be quite personal.  People look for different things and can have very different requirements, so I'm going to begin with a word on my philosophy on cases.  To my mind, the most important thing about a case is that it provide protection.  A case that provides no practical protection merely covers up the device and, particularly with visually striking devices such as iPhones, hiding an aesthetically beautiful device behind such a case seems pointless.

However, there are different types of protection.  For example, some review sites seem to care a lot about covering as much of the device's surface as possible.  I can't see the point in this -- fine dirt will get in no matter what you do.  To me, protection means saving the expensive gadget from a potentially catastrophic impact -- something the glass-covered iPhone 4/4S is particularly vulnerable to.  After all, if you just care about protecting the phone from scratches, you may as well simply invest in some cheap and unobtrusive protective film while enjoying the world-class industrial design (and hoping you don't get slippery fingers).

When protecting against impacts, in my view it is largely unnecessary for every single surface to be covered.  For example, some cases provide elaborate plugs that can be removed to access dock connectors and side volume buttons.  I can't see the point in this, as exposing these parts of the device reduces impact protection negligibly as long as some part of the case extends beyond the device on any axis along which the phone might hit the ground if dropped.

Finally, I prefer cases that look good and are at least a little stylish.  My preference here is usually for leather, but leather cases for iPhones seem to fall into one of two categories: they are either difficult to access or relatively unprotective.  The difficult-to-access cases include various pouches and flip cases where actually using the device involves uncovering it or removing it from the case in addition to pressing a button.  As someone who uses their phone regularly and actively, I know I would find this just too annoying in practice.  The unprotective leather cases leave some edge of the device exposed (either the top or the front face) such that, if the device were dropped at the wrong angle, the case would do little to mitigate the impact.  Having experienced the results of such an impact with an unprotective leather case on my previous phone (an iPhone 3GS), this is something I specifically wanted to avoid with the mostly glass 4S.

So, in short, what I was looking for in an iPhone case was something that protects against impacts (particularly drops) and looks reasonably classy, while still making it easy to access and use the device.  So how does the Case-Mate Tough case measure up to these requirements?



“Rambo, only pocket sized?”

The Case-Mate Tough case packaging comes with the tagline: Think Rambo, only pocket sized.  A big claim, at least with respect to protectiveness -- I'm not sure Sly Stallone's mullet could be considered classy.

The source of these claims is largely what Case-Mate refers to as "dual layer protection".  That is, the case itself is made up of two separate pieces -- an inner layer of silicone-like material and an outer, more rigid layer of polycarbonate plastic.  The idea is that the combination of these two materials provides superior protection compared with cases made of just one, although it should be noted that the inner silicone layer is somewhat thinner than most stand-alone silicone wraps.

Obviously, testing this scientifically would involve propelling multiple iPhones in different types of cases at various hard surfaces and measuring the results.  As I'm not in a position to perform such an experiment, I'll only say that the claim makes sense to me both intuitively and from handling the case itself.  I've never entirely trusted hard cases to protect devices properly if an impact comes in at the wrong angle, with the force being transferred directly through the case to the device itself.  Similarly, I've always been sceptical of the ability of a silicone case to absorb significant impacts, as the case itself has no ability to transfer the force of an impact in a way that avoids it going through the device.  (However, I must confess that my wife's silicone case seems to have done a reasonably good job of protecting her iPhone 4 against all kinds of abuse... although this may be largely luck!)

To my way of thinking, the Tough case's way of dealing with impacts mirrors that of car manufacturers.  That is, modern car body shells are designed so that impacts are transferred through the shell and, as much as possible, away from the occupants, while airbags cushion occupants against those impacts that can't be so transferred.  The hard outer polycarbonate shell of the Tough case extends beyond the edges of the phone itself in all directions and, it seems likely, in the event of a drop the impact would largely be transferred through the hard shell parallel to (rather than directly through) the device itself.  The silicone inner layer also extends past the edges of the phone (and indeed the hard shell) in most directions, and it seems likely this would partially absorb heavy impacts before they even reach the hard shell.  In particular, this provides a significant buffer for the front face of the phone although, if the phone were dropped face-first onto a protruding surface such as a rock, the case would obviously be of little benefit (a screen protector would likely help somewhat).  The only area of the phone not protected in this way by the silicone layer is the back, where the outer shell fully covers the soft silicone inner.



Rambo's Soft Spots

The Tough case does, however, have a couple of potential weak points when it comes to protection.  On the top of the case, the hard shell extends only very slightly beyond the tops of the sides of the phone, with the result that the majority of the phone's top is protected only by the silicone layer.  While the hard shell extends slightly past the top of the rear of the phone, if the phone were dropped at exactly the wrong angle towards the front, the outer shell would probably provide very limited protection.

Note the join between the hard outer shell and the soft inner silicone layer.

Fortunately, the top of the phone is where the silicone layer is at its thickest.  However, the same is not true of the bottom.  Here, the silicone layer runs both above and below the phone's dock connector and speakers, but the strip above (aligned with the phone's front face) is relatively flimsy and exposed.  Although fitted more tightly than many silicone shells, it takes very little pressure for this strip to slide away from its intended position, leaving the edge of the phone exposed.  It seems distinctly possible that, if the phone were dropped at the wrong angle, the silicone could be pushed out of position during the impact and a substantial amount of the force transferred to the phone.  I imagine you would have to be pretty unlucky for this to happen in practice, but it is an unfortunate weakness in the design.

The strip of silicone along the bottom front edge is not well supported and is prone to movement.

Perhaps worse is the way this bit of silicone at the bottom of the phone constantly slides around when touched.  For example, it irritatingly stretches and slides out of position when I put the phone into or take it out of my jeans pocket.  I imagine this could easily be fixed, either by linking the top and bottom of the silicone shell along the phone's bottom edge in line with the screws or, even better, by extending the hard outer shell in this area.  It's fair to say that these are relatively small criticisms -- the danger of impact damage due to this issue is probably very small -- but I do quietly wonder whether the wear and tear from stretching the silicone in this spot will cause problems long term.

Note the undesirable flexibility of this part of the silicone layer.



Rambo in a Tux

So the case is very protective... but does it look any good?  In short, yes.  The silicone inner layer is certainly nothing to write home about, but it is mostly covered by the hard outer shell.  Fortunately, this polycarbonate shell is actually, somehow, a bit special.  Unlike many similar shells, it doesn't feel or look plasticky.  In fact, it has a slightly soft, slightly grippy texture that feels great in your hand.  Unlike many plastic shells, it has a matte finish, meaning it doesn't pick up scratches easily (for example, from being squashed into the same pocket as your keys), and its almost rubbery surface means it doesn't feel like it will easily slip out of your hand.  Without wanting to raise expectations too high, just about everyone who has picked up the case has commented on how nice it feels.  Considering the case costs little more than many silicone wraps, it arguably represents pretty good value.  Is it as classy as leather?  No.  It certainly doesn't visually scream quality the way a really nice leather case can.  But it still looks good and feels even better when you actually pick it up.





Practical Hero

There are certainly more protective cases on the market but, as I said, these seem to revel in their impracticality.  Unlike many of those, the Tough case adds very little bulk -- I would say little more than a good quality silicone wrap.  Similarly, I haven't had any problems accessing the dock connector or other external controls.  The coverage around the earphone jack is reasonably tight, though, and might cause problems if you have a bulky headphone plug; I certainly haven't had any issues so far.  The volume buttons and lock button are fully covered but easily operated.  The cutout around the camera is quite wide (unlike some pictures I've seen on the web of what must be an older model of the case), but this is unlikely to cause any problems unless you happen to drop the camera lens onto a small rock, and the wide design probably helps ensure that the flash throws as wide a beam as possible.

Note large cutout around the camera.
So, no practical complaints?  Well, the flimsy strip of silicone along the bottom that gets stretched when taking the phone out of a pocket is easily my biggest gripe.  The only other practical problem I can see is that the silicone layer, like most silicone wraps, seems to attract pocket lint very easily.  Fortunately, very little of the silicone layer is exposed and the lint comes off easily enough, so I'm really just nitpicking here.



Conclusion

If you want a case that gives easy access to your phone while providing well-above-average protection and looking (and feeling) great -- albeit without the class of a less practical, less protective quality leather case -- the Case-Mate Tough case is an excellent option.  If it weren't for the flaws in the design of the silicone layer, particularly along the bottom, I would be raving about it.  Still, these are relatively minor criticisms and, overall, if these requirements align with your own, this case definitely deserves a place on your short list.