From Zero to Five, to Whatever Number Ensures Sample Representativeness

Goodman et al. (2012: 304) assert that “… usability is a means of directed product evaluation, not scientific inquiry.” What do they mean?

One reason usability tests are not scientific inquiry appears just two pages after that assertion: “Usability tests are not statistically representative” (emphasis in original). In science, statistical representativeness requires strict sampling procedures to ensure that the sample represents a well-defined target population. Random sampling and an appropriate sample size are two of the ways to achieve it.

Here, “random” does not mean what lay people usually mean by the word. A sample is random only when every member of the target population has an equal chance of being selected through the sampling process. Sitting in a coffee shop and testing whichever customers are willing therefore does not give us a random sample.
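The difference can be sketched in a few lines of Python. This is a minimal illustration, not a procedure from Goodman et al.: it assumes a hypothetical sampling frame of 10,000 user IDs and a hypothetical set of coffee-shop regulars. In the simple random sample every member of the frame has the same chance of selection; in the convenience sample anyone who never visits the coffee shop has no chance at all.

```python
import random

# Hypothetical sampling frame: every member of the target population, listed by ID.
population = [f"user-{i:05d}" for i in range(10_000)]

def simple_random_sample(frame, n, seed=None):
    """Draw n members without replacement; each member's chance of inclusion is n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def convenience_sample(frame, n, reachable):
    """Take the first n members who happen to be reachable (e.g. sitting in the coffee shop).

    Members outside `reachable` have zero chance of selection, so this sample is not random.
    """
    return [member for member in frame if member in reachable][:n]

sample = simple_random_sample(population, 5, seed=42)

# Hypothetical assumption: only the first 300 IDs ever visit the coffee shop.
coffee_shop_regulars = set(population[:300])
convenience = convenience_sample(population, 5, coffee_shop_regulars)
```

Both calls return five people, so sample size alone does not tell you whether a sample is random; the selection process does.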

“Being scientific” may be the gold standard of research, but in the context of design and usability testing this standard is not feasible in most cases. Even in the rare cases where it is feasible, it is most likely not desirable.

On the feasibility front, the sample size required for representativeness may be more than a design project can afford, and completing the tests may take longer than the product development timeline allows.
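To make “the sample size required for representativeness” concrete, here is a sketch using one common textbook formula (Cochran's formula for estimating a proportion), with an optional finite population correction. It is an assumption that this is the kind of calculation a formal study would use; Goodman et al. do not give a formula, and the parameter names below are illustrative.

```python
import math

def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5, population_size=None):
    """Cochran's sample-size formula for estimating a population proportion.

    margin_of_error: desired half-width of the confidence interval (0.05 means +/- 5 points)
    z: standard normal critical value (1.96 corresponds to roughly 95% confidence)
    p: anticipated proportion; 0.5 is the most conservative choice (largest n)
    population_size: if given, apply the finite population correction
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)  # round up: a fractional participant is not possible

print(sample_size_for_proportion(0.05))                          # 385
print(sample_size_for_proportion(0.05, population_size=10_000))  # 370
```

Under these assumptions, a ±5-point estimate at about 95% confidence calls for several hundred randomly selected participants, which is the scale of effort a typical design project cannot absorb.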

Regarding the desirability of adhering to scientific methods in usability testing, the resources these methods require can be better spent on not-so-scientific usability tests. For example, Nielsen (2000) finds that the marginal benefit of testing more than five users drops significantly, as the graph below shows. Later, more extensive tests (Nielsen 2012) confirm this finding.

Graph: usability problems found versus number of test users. Source: Nielsen (2000)
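The curve in Nielsen's graph comes from a simple formula given in Nielsen (2000): the number of problems found with n users is N(1 − (1 − L)^n), where N is the total number of usability problems in the design and L is the share a single user reveals. The sketch below uses L = 0.31, the average Nielsen reports; treat it as an assumption, since any particular project may differ. The function names are illustrative.

```python
# Nielsen (2000): problems found with n users = N * (1 - (1 - L) ** n).
# Working with the fraction found lets N drop out.
L = 0.31  # average share of problems one user reveals, per Nielsen; varies by project

def share_found(n_users: int, l: float = L) -> float:
    """Fraction of all usability problems found after testing n_users."""
    return 1 - (1 - l) ** n_users

def marginal_gain(n_users: int, l: float = L) -> float:
    """Extra fraction of problems uncovered by the n-th user (n_users >= 1)."""
    return share_found(n_users, l) - share_found(n_users - 1, l)

for n in range(0, 16):
    extra = marginal_gain(n) if n > 0 else 0.0
    print(f"{n:2d} users: {share_found(n):6.1%} found, +{extra:5.1%} from the last user")
```

With these numbers, zero users find nothing, the first user finds 31% of the problems, five users find about 84%, and the sixth user adds less than 5 percentage points. Each additional user adds less than the one before, which is the diminishing marginal benefit the graph shows.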


Another non-scientific aspect of usability tests is that their findings are hard to replicate. A study that follows scientific methods should produce the same findings when the same methods are repeated, but exact replication is not the forte of usability tests. Yet does it make sense to try to make usability test findings replicable? Nielsen (2011) argues against trying to find every usability issue. One reason behind this argument is the diminishing marginal benefit relative to cost shown in the graph above. The other is his finding that most websites, applications, and mobile apps have serious usability issues. By focusing on the big issues, usability tests can significantly improve the key performance of a website or application. This is the 80/20 argument, and it makes good sense from a pragmatic viewpoint.

Heuristic evaluation is arguably an even more cost-effective way to evaluate usability than usability testing, because it has been shown to find the majority of the major, and even minor, problems that usability tests find (Nielsen 1995). However, heuristic evaluation sometimes misses problems that usability tests of the same design uncover. One such scenario is when the experts conducting the heuristic evaluation lack the relevant domain knowledge (Nielsen 1995). In that case, it is appropriate to base design decisions on usability tests.

In all, usability tests are not scientific inquiry, but they have an important place in the design process. As Nielsen (2000) points out, a critical takeaway from the graph is that when zero users are tested, zero usability problems are found. One user is therefore much better than zero, and five may just hit the sweet spot. That doesn’t seem nearly as daunting as the sample size required to qualify as scientific inquiry, does it?


References
  1. Goodman, E., M. Kuniavsky, and A. Moed. 2012. Chapter 11: Usability Tests. In Observing the User Experience: A Practitioner's Guide to User Research (2nd Edition), pp. 273-326. Saint Louis, MO: Morgan Kaufmann. Accessed via http://www.ebrary.com.
  2. Nielsen, J. 1995. Characteristics of Usability Problems Found by Heuristic Evaluation. Accessed via http://www.nngroup.com/articles/usability-problems-found-by-heuristic-evaluation/ on January 30, 2015.
  3. Nielsen, J. 2000. Why You Only Need to Test with 5 Users. Accessed via http://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/ on January 30, 2015.
  4. Nielsen, J. 2011. Accuracy vs. Insights in Quantitative Usability. Accessed via http://www.nngroup.com/articles/accuracy-vs-insights-quantitative-ux/ on January 30, 2015.
  5. Nielsen, J. 2012. How Many Test Users in a Usability Study? Accessed via http://www.nngroup.com/articles/how-many-test-users/ on January 30, 2015.

