6. Quantitative User Testing Methods

Quantitative User Testing

Quantitative user testing measures the usability of an interface. Unlike qualitative user testing, which focuses on users’ experiences, perceptions, opinions and feelings, quantitative testing requires a precise measuring instrument, and its insights are derived from mathematical analysis.

Typically, qualitative testing answers questions such as why a problem occurs or how to fix it, whereas quantitative methods answer questions such as how many and how much. Both qualitative and quantitative data are often collected in one study. This shows how often a problem occurs, as well as why it occurs and how exactly it affects the user experience. Quantitative research is also often used to validate hypotheses formed during qualitative research.



Quantitative testing is often used as part of the Lean UX approach.

Introduction to Lean UX

Lean UX is an Agile approach to UX design. It focuses less on formal requirements, deliverables and UX reports and more on maximizing the user experience, by obtaining feedback as early as possible and making many quick design decisions and iterations until a product fully meets users’ needs.

Problem and Solution Hypothesis Testing

Designing a new product is risky because there are always many uncertainties. Designers are likely to have assumptions about what users need and prefer, and about what could solve a problem users have. Lean UX turns these assumptions into hypotheses that must be tested in order to reduce uncertainty and build a product that works. To test the hypotheses, a minimum version of the product (a minimum viable product) with limited functionality is developed and tested, so that little effort is wasted on something that does not work. The key is not to wait for perfection when creating something new, but to create and test a minimum version and iterate quickly on the feedback until the product is exactly what your users need.


The process of Lean UX hypothesis testing:

  1. Declare all assumptions and create hypotheses to test them. Hypotheses might be problem hypotheses – assumptions about what users need, or solution hypotheses – what design meets users’ needs.

You can use the simple format below for creating hypotheses:

We believe that … is essential for …. This will achieve …. We will have demonstrated this when we can measure ….

An example: We believe that showing help messages is essential for new users. This will achieve a higher level of sign-up completions. We will have demonstrated this when we can measure an improvement on the current sign-up completion rate of 40%.

  2. Create a minimum viable product: the most basic version of the product that is still functional and realistic enough to test.


  3. Organize testing to test the hypotheses. Testing should be quick: deliver the results soon, without spending time on fancy presentations and meticulously documented outputs. Testing 20 participants usually gives a reasonably tight confidence interval when collecting usability metrics.
  4. Use the feedback to refine the product.
  5. Start again from step 1 until the data show that your product is exactly what users need.
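The testing step above states that about 20 participants usually give a reasonably tight confidence interval, but it does not say which interval is meant. The sketch below swaps in the adjusted Wald (Agresti–Coull) interval, a common choice for completion rates from small samples, to show how the interval narrows as the sample grows. The function name and the 80% observed completion rate are illustrative assumptions, not part of the source.

```python
import math


def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald (Agresti-Coull) interval for a completion rate.

    z = 1.96 gives an approximately 95% confidence interval.
    """
    n_adj = n + z ** 2                          # add z^2 pseudo-observations
    p_adj = (successes + z ** 2 / 2) / n_adj    # half of them counted as successes
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] because the interval can overshoot near 0% or 100%.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Same observed completion rate (80%), different sample sizes.
for n in (5, 20, 100):
    low, high = adjusted_wald_interval(successes=round(0.8 * n), n=n)
    print(f"n={n:>3}: 95% CI {low:.0%} to {high:.0%}")
```

With 16 of 20 participants succeeding, the interval is roughly 58% to 93%. It is noticeably wider with 5 participants and narrower with 100, which is the trade-off behind choosing a sample size.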

Usability needs to be measured objectively in order to test hypotheses. For example, if the hypothesis is that a one-step checkout is easier to use, you need to define what you mean by “easy to use”. Key Performance Indicators (KPIs), e.g. a 90% success rate, are usually set when measuring usability. Without them it is very hard to test hypotheses.
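Pairing a hypothesis with a KPI turns “we will have demonstrated this” into a yes/no check. The following is a minimal sketch, assuming a hypothetical `Hypothesis` class, the sign-up example from the hypothesis format earlier in this section, and a hypothetical KPI target of 50%. It compares only the point estimate, so with small samples you would also want to look at the confidence interval.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A Lean UX hypothesis with a measurable KPI (hypothetical helper)."""
    belief: str      # "We believe that ... is essential for ..."
    outcome: str     # "This will achieve ..."
    metric: str      # "We will have demonstrated this when we can measure ..."
    baseline: float  # current value of the metric, as a proportion (0..1)
    target: float    # KPI the new design must reach, as a proportion (0..1)

    def statement(self) -> str:
        return (f"We believe that {self.belief}. This will achieve {self.outcome}. "
                f"We will have demonstrated this when we can measure an improvement "
                f"on the current {self.metric} of {self.baseline:.0%}.")

    def evaluate(self, successes: int, attempts: int) -> bool:
        """Point-estimate check: does the measured rate reach the KPI target?"""
        return successes / attempts >= self.target


signup = Hypothesis(
    belief="showing help messages is essential for new users",
    outcome="a higher level of sign-up completions",
    metric="sign-up completion rate",
    baseline=0.40,
    target=0.50,  # hypothetical KPI, not taken from the source
)
print(signup.statement())
print(signup.evaluate(successes=11, attempts=20))  # 55% >= 50% -> True
```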

Defining and measuring usability

Usability is the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use (ISO 9241-11).

Based on the definition above, these are the attributes of usability that need to be measured:

  • Effectiveness – the accuracy and completeness with which users achieve their goals. Measurements: the most common measurement is the success rate, the percentage of users who manage to complete specified tasks without any help. Other measures include the disaster rate (the number of users who think they have completed a task successfully, e.g. ordered a product, when in fact they have not) and the number of errors per unit of time.
    Calculation of the success rate:
    Success rate = number of successful task attempts / total number of task attempts × 100. It can be calculated per task, as in the example below, or across all tasks.


               Task 1    Task 2
User 1         Yes       Yes
User 2         No        Yes
User 3         No        Yes
User 4         Yes       Yes
Success rate   50%       100%
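The success-rate formula can be applied to the example table in a few lines. This sketch assumes the results are recorded as one True/False value per user and task. The names `results` and `success_rate` are illustrative.

```python
# Results from the example table: True = task completed without help.
results = {
    "User 1": {"Task 1": True,  "Task 2": True},
    "User 2": {"Task 1": False, "Task 2": True},
    "User 3": {"Task 1": False, "Task 2": True},
    "User 4": {"Task 1": True,  "Task 2": True},
}


def success_rate(outcomes: list[bool]) -> float:
    """Successful attempts / total attempts x 100."""
    return sum(outcomes) / len(outcomes) * 100


per_task = {
    task: success_rate([user[task] for user in results.values()])
    for task in ("Task 1", "Task 2")
}
print(per_task)  # {'Task 1': 50.0, 'Task 2': 100.0}

# Across all tasks: 6 successful attempts out of 8.
overall = success_rate([ok for user in results.values() for ok in user.values()])
print(overall)   # 75.0
```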

  • Efficiency – the accuracy and completeness of goals achieved in relation to the resources used. Measurements: efficiency is usually measured as the average (mean) time taken to complete each task. Other measurements include the time taken to complete a task on the first attempt and the time needed to reach expert performance.
    A few users who take a very long time to carry out a task can skew the results by inflating the average task time. To reduce this effect, you can calculate the geometric mean instead (the GEOMEAN function in Excel).


Example efficiency calculation:

               Task 1    Task 2
User 1         56s       400s
User 2         n/a       533s
User 3         n/a       2140s
User 4         88s       622s
Geo mean       70.2s     729.9s

The times of participants who did not complete the task are not taken into account.
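The same calculation can be done outside Excel. The following is a minimal sketch using Python's standard `statistics` module (Python 3.8 or later). It assumes that tasks a user did not complete are recorded as `None` and are dropped before averaging, as described above.

```python
from statistics import fmean, geometric_mean

# Task times in seconds from the example table; None marks a task the user
# did not complete.
times = {
    "Task 1": [56, None, None, 88],
    "Task 2": [400, 533, 2140, 622],
}

summary = {}
for task, values in times.items():
    completed = [t for t in values if t is not None]  # failed attempts are excluded
    summary[task] = (fmean(completed), geometric_mean(completed))
    print(f"{task}: arithmetic mean {summary[task][0]:.1f}s, "
          f"geometric mean {summary[task][1]:.1f}s")
```

For Task 2, the single 2140 s attempt pulls the arithmetic mean up to about 924 s. The geometric mean reproduces the 729.9 s shown in the table.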


  • Satisfaction – positive attitudes towards using the system: how users feel about the experience after using the interface. Satisfaction is an important aspect of usability because no matter how efficient a system is, users might not like it and therefore might not use it. Measurement: satisfaction is often measured as a mean (average) score on an established questionnaire.
    The System Usability Scale (SUS) is commonly used to measure satisfaction; it can be downloaded here: http://www.usabilitynet.org/trump/documents/Suschapt.doc

How to calculate the SUS score: first, sum the score contributions from each item. Each item’s score contribution ranges from 0 to 4. For items 1, 3, 5, 7 and 9, the score contribution is the scale position minus 1. For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position. Multiply the sum of the scores by 2.5 to obtain the overall SUS value. SUS scores range from 0 to 100. For example, if the item contributions sum to 22, the SUS score is 22 × 2.5 = 55.
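The SUS scoring rule translates directly into code. The following is a minimal sketch, assuming each answer is recorded as its scale position (1 = strongly disagree, 5 = strongly agree) in questionnaire order. The two response sets are invented for illustration. The first sums to 22 and therefore scores 55. Averaging across participants gives the mean satisfaction score mentioned above.

```python
from statistics import fmean


def sus_score(responses: list[int]) -> float:
    """Score one completed SUS questionnaire (10 answers, each 1 to 5)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs 10 responses, each between 1 and 5")
    total = 0
    for item, position in enumerate(responses, start=1):
        if item % 2 == 1:      # items 1, 3, 5, 7, 9: positively worded
            total += position - 1
        else:                  # items 2, 4, 6, 8, 10: negatively worded
            total += 5 - position
    return total * 2.5         # scale the 0-40 sum to 0-100


# Invented example responses, one list per participant.
participants = [
    [4, 3, 3, 3, 3, 3, 3, 3, 3, 2],  # contributions sum to 22 -> 55.0
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],  # best possible answers -> 100.0
]
print([sus_score(r) for r in participants])   # [55.0, 100.0]
print(fmean(sus_score(r) for r in participants))  # 77.5
```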

Unmoderated Testing: Success Rate and Time on Task

Unmoderated user testing is usability testing in which software administers the tasks, so no facilitator is needed; sometimes follow-up questions are built into the study. It makes it possible to test with many participants and obtain statistically precise results in a relatively short time.

The disadvantages of this method are that there is no opportunity to ask detailed questions about a user’s actions, participants are not reminded to think aloud, and participants have no real-time support if they have problems setting up or carrying out tasks (though you could provide your telephone number).

Unmoderated testing is recommended when the study focuses on a few specific elements rather than on an overview of the whole system and user experience. It is very useful for calculating the success rate and the average time to carry out tasks. In fact, most vendors calculate key metrics such as task completion, task time and overall perception automatically by default, and Userzoom even generates heatmaps.

It is very easy to set up. You just need to supply a list of tasks you would like participants to perform (following the same procedures as discussed in the qualitative testing sections) and select a pool of representative users. Most testing software vendors let you choose participants from their own user panel. Refer to your chosen vendor’s instructions for details.


NB: Lean UX does not rely on quantitative methods (usability measures) alone. Qualitative research is often used at the beginning to generate hypotheses, and quantitative research is used to test them. Sometimes qualitative research is needed to explain the quantitative data obtained by testing hypotheses.
