Preparing for Usability / Top Task Testing
This week I really dug into my Top Tasks Monitoring/Task Performance Indicator project (name TBD).
We’ll be running usability tests via TryMyUI, where people will be asked to find certain answers on the Canada.ca website. We’ll measure their time-on-task, task success, and perceived ease of use in completing the tasks, and use those measures to identify areas for improvement. We’ll keep monitoring these same scenarios and tasks on a regular basis until the success rate is acceptable (generally an 80% success rate, or at least a 20% improvement over the baseline).
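To make those measures a bit more concrete, here is a minimal sketch of how the results for one task could be tallied and checked against the target. It is an illustration only, not an actual analysis script: the `Attempt` record, the 1 to 7 ease scale and the function names are all made up for this example, and it assumes "20% improvement" means a relative gain over the baseline success rate (it could just as well be read as percentage points).

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Attempt:
    """One participant's attempt at one task (hypothetical record layout)."""
    success: bool
    seconds: float   # time-on-task
    ease: int        # perceived ease of use, assumed here to be a 1-7 rating


def summarize(attempts):
    """Return success rate, median time-on-task and mean ease rating."""
    return {
        "success_rate": sum(a.success for a in attempts) / len(attempts),
        "median_seconds": median(a.seconds for a in attempts),
        "mean_ease": mean(a.ease for a in attempts),
    }


def is_acceptable(success_rate, baseline_rate, target=0.80, min_relative_gain=0.20):
    """True if the task meets the target or has improved enough over baseline.

    Assumption: the 20% improvement is relative to the baseline rate.
    """
    if success_rate >= target:
        return True
    improved = success_rate > baseline_rate
    return improved and success_rate >= baseline_rate * (1 + min_relative_gain)


# Made-up numbers, for illustration only
round_two = [Attempt(True, 95.0, 6), Attempt(False, 240.0, 2), Attempt(True, 130.0, 5)]
summary = summarize(round_two)
print(summary)
print(is_acceptable(summary["success_rate"], baseline_rate=0.50))
```

Median is used for time-on-task because one participant who gets badly lost can pull a mean way up, but that is a design choice rather than a rule.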
What to measure?
The Agency had already completed some Top Tasks work and identified a list of about 25 Top Tasks that people come to the CRA web presence to perform. We regularly survey website visitors, asking them why they have come to the site. We give them a list of possible answers as well as the ability to write in their own.
We use this Top Tasks survey, as well as page views and visits and call centre call drivers, to identify possible areas of investigation.
Quantitative numbers like number of page visits can sometimes give us a false sense of … something. If a lot of people visit a given page on a website, that must be meaningful! Unfortunately, all you can really feel confident it means is that a lot of people visit that page. You don’t know whether that’s good or bad and whether you want that number to go up or down.
What looking at page views did do, though, was help us identify specific benefits or programs that people were trying to read about. We don’t know yet whether they’re finding the information understandable, but at least we know that people are accessing the content. This may be a reasonable scenario to explore.
Call drivers are slightly more interesting to me, although they can be even more complicated to interpret. I know there’s more in-depth information somewhere, but I’ve basically just seen a number of calls associated with a proposed project. We tend to assume that a lot of calls means that we should be improving the web content (so people would/could self-serve instead of calling). But I haven’t seen the specifics of those calls. Are they from people who don’t know or can’t find what they’re looking for? People who don’t understand the content they have found? People calling because we don’t allow them to perform certain actions online? I don’t know. (That’s ok. I know that data is being analyzed elsewhere, I just don’t personally have it at my fingertips across everything everyone does on our website!)
But what this means is that even though we have ‘exact’ quantitative numbers instead of ‘squishy’ qualitative insights, the ‘right answer’ doesn’t just jump out at us. There are still decisions to be made on what specifically we want to try to test.
In the end, we worked with a stakeholder to identify five major areas to focus on. Are they far and away the most important, frequent and frustrating tasks that people are trying to achieve when interacting with the CRA? Maybe, maybe not. But the joy of our team and the work we do is that we don’t have to get this exactly perfect this one time.
There will be plenty of opportunities to measure task success when users are interacting with our website. We do it all the time! As it is, this initiative is just about monitoring; it’s not even about making improvements. When we have an optimization project, we’ll focus a series of usability tests just on that particular project/program/task/whatever-you-want-to-call-the-scope.
In an optimization project, we’ll start with a baseline test, then do a few rounds of improving and re-testing the interface. That’s where we work closely with subject matter experts to be sure we’re really capturing the nuances of the area, and where we’ll be able to really dig in deep.
At the level of this monitoring project, we know we’re not intimately testing every nuance of the task. As a colleague explained, top tasks are the tasks the majority of Canadians are here to perform, and we want to be sure we’re making it usable for the general populace. With an optimization project, we need to focus more on the long tail and ensure it’s usable for those who are outside those “most common” segments. Or at least that’s what I gleaned from her explanation.
We can’t hope to test *everything* as part of this monitoring, and we don’t need to. This is a way to have numbers to report on certain tasks, which may help identify candidates for optimization projects, and then measure the impact of those projects.
Basically, this exercise is not a way for us to prove that a task is simple and straightforward for all users, but rather a way to identify which tasks may need some more attention and, over time, to see the impact of our optimization projects on the tasks that needed that attention.
I wrapped up the week with a draft list of scenarios to run (the plan is 2 tests with 10 people performing 8 scenarios each), and next week I’ll do a pilot and hopefully get the tests going. I’ve taken notes for several of my colleagues’ tests since I started at CRA about 6 weeks ago, and I look forward to running my own test and seeing how things come out!