Designing Formal Usability Studies

Last month, I presented on designing formal usability studies at UX Pittsburgh (which is also on Twitter). By request, I’m sharing my slides and lightly adapted slide notes here.

You can Google many of these things and find good ways to do almost anything, but what’s hard, especially when you’re starting out, is figuring out how to think through the best practices as they apply to your own project. I’m going to present one loose framework for doing that. You should consider this as much a point-of-view piece as a how-to.

I also want to encourage you to ask questions. Just shoot up your hand and I’ll call on you once I’ve finished a thought or sequence.

Why am I here?

That is: Why am I doing a formal usability study? For most people doing a formal study, the question amounts to, why not guerrilla testing? We in this room probably all understand the value of some kind of usability testing.

It’s a good question. There’s too much money on the table not to ask about it. I should say I come at this largely from a consulting perspective, but I think the considerations are largely the same for in-house UXers. Also, “formal” vs. “guerrilla” is in many ways a spectrum, so much of what follows will just help you figure out where on that spectrum your next round of testing lies.

Good Reasons

This is the best reason: The project really needs it, for some combination of reasons like these:

  • Less bias…
    • …especially when the tester isn’t on the product team
    • …especially when the tester is a practiced facilitator
  • Qualitative rigor: a thorough analysis process, a comprehensive report with recommendations and theoretical/“best-practical” underpinnings.
    • Some useful quantitative measures possible with more participants all running through the same study
  • More direct observers, like…
    • designers
    • business owners
    • engineers

These things typically come into play on really big projects and in a shorter-term consulting relationship, where the usability researcher isn’t likely to be paid to stick around through the remainder of the design and development process.

Other Reasons

Clients & bosses: They sometimes mistakenly think they need formal testing, and they won’t take no for an answer.

Stakeholders: Sometimes they don’t understand qualitative research and the value it brings, so they demand a quantitative component that just isn’t worth trying to shoehorn into a leaner guerrilla process.

Consultants: There’s more money in bigger projects. That’s enough reason for some people to push for formal testing. It’s usually self-deception rather than evil.

Where am I going?

As you make your planning decisions, you ought to have a very strong sense of direction, as indicated by a few things.

(This is that framework I mentioned.)

The five factors: Goal of the study, broader project process, artifact fidelity, budget, and timeline.

Let’s call these the five factors of study design, and let’s nail them down before we start planning.

(1) Goal: Why are we conducting the study? Is it to prove there’s a UX problem? To validate a design solution? To align the team?

(2) Broader process: Are we part of a long, waterfall design project? Or are we doing standalone usability testing, akin to an annual physical?

(3) Artifact fidelity: Are we testing a live website, a set of wireframes, or something in-between? (Don’t formally test low-low-fidelity designs. It’s not worth it.)

(4) Money and (5) timeline: How much of each do we have for things like recruiting, testing, data processing, analysis, and reporting?

We’ll come back to these several times, which is why I’m showing you these horrible emoji.

How do I get there?

Now you know why you’re doing a formal usability test, whether you feel good about those reasons, and where the project needs to go.

In other words, the easy part is over.

Making your plan: Basic study configuration, location and tools, participants, task design, and artifact prep

Time to make the plan. Here’s what we need to think through.

Basic study configuration matrix: moderated or unmoderated against in-person or remote

Note: “Remote” here means “using an online platform like usertesting.com” or “sitting in the next room watching through a glass or CCTV.”

  1. In-person, moderated
    • Classic (in part because the technology wasn’t there for the others when the method was being developed)
    • Gives you great insight into not just task completion but physicality and demeanor.
    • Lets you probe (with care) into behaviors and desires.
    • Relies tremendously on the skill of the facilitator.
  2. Remote, moderated
    • Saves costs on travel, space, or both.
    • Lose some—but not all—of the benefits of in-person (moderated) testing.
    • Mostly, a little harder to “read” a stranger from a distance.
    • But, gain some context—what’s the user’s computing environment like?
    • Also relies tremendously on the skill of the facilitator.
  3. In-person, unmoderated
    • That would just be creepy, to sit there ignoring them like that?
  4. Remote, unmoderated
    • Difficult to position this as truly formal usability testing unless your tasks are very well-organized and straightforward, and you have a platform capable of tracking task completion at a fairly granular level of detail.
    • Can be very valuable for those sorts of tasks, however.

Don’t forget the five factors. Each one should shape how you make this decision. For example:

  • If your study’s objective includes testing emotive responses to a product, you should avoid unmoderated testing, because getting deep into the subjectivities of a session usually takes more active probing by the facilitator.
  • If for some reason you have a week to get your test done, remote unmoderated testing can be a lifesaver.
  • If you know you can’t do any more user testing before launch, in-person, moderated testing might be best, as it often yields a more comprehensive results set (again, depending on what kinds of things you’re hoping to test).

Location and tools by quadrant of the matrix, as described in the following text.

This is a non-comprehensive list. There’s a *lot* out there, especially in terms of tools, and the list grows quickly; you’ll have some research to do when you get to this stage of planning.

  • In-person, moderated
    • Morae: Heavy-duty, expensive, feature-rich Windows software
    • Silverback: Great, cheap Mac software with a history of steady improvement
    • Neutral space: Avoid having them see the product company’s logo in the environment
    • Inviting space: Be cognizant of accessibility, perceived safety, physical comforts. Also: Men, try not to be there alone with a woman. Find a woman to join you even if she just does unrelated work all day.
    • In-home: Great to better understand context (and save money); hard sell for strictly formal studies however.
  • Remote, moderated
    • A lot here. Get creative and test the Dickens out of both your solution and your instructions.
  • In-person, unmoderated
    • (Again, don’t.)
  • Remote, unmoderated
    • I’ve only used usertesting.com, and it’s great for this kind of study. Attendees tonight may be interested in trying Loop11. Nielsen Norman Group has a nice rundown from this summer that you could read.

Participants: How many, what kinds, compensation, and recruiting

How many?:

  • Goal: How credible do you need your quantitative findings to be? (They will not be statistically significant under most circumstances.) Do you have skeptics to convince who don’t understand the value or purpose of discount usability engineering? (Google it yourself, lazy.)
  • Process: How many more test cycles will be run before the process is complete?
  • Fidelity: Is your artifact complete/complex enough that you stand to learn more beyond the first five or seven users?
  • Budget and timeline: How many users can you pay? How many hours can you spend? How many weeks?
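
One rough way to reason about the “first five or seven users” question is the problem-discovery model from the discount usability literature: the share of problems seen at least once by n participants is 1 - (1 - p)^n, where p is the chance that a single participant runs into a given problem. The sketch below assumes p = 0.31, an average commonly cited from Nielsen and Landauer; that value is an assumption rather than a property of your product, and the function name is made up for illustration.

```python
def share_of_problems_found(participants: int, p_single: float = 0.31) -> float:
    """Expected share of usability problems seen at least once.

    p_single is the ASSUMED chance that one participant encounters a
    given problem. 0.31 is a commonly cited average, not a constant.
    """
    if participants < 0:
        raise ValueError("participants must be non-negative")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # Probability that nobody hits the problem is (1 - p) ** n,
    # so the share found is the complement of that.
    return 1.0 - (1.0 - p_single) ** participants


if __name__ == "__main__":
    for n in (1, 3, 5, 7, 10, 15):
        print(f"{n:>2} participants: {share_of_problems_found(n):.0%}")
```

Under that assumption, five participants surface roughly 84% of problems and the curve flattens quickly afterwards. A complex artifact or a varied participant set probably means a lower p, which is one argument for recruiting more people.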

What kinds?:

  • Goal: Are you testing things that require domain expertise? Do you need to cover certain demographics to make your business case more compelling? Do you have personas that you’ve mapped to a specific product / feature set / task set? How important is a diverse participant set to your project? (And, diverse *how,* exactly?)
    • Whatever it is you want to test, your study may not be well-served by people in (or near) the industry. You’ll have to decide exactly what that means based on the nature of the product and project.
  • Process: Again, how many more test cycles will be run before the project is complete?
  • Fidelity: I can’t think of a case where fidelity of the artifact should influence whom you recruit.
  • Budget and timeline: See “How much to pay them?”—and also, if you’re short on time, you won’t likely be able to recruit 18 employees of small startups in Pittsburgh making over $85,000 per year and who prefer decaf.

How much to pay them?:

  • The going rate changes. I’ve paid between $50 and $100 recently. You may have to pay a bit more to get people of higher socioeconomic status, but you should pay all participants in a given study the same amount. Value their time equally, even if they don’t.

How to find them?:

  • Carpet-bomb your friends and family (though you shouldn’t conduct any sessions with people you know for a formal study), any professional contacts (again, likely from outside the industry), and your social-network connections.
  • Or, for an even more formal approach: Use a recruiter. Plan to spend between $75 and $150 per participant (as of late 2014), depending on the complexity of your participant set and whether you want just a preliminary recruiting effort from their database or end-to-end recruiting and scheduling.
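
To see how those per-participant figures add up, here is a minimal budgeting sketch. It uses only the ranges quoted above ($50 to $100 in compensation, $75 to $150 in recruiter fees); the class and function names are hypothetical, and the result leaves out facilitator time, space, tools, analysis, and no-shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRange:
    """A low/high dollar range for one cost item."""
    low: int
    high: int


def participant_budget(
    participants: int,
    incentive: CostRange = CostRange(50, 100),       # compensation per person
    recruiter_fee: CostRange = CostRange(75, 150),   # recruiter cost per person
) -> CostRange:
    """Total recruiting-plus-compensation range for a study."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    return CostRange(
        low=participants * (incentive.low + recruiter_fee.low),
        high=participants * (incentive.high + recruiter_fee.high),
    )


if __name__ == "__main__":
    budget = participant_budget(8)
    print(f"8 participants: ${budget.low:,} to ${budget.high:,}")
```

For eight participants, that works out to somewhere between $1,000 and $2,000 before any of your own time is counted. Pass `recruiter_fee=CostRange(0, 0)` if you are recruiting on your own.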

Task design graph: intensity against complexity

Your overall approach should account for the five factors first:

  • Study goal: all tasks in support of your objectives; your most critical objectives prioritized.
  • Broader process: a suite of tasks that neither exceeds the moment nor fails to make use of it.
  • Artifact fidelity: tasks that the artifacts can support.
  • Money and timeline: tasks that lead to a dataset compatible with your resources for analysis.

But also consider this chart, showing a hyperbolic view of how participants tend to experience tasks with great emotional or psychological intensity and great procedural complexity (or ambiguity). The moral: You just won’t get good results if you go full-sadist on your participants. You have to keep the overall test experience at least somewhat pleasant or else you can create a falsely negative impression of the product.

More on task design: clarity, verisimilitude, utility

Clarity: They’ve got to understand what they’re supposed to do—or answer. So don’t be vague (“What do you think of this page?” or “Try to figure out what you’re supposed to do on this site.”) Be clear (“Do you see anything that you don’t understand?” or “This site helps you find facial-hair inspiration, and it works best if it knows what kind of facial hair you’ve had in the past. Let’s try to figure out how to upload pictures of your own facial hair.”)

Verisimilitude: Sometimes, the task you’re testing is simply huge—and sometimes, it can’t be broken up into discrete tasks that a user might perform across several sessions. (You might see about changing that, but sometimes you can’t.) So sometimes, you’re just going to have a really long, painful task. But in most cases, aim for something that will reflect what you anticipate real-world task-completion habits to be. Unless your study aims to demonstrate how bad the software is—as many do, in fairness—you don’t want to hear, “That took way too long” over and over when, in real-world use, the task wouldn’t be an all-or-nothing proposition.

Utility: What will running participants through the task really tell you? It’s too easy to waste your time (and your participant’s) chasing data on a minor feature you don’t like or a font choice you fought your team on. Any of those things that are problems will reveal themselves anyway, especially with a good facilitator, who will continually encourage thinking aloud and who will notice small issues and probe accordingly if the participant doesn’t speak to them. (Going after your grudges is also a good way to bias the data set, especially if you’re both designing and facilitating the study.) Instead, every task should help you answer a question you need to have an answer to, whether that’s, “Will people enjoy using a site like this?” or “Will they successfully upload facial-hair pictures?”

Artifact prep: be lazy early, work hard late, and remember your paperwork

Be lazy early: Sometimes, your choice of artifact—whiteboard sketches, paper prototypes, wireframe PDFs, detailed designs—will be determined by the moment in the broader process. To maximize the ROI of the testing, minimize the “I” by keeping fidelity as low as it can be while preserving your ability to test the specific qualities and quantities you’re setting out to test.
Consider not only the present study, but the fact that you’ll have to revise your artifacts (or possibly advance them to a higher-fidelity deliverable) as a result of the test. Using the least-complex possible artifacts for your study will keep overhead to a minimum.

Work hard late: Once you’ve chosen your artifact and prepped your testing flow, however, bust your ass to make sure it all works. Don’t let your first participant be the first or even the third person to run through your test. Catch all the bugs / inconsistencies / flaws you can. Many of these will not be flaws in your proposed UI, but flaws in your artifacts or your task design. (“Oh, right, I forgot to replace the greeking in that callout.” or “Oh, right, that question is prohibitively unclear.”) These issues have a way of getting magnified in the actual study and noisily clouding out more important results. Catch the low-hanging fruit on your (plural) own and protect your study’s ROI.

In fact, if you have time, first secure yourself an expert review or run a heuristic evaluation; see my own “Beyond Usability Testing”—and don’t miss some important clarifications in the comments.

Paperwork:

  • discussion guide
  • quantitative sheets (e.g., SUS)
  • consent form / anonymity and privacy statement
  • receipt for compensation
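
If SUS (the System Usability Scale) is one of your quantitative sheets, scoring it is simple arithmetic, sketched below. The standard questionnaire has ten items answered on a 1-to-5 scale: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name is made up for illustration, and the sketch assumes the standard item order and wording.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's SUS sheet (ten responses, each 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - response
    return total * 2.5


if __name__ == "__main__":
    print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))
```

Keep in mind that the result is a score on a 0-to-100 scale, not a percentage, and that with a handful of participants the average is best treated as a rough signal rather than a precise measurement.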

What does my future hold?

This stuff may not be part of study design per se, but it’s worth touching on, because if you don’t think it through, you can easily render your study worthless or, more often, detrimental to the ultimate product.

After-testing activities: analysis, reporting, more testing

Analysis:

  • The goal is to let the data speak as directly as possible, with as little interpretation as possible from any particular subjectivity. Rigorous qualitative data analysis with several reviewers is time-consuming but it goes a long way towards removing bias. It’s very easy to think you know what your data are telling you just by having run some or all of the test sessions, but you really can’t. Outside reviewers (not involved with product design) are even better, and are often worth paying for even as an independent consultant.
  • What data to analyze? Could be videos, transcripts, or notes. The closer to the beginning of that list, the more time it will take—but the more objective and comprehensive the results, potentially.

Reporting:

  • You could…
    • …report out the top five issues in a single slide
    • …write a 150-page report with charts, screenshots, links to video clips, participant quotations in callouts, opinionated footnotes, and Dilbert comics used by permission
    • …do anything in-between, or some combination
  • Generally speaking, level of investment in the study will help determine what your clients, bosses, or stakeholders expect in terms of final deliverables, but you should obviously be as clear as possible about that with them up-front.
    • Also, one level or another may be justified by other of the five factors. For example:
      • As a consultant looking for repeat business, you may want to do as much work here as you can reasonably do in order to prove your value (without setting up unreasonable expectations for the future).
      • You may be testing such informal artifacts that a long report would be a waste of time (as compared to moving on to the next round of artifacts).
      • You may know that this is the only user testing that a product will see for some years, and you may want to be sure your stakeholders can use your report to build-out a medium- to long-term roadmap of improvements.

More testing:

  • Depending on where you are in the process, fix the issues you found and test again. It’s rare that it’s worth conducting two formal usability studies back-to-back, but it’s equally rare that a formal study is the best last step in a product design, redesign, or optimization process. So you’ll probably look to guerrilla testing or other discount methods next (depending on your reasons for conducting a formal study in the first place). Go get ’em!

WTF??!

[In the presentation, I asked for questions and we had a lengthy, lively, and productive discussion. Feel free to do the same in the comments.]

 

Posted in Technology, Web | Comments closed

Why We Blame Bikes

A cyclist friend posted a link to a nice analysis of the ways people tend to blame bikes disproportionately for pedestrian problems (pun intended).

I did a lot of bike-commuting in college but moved afterwards to a city that, at the time, was far less amenable to such activities than either place is today. I fell completely out of the habit, and a large part of that was about feeling blamed—sometimes with dangerous backlash—for what we should see as reasonable and even desirable changes to the urban environment.

As I read the Alviani piece, what I really wanted to ask was, “But why?” Why, ultimately in psychological terms, do people heap all this blame on the humble cyclist?

I have two hypotheses, one for pedestrians and one for drivers. I’m sure the real situation has a multivalence that I’m not accounting for here, but:

I think what makes bikes disproportionately scary for pedestrians is that we can’t really hear them coming. We use sound to take in our surroundings broadly and alert ourselves to potential danger that our narrow field of vision might miss. We can’t do that reliably with bikes, so we worry and blame cyclists irrationally. (Not that there aren’t some crazy ones, but again, the backlash is out of proportion as compared to pedestrian fear of crazy drivers.)

And when we’re driving, I think bikes force us to confront the fact that we’re not the rational creatures we like to believe we are. We know that the law is the law, but we resent the minor hassle of having to share the road and so make up all kinds of rationalizations about why the biker is in the wrong. But those break down in a way our similar thoughts about other cars do not, because with bikes and their fragile, unprotected human riders, we are so much more directly confronted with the fact that our desire for convenience could so easily cost somebody else’s life if we’re not careful.

TL;DR: We’re only human! Pure rationality evades us always.

Posted in Life | Comments closed

Not-not-design: Confessions of a Terrible Designer

I’m a terrible designer. I’m an amateur with Adobe Creative Suite, I know almost nothing about color theory or typography, and I don’t—in fact, I can’t—make your website or software interface “pretty,” a role often rightly understood as the designer’s. I’d guess that at least three-quarters of my work isn’t even meaningfully visual; I spend most of the day reading and writing, talking and listening.

The problem is, I’m a senior designer at a well-respected healthcare software firm.

My boss likes to call what we do “‘Big-D’ design,” a term architect Larry Barrow seems to have coined. That means we don’t just (“just”) crank out detailed designs and styleguides, but we also tackle all the research, analysis, and communication that leads up to that final step. It’s a nod to the importance of the thinking behind the final design deliverables, which also leads to the concept of “design thinking,” popularized at IDEO and Stanford’s d.school.

I like the idea of elevating design’s reputation among non-designers into Big-D territory, but the mere addition of a capital letter doesn’t communicate how much worse I am at the lowercase version than my colleagues. So, I’ve also taken to explaining that my work is “not-not-design,” a less intuitive term, but a productive one in that it allows me to tell a good story:

In one of the interviews for the job I have now, I was feeling skeptical that I could or should end up in a design position. As I showed one of my wireframes, I wanted everyone in the room to understand that such barebones presentations are as high as I climb on Fidelity Mountain. I said something to that effect, and the design group’s second-in-command said, “But that’s not not design.”

And she was right, that’s not not design. If design is the process of learning about users and business needs and working towards concepts and eventually some visual deliverable, then I’m doing at least three-quarters of that work. What I do is (not-not-)design if anything is.

So why do I have such a hangup? Why do I feel it’s awkward to be introduced around our enormous parent organization as “the designer on the project?” Why do I need to rally behind a term like “not-not-design” in the first place?

My own problem probably starts with web marketing agencies, where I spent my professionally formative years, and where, until recently, design and user experience (UX) didn’t often overlap as disciplines. UXers worked on strategy and architecture; designers tackled the visuals and brought the interactivity one step closer to its eventual life in code. In the best cases, the two might collaborate some as they worked. Sometimes there’d even be a developer at the table.

The UX field does also have a well-documented terminological problem (OK, one more), which is that even those of us inside it don’t often feel certain what it means to call ourselves UX designers, architects, or strategists. It’s easier for some firms, like my current employer, to sidestep that question by just calling everybody “designer,” and to explain the depth of that term later. It’s just that between the moment when somebody sees my business card and the moment when they learn what I do (and do not), I feel a little bit like a liar—and perhaps a little bit undervalued, too.

Misunderstanding the role of the designer—ignoring not-not-design, in other words—is bad for everybody. If I were a designer at an agency like the ones in my past, I’d want very badly to get to spend as much time as the UXers do talking to users and stakeholders and poring over secondary research before I even started thinking about layout and type. If I were a UXer at one of those places, I’d want to feel like I had a strong influence on the final visual representation of the product, even if I lacked the skills to help create that representation directly. (In other words, I’ve found a pretty good fit in my current employer.)

I don’t mean to suggest that there’s no value in identifying and articulating the differences among these many roles. There is and ought to be a community of people talking productively about how to do better work in the areas that will probably always elude me. Likewise, I like talking and learning about things that may never interest some of those people.

All I’m saying is, despite my protestations in the job interview, I do belong on something called “a design team,” because not-not-design is design, too.

Posted in Technology, Web | Comments closed

Exercises for New Parents

  • kidlifting (all muscle groups)
  • swaying/bouncing (quads, patience)
  • sex (imagination)
  • rageful clenching (jaw, glutes, adrenal glands)
  • spouse-shouting (abs, rectum [if neurotic])
  • door-slamming (anterior delts, lats, eardrums)
  • house-fleeing (right lower leg, right wrist and forearm [if manual transmission], teeth / inventiveness [when pulled over])
  • divorce paperwork (small muscles of the dominant hand)
  • loneliness (small muscles of the dominant hand)

Posted in Life, Sport | Comments closed

Why I’m Buying a Case for My iPhone 5

As documented by Geoff Barnes, I waited a long, long time to upgrade from my iPhone 3GS, skipping directly to the 5.

Besides my sharing Barnes’ experience of the near-impossibility of one-handed operation of the iPhone 5, I also continue to lament the squared-off form factor of the backside of the device.

Yes, the diamond-bit chamfer on the edges makes the phone feel better in the hand than the 4/4S did to me—or else I wouldn’t have bought the 5 any more than I did the 4 or 4S. I do feel those edges digging into me, but not as much as the 4S did when I auditioned it at the Apple Store.

What my 3GS provided that none of the newer models do, though, is some kind of safeguard against dropping the phone. The rounded back put more surface area of the phone in contact with my skin, making sliding tougher. The composite material, much tackier than the aluminum, doubled down on that friction. All told, I’ve probably lost control of the 5 more in a couple of months than I did the 3GS in more than three years.

The light weight of the 5 also means trouble, combined with the slipperiness of the aluminum. With a new baby at home, I spend lots of time in pajamas and gym shorts, and the svelte, smooth iPhone 5 is so susceptible to jostling and shifting that it falls out of my pocket literally every time I sit down on my couch or in our glider (when I’m wearing those clothes). It’s hit our hardwood floor more than once, and I feel sure a cracked screen is in my future. The 3GS, of course, never fell out of any of my pockets. It was just too heavy, which again I preferred.

The sad bottom line is this: When I find one, I will buy a case for the phone that makes the back rounder and the device heavier. Like Farhad Manjoo, I’ve been derisive about cases in the past, but the fact is that having a case on this phone would make it a better phone, for my purposes. I’m glad it looks so pretty, but it just feels a mess.

Posted in Life, Technology | Comments closed