08 October, 2014

The tyranny of testing over design

There’s a great debate about processes and methodologies that are considered effective but aren’t. While plenty of blog posts, workshops and public talks celebrate the triumph of continuous testing, we seem to forget about the good old principles of design. Continuous experimentation and branding advocacy sometimes work together to hold back not just design, but also common sense.

Experiments at Airbnb… what’s the point?

In May 2014, Airbnb’s tech blog published an article called Experiments at Airbnb, which gives useful examples of how to run split testing, also known as A/B testing. They run controlled experiments which, they say, are very important in shaping the user experience on their site. The tips themselves seem rather sensible, but I’d be really grateful if someone could be so kind as to explain to me what the practical achievements of these experiments are in terms of design.
The post illustrates two examples related to the price filter:

A feature that was rejected.
Two variations that were split tested.
Let’s now look at what has been implemented on the current site, and try to book a place for 2 nights:
As you can see, not only is the currency not there (and yes, people nowadays travel a lot and they need to see it), but even worse, the label on the price filter does not say whether the price is per night or for the whole stay. That would be a legitimate question, wouldn’t it? Good luck with finding the answer: the obscure label “Price Range” certainly is not there to help. You’ll have to select one of the properties to find out that the price displayed in the filter is per night. One might argue that during some (badly run?) usability testing nobody raised this issue. Or that most people would understand the meaning (arguably not), or that it’s still better than calling it just “price”, but for God’s sake, why can’t they just put a clear label where it’s needed? That’s what they do on Booking.com or, in a different way, on Way to stay:

That would be a bullet-proof way of making it work for everybody, without the need for any A/B testing or usability testing whatsoever.

The point here is that the price filter on Airbnb contains a fundamental design flaw: a well-established and quite obvious design principle is to label things properly in order to avoid misunderstandings.
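To show how little it takes, here is a minimal sketch of a label builder for a nightly price filter. It is only an illustration: the function name `price_filter_label`, its parameters and the exact wording of the label are my own assumptions, not anything Airbnb, Booking.com or Way to stay actually use.

```python
def price_filter_label(low: int, high: int, currency: str, capped: bool = True) -> str:
    """Build an unambiguous label for a nightly price range filter.

    Hypothetical helper: the name, parameters and wording are assumptions
    made for illustration only.
    """
    if low < 0 or high < low:
        raise ValueError("expected 0 <= low <= high")
    # A trailing plus sign tells the user that prices above `high` exist.
    upper = f"{high}+" if capped else str(high)
    # State the unit ("per night") and the currency explicitly.
    return f"Price per night ({currency}): {low} to {upper}"


print(price_filter_label(10, 300, "EUR"))  # Price per night (EUR): 10 to 300+
```

The whole fix amounts to two extra pieces of information in one string: the unit and the currency.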

This specific example clearly shows how the outcome of a split test can be invalidated when the design solution does not meet basic design heuristics in the first place.
As a side note, there was no need for A/B testing to come up with very obvious conclusions such as:

  1. Why would people ever prefer a generic quantitative indicator to actual figures?
  2. There is no point in showing the highest price on the slider: a plus sign is enough to let people know that the top prices are higher than the displayed value, as long as the underlying algorithm is accurate enough to include only a range of prices that is statistically significant.

While they were focusing on these minor details, how many users chose to leave Airbnb because they could not clearly understand how much they would have to pay?

Do you really need to test that? And what if testing turns into a hurdle to design?

The second example described in the post is about the fully revamped interface that was released in July 2014:

The new design is a neat improvement, as users can now see images of the properties without loading a new page, and see the location of the properties on an interactive map. It took a long time to get there, but it looks nice. Was there any need to carry out A/B testing to confirm which design was better? As a designer I would say of course not. But let’s say that the A/B test results pointed to a drop in KPIs on the redesigned version: what would the next steps be? There are so many differences between the new and the old version that identifying the culprit would be practically impossible.

This reminds me of a similar example. In his 2012 presentation called Design for continuous experimentation, Dan McKinley, an engineer at Etsy, shares the story of how continuous testing led the design team to abandon certain design proposals, such as opening an item in a new tab or adopting endless pagination. While going through the presentation, I thought it contained a fundamental flaw. Unlike usability testing, which should also be treated with great caution, an A/B test does not give much insight into what exactly made users prefer one version over the other. Maybe the implementation of version B was not good enough?
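To make that limitation concrete, here is a rough sketch of the arithmetic behind a typical A/B comparison, using a standard two-proportion z-test. I am not claiming this is what Airbnb or Etsy run; the function name and the sample figures are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare two conversion rates; return (z, two-sided p-value).

    Hypothetical helper for illustration, using the pooled standard error.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both variants converted at exactly 0% or 100%: no difference to test.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the normal distribution: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented numbers: 1000 visitors per variant, 100 vs 150 conversions.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Everything the test knows goes in as four counts, and everything it has to say comes out as two numbers. Nothing in there can tell you whether users disliked the idea behind version B or just its implementation.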

Whatever the reason, the main point here is this: aren’t we heading towards a situation where testing dictates design solutions just because there’s no time during a design meeting to come up with a solution that is accurate? Are we dropping design thinking in favour of the cult of statistics?

In my experience as a designer who has run plenty of usability testing sessions, most of the time I can foresee what the outcome of the testing will be, and that’s probably because I am quite good at what I do. Of course testing always provides useful insights into underestimated issues, but I know what the heuristics and principles are and I never lose sight of them. Yet in many of the design-related discussions that I’ve been involved in over the years, these principles were not even taken into consideration most of the time.

Let’s go back to Airbnb. Here is the search widget on the homepage, after the major redesign they carried out recently:
On Firefox (Mac OS X)

On Chrome (Mac OS X)

See the ugly rounded corners? See how the controls are not the same height? It’s a one or two pixel difference, but enough to make it look not alright. And why cram all controls together like that?

If we look at the search results page, it gets even worse. Here is how it looks on my large screen, after I click on “More filters”:

Issues I found here, after looking at this for about a minute (from top to bottom):

  1. Under ‘Room type’, the three check boxes are too far from the text and icons they apply to, to the point of being almost in the middle between one item and the one next to it. If only they had kept the border… but no, everything has to be flat now, because Apple did.
  2. When moving the slider on the price filter, the maximum amount updates on the right hand side instead of following the pin (where I would expect it to be), and the maximum price is not visible anymore — small details, maybe, but still…
  3. Large amounts of white space between filter labels and filter values. Bold type would be a better way to differentiate between headers and values.
  4. From ‘Neighbourhoods’ to ‘Host language’, the carets pointing down on the right hand side are out of alignment, and even though there could be a reason for such a choice, there’s a certain ambiguity as to whether they refer to the item on the right or to the whole category. Wouldn’t it be better to position them next to the labels?
  5. The text box on ‘Keywords’ does not align with any other element on that page.
  6. The ‘Show listings’ button is so massive that it doesn’t even look like a button, it actually scares me a bit.

Here is another screenshot that includes the map, taken during a different session on a different day. Some more points here:

  1. The buttons to zoom in/zoom out on the map are really small.
  2. No currency displayed on price range, and again, unclear “Price range” label.
  3. The ‘Options’ section features a really flashy indent overseeing a variety of picturesque font colours, and post-atomic, genetically-modified check boxes.

Want some more interaction? A small and cute help overlay opens when you click on the small question mark, but the pointer is displaced quite a bit from the icon:

By the way, you might have noticed that the currency was displayed in one of the two versions and not in the other. I wonder if this is because they were playing the multivariate testing game to decide whether designers should include the currency?

Price filter showing currency (and displacement of pin and number).

I had a similar issue when Google was testing two variations of the search results page on Google Images. Until they finally opted for the version where images open as inlays instead of new pages, it was a real pain if you happened to fall into the wrong user group.

The point here (despite the fact that they seem to have done a very good job with rebranding and the emotional impact of the site is quite effective) is that there are so many issues that could have been addressed by just paying attention to detail, as you would expect from a site with millions of visitors. Visual design in particular requires being fastidious about the choices that are made. Alright is just not alright. You can’t limit yourself to dropping graphic assets into that slick and modern collection of widgets that you call a GUI.

Let’s carry on with Airbnb. It’s hard to believe, but there is no way to sort the results. I suspect they must have done it for strategic reasons, but from a design perspective, this does not make sense to me, and I am sure that thousands if not millions of users have been swearing out loud because of this.

The old-style pagination is one more obstacle to exploring the results and getting what you want. Despite the fact that there’s plenty of space available, at least on my screen, if I want to jump to page number 5 I can’t, but hey, I can jump directly to page 56! And why would I ever do that, considering the listing seems to be totally random, with prices shuffling up and down without any criterion?
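For what it’s worth, offering a few more page links is not hard. Here is a minimal sketch of a pagination window; the function name `page_links`, the default `radius` of 4 and the use of `None` to mark a gap are all assumptions of mine, and I have no idea how Airbnb actually builds its pager.

```python
from typing import List, Optional


def page_links(current: int, total: int, radius: int = 4) -> List[Optional[int]]:
    """Return the page numbers to display, with None marking a skipped gap.

    Hypothetical helper: always shows the first and last page, plus
    `radius` pages on either side of the current one.
    """
    if total < 1 or not 1 <= current <= total:
        raise ValueError("expected 1 <= current <= total")
    wanted = {1, total}
    wanted.update(range(max(1, current - radius), min(total, current + radius) + 1))
    links: List[Optional[int]] = []
    previous = 0
    for page in sorted(wanted):
        if page - previous > 1:
            links.append(None)  # render this as an ellipsis
        links.append(page)
        previous = page
    return links


print(page_links(1, 56))   # [1, 2, 3, 4, 5, None, 56]
print(page_links(30, 56, radius=2))  # [1, None, 28, 29, 30, 31, 32, None, 56]
```

With the window widened, page 5 is one click away from page 1, and the jump to the last page is still there for whoever wants it.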

Conclusions, and a few notes

Designers should have the authority to lead decisions about how an interface should work and what it should look like, without a need to prove that what they say is right or wrong every time there is disagreement in the team. They are supposed to have the experience and knowledge to make the right choices.
Usability metrics, multivariate testing and usability testing should all be adopted with great care. Continuous testing can be useful in many ways, but it should not replace informed design decisions.