Creating an experimentation framework

Intro

In 2018 my organisation underwent a change in leadership with a new CTO / CPO. Up until this point, our founder and CEO had always been our CPO and our product visionary. He steered the ship and we were his (mostly) willing crew. This change in personnel began a chain of events that reshaped how we built product.

The first change was the move from functional to cross-functional teams. We moved from project teams that were spun up to solve individual problems and then dissolved, to semi-permanent, autonomous cross-functional teams. But despite having a new team structure, we still worked in a pretty waterfall manner. When a cross-functional team completed a project it moved on to the next challenge in its roadmap. Although we had become autonomous in terms of delivery, our teams had no autonomy over direction or strategy.

The next big evolution was in how we set goals for our teams. Rather than giving them a list of features to build, we wanted to move towards a more outcomes-based way of thinking. This meant focusing on what we wanted to achieve by releasing a feature and then challenging teams with that outcome. They were then empowered to choose which feature(s) to deliver in order to achieve it. This in itself was quite a challenge but that's another story...

This newfound autonomy came with challenges; the biggest was knowing which new features, or changes to existing features, would deliver the requested outcome. We needed to turbocharge our discovery processes...

It was about this time I first read Marty Cagan's book "Inspired: How to Create Tech Products Customers Love". This book was my first introduction to "product thinking" and for a while it became my reference manual. I referred to it on an almost daily basis, but one line particularly stuck with me: Marty suggested that a product team should have fifteen experiments running at the same time. For the uninitiated, an experiment is a form of discovery work designed to test a hypothesis or assumption as quickly and as cheaply as possible. We were right at the start of our new discovery journey and had only run a handful of experiments up to this point, but each one had been slow to build and set up. Our teams were struggling to run two experiments concurrently, let alone fifteen.

Building the foundations

Creating and running experiments was a new skill our teams needed to learn. We didn't yet have a model of what good looked like, which meant each team approached the problem of running experiments in its own unique way.

To help accelerate our maturity in running experiments I wanted to introduce some common foundations, as I had done when developing our research ops maturity. I figured some guidance would help our teams get better at experiments, and that if all the teams used a common approach we could iterate on it faster and collectively level up sooner.

I began by working with the teams to better understand the concerns they had when planning their discovery and experimentation activities. I started hearing questions such as:

  • What can you test with experiments?
  • Do I always need to run experiments for all my team's ideas?
  • How do I design experiments?
  • What's the process for creating an experiment?
  • How do I test that an experiment has been successful or unsuccessful?

From here I began constructing a framework to guide our teams and help them gain confidence in running experiments. It consisted of home-grown articles and videos created by me and a couple of passionate supporters, articles sourced from the web, presentations, coaching workshops and some simple systems for tracking and cataloguing the experiments we ran.

Getting everyone on the same page

We were more or less starting from scratch, so we began with foundational elements such as language and the basic process. We created guides that explained the language of discovery and established some common ways of identifying what should be tested with an experiment.

We established that experiments were any activities that generated some form of behavioural data, i.e. you put something in front of a customer and measure how they interact with it.

Experiments could be conducted offline through simple prototypes or use live data prototypes in production. We borrowed techniques from Jeff Patton and Jeff Gothelf to identify a squad's riskiest, highest-impact opportunities and framed these as potential experiments. An example of the techniques we used is Jeff Gothelf's Lean UX Canvas.

We then borrowed further guidance from Jeff Patton and Giff Constable to help teams understand which techniques were appropriate for these experiments, based on where they were in their discovery journey and how much confidence they were seeking to generate.
 


The Jeff Patton discovery techniques graph
(Both Giff Constable and Jeff Gothelf have similar graphs)

The further to the right you go, the more detailed, but also more expensive and slower, the techniques become.


In later versions of the experimentation framework we also borrowed heavily from Spotify's Thoughtful Execution Framework to help us identify what was to be tested.

Creating hypotheses

One of the biggest shifts in our thinking was focusing our discovery efforts on the validation (and invalidation) of hypotheses. Prior to this, we simply built what we thought was right for our customers. With hypothesis-based design, your goal is to validate or invalidate your hypotheses as quickly and as cheaply as possible. However, before you can validate a hypothesis you first have to have one!

We already had data analysts within our squads, and from working with the Head of Data I knew that, thanks to their training, they were already adept at forming experimental hypotheses. So, in collaboration with our lead data analyst, we created some content and activities to train our product managers and designers in constructing hypotheses. Again, we borrowed from whomever we could; we used Jeff Gothelf's Lean UX Canvas v2 (again) and an excellent article from Chris Compston to build out a really effective coaching session and collection of articles.
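
To make that training concrete, the structure we coached towards looked roughly like the sketch below. The field names and the example values are illustrative only, not our exact template.

    # Illustrative only: a rough encoding of the hypothesis structure we coached,
    # with made-up field names and example values rather than our exact template.
    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        belief: str              # the assumption we want to validate or invalidate
        change: str              # the smallest change that tests the belief
        expected_behaviour: str  # the behavioural signal we expect to observe
        metric: str              # how we will measure that behaviour
        success_criterion: str   # the threshold that counts as validation

    example = Hypothesis(
        belief="New customers abandon signup because the form feels long",
        change="Split the signup form into two shorter steps",
        expected_behaviour="More visitors who start signup go on to complete it",
        metric="Signup completion rate",
        success_criterion="At least a 5% relative uplift over two weeks",
    )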

We also catalogued all the experiments we conducted so that, as well as guidance on creating hypotheses, we had a library of past examples.

Coaching on building simple experiments

We were beginning to find our confidence with framing our experiments, but the act of constructing and running an experiment was still taking too long. We needed to learn how to test our hypotheses in the quickest, cheapest fashion. This proved to be quite tricky, as experiment design requires a very different mindset to traditional interaction and product design. We really had to work hard to instil the philosophy of designing simple experiments, as we found designers consistently wanted to design the best feature, not the best test. Collaborating with one of the lead designers, we constructed some guidance and coaching on designing the simplest possible experiment. We then ran coaching sessions for each designer to help them develop their experiment design skills. We coached them on three key principles:

  1. Focus on the variables contained within the hypothesis
  2. Make the simplest possible change, ideally changing one thing at a time
  3. Demonstrate attribution

Our goal was to help designers create a series of small, quick-to-implement experiments that delivered the learning we sought as quickly as possible. We also wanted to move away from the idea that to test something we needed to build some form of MVP, instead trying to test things like value and demand before building anything functional.
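
To illustrate the second and third principles, here is a minimal, hypothetical sketch of the kind of setup we encouraged: users are bucketed deterministically, only one variable differs between the groups, and the exposure is logged so that any change in behaviour can be attributed to the experiment. This isn't our actual implementation; it just shows the shape of the idea.

    # A hypothetical sketch of "change one thing and prove attribution": users are
    # bucketed deterministically, only one variable differs between the groups, and
    # the exposure is logged so later behaviour can be tied back to the assignment.
    import hashlib
    from dataclasses import dataclass

    @dataclass
    class Exposure:
        user_id: str
        experiment: str
        variant: str

    def assign_variant(user_id: str, experiment: str) -> str:
        """Deterministic 50/50 split so a user always sees the same variant."""
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        return "variant" if int(digest, 16) % 2 else "control"

    def log_exposure(user_id: str, experiment: str) -> Exposure:
        """Record the assignment at the moment the user sees the change, so any
        shift in their behaviour can be attributed to the experiment."""
        exposure = Exposure(user_id, experiment, assign_variant(user_id, experiment))
        print(exposure)  # in practice this would go to the analytics pipeline
        return exposure

    log_exposure("user-123", "simplified-signup-copy")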

Analysing your experiments

Understanding why your experiments behaved as they did, and knowing whether the result is trustworthy, can be extremely challenging, but we started with a much lower bar. We needed to ensure we had sufficient instrumentation to understand the results of an experiment, and we needed to learn how to devise fair, adequately powered experiments.

As I mentioned earlier, we had a head start here because we already had data analysts in each team to help with the maths and calculations, but an analyst is only as good as the data they're analysing. We needed to improve our data pipelines so the right data flowed to where we needed it, and also improve the quality of our data. Some mistakes we made initially in constructing unfair or untrustworthy experiments included:

  • Comparing cached pages to uncached pages
  • Measuring the wrong metrics and not considering guardrail metrics
  • Designing under-powered experiments which simply didn't generate enough data to make a reliable decision (but making one anyway); see the power-check sketch below
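
That last mistake is the easiest to avoid up front. A rough power check like the one below (a sketch using statsmodels, with made-up numbers) tells you how many users per variant you would need before you commit to running an experiment.

    # A sketch of a pre-launch power check using statsmodels, with made-up numbers:
    # how many users per variant do we need to reliably detect the smallest uplift
    # we'd actually act on?
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    baseline_rate = 0.04          # assumed current conversion rate
    minimum_detectable = 0.045    # smallest rate worth detecting (~12% relative uplift)

    effect_size = proportion_effectsize(minimum_detectable, baseline_rate)
    sample_per_variant = NormalIndPower().solve_power(
        effect_size=effect_size,
        alpha=0.05,   # 5% chance of calling a winner when there isn't one
        power=0.8,    # 80% chance of detecting a real effect of this size
        ratio=1.0,    # equally sized control and variant groups
    )
    print(f"Need roughly {sample_per_variant:,.0f} users per variant")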

To help with sharing the results of experiments, we created templates for experiment write-ups and a way of cataloguing them, so it was possible to quickly see which experiments had been conducted and their results.
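
For illustration only (our real template had different fields), a write-up skeleton captured roughly this information:

    # Illustrative skeleton of a write-up entry in the catalogue; the fields in our
    # actual template differed, this just shows the sort of information captured.
    writeup_template = {
        "experiment": "",    # name and link to the tracker entry
        "hypothesis": "",    # the hypothesis that was tested
        "design": "",        # what changed, for whom, and for how long
        "metrics": [],       # primary and guardrail metrics
        "result": "",        # validated / invalidated / inconclusive
        "learning": "",      # what we now believe and what we'll do next
    }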

Helping the organisation accept experiments

So far I have only spoken about helping the practitioners who conduct the experiments, but I should also mention how we worked with our senior leadership team and other parts of the organisation to help them understand our new ways of working.

Let's start with our leadership team. For a long time they had been pretty hands-on with our products, working with squads to build roadmaps of features. Given we were now using experiments to make these choices, we needed them to focus more on business-level decisions. Ideally, we needed them to set direction and goals, then leave the squads to determine how to get there. We needed our leadership team to move from an output mindset to an outcomes mindset.

Changing how a team collectively acts and thinks is not something you can merely ask them to do; they need to be coached and given the space and support to learn the new world. I was part of a team who worked over several years to effect this change. The area I focused on was building trust in our squads. We needed to demonstrate that the squads were qualified and capable of making these decisions, so as to give our leadership team the confidence to step back from their former roles and devolve power to the squads. Our journey to achieve this was long (taking multiple years) and is a complex story for another day...

For the rest of the organisation, the introduction of empowered teams who determine their own path through their discovery activities proved to be quite disruptive. For example, when we began to run more experiments and inevitably made a few mistakes along the way, it was hard for teams like our customer service team to understand what was going on. To aid with this we built an experiment tracker so that anyone in the business could quickly look up which experiments were live at a given time and who the point of contact was for them. This way if something did go wrong, it was easier to minimise the disruption caused. We also created guidance for go-to-market planning to ensure that, based on how risky an experiment was, the level of awareness within the organisation and monitoring in place were appropriate.
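
A stripped-down, hypothetical illustration of the tracker's shape is below; the real system was richer, but the point was simply that anyone in the business could answer "what is live right now, who owns it, and how risky is it?".

    # Hypothetical, simplified shape of the experiment tracker: just enough to answer
    # "what is live right now, who owns it, and how risky is it?".
    from dataclasses import dataclass
    from datetime import date
    from typing import List, Optional

    @dataclass
    class TrackedExperiment:
        name: str
        squad: str
        contact: str           # who to talk to if something goes wrong
        start: date
        end: Optional[date]    # None while the experiment is still running
        risk: str              # e.g. "low" / "medium" / "high", feeds go-to-market planning

    def live_experiments(tracker: List[TrackedExperiment], today: date) -> List[TrackedExperiment]:
        """Return the experiments that are live on a given day."""
        return [e for e in tracker if e.start <= today and (e.end is None or today <= e.end)]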

Previously, the other parts of the organisation had simply told the product and engineering teams what fixes and changes they needed. These requirements were fed into the development machine and at some point the work got done. With the squads now having more of a say in what work gets done, and being focused on their goals, the old linear way of working was replaced with one that had more variability. This meant that the other parts of the business now needed to convince the squads of the merit of addressing their issues, which was definitely a point of friction with teams in other parts of the organisation. We worked with the squads to help them better understand who their stakeholders were and how to bring them into their prioritisation processes, whether at an annual, quarterly or fortnightly level. We encouraged the squads to make these processes transparent and to share their workings with their stakeholders. We then worked with the leaders of teams from around the organisation to help them understand how they could 'plug into' the new processes and work better with our squads.

Continued coaching and refinement

We're now four years down the track and on v3 of the framework. We're still improving how we work and taking inspiration from industry leaders. For example, we're now trying to shift to assumption testing (where possible) based on Teresa Torres' excellent book "Continuous Discovery Habits: Discover Products that Create Customer Value and Business Value". We're also soon to start a program of work to redefine and improve our discovery techniques, with the challenge of making them leaner and quicker.

We've come a long way from where we were: our product squads are massively more empowered than they were, and we're running more experiments than we ever have. But there is always room for improvement, so the journey continues...
