December 5, 2022
DOE in the real world: implementing design of experiments
Written by: James Arpino, PhD
- Design of Experiments (DOE) software helps you create and assess different designs, analyze data and build models that help you decide your next action or iteration.
- Investigating all the factors that could influence an outcome in detail in one experiment is usually impractical if not impossible.
- Design of Experiments offers the opportunity to investigate your system without cognitive bias, which often reveals new insights and generates novel hypotheses.
- If you’d like to keep learning about DOE after you're done with this article, make sure to check out our other DOE blogs, download our DOE for biologists ebook, or watch our DOE Masterclass Series.
So far, this series of blogs has explored the principles underpinning Design of Experiments (DOE). But what happens when DOE collides with the real world in a busy laboratory?
Implementing DOE is, of course, a big topic with plenty of nuance. In this blog, we’ll explore DOE implementation in several ways. We’ll start by examining the most important elements and offering advice based on our experiences running DOE and helping others implement DOE. To illustrate our points, let's imagine that we want to optimize the expression of a target protein in bacterial cell culture.
Tools you’ll need: software and automation
In theory, you can create, execute and analyze DOEs with little more than a pipette, pen, and paper. But it’ll be hard to do more than scratch the surface of what you could achieve with the proper tools.
Software for DOE
Let’s begin with DOE software. DOE rests on a well-established and robust mathematical foundation. You can do the math by hand: as we’ve seen, DOE began when computing had moved little beyond Charles Babbage’s Analytical Engine. But manual calculation is laborious, error-prone, and requires specialized mathematical knowledge.
DOE software helps you to create and assess different designs, analyze the data and build models to help you decide your next action or iteration with much less effort than manual calculation and with less risk of a mathematical slip. Over the last few years, DOE software has become more scientist-friendly, which lowers the barriers to entry for non-statisticians. We’ll look at DOE software in detail in a future blog.
Automation hardware for DOE
Biological experiments typically involve liquid handling and analytics. Manually handling small quantities of liquid is feasible, but biological DOEs typically employ dozens or hundreds of runs, each one different from the last. For a full factorial design, the number of runs is the number of levels raised to the power of the number of factors (e.g. 2^8, or 256, runs for a full factorial with 8 factors and 2 levels of each factor).
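The run count for a full factorial can be checked by enumerating every combination of levels. The following is a minimal Python sketch, not the output of any particular DOE package; the factor names and the -1/+1 coding for low and high levels are placeholders chosen for illustration.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels, one dict per run.

    `factors` maps a factor name to the levels it can take.
    """
    names = list(factors)
    level_sets = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_sets)]


# Eight hypothetical two-level factors, coded -1 (low) and +1 (high).
factors = {f"factor_{i}": (-1, 1) for i in range(1, 9)}
runs = full_factorial(factors)

print(len(runs))  # levels ** factors = 2 ** 8 = 256
```

Adding a ninth two-level factor doubles the list to 512 runs, which is why full factorials quickly outgrow what can be pipetted by hand.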
So, manually executing complex experiments is extremely challenging: there’s usually a lot of work and it’s very hard to avoid mistakes. In addition, biological scientists usually work within constraints imposed by labware, such as 96- or 384-well microtiter plates. Accurately pipetting ten or more liquids by hand into a grid of wells millimeters apart, all at variable volumes and in an unpredictable layout, without any errors, would test the expertise of even the most practical scientists. A future blog will focus on laboratory constraints and the challenges associated with DOE and manual execution.
In addition, biologists need to integrate the output of the DOE software with the software that controls their lab automation. Automation Engineers can help make the transition from manual to automated liquid handling and ease DOE implementation, although this can create a new bottleneck.
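To make the integration step concrete, here is a rough Python sketch that turns a table of runs into a per-well transfer list for a single 96-well plate. Everything about the file layout is an assumption: the column names (`run`, `well`, `liquid`, `volume_ul`), the liquids and the volumes are invented for illustration, and real liquid handlers each expect their own worklist format.

```python
import csv
import io
import random
from string import ascii_uppercase


def plate_wells(rows=8, cols=12):
    """Well names for a plate in row-major order: A1..A12, B1..B12, ..."""
    return [f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)]


def build_worklist(runs, seed=0):
    """Assign runs to wells in a shuffled order and list every transfer.

    `runs` is a list of dicts mapping a liquid name to a volume in microliters.
    """
    wells = plate_wells()
    if len(runs) > len(wells):
        raise ValueError("more runs than wells on one plate")
    order = list(range(len(runs)))
    random.Random(seed).shuffle(order)  # fixed seed so the layout is reproducible
    transfers = []
    for well, run_index in zip(wells, order):
        for liquid, volume in runs[run_index].items():
            if volume > 0:  # nothing to dispense for a zero level
                transfers.append(
                    {"run": run_index, "well": well, "liquid": liquid, "volume_ul": volume}
                )
    return transfers


def to_csv(transfers):
    """Serialize the transfer list as CSV text (hypothetical column layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["run", "well", "liquid", "volume_ul"])
    writer.writeheader()
    writer.writerows(transfers)
    return buffer.getvalue()


# Illustrative volumes only, not a recommended recipe.
example_runs = [
    {"medium": 180, "zinc_stock": 0, "inducer": 20},
    {"medium": 170, "zinc_stock": 10, "inducer": 20},
]
print(to_csv(build_worklist(example_runs)))
```

Keeping the run index in every row makes it possible to join the measured responses back to the design after the plate has been read.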
Planning your DOE campaign
As we’ve seen, a large part of DOE’s power resides in a campaign approach encompassing screening, refinement and iteration, optimization, and assessing robustness. So, you need to sketch a plan for your campaign.
As in every scientific experiment, first frame your question as a hypothesis. Returning to our growth experiment: producing compound ‘X’ in bacteria depends on a complex interaction between genetics and environment. Therefore, optimizing production means varying aspects of genetics and environment to discover what’s important, and how they affect one another.
The next stage is to start thinking about what factors to investigate and how to change them. You should use all the knowledge you can get to avoid spending too much effort re-learning things that are already known. If, for example, you know which growth media achieve high yields there’s usually no need to confirm this experimentally. However, you can investigate a biologically plausible change to the media (e.g. zinc availability may be limiting) alongside other media, genetic and process factors, and interactions (e.g. between zinc and manganese).
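To show what an interaction between two factors means numerically, here is a small Python sketch for a two-level, two-factor design. It assumes the levels are coded -1 (low) and +1 (high) and that every combination has been measured once; the response values in the example are made-up numbers for illustration, not measurements.

```python
def two_factor_effects(responses):
    """Estimate main effects and the interaction from a 2x2 factorial.

    `responses` maps (zinc_level, manganese_level), each coded -1 or +1,
    to the measured response for that run.
    """
    half = len(responses) / 2  # number of runs at each level of a factor
    zinc = sum(z * y for (z, m), y in responses.items()) / half
    manganese = sum(m * y for (z, m), y in responses.items()) / half
    interaction = sum(z * m * y for (z, m), y in responses.items()) / half
    return {"zinc": zinc, "manganese": manganese, "zinc x manganese": interaction}


# Made-up yields, purely to exercise the arithmetic.
illustrative = {(-1, -1): 10, (1, -1): 14, (-1, 1): 12, (1, 1): 24}
print(two_factor_effects(illustrative))
```

Each main effect is the mean response at the high level minus the mean at the low level. A non-zero interaction term means the effect of zinc depends on the manganese level, which is exactly what changing one factor at a time cannot reveal.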
Familiarity should not breed complacency. It’s all too easy to be trapped by confirmation bias (see below). You should also ask yourself: What will you do, and when? Do you need an answer now? Can you do a couple of iterations? Moreover, DOE assumes that each run is independent. But optimizing your executions often means runs will share factors (e.g. growth media). So, it’s important to plan carefully.
Avoiding the "big bang"... and bias
Dozens of factors could influence the optimal expression of a target protein in a bacterial cell culture, including variations in the genetic payload (e.g. plasmid type, the coding, promoter or terminator sequences). The molecular biology techniques used to assemble and transform the payload, the host strain details and growth conditions, such as temperatures and times, could also influence expression.
Investigating all these factors in detail in one experiment is impractical if not impossible. In any case, most of the possible combinations will have little if any effect on the expression profile. The problem is you don’t know which! Beginning a DOE campaign by investigating a broad set of factors in limited detail eliminates many dead-ends, producing a smaller, more interesting and influential set. Later experiments can fill in the missing details.
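One common way to investigate a broad set of factors in limited detail is a fractional factorial screening design. The sketch below is a minimal Python illustration, not the method of any specific DOE tool: it builds a half-fraction for four two-level factors by setting the fourth factor to the product of the other three. The factor names are illustrative picks from the list above.

```python
from itertools import product
from math import prod


def half_fraction(base_factors, generated_factor):
    """Build a two-level half-fraction design.

    The base factors get a full factorial in coded levels (-1, +1); the
    generated factor is set to the product of the base factor levels.
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        run[generated_factor] = prod(levels)
        runs.append(run)
    return runs


# Illustrative factor names; 8 runs instead of the 16 of a full factorial.
design = half_fraction(["promoter", "temperature", "induction_time"], "host_strain")
for run in design:
    print(run)
```

The saving comes at a price: in this design, two-factor interactions are aliased with one another, so a later, more detailed experiment is needed to tell them apart. That matches the campaign approach of screening first and filling in the details afterwards.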
As we’ve seen in previous blogs, the lack of well-developed, robust theoretical frameworks in biology can result in unconscious cognitive bias. It’s all too easy to develop experiments that confirm, rather than test, hypotheses. So, be open-minded. Don’t assume you know everything. DOE offers the opportunity to investigate your system in an unbiased way, which often reveals new insights and generates novel hypotheses.
For instance, the formulae for many cell growth media are handed down and used unquestioningly by generations of scientists. After all, why would you risk taking something out and having your cells not grow properly? But calculated risks are part of science.
Cell growth is complex and there’s no perfect medium that gives excellent results in every possible case. It’s likely that many ingredients aren’t necessary for specific applications or may even be harmful: high levels of zinc may inhibit the growth of certain bacteria, for example. Investigating the composition of such apparently standard parts of the workflow can be useful: some ‘unnecessary’ components of the media can be very expensive, while others are actively harmful for the specific application.
The sanity clause
It's important to sanity-check each stage in your DOE campaign to make sure you understand exactly what you’re proposing to do and that it makes biological sense. The early stages of DOE aim to investigate high and low levels of continuous factors, such as the concentrations of media components, to establish ranges to investigate. While each extreme may make sense in isolation, the combination may be obviously wrong.
For instance, investigating the effect of several carbon sources on bacterial growth could involve a low level, or zero, for each source individually. Because bacteria may thrive on more than one carbon source, a zero level of any single source can be a sensible run. But a run that sets every source to zero gives the bacteria no carbon, which would obviously prevent growth. Equally, large amounts of several carbon sources combined could overwhelm the bacteria. So, you may set lower and upper limits for total carbon.
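A total-carbon limit of this kind can be expressed as a simple check on each candidate run. The Python sketch below is an illustration under stated assumptions: the carbon sources, their levels and the limits are hypothetical, and the units are arbitrary.

```python
from itertools import product


def within_carbon_limits(run, minimum, maximum):
    """True if the summed carbon-source levels fall inside the allowed range."""
    total = sum(run.values())
    return minimum <= total <= maximum


# Hypothetical carbon sources, each at a zero and a high level (arbitrary units).
levels = {"glucose": (0, 10), "glycerol": (0, 10), "lactose": (0, 10)}
candidates = [dict(zip(levels, combo)) for combo in product(*levels.values())]

# Hypothetical limits: at least some carbon, but not all three sources at high.
feasible = [run for run in candidates if within_carbon_limits(run, minimum=5, maximum=20)]

print(len(candidates), len(feasible))  # 8 candidates, 6 feasible
```

Simply deleting runs from a factorial leaves an unbalanced design, so in practice a constraint like this is usually given to the design software, which can then choose runs that respect it. The check is still useful as a sanity test on whatever design comes back.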
Biologically implausible runs waste time and resources and can compromise the overall results, especially if they occur many times. Trying to understand how the combination of levels would influence the system is critical: it will make a huge difference to the success of your run. No DOE design package or statistician can give you these answers.
You also need to remember the fundamentals of good experimental design, such as whether you need positive or negative controls. The experimental design will contain the points required to estimate the effects and interactions you are investigating. You should consider these all as experimental runs. While they can sometimes include runs that could function as controls (e.g. the zero carbon example above), that's not their purpose. You need to make sure you add the required controls and replicate runs separately. We also advocate, particularly when iterating, including a few repeated runs from earlier stages to help understand if your system is behaving the same way. Otherwise, it can be difficult to identify errors that affect large sets of runs, such as a machine not functioning correctly. Finally, DOE assumes that you can easily measure the response for each run.
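The bookkeeping described here (design runs, separately added controls, replicates, and a few runs repeated from an earlier stage) can be sketched in Python as follows. The structure of each record, the `kind` labels and the default of two replicates are assumptions made for illustration.

```python
import random


def assemble_run_list(design_runs, controls, repeated_runs, replicates=2, seed=0):
    """Combine design runs, controls and carried-over runs into one shuffled list.

    Each design run is replicated `replicates` times. Controls and runs
    repeated from an earlier stage are added once each and tagged, so they
    can be separated from the design points at analysis time.
    """
    runs = []
    for index, settings in enumerate(design_runs):
        for replicate in range(1, replicates + 1):
            runs.append(
                {"kind": "design", "id": f"design_{index}", "replicate": replicate, "settings": settings}
            )
    for index, settings in enumerate(controls):
        runs.append({"kind": "control", "id": f"control_{index}", "replicate": 1, "settings": settings})
    for index, settings in enumerate(repeated_runs):
        runs.append({"kind": "repeat", "id": f"repeat_{index}", "replicate": 1, "settings": settings})
    random.Random(seed).shuffle(runs)  # randomize run order; seed kept for traceability
    return runs


# Placeholder settings, coded levels only.
run_list = assemble_run_list(
    design_runs=[{"zinc": -1}, {"zinc": 1}],
    controls=[{"name": "no_cells"}],
    repeated_runs=[{"zinc": 0}],
)
print(len(run_list))  # 2 design runs x 2 replicates + 1 control + 1 repeat = 6
```

Tagging each record with its purpose keeps controls out of the model fit while still letting you compare repeated runs across iterations.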
In the late 19th century, Lord Kelvin commented that “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.”
So, get your measurements right: the success of your DOE campaign depends on your measurements being fit for purpose. Typically, assessing expression of genetically engineered bacteria means counting colonies to determine how well a particular payload has been assimilated and tolerated by the host. However, counts can be extremely variable and, therefore, difficult to model. It’s easy to end up with a particularly productive combination in which the bacteria grow as an uncountable lawn rather than as individual colonies. In other situations, nothing grows. Contamination risk is another important factor to consider when assaying growth.
Working out how to handle noise (replication), ensuring you stay within a detectable range (multiple dilutions), and avoiding contamination (aseptic protocols) will make your assay much more robust and usable. You could perform a few runs to determine the dynamic range, which helps ensure the assay is robust despite different conditions.
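As a sketch of how multiple dilutions keep a colony count within a usable range, the Python function below estimates colony-forming units per milliliter from whichever plates are countable. The countable range of 30 to 300 colonies is a widely used convention but is treated here as an assumption, as are the 0.1 mL plated volume and the example counts, which are illustrative rather than measured.

```python
def cfu_per_ml(counts_by_dilution, plated_volume_ml=0.1, countable=(30, 300)):
    """Estimate CFU/mL from plates whose colony counts are in the countable range.

    `counts_by_dilution` maps a dilution factor (e.g. 1e-3 for a 1:1000
    dilution) to the colony count on that plate, or None for an
    uncountable lawn. Returns None if no plate is countable.
    """
    low, high = countable
    estimates = [
        count / (dilution * plated_volume_ml)
        for dilution, count in counts_by_dilution.items()
        if count is not None and low <= count <= high
    ]
    if not estimates:
        return None  # every plate was a lawn, too sparse, or empty
    return sum(estimates) / len(estimates)


# Illustrative counts: a lawn at 1:100, countable at 1:1000, too few at 1:10000.
print(cfu_per_ml({1e-2: None, 1e-3: 150, 1e-4: 12}))
```

A `None` result is a prompt to extend the dilution series before the main experiment, which is the purpose of the few preliminary runs suggested above.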
In this blog, we’ve discussed what happens when DOE collides with the real world of your lab, both in terms of the tools and ideas you’ll need and some tricks to help you, as it were, hit the bench running.
Software and automation, as well as experts in statistics and lab automation, are all valuable allies. But your greatest ally is your scientific knowledge and instincts: it's up to you to make sure that your experiments ask the right questions in the right way. Just remember to temper this with open-mindedness: be critical of what you think you already know. After all, you have nothing to lose but your cognitive bias.
James Arpino, PhD
Dr James Arpino, aka JAJA, is a Product Manager at Synthace, where he leads the product development of experiment design and planning. In his seven years at the company he has become an evangelist and expert in transformational multifactorial methods in biology, including DOE.