Data Science for Experimental Design in Biology

On the 17th of October, Synthace was an invited speaker at the Alan Turing Institute workshop on Data Science for Experimental Design. At the event, Synthace Principal Scientist and FDE Vishal Sanchania gave a talk on how Antha is being used to automate and accelerate high-dimensional experimentation, such as Design of Experiments (DoE), in bioprocessing, protein engineering and cell therapy. These are his thoughts on a fascinating day of talks from both industry and academia.

A key theme raised by multiple speakers across the event was that researchers want to run better-designed, statistically powerful experiments in an automated fashion. Dr Stephanie Biedermann, from the University of Southampton, spoke about optimal experimental design: why and how to use it, where the bottlenecks lie, and, in particular, Gaussian process modelling for multi-parameter optimisation. Beyond the technicalities, one idea really resonated with me: “Bring statisticians in during the experimental planning phase, not just at the end when you have the data.” This is something we’ve done since Synthace was founded, but it’s not easy: it requires much closer collaboration and a common language that allows computer scientists, biologists and mathematicians to communicate effectively.

Following Stephanie’s talk, Dr Ozgur Akman from the University of Exeter gave a fascinating talk on “Reduced models of circadian systems”, which tackled highly complex systems. He showed how he had used Boolean models to dramatically reduce the cost of otherwise expensive cost-function evaluations (some calculations took 120 hours on 6 cores!).

Prof Ross D King, from the University of Manchester, then stepped up to talk on a subject close to our hearts: “How can we use laboratory automation and robotics to reduce research biases?”. Prof King described his own closed-loop automated system for early-stage biological discovery research (Eve, the successor to Adam), and went on to describe the differences between abductive and inductive assimilation of information by humans and by machine learning algorithms. It was very interesting to see the layers of abstraction and the use of meta-QSAR to drive the intelligent discovery process. Other talks on open science and reproducibility included those from Prof Rafael Carazo Salas (demonstrating a central database for collecting, processing and accessing metadata for reanalysis) and Dr Rachael Ainsworth (who gave a comprehensive overview of tools that help address reproducibility and open data access challenges). Both talks prompted plenty of discussion around data curation, open-source tools and improving reproducibility. There were also many other fantastic talks and brainstorms in what was a powerful multidisciplinary day of data science, policy and wet lab science.

So, what were our take home thoughts from this event?

Beyond the media buzzwords, there is a growing community of researchers and companies trying to tangibly change ‘how’ we conduct research in biology. There was tremendous interest in what Synthace is developing with Antha, and we saw real synergies between our vision for biology and many of the themes presented over the course of the day. The challenges we face in moving towards a new ‘Computer Aided Biology’, highlighted throughout the day, are changing mindsets, overcoming inertia, and agreeing on common standards, frameworks and a common language for describing biological protocols. Moving towards this new way of conducting biological research will require collaboration and openness from multiple stakeholders: government, funding bodies, academia, publishers, private companies and tool/equipment providers. I’m confident that we will get there as a community, and that Synthace will continue to add value by helping to accelerate this transition.