At this point it is worth considering how data and data management fit into the processes that drive the two fundamental lab types referred to earlier, namely (i) the hypothesis‐driven, more research/discovery‐focused lab and (ii) the protocol‐driven, more “manufacturing”‐like lab.
1.2.4.1 Data in the Hypothesis‐driven, Research Lab
In a pure research, hypothesis‐driven lab, whether in life science, chemical science, or physical science, a fundamental, cyclical process operates. This process underpins all of scientific discovery; we refer to it as the “hypothesis‐experiment‐analyze‐share” (“HEAS”) cycle (see Figure 1.3) or, in a discovery chemistry lab, for example a medicinal chemistry lab in biopharma, as the design‐make‐test‐analyze (DMTA) cycle (see Figure 1.2).
The research scientists generate their idea/hypothesis and design an experiment to test it. They gather the materials they need to run that experiment, which they then perform in the lab, capturing observations on what is happening throughout. At the end they “work up” their experiment, continuing to capture observations and raw data. They analyze their “raw” data and generate results (“refined” data); these determine whether or not the experiment has supported their hypothesis. They then communicate those results, observations, and insights more widely. Ultimately, they move on to the next, follow‐on hypothesis and go around the cycle again until they reach some sort of end point or final conclusion. All the while they are generating data: raw data from instruments, captured visual observations, and refined data, which are more readily interpretable and can more easily lead to insights and conclusions.
Figure 1.3 Hypothesis‐experiment‐analyze‐share (HEAS) cycle.
1.2.4.2 Data in the Protocol‐driven Lab
In the protocol‐driven lab, whether in a manufacturing or a sample‐testing domain, there is again a fundamental process that drives the value chain. Unlike the “HEAS” cycle, this is a largely linear process: it starts with a request and ends in a communicable result or a shippable product. This process, which we refer to as the “request‐experiment‐analyze‐feedback” (REAF) process, is outlined in Figure 1.4.
There are many close similarities between the linear REAF process and the HEAS cycle, especially in the Experiment/Observe and Analyze/Report steps, but the REAF process does not start with an idea or hypothesis. REAF represents a service: it starts with a formal request, for example to run a protocol to manufacture a good or to test a sample, and ends with a product or a set of results that can be fed back to the original customer or requester. As we noted earlier, it is increasingly likely that the LotF will be set up with a Laboratory as a Service (LaaS) mentality; REAF may therefore be much more broadly representative of how labs of the future might operate.
Figure 1.4 Request‐experiment‐analyze‐feedback (REAF) process.
It is important to acknowledge that the data and information that drive Request and Feedback in REAF are quite different from those that drive the corresponding Hypothesis and Share steps in HEAS. With the focus of this book being on Experiment/Observe, and to a degree Analyze, we will not say anything more about Request and Feedback (from REAF) or Hypothesis and Share (from HEAS). Instead, the remainder of this section focuses on what the Experiment and Analyze data management aspects of the LotF will look like, whether that LotF is a hypothesis‐ or a protocol‐driven lab. This is made simpler by the fact that, in the Experiment/Observe and Analyze/Report steps, the data challenges in the two lab types are, to all intents and purposes, the same; in the remainder of this section we treat them as such.
1.2.4.3 New Data Management Developments
So what new developments in data management will be prevalent in both the hypothesis‐ and the protocol‐driven labs of 2030? In the previous two sections we asserted that these labs will be populated by fewer people, will make greater use of robotics and automation, and will run at much higher experiment throughput, often on more miniaturized equipment. Building on these assertions, perhaps the most impactful developments in the data space will be:
1 The all‐pervasiveness of the internet of things (IoT) [25, 26]. In the LotF this will lead to the growth of internet of laboratory things (IoLT) environments, a trend that will also be driven by ubiquitous 5G communications capability.
2 The widespread adoption of the FAIR data principles, which state that all data should be findable, accessible, interoperable, and reusable [27] (a minimal illustration of a FAIR‐style data record follows this list).
3 The growing use of improved experimental data and automation representation standards, e.g. SiLA [28] and Allotrope [29].
4 Data security and data privacy. These two areas will continue to be critical considerations for the LotF.
5 The ubiquity of “Cloud.” The LotF will not be able to operate effectively without access to cloud computing.
6 Digital twin approaches. These will complement both the drive toward labs operating more as a service and the demand from remote service customers to see into, and directly control from afar, what is happening in the lab. Technologies such as augmented reality (AR) will also help to enable this (see Sections 1.2.5 and 1.2.6).
7 Quantum computing [30–33]. As this moves from research to production it will impact just about everything we do in life, not just the LotF. Arguably, quantum computing might have a bigger impact on the more computationally intensive parts of the hypothesis‐ and protocol‐driven LotF, e.g. Idea/Hypothesis design and Analyze/Insight, but it will still disrupt the LotF massively. We say more on this in Sections 1.2.5 and 1.2.6.
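To make items 1 and 2 above a little more concrete, the short sketch below shows how a single reading from a hypothetical connected (IoLT) instrument might be wrapped in FAIR‐style metadata before being stored or published. All field names, identifiers, and vocabulary here are illustrative assumptions rather than part of the FAIR principles or of any specific standard; a real implementation would draw on the kinds of community standards mentioned in item 3.

    import json
    import uuid
    from datetime import datetime, timezone

    def fair_record(instrument_id, measurement, value, unit):
        """Wrap a raw instrument reading in FAIR-style metadata (illustrative only)."""
        return {
            # Findable: a globally unique, persistent identifier plus descriptive metadata
            "id": f"urn:uuid:{uuid.uuid4()}",
            "title": f"{measurement} reading from {instrument_id}",
            # Accessible: state how the data can be retrieved and under what conditions
            "access": {"protocol": "https", "authentication": "institutional-login"},
            # Interoperable: explicit quantity types and units, ideally from a shared vocabulary
            "measurement": {"type": measurement, "value": value, "unit": unit},
            # Reusable: provenance and licensing so others can trust and reuse the data
            "provenance": {
                "instrument": instrument_id,
                "captured_at": datetime.now(timezone.utc).isoformat(),
            },
            "license": "CC-BY-4.0",
        }

    # Example: one temperature reading from a hypothetical reactor probe
    print(json.dumps(fair_record("reactor-probe-07", "temperature", 37.2, "degC"), indent=2))

Captured in this way, even a single sensor reading remains findable, interpretable, and reusable long after the experiment that produced it has finished.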
The first three of these developments are all related to the drive to improve the speed and quality of the data/digital life cycle and the overall data supply chain. That digital life cycle aligns closely to the HEAS and REAF processes outlined in Figures 1.3 and 1.4 and can be summarized as follows (see