When studying biodiversity, it is therefore important, even crucial, that data can be linked to a specimen, particularly in the case of poorly known “species”, because a “species” taxon can be revised and its definition changed a posteriori, thus altering the meaning of the name and the assignment of individuals to a species group (Troudet et al. 2018).
This situation is very poorly understood by society as a whole and by a large part of the scientific community, which approaches the question of species from a strongly essentialist perspective. In this perspective, phenotypic discontinuities between species are wrongly taken as confirmation of their reality and are attributed, without proof, to reduced gene flow. Species boundaries would then be crossed only by rare and unproductive hybridization events or by exceptional evolutionary changes. In reality, hybridization is often commonplace and the species is a porous category at the genetic level. The discontinuities between species that we think we observe empirically are often an artifact of incomplete sampling of living organisms or the result of secondary extinctions, rather than solely the result of splendid isolation coupled with divergence (Mallet 2008). Hence the importance of collections which, by multiplying the number of specimens stored regardless of their momentary assignment to a particular taxon, help document a complex living world that resists classification.
2.5. Big Data collections in space and time
Still on the subject of fundamental research, specimens from collections have the great advantage of documenting a vast geographical coverage and an unparalleled temporal depth. As such, they can help answer all sorts of questions about changes in ecosystems in the context of the biodiversity crisis. Even though they were not acquired through a planned sampling program but haphazardly, as a by-product of various studies or opportunistic collecting, the specimens stored in museums ultimately document many different places and times and can often be used to test scientific hypotheses formulated a posteriori (Dias Tarli et al. 2018). All large-scale approaches involving large numbers of individuals or species need this collection-based information, on the strict condition that locations and dates are accurately recorded, which is neither as straightforward nor as common as one might think. Toponyms can change or be inaccurate, and even modern means of geolocation have their limits, because no single reference ellipsoid describes the whole planet exactly.
Nevertheless, specimens in collections represent a biased sample of life on Earth, whether in terms of location, date, or taxonomic groups involved (Meyer et al. 2015; Amano et al. 2016; Troudet et al. 2017). Most of these biases are related to well-known “collector effects” (Pautasso and McKinney 2007): biodiversity is best sampled in the most accessible or most frequently visited locations, and for the taxa of greatest societal and scientific interest. These biases are often criticized, and it is frequently recommended that they be corrected by supplementing sampling with additional collection effort (Feeley and Silman 2011; Beck et al. 2012; Goodwin et al. 2015).
Of course, additional collection efforts are always welcome, especially if they are concentrated on the most glaring deficiencies in terms of areas or taxa, since we know only 20% of all living organisms. Such efforts are often associated in people’s minds with massive data acquisition operations (large expeditions, large monographs on fauna or flora). Nevertheless, it is illusory to hope to compensate for all the deficiencies through more massive sampling. Even if there were no longer any obvious large “gaps” in terms of taxa or geographic areas, the available data would still be biased with respect to the many questions that are, or could be, addressed. It is therefore at least as important to design protocols for the statistical correction of these biases (Dias Tarli et al. 2018).
Rather than relying solely on a few massive operations, it is also important that all published scientific studies on biodiversity make their specimens and observations available. The accumulated specimens and observations would potentially be colossal, precisely because of the immense number of published case studies of species addressing diverse evolutionary or ecological issues, and would dwarf the combined output of the few large expeditions conducted each year. This is far from the case at present, and a real cultural change is needed in this area, in the very topical context of “open science” policies (Ayris et al. 2018). This is both a fundamental ethical issue for science and an extraordinary opportunity for the community to rapidly increase the amount of indispensable data on biodiversity.
2.6. What future is there for the use of collections?
Collections have the great advantage of being physical and of consisting of curated samples on which all sorts of retrospective and analytical approaches can be conducted (Rocha et al. 2014). In this way, one can go back to the source data and analyze them in light of questions entirely different from those that motivated their collection in the past. Unfortunately, the current trend is to favor the acquisition of observations dissociated from specimens rather than to continue acquiring many additional specimens (Troudet et al. 2018).
This trend is mainly related to the rapid growth of citizen science initiatives, which generate millions of observations without specimens (Amano et al. 2016). Some scientists also invoke ethical reasons for not collecting specimens (Minteer et al. 2014), but it should be noted that disciplines such as ecology have never really made available most of the data they collect and use, regardless of the organisms involved (Schilthuizen et al. 2015; Mills et al. 2016). Finally, a few systematists oddly advocate taxonomic descriptions based on virtual data such as photographs (Marshall and Evenhuis 2015). All of these trends rest on the same mode of thinking, derived from general biology, which implicitly regards our knowledge of biodiversity as already sufficiently complete and organized to be supplemented by digital data alone (Grandcolas 2017a).
The validity and a posteriori exploration of these observations will be strongly limited in the great majority of cases, because it will be impossible to return to the specimen, or even to a digital photograph or sound recording associated with the observation. These observations, especially when they are linked to rare or poorly established names, which are sometimes doubtful or changing, will simply be null and void. Consider that, in metropolitan France alone, there are 40,000 species of insects; we cannot expect all of these taxa to be known, or even stable in terms of their scientific significance, or to be assigned unambiguously.
Concerning the specimens themselves, our analytical capabilities have increased considerably and allow increasingly powerful and diverse studies of collections (Meineke et al. 2018). Access to and study of specimens are facilitated by the digital data and images associated with them, especially through large digitization programs (Le Bras et al. 2017). Many future uses of collections will likely be stimulated in this way, by linking specimens to digital records or analytical results. It is therefore important that this linking is done correctly, which unfortunately is not always the case. For example, the millions of DNA or RNA sequences deposited in GenBank rarely include a link to the sequenced specimens deposited in collections (Pleijel et al. 2008). Similarly, not all citizen science programs, which generate huge amounts of data, deposit their identification validation materials (photos, sounds, etc.) on portals such as GBIF that are designed to link metadata (Troudet et al. 2018).
Collections are thus in a paradoxical situation. Their accessibility is greatly improved by modern techniques, but these same techniques can lead many people to think of collections as virtual (Grandcolas 2017a). Many biologists, including some taxonomists, consider that we have already mapped out the order of life and that we now only need to complete this picture with additions that are still numerous but will not upset the established order. In other words, the tree may hide the forest.