The scientific literature is vast. No individual human can fully know all the published research findings, even within a single field of science. Regardless of how much time a scientist spends reading the literature, there’ll always be what the information scientist Don Swanson called ‘undiscovered public knowledge’: knowledge that exists and is published somewhere, but still remains largely obscure.
Some scientific papers receive very little attention after their publication – some, indeed, receive no attention whatsoever. Others, though, can languish with few citations for years or decades, but are eventually rediscovered and become highly cited. These are the so-called ‘sleeping beauties’ of science.
The reasons for their hibernation vary. Sometimes it is because contemporaneous scientists lack the tools or practical technology to test the idea. Other times, the scientific community does not understand or appreciate what has been discovered, perhaps because of a lack of theory. Yet other times it’s a more mundane reason: the paper is simply published somewhere obscure and it never makes its way to the right readers.
What can sleeping beauties tell us about how science works? How do we rediscover information that the scientific body of knowledge already contains but that is not widely known? Is it possible that, if we could understand sleeping beauties in a more systematic way, we might be able to accelerate scientific progress?
Sleeping beauties are more common than you might expect.
The term sleeping beauties was coined by Anthony van Raan, a researcher in quantitative studies of science, in 2004. In his study, he identified sleeping beauties between 1980 and 2000 based on three criteria. First, the length of their ‘sleep’, during which they received few if any citations. Second, the depth of that sleep – the average number of citations during the sleeping period. And third, the intensity of their awakening – the number of citations that came in the four years after the sleeping period ended. Equipped with (somewhat arbitrarily chosen) thresholds for these criteria, van Raan identified sleeping beauties at a rate of about 0.01 percent of all published papers in a given year.
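The three criteria can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names, the way the sleeping period is passed in, and the threshold values are all hypothetical, and they are not van Raan’s actual thresholds.

```python
from dataclasses import dataclass


@dataclass
class SleepProfile:
    length: int            # years of sleep
    depth: float           # average citations per year during the sleep
    awake_intensity: int   # citations in the window after the sleep ends


def sleep_profile(yearly_citations, sleep_years, awake_window=4):
    """Summarize a citation history; yearly_citations[0] is the publication year."""
    sleep = yearly_citations[:sleep_years]
    awake = yearly_citations[sleep_years:sleep_years + awake_window]
    depth = sum(sleep) / len(sleep) if sleep else 0.0
    return SleepProfile(len(sleep), depth, sum(awake))


def is_sleeping_beauty(profile, min_length=10, max_depth=1.0, min_awake=20):
    # Illustrative thresholds only (assumptions, not taken from the 2004 study).
    return (profile.length >= min_length
            and profile.depth <= max_depth
            and profile.awake_intensity >= min_awake)


# A made-up citation history: ten quiet years, then four busy ones.
history = [0, 1, 0, 0, 1, 0, 0, 2, 0, 1, 6, 9, 12, 15]
profile = sleep_profile(history, sleep_years=10)
print(profile, is_sleeping_beauty(profile))
```

Changing any of the three thresholds changes which papers qualify, which is one reason estimates of how common sleeping beauties are can differ so much between studies.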
Later studies hinted that sleeping beauties are even more common than that. A systematic study in 2015, using data from 384,649 papers published in American Physical Society journals, along with 22,379,244 papers from the search engine Web of Science, found a wide, continuous range of delayed recognition of papers in all scientific fields. This increases the estimate of the percentage of sleeping beauties at least 100-fold compared to van Raan’s.
Many of those papers became highly influential many decades after their publication – far longer than the typical time windows for measuring citation impact. For example, Herbert Freundlich’s paper ‘Concerning Adsorption in Solutions’ (though its original title is in German) was published in 1907, but began being regularly cited in the early 2000s due to its relevance to new water purification technologies. William Hummers and Richard Offeman’s ‘Preparation of Graphitic Oxide’, published in 1958, also didn’t ‘awaken’ until the 2000s: in this case because it was very relevant to the creation of the soon-to-be Nobel Prize–winning material graphene.
Both of these examples are from ‘hard’ sciences – and interestingly, in physics, chemistry, and mathematics, sleeping beauties seem to occur at higher rates than in other scientific fields.
Indeed, one of the most well-known physics papers, ‘Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?’ (1935) by Albert Einstein, Boris Podolsky, and Nathan Rosen (EPR), is a classic example of a sleeping beauty. It’s number 14 on one list that quantifies sleeping beauties by how long they slept and how many citations they suddenly accrued.
The EPR paper questioned whether quantum mechanics could truly describe physical reality. The stumbling block was the phenomenon of ‘quantum entanglement’, where two quantum particles have a history of previous interaction and remain connected in such a way that any measurement of a property of one of them influences that property in the other, regardless of how far away from each other they are.
To Einstein, this meant that the particles must be communicating instantaneously, faster than the speed of light. This violates the principle of locality, which is fundamental to his theory of relativity. Einstein called this ‘spooky action at a distance’, and his solution to the contradiction was to posit ‘hidden variables’ that determine the state of a quantum system – variables that were beyond the Copenhagen interpretation of quantum mechanics defended by Niels Bohr.
The paper caused intense debates between Einstein and Bohr from its publication until the end of their lives. But it wasn’t until the late 1980s that the EPR paper saw a spike in citations.
The EPR paper wasn’t hidden in a third-tier journal, unread by the scientific community. Indeed, it generated intense debate, even a New York Times headline. But in terms of its citations, it was a sleeper: it received far fewer citations than one would expect because it needed testing, but that testing wasn’t feasible for a long time afterward.
In 1964, the physicist John Bell showed that if Einstein’s ‘hidden variables’ existed, it would lead to certain algebraic predictions, now called Bell’s inequalities. If these predictions could be supported by experiment, it would contradict quantum mechanics and vindicate the ‘hidden variables’ view.
A 1969 paper by John Clauser and colleagues got one step closer, by framing Bell’s inequalities in a way better suited to real experiments. These followed only in modest numbers in the 1970s, hampered by imperfect equipment: the best technology of the day, such as light splitters that were too inefficient to be reliable, was inadequate to test the theory.
A new generation of far more conclusive experiments in the early 1980s was made possible by progress in laser physics. These experiments gave an unambiguous violation of Bell’s inequalities – and thus a strong agreement with quantum mechanics. Einstein’s ‘hidden variables’ proved to be superfluous. Since then, further technological developments have made experimental measurements increasingly precise – indeed, as close to the ideal imagined by the EPR paper as possible.
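To see what a violation of a Bell inequality looks like numerically, here is a minimal sketch using the form usually called CHSH, after the authors of the 1969 paper. It assumes an idealized pair of spin-1/2 particles in the singlet state, for which quantum mechanics predicts a correlation of -cos(a - b) between measurements at angles a and b. Any local hidden-variable theory must keep the combination S between -2 and 2. The function names and the choice of angles are illustrative assumptions, not taken from the experiments described above.

```python
import math


def singlet_correlation(a, b):
    """Quantum prediction for the correlation of spin measurements at angles a and b
    (idealized singlet state, perfect detectors: an assumption of this sketch)."""
    return -math.cos(a - b)


def chsh(a, a2, b, b2, corr=singlet_correlation):
    """CHSH combination S; local hidden-variable theories require |S| <= 2."""
    return corr(a, b) - corr(a, b2) + corr(a2, b) + corr(a2, b2)


# Measurement angles that maximize the quantum value of |S|.
s = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
print(abs(s))  # larger than the hidden-variable bound of 2
```

With these angles the quantum prediction for |S| is 2√2, about 2.83, which is the kind of gap real experiments needed efficient enough equipment to resolve.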
This has all led to an explosion of citations of the EPR paper. The sleeping beauty – having had its slumber only mildly disturbed across several decades – is now wide awake.
In some cases, a sleeping beauty comes without the kind of great mystery attached to the EPR paper. In some cases, scientists understand something well enough – but just don’t know what to do with it.
The first report of the green fluorescent protein (GFP) – a crucial ingredient in many contemporary biological experiments because of its ability to glow brightly under ultraviolet light, and thus act as a clear indicator of cellular processes like gene expression and protein dynamics – was published in 1962 in the Journal of Cellular and Comparative Physiology. GFP had been discovered in the jellyfish Aequorea victoria in research led by the marine biologist Osamu Shimomura.
Over the summers of the following 19 years, 85,000 A. victoria jellyfish were caught off Friday Harbor in Washington state in attempts to isolate enough GFP to allow a more thorough characterization. This resulted in a series of papers between 1974 and 1979. But as Shimomura admitted in an interview many years later, ‘I didn’t know any use of . . . that fluorescent protein, at that time.’
In 1992, things changed. The protein was cloned, and the relevant genetic information was passed on to the biologist Martin Chalfie. Chalfie was the first to come up with the idea of expressing GFP transgenically in E. coli bacteria and C. elegans worms. He demonstrated that GFP could be used as a fluorescent marker in living organisms, opening up new worlds of experimentation. GFP is now a routinely used tool across swathes of cell biology.
Chalfie and colleagues published their work in Science in 1994, citing Shimomura’s 1962 and 1979 papers – thus abruptly waking them up to a flood of new citations. A basic discovery in an obscure organism – Shimomura’s jellyfish – had been made into a nearly universal, ultra-useful molecular tool. It took decades of work, and a flash of insight from a receptive and creative mind, but Shimomura’s studies were finally being used by the rest of the field.
Although it’s tempting to look at a paper’s ‘beauty coefficient’ – as the systematic study from 2015 called its numerical index of how long a paper slept and how many citations it received when awakened – and assume that the knowledge it described must have been suddenly rediscovered, this isn’t always the case. Three examples come from papers in statistics that, on a first, naive look, all appear to be textbook sleeping beauties.
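For readers who want a feel for such an index, the sketch below follows the definition of the beauty coefficient as it is usually stated for the 2015 study: draw a straight line from the citations in the publication year to the citations in the peak year, then add up how far the actual yearly counts fall below that line, each gap divided by the larger of 1 and that year’s count. Treat the details as an assumption to be checked against the study itself; the function name and the example histories are made up.

```python
def beauty_coefficient(citations):
    """citations[t] is the number of citations received t years after publication."""
    if len(citations) < 2:
        return 0.0
    # Year of peak citations (the first such year if there are ties).
    t_m = max(range(len(citations)), key=lambda t: citations[t])
    if t_m == 0:
        return 0.0  # the paper peaked in its publication year: no sleep at all
    c_0, c_m = citations[0], citations[t_m]
    slope = (c_m - c_0) / t_m
    # Sum the gaps between the reference line and the actual citation curve.
    return sum((slope * t + c_0 - citations[t]) / max(1, citations[t])
               for t in range(t_m + 1))


sleeper = [0] * 10 + [10]        # ten silent years, then a burst
steady = list(range(11))         # citations grow by one every year
print(beauty_coefficient(sleeper), beauty_coefficient(steady))
```

A paper whose citations grow steadily scores zero, while a long silence followed by a burst scores high. The index sees only the citation curve, which is exactly why it can be fooled by the statistical examples discussed here.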
The first is Karl Pearson’s 1901 paper ‘On Lines and Planes of Closest Fit to Systems of Points in Space’. It looks like a classic case of a sleeping beauty: it was published in a primarily philosophical outlet with the rather unwieldy name of The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, and seems to have slept soundly for a whole century, only being fully awakened in 2002 with a huge surge of citations.
It’s certainly true that the twenty-first century brought with it many more ways to use Pearson’s 1901 insights. What he had described was what eventually became the statistical workhorse known as principal components analysis (PCA) – which became particularly useful after the advent of digital ‘big data’ to discover patterns and summarize large, unwieldy datasets in a smaller number of variables. But even without those datasets, the technique of PCA itself was well used across the entire twentieth century, from psychology to palaeontology.
It’s hard to say why the 1901 paper suddenly started being cited around 2002 – the explanation could be pure luck and social dynamics, with one study happening to cite it and others following suit – but it wasn’t because PCA, which by that point was taught in every basic statistics course, had been ‘rediscovered’.
A second example is Fisher’s exact test, commonly used to determine the statistical significance of associations between categorical variables (for example, to test whether the proportion of recovered patients is different between two treatments). It was published by the eponymous Ronald Fisher in 1922, and appeared to ‘awaken’ in 2006. But as with PCA, Fisher’s exact test was routinely used across the twentieth century, becoming a standard part of analysts’ tool kits.
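The recovered-patients example can be worked through in a few lines. This is a minimal sketch of a two-sided Fisher’s exact test for a 2×2 table, written from the standard hypergeometric definition rather than taken from any particular library; the function name and the patient counts are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatments, columns are recovered / not recovered.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x recoveries in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical trial: treatment A, 3 of 4 recover; treatment B, 1 of 4 recovers.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value)
```

With samples this small the difference between the two treatments is far from significant, which is the situation the test was designed to handle exactly rather than by approximation.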
Finally, there’s the Monte Carlo method of statistics, which uses random sampling to produce numerical answers to problems that, while solvable in principle, are too complex to attack head-on. It was developed by a group of researchers including Stanislaw Ulam and John von Neumann in the 1940s at the Los Alamos Laboratory, intended to solve the problem of neutron diffusion in the core of a nuclear weapon.
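The random-sampling idea is easy to illustrate with a toy problem far removed from neutron diffusion. The sketch below estimates π by scattering random points over a unit square and counting how many land inside a quarter circle; the function name, the sample size, and the fixed seed are illustrative choices.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = sum(1 for _ in range(n_samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    # The quarter circle covers pi/4 of the unit square.
    return 4 * inside / n_samples


pi_estimate = estimate_pi(200_000)
print(pi_estimate)
```

The estimate improves only slowly as more points are added, but the same recipe works for problems where no neat formula exists, which is what made the method so widely useful.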
In 1949, Nicholas Metropolis and Ulam published a paper titled ‘The Monte Carlo Method’ laying out its foundations – but looking at the paper’s citations, it seemed to remain dormant until 2004. Again, though, the method it described was already having a profound impact across many different fields, and in practice the paper wasn’t a true sleeping beauty.
Citations are currency in science, but it’s easy to rely on them too heavily. The statistical cases described above illustrate that merely looking at citations, and not considering the full context of a study, can be misleading: we might end up asking ourselves why a paper is a sleeping beauty when really it isn’t a true example of the phenomenon.
With that caveat on the record, we can look at a final example of a true sleeping beauty – one that perhaps has the most to teach us about how to awaken dormant knowledge in science.
In 1911, the pathologist Francis Peyton Rous published a paper in which he reported that when he injected a healthy chicken with a filtered tumor extract from a cancerous chicken, the healthy chicken developed a sarcoma (a type of cancer affecting connective tissue). The extract had been carefully filtered to remove any host cells and bacteria, which might be expected to cause cancer, so another factor must have been at play to explain the contagious cancer.
It turned out that the cause of the tumor in the injected chicken was a virus – but Rous wasn’t able to isolate it at the time.
The importance of his study, and the paper reporting it, wasn’t recognized until after 1951, when a murine leukemia virus was isolated. This opened the door to the era of tumor virology – and to many citations for Rous’s initial paper. The virus Rous had unknowingly discovered in his 1911 paper became known as the Rous sarcoma virus (RSV), and Rous was awarded the Nobel Prize in Medicine in 1966, 55 years after publishing.
Many other examples of tumor-inducing viruses followed: in rabbits, cats, nonhuman primates, and eventually humans (one example is the Epstein-Barr virus, which causes glandular fever, also known as mono, and which can cause Hodgkin’s lymphoma).
This indirectly led to the discovery of oncogenes: genes that, when mutated, have the potential to cause cancer, for example by promoting uncontrolled cell growth. That’s because scientists, intrigued by the discovery of cancer-promoting viruses, began to look at exactly how they had their effect. For RSV, it was discovered that one of the genes that helped it cause cancer had a counterpart in the chicken (and indeed human) genome known as SRC. Discoveries of many other virus-host gene pairings followed, allowing new insights into the etiology of cancer.
What can we learn from the case of the RSV, where crucial knowledge was hidden away and unappreciated for decades before it was rediscovered, seeding an enormously fruitful line of research? What can we learn from sleeping beauties more generally?
Some of the reasons for a paper’s slumber are technological. As we saw with some of the sleeping beauties from physics, it was only with the advent of new techniques (in this case in virology) that Rous’s finding could be fully investigated and verified. As the biologist Andreas Wagner put it, ‘no innovation, no matter how life-changing and transformative, prospers unless it finds a receptive environment’.
Some sleeping beauties might have slept due to poor access to findings. In the early twentieth century it was hardly straightforward to lay one’s hands on a scientific journal to access the knowledge within: these days it’s easier, though still not as easy as it should be. The Open Access movement has made crucial strides in this regard, but it’s a reminder that freely available knowledge could accelerate scientific progress for the mundane reason that more scientists can read it and have it interact with their pre-existing ideas.
Another lesson is related to collaboration. It could be that the techniques and knowledge required to fully exploit a discovery in one field lie, partly or wholly, in an entirely different one. A study from 2022 showed empirically how the ‘distance’ between biomedical findings – whether they were from similar subfields or ones that generally never cite each other – determines whether they tend to be combined to create new knowledge.
‘Biomedical scientists’, as the paper’s author, data scientist Raul Rodriguez-Esteban, put it, ‘appear to have a wide set of facts available, from which they only end up publishing discoveries about a small subset’. Perhaps understandably, they tend to ‘reach more frequently for facts that are closer’. Encouraging interdisciplinary collaboration, and encouraging scientists to keep an open mind about who they might work with, could help extend that reach.
That, of course, is easier said than done. Perhaps the most modern tools we have available – namely, powerful AI systems – could help us. It is possible to train an AI to escape the disciplinary lines of universities, instead generating ‘alien’, yet scientifically plausible, hypotheses from across the entire scientific literature.
These might be based, for example, on the identification of unstudied pairs of scientific concepts, unlikely to be imagined by human scientists in the near future. It’s already been shown in research on natural language processing that a purely textual analysis of published studies could potentially glean gene-disease associations or drug targets years before a human, or a human-led analysis, would discover them.
The AI technique of ‘contextualized literature-based discovery’ aims to mine the scientific literature to discover entirely new hypotheses that human scientists can then test. At present, it produces a lot of non-useful combinations and ideas – but as the AI models improve, we could use them to sift through that vast scientific literature in ways that no human could ever do, finding sleeping beauties – or at least partial sleeping beauties – that ‘wake up’ when combined with some other crucial piece of knowledge.
Most scientific discoveries appear in the vicinity of earlier findings – the search for new knowledge, as one recent study found, is dominated by local exploitation of the known over novel exploration of the obscure. That’s only natural: scientists, like humans in general, stick to what they know, mainly reading the literature from their own fields.
Now that we know the power of sleeping beauties, that needn’t be the case. Nor does it need to be the case for AI. Complementing human reasoning with AI has the potential to accelerate scientific discovery, broadening the scope of our collective imagination.
These efforts in literature mining will likely awaken many still-sleeping beauties in the scientific literature along the way.