Thursday, August 1, 2024

Science and the significant trend towards spin and fairytales



Simon Gandevia

What do fairytales and scientific papers have in common? Consider the story of Rumpelstiltskin. 

A poor miller tries to impress the king by claiming his daughter can spin straw into gold. The avaricious king locks up the girl and orders her to spin the straw into gold. She fails, until a goblin, Rumpelstiltskin, comes to her rescue.

In science, publishers and editors of academic journals prefer to publish demonstrably new findings – gold – rather than replications or refutations of findings that have already been published. This “novelty pressure” requires the presentation of results that are “significant” – which usually includes being “statistically significant.”

In the conventional realm of null-hypothesis significance testing, this means using a threshold probability. In biology and medicine the accepted cutoff is usually a probability of 0.05 (a chance of 5%, or one in 20), and its use is explicitly written into the methods section of publications. Some branches of science, such as genetics and physics, use more stringent probability thresholds. But the necessity of having a threshold remains.
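As a minimal sketch of how that decision rule operates (hypothetical Python code with made-up measurements; the groups, values and test are illustrative only, not taken from any paper discussed here):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(10.0, 2.0, 20)   # hypothetical control measurements
treated = rng.normal(11.0, 2.0, 20)   # hypothetical treated measurements

alpha = 0.05                          # the pre-stated threshold
p = stats.ttest_ind(treated, control).pvalue
print(f"p = {p:.3f}:", "statistically significant" if p < alpha else "not statistically significant")

Whatever a field's chosen threshold, the logic is the same binary comparison: the p value either clears the pre-stated cutoff or it does not.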

How do researchers create the illusion of novelty in a result when the finding has a probability value close to, but on the wrong side of, the stated probability threshold – for example, a probability of 0.06? Talk it up, spin out a story! It’s the fairytale of Rumpelstiltskin in modern garb.  

Here are more than 500 examples of pretzel logic researchers have used to claim significance despite p values higher than 0.05. It would be comical were it not for the serious obfuscation of science these stories cause.

In recent years, the practice of claiming importance and true significance for such results has been termed “spin.” More formally, we call it “reporting that could distort the interpretation of results and mislead readers.”

Increasingly, scholars are quantifying and analyzing the practice of spinning probability values. Linked to our development of the Quality Output Checklist and Content Assessment (QuOCCA) as a tool for assessing research quality and reproducibility, my colleagues and I have measured how often spin occurs in three prestigious journals: the Journal of Physiology, the British Journal of Pharmacology and the Journal of Neurophysiology.

We found that when probability values presented in the results section of a publication were not quite statistically significant (greater than 0.05 but less than 0.10), authors talked up the findings and spun out a story in about 55-65% of publications. Often, they wrote that results “trended” towards significance. Thus, results of straw can become results of gold – attractive to researchers, editors, publishing houses and universities.

Putting spin on non-significant probability values is an egregious and shonky – that’s dubious, for our friends outside Australia – scientific practice. It shows the authors’ failure to appreciate the requirement of an absolute threshold for claiming the presence (or not) of an effect, or for supporting (or not) a hypothesis. It reveals an entrenched and incorrigible capacity for bias. Furthermore, the authors seem unaware that a probability value of, say, 0.07 is not even justifiable as a trend: the addition of further samples or participants does not inexorably move the probability value below the 0.05 threshold.
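To see why, here is a minimal simulation sketch (hypothetical Python code, assuming a one-sample t-test and no true underlying effect; the sample sizes and number of runs are illustrative, not drawn from any paper discussed here). It collects samples whose p value lands in the 0.05 to 0.10 “trend” zone, doubles the sample size, and counts how often the enlarged sample actually crosses the 0.05 threshold.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_initial, n_extra = 30, 30      # hypothetical sample sizes
crossed, trials = 0, 0

for _ in range(10_000):
    first = rng.normal(0.0, 1.0, n_initial)          # no true effect
    p_first = stats.ttest_1samp(first, 0.0).pvalue
    if 0.05 < p_first < 0.10:                        # the "trend" zone
        trials += 1
        extra = rng.normal(0.0, 1.0, n_extra)
        combined = np.concatenate([first, extra])
        if stats.ttest_1samp(combined, 0.0).pvalue < 0.05:
            crossed += 1

print(f"'Trending' samples that became significant after doubling n: {crossed} of {trials}")

Under this no-effect assumption, only a minority of the “trending” samples end up significant after the extra data arrive: a p value just above 0.05 is not a promise of gold to come.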

The number of instances of spin within a publication has no theoretical limit; any probability value above 0.05 could be talked up. However, while our previous audits of publications in three journals have occasionally found more than one example of spin within a single publication, such cases seemed rare.

A 2022 paper in the British Journal of Pharmacology titled “Deferiprone attenuates neuropathology and improves outcome following traumatic brain injury” has obliterated this impression. On at least 25 occasions, the authors overhype results linked to a probability value exceeding 0.05. Some of the offending explanations use phrases such as: “did not reach significance but showed a strong trend (p=0.075)”; “a trending yet non-significant preservation of neurons was seen”; “no significant changes were seen in proBDNF despite an increased trend.”

In the publication by Daglas and colleagues, many probability values between 0.05 and 0.10 were spun, but even values above 0.10 were considered “trendy.” These included values of 0.11, 0.14, 0.16, 0.17, 0.23 and 0.24. The authors have not responded to my request for comment.

As the 2024 Paris Olympics get underway, it is tempting to ask: Does the featured publication set a world record for scientific spin? Comment with your entries, please.

What should be done about the prevalence of spinning probability values? This question is part of a bigger dilemma. All levels of the “industry” of science know the problems caused by perpetuating shonky science, but their attempts at regulation and improvement are fraught with difficulty and impeded by self-interest. Education about science publication and mandatory requirements before publication are potentially helpful steps. 

The messages from Rumpelstiltskin should be that spinning straw can lead to trouble, and science is not a fairytale.  

Simon Gandevia is deputy director of Neuroscience Research Australia.

