Cancer Biology

Replication of “null results” – Absence of evidence or evidence of absence?

Samuel Pawel
Rachel Heyard
Charlotte Micheloud
Leonhard Held

Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich, Switzerland

https://doi.org/10.7554/eLife.92311.2

Open access

Figures and data

Two examples of original and replication study pairs that meet the non-significance replication success criterion from the Reproducibility Project: Cancer Biology (Errington et al., 2021). Shown are standardized mean difference effect estimates with 95% confidence intervals, sample sizes n, and two-sided p-values p for the null hypothesis that the effect is absent. The effect estimate, 95% confidence interval, and p-value p_MA from a fixed-effect meta-analysis of the original and replication studies are shown in gray.
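The pooled quantities shown in gray can be illustrated with a minimal sketch of a fixed-effect (inverse-variance) meta-analysis of two estimates. It assumes approximately normal effect estimates with known standard errors; the function name and its inputs are hypothetical and are not taken from the authors' analysis code.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_meta(est_orig, se_orig, est_rep, se_rep, level=0.95):
    """Pool two effect estimates with inverse-variance weights.

    Assumes both estimates are approximately normal with known
    standard errors (an assumption of this sketch).
    """
    w_orig = 1.0 / se_orig**2  # weight = inverse variance
    w_rep = 1.0 / se_rep**2
    est = (w_orig * est_orig + w_rep * est_rep) / (w_orig + w_rep)
    se = sqrt(1.0 / (w_orig + w_rep))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (est - z_crit * se, est + z_crit * se)
    # Two-sided p-value for the null hypothesis of no effect
    p_ma = 2.0 * (1.0 - NormalDist().cdf(abs(est / se)))
    return est, se, ci, p_ma
```

Pooling two equally precise estimates, for example `fixed_effect_meta(0.2, 0.5, 0.2, 0.5)`, returns the same point estimate with the standard error reduced by a factor of √2.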

Null hypothesis (H₀) and alternative hypothesis (H₁) for superiority and equivalence tests (with equivalence margin Δ > 0).

Effect estimates on standardized mean difference (SMD) scale with 90% confidence intervals for the 15 “null results” and their replication studies from the Reproducibility Project: Cancer Biology (Errington et al., 2021). The title above each plot indicates the original paper, experiment and effect numbers. Two effect estimates from original paper 48 were statistically significant at p < 0.05, but were interpreted as null results by the original authors and therefore treated as null results by the RPCB. The two examples from Figure 1 are indicated in the plot titles. The dashed gray line represents the value of no effect (SMD = 0), while the dotted red lines represent the equivalence range with a margin of Δ = 0.74, classified as “liberal” by Wellek (2010, Table 1.1). The p-value p_TOST is the maximum of the two one-sided p-values for the null hypotheses of the effect being greater than +Δ and less than −Δ, respectively. The Bayes factor BF₀₁ quantifies the evidence for the null hypothesis H₀ : SMD = 0 against the alternative H₁ : SMD ≠ 0 with a normal unit-information prior assigned to the SMD under H₁.
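The two summaries defined in this caption can be sketched as follows, assuming a normal approximation for the effect estimate with a known standard error. The function names are hypothetical, and the prior standard deviation is left as a parameter because the caption does not state its numerical value; this is an illustration of the definitions, not the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_tost(estimate, se, margin):
    """Maximum of the two one-sided p-values of the equivalence test."""
    # H0: effect >= +margin, rejected when the estimate is far below +margin
    p_upper = STD_NORMAL.cdf((estimate - margin) / se)
    # H0: effect <= -margin, rejected when the estimate is far above -margin
    p_lower = 1.0 - STD_NORMAL.cdf((estimate + margin) / se)
    return max(p_upper, p_lower)


def bf01(estimate, se, prior_sd):
    """Bayes factor for H0: effect = 0 against H1: effect ~ N(0, prior_sd^2).

    Ratio of the marginal densities of the estimate under H0 and H1,
    assuming the estimate is normal around the true effect.
    """
    dens_h0 = NormalDist(0.0, se).pdf(estimate)
    dens_h1 = NormalDist(0.0, sqrt(se**2 + prior_sd**2)).pdf(estimate)
    return dens_h0 / dens_h1
```

For an estimate of exactly zero, `bf01` reduces to sqrt(1 + prior_sd²/se²), so more precise estimates and wider priors both yield stronger evidence for H₀.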

Number of successful replications of original null results in the RPCB as a function of the margin Δ of the equivalence test (p_TOST ≤ α in both studies for α = 0.1, 0.05, 0.01) or the standard deviation of the zero-mean normal prior distribution for the SMD effect size under the alternative H₁ of the Bayes factor test (BF₀₁ ≥ γ in both studies for γ = 3, 6, 10).
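The equivalence-test criterion in this caption (p_TOST ≤ α in both studies) can be sketched as a count over study pairs for a given margin. The sketch assumes normally distributed effect estimates with known standard errors; the function names and the numbers in the usage note are hypothetical and are not RPCB data.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_tost(estimate, se, margin):
    """Maximum of the two one-sided equivalence-test p-values."""
    p_upper = STD_NORMAL.cdf((estimate - margin) / se)
    p_lower = 1.0 - STD_NORMAL.cdf((estimate + margin) / se)
    return max(p_upper, p_lower)


def count_tost_successes(pairs, margin, alpha):
    """Count pairs with p_TOST <= alpha in both original and replication.

    Each pair is (est_orig, se_orig, est_rep, se_rep).
    """
    return sum(
        1
        for est_o, se_o, est_r, se_r in pairs
        if p_tost(est_o, se_o, margin) <= alpha
        and p_tost(est_r, se_r, margin) <= alpha
    )
```

With two made-up pairs, `[(0.0, 0.2, 0.1, 0.2), (0.5, 0.4, 0.0, 0.2)]`, evaluating `count_tost_successes` at α = 0.05 over a grid of margins shows the count growing as the margin widens, which is the relationship the figure displays.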

Effect estimates on Fisher z-transformed correlation scale with 90% confidence intervals for the “null results” and their replication studies from the Reproducibility Project: Psychology (RPP, Open Science Collaboration, 2015) and the Experimental Philosophy Replicability Project (EPRP, Cova et al., 2018). The dashed gray line represents the value of no effect (z = 0), while the dotted red lines represent the equivalence range with a margin of Δ = 0.74. The p-value p_TOST is the maximum of the two one-sided p-values for the null hypotheses of the effect being greater than +Δ and less than −Δ, respectively. The Bayes factor BF₀₁ quantifies the evidence for the null hypothesis H₀ : z = 0 against the alternative H₁ : z ≠ 0 with a normal prior centered on zero with a standard deviation of 2 assigned to the effect size under H₁.

Sensitivity analyses for the “null results” and their replication studies from the Reproducibility Project: Psychology (RPP, Open Science Collaboration, 2015) and the Experimental Philosophy Replicability Project (EPRP, Cova et al., 2018). The Bayes factor of the replication of Ranganath and Nosek (2008) decreases very quickly and is only shown for a limited range.
