Extracted from Law360
Statistical significance, a concept often invoked to help characterize the strength or weakness of scientific results, is a prominent feature in modern litigation. Cases requiring expert testimony on the issue of causation, for example, often involve analyses of statistical significance to support or attack (and potentially exclude) an expert’s causation conclusion.
However, some researchers now question whether statistical significance should be a driving force in scientific discourse, arguing that the concept is increasingly being used to justify unfounded conclusions and discount otherwise valuable research. This article aims to introduce legal professionals to this recent scientific criticism and to discuss how this shift in attitude may affect expert witness practice in the future.
A Brief Primer on P-Values
Researchers often present their results in terms of statistical significance to show how confident they are that the data support the hypothesis they are testing. These judgments about statistical significance are based on probability calculations. To measure their data against the range of expected results, researchers calculate the “p-value,” or “the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.” The p-value thus tells researchers how compatible their data are with the “null hypothesis,” which is the opposite of the hypothesis they are testing.
Suppose, for example, that one is conducting a scientific study to explore the hypothesis that a certain drug causes birth defects. (In this scenario, the null hypothesis would be that there is no causal link between the drug and birth defects.) The data show that subjects exposed to the drug experienced birth defects more often than subjects who were not exposed.
But to help determine how confident one can be in this hypothesis, one wants to know how likely it is that one would have obtained data at least that extreme even if the null hypothesis were true. So the p-value is calculated. If the p-value is low, this suggests that the data are statistically incompatible with the null hypothesis (indicating, in the above example, that there may be an association between the drug and birth defects). Many researchers interpret a low p-value as casting doubt on the validity of the null hypothesis.
This is where statistical significance comes into play. Under the prevailing practice, if the p-value is lower than 5%, researchers reject the null hypothesis and conclude that they have “statistically significant” evidence to support the hypothesis being examined.
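To make the calculation concrete, the following is a minimal Python sketch, not a method drawn from any study or case discussed here. It assumes a small hypothetical cohort (7 of 10 exposed subjects and 1 of 10 unexposed subjects with birth defects) and uses a one-sided Fisher exact test, which is only one of several tests a researcher might choose. The function name and all counts are illustrative assumptions.

```python
from math import comb

ALPHA = 0.05  # the conventional 5% threshold described above


def fisher_exact_one_sided(exposed_cases, exposed_total,
                           unexposed_cases, unexposed_total):
    """One-sided Fisher exact p-value for a 2x2 table.

    Returns the probability, assuming the null hypothesis of no association
    and fixed table margins, of seeing at least `exposed_cases` cases in the
    exposed group.
    """
    total = exposed_total + unexposed_total
    cases = exposed_cases + unexposed_cases
    ways_to_pick_exposed = comb(total, exposed_total)
    max_cases_in_exposed = min(cases, exposed_total)

    p_value = 0.0
    for k in range(exposed_cases, max_cases_in_exposed + 1):
        # Hypergeometric probability that exactly k of the cases are exposed.
        p_value += (comb(cases, k)
                    * comb(total - cases, exposed_total - k)
                    / ways_to_pick_exposed)
    return p_value


# Hypothetical counts, chosen only for illustration.
p_value = fisher_exact_one_sided(7, 10, 1, 10)
print(f"p-value: {p_value:.4f}")
print("statistically significant at 5%:", p_value < ALPHA)
```

With these assumed counts the p-value is about 0.0099, below the 5% threshold, so under the prevailing practice the result would be labeled statistically significant. The calculation itself says nothing about study design, confounding or the quality of the data.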
Recent Criticism of Statistical Significance
In recent years, some scientists and mathematicians have expressed growing concerns about relying on p-value thresholds as an indicator of whether a finding is meaningful. In a 2016 policy statement, the American Statistical Association warned that p-values offer limited information about the data gathered, stressing that a p-value is not the actual probability that the tested hypothesis is true. The ASA admonished researchers not to make scientific judgments based solely on a bright-line statistical threshold of a 5% p-value, which “distort[s] the scientific process” by encouraging researchers to make binary decisions about the value of scientific data without considering the broader context.
More recently, some scientists have begun publicly calling for outright abandonment of statistical significance. These scientists still advocate the use of p-values as an analytical tool, but they contend that researchers should stop using such calculations to categorize results with a bright line of being significant or not. Determining whether a causal relationship exists, they argue, is too complicated to be reduced to a rigid dichotomy, and reliance on arbitrary thresholds produces misleading results.
Despite this recent criticism, relying on statistical significance remains the norm in scientific practice. As some researchers point out, although statistical significance is imperfect and often misunderstood, it is an essential tool for filtering out weak research and preventing data manipulation. Legal practitioners should pay attention to this ongoing debate and the evolving attitudes toward statistical significance, and prepare to address these concerns in their own expert witness practice.
Statistical Significance and Admissibility
Because of its traditionally pervasive role in the interpretation of scientific data, statistical significance is a common consideration in the evidentiary analysis governing the admissibility of expert testimony. Under the familiar Rule 702 and Daubert rubrics, federal courts (and many state courts) evaluate the reliability of an expert’s methodology based on a number of factors, including: (1) whether the theory or technique can be and has been tested, (2) whether it has been subjected to peer review and publication, (3) the known or potential error rate and the existence of standards controlling its operation and (4) whether it has attracted broad acceptance in the relevant scientific community.
Applying Rule 702 and Daubert, many courts treat statistically significant reliance data as a requirement for admissibility. In General Electric Co. v. Joiner, for example, the U.S. Supreme Court concluded that it was not an abuse of discretion to exclude expert testimony in part because it relied on a study involving statistically nonsignificant data. Since then, lower federal courts routinely have excluded expert testimony as unreliable when it is not based on statistically significant data.
However, in line with the recent criticism from the scientific community, not all courts consider statistically significant reliance data to be a sine qua non of reliability. These courts hold that it is error to “treat the lack of statistical significance as a crucial flaw.”
In cases in which experts are allowed to testify despite a lack of statistically significant reliance data, courts tend to take a more circumscribed view of the Rule 702 and Daubert analyses. On the other side of the coin, courts (and most scientists) also recognize that just because results are statistically significant does not necessarily mean they are valid or reliable; significance can result from various aspects of study design and conduct, not just from “real” associations.
Implications for Expert Witness Practice
Despite some recent questioning of statistical significance in the scientific community and some case law minimizing its role in the admissibility analysis, legal practitioners should not ignore the continued import of statistical significance. When employed appropriately as a focused basis for judging the strength of an expert’s reliance data, statistical significance remains a powerful force both in the admissibility analysis and in arguing the substantive weight that should be assigned to an expert’s opinion. Even the critics of statistical significance acknowledge that the underlying statistical calculation — the p-value — is a useful tool for evaluating and interpreting scientific data.
In light of the shifting attitudes toward statistical significance, however, lawyers and experts should carefully consider how heavily they will rely on statistical significance in practice. If more researchers move away from emphasizing statistical significance, then expert witnesses must be prepared to explain the full contextual basis for their conclusions, including study design, the quality of the data and the underlying scientific mechanisms that contribute to the phenomena being tested. Experts cannot merely point to statistical significance as an authoritative badge of reliability.
When attacking expert testimony as unreliable, the practitioner should bear in mind that some courts may treat the lack of statistically significant reliance data as an issue going to the weight of the evidence rather than its admissibility. It continues to be good practice to present an argument for exclusion based on the lack of statistical significance, but one should remember that statistical significance may be a slender reed.
To better the chances of exclusion, one should prepare alternative arguments based on other weaknesses in the expert’s approach. In addition, if the expert did rely on statistically significant data, one should look for any flaws in the methodology that cast doubt on the practical significance of that data. For example, an expert’s testimony might be inadmissible if the expert relied on a technique that produced statistically significant results but purposely ignored all other techniques that did not produce such results.
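The following minimal Python sketch illustrates why the choice of technique can matter. It compares a one-sided Fisher exact p-value with the corresponding mid-p value, the two tests named in the Lipitor decision cited in the notes, on the same 2x2 table. The counts (5 of 10 exposed subjects and 1 of 10 unexposed subjects with the outcome) are hypothetical and are not taken from that case or any study; the function names are likewise assumptions made for illustration.

```python
from math import comb

ALPHA = 0.05  # conventional 5% threshold


def hypergeom_pmf(k, cases, exposed_total, total):
    """Probability that exactly k of the cases fall in the exposed group,
    assuming no association and fixed table margins."""
    return (comb(cases, k)
            * comb(total - cases, exposed_total - k)
            / comb(total, exposed_total))


def fisher_and_mid_p(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total):
    """Return (one-sided Fisher exact p-value, one-sided mid-p value)."""
    total = exposed_total + unexposed_total
    cases = exposed_cases + unexposed_cases
    max_k = min(cases, exposed_total)

    at_observed = hypergeom_pmf(exposed_cases, cases, exposed_total, total)
    beyond_observed = sum(
        hypergeom_pmf(k, cases, exposed_total, total)
        for k in range(exposed_cases + 1, max_k + 1)
    )
    fisher_p = at_observed + beyond_observed     # counts the observed table fully
    mid_p = 0.5 * at_observed + beyond_observed  # counts the observed table by half
    return fisher_p, mid_p


# Hypothetical counts, chosen only for illustration.
fisher_p, mid_p = fisher_and_mid_p(5, 10, 1, 10)
print(f"Fisher exact: {fisher_p:.4f}  significant at 5%: {fisher_p < ALPHA}")
print(f"mid-p:        {mid_p:.4f}  significant at 5%: {mid_p < ALPHA}")
```

With these assumed counts the Fisher exact p-value is about 0.070 and the mid-p value is about 0.038, so the same data fall on opposite sides of the 5% line depending on which test is reported. Either test can be defensible when it is chosen in advance; the concern arises when one is selected only after the other fails to produce a significant result.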
The recent questioning of statistical significance should also inform the way lawyers prepare expert witnesses and vet their conclusions. Statistical significance may be part of an expert’s reasoning, but the expert should also consider all possible explanations for a scientific conclusion, including study design and the underlying scientific processes that affect the result. To the extent there are other studies or data that contradict the expert’s conclusion, the expert should not dismiss those other data merely because they are not statistically significant. Experts should consider the full context and prepare a more nuanced criticism of any contradictory scientific information.
In sum, the role of statistical significance in scientific inquiry and expert practice may be evolving, and expert witness practice must adapt to this change. Researchers and legal practitioners may still rely on p-values (and even statistical significance thresholds) to interpret and evaluate data, but they should recognize statistical significance as just one factor among many that guide the analysis of scientific data and our confidence in tested hypotheses.
 Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” American Statistician 70:2, at 131 (2016), https://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108.
 Id. at 131-32.
 Valentine Amrhein, Sander Greenland & Blake McShane, “Scientists Rise Up Against Statistical Significance,” Nature (March 20, 2019), https://www.nature.com/articles/d41586-019-00857-9.
 John P. A. Ioannidis, “The Importance of Predefined Rules and Prespecified Statistical Analyses,” JAMA (Apr. 4, 2019), https://jamanetwork.com/journals/jama/fullarticle/2730486.
 Fed. R. Evid. 702; Daubert v. Merrell Dow Pharms. Inc., 509 U.S. 579, 593-94 (1993).
 General Electric Co. v. Joiner, 522 U.S. 136, 145-46 (1997) (“A court may conclude that there is simply too great an analytical gap between the data and the opinion proffered”).
 See, e.g., Wells v. SmithKline Beecham Corp., 601 F.3d 375, 380 (5th Cir. 2010) (“[T]his court has frowned on causative conclusions bereft of statistically significant epidemiological support”); Norris v. Baxter Healthcare Corp., 397 F.3d 878, 887 (10th Cir. 2005) (“We cannot allow the jury to speculate based on an expert’s opinion which relies only on clinical experience in the absence of showing a consistent, statistically significant association between breast implants and systemic disease”); Burleson v. Tex. Dep’t of Crim. Justice, 393 F.3d 577, 585-87 (5th Cir. 2004) (“Dr. Carson offers no studies which demonstrate a statistically significant link between thorium dioxide exposure in dust or fumes and Burleson’s type of lung or throat cancer”); In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., No. 12-md-02342, 2015 U.S. Dist. LEXIS 161355, at *30 (E.D. Pa. Dec. 2, 2015) (“The Court agrees that Dr. Jewell’s approach to the Zoloft data de-emphasizes the importance of statistical significance”).
 Milward v. Acuity Specialty Prods. Grp. Inc., 639 F.3d 11, 25 (1st Cir. 2011); see also Matrixx Initiatives v. Siracusano, 563 U.S. 27, 40-41 (2011) (“[C]ourts frequently permit expert testimony on causation based on evidence other than statistical significance”); Kennedy v. Collagen Corp., 161 F.3d 1226, 1229 (9th Cir. 1998); Henricksen v. Conoco Phillips Co., 605 F. Supp. 2d 1142, 1177 (E.D. Wash. 2009) (“[T]he absence of statistical support of causation is not necessarily fatal to a plaintiff’s case”).
 See In re Chantix (Varenicline) Prods. Liab. Litig., 889 F. Supp. 2d 1272, 1285-86 (N.D. Ala. 2012).
 See Ioannidis, supra note 7.
 See, e.g., In re Lipitor (Atorvastatin Calcium) Mktg. Sales Practices and Prods. Liab. Litig., 145 F. Supp. 3d 573, 583 (D.S.C. 2015) (“The problem with Dr. Jewell’s use of the mid-p test is that his use of it was results driven. He only used this test once the Fisher exact test returned a non-significant result”).