Reducing false positives in differential analyses of large RNA sequencing data sets

In biological studies, identifying features that differ significantly between experimental conditions or disease states is important for understanding the biological mechanisms behind phenotypes: the observable traits of an individual or organism, such as height, eye color and blood type.

Among these features, gene expression is the one most commonly studied. RNA sequencing, or RNA-seq, is a technique that lets biologists study all genes at once by revealing the presence and quantity of RNA molecules in a biological sample. Its development has made it easier and faster for scientists to identify differentially expressed genes (DEGs) at the genome-wide level.

However, the generally small sample size of RNA-seq data (usually only two to four biological replicates per condition) makes it difficult to identify DEGs accurately. To improve statistical power in small samples, earlier researchers developed statistical methods based on parametric distributional assumptions and the empirical Bayes approach; two popular examples are DESeq2 and edgeR. As the cost of sequencing has declined, the sample size of RNA-seq data has gradually increased, reaching hundreds or even thousands in some population-level studies. This large increase raises the question of whether methods like DESeq2 and edgeR, which were designed for small samples, are still applicable to much larger population-level RNA-seq data sets.

To answer this question, researchers from UCLA and UC Irvine published a paper on March 15 titled “Exaggerated false positives by popular differential expression methods when analyzing human population samples” in the journal Genome Biology.

Using permutation analysis, the researchers discovered that DESeq2 and edgeR have extremely high false discovery rates on population-level data sets. The false discovery rate (FDR) is a statistical measure of how reliable a method's discoveries are: it is the expected proportion of reported discoveries that are in fact false. The smaller the FDR, the more reliable the discoveries. The FDR is widely used as a criterion in bioinformatics tools for analyzing various kinds of biological data, such as genomics and proteomics data.
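To make the two ideas in the preceding paragraph concrete, here is a minimal Python sketch. It is an illustration only, not the code used in the study. The function names, the dictionary-based data layout and the `call_degs` callback are all assumptions made for this example. The first function computes the false discovery proportion for one analysis when the true DEGs are known, as in a simulation (the FDR is the average of this proportion over repeated experiments). The second shuffles the condition labels, which breaks any real link between expression and condition, so that any gene still reported as a DEG is most plausibly a false positive.

```python
import random


def false_discovery_proportion(discoveries, true_degs):
    """Fraction of reported genes that are not truly differentially expressed.

    Returns 0.0 when nothing is reported, the usual convention.
    """
    discoveries = set(discoveries)
    if not discoveries:
        return 0.0
    false_hits = discoveries - set(true_degs)
    return len(false_hits) / len(discoveries)


def permuted_discovery_counts(expr, labels, call_degs, n_perm=100, seed=0):
    """Count the DEGs a method reports after the condition labels are shuffled.

    expr      : hypothetical layout, dict mapping gene name -> list of values,
                one value per sample
    labels    : list of condition labels, one per sample
    call_degs : any function (expr, labels) -> set of reported gene names;
                a stand-in for whichever DEG method is being checked
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # destroys any true condition effect
        counts.append(len(call_degs(expr, shuffled)))
    return counts
```

If a method reports many genes on the shuffled data, that is a warning sign that its reported FDR cannot be trusted on that data set.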

By further evaluating multiple DEG analysis methods, the researchers found that only the Wilcoxon rank-sum test could control the FDR and achieve good power.
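For readers who want to see what a Wilcoxon rank-sum analysis involves, the following is a rough sketch in plain Python. It rests on several assumptions that do not come from the paper: a two-sided p-value from the normal approximation with a tie correction and no continuity correction, and the Benjamini-Hochberg adjustment as the way to control the FDR across genes. The function names and the data layout are hypothetical. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import math


def midranks(values):
    """Return 1-based ranks (ties get the average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_pvalue(x, y):
    """Two-sided rank-sum p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance <= 0:  # every value identical: no evidence of a difference
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def wilcoxon_degs(expr, labels, alpha=0.05):
    """Report genes whose adjusted p-value is at most alpha.

    expr   : dict mapping gene name -> list of values, one per sample
    labels : list of 0/1 condition labels, one per sample
    """
    genes = list(expr)
    pvalues = []
    for gene in genes:
        group0 = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        group1 = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        pvalues.append(rank_sum_pvalue(group0, group1))
    adjusted = benjamini_hochberg(pvalues)
    return {g for g, q in zip(genes, adjusted) if q <= alpha}
```

Because the test uses only the ranks of the expression values, it makes no assumption about their distribution. The trade-off is that it needs a reasonably large number of samples per condition to have any power, which fits the population-level setting described here.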

“Researchers demand reliable discoveries that contain few false discoveries, and for population-level RNA-seq studies, we recommend the Wilcoxon rank-sum test,” said Jingyi “Jessica” Li, the study’s senior author, who is a UCLA associate professor of statistics and is affiliated with UCLA’s Bioinformatics Ph.D. program.

This research was supported by the National Science Foundation, National Institute of General Medical Sciences, Alfred P. Sloan Foundation and W.M. Keck Foundation.

Article by Yumei Li (UC Irvine), Jingyi Jessica Li, and Stuart Wolpert