Row versus column correlations: avoiding the ecological fallacy in RNA/protein expression studies.

Briefings in Bioinformatics (2018)

Abstract
Biomedical researchers are often interested in computing the correlation between RNA and protein abundance. However, correlations can be computed between rows of a data matrix or between columns, and the results are not the same. The belief that these two types of correlation are estimating the same phenomenon is a special case of a well-known logical error called the ecological fallacy. In this article, we review different uses of correlation found in the literature, explain the differences between row and column correlations and argue that one of them has an undesirable interpretation in most applications. Through simulation studies and theoretical derivations, we show that the commonly used Pearson's coefficient, computed from protein and transcript data from a single sample, is only loosely related to the biological correlation that most researchers will be interested in studying. Beyond our basic exploration of the ecological fallacy, we examine how correlations are affected by relative quantification proteomics data and common normalization procedures, finding that double normalization is capable of completely masking true correlative relationships. We conclude with guidelines for properly identifying and computing consistent correlation coefficients.
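The difference between the two kinds of correlation can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the simulation design used in the article: the matrices are assumed to be genes (rows) by samples (columns), each gene is given a shared baseline abundance for RNA and protein, and the sample-to-sample fluctuations of RNA and protein are drawn independently, so the within-gene correlation is zero by construction. The function name `row_and_column_correlations` and all parameter values are hypothetical.

```python
import numpy as np


def row_and_column_correlations(rna, protein):
    """Pearson correlations between two genes x samples matrices.

    Returns (per_gene, per_sample):
      per_gene[i]   correlates row i of rna with row i of protein (across samples)
      per_sample[j] correlates column j of rna with column j of protein (across genes)
    """
    def pearson_along(a, b, axis):
        # Center along the chosen axis, then apply the Pearson formula.
        a = a - a.mean(axis=axis, keepdims=True)
        b = b - b.mean(axis=axis, keepdims=True)
        denom = np.sqrt((a ** 2).sum(axis=axis) * (b ** 2).sum(axis=axis))
        return (a * b).sum(axis=axis) / denom

    per_gene = pearson_along(rna, protein, axis=1)
    per_sample = pearson_along(rna, protein, axis=0)
    return per_gene, per_sample


# Assumed toy model: a gene-specific baseline shared by RNA and protein,
# plus independent noise across samples for each data type.
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 40
gene_baseline = rng.normal(0.0, 2.0, size=(n_genes, 1))
rna = gene_baseline + rng.normal(0.0, 0.5, size=(n_genes, n_samples))
protein = gene_baseline + rng.normal(0.0, 0.5, size=(n_genes, n_samples))

per_gene, per_sample = row_and_column_correlations(rna, protein)
print("mean within-gene correlation (across samples):", per_gene.mean())
print("mean within-sample correlation (across genes):", per_sample.mean())
```

Under these assumptions the within-sample coefficients are high because they mostly reflect differences in baseline abundance between genes, while the within-gene coefficients scatter around zero. Treating the first as an estimate of the second is the mistake the abstract describes.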
Key words
proteomics, transcripts, normalization, probe effect, guidelines