A Multivariate Computational Method to Analyze High-Content RNAi Screening Data - Supplemental Material
Journal of Biomolecular Screening
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Computational Systems Biology Initiative, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Department of Statistics, and FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Abstract
High-content screening (HCS) using RNA interference (RNAi) in combination with automated microscopy is a powerful investigative tool to explore complex biological processes. However, despite the plethora of data generated from these screens, little progress has been made in analyzing HC data using multivariate methods that exploit the full richness of multidimensional data. We developed a novel multivariate method for HCS, multivariate robust analysis method (M-RAM), integrating image feature selection with ranking of perturbations for hit identification, and applied this method to an HC RNAi screen to discover novel components of the DNA damage response in an osteosarcoma cell line. M-RAM automatically selects the most informative phenotypic readouts and time points to facilitate the more efficient design of follow-up experiments and enhance biological understanding. Our method outperforms univariate hit identification and identifies relevant genes that these approaches would have missed. We found that statistical cell-to-cell variation in phenotypic responses is an important predictor of hits in RNAi-directed image-based screens. Genes that we identified as modulators of DNA damage signaling in U2OS cells include B-Raf, a cancer driver gene in multiple tumor types, whose role in DNA damage signaling we confirm experimentally, and multiple subunits of protein kinase A.
directional RIGER
Directional RIGER (dRIGER), an extended derivation of RNAi Gene Set Enrichment (RIGER), can be used to transform shRNA-level into gene-level data by computing directional normalized enrichment scores (dNES). dRIGER quantifies both the magnitude and the consistency of the phenotypic effects of multiple shRNAs targeting the same specific gene using a Kolmogorov-Smirnov motivated running-sum test statistic. Multiple shRNAs inducing a moderate but consistent phenotypic effect receive a higher dNES than a set of highly inconsistent shRNAs with one very strong outlier. Briefly, dRIGER first rank-orders all screened shRNA values from largest to smallest and sequentially traverses each rank in this list from left to right (top to bottom) to compute positional ES. A rank’s positional ES reflects how many shRNAs from the set targeting the gene of interest were previously encountered in the list and how many are still ahead in the list. This procedure quantifies whether the shRNAs targeting a gene of interest are clustered towards the left end of the list. The rank-ordered list is then similarly traversed from right to left (bottom to top). Finally, the largest positional enrichment score is normalized as in Gene Set Enrichment Analysis and selected as dNES. If the dNES was found by traversing from right to left, its sign is set to negative to indicate bottom-of-list enrichment. dNES were computed for each feature and each gene at each time point.
Mathematically, positional hit and miss scores are calculated at each position \(i\) in a rank-ordered list of length \(L\) based on the ranks of the screened shRNAs targeting the gene of interest, \(G_{f, t} = (h_1, ..., h_{|G_{f, t}|})\), where each \(h\) represents a single rank in the rank ordered list:
\(P_H(G_{f, t}, i) = \sum_{h_{j \leq i} \in G_{f, t}} \frac{h_j}{\sum_{h \in G_{f, t}}h}\)
\(P_M(G_{f, t}, i) = \sum_{h_{j \leq i} \not\in G_{f, t}} \frac{1}{L - |G_{f, t}|}\)
Similarly, inverse positional ES are computed to test for rank enrichment at the right end of the rank-ordered list using an inverse shRNA rank set \(G^I_{f, t} \) where
\(G^I_{f, t} = L - G_{f, t} + 1\)
Finally, dES are computed as
\(\epsilon_d(G_{f, t}) = \max[\max(P_H(G_{f, t}) - P_M(G_{f, t})), \max(\vec{P}_H(G^I_{f, t}) - \vec{P}_M(G^I_{f, t}))\)
and multiplied with -1 if the inverse directional ES is greater than the directional ES.
Implementation
A Java implementation of dRIGER can be obtained here (Java 1.7 or higher is required):- dRIGER 1.0.0 Executable Jar File with dependencies* (recommended)
- dRIGER 1.0.0 Executable Jar File without dependencies
* includes Apache commons-math 3.4.1
usage:
dRIGER <matrixFilePath> <plateColumnIndex>
<wellColumnIndex> <geneIdentifierColumnIndex>
<timePointColumnIndex> <featureIndexOffset>
[<verbose>] [<numPermutations>] [<separator>]
Argument | Description | Default value | Example |
---|---|---|---|
matrixFilePath | path to hairpin data matrix file | - | data/matrix.txt |
plateColumnIndex | index* of plate column in hairpin data matrix file | - | 0 |
wellColumnIndex | index* of well column in hairpin data matrix file | - | 1 |
geneIdentifierColumnIndex | index* of gene identifier column in hairpin data matrix file | - | 2 |
timePointColumnIndex | index* of time point column in hairpin data matrix file | - | 3 |
featureIndexOffset | index* of first feature column in hairpin data matrix file | - | 4 |
verbose | 1 or 0, if the former is specified, position specific scores are written to disk | 0 | 1 |
numPermutations | number of permutations performed in normalization step | 1000 | 1500 |
separator | column separator of hairpin data matrix file | ,|\t | ; |
Example
Hairpin data matrix file (matrix.txt) with three genes (gene1, gene2, gene3), three hairpins per gene, three features (feature1, feature2, feature3), one plate:
plate,well,gene_symbol,timepoint,feature1,feature2,feature3 plate1,A01,gene1,1,1.78889837051485,-1.38559942409011,-0.110859234100811 plate1,A02,gene2,1,-2.06584308249532,-0.157681530348831,-0.220422916467628 plate1,A03,gene3,1,0.0221250758955861,1.46538326829911,2.87106063095209 plate1,B01,gene2,1,-2.0529896774323,0.48586370321946,-1.1306750087691 plate1,B02,gene1,1,1.97104539052258,-2.0843637263815,-0.15153882256323 plate1,B03,gene3,1,1.71205856937918,0.466104405313041,-1.17867810808908 plate1,C01,gene3,1,0.432190476190475,1.0166497978926,-1.975492519911615 plate1,C02,gene2,1,-1.27350105047672,0.833864097839698,-0.941516050116743 plate1,C03,gene1,1,1.75521582733811,-1.185443963569041,0.180740762397609
Run dRIGER with appropriate arguments:
java -jar dRIGER-jar-with-dependencies.jar matrix.txt 0 1 2 3 4
dRIGER console output:
preprocessing: initialization preprocessing: done generation: initialization dRIGER for feature tp1_feature1 successfully calculated dRIGER for feature tp1_feature2 successfully calculated dRIGER for feature tp1_feature3 successfully calculated generation: done condensation: initialization condensation: done
Enrichment scores were saved to dRIGER_enrichment_scores.txt. The temp directory can be deleted after dRIGER completed the condensation step.