dRIGER

A Multivariate Computational Method to Analyze High-Content RNAi Screening Data – Supplemental Material

Journal of Biomolecular Screening

Jonathan Rameseder1, 2, Konstantin Krismer1, Yogesh Dayma1, Tobias Ehrenberger3, Mun Kyung Hwang1, Scott R. Floyd1, 4, Edoardo M. Airoldi5, 6, Michael B. Yaffe1, 4, 6

  1. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  2. Computational Systems Biology Initiative, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  3. Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  4. Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
  5. Department of Statistics, and FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA
  6. Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA

Abstract

High-content screening (HCS) using RNA interference (RNAi) in combination with automated microscopy is a powerful investigative tool to explore complex biological processes. However, despite the plethora of data generated from these screens, little progress has been made in analyzing HC data using multivariate methods that exploit the full richness of multidimensional data. We developed a novel multivariate method for HCS, multivariate robust analysis method (M-RAM), integrating image feature selection with ranking of perturbations for hit identification, and applied this method to an HC RNAi screen to discover novel components of the DNA damage response in an osteosarcoma cell line. M-RAM automatically selects the most informative phenotypic readouts and time points to facilitate the more efficient design of follow-up experiments and enhance biological understanding. Our method outperforms univariate hit identification and identifies relevant genes that these approaches would have missed. We found that statistical cell-to-cell variation in phenotypic responses is an important predictor of hits in RNAi-directed image-based screens. Genes that we identified as modulators of DNA damage signaling in U2OS cells include B-Raf, a cancer driver gene in multiple tumor types, whose role in DNA damage signaling we confirm experimentally, and multiple subunits of protein kinase A.

directional RIGER

Directional RIGER (dRIGER), an extended derivation of RNAi Gene Set Enrichment (RIGER), can be used to transform shRNA-level into gene-level data by computing directional normalized enrichment scores (dNES). dRIGER quantifies both the magnitude and the consistency of the phenotypic effects of multiple shRNAs targeting the same specific gene using a Kolmogorov-Smirnov motivated running-sum test statistic. Multiple shRNAs inducing a moderate but consistent phenotypic effect receive a higher dNES than a set of highly inconsistent shRNAs with one very strong outlier. Briefly, dRIGER first rank-orders all screened shRNA values from largest to smallest and sequentially traverses each rank in this list from left to right (top to bottom) to compute positional ES. A rank’s positional ES reflects how many shRNAs from the set targeting the gene of interest were previously encountered in the list and how many are still ahead in the list. This procedure quantifies whether the shRNAs targeting a gene of interest are clustered towards the left end of the list. The rank-ordered list is then similarly traversed from right to left (bottom to top). Finally, the largest positional enrichment score is normalized as in Gene Set Enrichment Analysis and selected as dNES. If the dNES was found by traversing from right to left, its sign is set to negative to indicate bottom-of-list enrichment. dNES were computed for each feature and each gene at each time point.
Mathematically, positional hit and miss scores are calculated at each position \(i\) in a rank-ordered list of length \(L\) based on the ranks of the screened shRNAs targeting the gene of interest, \(G_{f, t} = (h_1, …, h_{|G_{f, t}|})\), where each \(h\) represents a single rank in the rank ordered list:

\(P_H(G_{f, t}, i) = \sum_{h_{j \leq i} \in G_{f, t}} \frac{h_j}{\sum_{h \in G_{f, t}}h}\)

\(P_M(G_{f, t}, i) = \sum_{h_{j \leq i} \not\in G_{f, t}} \frac{1}{L – |G_{f, t}|}\)

Similarly, inverse positional ES are computed to test for rank enrichment at the right end of the rank-ordered list using an inverse shRNA rank set \(G^I_{f, t} \) where

\(G^I_{f, t} = L – G_{f, t} + 1\)

Finally, dES are computed as

\(\epsilon_d(G_{f, t}) = \max[\max(P_H(G_{f, t}) – P_M(G_{f, t})), \max(\vec{P}_H(G^I_{f, t}) – \vec{P}_M(G^I_{f, t}))\)

and multiplied with -1 if the inverse directional ES is greater than the directional ES.

Implementation

A Java implementation of dRIGER can be obtained here (Java 1.7 or higher is required):

* includes Apache commons-math 3.4.1

usage:
dRIGER <matrixFilePath> <plateColumnIndex> 
  <wellColumnIndex> <geneIdentifierColumnIndex>
  <timePointColumnIndex> <featureIndexOffset> 
  [<verbose>] [<numPermutations>] [<separator>]
Argument Description Default value Example
matrixFilePath path to hairpin data matrix file data/matrix.txt
plateColumnIndex index* of plate column in hairpin data matrix file 0
wellColumnIndex index* of well column in hairpin data matrix file 1
geneIdentifierColumnIndex index* of gene identifier column in hairpin data matrix file 2
timePointColumnIndex index* of time point column in hairpin data matrix file 3
featureIndexOffset index* of first feature column in hairpin data matrix file 4
verbose 1 or 0, if the former is specified, position specific scores are written to disk 0 1
numPermutations number of permutations performed in normalization step 1000 1500
separator column separator of hairpin data matrix file ,|\t ;
* zero-based

Example

Hairpin data matrix file (matrix.txt) with three genes (gene1, gene2, gene3), three hairpins per gene, three features (feature1, feature2, feature3), one plate:

plate,well,gene_symbol,timepoint,feature1,feature2,feature3
plate1,A01,gene1,1,1.78889837051485,-1.38559942409011,-0.110859234100811
plate1,A02,gene2,1,-2.06584308249532,-0.157681530348831,-0.220422916467628
plate1,A03,gene3,1,0.0221250758955861,1.46538326829911,2.87106063095209
plate1,B01,gene2,1,-2.0529896774323,0.48586370321946,-1.1306750087691
plate1,B02,gene1,1,1.97104539052258,-2.0843637263815,-0.15153882256323
plate1,B03,gene3,1,1.71205856937918,0.466104405313041,-1.17867810808908
plate1,C01,gene3,1,0.432190476190475,1.0166497978926,-1.975492519911615
plate1,C02,gene2,1,-1.27350105047672,0.833864097839698,-0.941516050116743
plate1,C03,gene1,1,1.75521582733811,-1.185443963569041,0.180740762397609

Run dRIGER with appropriate arguments:

java -jar dRIGER-jar-with-dependencies.jar matrix.txt 0 1 2 3 4

dRIGER console output:

preprocessing: initialization
preprocessing: done
generation: initialization
dRIGER for feature tp1_feature1 successfully calculated
dRIGER for feature tp1_feature2 successfully calculated
dRIGER for feature tp1_feature3 successfully calculated
generation: done
condensation: initialization
condensation: done

Enrichment scores were saved to dRIGER_enrichment_scores.txt. The temp directory can be deleted after dRIGER completed the condensation step.