Commit cd91bd0

Merge pull request #153 from gwaygenomics/readme-tweaks
Update README
2 parents 4537bf9 + fa15e00 commit cd91bd0

2 files changed, +33 -24 lines changed

README.md

Lines changed: 28 additions & 24 deletions
@@ -29,20 +29,21 @@

 Cell health can be altered by genetic and chemical perturbations.
 An increased understanding of these perturbation mechanisms is directly relevant for drug discovery and personalized medicine.
-Here and in an accompanying paper, we present a novel cell imaging assay to measure 70 different aspects of cell health, such as proliferation, apoptosis, and cell cycle stalling.
-However, this assay requires expensive reagents and does not scale well.
-Therefore, we also developed a machine learning solution to predict cell health readouts directly from the inexpensive and high-throughput Cell Painting imaging assay.
+Here and in an accompanying paper, we present two novel cell imaging assays that together measure 70 different aspects of cell health, such as proliferation, apoptosis, and cell cycle stalling.
+However, these assays require expensive reagents and do not scale well.
+Therefore, we also developed a machine learning solution to predict cell health readouts directly from a separate assay, known as Cell Painting.
+In contrast to the Cell Health assays, Cell Painting is inexpensive, high-throughput, and unbiased (reagents are not targeted).
 We predict many cell health indicators with high performance, but other readouts could not be predicted.
 We validated our predictions by using orthogonal readouts and by applying the models to a large set of 1,500 drugs from the Drug Repurposing Hub.
 Cell health predictions for drugs can be browsed at https://broad.io/cell-health-app.
-We confirmed mitotic arrest and reactive oxygen species phenotypes via PLK and proteasome inhibition, respectively.
+We confirmed mitotic arrest, reactive oxygen species, and DNA damage in G1 cell cycle based phenotypes via PLK, proteasome, and aurora kinase/tubulin inhibition, respectively.
 In the future, we can use this approach to determine the cell health consequences of any perturbation in cells.
 We conducted this project using open science principles with open data and open source code.

-The following repository stores a complete analysis pipeline using Cell Painting data to predict readouts from several cell health assays.
+The following repository stores a complete analysis pipeline using Cell Painting data to predict readouts from the Cell Health assays.

-We first developed a customized microscopy assay we call "Cell Health".
-The Cell Health assay is comprised of two different reagent panels: "Cell cycle" and "viability".
+We first developed the customized microscopy assays we collectively call "Cell Health".
+The Cell Health assays are comprised of two different reagent panels: "Cell cycle" and "viability".
 Together, these two panels use reagents which mark different cell health phenotypes.

 | Assay/Dye | Phenotype | Panel |
@@ -55,16 +56,16 @@ Together, these two panels use reagents which mark different cell health phenoty
 | pH3 | Cell division | Cell cycle |
 | gH2Ax | DNA damage | Cell cycle |

-We hypothesized that we can use unbiased and high dimensional Cell Painting profiles to predict the readouts of each individual assay.
+We hypothesized that we can use unbiased and high dimensional Cell Painting profiles to predict cell health readouts.

 ## Approach

-This overview figure outlines the Cell Health assay, the Cell Painting assay, and our machine learning approach.
+This overview figure outlines the Cell Health assays, the Cell Painting assay, and our machine learning approach.

 ![approach](https://raw.githubusercontent.com/broadinstitute/cell-health/master/media/approach.png)

 > Data processing and modeling approach.
-> (a) Example images and workflow from the Cell Health assay.
+> (a) Example images and workflow from the Cell Health assays.
 > We apply a series of manual gating strategies (see Methods) to isolate cell subpopulations and to generate cell health readouts for each perturbation.
 > (top) In the “Cell Cycle” panel, in each nucleus we measure Hoechst, EdU, PH3, and gH2AX.
 > (bottom) In the “Cell Viability” panel, we capture digital phase contrast images, measure Caspase 3/7, DRAQ7, CellROX, and (b) Example Cell Painting image across five channels, plus a merged representation across channels.
@@ -94,7 +95,7 @@ All data are publicly available.
 | Data | Level | Location | Notes |
 | :--- | :---- | :--------| :---- |
 | Cell health readouts | Raw | [1.generate-profiles/data/raw](1.generate-profiles/data/raw) | Per cell health panel (cell cycle and viability) per cell line |
-| Cell health readouts | Normalized | `1.generate-profiles/data/raw/normalized_cell_health_labels.tsv` | |
+| Cell health readouts | Normalized | [1.generate-profiles/data/labels/normalized_cell_health_labels.tsv](1.generate-profiles/data/labels) | |
 | Cell health signatures | Consensus | [1.generate-profiles/data/consensus](1.generate-profiles/data/consensus) | |

 #### Drug Repurposing Hub
@@ -134,12 +135,13 @@ The full analysis pipeline consists of the following steps:

 | Order | Module | Description |
 | :---- | :----- | :---------- |
-| 0 | Download cell painting data | Retrieve single cell profiles archived on Figshare |
-| 1 | Generate profiles | Generate and process cell painting and cell health assay readouts |
-| 2 | Determine replicate reproducibility | Determine the extent to which the CRISPR perturbations result in reproducible signatures |
-| 3 | Train machine learning models to predict cell health assays | Train and visualize regression models using cell painting data to predict cell health assay readouts |
-| 4 | Apply the models | Apply the trained models to the Drug Repurposing Hub data to predict drug perturbation effect |
-| 5 | Validate the models | Use orthogonal readouts to validate the Drug Repurposing Hub predictions |
+| [0.download-data](0.download-data/) | Download cell painting data | Retrieve single cell profiles archived on Figshare |
+| [1.generate-profiles](1.generate-profiles/) | Generate profiles | Generate and process cell painting and cell health assay readouts |
+| [2.replicate-reproducibility](2.replicate-reproducibility/) | Determine replicate reproducibility | Determine the extent to which the CRISPR perturbations result in reproducible signatures |
+| [3.train](3.train/) | Train machine learning models to predict cell health assays | Train and visualize regression models using cell painting data to predict cell health assay readouts |
+| [4.apply](4.apply/) | Apply the models | Apply the trained models to the Drug Repurposing Hub data to predict drug perturbation effect |
+| [5.validate-repurposing](5.validate-repurposing/) | Validate the models | Use orthogonal readouts to validate the Drug Repurposing Hub predictions |
+| [6.ml-robustness](6.ml-robustness) | Interrogate robustness of ML predictions | Assess sample size, feature groups, and cell line holdouts to probe ML robustness |

 Each analysis module should be run in order.
 View each module for specific instructions on how to reproduce results.
@@ -189,7 +191,7 @@ However, there are many cell line specific differences.
 ### Model Interpretation

 Because we used a logistic regression classifier, we can readily interpret the output features.
-These features were derived from CellProfiler and represent different measurements of cell morphology
+These features were derived from CellProfiler and represent different measurements of cell morphology.
 Shown above is a summary of coefficients from all 70 cell health models.
 We observed that each contribute to classifying various facets of cell health.
 Many different categories of cell morphology features contribute to cell health predictions.
@@ -211,20 +213,22 @@ These data represent ~1,500 compound perturbations in ~6 dose points in A549 cel
 Collapsing the Drug Repurposing Hub Cell Painting data into UMAP coordinates, we observed many associated Cell Health predictions.
 For example, predicted G1 Cell Count and predicted ROS had clear gradients in UMAP space.
 However, there is not exactly a 1-1 relationship.
-The control proteasome inhibitors (DMSO and Bortezomib) are known to induce ROS, while PLK inhibitors are known to induce cell death by blocking mitosis entry.
+The proteasome inhibitors (DMSO and Bortezomib) are known to induce ROS, while PLK inhibitors are known to induce cell death by blocking mitosis entry.
 A single PLK inhibitor (HMN-214) showed a strong dose relationship with predicted G1 count.

 ![lincs](https://raw.githubusercontent.com/broadinstitute/cell-health/master/4.apply/figures/lincs_main_figure_4.png)

-> Applying cell health models to Cell Painting data from The Drug Repurposing Hub.
-> (a) We apply a Uniform Manifold Approximation (UMAP) to Drug Repurposing Hub consensus profiles of 1,571 compounds across 6 doses.
-> The models were not trained using the Drug Repurposing Hub data.
-> The point color represents the output of the cell health model trained to predict the number of cells in G1 phase (G1 cell count).
-> (b) The same UMAP dimensions, but colored by the output of the Cell Health model trained to predict reactive oxygen species (ROS).
+> Validating Cell Health models to Cell Painting data from The Drug Repurposing Hub.
+> (a) The results of the dose alignment between the PRISM assay and the Drug Repurposing Hub data.
+> This view indicates that there was not a one-to-one matching between perturbation doses.
+> (b) Comparing viability estimates from the PRISM assay to the predicted number of live cells in the Drug Repurposing Hub.
+> The PRISM assay estimates viability by measuring barcoded A549 cells after an incubation period.
 > (c) Drug Repurposing Hub profiles stratified by G1 cell count and ROS predictions.
 > Bortezomib and MG-132 are proteasome inhibitors and are used as positive controls; DMSO is a negative control.
 > We also highlight all PLK inhibitors in the dataset.
 > (d) HMN-214 is an example of a PLK inhibitor that shows strong dose response for G1 cell count predictions.
+> (e) Tubulin and aurora kinase inhibitors are predicted to have high Number of gH2AX spots in G1 cells compared to other compounds and controls.
+> (f) Barasertib (AZD1152) is an aurora kinase inhibitor that is predicted to have a strong dose response for Number of gH2AX spots in G1 cells predictions.

 ### Drug Repurposing Hub: Exploratory Tool

analysis-pipeline.sh

Lines changed: 5 additions & 0 deletions
@@ -31,3 +31,8 @@ bash apply-models.sh
 cd ..
 cd 5.validate-repurposing
 bash validate-pipeline.sh
+
+# Step 6 - Probe machine learning robustness
+cd ..
+cd 6.ml-robustness
+bash ml-robustness-pipeline.sh
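
The new step follows the script's existing pattern of paired `cd ..` and `cd <module>` calls, which leaves the shell in the wrong directory if any step fails partway. As a minimal sketch of one alternative, assuming the script is launched from the repository root, each module could be run in a subshell so the caller's working directory is never changed. `run_step` is a hypothetical helper name and is not part of the repository; the directory and script names in the usage comment are taken from the diff above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the repository): run one pipeline step
# inside a subshell so the caller's working directory is left untouched.
run_step() {
  local module_dir="$1"   # e.g. 6.ml-robustness
  local script="$2"       # e.g. ml-robustness-pipeline.sh

  # Fail early with a clear message if the step's script is missing.
  if [ ! -f "${module_dir}/${script}" ]; then
    echo "missing ${module_dir}/${script}" >&2
    return 1
  fi

  echo "running ${module_dir}/${script}"
  # The parentheses start a subshell; the cd only applies inside it.
  ( cd "${module_dir}" && bash "${script}" )
}

# Example usage from the repository root (stops at the first failure):
# run_step 5.validate-repurposing validate-pipeline.sh &&
#   run_step 6.ml-robustness ml-robustness-pipeline.sh
```

Chaining the calls with `&&` keeps the "run in order" requirement while stopping as soon as one module fails, which the plain `cd`/`bash` sequence does not do unless `set -e` is enabled elsewhere in the script.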
