Commit 7906beb

Merge pull request #138 from shntnu/ss-readme-prune
Simplify README and point to JUMP Hub
2 parents 8c65506 + 62a506d commit 7906beb

7 files changed

+2
-3388
lines changed

Pipfile

Lines changed: 0 additions & 25 deletions
This file was deleted.

Pipfile.lock

Lines changed: 0 additions & 1685 deletions
This file was deleted.

README.md

Lines changed: 2 additions & 63 deletions
@@ -2,64 +2,9 @@
22

33
[![DOI](https://zenodo.org/badge/552371375.svg)](https://zenodo.org/badge/latestdoi/552371375)
44

5-
This is a collection of [Cell Painting](https://jump-cellpainting.broadinstitute.org/cell-painting) image datasets generated by the [JUMP-Cell Painting Consortium](https://jump-cellpainting.broadinstitute.org/), funded in part by a grant from the Massachusetts Life Sciences Center.
5+
This is a collection of [Cell Painting](https://jump-cellpainting.broadinstitute.org/cell-painting) image datasets generated by the [JUMP-Cell Painting Consortium](https://jump-cellpainting.broadinstitute.org/).
66

7-
This repository contains notebooks and instructions to work with the datasets.
8-
9-
All the data is hosted on the Cell Painting Gallery on the Registry of Open Data on AWS ([https://registry.opendata.aws/cellpainting-gallery/](https://registry.opendata.aws/cellpainting-gallery/)). If you'd like to take a look at (a subset of) the data interactively, the [JUMP-CP Data Explorer](https://phenaid.ardigen.com/jumpcpexplorer/) by Ardigen and the [JUMP-CP Data Portal](https://www.springdiscovery.com/jump-cp) by Spring Discovery provide portals to do so.
10-
11-
## Details about the data
12-
13-
This collection comprises 4 datasets:
14-
15-
- The principal dataset of 116k chemical and >15k genetic perturbations the partners created in tandem (`cpg0016`), split across 12 data-generating centers. Human U2OS osteosarcoma cells are used.
16-
- 3 pilot datasets created to test: different perturbation conditions (`cpg0000`, including different cell types), staining conditions (`cpg0001`), and microscopes (`cpg0002`).
17-
18-
### What’s available now
19-
20-
- All data [components](https://github.com/broadinstitute/cellpainting-gallery/blob/main/documentation/data_structure.md) of the three pilots.
21-
- Most data components (images, raw CellProfiler output, single-cell profiles, aggregated CellProfiler profiles) from 12 sources for the principal dataset. Each source corresponds to a unique data generating center (except `source_7` and `source_13`, which were from the same center).
22-
- All key [metadata](metadata/README.md) files.
23-
- A [notebook](https://github.com/jump-cellpainting/datasets/blob/main/sample_notebook.ipynb) to load and inspect the data currently available in the principal dataset.
24-
- Different subsets of data in the principal dataset, assembled into single parquet files. The URLs to the subsets are [here](https://github.com/jump-cellpainting/datasets/blob/main/manifests/profile_index.csv). The corresponding folders for each contain all the data levels (e.g. this [folder](https://cellpainting-gallery.s3.amazonaws.com/index.html#cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/)). Snakemake workflows for producing these assembled profiles are available [here](https://github.com/broadinstitute/jump-profiling-recipe/releases/tag/v0.1.0). We recommend working with the the `all` or `all_interpretable` subsets -- they contain all three data modalities in single dataframe. Note that cross-modality matching is still poor (ORF-CRISPR, COMPOUND-CRISPR, COMPOUND-ORF), but within modality generally works well.
25-
- A [tutorial](https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/1_retrieve_profiles.html) to load these subsets of data.
26-
- Other [tutorials](https://broad.io/jump) to work with `cpg0016`.
27-
- The datasets and their DOI can be found on this [Zenodo](https://zenodo.org/records/13892061/latest) record.
28-
- Multiple datasets of interest for JUMP are available on our [Zenodo](https://zenodo.org/communities/broad-imaging/records?q=&l=list&p=1&s=10&sort=newest) community.
29-
30-
### What’s coming up
31-
32-
- Extending the metadata and notebooks to the three pilots so that all these datasets can be quickly loaded together ([issue](https://github.com/jump-cellpainting/datasets-private/issues/93)).
33-
- Curated annotations for the compounds, obtained from [ChEMBL](https://www.ebi.ac.uk/chembl/) and other sources ([issue](https://github.com/jump-cellpainting/datasets-private/issues/78)).
34-
- Deep learning [embeddings](https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_s/feature_vector/2) using a pre-trained neural network for all 4 datasets ([issue](https://github.com/jump-cellpainting/datasets-private/issues/50)).
35-
- Methods and tools to simplify access to the data/metadata ([`cpgdata`](https://github.com/broadinstitute/cpg/tree/main/cpgdata), [`jump-portraits`](https://github.com/broadinstitute/monorepo/tree/main/libs/jump_portrait), [`jump-babel`](https://github.com/broadinstitute/monorepo/tree/main/libs/jump_babel)).
36-
37-
## How to load the data: notebooks and folder structure
38-
39-
This new resource <https://broad.io/jump> include vignettes demonstrating how to work with JUMP data.
40-
41-
See the typical [folder structure](https://github.com/broadinstitute/cellpainting-gallery/blob/main/documentation/data_structure.md) for datasets in the Cell Painting Gallery.
42-
43-
## Citation/license
44-
45-
### Citing the JUMP resource as a whole
46-
47-
All the data is released with CC0 1.0 Universal (CC0 1.0).
48-
Still, professional ethics require that you cite the associated publication.
49-
Please use the following format to cite this resource as a whole:
50-
51-
> _We used the JUMP Cell Painting datasets (Chandrasekaran et al., 2023), available from the Cell Painting Gallery on the Registry of Open Data on AWS ([https://registry.opendata.aws/cellpainting-gallery/](https://registry.opendata.aws/cellpainting-gallery/))._
52-
>
53-
> _Chandrasekaran et al., 2023: doi:10.1101/2023.03.23.534023_
54-
55-
### Citing individual JUMP datasets
56-
57-
To cite individual JUMP Cell Painting datasets, please follow the guidelines in the Cell Painting Gallery citation [guide](https://github.com/broadinstitute/cellpainting-gallery/#citationlicense).
58-
Examples are as follows:
59-
60-
> _We used the dataset cpg0001 (Cimini et al., 2022), available from the Cell Painting Gallery on the Registry of Open Data on AWS (<https://registry.opendata.aws/cellpainting-gallery/>)._
61-
>
62-
> _We used the dataset cpg0000 (Chandrasekaran et al., 2022), available from the Cell Painting Gallery on the Registry of Open Data on AWS (<https://registry.opendata.aws/cellpainting-gallery/>)._
7+
Learn more on the [JUMP Hub](https://broad.io/jump)\!
638

649
## Gratitude
6510

@@ -68,9 +13,3 @@ Thanks to Consortium Partner scientists for creating this data, from Ksilink, Am
6813
Supporting Partners include Ardigen, Google Research, Nomic Bio, PerkinElmer, and Verily. Collaborators include the Pistoia Alliance, Umeå University, and the Stanford Machine Learning Group. The AWS Open Data Sponsorship Program is sponsoring data storage.
6914

7015
This work was funded by a major grant from the Massachusetts Life Sciences Center and the National Institutes of Health through MIRA R35 GM122547 to Anne Carpenter.
71-
72-
## Questions?
73-
74-
Please ask your questions via issues [https://github.com/jump-cellpainting/datasets/issues](https://github.com/jump-cellpainting/dataset/issues).
75-
76-
Keep posted on future data updates by subscribing to our email list, see the button here: <https://jump-cellpainting.broadinstitute.org/more-info>
