Commit 1c2e38f

Update profile_index.csv with integrated profiles (#127)

* Update docs
* Add ALL
* Fix urls
* Update docs
* Clarify profiles
* update URL
* Update README.md
* Update README.md

1 parent 50cd2ab commit 1c2e38f

3 files changed: +12 -18 lines changed

README.md

Lines changed: 8 additions & 16 deletions
@@ -10,18 +10,20 @@ All the data is hosted on the Cell Painting Gallery on the Registry of Open Data
 
 ## Details about the data
 
-Currently, this collection comprises 4 datasets:
+This collection comprises 4 datasets:
 
 - The principal dataset of 116k chemical and >15k genetic perturbations the partners created in tandem (`cpg0016`), split across 12 data-generating centers. Human U2OS osteosarcoma cells are used.
 - 3 pilot datasets created to test: different perturbation conditions (`cpg0000`, including different cell types), staining conditions (`cpg0001`), and microscopes (`cpg0002`).
 
 ### What’s available now
 
-- All data [components](https://github.com/broadinstitute/cellpainting-gallery/blob/main/folder_structure.md) of the three pilots.
+- All data [components](https://github.com/broadinstitute/cellpainting-gallery/blob/main/documentation/data_structure.md) of the three pilots.
 - Most data components (images, raw CellProfiler output, single-cell profiles, aggregated CellProfiler profiles) from 12 sources for the principal dataset. Each source corresponds to a unique data generating center (except `source_7` and `source_13`, which were from the same center).
 - All key [metadata](metadata/README.md) files.
-- A [notebook](https://github.com/jump-cellpainting/datasets/blob/update-readme/sample_notebook.ipynb) to load and inspect the data currently available in the principal dataset.
-- A [tutorial](https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/tutorial_basic.html) to load the different subsets of data in the principal dataset, each available as a single dataframe. The URLs to the subsets are [here](https://github.com/jump-cellpainting/datasets/blob/main/profile_index.csv). The corresponding folders for each contain all the data levels (e.g. this [folder](https://cellpainting-gallery.s3.amazonaws.com/index.html#cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/)). Snakemake workflows for producing these assembled profiles are available [here](https://github.com/broadinstitute/jump-profiling-recipe/releases/tag/v0.1.0).
+- A [notebook](https://github.com/jump-cellpainting/datasets/blob/main/sample_notebook.ipynb) to load and inspect the data currently available in the principal dataset.
+- Different subsets of data in the principal dataset, assembled into single parquet files. The URLs to the subsets are [here](https://github.com/jump-cellpainting/datasets/blob/main/manifests/profile_index.csv). The corresponding folders for each contain all the data levels (e.g. this [folder](https://cellpainting-gallery.s3.amazonaws.com/index.html#cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/)). Snakemake workflows for producing these assembled profiles are available [here](https://github.com/broadinstitute/jump-profiling-recipe/releases/tag/v0.1.0). We recommend working with the the `all` or `all_interpretable` subsets -- they contain all three data modalities in single dataframe. Note that cross-modality matching is still poor (ORF-CRISPR, COMPOUND-CRISPR, COMPOUND-ORF), but within modality generally works well.
+- A [tutorial](https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/1_retrieve_profiles.html) to load these subsets of data.
+- Other [tutorials](https://broad.io/jump) to work with `cpg0016`.
 
 ### What’s coming up
 
@@ -32,19 +34,9 @@ Currently, this collection comprises 4 datasets:
 
 ## How to load the data: notebooks and folder structure
 
-See the [sample notebook](sample_notebook.ipynb) to learn more about how to load the data in the principal dataset.
+This new resource <https://broad.io/jump> include vignettes demonstrating how to work with JUMP data.
 
-To get set up to run the notebook, first install the python dependencies and activate the virtual environment
-
-```bash
-# install pipenv if you don't have it already https://pipenv.pypa.io/en/latest/#install-pipenv-today
-pipenv install
-pipenv shell
-```
-
-See the typical [folder structure](https://github.com/broadinstitute/cellpainting-gallery/blob/main/folder_structure.md) for datasets in the Cell Painting Gallery.
-
-This new resource <https://broad.io/jump> will include vignettes demonstrating how to work with JUMP data. Currently, it contains one [tutorial](https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/tutorial_basic.html) which demonstrates how to load the different subsets of data within `cpg0016`.
+See the typical [folder structure](https://github.com/broadinstitute/cellpainting-gallery/blob/main/documentation/data_structure.md) for datasets in the Cell Painting Gallery.
 
 ## Citation/license
 

manifests/profile_index.csv

Lines changed: 2 additions & 0 deletions
@@ -5,3 +5,5 @@
 "orf_interpretable","https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier.parquet","97b0c31d7d678ca2a5e2353df5799fd8-217"
 "crispr_interpretable","https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_a917fa7/CRISPR/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected/profiles_wellpos_cc_var_mad_outlier.parquet","90b08b824c06bcf16dfc5e788e74f099-135"
 "compound_interpretable","https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_a917fa7/COMPOUND/profiles_var_mad_int_featselect_harmony/profiles_var_mad_int.parquet","b638fa24310db569bc869af92e16f69c-1444"
+"all","https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_0224e0f/ALL/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet","71d03c195e41739af0f1ba64b4f6be73-324"
+"all_interpretable","https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_0224e0f/ALL/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect.parquet","023d74cbf007bb6d837724ac8aa78fb4-324"

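The two rows added to `manifests/profile_index.csv` can be looked up by name once the file is parsed. The following is a minimal sketch, not the repository's own loader: it assumes each data row has three fields (subset name, URL, ETag) in that order, as the rows in this diff suggest, and it skips any row that does not fit that shape (for example a header, whose column names are not visible in this hunk). The function name `load_index` and the constant `SAMPLE_ROWS` are hypothetical.

```python
import csv
import io

# Rows copied verbatim from the two lines added in this commit.
SAMPLE_ROWS = (
    '"all","https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/'
    'source_all/workspace/profiles/jump-profiling-recipe_2024_0224e0f/ALL/'
    'profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/'
    'profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet",'
    '"71d03c195e41739af0f1ba64b4f6be73-324"\n'
    '"all_interpretable","https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/'
    'source_all/workspace/profiles/jump-profiling-recipe_2024_0224e0f/ALL/'
    'profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/'
    'profiles_wellpos_cc_var_mad_outlier_featselect.parquet",'
    '"023d74cbf007bb6d837724ac8aa78fb4-324"\n'
)


def load_index(csv_text):
    """Map subset name -> (url, etag).

    Assumption: data rows have exactly three fields and the second is an
    https URL; anything else (header, blank line) is ignored.
    """
    index = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == 3 and row[1].startswith("https://"):
            name, url, etag = row
            index[name] = (url, etag)
    return index


index = load_index(SAMPLE_ROWS)
url, etag = index["all_interpretable"]
print(url.rsplit("/", 1)[-1])  # file name of the selected parquet
```

With a URL in hand, `pandas.read_parquet(url)` should be able to fetch the file directly, provided a parquet engine such as `pyarrow` is installed. The assembled files are likely large (their ETags indicate hundreds of upload parts), so downloading once and reading from disk may be preferable.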
manifests/src/README.md

Lines changed: 2 additions & 2 deletions
@@ -13,10 +13,10 @@ If necessary, update the associated names for new dataset types.
 After updating a URL, the ETag (provided by S3) will no longer match. To update the ETags, run the following command from the home folder:
 
 ```bash
-bash manifests/src/update_etags.sh | sponge > profile_index.csv
+bash manifests/src/update_etags.sh manifests/profile_index.csv| sponge manifests/profile_index.csv
 ```
 
-Note: If using Nix, all dependencies are already included in the flake at the root folder. Simply run `nix develop` before the above command.
+Note: If using Nix, all dependencies are already included in the flake at the root folder. Simply run `nix develop --extra-experimental-features nix-command --extra-experimental-features flakes` before the above command.
 
 ## Commit changes
 
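The ETags recorded in `profile_index.csv` end in a `-N` suffix. S3 reports ETags in this form for objects uploaded in multiple parts, where `N` is the number of parts, so the value is not a plain MD5 of the file contents. The sketch below shows one way to split such a value and to compare a recorded ETag against a freshly fetched one. It does not reproduce `update_etags.sh`, whose contents are not shown in this commit; the function names `parse_s3_etag` and `etag_is_stale` are hypothetical.

```python
def parse_s3_etag(etag):
    """Split an S3 ETag into (hex digest, number of upload parts).

    ETags without a "-N" suffix are treated as single-part uploads.
    Surrounding double quotes, as returned in HTTP headers, are stripped.
    """
    digest, sep, parts = etag.strip('"').partition("-")
    return digest, (int(parts) if sep else 1)


def etag_is_stale(recorded, current):
    """Return True when the recorded ETag no longer matches the current one."""
    return recorded.strip('"') != current.strip('"')


# ETag of the "all" subset, copied from the manifest row added in this commit.
digest, n_parts = parse_s3_etag("71d03c195e41739af0f1ba64b4f6be73-324")
print(digest, n_parts)
```

In practice the current value would come from the `ETag` header of an HTTP `HEAD` request to the object's URL. A mismatch means the manifest row needs regenerating with the command shown in the diff.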