Provide instructions on how to run the JUMP-specific profiling recipe

We should provide instructions on how to run the JUMP-specific profiling recipe. This might need to be done elsewhere, not in this repo.

It's possible that the section "Instructions provided to JUMP partners" below already has all the information we need, but it still needs to be written up somewhere.

We can then have someone test-drive the instructions. The goal will be to recreate everything downstream of [Chapter 5.3](https://cytomining.github.io/profiling-handbook/05-create-profiles.html#create-database-backend) in the profiling handbook for a Cell Painting Gallery dataset (e.g., one plate of `cpg0012`).

Some notes on our past discussions are below.

---

**Shantanu Singh**
  [2 months ago](https://broadinstitute.slack.com/archives/C01AF25CQLT/p1666224515729069?thread_ts=1666197636.830589&cid=C01AF25CQLT)
Thanks for clarifying. I hadn’t looked carefully at the difference between the JUMP [instructions](https://github.com/jump-cellpainting/develop-computational-pipeline/issues/52#issue-1026707736) (Step 3 onwards; copied below at the end of this issue comment) and the recipe [README](https://github.com/cytomining/profiling-recipe#readme).
So it looks like the only differences are:
1. JUMP instructions specify which commit of the recipe to use, but the recipe README does not (in fact, even if we wanted to specify it, the right place to do so would be the profiling-template README, right?).
2. JUMP instructions specify what changes to make to the config.yml, but the recipe README only says that changes to config.yml can be made (“All the necessary changes to the config file must be made before the pipeline can be run.”).

Neither is a difference in the workflow per se: the first specifies which commit to use, the second specifies which config to use.
Is that correct? If so, we are all set there.

Two more questions:
1. Is it correct that the recipe, in its current form, does not attempt to do anything upstream of annotate? It’s pretty clear in the [README](https://github.com/cytomining/profiling-recipe#downloading-the-data) (“Downloading the data”), but I wanted to double-check.
2. Both the recipe [README](https://github.com/cytomining/profiling-recipe/blob/master/README.md) and the [handbook](https://cytomining.github.io/profiling-handbook/05-create-profiles.html#make-profiles) specify step-by-step instructions for running the recipe; would it be sensible to have the instructions in only one of the two locations? If so, where should they live? I think the handbook lends itself more naturally.

**Niranj Chandrasekaran**
  [2 months ago](https://broadinstitute.slack.com/archives/C01AF25CQLT/p1666267827475639?thread_ts=1666197636.830589&cid=C01AF25CQLT)
> _JUMP instructions specify which commit of the recipe to use, but the recipe README does not specify it (in fact, even if we wanted to do so, the right place to do it would be in the profiling-template README)._

That’s right. Currently the instructions say that we add the recipe as a submodule. We should just add another line to check out a particular commit if we want everyone to use a specific version of the recipe.

> _JUMP instructions specify what changes to make to the config.yml, but the recipe README only says that changes to config.yml can be made (“All the necessary changes to the config file must be made before the pipeline can be run.”)_

I guess the instructions will be dataset/project specific. Perhaps a more general version of the JUMP instructions can be added to the recipe README as recommended changes to config.yml.

>  _is it correct that the recipe – in its current form – does not attempt to do anything upstream of annotate? It’s pretty clear in the [README](https://github.com/cytomining/profiling-recipe#downloading-the-data) (“Downloading the data”) but I wanted to doublecheck._

The recipe can aggregate, given a `.sqlite` file, but it doesn’t do so in parallel, which we may want for large projects. For small projects with only a few plates, the recipe can be used for aggregation (for example, https://github.com/jump-cellpainting/pilot-cpjump1-fov-data).

> _Both the recipe [README](https://github.com/cytomining/profiling-recipe/blob/master/README.md) and the [handbook](https://cytomining.github.io/profiling-handbook/05-create-profiles.html#make-profiles) specify step-by-step instructions for running the recipe; would it be sensible to have the instructions in only one of the two locations? If so, where should they live? I think the handbook lends itself more naturally._

I initially wanted the handbook to be the go-to location for running the recipe, but a lot of the recipe documentation didn’t fit well in the handbook, so I started writing the README. I also think that the handbook should be the location for the step-by-step instructions, and the README can remain the place for additional information about the recipe.




---

## Instructions provided to JUMP partners

(Copied from https://github.com/jump-cellpainting/develop-computational-pipeline/issues/52#issue-1026707736)


### Step 1: Image to single cell csv
Using the [pipelines](https://github.com/broadinstitute/imaging-platform-pipelines/tree/master/JUMP_production#1-feature-extraction-pipelines) and the instructions up to step 5.2 of the [profiling handbook](https://cytomining.github.io/profiling-handbook/), generate the single-cell CSV files.

### Step 2: Single cell csv to well level aggregated profiles

In step 5.3, before running `collate_cmd.py`, check out the commit that contains the updated `collate.py` code. In the first code block, after `cd pycytominer`, run:

```bash
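# Update the local clone so the jump branch and the pinned commit are available
git pull
# Switch to the jump branch, then pin the exact commit (detached HEAD)
git checkout jump
git checkout b4d32d39534c949ad5165f0b98b79537c2a7ca58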
```

Notes:

1. When running `collate_cmd.py`, use the flag `--image-feature-categories="Intensity,ImageQuality,Granularity,Texture,Count,Threshold"`.
2. If you have previously run `collate_cmd.py`, please rerun it so that the whole-image features in the `.sqlite` file are added to the aggregated profiles. When rerunning:
   - use the `--image-feature-categories` flag mentioned in note 1;
   - use the `--aggregate-only` flag to skip re-creating the `.sqlite` files;
   - alternatively, use the `--overwrite` flag (without `--aggregate-only`) if you do want to recreate the `.sqlite` files. This is typically unnecessary unless something went wrong when the `.sqlite` files were created.

Note: The above instructions were updated after the discussions [here](https://broadinstitute.slack.com/archives/C033SCB245P/p1666126543636439?thread_ts=1666125021.122589&cid=C033SCB245P) (Broad internal Slack), [here](https://github.com/jump-cellpainting/aws/issues/71#issuecomment-1317550239), and [here](https://github.com/jump-cellpainting/aws/issues/71).
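
As a rough illustration of the notes above, the flag combinations can be captured in a small shell helper. This is a sketch only: `collate_flags` and its mode names (`first-run`, `aggregate-only`, `overwrite`) are hypothetical and not part of pycytominer; only the three flags themselves come from the notes.

```bash
#!/usr/bin/env bash
# Sketch: print the collate_cmd.py flags for a given situation.
# collate_flags and the mode names are hypothetical, for illustration only.
collate_flags() {
  local mode="$1"
  local categories="Intensity,ImageQuality,Granularity,Texture,Count,Threshold"
  # The image feature categories flag is used in every case (note 1).
  local flags="--image-feature-categories=${categories}"
  case "$mode" in
    first-run)      ;;                                    # collate_cmd.py was never run
    aggregate-only) flags="${flags} --aggregate-only" ;;  # reuse existing .sqlite files
    overwrite)      flags="${flags} --overwrite" ;;       # recreate the .sqlite files
    *) echo "unknown mode: ${mode}" >&2; return 1 ;;
  esac
  printf '%s\n' "$flags"
}
```

For example, the output of `collate_flags aggregate-only` could be appended to the `collate_cmd.py` command from step 5.3 of the handbook when rerunning on existing `.sqlite` files.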

### Step 3: Aggregated profiles to annotated, normalized, feature selected profiles
After running `collate_cmd.py`, switch over to the [instructions in the profiling-recipe repo](https://github.com/cytomining/profiling-recipe/tree/745d7627213acd9d376172e5ac716a5d4c07fbec#readme). These instructions are similar to the ones in the workflow demo but with additional details.

Before [running the profiling pipeline](https://github.com/cytomining/profiling-recipe/tree/745d7627213acd9d376172e5ac716a5d4c07fbec#running-the-pipeline), issue the following commands to make sure everyone uses the same version of the profiling-recipe:

```bash
cd profiling-recipe
git pull
git checkout 745d7627213acd9d376172e5ac716a5d4c07fbec
cd ~/work/projects/${PROJECT_NAME}/workspace/software/${DATA}/
```

Note: we had previously specified using `3584ceca79e83065c72a7acb021d360026ace2a2`. This still works. However, we now specify using `745d7627213acd9d376172e5ac716a5d4c07fbec` (because we are now able to specify the MAD Robustize fudge factor in `pycytominer`).
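
To check that a checkout ended up on the intended version, something like the following could be run from the directory that contains `profiling-recipe`. It is a minimal sketch; `check_pinned_commit` is a hypothetical helper, not something the recipe provides.

```bash
#!/usr/bin/env bash
# Sketch: confirm that a git checkout is at the expected commit.
# check_pinned_commit is a hypothetical helper name.
# usage: check_pinned_commit REPO_DIR EXPECTED_COMMIT
check_pinned_commit() {
  local repo_dir="$1" expected="$2" actual
  # Resolve the commit currently checked out in repo_dir.
  actual="$(git -C "$repo_dir" rev-parse HEAD)" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: ${repo_dir} is at ${expected}"
  else
    echo "MISMATCH: ${repo_dir} is at ${actual}, expected ${expected}" >&2
    return 1
  fi
}
```

For instance, `check_pinned_commit profiling-recipe 745d7627213acd9d376172e5ac716a5d4c07fbec` returns a non-zero status if the recipe is on any other commit, which makes it easy to call at the top of a run script.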

Then the following changes should be made to the `config.yml` for generating the profiles.

1. Give the pipeline a name.
2. Aggregation: Set `perform` under `aggregate` to `false`, as aggregation will be performed while running `collate_cmd.py`.
3. Annotation: Provide the name of the external metadata file, if it exists.
    - If you do have an `external_metadata.csv`, set `perform` under `external` to `true` and specify the name of the external metadata file.
    - If you do not have an external metadata file because all the metadata is already included in your platemap files, set `perform` under `external` to `false`.
4. In the `platemap.txt` file, use the JCP ID as the perturbation identifier. Name this column `jump-identifier`. If `perform` under `external` is set to `true`, make sure to set `merge_column` under `annotate` to `jump-identifier`.
5. Normalization and feature selection: Since the code needs to know which wells contain controls, add two columns to your `platemap.txt` file:
    - `pert_type`, which should say `trt` for treatment wells and `control` for control wells;
    - `control_type`, which should be left empty for treatment wells and say `negcon` for DMSO wells and `poscon` for positive control wells.
6. Provide batch names and plate names.
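
The two control columns described in the normalization and feature selection item above could be added with a short script along these lines. This is a sketch under several assumptions: the platemap is tab-delimited with a header row, the `jump-identifier` column already exists, and the control identifiers are supplied by the user. `add_control_columns` is a hypothetical helper and its arguments are placeholders, not real JCP IDs.

```bash
#!/usr/bin/env bash
# Sketch: append pert_type and control_type columns to a tab-delimited platemap.
# add_control_columns and its argument order are hypothetical.
# usage: add_control_columns PLATEMAP NEGCON_ID [POSCON_ID ...]
add_control_columns() {
  local platemap="$1" negcon="$2"
  shift 2
  local poscons="$*"   # remaining arguments: positive control identifiers
  awk -F'\t' -v OFS='\t' -v negcon="$negcon" -v poscons="$poscons" '
    BEGIN { n = split(poscons, ids, " "); for (i = 1; i <= n; i++) is_pos[ids[i]] = 1 }
    NR == 1 {
      # Locate the identifier column by its header name.
      for (i = 1; i <= NF; i++) if ($i == "jump-identifier") col = i
      if (!col) { print "no jump-identifier column" > "/dev/stderr"; exit 1 }
      print $0, "pert_type", "control_type"
      next
    }
    $col == negcon   { print $0, "control", "negcon"; next }
    ($col in is_pos) { print $0, "control", "poscon"; next }
                     { print $0, "trt", "" }   # treatment wells: control_type left empty
  ' "$platemap"
}
```

The function writes to standard output, so the result can be redirected to a new file and inspected before it replaces the original `platemap.txt`.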

General instructions:
- To keep the config files easy to read, it is ok to have a different config file for each batch.
- The metadata and plate map files for Target-2 plates are available at https://github.com/jump-cellpainting/JUMP-Target
- You may want to have (though not strictly necessary) a different config file for the Target-2 plates and your assay plates, within each batch.
