Update README; add full description and example to prediction script headings

joshhjacobson · joshhjacobson · commit 168bf134ed0d · 2023-06-28T15:01:58.000+10:00
diff --git a/03_prediction/cokriging.py b/03_prediction/cokriging.py
@@ -1,5 +1,19 @@
-# Produce cokriging predictions at all land-based 0.05-degree grid cells in North American
-# domain for the specified month
+"""
+Produce cokriging predictions at all land-based 0.05-degree grid cells in the North American
+domain for a specified month.
+
+This script should be run from the command line as in the following example:
+```
+conda activate cosif
+cd 03_prediction
+python cokriging.py 202107
+```
+where the string `202107` indicates that predictions and prediction standard errors 
+will be produced for July 2021.
+
+NOTE: This is a long-running process that can take several hours to one day of compute
+time on a 64-core server.
+"""
 
 import sys
 
diff --git a/03_prediction/kriging.py b/03_prediction/kriging.py
@@ -1,5 +1,20 @@
-# Produce kriging predictions at all land-based 0.05-degree grid cells in North American
-# domain for the specified month
+"""
+Produce kriging predictions at all land-based 0.05-degree grid cells in the North American
+domain for a specified month.
+
+This script should be run from the command line as in the following example:
+```
+conda activate cosif
+cd 03_prediction
+python kriging.py 202107
+```
+where the string `202107` indicates that predictions and prediction standard errors 
+will be produced for July 2021.
+
+NOTE: This is a long-running process that can take several hours to one day of compute
+time on a 64-core server.
+"""
+
 
 import sys
 
diff --git a/README.md b/README.md
@@ -4,15 +4,13 @@
 
 This repository contains code to reproduce the results in the paper:
 
-> Jacobson, J., Cressie, N., Zammit-Mangion, A. (n.d.) Spatial statistical prediction of solar-induced chlorophyll fluorescence (SIF) from multivariate OCO-2 data. Under review in *Remote Sensing*.
+> Jacobson, J., Cressie, N., Zammit-Mangion, A. (n.d.) Spatial statistical prediction of solar-induced chlorophyll fluorescence (SIF) from multivariate OCO-2 data. Submitted to *Remote Sensing*.
 
 Unless stated otherwise, all commands are to be run in the root directory of the repository.
 
-The resulting *coSIF* data product for February, April, July, and October 2021 are available here: https://doi.org/10.5281/zenodo.8078592
+The resulting *coSIF* data product for February, April, July, and October 2021 is available at: https://doi.org/10.5281/zenodo.8078592
 
-A supplementary dataset of all fitted model parameters is available here: https://doi.org/10.5281/zenodo.8078560
-
-TODO: Add graphical abstract here.
+A supplementary dataset of all fitted model parameters is available at: https://doi.org/10.5281/zenodo.8078560
 
 ## Installation and setup
 
@@ -62,23 +60,23 @@ The Terra and Aqua combined Moderate Resolution Imaging Spectroradiometer (MODIS
 
 In an initial exploratory data analysis (EDA) step, we create a bivariate time series (Figure 1) from monthly, gridded SIF and XCO2 data. This analysis is isolated in the directory `00_eda`. Note that all of the version 10r Lite files are needed for this step (see above).
 
-There are four main steps in our multivariate spatial-statistical-prediction framework, corresponding to  four numbered directories. These are: 
+There are four main steps in our multivariate spatial-statistical-prediction framework, corresponding to four numbered directories. These are: 
 
 1. `01_data_preparation`: Numbered files are to be run in order. Notebooks create the land-cover binary mask; collect and format all daily OCO-2 Lite files into a single NetCDF file for daily, spatially irregular SIF and a single NetCDF file for daily, spatially irregular XCO2; group SIF and XCO2 datasets by month and compute an average for each 0.05-degree CMG grid cell; an R script evaluates bisquare basis functions for all CMG grid cells; a final notebook combines gridded SIF, XCO2, and basis-function datasets into a single NetCDF file. 
-2. `02_modeling`: For each of February, April, July, and October 2021, notebooks compute empirical (cross-) semivariograms from the gridded SIF and XCO2 data (one month later), and fit modeled (cross-) semivariograms.
-3. `03_prediction`: Scripts for producing the coSIF data product in a specified month. For example, if using cokriging, run
+2. `02_modeling`: For each of February, April, July, and October 2021, notebooks compute empirical (cross-) semivariograms from the gridded SIF and XCO2 data (one month later), and fit modeled (cross-) semivariograms. Each notebook takes around 10 minutes to run, and can be run on a laptop.
+3. `03_prediction`: For each of February, April, July, and October 2021, the predictions and prediction standard errors required for the coSIF data product are produced by running either `cokriging.py` or `kriging.py` from the command line with the year-month string as an argument. For example, to run cokriging for July 2021, use
     ```
     conda activate cosif
     cd 03_prediction
     python cokriging.py 202107
     ```
-    or, if using kriging, run
+    Or, to run kriging for October 2021, use
     ```
     conda activate cosif
     cd 03_prediction
-    python kriging.py 202102
+    python kriging.py 202110
     ```
-    Note that these are long-running processes; it is advised that they be executed in a [screen session](https://linuxize.com/post/how-to-use-linux-screen/) to avoid issues with interruption.
-4. `04_validation`: For each of February, April, July, and October 2021, one notebook produces validation predictions for the Corn Belt validation block (b1) and one notebook produces validation predictions for the Cropland validation block (b2). Metrics and scores used to summarize the validation predictions are then collected in `collect_validation_results.ipynb`.
+    Note that these are long-running processes that can take several hours to one day of compute time on a 64-core server. It is advised that they be executed in a [screen session](https://linuxize.com/post/how-to-use-linux-screen/) to avoid issues with interruption. Once the predictions and prediction standard errors have been produced for each month, the coSIF data product is collected and formatted in `collect_coSIF_datasets.ipynb`.
+4. `04_validation`: For each of February, April, July, and October 2021, one notebook produces validation predictions for the Corn Belt validation block (b1) and one notebook produces validation predictions for the Cropland validation block (b2). Each notebook can take around 30 minutes to run on a 64-core server. Metrics used to summarize the validation predictions are then collected in `collect_validation_results.ipynb`.
 
 NOTE: ensure that all notebooks are run using the `cosif` conda environment.
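
The docstrings and README above describe a year-month argument such as `202107` (July 2021). This diff does not show how `cokriging.py` and `kriging.py` read that argument, so the following is only a minimal sketch of one way such a string could be validated and split into a year and a month. The helper name `parse_year_month` is hypothetical and is not taken from the repository.

```python
from datetime import datetime
from typing import Tuple


def parse_year_month(arg: str) -> Tuple[int, int]:
    """Parse a 'YYYYMM' string such as '202107' into (year, month).

    Hypothetical helper for illustration; the repository's scripts may
    handle their command-line argument differently.
    """
    # Require exactly six digits so that inputs like '20217' are rejected
    # instead of being interpreted as a single-digit month.
    if len(arg) != 6 or not arg.isdigit():
        raise ValueError(f"expected a year-month string like '202107', got {arg!r}")
    # strptime raises ValueError for impossible months such as '202113'.
    parsed = datetime.strptime(arg, "%Y%m")
    return parsed.year, parsed.month
```

In a script, this could be called as `year, month = parse_year_month(sys.argv[1])` before any long-running computation starts, so that a mistyped argument fails immediately rather than hours into a run.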