section_8_tools ReadMe

########OVERVIEW########

The section_8_tools calculate statistics, create derivative products and produce graphs from
downscaled output produced by FUDGE. All of the tools are designed to be called from the command
line with text string arguments, and output either netCDF files with CF-compliant metadata or
.svg image files.

########REQUIREMENTS########

R packages needed: 
- ncdf4, ncdf4.helpers
- caTools (for statistics)
- PCICt (for noleap timeseries)
- RColorBrewer (color palettes)
- getopt, optparse (flag input parsing)

System utils needed: 
- NetCDF (nco utils)
- R 2.15 or higher (Everything works in 3.2.1)

########USING THE SECTION 8 TOOLS########

##Obtaining files##
The section 8 tools can be obtained in one of two ways: by cloning the git repo, or by copying them
from /home/cew/Code/section_8_tools/ or from a flash drive/zipped folder containing the files. 
To obtain the files using the git repo, assuming that you already have an ssh key for your profile 
(https://docs.google.com/a/noaa.gov/document/d/1moPhPAHUlGv9CfZx_rjNjUEZCr98KVRvZpG9qTBcb6E/edit), 
simply navigate to the directory into which you want to place the section 8 tools, and type:

>git clone git@gitlab.gfdl.noaa.gov:carolyn.whitlock/section_8_tools.git
or
>git clone git@gitlab.gfdl.noaa.gov:carolyn.whitlock/section_8_tools.git directory_of_your_choice
where directory_of_your_choice is the directory into which you want to clone the section_8_tools.

##User settings##
In either case, you will need to adjust the reference paths located in the setenv_s8 file so that
they point to the current location of the section_8_tools folder on your computer. 

Once the file is edited, source the setenv_s8 file: 
> source setenv_s8

This sets the S8_PATH variable, which the tools in section 8 use to reference the directory
in which the scripts are located. Below is a brief summary of the section 8 tools currently 
available. More documentation on each can be found in the method-specific ReadMe files. 

Please note that two tools, AvgEns.R and TrimTimeAxis.R, rely on a temporary directory to 
carry out operations. Either the environment variable S8TMPDIR needs to be set, or the
--tmpdir command-line argument needs to be set. S8TMPDIR is set by default on the analysis
nodes to $TMPDIR; on the workstations, sourcing setenv_s8 gives a warning if S8TMPDIR
is not already set.
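
For example, on a workstation a minimal c-shell setup might look like the following (the scratch
path is only a placeholder; substitute any writable temporary directory):

# Hypothetical scratch location; only needed if S8TMPDIR is not already set
setenv S8TMPDIR /vftmp/$USER/scratch
# Sets S8_PATH (and warns if S8TMPDIR is still unset)
source setenv_s8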

##Starting to use the tools##
There is a sample script called sample_c_shell_script.csh located in the section_8_tools directory
that shows a simple example of calling the tools from a c-shell script. For more information, please
consult the method specific documentation or talk to Carolyn Whitlock.

########OTHER DOCUMENTATION########

For more documentation (and method-specific documentation), please consult the following: 

-- The ESD team webpage (for general information)
-- The file var_name_table.ods for information about how variables and metadata are
translated from one application to the next (useful for chaining output together)
-- The file Argument_list, for a list of arguments that each of the section 8 tools
take
-- The file sample_c_shell_script.csh for sample invocations of the tools from the
command line
-- Any file in this directory marked as section_8_toolname.txt (such as MakeFreqHist.txt)
-- Using the --help flag on a section 8 tool will give the arguments that it uses, and a 
brief description of each one (see the example below).
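
For example, once setenv_s8 has been sourced (so that S8_PATH points at the section_8_tools
directory), the help text for any of the R tools can be printed directly; shown here for
CalcTimeStats1.R, but the same pattern applies to the other R scripts:

Rscript $S8_PATH/CalcTimeStats1.R --help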

Below is a general overview of the tools currently included for section 8:

##########SECTION 8 TOOLS##########

Statistics Calculation tools: 
  CalcTimeStats1.R
  CalcTimeStats2.R
  CalcTimeCorrel.R
  CalcAreaStats1.R
  CalcRRankOrder.R (UPDATED 4-27-16)
  CalcKSKuiper.R   (UPDATED 4-27-16)
  CalcTimeDiffStats1.R
  CalcDTR.R
  CalcDailyMeanTemp.R
  CalcTimeMeanSeries.R 
  
Derivative product tools:
  CalcClimdex.R 
  CalcCornDegDays.R 
  MakeSmoothClimo.R
  ExceedanceTester.R
  CSVBoxStatsScraper.py 
  ThresholdRelativeCounter.R (UPDATED 3-10-16)
  HeatThreshold.R (NEW 1-17-15)
  AvgEns.R (NEW 4-27-16; UPDATED 4-29-16)
  MakeAnyMask.R (NEW 4-27-16)
  CalcSimpleDegDay.R (NEW 7-27-16)
   
Data manipulation tools: 
  CalcDiff.R
  CalcAbsDiff.R
  AddAnom.R
  EditSpatialMask.R
  DetectMissvals.R
  ApplyPrecipThreshold.R 
  TrimTimeAxis.R
  ApplyAnyMask.R
  MakeDailyClimo.R (UPDATED 6-16-16)
  ExtendClimatology.R (NEW 6-16-16)
  MakeAnom.R (NEW 6-16-16)
  make_anom_and_climo.py  
  GroupBySeason.R (NEW 8-23-16)
  
Graphics tools: 
  MakeFreqHist.R
  MakeBoxplot.R

======STATISTICS CALCULATION TOOLS======


===CalcTimeStats1.R===

Given a file, computes simple summary statistics over the area of the file using either all
time axis and ensemble values or the values present in a compatible mask file passed in as an argument. 
Quantiles are calculated using the eighth method proposed in Hyndman et al. 1996
(Hyndman, R.J.; Fan, Y. (November 1996). "Sample Quantiles in Statistical Packages". 
American Statistician 50 (4): 361–365. doi:10.2307/2684934)

It differs from CalcTimeStats2 in the way that the time dimension for masked files is computed;
since CalcTimeStats1 calculates a climatological time series (either annual or monthly), it sets
the values of the time dimension at the midpoints of the mask (or the midpoint of a 365-day
timeseries), and calculates the bounds as the limits of the mask. This means that the statistics
computed can be readily weighted for reconstructing an original time series.

Preferred use case: Either no mask, seasonal mask, or monthly masks without overlap, covering the entire
time series, in a Julian or noleap calendar; any work for which having an output time series
rather than an index would be important.

Input: Rscript CalcTimeStats1.R -i input.nc -v var -O output [--tmasks tmasks] [--calc_both] [--verbose] [-h --help]

Examples:
Rscript /home/cew/Code/section_8_tools//CalcTimeStats1.R --input /work/cew/testing//diff_file.nc --var tasmaxDiff 
--out_base /work/cew/testing//stat_file
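
A hypothetical invocation with a monthly time mask and --calc_both (the mask path is the monthly
mask used elsewhere in this ReadMe; the output base is a placeholder):
Rscript /home/cew/Code/section_8_tools//CalcTimeStats1.R --input /work/cew/testing//diff_file.nc --var tasmaxDiff 
--out_base /work/cew/testing//stat_file_masked --tmasks /archive/esd/PROJECTS/DOWNSCALING/MASKS/timemasks/maskdays_bymonth_20860101-20951231.nc --calc_both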

Outputs: If no time mask is provided, writes a file of the form output_ann.nc, with spatial coordinates cloned
from the input file and time coordinates cloned from a reference file starting at year 0. If a time mask is 
provided, writes a file of the form output_freq.nc, where $freq is the frequency of the time mask. If the --calc_both
flag is used with the tmask, both outputs are calculated.

Features:
- update 4-3 to include skewness and kurtosis
- update 4-7 for command line flag parsing
- update 5-18 for --out_base and -O input rather than --output
- update 6-18 to use method 8 for calculating quantiles rather than method 7 (the default)
- Update 6-22 to point to CalcTimeStats2 when input masks are of an incorrect frequency
- Update 6-25 to add --calc_both option and turn off mandatory calculation of 0mask output
- Update 9-18 to add a seasonal climatological timeseries and change name of modulo attributes
- Update 12-14 to add a new output variable, sample_size, with number of t, ens points used to calculate
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info
- Update 4-25 to include general handling of data without a time axis, where appropriate

Known issues:
- Currently only capable of using a time mask with 12 steps (i.e. monthly)
--- fixed in the 9-18 update; checks for monthly, annual or seasonal mask
- Obtains frequency of time masks from the time mask names (i.e. _bymonth_, _byseason_)
-- because of this, will accept _pm2weeks_ masks as a valid argument and return data with a time series that is not accurate 
(improper bounds dimensions on climatological time)
-- Fixed as of 9-18, checks for the presence of the "olap.nc" string in the filename, and exits if found; looks like it was
   fixed earlier but not recorded

===CalcTimeStats2.R===

Given a file, computes simple summary statistics over the area of the file using either all
time axis and ensemble values or the values present in any mask file passed in as an argument. 
Quantiles are calculated using the eighth method proposed in Hyndman et al. 1996
(Hyndman, R.J.; Fan, Y. (November 1996). "Sample Quantiles in Statistical Packages". 
American Statistician 50 (4): 361–365. doi:10.2307/2684934)

This was written to address issues with the assumption made by CalcTimeStats1.R that all time 
masks will have the same values over an annual period, and thus can use a climatological time 
series. Thus, CalcTimeStats2 creates an index for the time dimension based on the number of 
masks passed in (1 in the case that no mask is passed in), and sets its time bounds as 0.5, 
1.5, etc. This passes over to the construction of the filenames as well - 0mask refers to no 
mask being passed in, while 1mask, 2mask, 12mask, would refer to passing in 1, 2, or 12 masks 
covering any part of the time series.

Writes the same number of output ensembles as the input data.

Preferred use case: Any mask that does not cover the entire time series without overlap, any mask that
is not of monthly frequency.

Input: Rscript CalcTimeStats2.R --input input.nc --var var [--tmasks time_mask.nc] --out_base output [--calc_both]

Examples:
Rscript /home/cew/Code/section_8_tools//CalcTimeStats2.R -i /work/cew/testing//diff_file.nc -v tasmaxDiff 
-O /work/cew/testing//stat_file
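
A hypothetical invocation with a seasonal time mask (the mask path is the seasonal mask used
elsewhere in this ReadMe; the output base is a placeholder):
Rscript /home/cew/Code/section_8_tools//CalcTimeStats2.R -i /work/cew/testing//diff_file.nc -v tasmaxDiff 
-O /work/cew/testing//stat_file_masked --tmasks /archive/esd/PROJECTS/DOWNSCALING/MASKS/timemasks/maskdays_byseason_20860101-20951231.nc --calc_both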

Outputs: If no time mask is provided, writes a file of the form output_0.nc, with spatial 
coordinates cloned from the input file and no time coordinates added. If a time mask is provided, writes a file of the form 
output_Nmasks.nc, where $Nmasks is the number of masks used in the time mask file. If the --calc_both
option is used in conjunction with the time masks, returns both the masked and unmasked output.

Features:
- update 4-7 for command line flag parsing
- update 5-18 for --out_base and -O input rather than --output
- Update 6-18 to use method 8 for calculating quantiles rather than method 7
- Update 6-25 to add --calc_both option and turn off mandatory calculation of 0mask output
- Update 12-14 to add a new output variable, sample_size, with number of t, ens points used to calculate
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info
- Update 4-19 to handle stats output for data not containing a time axis

Known issues:


===CalcTimeCorrel.R===

Given two input files and two input variables, calculates the correlation along 
the timeseries and ensemble dimensions at each x,y point with the Pearson and 
Spearman methods (i.e., it pools ensemble and time dimensions at each x,y point). 

Input: Rscript CalcTimeCorrel.R -i "indir/file1.nc, indir/file2.nc" -v "var1,var2" -o out_file.nc

Examples:
Rscript /home/cew/Code/section_8_tools//CalcTimeCorrel.R -i "/net2/kd/PROJECTS/DOWNSCALING/SEC8/sampledata//tasmax_day_PMtxp1-BCQM-A01r13X01K00_sst2090_r3i1p1_US48_20860101-20951231.nc, /work/cew/testing//diff_file.nc" -v "tasmax, tasmaxDiff" 
-o /work/cew/testing//clim_file.nc

Outputs: A file out_file.nc, containing an annual climatological series with x and y coordinates cloned
from the input files, with variables of the general form $var1Cor$method or $var1_$var2Cor$method 
(i.e. tasmaxCorPearson, tasmax_tasminCorPearson)

Features:
- Updated 4-7 to include support for command-line flags
- Updated 4-19 to include support for datasets without a time axis (and presumably, more than one ensemble member)

Known issues: Kendall method dropped due to taking three orders of magnitude
longer than other methods on US48 files

===CalcAreaStats1.R===

Given a single input file and variable, calculates the min and max across the area of the
data for each ensemble member and at each time level, and calculates area statistics using a weighting
specified as an argument (one of 'cosine' or 'none').

Input: Rscript CalcAreaStats1.R indir/file1.nc var1 out_file.nc
       Rscript CalcAreaStats1.R -i indir/file1.nc -v var1 -o out_file.nc --spatmask maskdir/spatial_mask.nc --area_weight 'cosine'

Examples: Rscript /home/cew/Code/section_8_tools//CalcAreaStats1.R -i /net2/kd/PROJECTS/DOWNSCALING/SEC8/sampledata//tasmax_day_PMtxp1-BCQM-A01r13X01K00_sst2090_r3i1p1_US48_20860101-20951231.nc -v tasmax -o /work/cew/testing/area_stats_out_file.nc --area_weight 'none'

Outputs: A file out_file.nc, containing a time series and ensemble dimension cloned from the input file, 
with variables of the general form $var1_AreaStat (i.e. tasmax_AreaMin)

Features:

Known issues: 
- Mask weighting of input data not yet implemented; currently throws an error and exits.
- Output parsing plus the file redirect leads to incorrect status returned when job fails (should be 0, returns 1)
--- fixed 4-7-15

Updates: 
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info
- Update 4-19 to include support for data not containing a time axis

===CalcRRankOrder.R===

Given two input files and two input variables, calculates the Robust Rank Order statistic (Lanzante 1996) along the
time axis for each x,y point, averaging across the ensemble members present, to determine whether the two datasets
have significantly different medians. If two time masks are provided, also computes the robust rank order 
statistic for the corresponding masked data under each time mask applied, assuming that
both time mask files have the same number of masks present in each.

If more than one ensemble of input data is present, then calculations will either
be carried out for each individual ensemble or all ensembles will be pooled along
the time axis before submitting to the Fortran code.

Input: Rscript CalcRRankOrder.R -i 'indir/file1.nc, indir/file2.nc' -v 'var1,var2' -O out_file [--tmasks tmask_1.nc[,tmask_2.nc] ] [--spatmask spatmask(s)] [--calc_both] [--no_lag_correlate] [--pool_ens] [--verbose] [-h --help]

Examples: Rscript /home/cew/Code/section_8_tools//CalcRRankOrder.R -i '/work/cew/testing//diff_file.nc, /work/cew/testing//abs_diff_file.nc' 
-v 'tasmaxDiff,tasmaxAbsDiff' --out_base /work/cew/testing//r_rankord_file 
--tmasks '/archive/esd/PROJECTS/DOWNSCALING/MASKS/timemasks/maskdays_bymonth_20860101-20951231.nc, /archive/esd/PROJECTS/DOWNSCALING/MASKS/timemasks/maskdays_bymonth_20860101-20951231.nc'

Outputs: A file out_file_maskN.nc, where N is the number of masks present in the mask file, with each 
level of the time series corresponding to the prob and zval at each x,y point for every point for which the time 
mask was true. If calculations were carried out for individual ensembles, the output will have the same number 
of ensemble members as the input; if not (--pool_ens), there will be no ensemble data. The file contains two 
variables, p_rro and z_rro: p_rro is the probability (ranging from 0 to 1) that represents the significance level 
for testing the null hypothesis that the two samples have equal medians, and z_rro is the normally-distributed 
z-statistic calculated along the time series at a given x,y point. Positive values of the z-statistic indicate 
that the first sample has a larger median; negative values indicate that the second sample's median was larger.

Features: 
- If only one var is provided, assumes that the var is present in each file and searches accordingly.
- If the --calc_both option is set and a mask is provided, both the masked and unmasked
datasets are calculated.
- If the --pool_ens option is set, then ensemble members will be pooled lengthwise along
the time axis. The effect on lag correlation should be minimal (when no time masks are used) to
nonexistent (when non-overlapping time masks are used).

Further documentation: The Readme for the Fortran code is located at src/Readme_rnkord and has been updated as
of 8-17-15.

Known issues: 
-- Calculation with time masks takes significantly longer than other functions
-- Does not print statuses from the Fortran code to standard out
-- Unresolved issues associated with standard out and starting a new process in R
-- Behavior around errors needs to be discussed
---- If the distributions in the input files do not overlap each other at all, the Fortran code returns 
infinitesimally small values instead of 0. Since this behavior can pop up fairly frequently for small 
time masks in the southern parts of the continental US48 grid during summer (and indeed, 
did so in the example used to test the code), this behavior is now checked for, and occurrences are tallied
after pointing to the location of a more detailed error message. The warnings are not fatal (output should
still be produced), and can be safely ignored.


Updates:
- update 5-18 to use the --out_base -O output flag
- update 6-24 to turn off mandatory calculation of the 0mask output
- update 6-25 to use the --calc_both flag and create the --spatmask option
- update 7-2 to make non-overlapping distribution behavior more clear
- Update 7-7 to correct behavior of cloning lon and lat bnds
- Update 8-17 to add lag correlation and --no_lag_correlate option to turn on or off; 
option defaults to FALSE, which enables calculation of lag correlation; minor metadata 
updates to add new option
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info
- Update 4-19 to support data without a time axis
- Updated 4-21 to include the --pool_ens flag and avoid averaging ensembles before calculations


===CalcKSKuiper.R===

Given two input files and one or two input variables, calculates the Kolmogorov-Smirnov statistic as well as Kuiper's test along the
time axis for each x,y point, averaging across the ensemble members present, to determine whether the two samples at that point are
significantly different. If one variable is provided, it searches for the same variable in each file; if two are provided, it
searches for var1 in file1 and var2 in file2. If two time masks are provided, also computes the KS statistic and Kuiper's 
test for the corresponding masked data under each time mask applied, assuming that both time mask files have the same number 
of masks present in each. --ioptn is an argument controlling how the internal lag correlations are used; it defaults to 
'Fisher-Z', the preferred option. For more information (and a description of all options available), see Rscript CalcKSKuiper.R --help.

If more than one ensemble of input data is present, then calculations will either
be carried out for each individual ensemble or all ensembles will be pooled along
the time axis before submitting to the Fortran code.

Input: Rscript CalcKSKuiper.R -i 'indir/file1.nc, indir/file2.nc' -v 'var1' 
-O out_file [--tmasks 'tmask_1.nc[,tmask_2.nc]'] [--ioptn 'Fisher-Z'] [--calc_both] [--pool_ens]

Examples: Rscript /home/cew/Code/section_8_tools//CalcKSKuiper.R -i '/work/cew/testing//diff_file.nc, /work/cew/testing//abs_diff_file.nc' -v 'tasmaxDiff,tasmaxAbsDiff' --out_base /work/cew/testing//r_kskuiper_file --tmasks '/archive/esd/PROJECTS/DOWNSCALING/MASKS/timemasks/maskdays_bymonth_20860101-20951231.nc, /archive/esd/PROJECTS/DOWNSCALING/MASKS/timemasks/maskdays_bymonth_20860101-20951231.nc' --calc_both

Outputs: Outputs a file out_file_maskN.nc, where N is the number of masks in the input tmasks (0 if none are provided), with each level
of the time series containing four variables (dpks, dpku, dzks, dzku), corresponding to the probability that the samples
are unique and the z-value of the statistic calculated along the time series at a given x,y point at every point for which the time mask
was true. If calculations were carried out for individual ensembles, the output
will have the same number of ensemble members as the input; if not (--pool_ens), there 
will be no ensemble data.

Features: 
- If only one var is provided, assumes that the var is present in each file and searches accordingly.
- If sample size < 5 at a given point, will return NaN. 
- If the --calc_both option is set and a mask is provided, both the masked and unmasked
datasets are calculated.
- If pool_ens is set, calculations will be carried out over pooled time,ens data at each x,y point; 
if not set (the default), calculations will be carried out individually for each ensemble.

Further documentation: The Readme for the Fortran code is located at src/Readme_ksku. It has been updated as of 8-17-15. 

Known issues: 
-- If errors are encountered in the Fortran code, the R session experiences a fatal error
-- If the samples passed into lag correlation are identical (occurs in some cases over open ocean with monthly time masks), then the correlation returns NA and the Fortran code wrapper returns NaN for that point.

Updates:
- update 5-18 to use the --out_base -O output flag
- Update 6-25 to add --calc_both option and turn off mandatory calculation of 0mask output
- Update 7-23 to add better documentation and modify print statements
- Update 8-17 to add new Readme_ksku to better document the p-values present
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info
- Update 4-19 to support data without a time axis
- Updated 4-21 to include the --pool_ens flag and avoid averaging ensembles before calculations

===CalcTimeDiffStats1.R===

Given two files of the same dimensions, calculates the Mean Absolute Error, Root Mean Square Error, 
Bias and Standard Error by pooling the time and ensemble data at each i,j point. 
As time goes on, this may include other difference statistics as well. 
If --calc_both is enabled and a time mask file is used, produces two files: one with the unmasked 
data, and one with data for each mask present in the mask file. If no mask file is specified, only 
the unmasked data is written; if a mask file is used without --calc_both, only the masked
data is written. If more than one ensemble of data is present in the input dataset, 
then the output dataset will also contain more than one ensemble.

Input: /home/cew/Code/section_8_tools/CalcTimeDiffStats1.R -i 'input1.nc,input2.nc' -v 'var1,var2' 
-O out_base [--tmasks tmasks] [--calc_both] [--no_error_correct] [--verbose] [-h --help]

Output: A file of the same spatial (x,y) dimensions as the input file, with either
a time axis of length 1 or a time axis of the same length as the number of masks, 
depending on the argument.

Example: 
Rscript /home/cew/Code/section_8_tools/CalcTimeDiffStats1.R -i "/net2/kd/PROJECTS/DOWNSCALING/DATA/WORK_IN_PROGRESS/downscaled/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/day/tasmax_day_CEWtxp1-BCQM-C01r1e3pm2X01K00_sst2090_r1to3i1p1_US48_20860101-20951231_ensmbl.nc, /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360/sst2090/day/atmos/day/r1to3i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360_sst2090_r1to3i1p1_US48_20860101-20951231_ensmbl.nc" -v "tasmax" --out_base /work/cew/testing/test_timediffstats_out --verbose

Features: 
- If one var is provided for the -v option, it will be searched for in both files
- Possible to turn off the correction for correlation in the standard error calculation by adding 
  the --no_error_correct option to call

Known issues: 
- Input data is required to have a time axis longer than 1 (or 0)

Updates: 
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info

===CalcDTR.R===

Given a tasmax (maximum temperature) file and a tasmin (minimum temperature) file 
of the same dimensions, calculates the diurnal temperature range for the data by
subtracting tasmin from tasmax at all i,j,ens,t points. Output data will have
the same dimensions (including ensemble) as input data.

Input: Rscript /home/cew/Code/section_8_tools/CalcDTR.R --tx tasmax.nc --tn tasmin.nc -o output.nc [--no_check] [--verbose] [-h --help]

Output: A file of the same x,y,ens and time dimensions as the input file, with 
the difference between tasmax and tasmin at every i,j,ens,t point

Example: 
Rscript /home/cew/Code/section_8_tools/CalcDTR.R --tx /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360/amip/day/atmos/day/r2i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360_amip_r2i1p1_US48_19790101-20081231.nc --tn /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360/amip/day/atmos/day/r2i1p1/v20110601/tasmin/US48/tasmin_day_GFDL-HIRAM-C360_amip_r2i1p1_US48_19790101-20081231.nc --output /work/cew/testing/dtr_full_test1.nc --verbose

Features: 
- Checks to make sure that tasmax and tasmin were downscaled with the same method; 
this can be turned off with the --no_check option

Known issues: 
- Input data is required to have a time axis longer than 1

Updates: 
- Cloning all global attributes added 1-13-15
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info

===CalcDailyMeanTemp.R===

Given a tasmax (maximum temperature) file and a tasmin (minimum temperature) file 
of the same dimensions, calculates the daily mean temperature for the data by
averaging tasmax and tasmin at all i,j,ens,t points. If the input data contained 
more than one ensemble, the output data will also contain more than one ensemble.  

Input: Rscript /home/cew/Code/section_8_tools/CalcDailyMeanTemp.R --tx tasmax.nc --tn tasmin.nc -o output.nc [--no_check] [--verbose] [-h --help]

Output: A file of the same x,y,ens,time dimensions as the input file, with the 
mean of tasmax and tasmin at each x,y,ens,t point.

Example: 
Rscript /home/cew/Code/section_8_tools/CalcDailyMeanTemp.R --tx /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360/amip/day/atmos/day/r2i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360_amip_r2i1p1_US48_19790101-20081231.nc --tn /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360/amip/day/atmos/day/r2i1p1/v20110601/tasmin/US48/tasmin_day_GFDL-HIRAM-C360_amip_r2i1p1_US48_19790101-20081231.nc --output /work/cew/testing/txm_full_test1.nc --verbose

Features: 
- Checks to make sure that tasmax and tasmin were downscaled with the same method; 
this can be turned off with the --no_check option

Known issues: 
- Input data is required to have a time axis longer than 1

Updates: 
- Cloning all global attributes added 1-13-15
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info

===CalcTimeMeanSeries.R===

Given a file and a variable within the file, calculates a mean for the frequencies
provided (currently one or both of 'mon' or 'ann' in a comma-separated list). 
Output data will have the same dimensions (including ensemble) as input data.

Input: An input file, a variable within that file, and a comma-separated list of
frequencies over which to calculate the mean (e.g. 'mon,ann')

Output: One or more files, depending on the frequencies requested. If 'ann' was one of the frequencies provided,
$filename_ann.nc will have a mean with a timeseries based around the midpoint
of each year; if 'mon' was provided, $filename_mon.nc will provide a mean for a 
timeseries based around the midpoint of each month per year. 

Usage: Rscript /home/cew/Code/section_8_tools/CalcTimeMeanSeries.R --input input.nc --var var -O out_base --freq 'mon,ann' [--verbose]

Example:
Rscript /home/cew/Code/section_8_tools/CalcTimeMeanSeries.R --input /net2/kd/PROJECTS/DOWNSCALING/SEC8/sampledata//tasmax_day_PMtxp1-BCQM-A01r13X01K00_sst2090_r3i1p1_US48_20860101-20951231.nc --var tasmax --freq 'ann,mon' -O /work/cew/testing/1214/tmean_out --verbose

Features: 

Updates: 
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info
- Update 4-19 to stop with an error if there is no time axis or the time axis calendar is equal to 'none'

======DERIVATIVE PRODUCT TOOLS======

===CalcClimdex.R===

Given files of tasmax, tasmin and pr data that cover the same spatial area and time range, calculates
all or some set of the 27 PCIC Climdex climate change indices (http://etccdi.pacificclimate.org/list_27_indices.shtml). 
Defaults to also saving out the quantile values used to calculate the indices. The timeseries for the annual 
data is calculated as the midpoint of the two dates per year that are farthest apart (usually Jan. 1 and Dec. 31); 
the monthly timeseries is calculated as the midpoint of the two dates present in a given month per year
that are farthest apart. If more than one ensemble of data is present in all input files, then the output 
will have more than one ensemble of data present.

Input: CalcClimdex.R --tx tasmax_input --tn tasmin_input --pr pr_input -O out_base 
[--spatmask spatmask.nc] [--rnmm_thold 1] [--ndays 5] [--isNorth] 
[--indices 'all', 'hard', 'quantile','temp', 'pr'] [--base_range 'data_start,data_end'] 
[--read_qtiles stored_qtiles.nc] [--qtile_rip_index 1] [--no_zhang] [--calc_ann] 
[--calc_mon] [--save_qtiles] [--convert_back] [--verbose] [-h --help]

Example:
Rscript /home/cew/Code/section_8_tools//CalcClimdex.R --tx /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/day/r1i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360-COARSENED_sst2090_r1i1p1_US48_20860101-20951231.nc --tn /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/day/r1i1p1/v20110601/tasmin/US48/tasmin_day_GFDL-HIRAM-C360-COARSENED_sst2090_r1i1p1_US48_20860101-20951231.nc --pr /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/day/r1i1p1/v20110601/pr/US48/pr_day_GFDL-HIRAM-C360-COARSENED_sst2090_r1i1p1_US48_20860101-20951231.nc -O /work/cew/testing//climdex_out_i --spatmask /home/cew/Code/masks/land_masks.nc --indices 'all' --base_range '2086,2087'

For a more in-depth look at the options available, please consult the CalcClimdex readme located in
climdex/README_climdex.md; further documentation on the climdex package, which CalcClimdex wraps
around, is located at: https://cran.r-project.org/web/packages/climdex.pcic/climdex.pcic.pdf

Output: 
Up to 3 files, containing the indices using a monthly timeseries (if applicable), 
the indices using an annual timeseries (if applicable), and the quantile data
used to calculate the base_range quantiles, suitable for reading in and using for
calculations at a later date.

Features: 
- Possible to save out the quantiles used to calculate climdex indices for later analysis
- the _ann output and _mon output now reference the midpoints and bounds of the years/months
for which the data was calculated, rather than an index timeseries

Known issues: 
- Not possible to specify part of a year for the --base_range; issue present in the 
way that Climdex calculates the indices
-- it is possible to duplicate that behavior with time-windowing masks
- base_range must be present in the time range of tx, tn, and pr; makes ESD analysis
more complex
-- option to specify a file with pre-calculated base quantiles present as of 10-5
-- option to specify other files from which to calculate a base range of data
is in development
- More index-specific metadata for the climdex indices would help to reduce confusion
about what the various indices are supposed to do. 
  - More index-specific metadata present in files as of ()
-- Not possible to turn off the calculation of the Zhang et. al base range quantiles
for the climdex indices when using the base range quantiles for the future time period
- Data is required to have a time axis with length > 1

Updates:
- Update 8-17-15 to correct monthly timeseries being written as month,year rather than year,month
- Update 11-12-15 to allow calculation of climdex indices with ensemble data; quantile
file output can also be written as ensemble files; new paramaeter, --qtile_rip_index
added for reusing quantile output in that case; defaults to 1, the first rip in 
the input file.
- Update 11-13-15 with more data consistency checks, renaming calc_qtiles and stored_qtiles to save_qtiles and read_qtiles respectively, and adding a bypass for quantile calculation when not used in input indices
- Update 11-16-15 to derive base range from input data when 'hard' is specified, allow for intersection of indices (i.e. 'hard,temp')
- Update 12-14-15 to make metadata on rnnmm clearer
- Update 12-16-15 to correct timeseries and time_bnds; both now calculated from 
bounds rather than min and max of timeseries[factor]
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info

===CalcCornDegDays.R===

Given files of tasmax and tasmin data, calculates the number of 
corn growing degree days (days between 50 and 86 degrees F) on both a daily basis 
and as an annual sum of all growing degree days. The timeseries for the daily file
is the daily timeseries; the timeseries for the annual file is calculated as the 
midpoint of the years present.

Since the number of corn growing degree days goes to 0 during the winter months, 
using a time windowing mask to narrow in on the time period of interest is strongly
encouraged; it is still possible to run the tool without a time mask, however.

As of 11-10, the corn growing degree day tool will output ensemble data in x,y,ens,t form.
Corn growing degree days are calculated separately for each ensemble member.

Input: Rscript /home/cew/Code/section_8_tools/CalcCornDegDays.R --tx tasmax --tn tasmin
 -O out_base [--tmask tmask] [--spatmask spatmask] [--verbose] [h --help]
 
Output: 
Two files: one with the same x,y,ens,t dimensions as the input data containing the 
daily growing degree day totals, and one file with the same x,y,ens,t data as the
input with an annual timeseries showing total growing degree days at each location
per year.

Example:
Rscript /home/cew/Code/section_8_tools/CalcCornDegDays.R --tx '/archive/esd/PROJECTS/DOWNSCALING/PM/downscaled/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/day/r1i1p1/v20110601/PMtxp1-CDFt-A01r11X01K00/tasmax/US48/tasmax_day_PMtxp1-CDFt-A01r11X01K00_sst2090_r1i1p1_US48_20860101-20951231.nc' --tn '/archive/esd/PROJECTS/DOWNSCALING/PM/downscaled/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/day/r1i1p1/v20110601/PMtnp1-CDFt-A01r11X01K00/tasmin/US48/tasmin_day_PMtnp1-CDFt-A01r11X01K00_sst2090_r1i1p1_US48_20860101-20951231.nc' --tmask '/home/kd/PROJECTS/DOWNSCALING/FORTRAN/AgIndices/maskdays_01APRto31OCT_19790101-20081231.nc' -O corngdd_apr_oct

Features: 
- Puts out both the annual time series of cumulative growing degree days per year and the 
  growing degree days per day (_daily.nc and _ann.nc, respectively)
- the _ann output references the midpoints and bounds of the years/months
for which the data was calculated, rather than an index timeseries

Known issues: 
- Only accepts a time mask file with a single time windowing mask present; exits 
with helpful error if more than one mask detected in the file
- Did not accept ensemble data prior to 11-10
- Time axis required to be present and have length > 1

Updates:
- Updated 11-10 to add corrections for ensemble data; will calculate corn growing 
degree days separately for each ensemble member and output data of x, y, ens, 
t dimensions.
- Updated 11-12 to correct error with use of spatial and temporal masks; using 
both caused the NAs in the annual data to be reset to 0. 
- Update 12-11 to address memory issues
- Update 12-17 to correct annual timeseries from bounds rather than time coords
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info

===MakeSmoothClimo.R===

Given four (or more) files corresponding to historical observations, historical model
samples, downscaled historical data and cross-validated downscaled historical data, returns
a single file containing a smooth climatology series for each set of input data, and 
a map of standard deviations for each x, y, t in the corresponding smooth climatology - 
eight output variables total. If more than one ensemble of input data is present, calculation 
across ensemble members will take place in the Fortran code, resulting in one ensemble
of output data.

Further documentation: The Readme for the Fortran code is located at src/Readme_smclim.

Input: Rscript /home/cew/Code/section_8_tools/MakeSmoothClimo.R --Oh Oh.nc 
--Mh Mh.nc --CVh CVh.nc --DSh DSh.nc -v var -o --output out_file.nc 
--ioptin ioptin [--maxmon maxmon] [--lenmn lenmn] [--lensd lensd] [--verbose] [-h --help]

Example:
#OH
set Oh1 = "/archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360/amip/day/atmos/day/r1i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360_amip_r1i1p1_US48_19790101-20081231.nc"
set Oh2 = "/archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360/amip/day/atmos/day/r2i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360_amip_r2i1p1_US48_19790101-20081231.nc"
#MH
set Mh1 = "/archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/amip/day/atmos/day/r1i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360-COARSENED_amip_r1i1p1_US48_19790101-20081231.nc"
set Mh2 = "/archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/amip/day/atmos/day/r2i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360-COARSENED_amip_r2i1p1_US48_19790101-20081231.nc"
#CVh
set CVh1 = '/archive/esd/PROJECTS/DOWNSCALING/PM/downscaled/GFDL-HIRAM-C360-COARSENED/amip/day/atmos/day/r1i1p1/v20110601/PMtxp1-CDFt-A00r21X01K00/tasmax/US48/tasmax_day_PMtxp1-CDFt-A00r21X01K00_amip_r1i1p1_US48_19790101-20081231.nc'
set CVh2 = '/archive/esd/PROJECTS/DOWNSCALING/PM/downscaled/GFDL-HIRAM-C360-COARSENED/amip/day/atmos/day/r2i1p1/v20110601/PMtxp1-CDFt-A00r12X01K00/tasmax/US48/tasmax_day_PMtxp1-CDFt-A00r12X01K00_amip_r2i1p1_US48_19790101-20081231.nc'
#DSh
set DSh1 = '/archive/esd/PROJECTS/DOWNSCALING/PM/downscaled/GFDL-HIRAM-C360-COARSENED/amip/day/atmos/day/r1i1p1/v20110601/PMtxp1-CDFt-A00r11X01K00/tasmax/US48/tasmax_day_PMtxp1-CDFt-A00r11X01K00_amip_r1i1p1_US48_19790101-20081231.nc'
set DSh2 = '/archive/esd/PROJECTS/DOWNSCALING/PM/downscaled/GFDL-HIRAM-C360-COARSENED/amip/day/atmos/day/r2i1p1/v20110601/PMtxp1-CDFt-A00r22X01K00/tasmax/US48/tasmax_day_PMtxp1-CDFt-A00r22X01K00_amip_r2i1p1_US48_19790101-20081231.nc'
#output
set out_file = '/work/cew/testing/smclim_out.nc'

Rscript /home/cew/Code/section_8_tools/MakeSmoothClimo.R --Oh "$Oh1,$Oh2" --Mh "$Mh1,$Mh2" --DSh "$DSh1,$DSh2" --CVh "$CVh1,$CVh2" -v 'tasmax' -o $out_file --ioptin 0 --verbose

Outputs: A single netCDF file $output, with eight output variables: Oh_$var_mean, Mh_$var_mean, DSh_$var_mean, CVh_$var_mean and Oh_$var_sd, Mh_$var_sd, DSh_$var_sd, CVh_$var_sd, as well as a file StandardOut_smclim, containing the status information from
the Fortran code.

Features: 
- Accepts either ensemble data via separate files, or ensemble data via an ensemble
dim for members in the same file; will throw an error if one of the Oh, Mh, DSh or CVh
inputs contains more (or fewer) years of sample data than the other input datasets. 
- Accepts either synthetic data or real-world data, based on the --ioptin parameter.
Real data has nmonths and nyears set automatically; for synthetic data, maxmon and maxyr
are passed in at the command line and checked for consistency within the code. 
Checks internal to the Fortran code check for correspondence in the maxmon and 
maxday parameters; if the R script stops in the area of the Fortran code, 
the file StandardOut_smclim should be consulted. 

Known issues: 
- Currently takes ~ 30 min. to run a full US48 grid with two ensemble members (60
years total) for each input dataset.
- Ensemble dimension detection relies on there being five dimensions in the netCDF
file (lat, lon, ensmem, bnds, time) in close to that order - user-created files may
need tweaking in order to match the input order.
- Time axis required to be present and have length > 1

Updates:
- Updates 12-11 to address some of the memory issues

===CSVBoxStatsScraper.py===

CSVBoxStatsScraper.py relies on a two-step workflow to narrow a large netCDF file 
down to a few points of data, suitable for writing as a two-dimensional .csv file. 
The ideal workflow should look like this: 

Downscaled NetCDF data -> CalcAreaStats -> CalcTimeStats1 -> CSVBoxStatsScraper.py

Inputs: Either the path to a single netCDF file, or a text file containing one
path to a netCDF file per line, with netCDF files generated by calling CalcAreaStats
and CalcTimeStats1 on input files. Also takes the optional arguments -c,--color 
(a comma-separated string of R-comptaible color names or hex codes) , -x,--xpos
(a comma-separated string of positions on the x-axis to associate with a given
input file), and --xlab (a comma-separated string of labels for the individual
datasets). 

Usage: python CSVBoxStatsScraper.py -i input -o output [--color "color_a,color_b"] [--xpos "1,3"] [--xlab 'label1,label2']

Outputs: A comma-separated file with one line per ensemble,timelevel pair, 
containing the min, max, 5th, 25th, 50th, 75th, and 95th quantiles plus the average and standard deviation, 
plus relevant file metadata and paths to the original input files. 

Examples: 
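A hypothetical invocation (the list file, output path, colors and labels below are placeholders):
python CSVBoxStatsScraper.py -i /work/cew/testing/scraper_file_list.txt -o /work/cew/testing/box_stats.csv --color "red,blue" --xpos "1,2" --xlab "CDFt,BCQM"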

Further documentation: See the script examples/scraper_example.csh for more information 
about the workflow required to use this tool. 

Features: 
- If one of the lines of the input files points to an incorrectly-formatted netCDF file
(one not generated by the AreaStats -> TimeStats1 workflow), prints a warning 
and puts out all missing values for the data that would have been obtained 
from that file.
- Lines prefaced with '#' in the input text file will not be parsed. Useful for leaving
comments on input data.

Known issues: 
- No good check for colors allowable in X11 vs colors allowable by R at this level

Updates: 
- Update 12-17 to write lon and lat coords of 0 for better output chaining from
one section 8 tool to another

===ThresholdRelativeCounter.R===

Given an input file, variable, condition, and threshold, computes the number of 
ens, time points at each i,j location that match the condition and threshold. If input
data has more than one ensemble present, the output will only have one ensemble of
data. 

Inputs: A file and the variable within it, a condition (greater than, less than
or equal to, approximately equal) and a threshold for that condition. If the 
condition is approximately equal, an epsilon (for threshold +/- eps) is also required.

Usage: ThresholdRelativeCounter.R --input input.nc --var var 
--condition GT | GE | LT | LE | EQ | AQ --threshold 0.5 -O out_base --eps eps 
[--tmasks tmasks] [--spatmask spatmask] [--calc_both] [--verbose] [-h --help]

Outputs: A file with the same spatial dimensions as the input file, with a count
at each i,j, point of how many points along the time and ens axes matched the 
condition/threshold. If a time mask is provided, will write to a climatological 
time series with as many time levels as masks in the input mask file. If a spatial
mask is provided, the output will be spatially masked. 

Examples:
Rscript /home/cew/Code/section_8_tools/ThresholdRelativeCounter.R --input tx_input.nc --var tasmax --thold 275 --condition 'GT' --tmasks /archive/esd/PROJECTS/DOWNSCALING/MASKS/timemasks/maskdays_byseason_20860101-20951231.nc --calc_both -O /work/cew/testing/1214/thold_out_eq
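
A hypothetical approximately-equal invocation, using the --threshold and --eps flags from the
usage line above (input path, threshold and epsilon are placeholders):
Rscript /home/cew/Code/section_8_tools/ThresholdRelativeCounter.R --input tx_input.nc --var tasmax --condition 'AQ' --threshold 273.15 --eps 0.5 -O /work/cew/testing/thold_out_aq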

Features: 
- Allows for checking number of points greater than (GT), less than (LT), 
greater than or equal to (GE), less than or equal to (LE), equal to (EQ), or 
approximately equal to (AQ) the threshold.

Known issues:
- Required to have a time axis present and of length > 1

Updates:
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info
- bugfix 3-10-16 to fix modulo attributes of the climatological time series not being written

===HeatThreshold.R===

Given a file of maximum temperature data (tasmax), computes the number of points
per year and per month that had a maximum temperature greater than 30, 35, 38, 
40 and 45 degrees Celsius. If the input data has more than one ensemble present, the output
will also have more than one ensemble of data. 

Inputs: A file of maximum temperature data, with optional spatial and temporal masks

Usage: HeatThreshold.R --tx tasmax -O out_base [--tmask tmask] [--spatmask spatmask] [--verbose] [-h --help]

Outputs: Two files with the same spatial and ensemble dimensions as the input 
data, with the annual and monthly counts of days exceeding each threshold, 
respectively. 

Examples: 
Rscript dev/HeatThreshold.R --tx /archive/esd/PROJECTS/DOWNSCALING/OBS_DATA/GRIDDED_OBS/daymet/historical/atmos/day/r0i0p1/v2p1/tasmax/SCCSC0p1/tasmax_day_daymet_historical_r0i0p1_SCCSC0p1_19800101-20051231.nc -O /work/cew/testing/heat_thold_out --verbose

Features: 

Known issues:
- Required to have a time axis present and with a length > 1

Updates: 
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info

Note: currently testing spatmask and tmask options. 

===AvgEns.R===

Given a file containing a dimension named ensmem, averages all variables using 
ensmem across the ensemble dimension, making metadata additions as appropriate. 
If no output file is specified, the output file is written to the input file 
directory as ${input}_avg.nc

Inputs: A file containing one or more variables using a dimension ensmem. If no 
output filename is given, output will be written to the same directory as the 
input, using the filename ${input_basename}_ens_avg.nc

Usage: AvgEns.R --input in.nc [-o output.nc] [--tmpdir] [--verbose] [-h --help]

Outputs: A file with the same x,y, and time dimensions as the input file, where 
each variable using the ensemble dimension has been averaged across the ensemble
dimension.

Examples: 
Rscript AvgEns.R /work/cew/downscaled/GFDL-HIRAM-C360-COARSENED/amip/day/atmos/day/r1to3i1p1//v20110601/PMtxanp1-PARM-G01a05far1e3X01K00/tasmax/US48//tasmax_day_PMtxanp1-PARM-G01a05far1e3X01K00_sst2090_r1to3i1p1_US48_20860101-20951231_ensmbl.nc /work/cew/testing/ens_avg_out.nc

Rscript AvgEns.R /work/cew/downscaled/GFDL-HIRAM-C360-COARSENED/amip/day/atmos/day/r1to3i1p1//v20110601/PMtxanp1-PARM-G01a05far1e3X01K00/tasmax/US48//tasmax_day_PMtxanp1-PARM-G01a05far1e3X01K00_sst2090_r1to3i1p1_US48_20860101-20951231_ensmbl.nc
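
A hypothetical flag-style invocation that supplies the temporary directory explicitly (paths are
placeholders; --tmpdir takes the scratch directory described under User settings above):
Rscript AvgEns.R --input /work/cew/testing/tasmax_ensmbl.nc -o /work/cew/testing/ens_avg_out.nc --tmpdir $TMPDIR --verbose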

Features: 
- Exits with an error if there is no dimension ensmem, or no variables use ensmem

Known issues:

Updates: 

Note: 

===MakeAnyMask.R===

Given a file, variable $var, and condition, outputs a mask of the same dimensions
as the input data where all places where the condition was TRUE are 1, and all 
places where the condition was FALSE are 0 or NA. Works well in conjunction with
ApplyAnyMask to narrow down analysis to points of interest.

Inputs: A file, a variable $var, a condition, and a threshold in the same units
as the variable in the file

Usage: MakeAnyMask.R --input input.nc --var var 
--condition GT | GE | LT | LE | EQ | AQ --threshold 0.5 -o output.nc 
--eps eps [--zero_mask] [--verbose] [-h --help]

Outputs: A mask named ${var}_mask with the same dimensions as the input data, 
where all points where the condition was true are 1 and all points where it was 
false are either NA or 0, if the --zero_mask flag was used.

Examples: 
Rscript MakeAnyMask.R --input /archive/esd/PROJECTS/DOWNSCALING/OBS_DATA/GRIDDED_OBS/daymet/historical/atmos/day/r0i0p1/v2p1/tasmax/SCCSC0p1/tasmax_day_daymet_historical_r0i0p1_SCCSC0p1_19800101-20051231.nc --var tasmax --threshold 275 --condition GT --output /work/cew/testing/binary_mask_out.nc --verbose

Features: 
--invoking the --zero_mask flag will write all places where the condition was
FALSE as 0 rather than NA.

Known issues:

Updates: 

Note: 

===CalcSimpleDegDay.R===

Given either a file of daily mean temperature data or a file of tasmax and a file
of tasmin data, calculates the number of degree days above or below the given
threshold at each i,j,ens point, writing out the daily total and monthly total 
in separate files.

Inputs: Either a file of dmt data or files of tasmax and tasmin data of the same
dimensions, a threshold, and whether days are accumulated above or below the threshold.

Usage: CalcSimpleDegDay.R [--tx tasmax --tn tasmin] [--dmt dmt]  --threshold thold --accumulate above|below --units F|C|K -O out_base [--spatmask spatmask] [--verbose] [-h --help]

Outputs: Two files containing a var degday, one with the daily counts 
($out_base_daily.nc), and one with the monthly totals ($out_base_mon.nc)

Examples: 

Rscript /home/cew/Code/section_8_tools/CalcSimpleDegDay.R --tx /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/day/r1i1p1/v20110601/tasmax/US48/tasmax_day_GFDL-HIRAM-C360-COARSENED_sst2090_r1i1p1_US48_20860101-20951231.nc --tn /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/day/r1i1p1/v20110601/tasmin/US48/tasmin_day_GFDL-HIRAM-C360-COARSENED_sst2090_r1i1p1_US48_20860101-20951231.nc --threshold 0 --units C --accumulate 'above' --out_base /work/cew/testing/0816/degday_txtn --spatmask /home/cew/Code/masks/land_masks.nc


Rscript /home/cew/Code/section_8_tools/CalcSimpleDegDay.R --dmt /work/cew/testing/0816/tas_out.nc --threshold -17.8 --accumulate 'below' --out_base /work/cew/testing/0816/degday_dmt

Features: 
- Can accept negative command-line arguments for thresholds

Known issues:

Updates: 

Note: 
======DATA MANIPULATION TOOLS======

===CalcDiff.R, CalcAbsDiff.R===

Standalone scripts for calculating the difference between two files 
and writing the result to a netCDF file with dimensions cloned from the input files, 
including ensemble dimensions. CalcDiff writes the relative (signed) difference, and
CalcAbsDiff writes the absolute value of that difference.

Input: Rscript CalcAbsDiff.R -v var -i 'file1,file2' -o output_filename

Examples: 
Rscript /home/cew/Code/CalcAbsDiff.R -v tasmax -i "tasmax_file_downscaled.nc, tasmax_file_orig.nc" -o new_tasmax_diff_file.nc
Rscript /home/cew/Code/CalcAbsDiff.R -v tasmax -i "/home/cew/Code/testing/tasmax_day_test2-DeltaSD-A38e-mL01K00_rcp85_r1i1p1_RR_20060101-20991231.I300_J31-170.nc, /home/cew/Code/testing/tasmax_day_test2-CDFt-A38e-mL01K00_rcp85_r1i1p1_RR_20060101-20991231.I300_J31-170.nc"  -o /home/cew/Code/testing/new_tasmax_diff_file.nc
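
CalcDiff.R takes the same arguments; a hypothetical invocation (filenames are placeholders):
Rscript /home/cew/Code/CalcDiff.R -v tasmax -i "tasmax_file_downscaled.nc, tasmax_file_orig.nc" -o new_tasmax_diff_file.nc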

Outputs: A single netCDF file named output_filename, with a variable $varDiff or $varAbsDiff, depending upon whether
CalcDiff or CalcAbsDiff was called.

Known issues: None at this time
Revised Apr 7 to include command-line flag parsing

===MakeDailyClimo.R===

Given a file, computes the 365-day climatological average for the entire timeseries. If the timeseries
uses a Julian calendar, the February 29ths are removed for the calculation. If 
more than one ensemble of data is present in the input files, ensembles are averaged
together for the climatology calculation. If --Sbx is greater than 0, writes the
Sbx-day running mean (Sbx defaults to 1); if --Sbx is not specified, no smoothing is added.
If either --start_year or --end_year is specified, the climatology will only be calculated from (or to) that year; these default to the starting and ending years of the file, respectively.

Input: Rscript MakeDailyClimo.R -i input.nc -v var -o output.nc [--Sbx 1] [--tseries] [--write_climvar] [--start_year start_year] [--end_year end_year]

Examples:
Rscript /home/cew/Code/section_8_tools//MakeDailyClimo.R -i /work/cew/testing//diff_file.nc -v tasmaxDiff -o /work/cew/testing//stat_file.nc

Rscript /home/cew/Code/section_8_tools//MakeDailyClimo.R -i /work/cew/testing//diff_file.nc -v tasmaxDiff -o /work/cew/testing//stat_file.nc --Sbx 7

Rscript /home/cew/Code/section_8_tools//MakeDailyClimo.R -i /work/cew/testing//diff_file.nc -v tasmaxDiff -o /work/cew/testing//stat_file.nc --start_year 1979 --end_year 1989
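
A hypothetical invocation combining smoothing with the newer flags (the output path is a placeholder):
Rscript /home/cew/Code/section_8_tools//MakeDailyClimo.R -i /work/cew/testing//diff_file.nc -v tasmaxDiff -o /work/cew/testing//clim_file.nc --Sbx 7 --tseries --write_climvar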

Outputs: A file output.nc, containing a 365-day timeseries with the calculated modular average as the 
variable var within the file. If --Sbx is greater than 0, data will be smoothed with a --Sbx-day running mean. If --write_climvar is set, then output is written as $varClim; otherwise, defaults to $var, which matches prior behavior.

Features:
- Added command-line flag parsing 4-7
- Added calculation of leapday removal masks on the fly 9-17
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info
- Update 5-2 to address bug related to time_bnds and add the --Sbx option
- Update 6-16 with new arguments, --tseries and --write_climvar. --tseries can write a climatology
  file of the same dimensions as the input, and --write_climvar writes output as $varClim rather than $var.
- Update 6-20 with --start_year and --end_year arguments 

Known issues:
- Required to have a timeseries length greater than 1

===AddAnom.R===

Given a climatology and anomaly file of the same dimensions covering the same timeseries, reconstructs the original
variable by adding the anomaly and climatology datasets together. Since there is
always only one ensemble of climatology data, the output has as many ensembles as
the input anomaly dataset.

Input: Rscript AddAnom.R --clim "climatology_file.nc" --anom "anom_file.nc" -v var -o output.nc

Examples:
Rscript AddAnom.R --clim "/archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/clim2086-2095Sbx7/r1to3i1p1/v20110601/tasmaxClim/US48/tasmaxClim_day_GFDL-HIRAM-C360-COARSENED_sst2090_r1to3i1p1_US48_20860101-20951231.nc" --anom "/archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360-COARSENED/sst2090/day/atmos/PManom2086-2095Sbx7/r1i1p1/v20110601/tasmaxAnom/US48/tasmaxAnom_day_GFDL-HIRAM-C360-COARSENED_sst2090_r1i1p1_US48_20860101-20951231.nc" -v 'tasmax' -o '/work/cew/test_anom_add.nc'

Outputs: A file output.nc, containing the variable var obtained by adding the variables varClim and varAnom in the files 
climatology_file.nc and anom_file.nc, respectively.

Features:

Known issues: 
- Note that the -v option refers to the variable that the climatology and anomaly files were calculated from
(i.e. 'tasmax' for 'tasmaxAnom' and 'tasmaxClim'). 
- Time axis required to be present and be greater than length 1

Updates: 
- Update 1-28 to include a new option, --prof, for returning timing and memory usage info

===EditSpatialMask.R===

Given a spatial mask and a formatted file of edits to perform, edits the spatial mask to produce a new mask. There are multiple options
for controlling how the edits are applied (see the --start flag under Features below).
An example of an edit file is included under sample_edit_file.txt. 

Input: EditSpatialMask.R -i input -O out_base --edit_file edit_file.txt [--start 'mask', 'all_one', 'all_missing', 'invert'] [--display] [--text_display] [--include_edit_comment] [-o --overwrite] [--force_single_point] [-t --timestamp] [--verbose] [-h --help]

Examples: 
Rscript EditSpatialMask.R -i maskdir/land_masks.nc -O /home/cew/Code/testing/florida --edit_file florida_edit_file.txt --include_edit_comment --text_display --start 'mask'

Outputs: a single file ${out_base}_mask.nc, containing a single variable $basename(out_base)_masks. 
If timestamp is set to true, the file is of the form ${out_base}_mask_${timestamp}.nc

Features: 
- Can specify longitude (XY) in degrees west or degrees east > 180
- --start will allow users to start their edits from a field of all missing values ('all_missing'), a field
of all 1's ('all_one'), the same mask input into the editing function ('mask'), or the inverse of the mask
passed into the editing function ('invert'). Defaults to 'mask', the preferred behavior.

Known issues: 
- If the mask produced consists of all missing values, the --display command will return
an error; this error is now caught and more helpful output is printed to the command line.
--Addressed in Jun. 3 update; now displays a blank image and prints more helpful error messages
- If a point is passed in that lies between two cells, both cells are returned; this has caused
user confusion due to lack of clarity
-- Addressed Jun. 9; explicit command-line option to control behavior (--force_single_point)

===DetectMissvals.R===

Given a netCDF file and the name of a variable, counts the number of missing values present
and prints the total to stdout. If 0, the exit status is 0; if greater than 0, the exit status
is 1. If given a spatial mask of compatible dimensions, will only count missing values in the
areas not missing in the spatial mask.

Input: DetectMissvals.R -i input.nc -v var [--spatmask spatmask.nc]

Examples:
Rscript DetectMissvals.R -v tasmax -i /home/cew/Code/testing//tasmax_day_PMtx1-CDFt-A18jX01K00_rcp85_r1i1p1_RR_20060101-20991231.I300_J31-170.nc --spatmask /archive/esd/PROJECTS/DOWNSCALING/MASKS/geomasks/red_river_0p1/OneD/red_river_0p1_masks.I300_J31-170.nc
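
A minimal sketch of the counting and exit-status logic (placeholder file and variable names; the real tool reads its arguments from the command line and checks that the mask dimensions are compatible):

library(ncdf4)

vals <- ncvar_get(nc_open("input.nc"), "tasmax")       # placeholder names
mask <- ncvar_get(nc_open("spatmask.nc"), "masks")     # optional spatial mask (placeholder variable name)

# Only count missing values where the mask is not missing; the x,y mask pattern
# is recycled over the remaining (e.g. time) dimensions
keep <- rep(!is.na(mask), length.out = length(vals))
n_miss <- sum(is.na(vals[keep]))

cat(n_miss, "\n")
quit(save = "no", status = if (n_miss > 0) 1 else 0)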

Features: Accepts files of any dimensionality except zero-dimensional (single-point) files

Known issues: None at this time.

===ApplyPrecipThreshold.R===

Given a file of input data and a FUDGE-style precipitation threshold (either one 
of us_trace (0.01 in./day), global_trace (0.1 mm/day), zero (0 units/day), or a 
numeric in the same units as the input data), writes a file of the same dimensions
(including number of ensembles) as the input data, where all points less than 
the threshold are set to 0.

Input: section_8_tools/ApplyPrecipThreshold.R -i input -v var -o output 
--threshold threshold [--verbose] [-h --help]

Outputs: A file of the same dimensions and units as the input data, but with
a 0 at every x,y,ens,t point whose value is less than the threshold

Examples:
Rscript ApplyPrecipThreshold.R --input /archive/esd/PROJECTS/DOWNSCALING/GCM_DATA/NCPP/GFDL-HIRAM-C360/sst2090/day/atmos/day/r1i1p1/v20110601/pr/US48/pr_day_GFDL-HIRAM-C360_sst2090_r1i1p1_US48_20860101-20951231.nc --var 'pr' --output "/work/cew/pr_testing_i.nc" --threshold 'us_trace'
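
The thresholding itself is a simple elementwise comparison; a rough sketch follows (the named thresholds are resolved to numbers in the same units as the data by the tool itself, so the value below is only an illustrative placeholder that assumes data in mm/day):

library(ncdf4)

pr <- ncvar_get(nc_open("pr_input.nc"), "pr")   # placeholder file name

threshold <- 0.254                              # e.g. us_trace (0.01 in./day) expressed in mm/day
pr[!is.na(pr) & pr < threshold] <- 0            # zero out everything below the threshold, leaving missing values alone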

Features: 

Known issues: None at this time.

Updates: 
- Updated 1-28 to include a new option, --prof, for returning timing and memory usage info
- Updated 4-19 to handle data without a time axis

===TrimTimeAxis.R===

Given an input file with a time axis and one or more variables that reference the 
time axis, and one or more instructions for trimming the file, writes a file 
with a trimmed time axis, with the vars that reference time also trimmed along the 
time axis. 

Input: Rscript TrimTimeAxis.R --input in.nc -o output.nc [ --start start-date] 
[--end end-date] [--t0 new-origin-date] [--verbose] [-h --help]

Outputs: A file with the same variables, global attributes and units as the
 input file, but with a reduced time axis, and all vars that reference time 
 reduced to the new time values. If a new origin is specified (t0), the values
 of the time dimension will have been changed to reflect the new origin. 

Examples:

Rscript /home/cew/Code/section_8_tools/TrimTimeAxis.R --input /net2/kd/PROJECTS/DOWNSCALING/SEC8/sampledata//tasmax_day_PMtxp1-BCQM-A01r13X01K00_sst2090_r3i1p1_US48_20860101-20951231.nc --output /home/Oar.Gfdl.Esd/CEW_sandbox//taxis_data.nc --t0 '2000-07-04'

Rscript /home/cew/Code/section_8_tools/TrimTimeAxis.R --input /archive/esd/PROJECTS/DOWNSCALING/MASKS/timemasks/maskdays_bymonth_20860101-20951231.nc --start '2086' --end '2090' -o /home/Oar.Gfdl.Esd/CEW_sandbox//taxis_mask.nc
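
As a rough illustration of how the start and end dates map onto time indices (the actual trimming is done with the nco utilities and an ncl routine, as noted in the updates below; placeholder names, and the units string is assumed to have the form 'days since YYYY-MM-DD'):

library(ncdf4)
library(PCICt)

nc <- nc_open("in.nc")                                    # placeholder
time_vals <- ncvar_get(nc, "time")
time_units <- ncatt_get(nc, "time", "units")$value        # e.g. "days since 2006-01-01"
calendar <- ncatt_get(nc, "time", "calendar")$value       # e.g. "noleap"

origin <- as.PCICt(sub("days since ", "", time_units), cal = calendar)
dates <- origin + time_vals * 86400                       # PCICt arithmetic is in seconds

keep <- dates >= as.PCICt("2086-01-01", cal = calendar) &
        dates <= as.PCICt("2090-12-31", cal = calendar)
range(which(keep))                                        # index bounds for the trim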

Features: 
- It is possible to reset the time origin (t0) of the input file without specifying
any other editing arguments
- If no editing arguments are specified, the script exits with an error
- start, end and origin dates are checked for consistency against the data in the 
file and against each other

Known issues:
- Data is required to have a time axis present

Updates
- Updated 1-13-15 to use gcp, the nco utilities and an ncl routine to edit the time axis. Checks on input are still done in R.
- Updated 1-19-15 along with modules loaded in setenv_s8 to correct noleap calendars in ncks
- Updated 1-28 to include a new option, --prof, for returning timing and memory usage info

===ApplyAnyMask.R===

Given an input file and variable, as well as a mask file, masks the input data 
over all dimensions that the mask and the data have in common (e.g. a mask
of i,j,t masking data of i,j,ens,t would have identical missing values for
each ensemble; applying a spatial mask would give identical missing values
for each ens,t point). If no dimensions match, or if there is more than one mask
in the mask file, the program throws an error.

Input: Rscript ApplyAnyMask.R --input in.nc -v var --anymask anymask.nc 
-o output.nc [--prof] [--verbose] [-h --help]

Outputs: A file with the same dimensions as the input file, but with a NA in all
places where the mask was missing.

Examples:

Rscript /home/cew/Code/section_8_tools/ApplyAnyMask.R --input tasmax_day_PMtxanp1-PARM-G01a05far1e3X01K00_sst2090_r1to3i1p1_US48_20860101-20951231_ensmbl.nc --var 'tasmax' --anymask /work/cew/testing/0419//bmask_out.nc --output /work/cew/testing/0419//masked_out.nc
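
A rough sketch of the broadcast for the i,j,t-mask over i,j,ens,t-data case described above (placeholder names; the real tool matches dimensions by name and handles the general case):

library(ncdf4)

dat  <- ncvar_get(nc_open("in.nc"), "tasmax")        # assumed dims: i, j, ens, t
mask <- ncvar_get(nc_open("anymask.nc"), "masks")    # assumed dims: i, j, t

# Apply the i,j,t mask identically to every ensemble member
for (e in seq_len(dim(dat)[3])) {
  slice <- dat[, , e, ]
  slice[is.na(mask)] <- NA
  dat[, , e, ] <- slice
}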

Features: 

Known issues:
- If more than one mask variable is present in the input mask file, then the 
script will exit with an error

Updates

===ExtendClimatology.R===

Given a file with 365-day climatology and an input file containing a timeseries
longer than one year, expands the climatology in the climatology file until it
matches the time dimension of the input file. If the input calendar is Julian, 
Feb. 28 in the climatology is written to Feb. 29; if --Sbx is provided, a boxcar 
smoothing of Sbx days is applied. 

Input: Rscript ExtendClimatology.R -i input.nc --clim climatology.nc -v var -o output.nc [--Sbx 0]

Examples:
Rscript /home/cew/Code/section_8_tools/ExtendClimatology.R --input /net2/kd/PROJECTS/DOWNSCALING/SEC8/sampledata//tasmax_day_GFDL-HIRAM-C360_sst2090_r3i1p1_US48_20860101-20951231.nc --clim /work/cew/testing/0616/clim_file.nc --var 'tasmax' --Sbx 7 --output /work/cew/testing/0616/expanded_climatology.nc

Outputs: A file output.nc, containing the 365-day climatology $varClim expanded to the same dimensions as $input. If --Sbx is greater than 0, data will be smoothed with a --Sbx-day running mean.
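
A rough sketch of the expansion and boxcar smoothing for a single grid cell on a noleap calendar, using caTools for the running mean (placeholder names; the real tool also handles Julian calendars and Feb. 29 as described above):

library(caTools)

# clim: a vector of 365 daily climatology values at one grid cell (placeholder)
n_years <- 10                              # number of years covered by the input file
extended <- rep(clim, times = n_years)     # repeat the 365-day cycle to match the input time axis

sbx <- 7                                   # the --Sbx window, in days
if (sbx > 0) {
  extended <- runmean(extended, k = sbx)   # boxcar (running mean) smoothing
}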

Features:

Known issues:

===MakeAnom.R===

Given a file containing $var and a climatology file of the same dimensions containing $varClim, subtracts $varClim from $var to create $varAnom.

Input: Rscript MakeAnom.R -i input.nc --clim clim.nc -v var -o output.nc 

Examples:
Rscript /home/cew/Code/section_8_tools/MakeAnom.R --input /net2/kd/PROJECTS/DOWNSCALING/SEC8/sampledata//tasmax_day_GFDL-HIRAM-C360_sst2090_r3i1p1_US48_20860101-20951231.nc --clim /work/cew/testing/0616/expanded_climatology.nc --var 'tasmax' --output /work/cew/testing/0616/anom_out.nc

Outputs: A file output.nc, containing a variable $varAnom of the same dimensions as $var in the input file.
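
The arithmetic mirrors AddAnom.R; a minimal sketch of the subtraction and the round trip (dat and clim are placeholder arrays of identical dimensions read from the input and climatology files):

anom <- dat - clim              # written out as $varAnom, e.g. tasmaxAnom

# Round trip with AddAnom.R: adding the climatology back should reproduce the original
all.equal(clim + anom, dat)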

Features:
- Uses the updated naming conventions for title attributes

Known issues:
- Showed issues with environment objects not being duplicated in R 3.2.1; solved by casting to a list, but keep an eye on this in the future.

===GroupBySeason.R===

Given a file containing a variable $var and a user-defined season, computes
seasonal sums and/or weighted means for each occurrence of that season in the 
time period the file covers.

Input: A file containing a variable to be averaged and/or summed over, a specification of the seasons (either as a comma-separated string assigning each month to a season, or as a start month), and an output file name.

Usage: /home/cew/Code/section_8_tools/GroupBySeason.R --input input.nc --var var --output output.nc [--mean] [--count] [--start_mon January] [--seasons 1,1,2,2,2,3,3,3,4,4,4,1 ] [--spatmask spatmask] [--verbose] [-h --help]

Outputs: A file of the same x, y, ens dimensions as the input covering the same time period. If the season
specification wraps around the end of the year (as it does for a December-January-February winter, or for a start_mon of anything except January), the time dimension will have smaller entries at the end for the incomplete seasons.

Example: 

Rscript /home/cew/Code/section_8_tools/GroupBySeason.R --input /work/cew/testing/0816/degday_txtn_mon.nc --var degday --start_mon July --output /work/cew/testing/0816/seasongroup_mon.nc --mean --count
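
A rough sketch of the grouping logic for a single grid cell with monthly input, using the DJF-style --seasons specification from the usage line above (placeholder vectors; the real tool handles daily data, day-count weighting, and the wrap-around grouping of December with the following January and February):

# months, years, vals: month number (1-12), calendar year and variable value for each time step
season_of_month <- c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1)   # the --seasons string, one season per month
season <- season_of_month[months]

# Sums and means for every occurrence of every season
sums  <- tapply(vals, list(season, years), sum)
means <- tapply(vals, list(season, years), mean)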

Known Issues: 
- Specifying a single season 12 months long, or 12 seasons each 1 month long, throws an error
due to issues combining factors within the tool
- It should also be possible to trigger that issue by setting two or more of July, January or June to their own season, or March and May as their own seasons. I'm not sure what a long-term fix for this would be.

======GRAPHICS TOOLS======

===MakeFreqHist.R===

Input: Takes command-line flags to control input. The only flags required for the script to run
are -i (--input), -v (--var), and -o (--output).

Rscript MakeFreqHist.R -i input.nc --var tasmax -o output

Output is currently required to be a file basename, without a suffix, since a directory of that
name will be made if the --composite flag is set to F.

See the documentation on MakeFreqHist for more information.

Examples:
Rscript MakeFreqHist.R -i /work/cew/testing//abs_diff_file.nc --var tasmaxAbsDiff -o /home/cew/Code/testing/output_abs_diff_histogram --overwrite TRUE --timestamp FALSE --save TRUE --pal 'brewpal.Set3'

Rscript MakeFreqHist.R -i /work/cew/testing/ensemble_out_file.nc --var tasmax -o /home/cew/Code/testing/output_ensemble_hist --overwrite F --pal 'red, blue, orange, pink' --legend TRUE

Outputs: If composite was true (T, TRUE, 'T' or 'TRUE'), outputs a single .svg file named output.svg, 
where individual histograms are placed on the same axis. If composite was false (F, FALSE, 'F', 'FALSE'), 
outputs a directory named output, containing files named output_1.svg, output_2.svg...
where each file is an individual histogram. If timestamp was set to true (accepting the same values as composite), every 
file has a timestamp attached to it. If save was set to true, also outputs a file
output_settings.R, with the settings used to generate the histograms; if composite was
false, this file is located in the output/ directory.

Depending on the arguments supplied, histograms can be plotted as separate
ensembles, or as ensembles averaged together with different time masks applied.
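
Underneath, the plotting is standard R graphics written to an .svg device; a minimal sketch for one non-composite histogram (placeholder names; the real tool handles palettes, legends, time masks, ensemble averaging, and timestamps):

library(ncdf4)

vals <- ncvar_get(nc_open("input.nc"), "tasmax")   # placeholder file and variable names

svg("output_1.svg")
hist(as.vector(vals), breaks = 50, main = "tasmax", xlab = "tasmax")
dev.off()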

Further documentation: See the MakeFreqHist_documentation.ods and ParseGraphArgs.odt documents for more details.

Known issues:
- Parsing xtick and ytick arguments is buggy
-- Bugfix in 6-19; documentation updated to match
- Currently can only write in .svg format
-- Arguably, this is a feature
- Output parsing plus the file redirect leads to an incorrect status being returned when a job fails (should be 0, returns 1)
-- Fixed 4-7-15
- Writing to a nonexistent dir creates an unhelpful error message (cairo error 'error while writing to output stream')
-- Better error checking added 7-13-15

Features:
- Now accepts a --verbose option for turning on status messages (turned off by default)

===MakeBoxplot.R===

Input: Takes a single .csv output from CSVBoxStatsScraper.py. User edits to the
.csv are acceptable, so long as the header names used by MakeBoxplot
are still present in the modified file.

Usage: Rscript MakeBoxplot.R -i input -o output [--title title] [--ylab ylab] [--pixel_dim '1024x800' | --in_dim '5,7' ][--verbose] [-h --help]

Output: A single .svg file of the boxplots.

Examples: Rscript MakeBoxplot.R -i /home/cew/Code/section_8_tools/dev/modified_scraper_example.csv -o /home/cew/Code/testing/labelled_boxplot_image.svg --title 'Historical vs Future Data' --in_dim '6,8' --ylab "Total Growing Degree Days"

Further documentation: See the script examples/scraper_example.csh for more information 
about the workflow required to use this tool. 
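
A rough sketch of the plotting step, assuming (hypothetically) that the scraper .csv carries one row per box with columns named label, min, q1, median, q3 and max; the real header names are shown in modified_scraper_example.csv and the scraper example script:

# Hypothetical column names; check the example .csv for the real header
stats <- read.csv("scraper_output.csv")
z <- list(stats = t(as.matrix(stats[, c("min", "q1", "median", "q3", "max")])),
          names = as.character(stats$label))

svg("labelled_boxplot_image.svg", width = 6, height = 8)   # roughly equivalent to --in_dim '6,8'
bxp(z, ylab = "Total Growing Degree Days", main = "Historical vs Future Data")
dev.off()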

Features: 

Known issues: None at this time

Updates:
- Updated 1-8 with a --xlim option.
