ECCO Cloud Utils Guide
Kevin edited this page Dec 29, 2020 · 3 revisions
- The following functions are used when running the pre-processing pipeline. All other functions in generalized_functions.py are either used elsewhere or outdated.
- Function Location:
- ecco_cloud_utils/generalized_functions.py
- Description:
- Calculates the minimum and maximum length of the data product grid cells (length of cell at maximum latitude, length of cell at Equator).
- Gets the area definition of the data grid using pyresample.get_area_def().
- Converts longitude to -180 -> 180 if it was defined as 0 -> 360.
- Creates the swath definition of the data grid using pyresample.SwathDefinition().
- Arguments:
- All arguments come from the transformation config
- product_name
- data_res
- data_max_lat
- area_extent
- dims
- proj_info
- Return Values:
- source_grid_min_L -- Minimum length of data product grid cells
- source_grid_max_L -- Maximum length of data product grid cells
- source_grid -- Swath definition of source grid
- data_grid_lons -- Longitude values for data grid
- data_grid_lats -- Latitude values for data grid
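The grid-cell length calculation and the longitude conversion can be sketched as follows. This assumes a regular latitude-longitude grid whose spacing data_res is in degrees and a spherical Earth with a mean radius of 6,371 km; both are assumptions, since the function's actual formula and constant are not documented here. A cell's east-west length scales with the cosine of latitude, so it is longest at the Equator and shortest at data_max_lat. The helper names are hypothetical.

```python
import math

# Assumption: spherical Earth with this mean radius, in metres.
EARTH_RADIUS_M = 6_371_000.0


def grid_cell_lengths(data_res_deg, data_max_lat_deg):
    """Return (min_L, max_L) in metres for a regular lat-lon grid.

    The east-west length of a cell scales with cos(latitude), so the
    longest cell is at the Equator and the shortest at data_max_lat.
    """
    # Arc length of one grid step along the Equator
    max_L = EARTH_RADIUS_M * math.radians(data_res_deg)
    # The same step shrunk by cos(latitude) at the highest latitude
    min_L = max_L * math.cos(math.radians(data_max_lat_deg))
    return min_L, max_L


def wrap_longitude(lon_deg):
    """Map a longitude given on 0 -> 360 onto -180 -> 180 (180 maps to -180)."""
    return ((lon_deg + 180.0) % 360.0) - 180.0
```

For example, under these assumptions a 0.25° grid has a maximum cell length of roughly 27.8 km at the Equator and about half that at 60° latitude.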
- Function Location:
- ecco_cloud_utils/generalized_functions.py
- Description:
- Transforms source data to the target grid and defines metadata fields. Uses the transform_to_target_grid() function in ecco-cloud-utils, which loops through every target grid point and takes the mean (or median) of the source field values, using the return values of the mapping function find_mappings_from_source_to_target().
- Arguments:
- data_field_info -- Dictionary containing data field information
- record_date -- Date of the current data file
- model_grid -- Model grid dataset
- model_grid_type -- Type of model grid (llc or latlon)
- array_precision -- Precision to use (‘float32’)
- record_file_name -- Filename of current data file
- data_time_scale -- Time scale of data (“daily” or “monthly”)
- extra_information -- List of additional dataset information
- ds -- Dataset for current data file
- factors -- Tuple containing return values from find_mappings_from_source_to_target()
- time_zone_included_with_time -- Boolean indicating whether the time values in the data file include a time zone
- model_grid_name -- Name of model grid
- Return Values:
- data_da -- Transformed data array for current data file
- Function Location:
- ecco_cloud_utils/generalized_functions.py
- Description:
- Aggregates daily data into an annual file and saves it. If requested, also creates and saves an annual file of monthly means.
- Arguments:
- DS_year_merged -- Yearly merged data array
- data_var -- Name of variable to be used in aggregation (name of transformed data array)
- do_monthly_aggregation -- Boolean for aggregating data into an annual monthly mean data file
- year -- Year for aggregation
- skipna_in_mean -- Boolean passed as the skipna argument when computing the mean (skipna is an xarray/pandas mean() argument; np.mean() has no such parameter)
- filenames -- Dictionary with filenames for aggregated data
- fill_values -- Dictionary with fill values for binary and netCDF
- output_dirs -- Dictionary with directories for saving files
- binary_dtype -- Binary data type
- model_grid_type -- Type of model grid (llc or latlon)
- on_aws -- AWS S3 object used to determine how the aggregated files are saved. If it is set, only the binary file is saved; otherwise, save_binary and save_netcdf determine which files are saved.
- save_binary -- Boolean to save binary files
- save_netcdf -- Boolean to save netCDF files
- remove_nan_days_from_data -- Boolean to remove days with no data from the transformed data files.
- data_time_scale -- Time scale of data (“daily” or “monthly”)
- uuids -- List of UUIDs to add to the metadata for aggregated products (annual daily and annual monthly as requested)
- Return Values:
- True if the year contains no numerical data (all NaNs), and False otherwise.
- If the year is empty, it is not aggregated or saved, and the code continues on to the next year.
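A minimal sketch of the monthly aggregation and the empty-year check, using plain Python in place of the xarray objects the pipeline works with. The function names are hypothetical, and skipna follows the usual xarray/pandas meaning: NaNs are ignored when it is True and propagated when it is False.

```python
import math
from collections import defaultdict
from datetime import date


def year_is_empty(daily_values):
    """True if the year has no numerical data (every value is NaN)."""
    return all(math.isnan(v) for v in daily_values.values())


def monthly_means(daily_values, skipna=True):
    """Average daily values by calendar month.

    daily_values maps datetime.date -> float, with NaN marking missing days.
    skipna=True ignores NaNs; skipna=False lets any NaN make the month NaN.
    A month with no valid values is NaN either way.
    """
    by_month = defaultdict(list)
    for day, value in daily_values.items():
        by_month[(day.year, day.month)].append(value)

    means = {}
    for month, values in sorted(by_month.items()):
        if skipna:
            values = [v for v in values if not math.isnan(v)]
        # With skipna=False, a NaN in values makes the sum (and mean) NaN
        means[month] = sum(values) / len(values) if values else math.nan
    return means


daily = {date(2020, 1, 1): 1.0, date(2020, 1, 2): math.nan,
         date(2020, 1, 3): 3.0, date(2020, 2, 1): math.nan}
if not year_is_empty(daily):
    print(monthly_means(daily, skipna=True))
```

Here January averages to 2.0 with skipna=True, while February, which holds only NaNs, stays NaN.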
- Function Location:
- ecco_cloud_utils/records.py
- Description:
- Creates an empty record (data array) matching the model grid format with basic metadata information and variable/coordinate structure.
- Arguments:
- standard_name -- Data field standard name
- long_name -- Data field long name
- units -- Data field units
- record_date -- Date of the current data file
- model_grid -- Model grid dataset
- model_grid_type -- Type of model grid (llc or latlon)
- array_precision -- Precision to use (‘float32’)
- Return Values:
- data_da -- Empty data array matching format of model grid
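The idea of an empty record can be sketched as a NaN-filled array plus basic metadata. This is a hypothetical stand-in: the real function returns an xarray data array built from model_grid, whereas this sketch takes an explicit grid_shape (for example (ny, nx) for a latlon grid, or (13, 90, 90) for a 13-tile LLC90 grid) and keeps the metadata in a plain dict. The attribute keys are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EmptyRecord:
    """Hypothetical stand-in for the xarray data array the real function returns."""
    values: list
    attrs: dict = field(default_factory=dict)


def _nan_array(shape):
    """Nested lists of NaN with the given shape, e.g. (ny, nx) or (tiles, ny, nx)."""
    if len(shape) == 1:
        return [math.nan] * shape[0]
    return [_nan_array(shape[1:]) for _ in range(shape[0])]


def make_empty_record_sketch(standard_name, long_name, units, record_date,
                             grid_shape, array_precision="float32"):
    """NaN-filled record with basic metadata (hypothetical signature)."""
    attrs = {
        "standard_name": standard_name,
        "long_name": long_name,
        "units": units,
        "time": record_date.isoformat(),
        "precision": array_precision,
    }
    return EmptyRecord(_nan_array(grid_shape), attrs)
```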
- Function Location:
- ecco_cloud_utils/records.py
- Description:
- Saves binary and/or netCDF4 versions of the supplied data under the supplied filename, as selected by save_binary and save_netcdf.
- Arguments:
- data -- Data to be saved. If data is a dataset, data_var must contain the name of the variable to save.
- output_filename -- Filename of output files
- binary_fill_value -- Fill value for binary file
- netcdf_fill_value -- Fill value for netCDF file
- netcdf_output_dir -- Directory to save netCDF file
- binary_output_dir -- Directory to save binary file
- binary_output_dtype -- dtype of the binary output file
- model_grid_type -- Type of model grid (llc or latlon)
- save_binary -- Boolean to save binary file (default True)
- save_netcdf -- Boolean to save netCDF file (default True)
- data_var -- Name of the variable in data to save (default: empty)
- Return Values:
- No return value
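The binary half of the save step can be sketched with the standard struct module: NaNs are replaced by binary_fill_value and the values are packed as 32-bit floats. The big-endian byte order is an assumption (a common convention for MITgcm/ECCO binary files); the real function takes the type from binary_output_dtype. The function names are hypothetical, and netCDF output is not shown.

```python
import math
import struct


def to_binary_bytes(values, binary_fill_value, byte_order=">"):
    """Pack floats as 32-bit IEEE values, replacing NaN with the fill value.

    byte_order ">" (big-endian) is an assumption; see binary_output_dtype.
    """
    filled = [binary_fill_value if math.isnan(v) else v for v in values]
    return struct.pack(f"{byte_order}{len(filled)}f", *filled)


def write_binary_sketch(values, path, binary_fill_value, byte_order=">"):
    """Write the packed values to path as a raw binary file (no header)."""
    with open(path, "wb") as f:
        f.write(to_binary_bytes(values, binary_fill_value, byte_order))
```

A multi-dimensional field would be flattened (for example in row-major order) before packing, since a raw binary file stores no shape information.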
- Function Location:
- ecco_cloud_utils/mapping.py
- Description:
- Transforms source data to the supplied model grid.
- Arguments:
- source_indicies_within_target_radius_i -- Dictionary where each key is a target index and each value is the list of source indices within that target's grid radius
- num_source_indicies_within_target_radius_i -- Array giving, for each target index i, the number of source indices within that target's grid radius
- nearest_source_index_to_target_index_i -- Dictionary where each key is a target index and each value is the closest source index
- source_field -- 2D field from the source data
- target_grid_shape -- Shape of the target grid 2D array
- operation -- Transformation operation (default “mean”, supports “mean”, “nanmean”, “median”, “nanmedian”, and “nearest”)
- allow_nearest_neighbor -- Boolean to use nearest neighbor if no data points are found in the model grid cell (default True)
- Return Values:
- source_on_target_grid -- Transformed source data, now on the supplied model grid
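The per-target-cell reduction can be sketched in plain Python. This is an illustrative reimplementation, not the library code. It assumes the mapping arguments use flat (row-major) indices into the source and target grids; the short parameter names within_radius, num_within_radius, and nearest_index stand in for the three documented mapping arguments; and NaN handling mirrors NumPy (mean/median propagate NaN, nanmean/nanmedian skip it).

```python
import math
import statistics


def _reduce(values, operation):
    """Combine the source values that fall in one target cell."""
    if operation in ("nanmean", "nanmedian"):
        values = [v for v in values if not math.isnan(v)]
        if not values:
            return math.nan
    elif any(math.isnan(v) for v in values):
        return math.nan  # plain mean/median propagate NaN, as in NumPy
    if operation in ("mean", "nanmean"):
        return sum(values) / len(values)
    if operation in ("median", "nanmedian"):
        return statistics.median(values)
    raise ValueError(f"unsupported operation: {operation}")


def transform_to_target_grid_sketch(within_radius, num_within_radius,
                                    nearest_index, source_field,
                                    target_grid_shape, operation="mean",
                                    allow_nearest_neighbor=True):
    """Map a 2D source field onto a 2D target grid (flat row-major indices)."""
    flat_source = [v for row in source_field for v in row]
    ny, nx = target_grid_shape
    flat_target = [math.nan] * (ny * nx)

    for i in range(ny * nx):
        empty_cell = num_within_radius[i] == 0
        if operation == "nearest" or (empty_cell and allow_nearest_neighbor):
            # Take the closest source value, if a nearest index is known
            j = nearest_index.get(i)
            if j is not None:
                flat_target[i] = flat_source[j]
        elif not empty_cell:
            cell_values = [flat_source[j] for j in within_radius[i]]
            flat_target[i] = _reduce(cell_values, operation)

    # Reshape the flat result back to (ny, nx)
    return [flat_target[r * nx:(r + 1) * nx] for r in range(ny)]
```

Target cells with no source points stay NaN unless allow_nearest_neighbor is True, in which case they take the closest source value.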
- Function Location:
- ecco_cloud_utils/mapping.py
- Description:
- Creates the mapping from the source grid to the target grid used for bin averaging, with a nearest-neighbour fallback for target grid cells that contain no source data points.
- Arguments:
- source_grid -- Source grid definition from pyresample in generalized_grid_product()
- target_grid -- Target grid definition from pyresample using SwathDefinition() and the target grid’s dimension values
- target_grid_radius -- effective_grid_radius from target grid
- source_grid_min_L -- Minimum length of data product grid cells
- source_grid_max_L -- Maximum length of data product grid cells
- neighbours -- Number of neighbours to use (default 100)
- Return Values:
- source_indicies_within_target_radius_i -- Dictionary where each key is a target index and each value is the list of source indices within that target's grid radius
- num_source_indicies_within_target_radius_i -- Array giving, for each target index i, the number of source indices within that target's grid radius
- nearest_source_index_to_target_index_i -- Dictionary where each key is a target index and each value is the closest source index
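A brute-force sketch of the mapping step, using great-circle (haversine) distances on a sphere. It is a stand-in only and does not reproduce whatever neighbour search the real function performs with the pyresample grid definitions. Points are (lat, lon) tuples in degrees, target_grid_radius is in metres, neighbours caps how many of the closest source points are considered, and source_grid_min_L/source_grid_max_L are not used. All names are hypothetical; the three return values follow the documented outputs.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def find_mappings_sketch(source_points, target_points, target_grid_radius,
                         neighbours=100):
    """Brute-force source-to-target mapping.

    Returns (within_radius, num_within_radius, nearest_index), where
    within_radius[i] lists the source indices within target_grid_radius
    metres of target i (among its `neighbours` closest source points),
    num_within_radius[i] counts them, and nearest_index[i] is the closest
    source index. Assumes source_points is non-empty.
    """
    within_radius, num_within_radius, nearest_index = {}, [], {}
    for i, (t_lat, t_lon) in enumerate(target_points):
        # Rank all source points by distance to this target point
        ranked = sorted(
            (haversine_m(t_lat, t_lon, s_lat, s_lon), j)
            for j, (s_lat, s_lon) in enumerate(source_points)
        )[:neighbours]
        within_radius[i] = [j for dist, j in ranked if dist <= target_grid_radius]
        num_within_radius.append(len(within_radius[i]))
        nearest_index[i] = ranked[0][1]
    return within_radius, num_within_radius, nearest_index
```

This costs O(targets × sources) distance evaluations, which is fine for illustration but far too slow for a global grid; that is why spatial-index neighbour searches are normally used instead.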
- This file contains dataset-specific functions used during the transformation step of the pipeline. Each function is written for select datasets and is applied only to a dataset whose transformation configuration file names that function.
- Pre-transformation functions run before the transformation itself and operate on the raw harvested dataset.
- Post-transformation functions run immediately after the transformation and operate on the transformed data array.
- All pre-transformation functions take a dataset argument, the unmodified data collected by the harvester, and return that dataset with their modifications applied.
- RDEFT4_remove_negative_values():
- Used on the RDEFT4 sea ice dataset. For every field other than “lat” and “lon”, replaces negative values with NaN.
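A minimal sketch of the RDEFT4 masking rule, with the dataset represented as a plain dict mapping field name to values. The dict is a hypothetical stand-in for the harvested dataset, and "sea_ice_thickness" in the example is an illustrative field name.

```python
import math


def remove_negative_values_sketch(fields):
    """Replace negative values with NaN in every field except "lat" and "lon".

    fields: dict mapping field name -> list of floats (hypothetical stand-in
    for the harvested dataset). NaNs stay NaN because NaN < 0 is False.
    """
    cleaned = {}
    for name, values in fields.items():
        if name in ("lat", "lon"):
            cleaned[name] = list(values)
        else:
            cleaned[name] = [math.nan if v < 0 else v for v in values]
    return cleaned


print(remove_negative_values_sketch(
    {"lat": [-60.0, 70.0], "sea_ice_thickness": [1.2, -1.0, 0.0]}))
```

With xarray, the same masking for one variable could be written as `ds[name] = ds[name].where(ds[name] >= 0)`, since `DataArray.where()` fills positions where the condition is False with NaN.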
- All post-transformation functions take two arguments, the transformed data array and the name of its field, and return that data array with their modifications applied.
- avhrr_sst_kelvin_to_celsius():
- Used on the AVHRR sea surface temperature dataset. If the field is “analysed_sst”, sets the units attribute to “Celsius” and converts each value from Kelvin to Celsius.
- seaice_concentration_to_fraction():
- Used on sea ice datasets. If the field is “ice_conc”, sets the units attribute to “1” and converts each value from a percentage to a fraction.
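Both unit conversions follow the post-transformation interface: a data array plus the field name in, the modified data array out. The sketch below uses a small hypothetical container in place of an xarray data array and assumes the unit attribute is stored under the CF-style key "units".

```python
from dataclasses import dataclass, field


@dataclass
class DataArraySketch:
    """Hypothetical stand-in for a transformed xarray data array."""
    values: list
    attrs: dict = field(default_factory=dict)


def avhrr_sst_kelvin_to_celsius_sketch(da, field_name):
    """Convert analysed_sst from Kelvin to Celsius (T_C = T_K - 273.15)."""
    if field_name == "analysed_sst":
        da.attrs["units"] = "Celsius"
        da.values = [v - 273.15 for v in da.values]
    return da


def seaice_concentration_to_fraction_sketch(da, field_name):
    """Convert ice_conc from a percentage (0-100) to a fraction (0-1)."""
    if field_name == "ice_conc":
        da.attrs["units"] = "1"
        da.values = [v / 100.0 for v in da.values]
    return da
```

Any other field name passes through unchanged, which lets the same function run safely on every field of a dataset.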
- MEaSUREs_fix_time():
- Used on the MEaSUREs dataset. Takes the time given by the dataset, strips everything but the date, and sets that as the new start_time. It then adds one day to get the new end_time. These new values are used as the time_start and time_end values in the data array.
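The time fix can be sketched with the standard datetime module. This assumes the dataset's time is an ISO 8601 string such as "2015-03-07T12:00:00" and that ISO strings are an acceptable output; both are assumptions, since the actual input and output formats are not documented here. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def measures_fix_time_sketch(time_string):
    """Return (start_time, end_time) ISO strings for one MEaSUREs record.

    Assumption: time_string is ISO 8601 without a trailing "Z", e.g.
    "2015-03-07T12:00:00".
    """
    stamp = datetime.fromisoformat(time_string)
    # Keep only the date: midnight at the start of that day
    start = datetime(stamp.year, stamp.month, stamp.day)
    # end_time is one day after start_time
    end = start + timedelta(days=1)
    return start.isoformat(), end.isoformat()
```

Using timedelta rather than incrementing the day number directly handles month, leap-day, and year boundaries automatically.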