|
| 1 | +# Calling python code from EAMxx atmosphere processes |
| 2 | + |
| 3 | +## Requirements |
| 4 | + |
| 5 | +In order to call python code from an EAMxx atmosphere process, |
| 6 | +EAMxx must be build with the CMake option `EAMXX_ENABLE_PYTHON=ON`, |
| 7 | +and the CMake variable `Python_EXECUTABLE` must point to a python3 |
| 8 | +executable, with python version >= 3.9. Additionally, the python package |
| 9 | +`pybind11` must be installed (e.g., via pip or conda). |
| 10 | + |
| 11 | +If `EAMXX_ENABLE_PYTHON=OFF`, none of the code that is needed to call |
| 12 | +python from EAMxx will be compiled. |
| 13 | + |
| 14 | +## Usage |
| 15 | + |
| 16 | +If python support is enabled, every atmosphere process stores |
| 17 | +data structures that can hold python-compatible arrays and modules. |
| 18 | +During construction, if the input parameter list contains a non-trivial |
| 19 | +entry for the key `py_module_name`, EAMxx will automatically set up |
| 20 | +these data structures. In particular, EAMxx will |
| 21 | + |
| 22 | + - create python-compatible arrays for each of the input/output/internal |
| 23 | + fields that are registered in the class. These are stored in two maps: |
| 24 | + `m_py_fields_dev` and `m_py_fields_host`, which store python-compatible |
| 25 | + arrays for the device and host views of the Field, respectively. The maps |
| 26 | + are in fact nested maps, so that the python-compatible array for field X |
| 27 | + on grid Y can be retrieved via `m_py_fields_host[Y][X]`. |
| 28 | + - load the python module provided via parameter list, so that its interfaces |
| 29 | + can later be called during init/run phases. The module is then stored in |
| 30 | + the local member `m_py_module`. If the module is in a non-standard path, |
| 31 | + the parameter list entry `py_module_path` can be used to specify its path, |
| 32 | + which will be added to python's search path before loading the module. |
| 33 | + |
| 34 | +Due to implementation details in the pybind11 library, and to avoid compiler warnings, |
| 35 | +all the python-compatible data structures are stored wrapped inside `std::any` objects. |
| 36 | +As such, the need to be properly casted to the correct underlying type before being used. |
| 37 | +In particular, the fields and module can be casted as follows: |
| 38 | + |
| 39 | +```c++ |
| 40 | + auto& f = std::any_cast<pybind11::array&>(m_py_fields_host[grid_name][fname]); |
| 41 | + auto& pymod = std::any_cast<pybind11::module&>(m_py_module); |
| 42 | +``` |
| 43 | + |
| 44 | +Once the module is available, a function from it can be called using the `attr` method of |
| 45 | +the module. For instance, if the module had a function `run` that takes 2 arrays and a double |
| 46 | +(in this order), it can be invoked via |
| 47 | + |
| 48 | +```c++ |
| 49 | + pymod.attr("run")(f1,f2,my_double); |
| 50 | +``` |
| 51 | +where `f1` and `f2` are of type `pybind11::array` (e.g., casted from objects in `m_py_fields_host`). |
| 52 | + |
| 53 | +## Example |
| 54 | + |
| 55 | +We provided an example of how to use this feature in `eamxx_cld_fraction_process_interface.cpp`, |
| 56 | +which is a very small and simple atmosphere process. We paste here the code, which shows how |
| 57 | +to support both C++ and python implemenation in the same cpp file |
| 58 | + |
| 59 | +```c++ |
| 60 | +#ifdef EAMXX_HAS_PYTHON |
| 61 | + if (m_py_module.has_value()) { |
| 62 | + // For now, we run Python code only on CPU |
| 63 | + const auto& py_fields = m_py_fields_host.at(m_grid->name()); |
| 64 | + |
| 65 | + const auto& py_qi = std::any_cast<const py::array&>(py_fields.at("qi")); |
| 66 | + const auto& py_liq_cld_frac = std::any_cast<const py::array&>(py_fields.at("cldfrac_liq")); |
| 67 | + const auto& py_ice_cld_frac = std::any_cast<const py::array&>(py_fields.at("cldfrac_ice")); |
| 68 | + const auto& py_tot_cld_frac = std::any_cast<const py::array&>(py_fields.at("cldfrac_tot")); |
| 69 | + const auto& py_ice_cld_frac_4out = std::any_cast<const py::array&>(py_fields.at("cldfrac_ice_for_analysis")); |
| 70 | + const auto& py_tot_cld_frac_4out = std::any_cast<const py::array&>(py_fields.at("cldfrac_tot_for_analysis")); |
| 71 | + |
| 72 | + // Sync input to host |
| 73 | + liq_cld_frac.sync_to_host(); |
| 74 | + |
| 75 | + const auto& py_module = std::any_cast<const py::module&>(m_py_module); |
| 76 | + double ice_threshold = m_params.get<double>("ice_cloud_threshold"); |
| 77 | + double ice_4out_threshold = m_params.get<double>("ice_cloud_for_analysis_threshold"); |
| 78 | + py_module.attr("main")(ice_threshold,ice_4out_threshold,py_qi,py_liq_cld_frac,py_ice_cld_frac,py_tot_cld_frac,py_ice_cld_frac_4out,py_tot_cld_frac_4out); |
| 79 | + |
| 80 | + // Sync outputs to dev |
| 81 | + qi.sync_to_dev(); |
| 82 | + liq_cld_frac.sync_to_dev(); |
| 83 | + ice_cld_frac.sync_to_dev(); |
| 84 | + tot_cld_frac.sync_to_dev(); |
| 85 | + ice_cld_frac_4out.sync_to_dev(); |
| 86 | + tot_cld_frac_4out.sync_to_dev(); |
| 87 | + } else |
| 88 | +#endif |
| 89 | + { |
| 90 | + auto qi_v = qi.get_view<const Pack**>(); |
| 91 | + auto liq_cld_frac_v = liq_cld_frac.get_view<const Pack**>(); |
| 92 | + auto ice_cld_frac_v = ice_cld_frac.get_view<Pack**>(); |
| 93 | + auto tot_cld_frac_v = tot_cld_frac.get_view<Pack**>(); |
| 94 | + auto ice_cld_frac_4out_v = ice_cld_frac_4out.get_view<Pack**>(); |
| 95 | + auto tot_cld_frac_4out_v = tot_cld_frac_4out.get_view<Pack**>(); |
| 96 | + |
| 97 | + CldFractionFunc::main(m_num_cols,m_num_levs,m_icecloud_threshold,m_icecloud_for_analysis_threshold, |
| 98 | + qi_v,liq_cld_frac_v,ice_cld_frac_v,tot_cld_frac_v,ice_cld_frac_4out_v,tot_cld_frac_4out_v); |
| 99 | + } |
| 100 | +``` |
| 101 | +A few observations: |
| 102 | +
|
| 103 | + - `m_py_module.has_value()` is a good way to check if the `std::any` object is storing anything or it's empty. |
| 104 | + If empty, it means that the user did not specify the `py_module_name` input option. In this case, we interpret |
| 105 | + this as "proceed with the C++ implementation", but of course, another process may only offer a python |
| 106 | + implementation, in which case it would make sense to error out if the check fails. |
| 107 | + - the namespace alias `py = pybind11` was used in this implementation. This is a common (and sometimes |
| 108 | + recommended) practice. |
| 109 | + - when casting to pybind11 data structures, we used const references, for both inputs and outputs. The reason |
| 110 | + for using a reference is to avoid copy construction of pybind11 structures (even though they are usually |
| 111 | + lightweight). The const qualifier is not really important, as python has no corresponding concept, and the |
| 112 | + code would have been perfectly fine (and working the same way) without `const`. |
| 113 | + - when passing host arrays to python, keep in mind that EAMxx only requires that device views be kept up to |
| 114 | + date by atmosphere processes. Hence, you must take care of syncing to host all inputs before calling the |
| 115 | + python interfaces, as well as syncing to device the outputs upon return. |
| 116 | +
|
| 117 | +The python implemenetation of the CldFraction process is provided in `cld_fraction.py`, in the same folder |
| 118 | +as the process interface. |
0 commit comments