Skip to content

Subscribed messages which go out of scope get garbage collected and cause undefined behaviour #676

Open
@dpad

Description

@dpad

Describe the bug

When creating a stand-alone message, as in the Creating Stand-Alone Messages example, if the user creates the message such that it goes out of scope and gets garbage-collected by Python, the subscribing module will still refer to the subscribed memory location.

This causes buggy undefined behaviour, including using potential garbage data as inputs to the module.

Impact
The faulty behaviour which occurs from this is extremely hard to track down and even harder to explain for users without a deep understanding of the C++/Python interaction underlying Basilisk.

For example, say a user sets up the MsisAtmosphere model with some initial space-weather parameters (in some stand-alone messages). Due to this bug, the MsisAtmosphere model never actually computes an appropriate atmospheric density, and the user has basically no idea why.

To reproduce
The example below sets an initial message to the CppModuleTemplate and checks that the output values are computed from that initial message. However, because the initial message goes out of scope and is garbage collected, its data gets overwritten with random garbage and still gets used by the CppModuleTemplate.

from Basilisk.architecture import messaging
from Basilisk.moduleTemplates import cppModuleTemplate
from Basilisk.utilities import SimulationBaseClass
from Basilisk.utilities import macros

import numpy as np

def addLocalStandaloneMessage(module):
    """
    This function creates a stand-alone message in the local function scope and
    subscribes the given module to it.

    Once this function returns, the message goes out of scope and is destroyed
    by the Python garbage collector, and its memory gets re-used.
    This causes a silent failure in the module, where it simply uses whatever
    value is in the re-used memory without realising it's not a message.
    """
    # create stand-alone input message
    msgData = messaging.CModuleTemplateMsgPayload()
    msgData.dataVector = [10., 20., 30.]
    msg = messaging.CModuleTemplateMsg().write(msgData)

    # connect to stand-alone msg
    module.dataInMsg.subscribeTo(msg)

def test_StandaloneMessageGoesOutOfScope():
    #  Create a sim module as an empty container
    scSim = SimulationBaseClass.SimBaseClass()

    #  create the simulation process
    dynProcess = scSim.CreateNewProcess("dynamicsProcess")

    # create the dynamics task and specify the integration update time
    dynProcess.addTask(scSim.CreateNewTask("dynamicsTask", macros.sec2nano(1.)))

    # create modules
    mod1 = cppModuleTemplate.CppModuleTemplate()
    mod1.ModelTag = "cppModule1"
    scSim.AddModelToTask("dynamicsTask", mod1)

    # setup message recording
    msgRec = mod1.dataOutMsg.recorder()
    scSim.AddModelToTask("dynamicsTask", msgRec)

    # Subscribe to the input message in a local scope, so that it doesn't exist here.
    addLocalStandaloneMessage(mod1)

    #  initialize Simulation:
    scSim.InitializeSimulation()

    #   configure a simulation stop time and execute the simulation run
    scSim.ConfigureStopTime(macros.sec2nano(3.0))
    scSim.ExecuteSimulation()

    # Confirm that the simulated data matches the expected
        expected = np.array([
        [11., 20., 30.],
        [12., 20., 30.],
        [13., 20., 30.],
        [14., 20., 30.]
    ])
    assert (msgRec.dataVector == expected).all()

if __name__ == "__main__":
    test_StandaloneMessageGoesOutOfScope()

The above test will fail, because the msg and msgData get destroyed after exiting the addLocalStandaloneMessage function scope. Because of this, the msgData.dataVector memory location gets re-used and overwritten with unknown data, but the mod1 module still reads from that location, so the msgRec.dataVector ends up with weird random garbage values in it.

Expected behavior

Any message that is subscribed to should not be allowed to get garbage collected. A simple fix is given below in newMessaging.ih, although I'm not sure if this is a totally appropriate fix (and I don't know if there are other places that would need to change). With this fix, the above test case passes.

diff --git a/src/architecture/messaging/newMessaging.ih b/src/architecture/messaging/newMessaging.ih
index b382c842ac..9cde823538 100644
--- a/src/architecture/messaging/newMessaging.ih
+++ b/src/architecture/messaging/newMessaging.ih
@@ -38,12 +38,18 @@ OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
             def subscribeTo(self, source):
                 if type(source) == messageType:
                     self.__subscribe_to(source)
+                    source.thisown = False
                     return
                 
                 try:
                     from Basilisk.architecture.messaging.messageType ## Payload import messageType ## _C
                     if type(source) == messageType ## _C:
                         self.__subscribe_to_C(source)
+                        source.thisown = False
                         return
                 except ImportError:
                     pass
-- 

I imagine a correct fix would need to take ownership of the subscribed message, or otherwise increment its reference count so that it doesn't get garbage collected until the subscription is removed. But I have zero experience with SWIG so I don't really know the best way to do this, nor the memory impact of keeping these objects alive.

Screenshots
If applicable, add screenshots/plots to help explain your problem.

Desktop (please complete the following information):

  • OS: Linux
  • Basilisk Version: 2.2.2
  • Python Version: 3.10.12

Additional context

Probably relevant? https://www.swig.org/Doc4.2/Python.html#Python_memory_management_member_variables

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

Status

👀 In review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions