Add cnn model #813
Conversation
num_conv_layers=5,
conv_filters=[5, 10, 20, 40, 60],
kernel_size=3,
image_size=(8, 9, 22),  # dimensions of the example image
Would be neat to add a property to the ImageDefinition that contains the resulting image dimension. E.g. ImageDefinition.shape
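A minimal sketch of what such a property could look like, assuming the ImageDefinition receives its image dimensions at construction (the `_image_size` attribute is hypothetical):

from typing import Tuple

class ImageDefinition:
    """Sketch; only the suggested property is shown."""

    def __init__(self, image_size: Tuple[int, int, int]) -> None:
        self._image_size = image_size  # e.g. (8, 9, 22)

    @property
    def shape(self) -> Tuple[int, int, int]:
        """Dimensions of the resulting image."""
        return self._image_size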
| """CNN-specific modules, for performing the main learnable operations.""" | ||
|
|
||
| from .cnn import CNN | ||
| from .theos_muonE_upgoing import TheosMuonEUpgoing |
.theos_muonE_upgoing breaks with the snake-case convention. Do we need "theos" in there? It's very jargony; credit can be given in the associated docstring instead of the module name.
| """Initialize the Lightning CNN signal classifier (LCSC). | ||
| Args: | ||
| num_input_features (int): Number of input features. |
Great with the detailed argument descriptions, but they break the existing conventions used in the library. The types should not be repeated within the docstring itself, as our documentation tooling automatically adds them to the compiled documentation based on the type hints in the code.
I.e.
num_input_features (int): Number of input features.
should be
num_input_features: Number of input features.
You can see the docstring for DynEdge here and the resulting documentation here
| """ | ||
| super().__init__(nb_inputs=num_input_features, nb_outputs=out_put_dim) | ||
|
|
||
| # Check input parameters |
There's quite a bit of parsing in the init here. Looks like you're doing two things: checking incompatible arguments (raising errors) and parsing the acceptable arguments for subsequent use in the layer building. You could instead move this logic into one or more private methods that are used in the init function - this will improve the readability greatly. For example:
def __init__(self, param1: type, param2: type):
    """Docstring."""
    # Check and parse input parameters
    filters, kernel_sizes, padding, ... = self._parse_conv_arguments(param1=param1, param2=param2)
    pooling_sizes, pooling_strides, ... = self._parse_pooling_arguments(param1=param1, param2=param2)
    # Set Convolution Layers
    self._set_conv_layers(
        filters=filters,
        kernel_sizes=kernel_sizes,
        ...,
        pooling_sizes=pooling_sizes,
    )
    # Set Linear layers
    self.flatten = torch.nn.Flatten()
    self.fc1 = torch.nn.Linear(latent_dim, num_fc_neurons)
    self.fc2 = torch.nn.Linear(num_fc_neurons, out_put_dim)
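For illustration, one of those private methods could look roughly like this (a sketch; argument names are illustrative, and `List`, `Tuple`, `Union` come from `typing`):

def _parse_conv_arguments(
    self,
    num_conv_layers: int,
    conv_filters: List[int],
    kernel_size: Union[int, List[int]],
) -> Tuple[List[int], List[int]]:
    """Check and parse convolution-related arguments."""
    # Raise on incompatible arguments
    if len(conv_filters) != num_conv_layers:
        raise ValueError(
            f"Expected {num_conv_layers} filter counts, got {len(conv_filters)}."
        )
    # Broadcast a scalar kernel size to one entry per layer
    if isinstance(kernel_size, int):
        kernel_sizes = [kernel_size] * num_conv_layers
    else:
        kernel_sizes = kernel_size
    return conv_filters, kernel_sizes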
def forward(self, data: Data) -> torch.Tensor:
    """Forward pass of the LCSC."""
    assert len(data.x) == 1, "Only Main Array image is supported for LCSC"
This assertion checks that a single image is produced by the image representation as opposed to multiple, not that a specific image representation is used, e.g. "main array".
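For example, the message could state what is actually being checked (a sketch):

assert len(data.x) == 1, (
    "LCSC expects a single image from the image representation, "
    f"but got {len(data.x)} images."
)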
https://github.com/AlexHarn)
Intended to be used with the IceCube 86 image containing
only the Main Array image.
Is it correctly understood that this method will work with any single-image representation, but that the method and default parameters were tested and selected based on IceCube simulation and a particular representation that utilizes the main array only? If so, I think adjusting this sentence would be wise.
Theo Glauchs thesis (chapter 5.3):
https://mediatum.ub.tum.de/node?id=1584755
NOTE: number of pulses per cluster is not mentioned/used in the thesis
How is this supposed to be understood? Do you mean that introducing this within the method is your own creation?
@@ -0,0 +1,411 @@
"""CNN used for muon energy reconstruction in IceCube.
src/graphnet/models/cnn/theos_muonE_upgoing.py breaks with the snake-case convention. Do we strictly need "theos" in the module name? Proper credits can be given in the module docstring.
class Conv3dBN(LightningModule):
    """The Conv3dBN module from Theos CNN model."""
Theos -> Theo Glauch
Consider adding a bit more detail to inform the reader of what this module is. E.g.
"""Implementation of the Conv3dBN image convolution module from Theo Glauch."""
class InceptionBlock4(LightningModule):
    """The inception_block4 module from Theos CNN model."""
Comments above apply here too.
class InceptionResnet(LightningModule):
    """The inception_resnet module from Theos CNN model."""
Comments from above apply here, too.
return x + self._scale * tmp


class TheosMuonEUpgoing(CNN):
I don't think this is the official name of the method, and to my knowledge, nothing within the method restricts it to upgoing events only. I would strongly suggest finding a more accessible name for the method. I believe it's more commonly known as the "DNN" within IceCube, no? You can use the docstring to provide further details on its origin. I.e. proper credits to Theo and his use of the method.
class TheosMuonEUpgoing(CNN):
    """The TheosMuonEUpgoing module."""

    def __init__(self, nb_inputs: int = 15, nb_outputs: int = 16) -> None:
Is there a good reason why the hyperparameters of the method are hardcoded?
If not, let's please make them arguments, as that will greatly increase the reusability of the method.
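A sketch of what that could look like (the extra hyperparameters shown here are hypothetical placeholders, not the actual ones from the PR; `Optional` and `List` come from `typing`):

def __init__(
    self,
    nb_inputs: int = 15,
    nb_outputs: int = 16,
    conv_filters: Optional[List[int]] = None,  # hypothetical hyperparameter
    dropout_rate: float = 0.1,  # hypothetical hyperparameter
) -> None:
    """Construct the CNN.

    Args:
        nb_inputs: Number of input features per pixel.
        nb_outputs: Output dimension of the model.
        conv_filters: Number of filters per convolutional block.
        dropout_rate: Dropout probability between blocks.
    """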
Args:
    dtype: data type used for node features. e.g. ´torch.float´
    string_label: Name of the feature corresponding
        to the DOM string number. Values Integers betweem 1 - 86
betweem -> between
self._include_upper_dc = include_upper_dc

# read mapping from parquet file
df = pd.read_parquet(IC86_CNN_MAPPING)
Would there be a way for us to compile the mapping at instantiation without relying on a file?
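For instance, the key-value store could be compiled from the detector's geometry table at instantiation (a sketch; `_grid_position_for_row` is a hypothetical helper that turns one geometry row into image indices):

import pandas as pd
from graphnet.models.detector.icecube import IceCube86

detector = IceCube86()
geometry = detector.geometry_table.reset_index(drop=False)
# _grid_position_for_row (hypothetical) maps one geometry row to a dict with
# keys "string", "dom_number", "mat_ax0", "mat_ax1", "mat_ax2"
records = [_grid_position_for_row(row) for _, row in geometry.iterrows()]
df = pd.DataFrame.from_records(records).set_index(["string", "dom_number"])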
self._mapping = df
super().__init__(pixel_feature_names=pixel_feature_names)

def _set_indeces(
_set_indices
self._sensor_number_label = sensor_number_label
self._pixel_feature_names = pixel_feature_names

self._set_indeces(
_set_indices
self._dom_number_label = dom_number_label
self._pixel_feature_names = pixel_feature_names

self._set_indeces(pixel_feature_names, dom_number_label, string_label)
_set_indices
self._mapping = df
super().__init__(pixel_feature_names=pixel_feature_names)

def _set_indeces(
_set_indices
    row[3],  # mat_ax1
] = batch_row_features[i]

# unqueeze to add dimension for batching
unqueeze -> unsqueeze
)

# data.x is expected to be a tensor with shape (N, F)
# where N is the number of nodes and F is the number of features.
Rows represent pixels, right?
match_indices = self._mapping.loc[
    zip(*string_dom_number.t().tolist())
][
    ["string", "dom_number", "mat_ax0", "mat_ax1", "mat_ax2"]
Looks like your method relies on a very specific set of column names in this file.
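One way to make that dependency explicit is to declare the required columns as a module-level constant and validate the file on load (a sketch):

REQUIRED_MAPPING_COLUMNS = ("string", "dom_number", "mat_ax0", "mat_ax1", "mat_ax2")

df = pd.read_parquet(IC86_CNN_MAPPING)
missing = set(REQUIRED_MAPPING_COLUMNS) - set(df.columns)
if missing:
    raise ValueError(f"Mapping file is missing required columns: {missing}")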
Hey @sevmag - thank you for implementing CNNs!! 🚀 I like the approach you've taken, and I think the PR is generally in pretty good shape. In addition to the specific comments above, I've been thinking that we can simplify the user experience and eliminate the need for new files by introducing a slight refactor of the "Pixelmapping," which changes the role it plays in the image representation. In essence, I propose that "Pixelmapping" (referred to as "GridDefinition" below) defines the number of images, their sizes, and a method for generating the key-value store(s) that is used to insert pixels into the grid(s) using the existing Detector classes. The functionality of generating grids and inserting pixels would be handled by the image representation. More details below. Could you take a look and let me know if this fits your use-case?

Preluding observations
These two observations essentially foresee the existence of two central arguments for image representations. I summarize my proposed scope of each below:

PixelDefinition

In pseudo-code, the ImageRepresentation could look something like this:

from typing import Optional, List, Dict, Union, Tuple, Any, Callable
from numpy.random import Generator
import numpy as np
import pandas as pd
import torch
from torch_geometric.data import Data
from graphnet.models.data_representation import DataRepresentation
from graphnet.models.detector import Detector
from graphnet.models.graphs.nodes import NodeDefinition
class ImageRepresentation(DataRepresentation):
""" A base class for image representations in GraphNeT."""
def __init__(self,
pixel_definition: NodeDefinition,
grid_definition: GridDefinition,
input_feature_names: Optional[List[str]] = None,
dtype: Optional[torch.dtype] = torch.float,
perturbation_dict: Optional[Dict[str, float]] = None,
seed: Optional[Union[int, Generator]] = None,
add_inactive_sensors: bool = False,
sensor_mask: Optional[List[int]] = None,
string_mask: Optional[List[int]] = None,
repeat_labels: bool = False, ) -> None:
# Base class constructor
super().__init__(
detector=grid_definition.detector, # defines detector
input_feature_names=input_feature_names,
dtype=dtype,
perturbation_dict=perturbation_dict,
seed=seed,
add_inactive_sensors=add_inactive_sensors,
sensor_mask=sensor_mask,
string_mask=string_mask,
repeat_labels=repeat_labels,
)
self._pixel_definition = pixel_definition
self._grid_definition = grid_definition
self._pixel_mappings = grid_definition.mappings() # yields key-value store(s)
self._image_shapes = grid_definition.shape # Shape of image(s)
self._map_pixels_by = self._grid_definition.map_pixels_by
def forward( # type: ignore
self,
input_features: np.ndarray,
input_feature_names: List[str],
truth_dicts: Optional[List[Dict[str, Any]]] = None,
custom_label_functions: Optional[Dict[str, Callable[..., Any]]] = None,
loss_weight_column: Optional[str] = None,
loss_weight: Optional[float] = None,
loss_weight_default_value: Optional[float] = None,
data_path: Optional[str] = None,
) -> Data:
"""Construct graph as ´Data´ object.
Args:
input_features: Input features for graph construction.
Shape ´[num_rows, d]´
input_feature_names: name of each column. Shape ´[,d]´.
truth_dicts: Dictionary containing truth labels.
custom_label_functions: Custom label functions.
loss_weight_column: Name of column that holds loss weight.
Defaults to None.
loss_weight: Loss weight associated with event. Defaults to None.
loss_weight_default_value: default value for loss weight.
Used in instances where some events have
no pre-defined loss weight. Defaults to None.
data_path: Path to dataset data files. Defaults to None.
Returns:
graph
"""
# Process low-level pulses using base-class
data = super().forward(
input_features=input_features,
input_feature_names=input_feature_names,
truth_dicts=truth_dicts,
custom_label_functions=custom_label_functions,
loss_weight_column=loss_weight_column,
loss_weight=loss_weight,
loss_weight_default_value=loss_weight_default_value,
data_path=data_path,
)
# Transform pulses to pixels
x = self._pixel_definition(x = data.x)
# Map pixels to positions in image(s)
x = self._map_pixels_to_grid(x = x,
pixel_mappings = self._pixel_mappings,
image_shapes = self._image_shapes)
# Assign to Data
data.x = x
# other stuff..
return data
    def _map_pixels_to_grid(self,
                            x: torch.Tensor,
                            pixel_mappings: List[pd.DataFrame],
                            image_shapes: List[Tuple[int, ...]]) -> List[torch.Tensor]:
        """Insert unordered pixel values in `x`
        into empty image(s) with shape(s) `image_shapes` using the
        key-value store defined by `pixel_mappings`."""
# Check that the number of image shapes is equal to number of mappings
assert len(pixel_mappings) == len(image_shapes)
# Create and fill images with pixels
images = []
        # We assume the ordering is identical here
        for mapping, shape in zip(pixel_mappings, image_shapes):
empty_image = torch.zeros(size = shape)
filled_image = self._apply_map(empty_image = empty_image,
pixels = x,
mapping = mapping,
map_pixels_by = self._map_pixels_by)
# [F,D,H,W] -> [1, F, D, H, W] for 3D
# [F,D,H] -> [1, F, D, H] for 2D
filled_image = filled_image.unsqueeze(0)
images.append(filled_image)
return images
def _apply_map(self,
empty_image: torch.Tensor,
pixels: torch.Tensor,
mapping: pd.DataFrame,
map_pixels_by: List[int]) -> torch.Tensor:
"""
Insert values from `pixels` into `empty_image` at positions
identified by indexing `mapping` with columns `map_pixels_by` in `pixels`
`empty_image` can either be [F,D,H,W]-dimensional (3D) or [F,D,H] (2D)
where F denotes the number of pixel features.
"""
@property
def shape(self) -> List[Tuple[int]]:
return self._image_shapes
def _set_output_feature_names(
self, input_feature_names: List[str]
) -> List[str]:
"""Return ordered list of pixel feature names."""
        return self._pixel_definition.output_feature_names

Note, I didn't write out `_apply_map`. Given this structure, the GridDefinition could look something like this:

from abc import abstractmethod
from typing import Optional, List, Dict, Union, Tuple, Any, Callable
from numpy.random import Generator
import numpy as np
import pandas as pd
import torch
from torch_geometric.data import Data
from graphnet.models import Model
from graphnet.models.detector import Detector
from graphnet.models.graphs.nodes import NodeDefinition
class GridDefinition(Model):
""" Base class for constructing image partitions in GraphNeT.
The image partitions define orthonormal grids from detector geometry."""
def __init__(self,
detector: Detector,
pixel_feature_names: List[str],
map_pixels_by: List[str]) -> None:
"""detector: Regular graphnet detector class that holds geometry
pixel_features: list of all available pixel features. Assumed to ordered.
map_pixels_by: sbuset of pixel_features to map by."""
super().__init__(name=__name__, class_name=self.__class__.__name__)
# Checks
assert isinstance(map_pixels_by, list)
assert isinstance(pixel_feature_names, list)
assert isinstance(detector, Detector)
self.detector = detector
self._pixel_features = pixel_feature_names
self._map_pixels_by = map_pixels_by
self._geometry_table = detector.geometry_table
@abstractmethod
def _generate_mappings(self,
geometry_table: pd.DataFrame,
map_pixels_by: List[str],
pixel_feature_names: List[str]) -> Union[List[pd.DataFrame], pd.DataFrame]:
"""Generate a single, or a list of, key-value stores that relates
a pixel position defined by `map_pixels_by` to a position in
the orthonormal grid using the detector geometry table.
The resulting key-value store is required to be an indexed
pd.DataFrame, and may use geometric detector features such as
`from graphnet.models.detector.icecube import IceCube86
detector = IceCube86() # or any other
# Natively indexed on xyz positions
geometry_table = detector.geometry_table.reset_index(drop = False)
unique_sensor_id = detector.sensor_id_column
unique_string_id = detector.string_id_column
unique_sensor_position = detector.xyz`
"""
        raise NotImplementedError
@abstractmethod
def _generate_shapes(self,
geometry_table: pd.DataFrame,
pixel_features: List[str],
map_pixels_by: List[str]) -> Union[Tuple[int],
List[Tuple[int]]]:
"""Generate the shape(s) of the image grid(s).
E.g. [(10, 5, 2,10), (256, 50, 10, 2)] """
        raise NotImplementedError
@property
def shape(self) -> Union[Tuple[int],List[Tuple[int]]]:
"""Return the shape(s) of the image(s)."""
if hasattr(self, '_shapes'):
return self._shapes
else:
self._shapes = self._generate_shapes(geometry_table = self._geometry_table,
pixel_features = self._pixel_features,
map_pixels_by= self._map_pixels_by)
return self._shapes
@property
def map_pixels_by(self) -> List[str]:
return self._map_pixels_by
@property
def mappings(self) -> Union[pd.DataFrame, List[pd.DataFrame]]:
"""Return the key-value stores that map a pixel to a point in the grid(s)."""
if hasattr(self, "_mappings"):
return self._mappings
else:
self._mappings = self._generate_mappings(geometry_table = self._geometry_table,
pixel_features = self._pixel_features,
map_pixels_by= self._map_pixels_by)
            return self._mappings

Within this formalism, your existing IC86 representation could look something like this:

from graphnet.models.detector import IceCube86
from typing import List, Tuple, Union, Dict
import pandas as pd
# Fixed 10x10 placement for strings 1..78 (from your generator)
_IC86_STRING_TO_AX01: Dict[int, Tuple[int, int]] = {
1:(9,4), 2:(9,5), 3:(9,6), 4:(9,7), 5:(9,8), 6:(9,9),
7:(8,3), 8:(8,4), 9:(8,5), 10:(8,6), 11:(8,7), 12:(8,8), 13:(8,9),
14:(7,2), 15:(7,3), 16:(7,4), 17:(7,5), 18:(7,6), 19:(7,7), 20:(7,8), 21:(7,9),
22:(6,1), 23:(6,2), 24:(6,3), 25:(6,4), 26:(6,5), 27:(6,6), 28:(6,7), 29:(6,8), 30:(6,9),
31:(5,0), 32:(5,1), 33:(5,2), 34:(5,3), 35:(5,4), 36:(5,5), 37:(5,6), 38:(5,7), 39:(5,8), 40:(5,9),
41:(4,0), 42:(4,1), 43:(4,2), 44:(4,3), 45:(4,4), 46:(4,5), 47:(4,6), 48:(4,7), 49:(4,8), 50:(4,9),
51:(3,0), 52:(3,1), 53:(3,2), 54:(3,3), 55:(3,4), 56:(3,5), 57:(3,6), 58:(3,7), 59:(3,8),
60:(2,0), 61:(2,1), 62:(2,2), 63:(2,3), 64:(2,4), 65:(2,5), 66:(2,6), 67:(2,7),
68:(1,0), 69:(1,1), 70:(1,2), 71:(1,3), 72:(1,4), 73:(1,5), 74:(1,6),
75:(0,0), 76:(0,1), 77:(0,2), 78:(0,3),
}
class IC86Grid(GridDefinition):
def __init__(
self,
pixel_feature_names: List[str],
string_label: str = "string",
dom_number_label: str = "sensor_id", # will be aliased to detector.sensor_id_column
include_main_array: bool = True,
include_lower_dc: bool = True,
include_upper_dc: bool = True,
) -> None:
super().__init__(
detector=IceCube86(),
pixel_feature_names=pixel_feature_names,
map_pixels_by=[string_label, dom_number_label],
)
if not any([include_main_array, include_lower_dc, include_upper_dc]):
raise ValueError("Include at least one array type.")
self._string_label = string_label
self._dom_number_label = dom_number_label
self._include_main_array = include_main_array
self._include_lower_dc = include_lower_dc
self._include_upper_dc = include_upper_dc
# channels = all features except the mapping keys
self._nb_channels = len(pixel_feature_names) - 2
# ---- GridDefinition interface ----
def _generate_mappings(
self,
geometry_table: pd.DataFrame,
map_pixels_by: List[str],
pixel_features: List[str],
) -> Union[List[pd.DataFrame], pd.DataFrame]:
"""
Build one mapping DataFrame per included grid using
detector.geometry_table.
"""
# Your logic goes here
# Ideally use the "sensor_id" which defines unique DOMs
# Or, if you prefer, we can add the non-unique "dom_number"
# to the geometry table
# Use global variable above as you wish
def _generate_shapes(
self,
geometry_table: pd.DataFrame,
pixel_features: List[str],
map_pixels_by: List[str],
) -> Union[Tuple[int], List[Tuple[int]]]:
""" Define the dimension(s) of the image(s) here"""
        # Make sure as little as possible is hardcoded
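For completeness, a hypothetical usage sketch tying the proposed pieces together (the pixel feature names and the pixel definition are placeholders):

grid = IC86Grid(
    pixel_feature_names=["string", "sensor_id", "charge", "time"],  # illustrative
)
representation = ImageRepresentation(
    pixel_definition=my_pixel_definition,  # any NodeDefinition producing pixel features
    grid_definition=grid,
)
print(representation.shape)  # shape(s) of the resulting image(s)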
This is the big PR adding CNN support to GraphNeT, enabling direct comparisons (see #771).
The CNN support consists of:
- An ImageDefinition, which consists of 2 parts: a PixelDefinition (how pulses become pixel features) and a PixelMapping (where each pixel sits in the image grid).
- 2 CNN architectures: the Lightning CNN signal classifier (LCSC) and TheosMuonEUpgoing.
Timing of the ImageDefinition in comparison to other data representations:
At a low number of pulses, the bottleneck of the ImageDefinition is the initialisation of zero tensors.
[Timing plots of the individual modules over 1-500, 1-5000, and 5000-200000 mock pulses (log scale).]
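As a rough illustration of that bottleneck, the allocation alone can be timed in isolation (a sketch; the (15, 8, 9, 22) shape combines the 15 input features with the (8, 9, 22) example image dimensions mentioned above):

import timeit

import torch

# Time 1000 allocations of an empty image tensor
t = timeit.timeit(lambda: torch.zeros((15, 8, 9, 22)), number=1000)
print(f"zero-tensor initialisation: {t / 1000 * 1e6:.1f} µs per call")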