Dynamically Gated Similarities

This is the code for the thesis "Multi-Person Pose Tracking using Dynamically Gated Similarities", available as `thesis.pdf <https://github.com/bmmtstb/dynamically-gated-similarities/tree/master/thesis.pdf>`_.

You can find the extended documentation on bmmtstb.github.io.

Notes

A visual overview of the pipeline is available on LucidChart or as a downloadable PDF (main) (or see: ./docs/figures/Pipeline-DGS-Overview.pdf). The visual pipeline of the training module is also available as PDF (training) (or see: ./docs/figures/Pipeline-DGS-Training.pdf).

Folder Structure

dynamically_gated_similarities
│
└─── configs
│ Multiple configuration.yaml files for running DGS or different submodules.
│
└─── docs
│ │ Source files for the documentation via sphinx and autodoc.
│ │
│ └─── figures
│ Images for the documentation and general explanation.
│
└─── data
│ Folder containing the datasets; see './data/dataset.rst' for the expected structure.
│
└─── dependencies
│ References to git submodules e.g. to torchreid and my custom AlphaPose Fork.
│
└─── dgs
│ │ The source code of the algorithm.
│ │
│ └ dgs_config.py
│ │ Some default configuration if not overridden by config.yaml
│ │ This file will soon be replaced by 'dgs_values.yaml' .
│ └ dgs_values.yaml
│ │ Some default values if not overridden by config.yaml
│ │
│ └─── models
│ │ The building blocks for the DGS algorithm. Most models should be fairly
│ │ straightforward to extend when implementing custom sub-modules.
│ │
│ └─── utils
│ File handling, IO, classes for State and Track handling, constants,
│ functions for torch module handling, visualization, and overall image handling.
└─── pre_trained_models
│ storage for downloaded or custom pre-trained models
│
└─── tests
│ tests for dgs module
│
│
└─── .gitmodules - The project uses git submodules to include different libraries.
└─── .pylintrc - Settings for the pylint linter.
└─── LICENSE - MIT License
└─── pyproject.toml - Information about this project and additional build parameters.
└─── requirements.txt - Use pip to install the requirements,
│ see './docs/installation.rst' for more information.

Abbreviations and Definitions

All joints are expected to have 2D coordinates, but extending the code to 3D should be possible with minor adjustments. If joints have three dimensions in the given code, the third dimension is expected to be the joint visibility.
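The convention for three-valued joints can be illustrated with a small sketch in plain Python. The helper name ``split_joints`` is hypothetical and not part of the DGS code, and treating joints without a third value as fully visible is an assumption made only for this example.

```python
from typing import List, Sequence, Tuple


def split_joints(
    joints: Sequence[Sequence[float]],
) -> Tuple[List[Tuple[float, float]], List[float]]:
    """Split joints given as (x, y) or (x, y, visibility) into coordinates and visibility."""
    # The first two values are always the 2D coordinates.
    coords = [(j[0], j[1]) for j in joints]
    # Assumption for this sketch: a joint without a third value counts as fully visible.
    visibility = [j[2] if len(j) > 2 else 1.0 for j in joints]
    return coords, visibility


# Two joints in the three-valued form: (x, y, visibility).
joints_with_vis = [(10.0, 20.0, 1.0), (30.0, 40.0, 0.0)]
coords, visibility = split_joints(joints_with_vis)
```

Here ``coords`` holds only the 2D positions, while ``visibility`` holds the third value of each joint.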

PyTorch and torchvision expect image dimensions as [B x C x H x W]. Matplotlib and PIL use another structure: [B x H x W x C]. Which format an image tensor has depends on the location in the code. Most general functions in torchvision expect uint8 (byte) tensors, while the torch modules expect float (float32) images, so that gradients can be computed over images. Some single images might lack the batch dimension and have the shape [C x H x W], even though most parts of the code expect a batch dimension.
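The following sketch shows how these layouts and data types relate, using only standard torch tensor operations. The variable names and image sizes are made up for illustration, and scaling the float image to the range [0, 1] is an assumption of this example, not necessarily what every DGS module does.

```python
import torch

# A batch of two RGB images in the PyTorch / torchvision layout [B x C x H x W].
batch_bchw = torch.zeros((2, 3, 4, 5), dtype=torch.uint8)

# Move the channel axis to the end to obtain the [B x H x W x C] layout.
batch_bhwc = batch_bchw.permute(0, 2, 3, 1)

# Convert the byte image to float32 so that gradients can be computed.
# Assumption for this sketch: values are scaled from [0, 255] to [0, 1].
batch_float = batch_bchw.to(torch.float32) / 255.0

# A single image of shape [C x H x W] gains a batch dimension of size 1.
single_chw = torch.zeros((3, 4, 5), dtype=torch.uint8)
single_bchw = single_chw.unsqueeze(0)
```

Note that ``permute`` returns a view with reordered dimensions and does not copy the data; call ``.contiguous()`` afterwards if a function requires contiguous memory.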

The :class:`~.State` object is a general class for passing data between modules. Modules whose child modules might have different outputs therefore generally use this State object instead of returning possibly non-descriptive tensors. This can be seen in the :class:`~.SimilarityModule` class and its children. SimilarityModules can differ considerably: the pose similarity (e.g. :class:`~.ObjectKeypointSimilarity` ) needs the key-point coordinates to compute the OKS, while the visual similarity (e.g. :class:`~.TorchreidVisualSimilarity` ) needs the image crops to compute embeddings.
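The idea of passing one shared container to modules with different needs can be sketched as follows. All names here (``ToyState``, ``ToySimilarity``, ``ToyPoseSimilarity``, ``ToyVisualSimilarity``) are hypothetical stand-ins and not the actual DGS classes or their API. The toy pose score is a simple distance-based value rather than the OKS, and the toy "embedding" is just the flattened crop.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ToyState:
    """Hypothetical container; every module reads only the fields it needs."""

    keypoints: Optional[Sequence[Tuple[float, float]]] = None
    image_crop: Optional[Sequence[float]] = None  # flattened pixel values


class ToySimilarity:
    """Hypothetical base class: all children share the same call signature."""

    def forward(self, a: ToyState, b: ToyState) -> float:
        raise NotImplementedError


class ToyPoseSimilarity(ToySimilarity):
    """Uses only the key-point coordinates (a distance score, not the OKS)."""

    def forward(self, a: ToyState, b: ToyState) -> float:
        dists = [math.dist(p, q) for p, q in zip(a.keypoints, b.keypoints)]
        return 1.0 / (1.0 + sum(dists) / len(dists))


class ToyVisualSimilarity(ToySimilarity):
    """Uses only the image crops (cosine similarity of toy embeddings)."""

    def forward(self, a: ToyState, b: ToyState) -> float:
        # Assumption for this sketch: the flattened crop itself is the embedding.
        dot = sum(x * y for x, y in zip(a.image_crop, b.image_crop))
        norm = math.sqrt(sum(x * x for x in a.image_crop)) * math.sqrt(
            sum(y * y for y in b.image_crop)
        )
        return dot / norm if norm > 0 else 0.0


s1 = ToyState(keypoints=[(0.0, 0.0), (1.0, 1.0)], image_crop=[1.0, 0.0])
s2 = ToyState(keypoints=[(0.0, 0.0), (1.0, 1.0)], image_crop=[0.0, 1.0])

pose_score = ToyPoseSimilarity().forward(s1, s2)
visual_score = ToyVisualSimilarity().forward(s1, s2)
```

Both modules accept the same two arguments, so a caller does not need to know which fields a specific similarity reads.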

Name	Description
J	Number of joint-key-points in the given model (e.g. `coco=17`)
C	Number of channels of the current image (e.g. `RGB=3`)
B	Current batch-size, can be 0 in some cases
N	Number of detections in the current frame
T	Number of tracks at the current time
L	Number of "historical" frames in a dataset. The dataset has length `L+1`
H,W	Height and Width of the current image, as image shape: `(H, W)`
h,w	Specific given height or width, as image shape: `(h, w)`
HM_H, HM_W	Size of the heatmap, equals size of the cropped resized image
E_V, E_P	Embedding size, denoted for visual or pose based shape

Examples

https://github.com/user-attachments/assets/4639d0b6-d4c4-4eeb-b792-a81a3cbfddb3

https://github.com/user-attachments/assets/164d3eb5-f779-416d-adbd-64f2bfb02816

Citing

To cite this thesis, you can use the following BibTeX entry:

@mastersthesis{tuprints29468,
       title = {Multi-Person Pose Tracking Using Dynamically Gated Similarities},
      author = {Martin Steinborn},
      school = {Technische Universit{\"a}t Darmstadt},
    language = {en},
     address = {Darmstadt},
        year = {2025},
       pages = {VII, 56 Seiten},
       month = {M{\"a}rz},
         url = {http://tuprints.ulb.tu-darmstadt.de/29468/},
         doi = {https://doi.org/10.26083/tuprints-00029468},
    keywords = {tracking, pose-tracking, mppt}
    }

To cite the code, you can use the following BibTeX entry:

@software{brizar_2025_14910547,
  author       = {Brizar},
  title        = {bmmtstb/dynamically-gated-similarities},
  month        = feb,
  year         = 2025,
  publisher    = {Zenodo},
  version      = {v0.3.0},
  doi          = {10.5281/zenodo.14910547},
  url          = {https://doi.org/10.5281/zenodo.14910547},
}
