Different baseline scripts log performances (and other metrics) under different keys

As per title: I don't understand how I am supposed to compare methods appropriately if each script does its own thing and uses completely separate conventions.

I use wandb. If I run some of the value decomposition methods, I get performances under keys like the ones below:

![Image](https://github.com/user-attachments/assets/dd19a146-5eaa-4c12-be7b-4e3fa19657c8)

If I run methods like IPPO or MAPPO, I get performances under different keys like below:

![Image](https://github.com/user-attachments/assets/ee0ec052-43cd-49a9-a997-c0927ddd4a91)

I understand that there can be some metrics that only make sense for some methods and not others. But when it comes to things like performances and winrates, it seems impossible to compare methods, because each one is written as its own standalone script that does things its own way.
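To make the request concrete, here is a rough sketch of the kind of stopgap I would otherwise have to write myself: a small mapping that renames each script's keys to one shared name before logging. Every key name below is a hypothetical placeholder (the real ones are in the screenshots above and differ per script), and `normalize_metrics` is my own helper, not something the library provides.

```python
from typing import Dict, Mapping

# Hypothetical placeholder keys: substitute whatever each baseline script
# actually logs. The right-hand side is the shared name I would like to see.
ALIASES: Dict[str, str] = {
    "script_a_return_key": "eval/return",
    "script_b_return_key": "eval/return",
    "script_a_winrate_key": "eval/win_rate",
    "script_b_winrate_key": "eval/win_rate",
}


def normalize_metrics(
    metrics: Mapping[str, float],
    aliases: Mapping[str, str] = ALIASES,
) -> Dict[str, float]:
    """Rename known keys to their shared name and pass other keys through."""
    normalized: Dict[str, float] = {}
    for key, value in metrics.items():
        shared_key = aliases.get(key, key)
        if shared_key in normalized:
            # Two source keys mapping to one shared key in a single log call
            # would silently overwrite a value, so fail loudly instead.
            raise ValueError(f"duplicate metric after renaming: {shared_key}")
        normalized[shared_key] = value
    return normalized
```

Each script would then call something like `wandb.log(normalize_metrics(raw_metrics))` instead of logging `raw_metrics` directly. This only papers over the naming; it does not guarantee that the scripts compute the metric in the same way.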

I genuinely cannot understand: how do I use this library to compare methods?

Different baseline scripts log performances (and other metrics) under different keys #142
