The hyperparameters optimized by optuna cannot reproduce the results when run separately. #1890

ethonchen · 2025-02-18T11:38:34Z

## 🐛 Bug Description

Does qlib cache anything between runs? Hyperparameters found by Optuna do not reproduce the same results when the model is trained separately with them.

After ruling out differences in environment, multi-threading, and random seeds, I found that when several models are trained one after another in the same environment, the later runs are affected by the earlier ones.
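One generic way an earlier run can influence a later one inside a single process is through process-global random number generator state. The sketch below uses Python's standard `random` module purely as a stand-in; `fake_fit` is a hypothetical placeholder and not qlib code, and whether this mechanism is what actually happens inside qlib is an assumption that still needs to be verified.

```python
import random


def fake_fit(n_draws, seed=None):
    """Hypothetical stand-in for a training run that consumes global RNG state."""
    if seed is not None:
        random.seed(seed)  # reseed the process-global generator
    return [random.random() for _ in range(n_draws)]


# "Model 2" run alone, starting from a freshly seeded generator.
random.seed(618)
alone = fake_fit(3)

# "Model 1" runs first, then "model 2" without reseeding: the draws differ.
random.seed(618)
fake_fit(5)
after_model_1 = fake_fit(3)

# Reseeding immediately before the second run restores the standalone result.
random.seed(618)
fake_fit(5)
reseeded = fake_fit(3, seed=618)

print(alone == after_model_1)  # False
print(alone == reseeded)       # True
```

If a similar effect is at play here, the state involved would be something that is not reset between the two `fit` calls in the same script.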

## To Reproduce

Simplified steps to reproduce the behavior:

Run two models that differ only in learning rate consecutively in one script, and print the results of Model 1 and Model 2.
--------------- both.py -----------------

import qlib
from qlib.constant import REG_CN
from qlib.tests import GetData
from qlib.utils import init_instance_by_config

if __name__ == '__main__':
    provider_uri = "~/.qlib/qlib_data/cn_data"
    GetData().qlib_data(target_dir=provider_uri, region=REG_CN, exists_skip=True)
    qlib.init(provider_uri=provider_uri, region="cn")

    dataset_str = {
        "class": "DatasetH",
        "kwargs": {
            "handler": {
                "class": "Alpha360",
                "kwargs": {
                    "end_time": "2024-09-30",
                    "fit_end_time": "2021-12-31",
                    "fit_start_time": "2019-01-01",
                    "infer_processors": [
                        {
                            "class": "RobustZScoreNorm",
                            "kwargs": {
                                "clip_outlier": True,
                                "fields_group": "feature",
                                "fit_end_time": "2021-12-31",
                                "fit_start_time": "2019-01-01",
                            },
                        },
                        {
                            "class": "Fillna",
                            "kwargs": {
                                "fields_group": "feature"
                            }
                        }
                    ],
                    "instruments": "csi300",
                    "label": ["Ref($close, -2) / Ref($close, -1) - 1"],
                    "learn_processors": [
                        {"class": "DropnaLabel"},
                        {
                            "class": "CSRankNorm",
                            "kwargs": {
                                "fields_group": "label"
                            }
                        }
                    ],
                    "start_time": "2019-01-01",
                },
                "module_path": "qlib.contrib.data.handler",
            },
            "segments": {
                "test": ["2022-01-01", "2023-12-31"],
                "train": ["2019-01-01", "2021-12-31"],
                "valid": ["2023-01-01", "2024-09-30"],
            },
        },
        "module_path": "qlib.data.dataset",
    }
    dataset = init_instance_by_config(dataset_str)
    # base task model
    task = {
        "model": {
            "class": "LocalformerModel",
            "module_path": "qlib.contrib.model.pytorch_localformer",
            "kwargs": {
                "d_feat": 6,
                "d_model": 64,
                "batch_size": 512,
                "nhead": 4,
                "num_layers": 3,
                "dropout": 0.4,
                "n_epochs": 100,
                "lr": 0.1,
                "early_stop": 10,
                "loss": "mse",
                "optimizer": "adam",
                "reg": 0.001,
                "n_jobs": 1,
                "GPU": 0,
                "seed": 618,
            }
        }
    }

    # model param 1
    task["model"]["kwargs"]["lr"] = 0.005
    evals_result = dict()
    model = init_instance_by_config(task["model"])
    model.fit(dataset, evals_result=evals_result)
    print("model 1 : ", max(evals_result["valid"]))

    # model param 2
    task["model"]["kwargs"]["lr"] = 0.3
    evals_result = dict()
    model = init_instance_by_config(task["model"])
    model.fit(dataset, evals_result=evals_result)
    print("model 2 : ", max(evals_result["valid"]))

Run the code for Model 1 separately.
--------------- model1.py -----------------


import qlib
from qlib.constant import REG_CN
from qlib.tests import GetData
from qlib.utils import init_instance_by_config

if __name__ == '__main__':
    provider_uri = "~/.qlib/qlib_data/cn_data"
    GetData().qlib_data(target_dir=provider_uri, region=REG_CN, exists_skip=True)
    qlib.init(provider_uri=provider_uri, region="cn")


    dataset_str = {
        "class": "DatasetH",
        "kwargs": {
            "handler": {
                "class": "Alpha360",
                "kwargs": {
                    "end_time": "2024-09-30",
                    "fit_end_time": "2021-12-31",
                    "fit_start_time": "2019-01-01",
                    "infer_processors": [
                        {
                            "class": "RobustZScoreNorm",
                            "kwargs": {
                                "clip_outlier": True,
                                "fields_group": "feature",
                                "fit_end_time": "2021-12-31",
                                "fit_start_time": "2019-01-01",
                            },
                        },
                        {
                            "class": "Fillna",
                            "kwargs": {
                                "fields_group": "feature"
                            }
                        }
                    ],
                    "instruments": "csi300",
                    "label": ["Ref($close, -2) / Ref($close, -1) - 1"],
                    "learn_processors": [
                        {"class": "DropnaLabel"},
                        {
                            "class": "CSRankNorm",
                            "kwargs": {
                                "fields_group": "label"
                            }
                        }
                    ],
                    "start_time": "2019-01-01",
                },
                "module_path": "qlib.contrib.data.handler",
            },
            "segments": {
                "test": ["2022-01-01", "2023-12-31"],
                "train": ["2019-01-01", "2021-12-31"],
                "valid": ["2023-01-01", "2024-09-30"],
            },
        },
        "module_path": "qlib.data.dataset",
    }
    dataset = init_instance_by_config(dataset_str)
    # base task model
    task = {
        "model": {
            "class": "LocalformerModel",
            "module_path": "qlib.contrib.model.pytorch_localformer",
            "kwargs": {
                "d_feat": 6,
                "d_model": 64,
                "batch_size": 512,
                "nhead": 4,
                "num_layers": 3,
                "dropout": 0.4,
                "n_epochs": 100,
                "lr": 0.1,
                "early_stop": 10,
                "loss": "mse",
                "optimizer": "adam",
                "reg": 0.001,
                "n_jobs": 1,
                "GPU": 0,
                "seed": 618,
            }
        }
    }

    # model param 1
    task["model"]["kwargs"]["lr"] = 0.005
    evals_result = dict()
    model = init_instance_by_config(task["model"])
    model.fit(dataset, evals_result=evals_result)
    print("model 1 : ", max(evals_result["valid"]))

Run the code for Model 2 separately.
--------------- model2.py -----------------


import qlib
from qlib.constant import REG_CN
from qlib.tests import GetData
from qlib.utils import init_instance_by_config

if __name__ == '__main__':
    provider_uri = "~/.qlib/qlib_data/cn_data"
    GetData().qlib_data(target_dir=provider_uri, region=REG_CN, exists_skip=True)
    qlib.init(provider_uri=provider_uri, region="cn")

    dataset_str = {
        "class": "DatasetH",
        "kwargs": {
            "handler": {
                "class": "Alpha360",
                "kwargs": {
                    "end_time": "2024-09-30",
                    "fit_end_time": "2021-12-31",
                    "fit_start_time": "2019-01-01",
                    "infer_processors": [
                        {
                            "class": "RobustZScoreNorm",
                            "kwargs": {
                                "clip_outlier": True,
                                "fields_group": "feature",
                                "fit_end_time": "2021-12-31",
                                "fit_start_time": "2019-01-01",
                            },
                        },
                        {
                            "class": "Fillna",
                            "kwargs": {
                                "fields_group": "feature"
                            }
                        }
                    ],
                    "instruments": "csi300",
                    "label": ["Ref($close, -2) / Ref($close, -1) - 1"],
                    "learn_processors": [
                        {"class": "DropnaLabel"},
                        {
                            "class": "CSRankNorm",
                            "kwargs": {
                                "fields_group": "label"
                            }
                        }
                    ],
                    "start_time": "2019-01-01",
                },
                "module_path": "qlib.contrib.data.handler",
            },
            "segments": {
                "test": ["2022-01-01", "2023-12-31"],
                "train": ["2019-01-01", "2021-12-31"],
                "valid": ["2023-01-01", "2024-09-30"],
            },
        },
        "module_path": "qlib.data.dataset",
    }
    dataset = init_instance_by_config(dataset_str)
    # base task model
    task = {
        "model": {
            "class": "LocalformerModel",
            "module_path": "qlib.contrib.model.pytorch_localformer",
            "kwargs": {
                "d_feat": 6,
                "d_model": 64,
                "batch_size": 512,
                "nhead": 4,
                "num_layers": 3,
                "dropout": 0.4,
                "n_epochs": 100,
                "lr": 0.1,
                "early_stop": 10,
                "loss": "mse",
                "optimizer": "adam",
                "reg": 0.001,
                "n_jobs": 1,
                "GPU": 0,
                "seed": 618,
            }
        }
    }

    # model param 2
    task["model"]["kwargs"]["lr"] = 0.3
    evals_result = dict()
    model = init_instance_by_config(task["model"])
    model.fit(dataset, evals_result=evals_result)
    print("model 2 : ", max(evals_result["valid"]))

## Expected Behavior

Each model should produce the same result whether it is run alone or after another model. The actual results are as follows.

Output of running python both.py:
----- python both.py --------
model 1 : -0.7482357621192932
model 2 : -0.7483096122741699

Output of running python model1.py:
----- python model1.py --------
model 1 : -0.7482357621192932 # the same as both.py

Output of running python model2.py:
----- python model2.py --------
model 2 : -0.7509312033653259 # differs from both.py even though the parameters are the same
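
The size of the discrepancy can be checked directly from the printed scores. The snippet below only restates the numbers reported above; the variable names and the tolerance of 1e-9 are arbitrary choices for illustration.

```python
import math

# Scores copied from the outputs above.
both_model_1 = -0.7482357621192932   # model 1 in both.py
alone_model_1 = -0.7482357621192932  # model 1 in model1.py
both_model_2 = -0.7483096122741699   # model 2 in both.py
alone_model_2 = -0.7509312033653259  # model 2 in model2.py

# Model 1 matches exactly; model 2 does not.
model_1_matches = math.isclose(both_model_1, alone_model_1, rel_tol=0.0, abs_tol=1e-9)
model_2_matches = math.isclose(both_model_2, alone_model_2, rel_tol=0.0, abs_tol=1e-9)
diff = abs(both_model_2 - alone_model_2)

print(model_1_matches)  # True
print(model_2_matches)  # False
print(f"{diff:.6f}")    # 0.002622
```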

## Screenshot

With identical parameters, Model 1 gives the same result when run alone as it does in both.py, but Model 2 does not. The only difference is that in both.py Model 2 is executed immediately after Model 1. This suggests that some cached or leftover state from the first run is affecting the second.
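
To test whether in-process state is responsible, each configuration can be run in a fresh interpreter so that nothing carries over between runs. The following is a minimal sketch using only the standard library; `run_isolated` is a hypothetical helper, and the inline snippet is a placeholder for a script that trains one model and prints its score.

```python
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a brand-new interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Placeholder snippet: in practice this would train one model and print its score.
snippet = "import random; random.seed(618); print(random.random())"

first = run_isolated(snippet)
second = run_isolated(snippet)
print(first == second)  # True
```

For a hyperparameter search, the same idea would mean evaluating each trial in its own process, at the cost of reloading the dataset every time.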

## Environment

Linux
x86_64
Linux-5.15.0-112-generic-x86_64-with-glibc2.17
#122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024

Python version: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]

Qlib version: 0.9.6
numpy==1.23.5
pandas==1.5.3
scipy==1.10.1
requests==2.31.0

ethonchen added the "bug" label on Feb 18, 2025
