
Data manager

Hou Shengren edited this page Aug 5, 2024 · 1 revision

Data Manager Model

This page provides a detailed explanation of the data manager model used in the RL-ADN framework. The data manager model is responsible for managing and preprocessing time-series data related to power systems, including functionalities for data loading, cleaning, and basic manipulations.

Key Components

GeneralPowerDataManager Class

The GeneralPowerDataManager class is designed to manage and preprocess time series data for power systems. It includes various attributes and methods to handle data loading, cleaning, and selecting specific time slots or days.

Attributes

  • df: DataFrame containing the original data.
  • data_array: Array representation of the data.
  • active_power_cols: List of columns related to active power.
  • reactive_power_cols: List of columns related to reactive power.
  • renewable_active_power_cols: List of columns related to renewable active power.
  • renewable_reactive_power_cols: List of columns related to renewable reactive power.
  • price_col: List of columns related to price.
  • train_dates: List of training dates.
  • test_dates: List of testing dates.
  • time_interval: Time interval of the data in minutes.

Methods

  • __init__(self, datapath): Initializes the GeneralPowerDataManager object with the path to the data file. It loads the data, sets the time interval, and initializes other attributes.
  • _replace_nan(self): Replaces NaN values in the data with interpolated values or the average of the surrounding values.
  • _check_for_nan(self): Checks if any of the arrays contain NaN values and raises an error if they do.
  • select_timeslot_data(self, year, month, day, timeslot): Selects data for a specific timeslot on a specific day.
  • select_day_data(self, year, month, day): Selects data for a specific day.
  • list_dates(self): Lists all available dates in the data.
  • random_date(self): Randomly selects a date from the available dates in the data.
  • split_data_set(self): Splits the data into training and testing sets based on the date. The first three weeks of each month are used for training and the last week for testing.

Workflow

Initialization

  1. Data Loading: The GeneralPowerDataManager is initialized with the path to the data file. It loads the data into a DataFrame, sets the index to the 'date_time' column (or the first column if 'date_time' does not exist), and converts the index to datetime format.
  2. Time Interval Calculation: The time interval between data points is calculated and printed.
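The initialization steps above can be sketched with pandas. This is a minimal illustration, not the framework's actual code: the helper name load_time_series, the sample column names, and the choice to derive the interval from the first two rows are all assumptions.

```python
import io

import pandas as pd

# Hypothetical sample data; the real file layout may differ.
CSV_TEXT = """date_time,active_power_node_1,price
2021-01-01 00:00:00,10.0,30.5
2021-01-01 00:15:00,12.0,31.0
2021-01-01 00:30:00,11.5,29.8
"""


def load_time_series(source):
    """Load a CSV, index it by timestamp, and return (df, interval_minutes)."""
    df = pd.read_csv(source)
    # Use 'date_time' if present, otherwise fall back to the first column.
    index_col = "date_time" if "date_time" in df.columns else df.columns[0]
    df = df.set_index(index_col)
    df.index = pd.to_datetime(df.index)
    # Assumption: the interval is taken from the first two samples, in minutes.
    delta = df.index[1] - df.index[0]
    interval_minutes = int(delta.total_seconds() // 60)
    return df, interval_minutes


df, interval = load_time_series(io.StringIO(CSV_TEXT))
print(f"Data time interval: {interval} minutes")
```

In practice the source argument would be the path passed to the constructor; io.StringIO is used here only to keep the sketch self-contained.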

Data Cleaning

  1. NaN Replacement: The _replace_nan method replaces NaN values in the data with interpolated values or the average of the surrounding values.
  2. NaN Check: The _check_for_nan method checks if any NaN values remain in the data and raises an error if they do.
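A cleaning pass of this kind might look like the sketch below. It assumes linear interpolation followed by forward and backward filling for gaps at the edges; the function names replace_nan and check_for_nan mirror the methods above but are standalone, hypothetical helpers, and the framework's own strategy may differ.

```python
import numpy as np
import pandas as pd


def replace_nan(df):
    """Fill NaN values by linear interpolation, then fill any remaining edge gaps."""
    filled = df.interpolate(method="linear", limit_direction="both")
    # Forward/backward fill covers anything interpolation could not reach.
    return filled.ffill().bfill()


def check_for_nan(df):
    """Raise an error if any NaN values remain."""
    if df.isna().any().any():
        raise ValueError("NaN values remain in the data after cleaning")


# Hypothetical column with a single missing sample.
raw = pd.DataFrame({"load": [1.0, np.nan, 3.0]})
cleaned = replace_nan(raw)
check_for_nan(cleaned)  # passes silently once the gap is filled
```

A column that contains only NaN values cannot be repaired this way, so check_for_nan would still raise for it.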

Data Selection

  1. Timeslot Data Selection: The select_timeslot_data method selects data for a specific timeslot on a specific day.
  2. Day Data Selection: The select_day_data method selects data for a specific day.
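The two selection steps can be illustrated on a DataFrame with a datetime index. This sketch assumes a 15-minute interval and a zero-based timeslot counted from midnight; both are assumptions, as are the helper names select_day and select_timeslot and the sample column.

```python
import pandas as pd

# Hypothetical data: two days sampled every 15 minutes (96 slots per day).
index = pd.date_range("2021-01-01", periods=192, freq="15min")
df = pd.DataFrame({"active_power": range(192)}, index=index)


def select_day(df, year, month, day):
    """Return all rows for one calendar day as a NumPy array."""
    idx = df.index
    mask = (idx.year == year) & (idx.month == month) & (idx.day == day)
    return df[mask].to_numpy()


def select_timeslot(df, year, month, day, timeslot, interval_minutes=15):
    """Return the row for one timeslot (zero-based, counted from midnight)."""
    start = pd.Timestamp(year=year, month=month, day=day)
    stamp = start + pd.Timedelta(minutes=timeslot * interval_minutes)
    return df.loc[stamp].to_numpy()


day_data = select_day(df, 2021, 1, 2)
slot_data = select_timeslot(df, 2021, 1, 2, 4)
```

Here day_data holds every sample of the second day, while slot_data holds only the values recorded at 01:00 on that day.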

Date Management

  1. Listing Dates: The list_dates method lists all available dates in the data.
  2. Random Date Selection: The random_date method randomly selects a date from the available dates in the data.
  3. Splitting Data: The split_data_set method splits the data into training and testing sets. The first three weeks of each month are used for training, and the last week is used for testing.
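The date management steps can be sketched with the standard library alone. The sketch assumes that "the first three weeks" means days 1 to 21 of each month and that all remaining days go to the test set; the exact rule in the framework may differ, and the helper names and date range are hypothetical.

```python
import random
from datetime import date, timedelta


def list_dates(start, end):
    """Return every calendar date from start to end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def split_dates(dates, train_days=21):
    """Assumption: days 1..train_days of each month train, the rest test."""
    train = [d for d in dates if d.day <= train_days]
    test = [d for d in dates if d.day > train_days]
    return train, test


# Hypothetical range covering January and February 2021.
dates = list_dates(date(2021, 1, 1), date(2021, 2, 28))
train_dates, test_dates = split_dates(dates)

# Random date selection, seeded here only to make the sketch reproducible.
rng = random.Random(0)
sampled = rng.choice(train_dates)
```

Splitting by day of month keeps every month represented in both sets, so seasonal patterns appear in training and testing alike.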

Summary

The GeneralPowerDataManager class in the RL-ADN framework manages and preprocesses time-series data related to power systems. It handles data loading, cleaning, selection of specific time slots or days, and splitting of the data into training and testing sets. Understanding the class structure and workflow helps users apply the data manager and customize it for their specific research needs.
