Data manager
This page explains the data manager model used in the RL-ADN framework. The data manager manages and preprocesses power-system time-series data. Its responsibilities include loading, cleaning, and basic manipulation of that data.
The `GeneralPowerDataManager` class manages and preprocesses time-series data for power systems. It provides attributes and methods for loading and cleaning data and for selecting specific time slots or days.
Attributes:
- `df`: DataFrame containing the original data.
- `data_array`: array representation of the data.
- `active_power_cols`: list of columns related to active power.
- `reactive_power_cols`: list of columns related to reactive power.
- `renewable_active_power_cols`: list of columns related to renewable active power.
- `renewable_reactive_power_cols`: list of columns related to renewable reactive power.
- `price_col`: list of columns related to price.
- `train_dates`: list of training dates.
- `test_dates`: list of testing dates.
- `time_interval`: time interval of the data, in minutes.
Methods:
- `__init__(self, datapath)`: initializes the `GeneralPowerDataManager` object with the path to the data file. It loads the data, sets the time interval, and initializes the other attributes.
- `_replace_nan(self)`: replaces NaN values in the data with interpolated values or the average of the surrounding values.
- `_check_for_nan(self)`: checks whether any of the arrays contain NaN values and raises an error if they do.
- `select_timeslot_data(self, year, month, day, timeslot)`: selects data for a specific time slot on a specific day.
- `select_day_data(self, year, month, day)`: selects data for a specific day.
- `list_dates(self)`: lists all dates available in the data.
- `random_date(self)`: randomly selects one of the dates available in the data.
- `split_data_set(self)`: splits the data into training and testing sets by date. The first three weeks of each month are used for training and the last week for testing.
Workflow:
- Data Loading: `GeneralPowerDataManager` is initialized with the path to the data file. It loads the data into a DataFrame, sets the index to the `date_time` column (or to the first column if `date_time` does not exist), and converts the index to datetime format.
- Time Interval Calculation: the time interval between consecutive data points is calculated and printed.
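The loading and interval steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the RL-ADN source. It makes several assumptions: `load_power_data` is a hypothetical helper, the input is assumed to be a CSV file, and the interval is assumed to be the gap between the first two timestamps.

```python
import io

import pandas as pd


def load_power_data(source):
    """Load a time-series CSV and return (DataFrame, interval in minutes).

    Hypothetical helper illustrating the loading steps; it is not the
    RL-ADN implementation.
    """
    df = pd.read_csv(source)
    # Index by 'date_time' if that column exists, otherwise by the first column.
    index_col = "date_time" if "date_time" in df.columns else df.columns[0]
    df = df.set_index(index_col)
    df.index = pd.to_datetime(df.index)
    # Assumption: data are regularly spaced, so the first gap is representative.
    time_interval = (df.index[1] - df.index[0]).total_seconds() / 60
    print(f"Time interval: {time_interval} minutes")
    return df, time_interval


# Example input at 15-minute resolution whose time column is not named 'date_time'.
csv_text = """timestamp,active_power_node1,price
2021-01-01 00:00:00,1.0,50.0
2021-01-01 00:15:00,1.2,51.0
2021-01-01 00:30:00,,49.5
"""
df, interval = load_power_data(io.StringIO(csv_text))
```

Taking the first gap only works for regularly spaced data. For irregular data, the median of `df.index.to_series().diff()` would be a more robust estimate.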
- NaN Replacement: the `_replace_nan` method replaces NaN values in the data with interpolated values or the average of the surrounding values.
- NaN Check: the `_check_for_nan` method checks whether any NaN values remain in the data and raises an error if they do.
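A minimal sketch of the two cleaning steps follows. It assumes linear interpolation; for a single missing point, this equals the average of its two neighbours. It also assumes a `ValueError` is raised for any remaining NaNs. `replace_nan` and `check_for_nan` are hypothetical stand-ins for `_replace_nan` and `_check_for_nan`, whose exact behaviour in RL-ADN may differ.

```python
import numpy as np
import pandas as pd


def replace_nan(df):
    """Fill NaNs by linear interpolation (hypothetical stand-in for _replace_nan).

    For a single interior gap, linear interpolation yields the average of the
    two surrounding values. With limit_direction="both", leading and trailing
    NaNs are filled with the nearest valid value.
    """
    return df.interpolate(method="linear", limit_direction="both")


def check_for_nan(*arrays):
    """Raise ValueError if any array still contains NaN (hypothetical stand-in)."""
    for i, arr in enumerate(arrays):
        if np.isnan(arr).any():
            raise ValueError(f"Array {i} still contains NaN values")


raw = pd.DataFrame({
    "active_power": [np.nan, 1.0, np.nan, 3.0, np.nan],
    "price": [50.0, 50.0, 52.0, 54.0, 54.0],
})
clean = replace_nan(raw)
# Passes silently because interpolation removed every NaN.
check_for_nan(clean["active_power"].to_numpy(), clean["price"].to_numpy())
```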
- Timeslot Data Selection: the `select_timeslot_data` method selects data for a specific time slot on a specific day.
- Day Data Selection: the `select_day_data` method selects data for a specific day.
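The two selection methods could look roughly like the sketch below. It assumes a `DatetimeIndex` and treats `timeslot` as a zero-based position within the day, so with 15-minute data slot 4 is 01:00. Both functions are hypothetical and may not match the framework's indexing convention or return types.

```python
import numpy as np
import pandas as pd


def select_day_data(df, year, month, day):
    """Return all rows whose timestamp falls on the given calendar day (hypothetical)."""
    day_start = pd.Timestamp(year=year, month=month, day=day)
    day_end = day_start + pd.Timedelta(days=1)
    return df[(df.index >= day_start) & (df.index < day_end)]


def select_timeslot_data(df, year, month, day, timeslot):
    """Return the row of the given zero-based time slot on that day (hypothetical)."""
    day_df = select_day_data(df, year, month, day)
    if not 0 <= timeslot < len(day_df):
        raise IndexError(f"timeslot {timeslot} is out of range for {len(day_df)} slots")
    return day_df.iloc[timeslot]


# Two days of 15-minute data, i.e. 96 slots per day.
index = pd.date_range("2021-01-01", periods=2 * 96, freq="15min")
df = pd.DataFrame({"active_power": np.arange(len(index), dtype=float)}, index=index)
day2 = select_day_data(df, 2021, 1, 2)
slot = select_timeslot_data(df, 2021, 1, 2, 4)  # 01:00 on 2 January
```

Selecting the day with a half-open interval `[start, start + 1 day)` avoids including midnight of the following day.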
- Listing Dates: the `list_dates` method lists all dates available in the data.
- Random Date Selection: the `random_date` method randomly selects one of the dates available in the data.
- Splitting Data: the `split_data_set` method splits the data into training and testing sets. The first three weeks of each month are used for training, and the last week is used for testing.
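The following is a sketch of the date utilities and the train/test split. It assumes "first three weeks" means days 1 to 21 of each month. Under that assumption, the test portion runs from day 22 to the end of the month, which is 7 to 10 days. The functions and the `train_last_day` parameter are hypothetical illustrations, not the RL-ADN API.

```python
import random

import pandas as pd


def list_dates(df):
    """Sorted unique calendar dates of a DatetimeIndex-ed frame (hypothetical)."""
    return sorted(set(df.index.date))


def random_date(dates, rng=None):
    """Pick one available date uniformly at random (hypothetical)."""
    if rng is None:
        rng = random.Random()
    return rng.choice(dates)


def split_data_set(dates, train_last_day=21):
    """Days 1..train_last_day of each month go to training, the rest to testing."""
    train_dates = [d for d in dates if d.day <= train_last_day]
    test_dates = [d for d in dates if d.day > train_last_day]
    return train_dates, test_dates


# January and February 2021 at 15-minute resolution.
index = pd.date_range("2021-01-01", "2021-02-28 23:45", freq="15min")
df = pd.DataFrame({"price": 1.0}, index=index)
dates = list_dates(df)
train_dates, test_dates = split_data_set(dates)
sample = random_date(dates, random.Random(0))  # seeded for reproducibility
```

Splitting by day of month rather than sampling dates at random keeps every month represented in both sets. It also keeps the test days contiguous within each month.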
The `GeneralPowerDataManager` class is central to managing and preprocessing power-system time-series data in the RL-ADN framework. It handles data loading and cleaning, the selection of specific time slots or days, and the splitting of data into training and testing sets. Understanding its structure and workflow makes it easier to use and customize the data manager for specific research needs.