Add parsing for non 1-minute data to UO SRML parser #711

Merged
merged 9 commits into from
May 8, 2019
Changes from 4 commits
3 changes: 3 additions & 0 deletions docs/sphinx/source/whatsnew/v0.6.2.rst
@@ -51,6 +51,8 @@ Bug fixes
:py:func:`~pvlib.irradiance.klucher` and
:py:func:`~pvlib.pvsystem.calcparams_desoto`. (:issue:`698`)
* Fix :py:class:`~pvlib.forecast.NDFD` model by updating variables.
* Fix :py:func:`~pvlib.iotools.srml.format_index` to parse non
one-minute data correctly. (:issue:`709`)


Testing
@@ -67,3 +69,4 @@ Contributors
* Kevin Anderson (:ghuser:`kevinsa5`)
* :ghuser:`bentomlinson`
* Jonathan Gaffiot (:ghuser:`jgaffiot`)
* Leland Boeman (:ghuser:`lboeman`)
46 changes: 34 additions & 12 deletions pvlib/iotools/srml.py
@@ -42,11 +42,11 @@ def read_srml(filename):

Notes
-----
The time index is shifted back one minute to account for 2400 hours,
and to avoid time parsing errors on leap years. The returned data
values should be understood to occur during the interval from the
time of the row until the time of the next row. This is consistent
with pandas' default labeling behavior.
The time index is shifted back by one interval to account for the
daily endtime of 2400, and to avoid time parsing errors on leap
years. The returned data values should be understood to occur
Review comment (Member): are labeled by the left endpoint of interval, and should be understood...

during the interval from the time of the row until the time of the
next row. This is consistent with pandas' default labeling behavior.

See SRML's `Archival Files`_ page for more information.

@@ -134,11 +134,17 @@ def format_index(df):
year = int(df.columns[1])
df_doy = df[df.columns[0]]
# Times are expressed as integers from 1-2400, we convert to 0-2359 by
# subtracting one and then correcting the minutes at each former hour.
df_time = df[df.columns[1]] - 1
fifty_nines = df_time % 100 == 99
times = df_time.where(~fifty_nines, df_time - 40)

# subtracting the length of one interval and then correcting the times
Review comment (Member): Add comment like "e.g. the first two rows of hourly data are 100, 200, so interval length = 100"

# at each former hour. interval_length is determined by taking the
# difference of the first two rows of the time column.
interval_length = int(df[df.columns[1]][:2].diff()[1])
Review comment (Member): This line would be easier to read as, e.g., interval_length = df[df.columns[0]][1] - df[df.columns[0]][0]

df_time = df[df.columns[1]] - interval_length
if interval_length == 100:
    # Hourly files do not require fixing the former hour timestamps.
    times = df_time
else:
    old_hours = df_time % 100 == (100 - interval_length)

Review comment (Member): this needs a few comments

    times = df_time.where(~old_hours, df_time - 40)
times = times.apply(lambda x: '{:04.0f}'.format(x))
doy = df_doy.apply(lambda x: '{:03.0f}'.format(x))
dts = pd.to_datetime(str(year) + '-' + doy + '-' + times,
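
To illustrate the arithmetic in this hunk, here is a minimal pure-Python sketch of the same shift, without pandas. The helper name `shift_to_interval_start` is hypothetical and is not part of pvlib. Like the change above, it assumes the first two rows fall within the same hour (or are 100 and 200 for hourly files), so that their difference is the interval length.

```python
def shift_to_interval_start(times):
    """Convert end-of-interval HHMM integers (1-2400) to
    start-of-interval HHMM strings ('0000'-'2359').

    Hypothetical helper for illustration only; not part of pvlib.
    """
    # Interval length in HHMM units, e.g. 15 for 15-minute data, or
    # 100 for hourly data whose first two rows are 100 and 200.
    interval_length = times[1] - times[0]
    shifted = []
    for t in times:
        t -= interval_length
        # Subtracting across an hour boundary leaves invalid minutes,
        # e.g. 100 - 15 = 85. Removing 40 maps 85 to 45. Hourly data
        # never crosses into invalid minutes, so it is skipped.
        if interval_length != 100 and t % 100 == 100 - interval_length:
            t -= 40
        shifted.append('{:04d}'.format(t))
    return shifted
```

For 15-minute data, the rows 15, 30, 45, 100 become 0000, 0015, 0030, 0045, and the final row 2400 becomes 2345, which is why the index can be parsed without an invalid 24:00 timestamp.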
@@ -161,14 +167,30 @@ def read_srml_month_from_solardat(station, year, month, filetype='PO'):
month: int
Month to request data for.
filetype: string
SRML file type to gather. 'RO' and 'PO' are the
only minute resolution files.
SRML file type to gather. See notes for explanation.

Returns
-------
data: pd.DataFrame
One month of data from SRML.

Notes
-----
File types designate the time interval of a file and whether it
contains raw or processed data. For instance, `RO` designates raw
one-minute data and `PO` designates processed one-minute data. The availability
of file types varies between sites. Below is a table of file types
and their time intervals. See [1] for site information.

=============  ============  ==================
time interval  raw filetype  processed filetype
=============  ============  ==================
1 minute       RO            PO
5 minute       RF            PF
15 minute      RQ            PQ
hourly         RH            PH
=============  ============  ==================

References
----------
[1] University of Oregon Solar Radiation Measurement Laboratory
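
The file type table in the Notes above can also be read as a lookup from the two-letter code to its processing level and interval. The sketch below only restates that table; `SRML_FILETYPES` and `describe_filetype` are hypothetical names and do not exist in pvlib.

```python
# Hypothetical lookup built from the docstring table; not part of pvlib.
# Values are (processing level, interval length in minutes).
SRML_FILETYPES = {
    'RO': ('raw', 1),
    'PO': ('processed', 1),
    'RF': ('raw', 5),
    'PF': ('processed', 5),
    'RQ': ('raw', 15),
    'PQ': ('processed', 15),
    'RH': ('raw', 60),
    'PH': ('processed', 60),
}


def describe_filetype(filetype):
    """Return (processing level, interval in minutes) for a file type."""
    try:
        return SRML_FILETYPES[filetype]
    except KeyError:
        raise ValueError('Unknown SRML file type: {!r}'.format(filetype))
```

The first letter encodes raw (`R`) or processed (`P`) data and the second letter encodes the interval. Availability still varies by site, so a valid code does not guarantee that a file exists.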
32 changes: 32 additions & 0 deletions pvlib/test/test_srml.py
@@ -73,3 +73,35 @@ def test_read_srml_month_from_solardat():
file_data = srml.read_srml(url)
requested = srml.read_srml_month_from_solardat('EU', 2018, 1)
assert file_data.equals(requested)


@network
@pytest.mark.parametrize('station, year, month, filetype', [
    ('TW', 2019, 4, 'RQ'),
])
def test_15_minute_dt_index(
        station, year, month, filetype):
    data = srml.read_srml_month_from_solardat(station, year, month, filetype)
    start = pd.Timestamp('{:04d}{:02d}01 00:00'.format(year, month))
    start = start.tz_localize('Etc/GMT+8')
    end = pd.Timestamp('{:04d}{:02d}30 23:45'.format(year, month))
    end = end.tz_localize('Etc/GMT+8')
    assert data.index[0] == start
    assert data.index[-1] == end
    assert (data.index[3::4].minute == 45).all()


@network
@pytest.mark.parametrize('station, year, month, filetype', [
    ('CD', 1986, 4, 'PH'),
])
def test_hourly_dt_index(
        station, year, month, filetype):
    data = srml.read_srml_month_from_solardat(station, year, month, filetype)
    start = pd.Timestamp('{:04d}{:02d}01 00:00'.format(year, month))
    start = start.tz_localize('Etc/GMT+8')
    end = pd.Timestamp('{:04d}{:02d}30 23:00'.format(year, month))
    end = end.tz_localize('Etc/GMT+8')
    assert data.index[0] == start
    assert data.index[-1] == end
    assert (data.index.minute == 0).all()
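
Both tests above need network access. As an offline illustration of what they assert, the sketch below builds the expected left-labeled 15-minute index for April 2019 with the standard library only. `expected_index` is a hypothetical helper, and the fixed offset stands in for `Etc/GMT+8`, which follows the POSIX sign convention and therefore means UTC-08:00.

```python
from datetime import datetime, timedelta, timezone

# 'Etc/GMT+8' uses the POSIX sign convention, so it is UTC-08:00.
PST = timezone(timedelta(hours=-8))


def expected_index(year, month, days, interval_minutes):
    """Left-labeled timestamps covering one month (hypothetical helper)."""
    start = datetime(year, month, 1, tzinfo=PST)
    step = timedelta(minutes=interval_minutes)
    count = days * 24 * 60 // interval_minutes
    return [start + i * step for i in range(count)]


index = expected_index(2019, 4, 30, 15)
# Same checks as test_15_minute_dt_index, applied to the synthetic index.
assert index[0] == datetime(2019, 4, 1, 0, 0, tzinfo=PST)
assert index[-1] == datetime(2019, 4, 30, 23, 45, tzinfo=PST)
assert all(ts.minute == 45 for ts in index[3::4])
```

Every fourth label starting from the fourth row lands on minute 45, which is the property the 15-minute test uses to confirm that hour boundaries were corrected.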