add CRN file parser #666

Merged
merged 11 commits into from
Feb 25, 2019
Changes from 8 commits
2 changes: 1 addition & 1 deletion .travis.yml
@@ -69,7 +69,7 @@ install:
pip uninstall numpy --yes;
pip uninstall pandas --yes;
pip install --no-cache-dir numpy==1.10.1;
-pip install --no-cache-dir pandas==0.15.0;
+pip install --no-cache-dir pandas==0.16.0;
fi
- conda list
- echo $PATH
1 change: 1 addition & 0 deletions docs/sphinx/source/api.rst
@@ -337,6 +337,7 @@ relevant to solar energy modeling.
iotools.read_midc_raw_data_from_nrel
iotools.read_ecmwf_macc
iotools.get_ecmwf_macc
iotools.read_crn

A :py:class:`~pvlib.location.Location` object may be created from metadata
in some files.
4 changes: 3 additions & 1 deletion docs/sphinx/source/whatsnew/v0.6.2.rst
@@ -9,14 +9,16 @@ release.
**Python 2.7 support will end on June 1, 2019**. Releases made after this
date will require Python 3. (:issue:`501`)

**Minimum pandas requirement bumped 0.15.0=>0.16.0**


API Changes
~~~~~~~~~~~


Enhancements
~~~~~~~~~~~~

* Add US CRN data reader to `pvlib.iotools`.

Bug fixes
~~~~~~~~~
4 changes: 4 additions & 0 deletions pvlib/data/CRNS0101-05-2019-AZ_Tucson_11_W.txt
@@ -0,0 +1,4 @@
53131 20190101 1610 20190101 0910 3 -111.17 32.24 -9999.0 0.0 296 0 4.4 C 0 90 0 -99.000 -9999.0 24 0 0.78 0
53131 20190101 1615 20190101 0915 3 -111.17 32.24 3.3 0.0 183 0 4.0 C 0 87 0 -99.000 -9999.0 1182 0 0.36 0
53131 20190101 1620 20190101 0920 3 -111.17 32.24 3.5 0.0 340 0 4.3 C 0 83 0 -99.000 -9999.0 1183 0 0.53 0
53131 20190101 1625 20190101 0925 3 -111.17 32.24 4.0 0.0 393 0 4.8 C 0 81 0 -99.000 -9999.0 1223 0 0.64 0
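For reference, the sample rows above are fixed-width records. According to the widths that `crn.py` in this PR takes from the CRN README, each field has a fixed size and fields are separated by a single space. The following is a minimal standard-library sketch of that layout, not the parser used in the PR (which relies on `pandas.read_fwf`). The helper names `to_fixed_width` and `split_fixed_width` are hypothetical, and right-justifying each field is an assumption made only to build a test record, since the rows as shown here no longer carry their original padding.

```python
# Field widths from crn.py in this PR (CRN README), excluding separators.
WIDTHS = [5, 8, 4, 8, 4, 6, 7, 7, 7, 7, 6, 1, 7, 1, 1, 5, 1, 7, 7, 5, 1, 6, 1]


def to_fixed_width(tokens, widths=WIDTHS):
    """Build one record; right-justified padding is an assumption."""
    return ' '.join(tok.rjust(w) for tok, w in zip(tokens, widths))


def split_fixed_width(line, widths=WIDTHS):
    """Slice a record into stripped fields, skipping 1 separator space."""
    fields = []
    start = 0
    for w in widths:
        fields.append(line[start:start + w].strip())
        start += w + 1  # skip the single space between columns
    return fields


# First sample row of the data file added in this PR, split on whitespace.
row = ('53131 20190101 1610 20190101 0910 3 -111.17 32.24 -9999.0 0.0 296 0 '
       '4.4 C 0 90 0 -99.000 -9999.0 24 0 0.78 0')
tokens = row.split()
record = to_fixed_width(tokens)
fields = split_fixed_width(record)
```

Slicing by position rather than splitting on whitespace is what lets a fixed-width reader cope with fields that are blank or that contain spaces.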
1 change: 1 addition & 0 deletions pvlib/iotools/__init__.py
@@ -7,3 +7,4 @@
from pvlib.iotools.midc import read_midc_raw_data_from_nrel # noqa: F401
from pvlib.iotools.ecmwf_macc import read_ecmwf_macc # noqa: F401
from pvlib.iotools.ecmwf_macc import get_ecmwf_macc # noqa: F401
from pvlib.iotools.crn import read_crn # noqa: F401
101 changes: 101 additions & 0 deletions pvlib/iotools/crn.py
@@ -0,0 +1,101 @@
"""Functions to read data from the US Climate Reference Network (CRN).
"""

import pandas as pd
import numpy as np
from numpy import dtype


HEADERS = 'WBANNO UTC_DATE UTC_TIME LST_DATE LST_TIME CRX_VN LONGITUDE LATITUDE AIR_TEMPERATURE PRECIPITATION SOLAR_RADIATION SR_FLAG SURFACE_TEMPERATURE ST_TYPE ST_FLAG RELATIVE_HUMIDITY RH_FLAG SOIL_MOISTURE_5 SOIL_TEMPERATURE_5 WETNESS WET_FLAG WIND_1_5 WIND_FLAG' # noqa: E501

VARIABLE_MAP = {
    'LONGITUDE': 'longitude',
    'LATITUDE': 'latitude',
    'AIR_TEMPERATURE': 'temp_air',
    'SOLAR_RADIATION': 'ghi',
    'SR_FLAG': 'ghi_flag',
    'RELATIVE_HUMIDITY': 'relative_humidity',
    'RH_FLAG': 'relative_humidity_flag',
    'WIND_1_5': 'wind_speed',
    'WIND_FLAG': 'wind_speed_flag'
}

# as specified in CRN README.txt file. excludes 1 space between columns
WIDTHS = [5, 8, 4, 8, 4, 6, 7, 7, 7, 7, 6, 1, 7, 1, 1, 5, 1, 7, 7, 5, 1, 6, 1]
# add 1 to make fields contiguous (required by pandas.read_fwf)
WIDTHS = [w + 1 for w in WIDTHS]
# no space after last column
WIDTHS[-1] -= 1

# specify dtypes for potentially problematic values
DTYPES = [
    dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'),
    dtype('int64'), dtype('int64'), dtype('float64'), dtype('float64'),
    dtype('float64'), dtype('float64'), dtype('float64'),
    dtype('int64'), dtype('float64'), dtype('O'), dtype('int64'),
    dtype('float64'), dtype('int64'), dtype('float64'),
    dtype('float64'), dtype('int64'), dtype('int64'), dtype('float64'),
    dtype('int64')
]


def read_crn(filename):
    """
    Read NOAA USCRN [1]_ [2]_ fixed-width file into pandas dataframe.

    Parameters
    ----------
    filename: str
        filepath or url to read for the fixed-width file.

    Returns
    -------
    data: DataFrame
        A dataframe with a UTC datetime index and all of the columns in
        the file. Columns listed in the module-level `VARIABLE_MAP` dict
        (variables and their quality control flags) are renamed to pvlib
        names; all other columns keep their CRN names.

    Notes
    -----
    CRN files contain 5 minute averages labeled by the interval ending
    time. Here, missing data is flagged as NaN, rather than the lowest
    possible integer for a field (-99, -999, or -9999).
    Air temperature in deg C.
    Wind speed in m/s at a height of 1.5 m above ground level.

    References
    ----------
    .. [1] U.S. Climate Reference Network
       `https://www.ncdc.noaa.gov/crn/qcdatasets.html <https://www.ncdc.noaa.gov/crn/qcdatasets.html>`_

    .. [2] Diamond, H. J. et al., 2013: U.S. Climate Reference Network after
       one decade of operations: status and assessment. Bull. Amer.
       Meteor. Soc., 94, 489-498. :doi:`10.1175/BAMS-D-12-00170.1`
    """

    # read in data
    data = pd.read_fwf(filename, header=None, names=HEADERS.split(' '),
                       widths=WIDTHS)
    # loop here because dtype kwarg not supported in read_fwf until 0.20
    for (col, _dtype) in zip(data.columns, DTYPES):
        data[col] = data[col].astype(_dtype)

    # set index
    # UTC_TIME does not have leading 0s, so must zfill(4) to comply
    # with %H%M format
    dts = data[['UTC_DATE', 'UTC_TIME']].astype(str)
    dtindex = pd.to_datetime(dts['UTC_DATE'] + dts['UTC_TIME'].str.zfill(4),
                             format='%Y%m%d%H%M', utc=True)
    data = data.set_index(dtindex)
    try:
        # to_datetime(utc=True) does not work in older versions of pandas
        data = data.tz_localize('UTC')
    except TypeError:
        pass

    # set nans
    for val in [-99, -999, -9999]:
        data = data.where(data != val, np.nan)

    data = data.rename(columns=VARIABLE_MAP)

    return data
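The index construction in `read_crn` hinges on one detail: once the time column is held as an integer it has no leading zeros (for example `910` for 09:10), so it is zero-padded to four digits before being parsed with `%H%M`. Below is a rough standard-library sketch of the same step for a single record, using `datetime` in place of pandas. The helper name `crn_timestamp` is hypothetical and not part of the PR.

```python
from datetime import datetime, timezone


def crn_timestamp(utc_date, utc_time):
    """Combine integer date (YYYYMMDD) and time (HHMM) into a UTC datetime."""
    # zero-pad the time so that e.g. 5 becomes '0005' and matches %H%M
    stamp = str(utc_date) + str(utc_time).zfill(4)
    return datetime.strptime(stamp, '%Y%m%d%H%M').replace(tzinfo=timezone.utc)


first = crn_timestamp(20190101, 1610)  # first row of the sample file
early = crn_timestamp(20190101, 5)     # 00:05, stored as the integer 5
```

Padding to a fixed width keeps the date and time digits unambiguous when the two strings are concatenated.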
51 changes: 51 additions & 0 deletions pvlib/test/test_crn.py
@@ -0,0 +1,51 @@
import inspect
import os

import pandas as pd
from pandas.util.testing import assert_frame_equal
import numpy as np
from numpy import dtype, nan

from pvlib.iotools import crn


test_dir = os.path.dirname(
    os.path.abspath(inspect.getfile(inspect.currentframe())))
testfile = os.path.join(test_dir,
                        '../data/CRNS0101-05-2019-AZ_Tucson_11_W.txt')


def test_read_crn():
    columns = [
        'WBANNO', 'UTC_DATE', 'UTC_TIME', 'LST_DATE', 'LST_TIME', 'CRX_VN',
        'longitude', 'latitude', 'temp_air', 'PRECIPITATION', 'ghi', 'ghi_flag',
        'SURFACE_TEMPERATURE', 'ST_TYPE', 'ST_FLAG', 'relative_humidity',
        'relative_humidity_flag', 'SOIL_MOISTURE_5', 'SOIL_TEMPERATURE_5',
        'WETNESS', 'WET_FLAG', 'wind_speed', 'wind_speed_flag']
    index = pd.DatetimeIndex(['2019-01-01 16:10:00',
                              '2019-01-01 16:15:00',
                              '2019-01-01 16:20:00',
                              '2019-01-01 16:25:00'],
                             freq=None).tz_localize('UTC')
    values = np.array([
        [53131, 20190101, 1610, 20190101, 910, 3, -111.17, 32.24, nan,
         0.0, 296.0, 0, 4.4, 'C', 0, 90.0, 0, nan, nan, 24, 0, 0.78, 0],
        [53131, 20190101, 1615, 20190101, 915, 3, -111.17, 32.24, 3.3,
         0.0, 183.0, 0, 4.0, 'C', 0, 87.0, 0, nan, nan, 1182, 0, 0.36, 0],
        [53131, 20190101, 1620, 20190101, 920, 3, -111.17, 32.24, 3.5,
         0.0, 340.0, 0, 4.3, 'C', 0, 83.0, 0, nan, nan, 1183, 0, 0.53, 0],
        [53131, 20190101, 1625, 20190101, 925, 3, -111.17, 32.24, 4.0,
         0.0, 393.0, 0, 4.8, 'C', 0, 81.0, 0, nan, nan, 1223, 0, 0.64, 0]])
    dtypes = [
        dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'),
        dtype('int64'), dtype('int64'), dtype('float64'), dtype('float64'),
        dtype('float64'), dtype('float64'), dtype('float64'),
        dtype('int64'), dtype('float64'), dtype('O'), dtype('int64'),
        dtype('float64'), dtype('int64'), dtype('float64'),
        dtype('float64'), dtype('int64'), dtype('int64'), dtype('float64'),
        dtype('int64')]
    expected = pd.DataFrame(values, columns=columns, index=index)
    for (col, _dtype) in zip(expected.columns, dtypes):
        expected[col] = expected[col].astype(_dtype)
    out = crn.read_crn(testfile)
    assert_frame_equal(out, expected)
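The `nan` entries in the expected values correspond to the sentinel values `-9999.0` and `-99.000` in the raw sample file. As a rough standard-library illustration of the replacement that `read_crn` performs with `DataFrame.where`, the sketch below applies the same rule to a plain list. The helper name `mask_missing` is hypothetical and not part of the PR.

```python
import math

MISSING = (-99, -999, -9999)  # sentinel values replaced in read_crn


def mask_missing(values, missing=MISSING):
    """Return a copy of values with sentinel entries replaced by NaN."""
    return [float('nan') if v in missing else v for v in values]


# AIR_TEMPERATURE, SOLAR_RADIATION, SOIL_MOISTURE_5, SOIL_TEMPERATURE_5
# from the first row of the test file.
raw = [-9999.0, 296.0, -99.000, -9999.0]
clean = mask_missing(raw)
```

Because the comparison is by value and is applied to every column, a legitimate measurement that happens to equal one of these numbers would be masked as well.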
3 changes: 1 addition & 2 deletions setup.py
@@ -38,7 +38,7 @@
URL = 'https://github.com/pvlib/pvlib-python'

 INSTALL_REQUIRES = ['numpy >= 1.10.1',
-                    'pandas >= 0.15.0',
+                    'pandas >= 0.16.0',
                     'pytz',
                     'six',
                     ]
@@ -61,7 +61,6 @@
     'Programming Language :: Python :: 2',
     'Programming Language :: Python :: 2.7',
     'Programming Language :: Python :: 3',
-    'Programming Language :: Python :: 3.3',
     'Programming Language :: Python :: 3.4',
     'Programming Language :: Python :: 3.5',
     'Programming Language :: Python :: 3.6',