add CRN file parser #666
Merged
Changes from 8 commits
Commits
11 commits
a97bdcc initial implementation (wholmgren)
bc5ed41 add crn file reader (wholmgren)
b41ff5c add unused cols (wholmgren)
b4032a8 add to api.rst (wholmgren)
4f85a8d better dtype handling (wholmgren)
b962780 pandas to 0.16. remove py 3.3 classifier (wholmgren)
9e63811 maybe avoid issue with tz dtype specific to travis (wholmgren)
be48841 use fixed width parsing (wholmgren)
70ee713 style (wholmgren)
6e85833 unused import (wholmgren)
9f9d2fb more style and doc issues (wholmgren)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
53131 20190101 1610 20190101 0910 3 -111.17 32.24 -9999.0 0.0 296 0 4.4 C 0 90 0 -99.000 -9999.0 24 0 0.78 0 | ||
53131 20190101 1615 20190101 0915 3 -111.17 32.24 3.3 0.0 183 0 4.0 C 0 87 0 -99.000 -9999.0 1182 0 0.36 0 | ||
53131 20190101 1620 20190101 0920 3 -111.17 32.24 3.5 0.0 340 0 4.3 C 0 83 0 -99.000 -9999.0 1183 0 0.53 0 | ||
53131 20190101 1625 20190101 0925 3 -111.17 32.24 4.0 0.0 393 0 4.8 C 0 81 0 -99.000 -9999.0 1223 0 0.64 0 |
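The sample rows above are fixed-width records whose padding was collapsed when this page was rendered. The following is a minimal sketch, not the PR's code, of how one such record can be split with `pandas.read_fwf`. It uses the column widths given in the parser module of this PR; the name `FIELD_WIDTHS` and the right-justified padding used to rebuild the line are assumptions made for illustration.

```python
import io

import pandas as pd

# Field widths as listed in the parser module of this PR (taken there from
# the CRN README); they exclude the single space between columns.
# FIELD_WIDTHS is a hypothetical name used only in this sketch.
FIELD_WIDTHS = [5, 8, 4, 8, 4, 6, 7, 7, 7, 7, 6, 1,
                7, 1, 1, 5, 1, 7, 7, 5, 1, 6, 1]

# First sample record from the test data, split into its 23 fields.
fields = ('53131 20190101 1610 20190101 0910 3 -111.17 32.24 -9999.0 0.0 '
          '296 0 4.4 C 0 90 0 -99.000 -9999.0 24 0 0.78 0').split()

# Rebuild a fixed-width line. Assumption: fields are right-justified within
# their width and separated by exactly one space.
line = ' '.join(f.rjust(w) for f, w in zip(fields, FIELD_WIDTHS))

# read_fwf needs contiguous fields, so fold the separator into each width
# and drop it again for the last column.
widths = [w + 1 for w in FIELD_WIDTHS]
widths[-1] -= 1

parsed = pd.read_fwf(io.StringIO(line), header=None, widths=widths)

# Time fields are parsed as integers, so '0910' loses its leading zero.
print(parsed.iloc[0, 4])
```

Because the time columns come back as integers, any later datetime parsing has to restore the leading zeros first.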
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
"""Functions to read data from the US Climate Reference Network (CRN). | ||
""" | ||
|
||
import pandas as pd | ||
import numpy as np | ||
from numpy import dtype | ||
|
||
|
||
HEADERS = 'WBANNO UTC_DATE UTC_TIME LST_DATE LST_TIME CRX_VN LONGITUDE LATITUDE AIR_TEMPERATURE PRECIPITATION SOLAR_RADIATION SR_FLAG SURFACE_TEMPERATURE ST_TYPE ST_FLAG RELATIVE_HUMIDITY RH_FLAG SOIL_MOISTURE_5 SOIL_TEMPERATURE_5 WETNESS WET_FLAG WIND_1_5 WIND_FLAG' # noqa: E501 | ||
|
||
VARIABLE_MAP = { | ||
    'LONGITUDE': 'longitude', | ||
    'LATITUDE': 'latitude', | ||
    'AIR_TEMPERATURE': 'temp_air', | ||
    'SOLAR_RADIATION': 'ghi', | ||
    'SR_FLAG': 'ghi_flag', | ||
    'RELATIVE_HUMIDITY': 'relative_humidity', | ||
    'RH_FLAG': 'relative_humidity_flag', | ||
    'WIND_1_5': 'wind_speed', | ||
    'WIND_FLAG': 'wind_speed_flag' | ||
} | ||
|
||
# as specified in CRN README.txt file. excludes 1 space between columns | ||
WIDTHS = [5, 8, 4, 8, 4, 6, 7, 7, 7, 7, 6, 1, 7, 1, 1, 5, 1, 7, 7, 5, 1, 6, 1] | ||
# add 1 to make fields contiguous (required by pandas.read_fwf) | ||
WIDTHS = [w + 1 for w in WIDTHS] | ||
# no space after last column | ||
WIDTHS[-1] -= 1 | ||
|
||
# specify dtypes for potentially problematic values | ||
DTYPES = [ | ||
    dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), | ||
    dtype('int64'), dtype('int64'), dtype('float64'), dtype('float64'), | ||
    dtype('float64'), dtype('float64'), dtype('float64'), | ||
    dtype('int64'), dtype('float64'), dtype('O'), dtype('int64'), | ||
    dtype('float64'), dtype('int64'), dtype('float64'), | ||
    dtype('float64'), dtype('int64'), dtype('int64'), dtype('float64'), | ||
    dtype('int64') | ||
] | ||
|
||
|
||
|
||
def read_crn(filename): | ||
    """ | ||
    Read NOAA USCRN [1] fixed-width file into pandas dataframe. | ||
|
||
    Parameters | ||
    ---------- | ||
    filename: str | ||
        filepath or url to read for the fixed-width file. | ||
|
||
    Returns | ||
    ------- | ||
    data: DataFrame | ||
        A dataframe with a UTC datetime index and all of the columns in | ||
        the file. Columns listed in the module-level `VARIABLE_MAP` dict, | ||
        along with their associated quality control flags, are renamed | ||
        according to that mapping. | ||
|
||
    Notes | ||
    ----- | ||
    CRN files contain 5 minute averages labeled by the interval ending | ||
    time. Here, missing data is flagged as NaN, rather than the lowest | ||
    possible integer for a field (e.g. -999 or -99). | ||
    Air temperature in deg C. | ||
    Wind speed in m/s at a height of 1.5 m above ground level. | ||
|
||
    References | ||
    ---------- | ||
    [1] U.S. Climate Reference Network | ||
        `https://www.ncdc.noaa.gov/crn/qcdatasets.html <https://www.ncdc.noaa.gov/crn/qcdatasets.html>`_ | ||
    [2] Diamond, H. J. et al., 2013: U.S. Climate Reference Network after | ||
        one decade of operations: status and assessment. Bull. Amer. | ||
        Meteor. Soc., 94, 489-498. :doi:`10.1175/BAMS-D-12-00170.1` | ||
    """ | ||
|
||
    # read in data | ||
    data = pd.read_fwf(filename, header=None, names=HEADERS.split(' '), | ||
                       widths=WIDTHS) | ||
    # loop here because dtype kwarg not supported in read_fwf until 0.20 | ||
    for (col, _dtype) in zip(data.columns, DTYPES): | ||
        data[col] = data[col].astype(_dtype) | ||
|
||
    # set index | ||
    # UTC_TIME does not have leading 0s, so must zfill(4) to comply | ||
    # with %H%M format | ||
    dts = data[['UTC_DATE', 'UTC_TIME']].astype(str) | ||
    dtindex = pd.to_datetime(dts['UTC_DATE'] + dts['UTC_TIME'].str.zfill(4), | ||
                             format='%Y%m%d%H%M', utc=True) | ||
    data = data.set_index(dtindex) | ||
    try: | ||
        # to_datetime(utc=True) does not work in older versions of pandas | ||
        data = data.tz_localize('UTC') | ||
    except TypeError: | ||
        pass | ||
|
||
    # set nans | ||
    for val in [-99, -999, -9999]: | ||
        data = data.where(data != val, np.nan) | ||
|
||
    data = data.rename(columns=VARIABLE_MAP) | ||
|
||
    return data |
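The index construction and missing-value handling in `read_crn` can be tried in isolation. The sketch below is illustrative only: the two-row frame is made-up data using column names from `HEADERS`, and it assumes a recent pandas in which `to_datetime(..., utc=True)` already returns timezone-aware values, so the `tz_localize` fallback is left out.

```python
import numpy as np
import pandas as pd

# Two made-up records; column names follow the parser's HEADERS string.
data = pd.DataFrame({
    'UTC_DATE': [20190101, 20190101],
    'UTC_TIME': [5, 1610],               # 00:05 is stored as the integer 5
    'AIR_TEMPERATURE': [-9999.0, 3.3],   # -9999 marks a missing value
    'SOIL_MOISTURE_5': [-99.0, -99.0],   # -99 marks a missing value
})

# Integer times lose leading zeros, so pad to 4 digits before parsing
# with the %H%M format.
dts = data[['UTC_DATE', 'UTC_TIME']].astype(str)
dtindex = pd.to_datetime(dts['UTC_DATE'] + dts['UTC_TIME'].str.zfill(4),
                         format='%Y%m%d%H%M', utc=True)
data = data.set_index(dtindex)

# Replace each sentinel value with NaN across the whole frame.
for val in [-99, -999, -9999]:
    data = data.where(data != val, np.nan)

print(data[['AIR_TEMPERATURE', 'SOIL_MOISTURE_5']])
```

Note that the sentinel replacement is applied to every column, so a genuine measurement equal to -99, -999 or -9999 would also become NaN.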
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
import inspect | ||
import os | ||
|
||
import pandas as pd | ||
from pandas.util.testing import assert_frame_equal | ||
import numpy as np | ||
from numpy import dtype, nan | ||
|
||
from pvlib.iotools import crn | ||
|
||
|
||
test_dir = os.path.dirname( | ||
    os.path.abspath(inspect.getfile(inspect.currentframe()))) | ||
testfile = os.path.join(test_dir, | ||
                        '../data/CRNS0101-05-2019-AZ_Tucson_11_W.txt') | ||
|
||
|
||
def test_read_crn(): | ||
    columns = [ | ||
        'WBANNO', 'UTC_DATE', 'UTC_TIME', 'LST_DATE', 'LST_TIME', 'CRX_VN', | ||
        'longitude', 'latitude', 'temp_air', 'PRECIPITATION', 'ghi', 'ghi_flag', | ||
        'SURFACE_TEMPERATURE', 'ST_TYPE', 'ST_FLAG', 'relative_humidity', | ||
        'relative_humidity_flag', 'SOIL_MOISTURE_5', 'SOIL_TEMPERATURE_5', | ||
        'WETNESS', 'WET_FLAG', 'wind_speed', 'wind_speed_flag'] | ||
    index = pd.DatetimeIndex(['2019-01-01 16:10:00', | ||
                              '2019-01-01 16:15:00', | ||
                              '2019-01-01 16:20:00', | ||
                              '2019-01-01 16:25:00'], | ||
                             freq=None).tz_localize('UTC') | ||
    values = np.array([ | ||
        [53131, 20190101, 1610, 20190101, 910, 3, -111.17, 32.24, nan, | ||
         0.0, 296.0, 0, 4.4, 'C', 0, 90.0, 0, nan, nan, 24, 0, 0.78, 0], | ||
        [53131, 20190101, 1615, 20190101, 915, 3, -111.17, 32.24, 3.3, | ||
         0.0, 183.0, 0, 4.0, 'C', 0, 87.0, 0, nan, nan, 1182, 0, 0.36, 0], | ||
        [53131, 20190101, 1620, 20190101, 920, 3, -111.17, 32.24, 3.5, | ||
         0.0, 340.0, 0, 4.3, 'C', 0, 83.0, 0, nan, nan, 1183, 0, 0.53, 0], | ||
        [53131, 20190101, 1625, 20190101, 925, 3, -111.17, 32.24, 4.0, | ||
         0.0, 393.0, 0, 4.8, 'C', 0, 81.0, 0, nan, nan, 1223, 0, 0.64, 0]]) | ||
    dtypes = [ | ||
        dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), | ||
        dtype('int64'), dtype('int64'), dtype('float64'), dtype('float64'), | ||
        dtype('float64'), dtype('float64'), dtype('float64'), | ||
        dtype('int64'), dtype('float64'), dtype('O'), dtype('int64'), | ||
        dtype('float64'), dtype('int64'), dtype('float64'), | ||
        dtype('float64'), dtype('int64'), dtype('int64'), dtype('float64'), | ||
        dtype('int64')] | ||
    expected = pd.DataFrame(values, columns=columns, index=index) | ||
    for (col, _dtype) in zip(expected.columns, dtypes): | ||
        expected[col] = expected[col].astype(_dtype) | ||
    out = crn.read_crn(testfile) | ||
    assert_frame_equal(out, expected) |
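The test builds `expected` from a single mixed NumPy array and then casts each column. A short sketch of why that cast loop is needed, using a cut-down, made-up three-column version of the expected values: NumPy coerces an array that mixes numbers and strings to a string dtype, so the numeric columns must be converted back before the frame can match the parser output.

```python
import numpy as np
import pandas as pd

# Mixing ints, floats, NaN and strings in one array makes NumPy store
# every element as a string.
values = np.array([[53131, 3.3, 'C'],
                   [53131, np.nan, 'C']])
print(values.dtype.kind)  # 'U' (unicode string)

frame = pd.DataFrame(values, columns=['WBANNO', 'temp_air', 'ST_TYPE'])

# Cast each column to its intended dtype, as the test does for `expected`.
target_dtypes = [np.dtype('int64'), np.dtype('float64'), np.dtype('O')]
for col, _dtype in zip(frame.columns, target_dtypes):
    frame[col] = frame[col].astype(_dtype)

print(frame.dtypes)
```

The string `'nan'` converts back to a floating-point NaN during the cast, which is how the missing values in the expected frame end up comparable with the parser's output.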