Description
What went wrong?
Metpy version: 1.7.1
Python: 3.13.5 (conda environment)
Summary: while parsing the Iowa State archive of WPC surface front text files, we hit an error caused by invalid characters in the coded lat/lon fields. Fixing a single day by hand would be trivial, but these invalid characters are scattered throughout the archive, so with a year or more of data the failures become difficult to track down.
Steps to download example data that produces the error:
- Go to: https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=CODSUS
- Download Text
- Read in the file using parse_wpc_surface_bulletin
Inspecting the file shows a stray " inserted into one of the coded coordinates (411"45):
COLD WK 44135 42138 411"45 37152 35155 33161
In some cases the invalid character appears to have been "fixed": an "updated" text product exists for the same forecast and valid time with the invalid character removed:
https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=CODSUS&e=200005260728
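If corrected reissues like this are common, one option is to keep only the most recent product for each valid time before parsing. The following is a minimal sketch of that idea, not something MetPy provides: `keep_latest_per_valid_time` and the `(valid_time, issue_time, text)` tuple layout are hypothetical, and how the valid and issuance times are extracted from the archive is left open.

```python
from datetime import datetime


def keep_latest_per_valid_time(products):
    """Keep the most recently issued text for each valid time.

    `products` is an iterable of (valid_time, issue_time, text) tuples.
    This layout is an assumption made for illustration only.
    """
    latest = {}
    for valid_time, issue_time, text in products:
        current = latest.get(valid_time)
        # Store the product if it is the first or a newer issuance
        if current is None or issue_time > current[0]:
            latest[valid_time] = (issue_time, text)
    return {valid: text for valid, (_, text) in latest.items()}


# Made-up times and text, purely to show the behavior
example = [
    (datetime(2000, 1, 1, 6), datetime(2000, 1, 1, 7, 20), "initial"),
    (datetime(2000, 1, 1, 6), datetime(2000, 1, 1, 7, 28), "updated"),
    (datetime(2000, 1, 1, 9), datetime(2000, 1, 1, 10, 20), "only one"),
]
deduplicated = keep_latest_per_valid_time(example)
```

This would not help for bulletins that were never reissued, so it complements cleaning rather than replacing it.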
It also appears that a malformed front label (TROF, COLD, etc.) causes the parser to ignore that line. It might be good to let the user know this is happening. For example, this text file has the label TRmOF, which does not wind up in the DataFrame:
https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=CODSUS&e=200003290728
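As a rough illustration of how such lines could be surfaced to the user, the sketch below flags a first token that is not a known label but becomes one once everything except uppercase letters is removed. `find_malformed_labels` and `KNOWN_LABELS` are hypothetical names, and the label set is an assumption that should be checked against what the parser actually recognizes.

```python
import re
import warnings

# Assumed label set; verify against the labels the parser recognizes
KNOWN_LABELS = {"HIGHS", "LOWS", "WARM", "COLD", "STNRY", "OCFNT", "TROF"}


def find_malformed_labels(text):
    """Return (line_number, token) pairs for labels that look corrupted."""
    suspects = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts or parts[0] in KNOWN_LABELS:
            continue
        token = parts[0]
        # Drop everything except uppercase letters and see if a label emerges
        stripped = re.sub(r"[^A-Z]", "", token)
        if stripped in KNOWN_LABELS:
            suspects.append((lineno, token))
            warnings.warn(
                f"line {lineno}: {token!r} looks like a malformed {stripped}"
            )
    return suspects


sample = "TRmOF 44135 42138\nCOLD WK 44135 42138\n"
flagged = find_malformed_labels(sample)
```

Continuation lines that start with digits reduce to an empty string and are therefore not flagged.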
Possible fixes (maybe a utility function clean_wpc_surface_bulletin):
- replace lowercase letters with ""
- replace punctuation with ""
Example function:
import string


def clean_wpc_surface_bulletin(input_path, output_path=None):
    """Remove common invalid characters from a WPC surface bulletin.

    Specifically, this function removes all lowercase letters and
    punctuation. This can help avoid exceptions when running
    parse_wpc_surface_bulletin and retain some lines that would
    otherwise be dropped from the resulting DataFrame because of
    an invalid character.

    Parameters
    ----------
    input_path : str
        Path to the bulletin text file to clean.
    output_path : str, optional
        Location at which to write out the cleaned version of the file.
        If None, the resulting text will be returned.

    Returns
    -------
    cleaned_text : str or None
        If output_path is None, this will return the cleaned text.
        Otherwise, return None.
    """
    # Characters to strip: every lowercase ASCII letter and punctuation mark
    remove_chars = set(string.ascii_lowercase + string.punctuation)
    with open(input_path, "r", encoding="utf-8") as f:
        text = f.read()
    cleaned_text = "".join(ch for ch in text if ch not in remove_chars)
    if output_path:
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(cleaned_text)
        return None
    return cleaned_text
This works for many cases, but not for cases like https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=CODSUS&e=200012271924
where the stray character is an uppercase letter and you get a line like: WARM WK 4627 4324 I4222
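To make the limitation concrete, here is a small in-memory variant of the same cleaning rule applied to the example lines from this report. `clean_text` is a hypothetical helper used only for illustration; it works on a string rather than a file path.

```python
import string

# Translation table that deletes lowercase letters and punctuation
_REMOVE = str.maketrans("", "", string.ascii_lowercase + string.punctuation)


def clean_text(text):
    """Strip lowercase letters and punctuation from a bulletin string."""
    return text.translate(_REMOVE)


fixed_quote = clean_text('COLD WK 44135 42138 411"45 37152 35155 33161')
fixed_label = clean_text("TRmOF")
still_bad = clean_text("WARM WK 4627 4324 I4222")

print(fixed_quote)  # COLD WK 44135 42138 41145 37152 35155 33161
print(fixed_label)  # TROF
print(still_bad)    # WARM WK 4627 4324 I4222
```

The stray uppercase I survives because it is neither lowercase nor punctuation, and removing all uppercase letters is not an option since the labels themselves are uppercase.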
Problems with cleaning function and cleaning in general:
- This has to undergo significant testing to make sure that "good" cases are not removed.
- There are some otherwise valid characters that are misplaced. This could require parse-time cleaning, where the invalid characters are checked based on the line splits, e.g., once that line is split, write a rule to check certain indexes for invalid characters:
['WARM', 'WK', '4627', '4324', 'I4222']
- There may be some cases where you should ignore an initial forecast and only use the updated forecast.
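The parse-time check described in the second bullet could look roughly like the sketch below. It assumes that every token after the label (and an optional strength qualifier) should consist of digits only; `find_invalid_coords` and the `STRENGTHS` set are hypothetical and not part of MetPy.

```python
import re

# Assumed strength qualifiers that may follow a front label
STRENGTHS = {"WK", "MDT", "STG"}
# Assumption: a coded coordinate consists of ASCII digits only
COORD_RE = re.compile(r"[0-9]+")


def find_invalid_coords(line):
    """Return (index, token) pairs for coordinate tokens that are not all digits."""
    tokens = line.split()
    # Skip the label, plus the strength qualifier when one is present
    start = 2 if len(tokens) > 1 and tokens[1] in STRENGTHS else 1
    return [
        (i, tok)
        for i, tok in enumerate(tokens)
        if i >= start and not COORD_RE.fullmatch(tok)
    ]


bad_letter = find_invalid_coords("WARM WK 4627 4324 I4222")
bad_quote = find_invalid_coords('COLD WK 44135 42138 411"45 37152 35155 33161')
```

A parser could then decide per token whether to repair it, drop the point, or warn and skip the line, instead of raising on the first bad conversion.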
Operating System
Linux
Version
1.7.1
Python Version
3.13.5
Code to Reproduce
from metpy.io import parse_wpc_surface_bulletin
df = parse_wpc_surface_bulletin("200010190721-KWBC-ASUS1 -CODSUS.txt", year=2000)
Errors, Traceback, and Logs
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[9], line 3
1 from metpy.io import parse_wpc_surface_bulletin
----> 3 df = parse_wpc_surface_bulletin("200010190721-KWBC-ASUS1 -CODSUS.txt", year=2000)
5 df
File ~/.conda/envs/metpy_test/lib/python3.13/site-packages/metpy/io/text.py:139, in parse_wpc_surface_bulletin(bulletin, year)
136 strength, boundary = np.nan, info
138 # Create a list of Points and create Line from points, if possible
--> 139 boundary = [Point(_decode_coords(point)) for point in boundary]
140 boundary = LineString(boundary) if len(boundary) > 1 else boundary[0]
142 # Add new row in the data for each front
File ~/.conda/envs/metpy_test/lib/python3.13/site-packages/metpy/io/text.py:60, in _decode_coords(coordinates)
58 # Insert decimal point at the correct place and convert to float
59 lat = float(f'{lat[:2]}.{lat[2:]}') * flip
---> 60 lon = -float(f'{lon[:3]}.{lon[3:]}')
61 return lon, lat
ValueError: could not convert string to float: '"45.'
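The failure can be reproduced without the bulletin file. The sketch below is a simplified stand-in for `_decode_coords`, not the MetPy source: the two float conversions follow lines 59 and 60 of the traceback (with the sign flip omitted), while splitting the token in half is an assumption about how the latitude and longitude digits are separated.

```python
def decode_coords_sketch(token):
    """Simplified stand-in for the coordinate decoding seen in the traceback."""
    # Assumption: the token is split in half into latitude and longitude digits
    split_pos = len(token) // 2
    lat, lon = token[:split_pos], token[split_pos:]
    # Insert the decimal point and convert, as on lines 59-60 of the traceback
    lat = float(f"{lat[:2]}.{lat[2:]}")
    lon = -float(f"{lon[:3]}.{lon[3:]}")
    return lon, lat


good = decode_coords_sketch("44135")

try:
    decode_coords_sketch('411"45')
    message = None
except ValueError as exc:
    message = str(exc)

print(good)
print(message)
```

Under that assumption the token 411"45 splits into 411 and "45, and the longitude conversion fails with the same message as in the traceback.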