
Invalid characters in CODSUS text cause parse_wpc_surface_bulletin to fail #3921

@ahaberlie

Description


What went wrong?

MetPy version: 1.7.1
Python: 3.13.5 (conda environment)

Summary: when reading WPC surface front text files from the Iowa State archive, we encountered errors caused by invalid characters in the lat/lon fields. Fixing a single day by hand would be trivial, but these invalid characters are scattered throughout the archive, so when processing a year or more of data it becomes difficult to track down which files are affected.
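One way to locate the affected files in a large archive is to run the parser on every file and record the failures instead of stopping at the first one. This is a minimal sketch; `find_failing_bulletins` is a hypothetical helper, and the parser is passed in as an argument so the sketch itself does not depend on MetPy:

```python
from pathlib import Path


def find_failing_bulletins(paths, parser, **parser_kwargs):
    """Run ``parser`` on every file and collect the ones that raise.

    Hypothetical helper. Returns a list of (path, exception) tuples so that
    bad files in a large archive can be found in a single pass.
    """
    failures = []
    for path in paths:
        try:
            parser(path, **parser_kwargs)
        except Exception as exc:  # record any parse error and keep going
            failures.append((Path(path), exc))
    return failures
```

With MetPy this could be called as `find_failing_bulletins(sorted(Path("codsus_2000").glob("*.txt")), parse_wpc_surface_bulletin, year=2000)`, where the directory name is only an example.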

Steps to download example data that produces the error:

  1. Go to: https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=CODSUS
  2. Download Text
  3. Read in the file using parse_wpc_surface_bulletin

Inspecting the file shows that a double-quote character (") has been inserted into one of the coded coordinates:

COLD WK 44135 42138 411"45 37152 35155 33161
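A quick way to spot such tokens before parsing is to check each coordinate against a digits-only pattern. This sketch assumes that a well-formed coded coordinate is a run of digits with an optional leading minus sign, and that an all-letter token right after the label (such as WK) is a strength qualifier; `bad_coordinate_tokens` is a hypothetical name:

```python
import re

# Assumption: a valid coded coordinate is digits with an optional leading "-".
COORD_RE = re.compile(r"-?\d+")


def bad_coordinate_tokens(line):
    """Return coordinate tokens on a bulletin line that are not purely numeric."""
    tokens = line.split()
    coords = tokens[1:]  # drop the front label, e.g. "COLD"
    # Assumption: an all-letter token right after the label is a strength ("WK").
    if coords and coords[0].isalpha():
        coords = coords[1:]
    return [tok for tok in coords if not COORD_RE.fullmatch(tok)]


print(bad_coordinate_tokens('COLD WK 44135 42138 411"45 37152 35155 33161'))
# ['411"45']
```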

In some cases the invalid character appears to have been corrected: an "updated" text product is issued for the same forecast and valid time with the invalid character removed:

https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=CODSUS&e=200005260728

It also appears that a malformed front label (TROF, COLD, etc.) causes the parser to skip that line silently. It would be good to warn the user when this happens. For example, this text file contains the label TRmOF, and that front does not appear in the resulting DataFrame:
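A warning for malformed labels could look roughly like the sketch below. The label set is an assumption (TROF, COLD and WARM appear in this issue; the others are assumed to occur in CODSUS bulletins), and only labels that turn into a known label once lowercase letters and punctuation are removed get flagged, so header lines such as VALID are not reported. `check_labels` is a hypothetical name:

```python
import string
import warnings

# Assumed label set; TROF, COLD and WARM come from this issue, the rest are guesses.
KNOWN_LABELS = {"HIGHS", "LOWS", "TROF", "COLD", "WARM", "STNRY", "OCFNT"}
# Translation table that deletes lowercase letters and punctuation.
_STRIP = str.maketrans("", "", string.ascii_lowercase + string.punctuation)


def check_labels(text):
    """Warn about lines whose label is a known label plus stray characters.

    Returns the list of (line_number, raw_label) pairs that were flagged.
    """
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens or tokens[0] in KNOWN_LABELS:
            continue
        cleaned = tokens[0].translate(_STRIP)
        if cleaned in KNOWN_LABELS:
            warnings.warn(f"line {lineno}: malformed label {tokens[0]!r} "
                          f"(looks like {cleaned!r}); this line may be skipped")
            flagged.append((lineno, tokens[0]))
    return flagged
```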

https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=CODSUS&e=200003290728

Possible fixes (perhaps as a utility function, clean_wpc_surface_bulletin):

  1. replace lowercase letters with ""
  2. replace punctuation with ""

Example function:

import string

def clean_wpc_surface_bulletin(input_path, output_path=None):
    """Remove common invalid characters from a WPC surface bulletin.

    Specifically, this function removes all lowercase ASCII letters and
    ASCII punctuation. This can prevent exceptions in
    parse_wpc_surface_bulletin and avoid some cases where an invalid
    character causes an entire line to be dropped from the resulting
    DataFrame.

    Parameters
    ----------
    input_path : str or path-like
        Path of the bulletin text file to read.
    output_path : str or path-like, optional
        Location at which to write the cleaned version of the file.
        If None, the cleaned text is returned instead.

    Returns
    -------
    cleaned_text : str or None
        The cleaned text if output_path is None; otherwise None.
    """
    # Translation table that deletes every lowercase letter and punctuation mark
    remove_chars = str.maketrans("", "", string.ascii_lowercase + string.punctuation)

    with open(input_path, "r", encoding="utf-8") as f:
        text = f.read()

    cleaned_text = text.translate(remove_chars)

    if output_path is None:
        return cleaned_text

    with open(output_path, "w", encoding="utf-8") as f:
        f.write(cleaned_text)
    return None

This works in many cases, but not for files such as https://mesonet.agron.iastate.edu/wx/afos/p.php?pil=CODSUS&e=200012271924, which contains a line like:

WARM WK 4627 4324 I4222
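Applying the same character filter to an in-memory string makes it easy to see why it repairs the COLD line quoted earlier but not this WARM line: the stray character here is an uppercase I, which the filter keeps. `clean_text` is a hypothetical helper:

```python
import string

# Same filter as clean_wpc_surface_bulletin, applied to a string instead of a file.
_REMOVE = str.maketrans("", "", string.ascii_lowercase + string.punctuation)


def clean_text(text):
    """Remove lowercase ASCII letters and ASCII punctuation from ``text``."""
    return text.translate(_REMOVE)


print(clean_text('COLD WK 44135 42138 411"45 37152 35155 33161'))
# COLD WK 44135 42138 41145 37152 35155 33161
print(clean_text("WARM WK 4627 4324 I4222"))
# WARM WK 4627 4324 I4222
# (the uppercase "I" is not removed)
```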

Problems with the cleaning function, and with cleaning in general:

  1. It would need significant testing to make sure that valid ("good") content is not removed.
  2. Some characters that are valid elsewhere in the file (e.g., uppercase letters) appear in the wrong place, so a global filter cannot remove them. This may require parse-time cleaning: once a line is split, check specific positions for invalid characters, e.g., the last token in ['WARM', 'WK', '4627', '4324', 'I4222'].
  3. In some cases it may be better to ignore the initial forecast and use only the updated (corrected) forecast.
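The parse-time check described in item 2 might be sketched as follows. It is only an illustration under assumptions: the first token is the label, an all-letter second token is the strength, and every remaining token should contain only digits (with an optional leading minus sign). Stripping the stray character, as done here, is a guess; depending on the case it may be safer to drop the token or the whole line. `repair_coordinate_tokens` is a hypothetical name:

```python
import re
import warnings

# Anything that is not an ASCII digit.
_NON_DIGIT = re.compile(r"[^0-9]")


def repair_coordinate_tokens(line):
    """Split a bulletin line and strip non-digit characters from coordinates.

    Hypothetical parse-time rule. Assumes the first token is the label and
    an all-letter second token (e.g. "WK") is a strength qualifier; every
    other token is treated as a coded coordinate.
    """
    tokens = line.split()
    n_head = 2 if len(tokens) > 1 and tokens[1].isalpha() else 1
    head, coords = tokens[:n_head], tokens[n_head:]
    repaired = []
    for tok in coords:
        # Keep a leading minus sign, drop every other non-digit character.
        sign = "-" if tok.startswith("-") else ""
        fixed = sign + _NON_DIGIT.sub("", tok)
        if fixed != tok:
            warnings.warn(f"coordinate {tok!r} repaired to {fixed!r} in line {line!r}")
        repaired.append(fixed)
    return head + repaired


print(repair_coordinate_tokens("WARM WK 4627 4324 I4222"))
# ['WARM', 'WK', '4627', '4324', '4222']
# (a warning is also emitted for 'I4222')
```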

Operating System

Linux

Version

1.7.1

Python Version

3.13.5

Code to Reproduce

from metpy.io import parse_wpc_surface_bulletin

df = parse_wpc_surface_bulletin("200010190721-KWBC-ASUS1 -CODSUS.txt", year=2000)

Errors, Traceback, and Logs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 3
      1 from metpy.io import parse_wpc_surface_bulletin
----> 3 df = parse_wpc_surface_bulletin("200010190721-KWBC-ASUS1 -CODSUS.txt", year=2000)
      5 df

File ~/.conda/envs/metpy_test/lib/python3.13/site-packages/metpy/io/text.py:139, in parse_wpc_surface_bulletin(bulletin, year)
    136     strength, boundary = np.nan, info
    138 # Create a list of Points and create Line from points, if possible
--> 139 boundary = [Point(_decode_coords(point)) for point in boundary]
    140 boundary = LineString(boundary) if len(boundary) > 1 else boundary[0]
    142 # Add new row in the data for each front

File ~/.conda/envs/metpy_test/lib/python3.13/site-packages/metpy/io/text.py:60, in _decode_coords(coordinates)
     58 # Insert decimal point at the correct place and convert to float
     59 lat = float(f'{lat[:2]}.{lat[2:]}') * flip
---> 60 lon = -float(f'{lon[:3]}.{lon[3:]}')
     61 return lon, lat

ValueError: could not convert string to float: '"45.'

Metadata


Assignees

No one assigned

    Labels

    Area: IO (Pertains to reading data), Type: Enhancement (Enhancement to existing functionality)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
