Strange behavior and interaction in GamitSession and PyNetwork between clusters, tie_points, and backbone #116

Open
@espg

Description

This issue is closely tied to this discussion, so please read the linked content before continuing.

Examining the data from this query:

SELECT * FROM public.gamit_subnets where "DOY"='180' and "Year"='2022'

Shows interesting behavior:

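import pandas as pd

# CSV export of the query above
df = pd.read_csv('/Users/espg/Downloads/gamit_subnets_180_2022.csv')
# Row 0 is the backbone network, so row 1 is the first cluster
df.iloc[1].stations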

Which outputs the following (note the color highlighting):

$${\color{blue}igs.badg,igs.cas1,igs.coco,igs.daej,igs.darw,igs.dumg,igs.guam,}$$ $${\color{blue}igs.hob2,igs.hrao,igs.iisc,igs.kiru,igs.mal2,igs.mcil,igs.mobs,igs.nklg,igs.pohn,igs.pol2,igs.reun,}$$ $${\color{red}igs.cas1,igs.darw,igs.dumg,igs.hob2,igs.hrao,igs.kiru,igs.mal2,igs.mobs,igs.nklg,igs.pol2,igs.reun}$$

All of the red entries above are duplicates of stations already listed in the blue highlighting.
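The duplication can also be checked programmatically rather than by eye. The following is a minimal sketch, assuming the `stations` field is a single comma-separated string as in the output above; `find_duplicates` is a hypothetical helper written for illustration, not a function from the existing code base.

```python
from collections import Counter


def find_duplicates(stations_field):
    """Return station names that appear more than once, in first-seen order."""
    names = [s.strip() for s in stations_field.split(",") if s.strip()]
    counts = Counter(names)
    # dict.fromkeys keeps first-seen order while dropping repeats
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


# The stations string shown above for df.iloc[1].stations
# (assumed to be stored as one comma-separated value)
row = (
    "igs.badg,igs.cas1,igs.coco,igs.daej,igs.darw,igs.dumg,igs.guam,"
    "igs.hob2,igs.hrao,igs.iisc,igs.kiru,igs.mal2,igs.mcil,igs.mobs,"
    "igs.nklg,igs.pohn,igs.pol2,igs.reun,"
    "igs.cas1,igs.darw,igs.dumg,igs.hob2,igs.hrao,igs.kiru,igs.mal2,"
    "igs.mobs,igs.nklg,igs.pol2,igs.reun"
)

duplicates = find_duplicates(row)
print(len(duplicates), duplicates)
```

Applied to every row of the exported table (for example with `df.stations.map(find_duplicates)`), the same helper would show whether the duplication affects all subnetworks or only some.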

For public.gamit_subnets on DOY 180 of 2022, there are 17 clusters listed in the data table, with the first cluster (labeled subnet 0) being the backbone network. That leaves 16 clusters, which correspond to the 16 clusters that make_clusters produces. Since index zero in the Postgres table corresponds to the backbone, the indexing is off by one: df.iloc[1].stations corresponds to a[0] and b[0] from a, b = make_clusters(points.T, stations), where "a" is the clusters dictionary and "b" is the cluster_ties list.

This is the zeroth entry for cluster stations from the clusters dictionary. Note that it is identical to the blue-highlighted text from the public.gamit_subnets table for DOY 180 in 2022:

>>> a['stations'][0]
[array(['igs', 'badg'], dtype='<U4'),
 array(['igs', 'cas1'], dtype='<U4'),
 array(['igs', 'coco'], dtype='<U4'),
 array(['igs', 'daej'], dtype='<U4'),
 array(['igs', 'darw'], dtype='<U4'),
 array(['igs', 'dumg'], dtype='<U4'),
 array(['igs', 'guam'], dtype='<U4'),
 array(['igs', 'hob2'], dtype='<U4'),
 array(['igs', 'hrao'], dtype='<U4'),
 array(['igs', 'iisc'], dtype='<U4'),
 array(['igs', 'kiru'], dtype='<U4'),
 array(['igs', 'mal2'], dtype='<U4'),
 array(['igs', 'mcil'], dtype='<U4'),
 array(['igs', 'mobs'], dtype='<U4'),
 array(['igs', 'nklg'], dtype='<U4'),
 array(['igs', 'pohn'], dtype='<U4'),
 array(['igs', 'pol2'], dtype='<U4'),
 array(['igs', 'reun'], dtype='<U4')]

Now, this is the corresponding entry from the cluster_ties list, which is identical to the red-highlighted text from the public.gamit_subnets table for DOY 180 in 2022:

>>> b[0]
[array(['igs', 'cas1'], dtype='<U4'),
 array(['igs', 'darw'], dtype='<U4'),
 array(['igs', 'dumg'], dtype='<U4'),
 array(['igs', 'hob2'], dtype='<U4'),
 array(['igs', 'hrao'], dtype='<U4'),
 array(['igs', 'kiru'], dtype='<U4'),
 array(['igs', 'mal2'], dtype='<U4'),
 array(['igs', 'mobs'], dtype='<U4'),
 array(['igs', 'nklg'], dtype='<U4'),
 array(['igs', 'pol2'], dtype='<U4'),
 array(['igs', 'reun'], dtype='<U4')]

Looking at two additional entries from public.gamit_subnets and the clusters dictionary & cluster_ties list confirms the pattern.
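The comparison above can be scripted so that every row is checked the same way. The following is a minimal sketch under stated assumptions: the database field is taken to be a comma-separated list of `net.stn` names, plain tuples stand in for the two-element NumPy arrays printed above (`'.'.join` works on either), and the short literals are truncated stand-ins for the real `make_clusters` output. `to_field` and `db_row_for_cluster` are hypothetical helpers.

```python
def to_field(pairs):
    """Join (network, station) pairs into a 'net.stn,net.stn' string."""
    return ",".join(".".join(pair) for pair in pairs)


def db_row_for_cluster(cluster_index):
    """Row 0 of the table is the backbone, so cluster i sits in row i + 1."""
    return cluster_index + 1


# Truncated stand-ins for a['stations'][0] and b[0]
cluster = [("igs", "badg"), ("igs", "cas1"), ("igs", "coco")]
ties = [("igs", "cas1")]

# Hypothetical table contents: row 0 is the backbone, row 1 the first cluster
db_rows = ["<backbone stations>", "igs.badg,igs.cas1,igs.coco,igs.cas1"]

# Cluster stations followed by tie stations, as observed in the table
reconstructed = to_field(cluster + ties)
matches = reconstructed == db_rows[db_row_for_cluster(0)]
print(matches)
```

If the pattern holds, the stored field for each row equals the cluster stations followed by the tie stations, which is what makes the tie stations appear twice whenever they are already members of the cluster.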

Questions

  1. Was this the case with earlier runs that @eckendrick was doing, such as public.gamit_soln 2022 days 001-008?
    • If not, this might be a bug in these lines, which check for repeated tie points on load from the database
    • If the tie points and stations are being added together inside GamitSession, we can fix the issue with the code from the previous bullet or something similar
  2. What is the default and preferred behavior for handling stations, and should subnetwork stations include the tie stations?
    • Reading this comment, it looks like currently GamitSession wants these two data objects (tie points and station clusters) not to overlap.
    • Regardless of what the current default behavior is, we should intentionally determine what makes sense for the behavior to be, and if we want to change it.
    • Having the clusters include the tie stations (or not) will impact other downstream code, such as how subnetwork plots are currently handled.
    • Having the clusters include the tie stations (or not) will also impact the 'check' that's run when determining how large the subnetworks are: should it use the 'base' size of the clusters, or the 'expanded' size that includes the tie points?
    • @demiangomez my intuition is that it will make more sense to change the behavior in GamitSession than what is set up in pyNetwork
  3. Regardless of what the default behavior is or where the tie stations and subnetworks are being merged twice, we should be testing for repeats:
    • With unit tests that tell us (and fail submitted PRs) if the control logic needlessly duplicates entries
    • With runtime checks that can detect and remove duplicate stations before the time-consuming numerics
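Both kinds of check in question 3 can be sketched in a few lines. This is only an illustration, assuming stations are represented as `net.stn` strings; `dedupe_stations`, `assert_no_duplicates`, and the test function are hypothetical names, not existing pyNetwork or GamitSession functions, and where such checks belong depends on the answer to question 2.

```python
def dedupe_stations(stations):
    """Runtime fix: drop repeated names, keeping the first occurrence of each."""
    return list(dict.fromkeys(stations))


def assert_no_duplicates(stations):
    """Runtime check: raise ValueError naming any station listed more than once."""
    seen, repeats = set(), []
    for name in stations:
        if name in seen and name not in repeats:
            repeats.append(name)
        seen.add(name)
    if repeats:
        raise ValueError(f"duplicate stations: {', '.join(repeats)}")


def test_merged_subnet_has_no_duplicates():
    """Unit-test sketch: merging a cluster with its ties must not repeat stations."""
    cluster = ["igs.badg", "igs.cas1", "igs.coco"]
    ties = ["igs.cas1"]  # already a member of the cluster
    merged = dedupe_stations(cluster + ties)
    assert merged == ["igs.badg", "igs.cas1", "igs.coco"]
    assert_no_duplicates(merged)
```

Keeping the first occurrence preserves the original cluster ordering, so deduplicating does not reorder stations that downstream code may rely on. A test like the one above, run under pytest in CI, would fail any PR that reintroduces the duplication.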
