Strange behavior and interaction in GamitSession and PyNetwork between clusters, tie_points, and backbone

This issue is closely tied to [this discussion](https://github.com/demiangomez/Parallel.GAMIT/issues/101#issuecomment-2414748165), so please read the linked content before continuing.

Examining the data from this query:

```sql
SELECT * FROM public.gamit_subnets WHERE "DOY"='180' AND "Year"='2022'
```

The result shows some interesting behavior:

```python
df = pd.read_csv('/Users/espg/Downloads/gamit_subnets_180_2022.csv')
df.iloc[1].stations
```

This outputs the following (note the color highlighting):

$${\color{blue}igs.badg,igs.cas1,igs.coco,igs.daej,igs.darw,igs.dumg,igs.guam,}$$
$${\color{blue}igs.hob2,igs.hrao,igs.iisc,igs.kiru,igs.mal2,igs.mcil,igs.mobs,igs.nklg,igs.pohn,igs.pol2,igs.reun,}$$
$${\color{red}igs.cas1,igs.darw,igs.dumg,igs.hob2,igs.hrao,igs.kiru,igs.mal2,igs.mobs,igs.nklg,igs.pol2,igs.reun}$$

All of the red entries above are duplicates of stations already listed in the blue highlighting. 
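The duplication can also be confirmed programmatically by counting station codes in the row. This is only a minimal sketch: it assumes the `stations` column reads back as a single comma-separated string, as in the CSV export above, and `find_duplicates` is a hypothetical helper that does not exist in Parallel.GAMIT.

```python
from collections import Counter


def find_duplicates(stations_field):
    """Return station codes that appear more than once, in first-seen order."""
    # Assumption: the field is a comma-separated string like 'igs.badg,igs.cas1,...'
    codes = [c.strip() for c in stations_field.split(',') if c.strip()]
    counts = Counter(codes)
    return [code for code, n in counts.items() if n > 1]


# The row shown above (blue entries followed by red entries)
row = ('igs.badg,igs.cas1,igs.coco,igs.daej,igs.darw,igs.dumg,igs.guam,'
       'igs.hob2,igs.hrao,igs.iisc,igs.kiru,igs.mal2,igs.mcil,igs.mobs,'
       'igs.nklg,igs.pohn,igs.pol2,igs.reun,'
       'igs.cas1,igs.darw,igs.dumg,igs.hob2,igs.hrao,igs.kiru,igs.mal2,'
       'igs.mobs,igs.nklg,igs.pol2,igs.reun')

duplicates = find_duplicates(row)
print(len(duplicates), duplicates)
```

With the real data frame, the same helper could be applied as `find_duplicates(df.iloc[1].stations)`, provided the column has the format assumed here.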

For `public.gamit_subnets` on DOY 180 of 2022, there are 17 clusters listed in the data table, with the first cluster (labeled subnet 0) being the backbone network. That leaves 16 clusters, which correspond to the 16 clusters that `make_clusters` produces. Since index zero in the postgres data table corresponds to the backbone, the indexing is off by 1; i.e., `df.iloc[1].stations` corresponds to `a['stations'][0]` and `b[0]` from `a, b = make_clusters(points.T, stations)`, with `a` and `b` being the `clusters` dictionary and `cluster_ties` list respectively.

This is the zeroth entry for cluster stations from the `clusters` dictionary. Note that it is identical to the blue highlighted text from the `public.gamit_subnets` table for DOY 180 in 2022:

```python
>>> a['stations'][0]
[array(['igs', 'badg'], dtype='<U4'),
 array(['igs', 'cas1'], dtype='<U4'),
 array(['igs', 'coco'], dtype='<U4'),
 array(['igs', 'daej'], dtype='<U4'),
 array(['igs', 'darw'], dtype='<U4'),
 array(['igs', 'dumg'], dtype='<U4'),
 array(['igs', 'guam'], dtype='<U4'),
 array(['igs', 'hob2'], dtype='<U4'),
 array(['igs', 'hrao'], dtype='<U4'),
 array(['igs', 'iisc'], dtype='<U4'),
 array(['igs', 'kiru'], dtype='<U4'),
 array(['igs', 'mal2'], dtype='<U4'),
 array(['igs', 'mcil'], dtype='<U4'),
 array(['igs', 'mobs'], dtype='<U4'),
 array(['igs', 'nklg'], dtype='<U4'),
 array(['igs', 'pohn'], dtype='<U4'),
 array(['igs', 'pol2'], dtype='<U4'),
 array(['igs', 'reun'], dtype='<U4')]
```

Now, this is the zeroth entry of the `cluster_ties` list, which is identical to the red highlighted text from the `public.gamit_subnets` table for DOY 180 in 2022:

```python
>>> b[0]
[array(['igs', 'cas1'], dtype='<U4'),
 array(['igs', 'darw'], dtype='<U4'),
 array(['igs', 'dumg'], dtype='<U4'),
 array(['igs', 'hob2'], dtype='<U4'),
 array(['igs', 'hrao'], dtype='<U4'),
 array(['igs', 'kiru'], dtype='<U4'),
 array(['igs', 'mal2'], dtype='<U4'),
 array(['igs', 'mobs'], dtype='<U4'),
 array(['igs', 'nklg'], dtype='<U4'),
 array(['igs', 'pol2'], dtype='<U4'),
 array(['igs', 'reun'], dtype='<U4')]
```

Looking at two additional entries from `public.gamit_subnets` and comparing them with the `clusters` dictionary and `cluster_ties` list confirms the pattern.
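The pattern can be stated as a hypothesis: each database row is the cluster's station list followed by its tie list. The sketch below shows how that could be checked. The lists are abbreviated stand-ins for `a['stations'][0]` and `b[0]` (only a few of the stations listed above), and `to_codes` is a hypothetical helper introduced for illustration.

```python
def to_codes(pairs):
    """Convert [network, station] pairs into 'network.station' strings."""
    return [f"{net}.{stn}" for net, stn in pairs]


# Abbreviated stand-ins for a['stations'][0] and b[0]; the real entries are
# numpy arrays of [network, station], which unpack the same way.
cluster_stations = [['igs', 'badg'], ['igs', 'cas1'], ['igs', 'coco'], ['igs', 'darw']]
cluster_tie_list = [['igs', 'cas1'], ['igs', 'darw']]

# Hypothesis: database row == cluster stations followed by the tie stations
reconstructed = ','.join(to_codes(cluster_stations) + to_codes(cluster_tie_list))
print(reconstructed)
```

If the hypothesis holds, comparing `reconstructed` (built from the full lists) against `df.iloc[i + 1].stations` for each cluster index `i` should give an exact match, again assuming the column is a comma-separated string.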

Questions
========

1. Was this the case with earlier runs that @eckendrick was doing, such as `public.gamit_soln` 2022 days 001-008?
    - If not, this might be a bug in the lines that [check for tie point repeats](https://github.com/demiangomez/Parallel.GAMIT/blob/master/pgamit/pyNetwork.py#L338-L340) on load from the database
    - If the tie and stations are getting added together inside of `GamitSession`, we can fix the issue with the code from the previous bullet or similar
2. What is the default and preferred behavior for handling stations, and should subnetwork `stations` include the tie stations?
    - Reading [this comment](https://github.com/demiangomez/Parallel.GAMIT/blob/master/pgamit/pyNetwork.py#L333-L337), it looks like currently `GamitSession` wants these two data objects (tie points and station clusters) ***not to overlap***. 
    - Regardless of what the current default behavior is, we should ***intentionally decide*** what behavior makes sense, and whether we want to change it.
    - Having the clusters include the tie stations (or not) will impact other downstream code, such as how subnetwork plots are currently handled.
    - Having the clusters include the tie stations (or not) will also impact the 'check' that's run when determining how large the subnetworks are (should it use the 'base' size of the clusters, or the 'expanded' size that includes the tie points?)
    - @demiangomez my intuition is that it will make more sense to change the behavior in `GamitSession` than what is set up in `pyNetwork`
3. Regardless of what the default behavior is or where the tie stations and subnetworks are being merged twice, we should be testing for repeats:
    - With unit tests that tell us (and fail submitted PRs) if the control logic needlessly duplicates entries
    - With runtime checks that can detect and remove duplicate stations before the time-consuming numerics
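
As a starting point for the last question, here is a minimal sketch of an order-preserving runtime check together with a pytest-style unit test. Both `drop_duplicate_stations` and `test_no_duplicate_stations` are hypothetical names, not existing Parallel.GAMIT code, and the sketch assumes each station is a `[network, station]` pair like the entries shown above.

```python
def drop_duplicate_stations(stations):
    """Return stations with repeats removed, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for stn in stations:
        key = tuple(stn)  # pairs such as ('igs', 'cas1'); hashable even if stn is a list
        if key not in seen:
            seen.add(key)
            unique.append(stn)
    return unique


def test_no_duplicate_stations():
    # Simulates a cluster that was merged with its tie stations twice
    merged = [['igs', 'badg'], ['igs', 'cas1'], ['igs', 'darw'],
              ['igs', 'cas1'], ['igs', 'darw']]
    cleaned = drop_duplicate_stations(merged)
    assert cleaned == [['igs', 'badg'], ['igs', 'cas1'], ['igs', 'darw']]
    # Every remaining station is unique
    assert len(cleaned) == len({tuple(s) for s in cleaned})
```

A check like this would only mask the problem at runtime. The unit test is what would flag a PR whose control logic reintroduces the double merge.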
