basic check to prevent 'orphan' clusters during prune #222

Open
wants to merge 1 commit into
base: master

espg
Collaborator

@espg espg commented Jun 17, 2025

Additional check inside of prune to ensure that each cluster is connected to at least one other cluster

@espg
Collaborator Author

espg commented Jun 17, 2025

Worth reviewing the logic so we're clear on what is happening:

        if problems == 0:
            for row in mod:
                if np.sum(row) > 0:
                    problems += ~(np.max(mod[:,row].sum(axis=0)) >= 2)
            if problems == 0:
                subset.append(i)
                OC[i, :] = np.zeros(rowlength)

First line: problems == 0 means that we haven't removed any station from the processing dataset, i.e., each station in the processing matrix has at least one entry and will be processed by GAMIT at least once.

Lines 2 and 3: iterate through each row of the M by N station matrix, where the M rows are the clusters and the N columns are the stations to process. The if np.sum(row) > 0 test restricts the check to non-pruned clusters; we don't care about clusters that have already been removed, which includes the current cluster while it is under consideration for removal.

Line 4: we entered this loop because removing the current cluster didn't cause any problems in station coverage. Now we check whether removing it causes any problems in overlap coverage. mod[:,row] takes the current row (cluster), selects the stations (columns) that are in that cluster, and subsets the M by N matrix to M by stations-in-that-cluster. The .sum(axis=0) gives the count for each of those stations across all clusters; a count of 1 means that the station isn't tied anywhere else, and a count of 2 or higher means that the station is overlapped/tied. In plain English, mod[:,row].sum(axis=0) returns, for each station in the row's cluster, the number of clusters that contain it. np.max(mod[:,row].sum(axis=0)) >= 2 is then True if at least one station in that cluster is also present in another cluster. Since True here means there was no problem, the ~ flips it: False if there is no problem, True if there is an issue. That value is then added to problems.
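For illustration, here is a minimal sketch of the line 4 expression on a toy matrix. The matrix values and the helper name cluster_is_untied are hypothetical, not taken from the PR, and the sketch assumes mod has boolean dtype so that row acts as a column mask.

```python
import numpy as np

# Hypothetical 3-cluster by 5-station membership matrix (not from the PR).
mod = np.array([
    [1, 1, 1, 0, 0],   # cluster 0: stations 0, 1, 2
    [0, 0, 1, 1, 0],   # cluster 1: stations 2, 3 (shares station 2 with cluster 0)
    [0, 0, 0, 0, 1],   # cluster 2: station 4 only, shared with nobody
], dtype=bool)

def cluster_is_untied(mod, row):
    """Return True when no station in `row` appears in any other cluster."""
    counts = mod[:, row].sum(axis=0)   # per-station count across all clusters
    return not (counts.max() >= 2)     # same test as line 4, with `not` instead of ~

# One flag per cluster; True marks a cluster with no overlap.
flags = [cluster_is_untied(mod, row) for row in mod]
print(flags)
```

For cluster 0 the per-station counts are [1, 1, 2], so the maximum is 2 and the cluster is tied; for cluster 2 the only count is 1, so it is flagged.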

Line 5 and beyond: when removing a cluster, we have to check all other clusters to see whether we've removed all of their overlap/ties, so we iterate through the full overcluster matrix every time we remove a cluster and see if any clusters are impacted. If none are, we modify the overcluster matrix and remove that cluster. If removing that cluster would leave any other cluster without ties, we don't remove it.
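The whole decision can be sketched as a standalone function. This is a hedged illustration rather than the PR's code: can_prune is a hypothetical name, the station-coverage test is an assumed stand-in for the earlier part of prune that is not shown here, and the matrix is a toy example.

```python
import numpy as np

def can_prune(OC, i):
    """Hypothetical helper: True if cluster i can be removed safely."""
    mod = OC.copy()
    mod[i, :] = False                      # tentatively remove cluster i
    # Station coverage (assumed form): no station may lose its last cluster.
    if np.any(OC.any(axis=0) & ~mod.any(axis=0)):
        return False
    # Overlap coverage: each remaining cluster must share a station with another.
    for row in mod:
        if row.any() and mod[:, row].sum(axis=0).max() < 2:
            return False
    return True

# Toy overcluster matrix: 4 clusters by 5 stations (an assumption).
OC = np.array([
    [1, 1, 1, 0, 0],   # cluster 0
    [0, 0, 1, 1, 0],   # cluster 1: the only tie for cluster 3 (station 3)
    [0, 1, 1, 0, 0],   # cluster 2: redundant, both stations are in cluster 0
    [0, 0, 0, 1, 1],   # cluster 3
], dtype=bool)

print([can_prune(OC, i) for i in range(OC.shape[0])])
```

In this toy case cluster 2 is prunable, while cluster 1 is not: removing it loses no station, but it would leave cluster 3 without any tie, which is the situation the new check is meant to catch.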

@espg
Collaborator Author

espg commented Jun 17, 2025

This doesn't have any logic to check for a minimum number of ties/overlaps. It just checks that clusters have overlap post pruning.
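If a minimum were wanted later, one possible starting point is to count the shared stations per cluster. The sketch below is an assumption about how that could look; tie_counts and min_ties are hypothetical names and nothing like them exists in this PR.

```python
import numpy as np

def tie_counts(mod):
    """For each cluster, count its stations that also appear in another cluster."""
    per_station = mod.sum(axis=0)        # number of clusters containing each station
    shared = per_station >= 2            # stations present in two or more clusters
    return (mod & shared).sum(axis=1)    # shared stations per cluster

# Toy boolean membership matrix (an assumption): 3 clusters by 5 stations.
mod = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=bool)

min_ties = 1                             # hypothetical threshold
counts = tie_counts(mod)
meets_minimum = counts >= min_ties
print(counts, meets_minimum)
```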

@espg
Collaborator Author

espg commented Jun 17, 2025

One more note on this-- if we have a case where we haven't overclustered something, such as the rejection_threshold triggering:

[image]

...then adding the logic in this PR will result in no pruning at all for the clusters. This is because if overcluster doesn't add overlap stations for a row, then problems will always be non-zero at line 5 above. Line 4 will not generate problems for any cluster until it reaches the cluster that didn't expand, which will report that it isn't tied to anything.

This is very much an edge case, but we should be aware of it. We could add another function to check for orphans between the overcluster and prune steps explicitly, and then error / warn or apply different logic when we detect the edge case.
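A rough sketch of such a check, run on the overcluster matrix before prune, might look like the following. The function name find_orphans, the toy matrix, and the use of warnings.warn are all assumptions for illustration; how the result should be handled is left open.

```python
import warnings
import numpy as np

def find_orphans(OC):
    """Return indices of non-empty clusters that share no station with another cluster."""
    OC = np.asarray(OC, dtype=bool)
    shared = OC.sum(axis=0) >= 2             # stations present in two or more clusters
    tied = (OC & shared).any(axis=1)         # clusters holding at least one shared station
    return np.flatnonzero(OC.any(axis=1) & ~tied)

# Toy matrix (an assumption): cluster 2 was never expanded and holds only station 4.
OC = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=bool)

orphans = find_orphans(OC)
if orphans.size:
    warnings.warn(f"clusters with no ties before pruning: {orphans.tolist()}")
```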

@demiangomez
Owner

I think that when an orphan is detected, we should probably remove it from the dataset to allow the processing to continue, but we also need to make PGAMIT aware of it so that it prints a message to the user. Maybe we could do this through an exception during prune. Since no pruning will occur anyway, this would allow the processing to continue while PGAMIT prints a message in the log so that the user becomes aware of the problem.
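As a sketch of the exception idea only: OrphanClusterError and check_for_orphans are hypothetical names, the caller shown here stands in for whatever PGAMIT would actually do, and what "remove" should mean for the orphan's stations is not settled in this thread.

```python
import numpy as np

class OrphanClusterError(Exception):
    """Hypothetical exception; carries the indices of orphaned clusters."""
    def __init__(self, orphans):
        self.orphans = [int(k) for k in orphans]
        super().__init__(f"orphan clusters detected: {self.orphans}")

def check_for_orphans(OC):
    """Raise OrphanClusterError if a non-empty cluster shares no station."""
    shared = OC.sum(axis=0) >= 2                       # stations in 2+ clusters
    orphan = OC.any(axis=1) & ~(OC & shared).any(axis=1)
    if orphan.any():
        raise OrphanClusterError(np.flatnonzero(orphan))

# Toy matrix (an assumption): cluster 2 holds only station 4.
OC = np.array([[1, 1, 1, 0, 0],
               [0, 0, 1, 1, 0],
               [0, 0, 0, 0, 1]], dtype=bool)

try:
    check_for_orphans(OC)
    message = None
except OrphanClusterError as err:
    OC[err.orphans, :] = False      # drop the orphan so processing can continue
    message = str(err)              # the caller could write this to its log
```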
