Skip to content

new paired method for over_cluster() function #218

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

espg
Copy link
Collaborator

@espg espg commented May 14, 2025

This is an enhancement / test case that we had discussed previously-- it expands cluster membership in 'pairs' by adding the closest point from an adjacent cluster along with a distant point from that same cluster to (potentially) improve the baseline estimation and water vapor correction within GAMIT runs. For most cases, the 'distant point' will be the furthest point from the neighboring cluster... however, we check against the rejection_threshold (with a default value of 5,000km), so in some cases it will instead be the furthest point under that threshold.

This PR has a few other changes that are worth mentioning. The parameter overlap_points has been renamed to just overlap to help with line lengths, and the rejection_threshold has been set to a default value of 5,000,000 meters. This latter point may have some effect on repeat runs, since earlier it was set to None-- i.e., if the baseline distance was exceeded, we'd see the error in the GAMIT output. Now, with this update, there shouldn't be GAMIT errors related to baseline length. The documentation has been updated, and some additional pep8 fixes within the cluster.py module have also been implemented... it still needs more to be pep8 'compliant' since the agglomerative clustering function added several hundred new lines to the file, but the easier to fix errors have been fixed.

Besides the new paired method itself, the biggest update in this PR is the fix for a bug that's been present for a while. Previously, setting neighbors=N for the default static method would add N+1 new points per cluster, instead of just N points per cluster as is described in the documentation. This was a zero indexing bug; the while loop checked for a condition <= neighbors, and started at 0. So, if you set neighbors=2, the termination condition wouldn't be met until the number of neighbors added was '3'. This is fixed now-- method='static' and neighbors=N will yield N points added to each cluster, for n_clusters * neighbors additional redundant tie points that need to be processed for the GAMIT subnetwork runs. However, this behavior (while correct) is different than what it was when running previous pgamit runs-- new runs will need to update there parameters appropriately to be comparable. As for the the new paired method, it will produce ties to N=neighbors adjacent clusters, with each neighbor always including exactly two tie points-- i.e., neighbors2n_clusters additional redundant tie points.

@espg
Copy link
Collaborator Author

espg commented May 14, 2025

@demiangomez this passes unit test and is ready to be used. Not sure why the CI/CD tests are so slow right now-- they usually take ~5 mins, but these took almost a half hour... might be unrelated to us and something with github??

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant