-
Notifications
You must be signed in to change notification settings - Fork 906
add date filter to contributer resolution logic queries #3253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor change noted with some additional comments on how getting the since_date
bulletproofed against dufus contributors who make commit_date random could affect some other parts of the code. LOOKS VERY GOOD! Let me know when this is ready to test.
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question about the change to the makefile.
On many systems, that should fail, because installing packages systemwide is not supported anymore (on EG: Ubuntu), and I believe with UV, we are no longer intended to run Augur in a venv (as UV supposedly manages those automatically).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - Lets let it fly!
The test would be:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ensure we can still support full collection on demand I think it would good to pass a full_collection flag and if it is true then grab all commits, otherwise do this new logic to only get new commits
in poking through the code, im actually not sure this even affects analyze_commits_in_parallel beyond removing some date related things |
The partial commit collection was already supported previously in the body of the repo = get_repo_by_repo_id(repo_id)
#Get the huge list of commits to process.
absolute_path = get_absolute_repo_path(facade_helper.repo_base_directory, repo.repo_id, repo.repo_path, repo.repo_name)
repo_loc = (f"{absolute_path}/.git")
# Grab the parents of HEAD
parent_commits = get_parent_commits_set(repo_loc)
# Grab the existing commits from the database
existing_commits = get_existing_commits_set(repo_id)
# Find missing commits and add them
missing_commits = parent_commits - existing_commits
facade_helper.log_activity('Debug',f"Commits missing from repo {repo_id}: {len(missing_commits)}") These changes just effect the contributor resolution logic to only go through commits that we haven't before by default. |
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
What would the behavior of this PR be in the case of a repo deciding to rewrite the history of their git repo? Is that something we can detect and maybe just fall back to the full recollection? |
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
from augur.tasks.init.celery_app import AugurFacadeRepoCollectionTask | ||
from augur.tasks.github.util.github_data_access import GithubDataAccess, UrlNotFoundException | ||
from augur.tasks.github.util.github_random_key_auth import GithubRandomKeyAuth | ||
from augur.application.db.models import Contributor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[pylint] reported by reviewdog 🐶
W0611: Unused Contributor imported from augur.application.db.models (unused-import)
The logic is based on git commit hashes for the main analysis logic and for the contributor resolution logic it's based on the date that the commit record was inserted. So, if a repository is rewriting their history the hashes would presumably be different and would be collected by the main logic into the commits table. Then that new commit would be collected later and therefore not be filtered by the new logic. The date that it is filtering by is the date that the commit record was collected not the date the commit record was committed. |
conn.execute(text(f""" | ||
INSERT INTO "augur_operations"."config" ("section_name", "setting_name", "value", "type") VALUES ('Facade', 'facade_contributor_full_recollect', '{0}', 'int'); | ||
""")) | ||
# ### end Alembic commands ### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A migration is not needed if you are simply adding a value to the database - its really only needed for changing the structure of the DB tables/columns themselves. I suspect you could add this to augur/application/config.py:~71 or so and it would get included in the database for anyone who has augur set to insert that config on startup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gotcha, I just wasn't sure where we create the config in the code. Have changed to reflect.
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Description
Signed commits