


@officialasishkumar commented Jul 6, 2025

Description

This PR introduces a contributor engagement metric feature. A background Celery task records each contributor's activity in a repository at three levels and stores it in the database:

  • d0: first fork or star/watch event.
  • d1: first issue opened, first PR opened, first PR comment (within a configurable window).
  • d2: longer-term flags and counts (PR merged, more than 5 issues opened, total comments, a PR with more than 3 commits, comments on more than 2 distinct PRs).
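To make the three levels concrete, here is a minimal, stdlib-only sketch of how one contributor's events in one repository could be folded into d0/d1/d2 fields. It is not the PR's Celery task. The `Event` and `Engagement` types, the event kind strings, and `classify()` are hypothetical. The 90-day default is an assumed value for the "configurable window", the window is assumed to start at the contributor's first recorded event, and "total comments" is assumed to count PR comments only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional

# Hypothetical event kinds; the PR's real queries read Augur's collected tables.
D0_KINDS = {"fork", "star", "watch"}


@dataclass
class Event:
    kind: str                     # e.g. "fork", "issue_opened", "pr_comment"
    created_at: datetime
    pr_id: Optional[int] = None   # set only for PR-related events


@dataclass
class Engagement:
    d0_first_event_at: Optional[datetime] = None
    d1_first_issue_at: Optional[datetime] = None
    d1_first_pr_at: Optional[datetime] = None
    d1_first_pr_comment_at: Optional[datetime] = None
    d2_has_merged_pr: bool = False
    d2_more_than_5_issues: bool = False
    d2_total_comments: int = 0
    d2_pr_with_over_3_commits: bool = False
    d2_commented_on_over_2_prs: bool = False


def classify(
    events: Iterable[Event],
    pr_commit_counts: Dict[int, int],
    window: timedelta = timedelta(days=90),  # assumed default window
) -> Engagement:
    """Summarise one contributor's events in one repo into d0/d1/d2 fields."""
    ordered = sorted(events, key=lambda e: e.created_at)
    result = Engagement()
    if not ordered:
        return result
    start = ordered[0].created_at  # d1 window is measured from the first event
    issues_opened = 0
    commented_prs = set()
    for e in ordered:
        in_window = e.created_at - start <= window
        if e.kind in D0_KINDS:
            if result.d0_first_event_at is None:
                result.d0_first_event_at = e.created_at
        elif e.kind == "issue_opened":
            issues_opened += 1
            if in_window and result.d1_first_issue_at is None:
                result.d1_first_issue_at = e.created_at
        elif e.kind == "pr_opened":
            if in_window and result.d1_first_pr_at is None:
                result.d1_first_pr_at = e.created_at
            if pr_commit_counts.get(e.pr_id, 0) > 3:
                result.d2_pr_with_over_3_commits = True
        elif e.kind == "pr_comment":
            result.d2_total_comments += 1
            commented_prs.add(e.pr_id)
            if in_window and result.d1_first_pr_comment_at is None:
                result.d1_first_pr_comment_at = e.created_at
        elif e.kind == "pr_merged":
            result.d2_has_merged_pr = True
    result.d2_more_than_5_issues = issues_opened > 5
    result.d2_commented_on_over_2_prs = len(commented_prs) > 2
    return result
```

Keeping the aggregation a pure function like this, separate from the Celery task and the database writes, makes the thresholds easy to unit-test.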

The final table includes:

  • Primary key engagement_id; foreign keys repo_id and cntrb_id
  • Contributor identity (username, full_name, country, platform)
  • d0 flags and timestamp; d1 timestamps; d2 booleans and comment count
  • Metadata columns (tool_source, tool_version, data_source, data_collection_date)

Indexes are on repo_id, cntrb_id, username, and platform.
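The table layout above could look roughly like the following DDL. This is a sketch under stated assumptions: Augur's database is PostgreSQL, but `sqlite3` is used here so the example runs with only the standard library. The d0/d1/d2 column names and all column types are hypothetical stand-ins, since the description lists only their kinds. `create_schema()` and the index names are illustrative, while the four indexed columns are the ones listed above.

```python
import sqlite3

# SQLite stand-in for Augur's PostgreSQL table; the d0/d1/d2 column names
# and all column types below are hypothetical.
CONTRIBUTOR_ENGAGEMENT_DDL = """
CREATE TABLE contributor_engagement (
    engagement_id                INTEGER PRIMARY KEY,
    repo_id                      INTEGER NOT NULL REFERENCES repo (repo_id),
    cntrb_id                     TEXT NOT NULL REFERENCES contributors (cntrb_id),
    -- contributor identity
    username                     TEXT,
    full_name                    TEXT,
    country                      TEXT,
    platform                     TEXT,
    -- d0: first fork / star / watch
    d0_forked                    BOOLEAN DEFAULT 0,
    d0_starred_or_watched        BOOLEAN DEFAULT 0,
    d0_engagement_timestamp      TIMESTAMP,
    -- d1: first issue, first PR, first PR comment
    d1_first_issue_created_at    TIMESTAMP,
    d1_first_pr_opened_at        TIMESTAMP,
    d1_first_pr_commented_at     TIMESTAMP,
    -- d2: longer-term flags and comment count
    d2_has_merged_pr             BOOLEAN DEFAULT 0,
    d2_created_many_issues       BOOLEAN DEFAULT 0,
    d2_total_comments            INTEGER DEFAULT 0,
    d2_has_pr_with_many_commits  BOOLEAN DEFAULT 0,
    d2_commented_on_multiple_prs BOOLEAN DEFAULT 0,
    -- metadata
    tool_source                  TEXT,
    tool_version                 TEXT,
    data_source                  TEXT,
    data_collection_date         TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_ce_repo_id  ON contributor_engagement (repo_id);
CREATE INDEX idx_ce_cntrb_id ON contributor_engagement (cntrb_id);
CREATE INDEX idx_ce_username ON contributor_engagement (username);
CREATE INDEX idx_ce_platform ON contributor_engagement (platform);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the table and its four secondary indexes."""
    conn.executescript(CONTRIBUTOR_ENGAGEMENT_DDL)
```

The primary key needs no separate index, so only the four lookup columns get one.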

This PR fixes #2992

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

@officialasishkumar changed the title from "D0 contributor metric" to "feat: contributor metric" on Jul 6, 2025
@officialasishkumar marked this pull request as ready for review on July 14, 2025 18:58
@MoralCode
Contributor

Hello @officialasishkumar, thanks for this contribution (and apologies for the delay in getting around to this).

We've made some fairly substantial changes to the repo (notably using uv for Python dependency management). Could you rebase this PR on top of the current main branch? That should also fix a lot of the linter warnings.

@officialasishkumar
Author

Sure, @MoralCode.

Will do by EOD.

@MoralCode
Contributor

@officialasishkumar Rebasing would probably be a better way to update this PR: the diff would then contain only the changes you made, rather than also including code from other people's unrelated PRs that are already on the main branch, which makes it much easier to review.

Are you familiar with rebasing in git? Happy to provide guidance if you'd like.

Akshatb2006 and others added 7 commits July 18, 2025 23:50
Signed-off-by: Akshat Baranwal <kysuakshat23@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Signed-off-by: Akshat Baranwal <kysuakshat23@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Signed-off-by: Akshat Baranwal <kysuakshat23@gmail.com>
@Akshatb2006 force-pushed the d0-contributor-metric branch from 4f525e2 to 91a2bf8 on July 18, 2025 18:21
@Akshatb2006

Hey @MoralCode,
could you please review this?

@sgoggins
Member

FYI -- @MoralCode is out of office until early next week.

repo = relationship("Repo")

class ContributorEngagement(Base):
Member


@officialasishkumar: New database objects should be defined in a migration file under {repo root}/augur/application/schema/alembic/versions.

I think that, given the PR open for the other GSoC team, the next number in the sequence is 35.

That enables Alembic upgrades and downgrades.
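An Alembic revision file is a Python module that declares its own `revision` and the `down_revision` it builds on, plus an `upgrade()` and a `downgrade()` function. The sketch below mimics that shape with `sqlite3` so it runs standalone. In a real file under `augur/application/schema/alembic/versions`, the bodies would call `op.create_table`, `op.create_index`, `op.drop_index`, and `op.drop_table` instead. The revision IDs `"35"`/`"34"` and the trimmed column list are hypothetical.

```python
import sqlite3
from typing import Set

# Hypothetical revision identifiers following the numbering suggested in review;
# a real file must point down_revision at the actual latest revision.
revision = "35"
down_revision = "34"


def upgrade(conn: sqlite3.Connection) -> None:
    """Create the new table; a real Alembic file would use op.create_table()."""
    conn.execute(
        "CREATE TABLE contributor_engagement ("
        " engagement_id INTEGER PRIMARY KEY,"
        " repo_id INTEGER NOT NULL,"
        " cntrb_id TEXT NOT NULL"  # remaining columns trimmed for brevity
        ")"
    )
    conn.execute("CREATE INDEX idx_ce_repo_id ON contributor_engagement (repo_id)")


def downgrade(conn: sqlite3.Connection) -> None:
    """Undo upgrade() in reverse order (op.drop_index / op.drop_table in Alembic)."""
    conn.execute("DROP INDEX IF EXISTS idx_ce_repo_id")
    conn.execute("DROP TABLE IF EXISTS contributor_engagement")


def schema_objects(conn: sqlite3.Connection) -> Set[str]:
    """Names of all tables and indexes currently in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type IN ('table', 'index')")
    return {name for (name,) in rows}
```

The property worth checking for any migration is that `downgrade()` exactly undoes `upgrade()`, which is what makes `alembic downgrade` safe to run.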

Author


@sgoggins updated with commit 299bd90

Co-authored-by: Akshat <kysuakshat23@gmail.com>
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Member

@sgoggins left a comment


Waiting for confirmation from @Ulincsys and @ABrain7710, but I think we don't modify the main database creation script; we modify the version scripts instead, so a person creating a new install gets all the old tables first and your new ones last.

repo = relationship("Repo")

class ContributorEngagement(Base):
    __tablename__ = "contributor_engagement"
Member


@ABrain7710 / @Ulincsys: Can you confirm that it's our practice not to modify the main table-creation script, but to include the versioning script so that new installs just get "all the upgrades"?

Contributor


The way I've done it in the past for other projects is that both the main schema gets modified AND migrations get created. Then anyone who creates a new database gets the latest schema directly. This has required a small bit of code: when Augur detects a new DB and creates the tables, it stamps the DB with the current Alembic version. That database can then be upgraded as time goes on, but new DBs always start out on the latest version.
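The approach described here (build new databases from the latest schema, stamp them, and migrate only existing ones) can be sketched as follows. This is a toy stand-in, not Alembic's API. `MIGRATIONS`, `LATEST_SCHEMA`, `stamp()`, and `init_or_upgrade()` are hypothetical, revision IDs 33 to 35 are invented, and a database is assumed to be new when it has no version table. The `alembic_version` table with its `version_num` column does mirror what Alembic itself records; with real Alembic, the stamping step would be `alembic stamp head` (or `alembic.command.stamp`).

```python
import sqlite3
from typing import List, Optional

# Hypothetical ordered revisions standing in for alembic/versions:
# each entry is (revision_id, SQL run by that revision's upgrade()).
MIGRATIONS = [
    ("33", "CREATE TABLE repo (repo_id INTEGER PRIMARY KEY)"),
    ("34", "ALTER TABLE repo ADD COLUMN repo_name TEXT"),
    ("35", "CREATE TABLE contributor_engagement "
           "(engagement_id INTEGER PRIMARY KEY, repo_id INTEGER)"),
]
HEAD = MIGRATIONS[-1][0]

# The "main schema": what a brand-new database should contain at HEAD.
LATEST_SCHEMA = """
CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_name TEXT);
CREATE TABLE contributor_engagement (engagement_id INTEGER PRIMARY KEY, repo_id INTEGER);
"""


def current_revision(conn: sqlite3.Connection) -> Optional[str]:
    """Return the stamped revision, or None if the DB has no version table."""
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'alembic_version'"
    ).fetchone()
    if has_table is None:
        return None
    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    return row[0] if row else None


def stamp(conn: sqlite3.Connection, revision: str) -> None:
    """Record `revision` as current without running any migration."""
    conn.execute("CREATE TABLE IF NOT EXISTS alembic_version (version_num TEXT NOT NULL)")
    conn.execute("DELETE FROM alembic_version")
    conn.execute("INSERT INTO alembic_version (version_num) VALUES (?)", (revision,))


def init_or_upgrade(conn: sqlite3.Connection) -> List[str]:
    """Create and stamp a new DB, or apply pending migrations; return applied IDs."""
    rev = current_revision(conn)
    if rev is None:
        # Assumed new database: build the latest schema directly, then stamp HEAD.
        conn.executescript(LATEST_SCHEMA)
        stamp(conn, HEAD)
        return []
    ids = [rev_id for rev_id, _ in MIGRATIONS]
    pending = MIGRATIONS[ids.index(rev) + 1:]  # raises ValueError if rev is unknown
    for rev_id, sql in pending:
        conn.execute(sql)
        stamp(conn, rev_id)
    return [rev_id for rev_id, _ in pending]
```

The invariant to protect is that a fresh database and a fully migrated old one end up with the same tables and the same stamped revision. If the main schema and the migrations drift apart, the two paths diverge, which is the main cost of maintaining both.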


Successfully merging this pull request may close this issue: Conversion Rate
4 participants