

Orbital-Web
Contributor

Description

Accounts extracted from GitHub attributes were showing up incorrectly because of the way emails appear in the metadata. This PR now extracts the email portion before passing it into the process_email function, rather than passing the whole string and treating everything before the @ as the name and everything after it as the domain.

Now, even if the attribute looks like {'name': John, 'email': johndoe@gmail.com}, the correct name and domain are extracted. The same goes for any arbitrary string that contains an email.
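A minimal sketch of the extract-then-split flow, assuming a simple regex is good enough to locate the email inside the attribute string. `extract_email` here is an illustrative stand-in, not the actual function added in this PR, and `split_email` is a hypothetical helper standing in for the name/domain split done inside process_email:

```python
import re
from typing import Optional

# Hypothetical pattern: a pragmatic approximation of an email address,
# not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_email(text: str) -> Optional[str]:
    """Return the first email-like substring in `text`, or None if there is none."""
    match = EMAIL_PATTERN.search(text)
    return match.group(0) if match else None


def split_email(email: str) -> tuple[str, str]:
    """Hypothetical helper: split a bare email into (name, domain) at the last '@'."""
    name, _, domain = email.rpartition("@")
    return name, domain


raw = "{'name': John, 'email': johndoe@gmail.com}"
email = extract_email(raw)  # "johndoe@gmail.com": the braces and quotes are not captured
if email is not None:
    print(split_email(email))  # ('johndoe', 'gmail.com')
```

Because only the matched substring reaches the split, surrounding punctuation such as the closing } can no longer leak into the domain.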

How Has This Been Tested?

Without this PR:
Accounts extracted from GitHub have a stray } at the end. To reproduce, you have to test on a repo whose users display their email.

With this PR:
The name and account should look correct.
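To see where the trailing } came from, here is a minimal sketch of the pre-fix behavior, assuming process_email split the raw attribute string at the '@' as described above. `naive_split` is a hypothetical stand-in, not the actual Onyx code:

```python
def naive_split(raw: str) -> tuple[str, str]:
    """Hypothetical pre-fix behavior: everything before '@' is the name, everything after is the domain."""
    name, _, domain = raw.partition("@")
    return name, domain


raw = "{'name': John, 'email': johndoe@gmail.com}"
name, domain = naive_split(raw)
print(name)    # {'name': John, 'email': johndoe
print(domain)  # gmail.com}  <- the closing brace ends up in the domain
```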

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes; otherwise, resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Orbital-Web requested a review from a team as a code owner June 22, 2025 00:24

vercel bot commented Jun 22, 2025

The latest updates on your projects.

Name: internal-search · Status: ✅ Ready · Updated (UTC): Jun 22, 2025 10:00pm

Contributor

greptile-apps bot left a comment


PR Summary

Improves email extraction from GitHub metadata by updating the knowledge graph utilities to handle complex email strings more robustly.

  • Adds new extract_email function in backend/onyx/kg/utils/formatting_utils.py to properly parse emails from complex metadata strings
  • Removes is_email validation in backend/onyx/kg/utils/extraction_utils.py in favor of the more robust extract_email approach
  • Fixes incorrect name display issue where GitHub accounts had trailing '}' characters by properly extracting emails from attribute strings like {name: John, email: johndoe@gmail.com}

3 files reviewed, no comments

Orbital-Web
Contributor Author

Will be part of another PR.

auto-merge was automatically disabled June 23, 2025 22:59

Pull request was closed

Orbital-Web deleted the kg-process-email-fix branch June 27, 2025 17:38