-
Notifications
You must be signed in to change notification settings - Fork 29
feat(heuristics): add Fake Email analyzer to validate maintainer email domain #1106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c6f35f7
to
8d29103
Compare
f945882
to
a7103e4
Compare
src/macaron/malware_analyzer/pypi_heuristics/metadata/fake_email.py
Outdated
Show resolved
Hide resolved
…l domains Signed-off-by: Amine <amine.raouane@enim.ac.ma>
Signed-off-by: Amine <amine.raouane@enim.ac.ma>
Signed-off-by: Amine <amine.raouane@enim.ac.ma>
759ab97
to
d99495c
Compare
…mail domain validation Signed-off-by: Amine <amine.raouane@enim.ac.ma>
d99495c
to
a8d373b
Compare
""" | ||
emailinfo = None | ||
try: | ||
emailinfo = validate_email(email, check_deliverability=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please make check_deliverability
configurable via defaults.ini
so that you can turn it off in unit tests? We try to avoid network calls in our unit tests.
analyzer: FakeEmailAnalyzer, pypi_package_json_asset_mock: MagicMock, mock_validate_email: MagicMock | ||
) -> None: | ||
"""Test analyzer passes when only maintainer_email is present and valid.""" | ||
email = "maintainer@example.net" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is example.net
supposed to be a valid domain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. The domain is technically valid, but it’s reserved by IANA and not intended for real-world use. So to ensure the email is actually usable by a real user, Should I add a list of reserved domains and TLDs and check against them before proceeding with validation or acceptance, or should I keep it as it is?
validated_emails = info.get("validated_emails") | ||
assert isinstance(validated_emails, list) | ||
assert len(validated_emails) == 2 | ||
assert {"normalized": "author@example.com", "local_part": "author", "domain": "example.com"} in validated_emails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question, if example.com
is supposed to be a valid domain.
Summary
This PR adds a new heuristic analyzer called
FakeEmailAnalyzer
. It verifies the validity of maintainer email addresses listed in a PyPI package by checking both the format and the existence of MX records for their domains. This helps detect packages with fake or throwaway emails, which are often indicative of malicious intent.Description of changes
FakeEmailAnalyzer
that:detect_malicious_metadata_check.py
to include and invoke this new analyzer.Related issues
None
Checklist
verified
label should appear next to all of your commits on GitHub.