Skip to content

Conversation

regit
Copy link
Contributor

@regit regit commented Jul 24, 2025

Update of #13620 addressing comments.

Contribution style:

Our Contribution agreements:

Changes (if applicable):

Link to ticket: https://redmine.openinfosecfoundation.org/issues/

Describe changes:

  • GPL code inclusion is now optional (so MIT licensed crate is added)
  • Remove proprietary code detection in configure
  • Remove GPL warning in configure
  • Address comments

SV_BRANCH=OISF/suricata-verify#2606

@regit regit requested review from jasonish, jufajardini, victorjulien and a team as code owners July 24, 2025 18:43
@regit regit force-pushed the mime-type-reload-v2 branch from 8c68f72 to c104e61 Compare July 24, 2025 18:45
@regit regit marked this pull request as draft July 24, 2025 18:45
@regit
Copy link
Contributor Author

regit commented Jul 24, 2025

Setting to draft to see how the CI is going. Will unset if green "enough".

@regit regit force-pushed the mime-type-reload-v2 branch 2 times, most recently from c3123c5 to 44c063f Compare July 24, 2025 18:54
Copy link

codecov bot commented Jul 25, 2025

Codecov Report

❌ Patch coverage is 56.56566% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.69%. Comparing base (6bbba95) to head (eb76905).
⚠️ Report is 24 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #13647   +/-   ##
=======================================
  Coverage   83.69%   83.69%           
=======================================
  Files        1011     1014    +3     
  Lines      275071   275149   +78     
=======================================
+ Hits       230230   230296   +66     
- Misses      44841    44853   +12     
Flag Coverage Δ
fuzzcorpus 62.85% <53.53%> (+0.01%) ⬆️
livemode 19.01% <30.30%> (-0.17%) ⬇️
pcap 44.76% <47.47%> (+0.01%) ⬆️
suricata-verify 65.05% <56.56%> (-0.03%) ⬇️
unittests 59.17% <44.44%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@regit regit force-pushed the mime-type-reload-v2 branch from e843bee to a01a3ef Compare July 25, 2025 08:00
Copy link
Contributor

@jufajardini jufajardini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@regit
Copy link
Contributor Author

regit commented Jul 25, 2025

CI failures: dependency issues? https://github.com/OISF/suricata/actions/runs/16516991164/job/46711126442?pr=13647#step:10:35

Yes, got an optional crate and need to make sure it is available everywhere.

regit added 7 commits July 25, 2025 17:20
Following commit will add a new value that will overflow the actual
size.

Ticket: 7816
File magic is known to have performance issue. When looking for
an alternative, I've been pointed to the tree_magic_mini crate
that output the mime type of a binary stream. This is different
from magic but it has multiple advantages as it is a standard
so it can be correlated with other tools.

So instead of replacing magic, this patch adds a new mime type
output to fileinfo events.

This patch also adds the `file.mimetype` sticky buffer.

Mime type has 2 advantages over file magic. First it is really
faster and second, the result are easier to use as the MIME type
are well defined. But it provides less information than magic for
example with regards to the size of images.

Ticket: 7816
This patch adds a flag `--enable-gpl-mimetype` to force the
usage of `tree_magic_db` that is GPL license and will not allow
double licensing.

This will allow the built Suricata to have mime type identification
not dependant of the operating system database.

Ticket: 7816
This can be used with the requires keyword to avoid signatures
load failure.

Ticket: 7816
As file data may have not been inspected yet, this is going to miss
in the event even if it could be available. Let's compute it before
logging the file.

Ticket: 7816
@regit regit force-pushed the mime-type-reload-v2 branch from a01a3ef to 6f251a4 Compare July 25, 2025 15:38
@catenacyber
Copy link
Contributor

Why is CI still red ?

strlcat(features, "HAVE_JA4 ", sizeof(features));
#endif
strlcat(features, "HAVE_LIBJANSSON ", sizeof(features));
strlcat(features, "HAVE_MIMETYPE ", sizeof(features));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no ifdef around this ?

#define FILE_SIG_NEED_MD5 BIT_U16(4)
#define FILE_SIG_NEED_SHA1 BIT_U16(5)
#define FILE_SIG_NEED_SHA256 BIT_U16(6)
#define FILE_SIG_NEED_SIZE BIT_U16(7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can start by cherry-picking #13601 ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show off :P

@jufajardini
Copy link
Contributor

Why is CI still red ?

Dependencies -- libraries not found, still 🤔

der-parser = { version = "~9.0.0", default-features = false }
kerberos-parser = { version = "~0.8.0", default-features = false }

tree_magic_mini = { version = "~3.1.6", features = [@WITH_GPL_DATA@] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the support is always compiled in? What happens if the data is not there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can load the mime type def from a file on disk. I'll document that.

@jasonish
Copy link
Member

jasonish commented Jul 29, 2025

Would it make more sense to only compile in the mime-type crate if the feature is enabled, so we're not left in the situation where a user starts relying on the operating system files? I know the license always allows linking in, but practically, it might not make sense to.

Tests could then rely on the HAVE_MIMETYPE feature flag and know they are always testing against the same database. Likewise for rule writers.

@regit
Copy link
Contributor Author

regit commented Jul 29, 2025

Why is CI still red ?

Dependencies -- libraries not found, still 🤔

2 issues:

  • dependencies are still a problem
  • mime type definition file may also be a fix

I'll work on it this week.

@regit
Copy link
Contributor Author

regit commented Jul 29, 2025

Would it make more sense to only compile in the mime-type crate if the feature is enabled, so we're not left in the situation where a user starts relying on the operating system files? I know the license always allows linking in, but practically, it might not make sense to.

Tests could then rely on the HAVE_MIMETYPE feature flag and know they are always testing against the same database. Likewise for rule writers.

I'm fine with it but it means proprietary license user can't get the feature. Should we do:

  • enable feature with GPL flag: (most common use case)
  • flag on top of previous to not get the GPL data but get the feature

So GPL build does --enable-mimetype and proprietary build does --enable-mimetype --disable-gpl-mimetype

This will simplify the CI a lot if we get the MIME_TYPE feature conditional doing so.

@jasonish
Copy link
Member

jasonish commented Jul 29, 2025

So GPL build does --enable-mimetype and proprietary build does --enable-mimetype --disable-gpl-mimetype

This will simplify the CI a lot if we get the MIME_TYPE feature conditional doing so.

For the purposes of getting CI working, I think we should have:

  • --enable-mimetype: enables the feature
  • --enable-bundled-gpl-mimetype: to turn on the feature for the bundled data

And HAVE_MIMETYPE feature flags for each I suppose, for easy testing with S-V.

I'm not 100% sure that would be the best for a release, but it gets us somewhere, with an opt-in behavior for now.

I took a quick look, and there does not appear to be any other more permissively licensed MIME database out there.

@regit
Copy link
Contributor Author

regit commented Jul 29, 2025

So GPL build does --enable-mimetype and proprietary build does --enable-mimetype --disable-gpl-mimetype
This will simplify the CI a lot if we get the MIME_TYPE feature conditional doing so.

For the purposes of getting CI working, I think we should have:

  • --enable-mimetype: enables the feature
  • --enable-bundled-gpl-mimetype: to turn on the feature for the bundled data

I fear bug report with that approach: I did enable the feature and it is not working. Do we really risk something with proprietary license owner ? They will need to update their compilation to get the feature and should see the second flag. So they should be safe enough ?

@regit regit force-pushed the mime-type-reload-v2 branch from 6f251a4 to 7b456ce Compare July 29, 2025 20:00
@regit regit force-pushed the mime-type-reload-v2 branch from 7b456ce to f37f5ab Compare July 29, 2025 20:03
@jasonish
Copy link
Member

So GPL build does --enable-mimetype and proprietary build does --enable-mimetype --disable-gpl-mimetype
This will simplify the CI a lot if we get the MIME_TYPE feature conditional doing so.

For the purposes of getting CI working, I think we should have:

  • --enable-mimetype: enables the feature
  • --enable-bundled-gpl-mimetype: to turn on the feature for the bundled data

I fear bug report with that approach: I did enable the feature and it is not working. Do we really risk something with proprietary license owner ? They will need to update their compilation to get the feature and should see the second flag. So they should be safe enough ?

I think enabling anything that pulls in GPL license code/data needs to be opt-in at this time. The only other feature I think that pulls in GPL code is nfqueue, and its opt-in as well.

@regit
Copy link
Contributor Author

regit commented Jul 29, 2025

So GPL build does --enable-mimetype and proprietary build does --enable-mimetype --disable-gpl-mimetype
This will simplify the CI a lot if we get the MIME_TYPE feature conditional doing so.

For the purposes of getting CI working, I think we should have:

  • --enable-mimetype: enables the feature
  • --enable-bundled-gpl-mimetype: to turn on the feature for the bundled data

I fear bug report with that approach: I did enable the feature and it is not working. Do we really risk something with proprietary license owner ? They will need to update their compilation to get the feature and should see the second flag. So they should be safe enough ?

I think enabling anything that pulls in GPL license code/data needs to be opt-in at this time. The only other feature I think that pulls in GPL code is nfqueue, and its opt-in as well.

Yes but --enable-mimetype is disabled by default so it will result in the feature being opt-in.

@catenacyber
Copy link
Contributor

CI is green, cool :-) , will you do a new clean PR ?

@regit
Copy link
Contributor Author

regit commented Aug 4, 2025

CI is green, cool :-) , will you do a new clean PR ?

Yes, fixing the last request from Jason and doing that.

@regit regit mentioned this pull request Aug 5, 2025
5 tasks
@regit regit closed this Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants