Skip to content

Conversation

regit
Copy link
Contributor

@regit regit commented Aug 17, 2025

Update of #13715 addressing comments from @jufajardini. As it seems it was not an in depth review. I'm reproducing here the details of changes from previous PR.

Contribution style:

Our Contribution agreements:

Changes (if applicable):

Link to ticket: https://redmine.openinfosecfoundation.org/issues/7816

Describe changes:

  • Address comments from @jufajardini
  • Don't embed GPL data by default (it needs to be explicitly set)
  • Load mime type files from disk
  • Embed mime type files in code
  • make install-conf install mime type files
  • make upgrade-data update mime type files

SV_BRANCH=OISF/suricata-verify#2606

regit added 5 commits August 10, 2025 20:14
Following commit will add a new value that will overflow the actual
size.

Ticket: 7816
File magic is known to have performance issue. When looking for
an alternative, I've been pointed to the tree_magic_mini crate
that output the mime type of a binary stream. This is different
from magic but it has multiple advantages as it is a standard
so it can be correlated with other tools.

So instead of replacing magic, this patch adds a new mime type
output to fileinfo events.

This patch also adds the `file.mimetype` sticky buffer.

Mime type has 2 advantages over file magic. First it is really
faster and second, the result are easier to use as the MIME type
are well defined. But it provides less information than magic for
example with regards to the size of images.

This patch is using version 3.2.0 of tree_magic_mini that provides
capabilities to load mime files from a give directory.

This patch is embedding the GPL database if an explicit flag is
passed to configure (--enable-bundled-gpl-mimetype). In this case
the mime type files comes from `tree_magic_mini` dependencies.

The patch adds the mime definition in Suricata source code.
This means that if the user uses this data, then we have no dependency
in term of results on the operating system and on the evolution of
the crate. Updating the mime type definition on the target system
will be done by installing the files present in Suricata code.

Installation of the mime type files is done via `make install-conf`
to the mimetype/ directory below data directory. This directory is
used by default even if the suricata.yaml is not updated with the
new `mimetype-dir` config variable.

The new Makefile target `upgrade-data` takes care of upgrading
the file. Users will be able to use it when there is a new version
available.

The `mimetype-dir` YAML variable can be used to set up a different
directory with data under control by the user.

Ticket: 7816
Add some builds with mimetype activated and
suricata-verify so we can test feature.

Test with and without embedded GPL data are
done.
Copy link

codecov bot commented Aug 17, 2025

Codecov Report

❌ Patch coverage is 83.25581% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.72%. Comparing base (b93a277) to head (f322fa6).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13727      +/-   ##
==========================================
+ Coverage   82.99%   83.72%   +0.72%     
==========================================
  Files        1001     1014      +13     
  Lines      272978   275266    +2288     
==========================================
+ Hits       226556   230461    +3905     
+ Misses      46422    44805    -1617     
Flag Coverage Δ
fuzzcorpus 62.92% <63.51%> (+0.04%) ⬆️
livemode 19.00% <36.48%> (?)
pcap 44.74% <55.40%> (+0.05%) ⬆️
suricata-verify 65.09% <83.98%> (+0.01%) ⬆️
unittests 59.17% <56.75%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- run: ./scripts/bundle.sh
- run: ./autogen.sh
- run: ./configure --enable-warnings
- run: ./configure --enable-warnings --enable-mimetype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder : could this be a plugin ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To rephrase, what are we missing in Suricata so that it could be a plugin ?..

}

if ((s->file_flags & FILE_SIG_NEED_MIMETYPE) && file_size == 0) {
SCLogDebug("sig needs file content, but we don't have any");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug log message is bad copy/paste

int list_id;
const MpmCtx *mpm_ctx;
const DetectEngineTransforms *transforms;
} PrefilterMpmFileMimetype;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reuse PrefilterMpmListId ?


file.mimetype; content:"application/vnd.microsoft.portable-executable";

``file.mimetype`` supports multiple buffer matching, see :doc:`multi-buffer-matching`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be added as well in doc//userguide/rules/multi-buffer-matching.rst see #13752

},
"mimetype": {
"type": "string",
"description": "The MIME type of the file (e.g., application/pdf, image/png, etc.)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe say how it is found (not from headers, but from content) ?

hashes.

The type of the file can also be determined by doing an analysis of the beginning of the content
of the file. This is done by using libmagic and/or libmimetype. Magic is slower than mimetype, but
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space at end of line

Copy link
Contributor

@catenacyber catenacyber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI : ✅
Code : need to check
Commits segmentation : ok (I would squash the doc commits)
Commit messages : good
Git ID set : looks fine for me
CLA : you already contributed
Doc update : good (some comments)
Redmine ticket : ok
Rustfmt : 🟠 rustfmt rust/src/filemimetype.rs has a diff
Tests : cool
Dependencies added: ⚠️ I do not know what needs to be checked for this new dependency

Copy link
Contributor

@catenacyber catenacyber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now needs a rebase

Could you do the rustfmt ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants