Skip to content

Conversation

Ankita13-code
Copy link

Merge Checklist
Summary
Test Methodology

danmihai1 and others added 30 commits March 26, 2025 17:02
Specify --release when BUILD_TYPE was not specified, or when
BUILD_TYPE=release. The default "cargo build" behavior is to
build in debug mode.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
- post-process to remove box_pointers annotation for generated files
- while this unblocks the build, should look for better solution:
  - request/teach codegen to not add certain linter annotation
  - update ttrpc-codegen or other dependencies

Signed-off-by: Manuel Huber <mahuber@microsoft.com>
- counterpart to upstream a131eec
- unblocks build with rust v1.84

Signed-off-by: Manuel Huber <mahuber@microsoft.com>
- rust linter detects a potential future name collision
- see rustc lint static 'UNSTABLE_NAME_COLLISIONS'
- we can revisit this at a later point

Signed-off-by: Manuel Huber <mahuber@microsoft.com>
Use protobuf = "=3.7.1" for both agent and genpolicy, to fix version
mismatches with the protobuf crate.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Fix rebase merge in genpolicy-settings.json.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Enable allow_storages after rule.rego rebase merge.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
$ git rev-parse cc-msft-prototypes
328e440
$ git checkout cc-msft-prototypes src/utarfs
$ git checkout cc-msft-prototypes src/tarfs
$ git checkout cc-msft-prototypes src/overlay
$ git checkout cc-msft-prototypes src/tardev-snapshotter
$ git checkout cc-msft-prototypes Makefile
Add option to use an IGVM image in the UVM

Signed-off-by: Dallas Delaney <dadelan@microsoft.com>
Add configuration file configuration-clh-snp.toml with SNP related settings enabled

Signed-off-by: Dallas Delaney <dadelan@microsoft.com>
* Remove unused param from CLH SNP config
* Remove extra package definition from cbl-mariner rootfs config
* Enable SNP=on with CH conf-guest enabled and add dummy host_data value

* Add IGVM, HostData, Snp to config markdown doc

* sanitize clh-snp.toml.in and clh.toml.in

* Further changes required for SEV SNP enablement

* Remove unnecessary debug output

* Update outdated comment in config
Signed-off-by: Dallas Delaney <dadelan@microsoft.com>
This makes it so that any container has access to /dev/sev-guest out of
the box with no privileges required.

Since /dev/sev-guest isn't available yet, I've validated this change
using /dev/cpu_dma_latency (original chmod 600) by:

 1. Verifying that the device is present in the container.
 2. Verifying that reading from the device from a container yields
    the same result as from the VM context.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Add image build macro to change partition format for kernel's "dm-mod.create" command, and allow for igvm + image usecase in kata shim

Signed-off-by: Dallas Delaney <dadelan@microsoft.com>
This is a workaround for
kata-containers#7993.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
* rootfs: delete some of the mariner packages

Delete some of the mariner packages from the Guest image, for faster
TEE memory measurement.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The shell is useful for debugging.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Useful for debugging.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Useful for debugging.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Make sure the hash of an incoming Policy matches the value of the
SNP Host Data field. The value of Host Data will be validated through
Remote Attestation, outside of this patch.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
There are 10 segments in the ACPI tables, and CLH works better when
it uses all of them.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This is needed when enabling dm-verity. `udevd` reads kernel uevents
that announce the creation of `/dev/dm-XXX` devices, and then creates
devices with the actual names under `/dev/mapper/`.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
This allows us to avoid repeating paths when they're the same.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
This replicates Wedson's changes in 0935263 in a way that is aligned
with the upstream implementation introduced in
kata-containers#7200.

NOTE: This will require compiling the runtime with
DEFSHAREDFS_CLH_SNP_VIRTIOFS=none.
Upstream now uses the new DEFSTATICRESOURCEMGMT_TEE variable to set static
resource management for TEEs so we align on that. It's true by default so we
don't have to update our build script for this.

NOTE: For non-tee CH, upstream now uses DEFSTATICRESOURCEMGMT_CLH (already in
our codebase) instead of DEFSTATICRESOURCEMGMT. It's still false by default so
we WILL have to update our build script for this one.
Adjust configuration-clh-snp.toml.in to be more consistent with
configuration-qemu-snp.toml.in.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The layer string is now base64-encoded, so decode it before inspecting
the fields.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
3u13r and others added 14 commits March 31, 2025 19:18
The policy module augments the policy generated with genpolicy by keeping and
providing state to each invocation.
Therefore, it is not sufficient anymore to test the passing of requests in
the genpolicy crate.

Since in Rust, integration tests cannot call functions that are not exposed
publicly, this commit factors out the policy module of the agent into its
own crate and exposes the necessary functions to be consumed by the agent
and an integration tests. The integration test itself is implemented in the
following commits.

Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
The generated rego policies for `CreateContainerRequest` are stateful and that
state is handled in the policy crate. We use this policy crate in the
genpolicy integration test to be able to test if those state changes are
handled correctly without spinning up an agent or even a cluster.

This also allows to easily test on a e.g., CreateContainerRequest level
instead of relying on changing the yaml that is applied to a cluster.

Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
Move PolicyCopyFile request to shared policy crate so we can test it

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Useful for go dependency upgrades, so that we actually
commit changes in the vendor folder

Signed-off-by: Manuel Huber <mahuber@microsoft.com>
Introduce rule to block routes from source addresses which are the
loopback. Block routes added to the lo device.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Introduce rules for UpdateInterfaceRequest and genpolicy tests for them.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Add test cases for basic and legacy requests to create pause container

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Validate sandbox name using a regex. If the YAML specifies metadata.name, use a regex that exact matches.
If the YAML specifies metadata.generateName, use a regex that matches the prefix of the generated name.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Copy the sample yaml files from the msft-main branch after rebasing
to upstream.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Fix the scripts after rebase merge.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Re-generate Cargo.lock for protocols, agent, and genpolicy.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Update samples with the latest policy after rebasing to recent
upstream code.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This keeps the repo changes clean while running update script and removes the
need to remove the test settings at the end of the script

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Replicate some samples from mcr to marinerconfpodstest since they fail to pull sometimes from mcr
Copy link

@Sumynwa Sumynwa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to handle

.get_mut(&ctr.init_process_pid)

@Ankita13-code Ankita13-code force-pushed the ankitapareek/exec-id-agent-fix branch from 497cb9d to 25c2185 Compare May 22, 2025 13:42
@Ankita13-code
Copy link
Author

We also need to handle

.get_mut(&ctr.init_process_pid)

Handled this case as well!

@danmihai1
Copy link

Please explain in the commit message that multiple Execs using a single exec_id can be problematic especially for use cases where the Host is trusted less than the Guest (e.g., CoCo use cases).

@@ -584,7 +584,7 @@ impl AgentService {
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;

p.exit_watchers.push(exit_send);
pid = p.pid;
// pid = p.pid;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this line should be deleted.

@@ -1258,7 +1254,7 @@ impl BaseContainer for LinuxContainer {
let spec = self.config.spec.as_mut().unwrap();
update_namespaces(&self.logger, spec, p.pid)?;
}
self.processes.insert(p.pid, p);
self.processes.insert(p.exec_id.clone(), p);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to check at the beginning of start() if the exec_id has been used already and return failure instead of executing if we detect such collision.

Copy link

@Sumynwa Sumynwa May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @danmihai1 for pointing this out. Yes, we need to handle collision & return failure.
This was the expectation in previous implementation as well, but it allowed misconfiguration and, hence allowed collision.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch @danmihai1! Added a check for this scenario at the beginning of start().

This patch changes the container process HashMap to use exec_id as the primary
key instead of PID, preventing exec_id collisions that could be exploited in
Confidential Computing scenarios where the host is less trusted than the guest.

Key changes:
- Changed `processes: HashMap<pid_t, Process>` to `HashMap<String, Process>`
- Added exec_id collision detection in `start()` method
- Updated process lookup operations to use exec_id directly
- Simplified `get_process()` with direct HashMap access

This prevents multiple exec operations from reusing the same exec_id, which
could be problematic in CoCo use cases where process isolation and unique
identification are critical for security.

Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>
@Ankita13-code Ankita13-code force-pushed the ankitapareek/exec-id-agent-fix branch from d1865ca to c317634 Compare May 23, 2025 10:43
@Ankita13-code
Copy link
Author

Please explain in the commit message that multiple Execs using a single exec_id can be problematic especially for use cases where the Host is trusted less than the Guest (e.g., CoCo use cases).

Done!

@@ -584,8 +584,7 @@ impl AgentService {
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;

p.exit_watchers.push(exit_send);
pid = p.pid;


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary empty line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.