Skip to content

Conversation

Mivik
Copy link
Collaborator

@Mivik Mivik commented Mar 10, 2025

Use AddrSpace::check_region_access to implement UserPtr.

@AsakuraMizu
Copy link
Collaborator

It seems that using AddrSpace::check_region_access only instead of querying the page table may lead to a new issue that if the user program uses mmap to lazily allocate some memory without using it and passed some address in that memory area to a syscall, which means that the virtual address is not mapped to a physical one but still valid in the address space, the check here will pass and cause a page fault in the kernel.

@Mivik
Copy link
Collaborator Author

Mivik commented Mar 10, 2025

It seems that using AddrSpace::check_region_access only instead of querying the page table may lead to a new issue that if the user program uses mmap to lazily allocate some memory without using it and passed some address in that memory area to a syscall, which means that the virtual address is not mapped to a physical one but still valid in the address space, the check here will pass and cause a page fault in the kernel.

However using check_region_access is correct and page fault in kernel should be properly handled. Imagine invoking write(fd, buf) with a mem-mapped memory region (to disk file for example) in Linux, pages are not physically allocated and are allocated in-time when calling syscall. If we query the page this would result in an early EFAULT.

@Mivik
Copy link
Collaborator Author

Mivik commented Mar 10, 2025

Is there any particular reason we're not handling page fault from kernel?

https://github.com/oscomp/starry-next/blob/main/src/mm.rs#L196

@Azure-stars
Copy link
Collaborator

Is there any particular reason we're not handling page fault from kernel?

https://github.com/oscomp/starry-next/blob/main/src/mm.rs#L196

We think that memory access operation from kernel mode which cause page fault will be an illegal operation. If user hope to access the lazy area through syscall like write, kernel need to check whether the memory area is valid. If the area is a lazy area, kernel need to allocate physical pages for it and continue. In this process it should not cause page fault.

@Azure-stars
Copy link
Collaborator

Is there any particular reason we're not handling page fault from kernel?
https://github.com/oscomp/starry-next/blob/main/src/mm.rs#L196

We think that memory access operation from kernel mode which cause page fault will be an illegal operation. If user hope to access the lazy area through syscall like write, kernel need to check whether the memory area is valid. If the area is a lazy area, kernel need to allocate physical pages for it and continue. In this process it should not cause page fault.

It is aim to reduce overhead from trap causing by page fault.

@Azure-stars
Copy link
Collaborator

It seems that using AddrSpace::check_region_access only instead of querying the page table may lead to a new issue that if the user program uses mmap to lazily allocate some memory without using it and passed some address in that memory area to a syscall, which means that the virtual address is not mapped to a physical one but still valid in the address space, the check here will pass and cause a page fault in the kernel.

However using check_region_access is correct and page fault in kernel should be properly handled. Imagine invoking write(fd, buf) with a mem-mapped memory region (to disk file for example) in Linux, pages are not physically allocated and are allocated in-time when calling syscall. If we query the page this would result in an early EFAULT.

So @AsakuraMizu 's opinion is right. When we check whether region is valid, the flag of the page table is not necessarily equal to the area flag.

It can be solved by check whether the area belong to Maparea::alloc and populate field is false. If we invoke alloc_for_lazy func for a lazy area, we can update the populate field to be true.

@Mivik
Copy link
Collaborator Author

Mivik commented Mar 10, 2025

It is aim to reduce overhead from trap causing by page fault.

It can be solved by check whether the area belong to Maparea::alloc and populate field is false. If we invoke alloc_for_lazy func for a lazy area, we can update the populate field to be true.

I checked Linux kernel and they have copy_{from/to}_user, which basically does:

  • Check if user have proper access to specific memory region
  • Enable supervisor-user memory access
  • Use exception fixups to stop page fault propagating into kernel panic

So basically, Linux is tolerating page fault in kernel space under this userspace accessing scenario. I don't think we should treat this specially (like pre-allocate all the pages).

@Azure-stars
Copy link
Collaborator

It is aim to reduce overhead from trap causing by page fault.
It can be solved by check whether the area belong to Maparea::alloc and populate field is false. If we invoke alloc_for_lazy func for a lazy area, we can update the populate field to be true.

I checked Linux kernel and they have copy_{from/to}_user, which basically does:

  • Check if user have proper access to specific memory region
  • Enable supervisor-user memory access
  • Use exception fixups to stop page fault propagating into kernel panic

So basically, Linux is tolerating page fault in kernel space under this userspace accessing scenario. I don't think we should treat this specially (like pre-allocate all the pages).

Give me some time to search more information, sorry!

@Azure-stars
Copy link
Collaborator

Perhaps it would be more convenient to put the processing of access to the lazy area in the page fault processing, avoiding special judgments for all operations that need to access user addresses.

@Azure-stars
Copy link
Collaborator

Then please add processing for page fault for kernel mode at https://github.com/oscomp/starry-next/blob/main/src/mm.rs#L196 in this PR @Mivik

@Azure-stars
Copy link
Collaborator

Fix the CI test

@Azure-stars Azure-stars merged commit 2b7a74f into oscomp:main Mar 12, 2025
25 checks passed
Ressed pushed a commit that referenced this pull request Mar 15, 2025
* feat: update implementation of UserPtr

* style: remove unused code

* feat(ptr): use check_page in check_cstr

* feat(ptr): force allocate pages on get ptr

* fix: upload mm.rs

* fix: disable preemption inside access_user_memory

* feat(ptr): add get_as_null_terminated to replace get_as_cstr

* feat(ptr): add nullable helper

* style(ptr): refactor implementation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants