-
Thanks! We'll try to study this in the coming weeks.
-
To make comparison easier, here's a GitHub diff from v3.9.5, which is where this code was forked: https://github.com/python/cpython/compare/v3.9.5...yuleil:cds?expand=1
-
I haven't looked deeply into the implementation, but the idea looks decent enough: there's a "dump" mode that creates an mmap'ed segment with a snapshot of the heap in a file (or some part of it -- perhaps only code objects?), and a "use" mode that maps that file into memory at the original address. The beauty is that this supports arbitrary 3rd party modules. The complexity is caused by the need to fix up the segment after it's been mmap'ed in, because
The solution is nicely general, but requires two new
Comparing with Experiment E (#84), the tooling is easier to use with 3rd party modules, although the dump/use mechanism is a bit clunky. I wonder if you could borrow an idea from #84 and generate a table of fix-ups for the mmap'ed segment that do things like patching references to standard types and singleton values, instead of adding new
Ideal would be if you could package this as a 3rd party extension module that can be distributed via PyPI.
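The fix-up table idea can be sketched in a few lines of Python. This is a toy model, not code from either experiment: the "image" is a plain list of slots, and the names `WELL_KNOWN`, `dump_image`, and `load_image` are hypothetical. References to singletons and standard types are recorded symbolically at dump time and patched back in after the image is loaded.

```python
# Toy model of a fix-up table; every name here is an illustrative assumption.
# Objects that live outside the image (singletons, standard types) cannot be
# stored by address, so the dump records *where* they were referenced and the
# loader patches those slots with the current process's objects.

WELL_KNOWN = {
    "None": None,
    "True": True,
    "False": False,
    "type:int": int,
    "type:str": str,
    "type:tuple": tuple,
}
_NAME_BY_ID = {id(obj): name for name, obj in WELL_KNOWN.items()}


def dump_image(values):
    """Return (slots, fixups): slots hold plain data, fixups list the patches."""
    slots, fixups = [], []
    for index, value in enumerate(values):
        name = _NAME_BY_ID.get(id(value))
        if name is not None:
            slots.append(0)  # placeholder, patched at load time
            fixups.append((index, name))
        else:
            slots.append(value)
    return slots, fixups


def load_image(slots, fixups, well_known=WELL_KNOWN):
    """Apply every fix-up so the slots reference this process's objects."""
    patched = list(slots)
    for index, name in fixups:
        patched[index] = well_known[name]
    return patched


slots, fixups = dump_image([42, None, int, "spam", True])
restored = load_image(slots, fixups)
```

In a real heap image the slots would be pointer-sized fields inside the mapped segment and the table would hold byte offsets, but the two-phase shape (record at dump, patch at load) would be the same.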
-
Some more questions:
-
We are thankful for your timely feedback. Below are some explanations regarding your questions.
The reason for the two new
Our access to the data in mmap'ed segments is not read-only, considering stuff like the reference counting and patches to
This matches our plans perfectly. We also hope to distribute this via PyPI and will continue working in that direction.

More Q & A
Not yet. We will take a close look at that.
    def patch_import_paths():
        if sys.flags.cds_mode == 1:
            # Wrap get_code so that every code object the loader produces
            # is also handed to SharedCodeWrap before being returned.
            def patch_get_code(orig_get_code):
                def wrap_get_code(self, name):
                    code = orig_get_code(self, name)
                    SharedCodeWrap.set_module_code(name, code)
                    return code
                return wrap_get_code
            SourceFileLoader.get_code = patch_get_code(SourceLoader.get_code)

We patch the
One challenge here is that all objects referenced by
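The same wrapping pattern can be tried on a stock CPython without the fork. The sketch below is a stand-in under stated assumptions: it records code objects in a plain dict (`traced_code`, a hypothetical name) instead of calling `SharedCodeWrap.set_module_code`, and it is switched on by an explicit call rather than by `sys.flags.cds_mode`, which exists only in the fork.

```python
import importlib
import importlib.machinery
import os
import sys
import tempfile

traced_code = {}  # hypothetical stand-in for the fork's SharedCodeWrap store


def install_tracer():
    """Wrap SourceFileLoader.get_code so each loaded code object is recorded."""
    loader_cls = importlib.machinery.SourceFileLoader
    orig_get_code = loader_cls.get_code

    def wrap_get_code(self, fullname):
        code = orig_get_code(self, fullname)
        traced_code[fullname] = code
        return code

    loader_cls.get_code = wrap_get_code
    return orig_get_code  # returned so the caller can undo the patch


def uninstall_tracer(orig_get_code):
    """Restore the original get_code."""
    importlib.machinery.SourceFileLoader.get_code = orig_get_code


# Demo: import a throwaway module and check that its code object was traced.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "cds_demo_mod.py"), "w") as f:
    f.write("ANSWER = 42\n")
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()

orig = install_tracer()
try:
    demo = importlib.import_module("cds_demo_mod")
finally:
    uninstall_tracer(orig)
    sys.path.remove(demo_dir)
```

Only modules imported while the tracer is installed end up in `traced_code`, which is the sense in which such an approach is driven by what the program actually imports.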
This function was developed on macOS systems running the M1 chip. It works fine on macOS and Linux. It doesn't work on Windows for now, due to the current usage of the
A practical reference is OpenJDK's JEP 341: Default CDS Archives, which generates a CDS archive of JDK internal classes at build time. When a user needs to dump 3rd party libraries, a new archive file is generated, containing the stdlib and 3rd party libraries used by the program, so there is no need to use the predetermined archive.
We tested an empty Python program
We sincerely appreciate the attention you are giving to this. We will continue working on ways to make our deliveries more efficient. I will keep you posted on our progress.
-
Thanks for your answers. I hope you bring the project to maturity. I have one follow-up question:
That number looks suspiciously low. Which modules are included in that? The PYC files for the stdlib total at least 70 MB.
-
CDS uses a trace-based model. Since the test program is
More specifically, these modules are:
-
The previous branch is obsolete. We have rewritten the implementation (e.g. removing the extra fields in the type object and reducing the hard-coded GC logic); the new version can be found at python/cpython@54a4e1b...oraluben:cds/main. This will be the basis of our future development.
-
Thanks. I don't think anyone on our team will have time to review the new version before our meeting, so hopefully you can explain some of the differences when we talk tomorrow.
-
Hi, while implementing a third-party version of this CDS approach, we ran into an issue on which we would like your advice. As we've described, there are three roles in the CDS process, and we need to set the role of each Python instance.
The CPython fork POC reads
The third-party CDS has APIs like
Is there any way we can inject a start hook to achieve that?
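One mechanism that stock CPython already provides for running code at interpreter startup is the `site` module: a line beginning with `import` in a `.pth` file under site-packages is executed during startup. The sketch below assumes that route only as an illustration. The environment variable `CDS_DEMO_MODE`, the role names, the module name `cds_demo_hook`, and `select_role` are all hypothetical and are not the project's API.

```python
import os

# Hypothetical role names; the thread only states that there are three roles.
ROLES = {"0": "disabled", "1": "dump", "2": "share"}


def select_role(environ=os.environ, variable="CDS_DEMO_MODE"):
    """Pick this interpreter's role from an environment variable."""
    return ROLES.get(environ.get(variable, "0"), "disabled")


# A .pth file in site-packages could then contain a single line such as:
#     import cds_demo_hook
# where cds_demo_hook (a hypothetical module) calls select_role() on import
# and configures the interpreter accordingly.
```

Unknown or missing values fall back to the disabled role, so an interpreter started without the variable would behave normally.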
-
Status update in python-ideas: https://mail.python.org/archives/list/python-ideas@python.org/thread/UKEBNHXYC3NPX36NS76LQZZYLRA4RVEJ/
-
Finally, we're excited to share the open-sourced third-party library at https://github.com/alibaba/code-data-share-for-python. We're currently working on detailed docs and infra setup (CI and releases based on GitHub Actions). The PyPI package is not available yet, but I think both will be ready very soon.
-
This is a CPython startup improvement approach proposed by the Alibaba Compiler Team.
We are working on ways to speed up Python application startup. The main idea is to share code objects from an mmap'ed file, which yields startup benefits similar to Experiment E with a simpler implementation.
Our design is inspired by the Application Class-Data Sharing (AppCDS) feature, introduced in OpenJDK. AppCDS allows a set of application classes to be pre-processed into a shared archive file, which can then be memory-mapped at runtime to reduce startup time and memory footprint.
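The archive-then-map workflow can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: the helper names `dump_archive` and `load_archive` are hypothetical, and unlike a shared heap image this version still deserializes the mapped bytes with `marshal`, so it shows the shape of the workflow rather than the performance characteristics.

```python
import marshal
import mmap
import os
import tempfile


def dump_archive(path, sources):
    """Compile each source string and write all code objects to one file."""
    code_by_name = {
        name: compile(src, name, "exec") for name, src in sources.items()
    }
    with open(path, "wb") as f:
        f.write(marshal.dumps(code_by_name))


def load_archive(path):
    """Memory-map the archive read-only and rebuild the code objects."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return marshal.loads(view)


# Demo: archive one tiny "module", map it back, and execute it.
archive = os.path.join(tempfile.mkdtemp(), "demo.archive")
dump_archive(archive, {"greet": "message = 'hello from the archive'"})
namespace = {}
exec(load_archive(archive)["greet"], namespace)
```

The dump step would run once ahead of time, and each later process would only perform the mapping step.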
Based on the above principle, we propose the Code-Data Sharing (CDS) approach, which allows a set of code objects to be deep-copied into a memory-mapped heap image file. During runtime:
MAP_FIXED to map to the predetermined heap image to ensure that the pointers are correct
ob_type may point to the wrong address in memory. The solution is to patch the correct address for ob_type by traversing each object in the heap image.
frozen_sets

Experiments
Env: Linux & Intel Skylake
Running an empty application
Startup time benefits: 19.18% reduction
Web server (flask + requests + pymongo)
Startup time benefits: 15.18% reduction
Summary
Compared to the existing approaches, the main contributions of our CDS approach are:
CDS uses heap objects directly, while the memory-mapped implementation in PyICE needs some deserialization.
CDS doesn't need to generate C source code, thus avoiding the need for a C toolchain to compile it. This is essential for a production environment in the cloud.
Considering that AppCDS has proved successful in OpenJDK 10, we believe our proposal can be a practical feature to enhance CPython startup performance, even though our overall design is still evolving.