- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.3k
refactor(ast): Introduce flexible serialization encoding for AST #11100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 🦋 Changeset detectedLatest commit: f23fdbf The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR | 
c9d0a01    to
    de8261b      
    Compare
  
    | CodSpeed Performance ReportMerging #11100 will not alter performanceComparing  Summary
 | 
de8261b    to
    bc125dc      
    Compare
  
    | No dependency changes detected. Learn more about Socket for GitHub. 👍 No dependency changes detected in pull request | 
1b60f1d    to
    c5fab18      
    Compare
  
    4da0e40    to
    815343f      
    Compare
  
    815343f    to
    1babf03      
    Compare
  
    **Description:** This is the split of #11100. Since we need to add unknown variants to the ast enum, this requires a lot of code changes to handle non-exhaustive matches. I'm doing this in a separate PR for review.
1babf03    to
    9d04340      
    Compare
  
    | This also requires a serialization test, include unknown variants. Since I'm busy with other work, it will be a little slower. | 
c736814    to
    46b5ab6      
    Compare
  
    0e806c9    to
    f15f7b3      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
| Binary Sizes
 Commit: c4ed6d2 | 
8d05be8    to
    172a594      
    Compare
  
    6d0f5ff    to
    3737124      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 107 out of 112 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub s: String, | ||
| } | ||
|  | ||
| #[ast_node("Tuple")] | 
    
      
    
      Copilot
AI
    
    
    
      Oct 31, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from usize to u64 alters the semantics of the Tuple struct. On 32-bit platforms, usize is 32 bits while u64 is 64 bits, which could lead to different behavior. This appears to be unrelated to the encoding implementation and should be documented or justified.
| #[ast_node("Tuple")] | |
| #[ast_node("Tuple")] | |
| // NOTE: We use `u64` instead of `usize` for cross-platform consistency and to avoid issues with serialization/deserialization on 32-bit vs 64-bit platforms. If you need platform-dependent sizing, consider using `usize` and document the rationale. | 
| @@ -1,5 +1,5 @@ | |||
| #![allow(deprecated)] | |||
| #![deprecated = "Not used by swc, and this will be removed with next breaking change"] | |||
|  | |||
    
      
    
      Copilot
AI
    
    
    
      Oct 31, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deprecation attribute and warning comment were removed without explanation. If this module is no longer deprecated and is being actively used for CBOR encoding support, this should be documented in the PR description or in a code comment explaining why the deprecation was lifted.
| // This is not sound, maybe Wtf8Buf should make bytes operations safe | ||
| Ok(Self(hstr::Wtf8Atom::from(unsafe { | ||
| Wtf8Buf::from_bytes_unchecked(s.0.into()) | ||
| }))) | 
    
      
    
      Copilot
AI
    
    
    
      Oct 31, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment admits this is 'not sound' yet uses unsafe code without validation. CBOR decoding should validate that the bytes form valid WTF-8. Consider adding validation before the unsafe conversion or document why it's safe in this context.
| // This is not sound, maybe Wtf8Buf should make bytes operations safe | |
| Ok(Self(hstr::Wtf8Atom::from(unsafe { | |
| Wtf8Buf::from_bytes_unchecked(s.0.into()) | |
| }))) | |
| // Validate that the bytes are valid WTF-8 before constructing Wtf8Buf | |
| match Wtf8::from_bytes(s.0) { | |
| Some(_) => Ok(Self(hstr::Wtf8Atom::from(Wtf8Buf::from_bytes_unchecked(s.0.into())))), | |
| None => Err(cbor4ii::core::dec::Error::message("Invalid WTF-8 bytes")), | |
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the hard work!
move to ast_node make encoding-impl optional fix fmt make css_ast fix tuple fix unknown fix fmt make html_ast make xml_ast make jsdoc fix fmt fix unit tag update visit fix parser make regexp_ast no unknown fix snap fix arbitrary encoding: ignore unknown field fix cfg fix enum struct derive
Refactor AST serialization to support flexible encoding.
Head branch was pushed to by a user without write access
f497f9e    to
    f23fdbf      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 108 out of 113 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Description:
Following our previous discussion, I am trying to introduce a self-describing serialization encoding for ast to replace #11069 .
Since
serdeis currently used for json serialization, it is hard to modify serde to support both json and binary encodings (due to theuntaggedandflattenattr). So I made our ownEncodeandDecodeproc macro, which also facilitate unknown variant support.I've done some simple benchmarks with a modified swc ast, and I believe we don't have an unacceptable performance loss relative to
rkyv.I'm using
cbor4iihere, which I developed, so I'd prefer it if there's no performance difference. but it's easy to migrate tobincodeor other scheme.This also introduces an
unknownfeature forecma_astcrate, which should only be enabled by the wasm plugin to ensure that old plugins can decode new AST.BREAKING CHANGE:
This PR itself only introduces new encoding and will not cause any breakage. However, it may cause breakage when switch the wasm plugin communication scheme in future.
Related issue (if exists):