Skip to content

Conversation

@quininer
Copy link
Contributor

Description:

Following our previous discussion, I am trying to introduce a self-describing serialization encoding for ast to replace #11069 .

Since serde is currently used for json serialization, it is hard to modify serde to support both json and binary encodings (due to the untagged and flatten attr). So I made our own Encode and Decode proc macro, which also facilitate unknown variant support.

benchmark

I've done some simple benchmarks with a modified swc ast, and I believe we don't have an unacceptable performance loss relative to rkyv.
I'm using cbor4ii here, which I developed, so I'd prefer it if there's no performance difference. but it's easy to migrate to bincode or other scheme.

This also introduces an unknown feature for ecma_ast crate, which should only be enabled by the wasm plugin to ensure that old plugins can decode new AST.

BREAKING CHANGE:

This PR itself only introduces new encoding and will not cause any breakage. However, it may cause breakage when switch the wasm plugin communication scheme in future.

Related issue (if exists):

@changeset-bot
Copy link

changeset-bot bot commented Sep 18, 2025

🦋 Changeset detected

Latest commit: f23fdbf

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@quininer quininer force-pushed the r/self-desc-encode branch 2 times, most recently from c9d0a01 to de8261b Compare September 18, 2025 12:27
@codspeed-hq
Copy link

codspeed-hq bot commented Sep 18, 2025

CodSpeed Performance Report

Merging #11100 will not alter performance

Comparing quininer:r/self-desc-encode (f23fdbf) with main (dba23f5)

Summary

✅ 138 untouched

@socket-security
Copy link

socket-security bot commented Sep 19, 2025

No dependency changes detected. Learn more about Socket for GitHub.

👍 No dependency changes detected in pull request

@quininer quininer force-pushed the r/self-desc-encode branch 12 times, most recently from 1b60f1d to c5fab18 Compare September 22, 2025 12:10
@quininer quininer force-pushed the r/self-desc-encode branch 5 times, most recently from 4da0e40 to 815343f Compare September 23, 2025 09:47
kdy1 pushed a commit that referenced this pull request Oct 12, 2025
**Description:**

This is the split of #11100.
Since we need to add unknown variants to the ast enum, this requires a
lot of code changes to handle non-exhaustive matches. I'm doing this in a
separate PR for review.
@quininer
Copy link
Contributor Author

This also requires a serialization test, include unknown variants.

Since I'm busy with other work, it will be a little slower.

@quininer quininer force-pushed the r/self-desc-encode branch 2 times, most recently from c736814 to 46b5ab6 Compare October 15, 2025 10:35
Copilot AI review requested due to automatic review settings October 30, 2025 04:17
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 30, 2025

Binary Sizes

File Size
swc.linux-x64-gnu.node 31M (32023392 bytes)

Commit: c4ed6d2

Copilot AI review requested due to automatic review settings October 31, 2025 07:30
@quininer quininer force-pushed the r/self-desc-encode branch 2 times, most recently from 6d0f5ff to 3737124 Compare October 31, 2025 07:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 107 out of 112 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pub s: String,
}

#[ast_node("Tuple")]
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from usize to u64 alters the semantics of the Tuple struct. On 32-bit platforms, usize is 32 bits while u64 is 64 bits, which could lead to different behavior. This appears to be unrelated to the encoding implementation and should be documented or justified.

Suggested change
#[ast_node("Tuple")]
#[ast_node("Tuple")]
// NOTE: We use `u64` instead of `usize` for cross-platform consistency and to avoid issues with serialization/deserialization on 32-bit vs 64-bit platforms. If you need platform-dependent sizing, consider using `usize` and document the rationale.

Copilot uses AI. Check for mistakes.
@@ -1,5 +1,5 @@
#![allow(deprecated)]
#![deprecated = "Not used by swc, and this will be removed with next breaking change"]

Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecation attribute and warning comment were removed without explanation. If this module is no longer deprecated and is being actively used for CBOR encoding support, this should be documented in the PR description or in a code comment explaining why the deprecation was lifted.

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +43
// This is not sound, maybe Wtf8Buf should make bytes operations safe
Ok(Self(hstr::Wtf8Atom::from(unsafe {
Wtf8Buf::from_bytes_unchecked(s.0.into())
})))
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment admits this is 'not sound' yet uses unsafe code without validation. CBOR decoding should validate that the bytes form valid WTF-8. Consider adding validation before the unsafe conversion or document why it's safe in this context.

Suggested change
// This is not sound, maybe Wtf8Buf should make bytes operations safe
Ok(Self(hstr::Wtf8Atom::from(unsafe {
Wtf8Buf::from_bytes_unchecked(s.0.into())
})))
// Validate that the bytes are valid WTF-8 before constructing Wtf8Buf
match Wtf8::from_bytes(s.0) {
Some(_) => Ok(Self(hstr::Wtf8Atom::from(Wtf8Buf::from_bytes_unchecked(s.0.into())))),
None => Err(cbor4ii::core::dec::Error::message("Invalid WTF-8 bytes")),
}

Copilot uses AI. Check for mistakes.
Copy link
Member

@kdy1 kdy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the hard work!

@kdy1 kdy1 changed the title refactor(ast): Introduce flexible serialization encoding for ast refactor(ast): Introduce flexible serialization encoding for AST Oct 31, 2025
@kdy1 kdy1 enabled auto-merge (squash) October 31, 2025 07:44
quininer and others added 4 commits October 31, 2025 15:58
move to ast_node

make encoding-impl optional

fix fmt

make css_ast

fix tuple

fix unknown

fix fmt

make html_ast

make xml_ast

make jsdoc

fix fmt

fix unit tag

update visit

fix parser

make regexp_ast

no unknown

fix snap

fix arbitrary

encoding: ignore unknown field

fix cfg

fix enum struct derive
Refactor AST serialization to support flexible encoding.
Copilot AI review requested due to automatic review settings October 31, 2025 07:58
auto-merge was automatically disabled October 31, 2025 07:58

Head branch was pushed to by a user without write access

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 108 out of 113 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@kdy1 kdy1 merged commit 8ad3647 into swc-project:main Oct 31, 2025
180 checks passed
@quininer quininer deleted the r/self-desc-encode branch October 31, 2025 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants