[ php-wasm ] add intl extension #2187
adamziel merged 38 commits into WordPress:trunk from mho22:add-intl-extension
|
I tried to optimize the But first of all, these are the sizes of the My first attempt was to greatly decrease the it resulted in a 12.7Mb size. Not great, not terrible. The second attempt was to decrease the lib directory with new flags : Unfortunately PHP won't compile with the So I ended up having my best optimization process : And here is a comparison between php 8.4 with intl
- export const dependenciesTotalSize = 16143865;
+ export const dependenciesTotalSize = 18472927;
php 8.4 with intl -DUCONFIG_NO_LEGACY_CONVERSION=1 -DUCONFIG_NO_COLLATION=1 -DUCONFIG_NO_FORMATTING=1 -DUCONFIG_NO_TRANSLITERATION=1 -DUCONFIG_NO_REGULAR_EXPRESSIONS=1
- export const dependenciesTotalSize = 16143865;
+ export const dependenciesTotalSize = 18135309;
Questions :
|
What are the consequences of having them? Are we missing out on some languages or types of information? Or is it just more compressed? If we retain most information, yes, let's keep those flags on.
Just to summarize the total download size impact for the JSPI build
29MB is way too large for a default download on the web, so let's leave
Would using the latest version be just a matter of changing the build configuration? If so, let's do it. However, if that would create additional compilation hurdles, let's stick with 74.2 for now. It's from December 2023 so still fairly recent.
Thinking about Node.js, a separate file seems fine. Here's a few thoughts I had:
|
|
Adding the others will disable some php functions like Here are the different sizes without and with
php 8.4 without intl
php_8_4.wasm: 16.1 MB
php_8_4.js: 148 KB
php 8.4 with intl without filters
data: 31.9 MB
include: 5.1 MB
lib: 8.5 MB
php.data: 31.9 MB
php_8_4.wasm: 18.5 MB
php_8_4.js: 153 KB
php 8.4 with intl -DUCONFIG_NO_LEGACY_CONVERSION=1 -DUCONFIG_NO_REGULAR_EXPRESSIONS=1 without filters
data: 31.9 MB
include: 5.1 MB
lib: 8.2 MB
php.data: 31.9 MB
php_8_4.wasm: 18.4 MB
php_8_4.js: 153 KB
These builds are made with latest ICU version 77.1. Nothing more has to be done to make this version work.
Just to be sure, should I disable
If you add multiple
If I understand that correctly : We have two strategies here. First is the
I am still investigating this but simply instantiating the environment variable
Adding that line in the This is not the |
Only in web, let's still build the Node version with |
Gotcha! What would it take to still rename it, though? Would it be as simple as a string replacement in the built
We'll need to keep Asyncify until WebKit (Safari, Bun) supports JSPI 😢
Yes, e.g. XDebug is a dynamic library and it's a short term priority. Lazy loading will be challenging in that we'll need to create extension stubs with the right function signatures to trick PHP into thinking it actually loaded the extension.
I've meant this one: But it seems to be called too late. Hm. There's always the ENV here that we can control without messing with the Perhaps there's some elegant way of injecting that env variable from here: Or maybe baking it into the php.js module is for the best, since it depends on the build options. Looping in @brandonpayton for thoughts |
Yes but there is one for
It is as easy as it looks. What would you like to name it? Maybe
This works as you mentioned : However, it implies that the path can be changed, while in reality it's fixed at build time based on this line in But it is indeed way more elegant and it means we can avoid a |
If we are configuring a fixed path that we completely control, it seems like it would be cleanest to just bake a global into the build. I haven't digested all the details in this PR, but adding another |
This is great! Lovely! To confirm my understanding:
Is that right? If yes then yes, let's build all php versions WITH_INTL. Then, separately from this PR, let's discuss the API to load the dat file on the web. In Node we can just always load it. |
|
@adamziel That's right! I probably need some extra information:
PHP will still work, and intl extension will be loaded, but when running intl functions without the data from the dat file, php exceptions will be thrown. |
For
I'm confused. I thought it worked as well without
This is fine for v1. For v2, let's explore disabling those functions – I worry some developers might check the availability of the |
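One way the "disable those functions" idea could look is sketched below. It only builds the `php.ini` line; the helper names `buildDisableFunctionsIni` and `iniForIcuAvailability` are hypothetical, and the two function names are chosen purely for illustration. The only fact relied on is that PHP's `disable_functions` directive takes a comma-separated list of function names.

```typescript
// Builds a php.ini line that disables the given functions.
// `disable_functions` takes a comma-separated list of function names.
function buildDisableFunctionsIni(functionNames: string[]): string {
	const cleaned = functionNames
		.map((name) => name.trim())
		.filter((name) => name.length > 0);
	// De-duplicate while keeping the original order.
	const unique = Array.from(new Set(cleaned));
	return `disable_functions = ${unique.join(',')}`;
}

// Hypothetical selection of intl functions, for illustration only.
const intlFunctions = ['numfmt_create', 'numfmt_format'];

// Only emit the directive when the ICU data file was not loaded.
function iniForIcuAvailability(icuDataLoaded: boolean): string {
	return icuDataLoaded ? '' : buildDisableFunctionsIni(intlFunctions);
}

console.log(iniForIcuAvailability(false));
```

Note that this directive covers functions only, so classes such as `NumberFormatter` would remain visible and would need a different guard.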
|
@adamziel I was wrong about the size of
Users could add it manually in
Users could add it manually after
const php = new PHP( await loadNodeRuntime( '8.3' ) );
php.mkdir( '/icu-data-path' );
php.writeFile( '/icu-data-path/icudt74l.dat', fs.readFileSync( 'node_modules/@php-wasm/node/shared/icudt74l.dat' ) );
OR I should do it in the code, around
export async function loadNodeRuntime(
phpVersion: SupportedPHPVersion,
options: PHPLoaderOptions = {}
) {
const emscriptenOptions: EmscriptenOptions = {...};
const id = await loadPHPRuntime(
await getPHPLoaderModule(phpVersion),
await withNetworking(emscriptenOptions)
);
const php = new PHP( id );
php.mkdir( '/icu-data-path' );
php.writeFile( '/icu-data-path/icudt74l.dat', new Uint8Array( readFileSync( `${__dirname}/shared/icudt74l.dat` ) ) );
return id;
}
And yes, this is really bad. But now this code works without having to set an ENV variable or load a data file myself:
import { PHP } from '@php-wasm/universal';
import { loadNodeRuntime } from '@php-wasm/node';
const code = `<?php
$formatter = new \NumberFormatter('en-US', \NumberFormatter::CURRENCY);
var_dump($formatter->format(100.00));
$formatter = new \NumberFormatter('fr-FR', \NumberFormatter::CURRENCY);
var_dump($formatter->format(100.00));`
const php = new PHP( await loadNodeRuntime( '8.3' ) );
const result = await php.run( { code : code } );
console.log( result.text );
But honestly, this is not the right solution. What do you think about it?
Apologies for the confusion. It works as well without, I just wanted to know what was the best way for you, and it seems to be the "after runtime loaded" way. |
|
@adamziel Regarding the directories, I suggest creating a This setup will streamline the transfer of the
I think |
|
Shared directory sounds great! The rest I'll address on Monday, but the rule of thumb is this: we don't want to increase the minimum download size by more than 2-3 MB |
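To make that rule of thumb concrete, here is a small sketch that checks the two `dependenciesTotalSize` values reported earlier in this thread against the budget. The 3 MB ceiling is the upper end of the range above, read as decimal megabytes; that reading, and the helper names `sizeIncrease` and `withinBudget`, are assumptions for illustration only.

```typescript
// Sizes in bytes, as reported earlier in this conversation for php 8.4.
const baseline = 16143865; // without intl
const withIntl = 18472927; // with intl, no ICU filter flags
const withIntlFiltered = 18135309; // with intl and the -DUCONFIG_NO_* flags

// Hypothetical budget: the upper end of the "2-3 MB" rule of thumb.
const BUDGET_BYTES = 3 * 1000 * 1000;

// Returns how many bytes a build adds on top of the baseline.
function sizeIncrease(base: number, candidate: number): number {
	return candidate - base;
}

// True when the increase stays within the download-size budget.
function withinBudget(
	base: number,
	candidate: number,
	budget: number = BUDGET_BYTES
): boolean {
	return sizeIncrease(base, candidate) <= budget;
}

const increases = {
	withIntl: sizeIncrease(baseline, withIntl),
	withIntlFiltered: sizeIncrease(baseline, withIntlFiltered),
};

console.log(
	increases,
	withinBudget(baseline, withIntl),
	withinBudget(baseline, withIntlFiltered)
);
```

Both builds stay under the ceiling in this sketch; it is the roughly 30 MB ICU data file that would exceed it, which is why that file is treated as a separate download.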
Ideally we'd disable these functions when the file is missing. It's tricky with a statically built extension – we'd either need to
It's not bad at all! But it would be a bit cleaner if we added another
const id = await loadPHPRuntime(
	await getPHPLoaderModule(phpVersion),
	await withNetworking(emscriptenOptions),
	await withIcuData()
);
Where
We could parallelize all the async stuff to speed it up:
const args = await Promise.all([
	getPHPLoaderModule(phpVersion),
	withNetworking(emscriptenOptions),
	withIcuData()
]);
const id = await loadPHPRuntime( ...args );
And, finally, make the ICU loading conditional in the browser:
const args = await Promise.all([
	getPHPLoaderModule(phpVersion),
	withNetworking(emscriptenOptions),
	options.withICU && withIcuData()
]);
const id = await loadPHPRuntime( ...args );
That way, |
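The idea above can be sketched in isolation. In this sketch `getLoaderModule`, `withNetworking`, `withIcuData`, and `loadRuntime` are local stand-ins written for illustration, not the real `@php-wasm` functions, and the option shapes are assumptions. It shows the three async steps running concurrently and the ICU step resolving to an empty fragment unless `withICU` is set.

```typescript
type EmscriptenOptions = Record<string, unknown>;

// Stand-in for getPHPLoaderModule(): resolves to a fake loader module.
async function getLoaderModule(version: string): Promise<{ version: string }> {
	return { version };
}

// Stand-in for withNetworking(): returns the options with an extra key.
async function withNetworking(
	options: EmscriptenOptions
): Promise<EmscriptenOptions> {
	return { ...options, networking: true };
}

// Stand-in for withIcuData(): pretends to load the data file and returns
// the option fragment that points the runtime at it.
async function withIcuData(): Promise<EmscriptenOptions> {
	return { ENV: { ICU_DATA: '/shared' } };
}

// Stand-in for loadPHPRuntime(): merges every option fragment it receives.
async function loadRuntime(
	loader: { version: string },
	...fragments: EmscriptenOptions[]
): Promise<EmscriptenOptions> {
	return Object.assign({ version: loader.version }, ...fragments);
}

async function load(
	version: string,
	options: { withICU?: boolean } = {}
): Promise<EmscriptenOptions> {
	// Start all async work at once instead of awaiting each step in turn.
	const [loader, networking, icu] = await Promise.all([
		getLoaderModule(version),
		withNetworking({}),
		// Resolve to an empty fragment when ICU was not requested, so the
		// merge step never receives `false` or `undefined`.
		options.withICU
			? withIcuData()
			: Promise.resolve({} as EmscriptenOptions),
	]);
	return loadRuntime(loader, networking, icu);
}
```

Using a ternary instead of `options.withICU && withIcuData()` is a design choice of this sketch: it keeps every element of the `Promise.all` array the same type, so the consumer does not need to filter out falsy entries.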
|
Aha, there's no way to access the |
In dynamic symlink mounting, I used onRuntimeInitialized to access PHP in loadNodeRuntime. |
|
@adamziel Thanks for pointing me in the right direction! Since the First, the data file copy from
// Copy data files
const libDir = path.resolve(process.cwd(), 'packages/php-wasm/compile');
const publicDir =
platform === 'node'
? `${path.dirname(outputDir)}`
: `${path.dirname(path.dirname(outputDir))}`;
if (getArg('WITH_INTL').endsWith('yes')) {
await asyncSpawn(
'cp',
[`${libDir}/libintl/icudt74l.dat`, `${publicDir}/shared/icudt74l.dat`],
{ cwd: sourceDir, stdio: 'inherit' }
);
}
Next is the copy of
try {
// `recursive: true` avoids throwing when the directory already exists,
// which would otherwise skip the copy below.
fs.mkdirSync('dist/packages/php-wasm/node/shared', { recursive: true });
fs.copyFileSync(
'packages/php-wasm/node/shared/icudt74l.dat',
'dist/packages/php-wasm/node/shared/icudt74l.dat'
);
} catch (e) {
// Ignore
}
Nothing to do on the Next up is the data file loading. Let’s start with the initialization of the environment variable:
ENV: {
ICU_DATA: '/shared',
},
Then the
...(options.emscriptenOptions || {}),
onRuntimeInitialized: (phpRuntime: PHPRuntime) => {
/*
* An ICU data file must be loaded to support Intl extension.
* To achieve this, a shared directory is mounted and referenced
* via the ICU_DATA environment variable.
* By default, this variable is set to `/shared`,
* which corresponds to the actual file location.
*/
const icuFileName = 'icudt74l.dat';
const icuFilePath = `${__dirname}/shared/${icuFileName}`;
if (
!FSHelpers.fileExists(phpRuntime.FS, `${phpRuntime.ENV.ICU_DATA}/${icuFileName}`) &&
fs.existsSync(icuFilePath)
) {
phpRuntime.FS.mkdirTree(phpRuntime.ENV.ICU_DATA);
phpRuntime.FS.writeFile(
`${phpRuntime.ENV.ICU_DATA}/${icuFileName}`,
new Uint8Array(fs.readFileSync(icuFilePath))
);
}
},
The data file will be written to whatever path However, in the
export async function loadWebRuntime(
phpVersion: SupportedPHPVersion,
options: LoaderOptions = {}
) {
const emscriptenOptions: EmscriptenOptions = {
...(options.emscriptenOptions || {}),
onRuntimeInitialized: async (phpRuntime: PHPRuntime) => {
/*
* An ICU data file must be loaded to support Intl extension.
* To achieve this, a shared directory is mounted and referenced
* via the ICU_DATA environment variable.
* By default, this variable is set to `/shared`,
* which corresponds to the actual file location.
* The web version requires a `loaderOption` to load ICU data.
*/
if (options?.withICU === true) {
const icuFileName = 'icudt74l.dat';
const icuFilePath = 'node_modules/@php-wasm/web/shared/icudt74l.dat';
if (
!FSHelpers.fileExists(phpRuntime.FS, `${phpRuntime.ENV.ICU_DATA}/${icuFileName}`) &&
(await fetch(icuFilePath, {method: 'HEAD'})).ok
) {
phpRuntime.FS.mkdirTree(phpRuntime.ENV.ICU_DATA);
phpRuntime.FS.writeFile(
`${phpRuntime.ENV.ICU_DATA}/${icuFileName}`,
new Uint8Array(await (await fetch(icuFilePath)).arrayBuffer())
);
}
}
},
};
const phpLoaderModule = await getPHPLoaderModule(phpVersion);
options.onPhpLoaderModuleLoaded?.(phpLoaderModule);
const websocketExtension = options.tcpOverFetch
? tcpOverFetchWebsocket(options.tcpOverFetch)
: fakeWebsocket();
return await loadPHPRuntime(phpLoaderModule, {
...emscriptenOptions,
...websocketExtension,
});
}
This setup works, but in my separate test project, hardcoding the path to the ICU data file like this:
const icuFilePath = 'node_modules/@php-wasm/web/shared/icudt74l.dat';
feels like bad practice. I’m struggling to find a cleaner, more flexible approach. I've always found path resolution a bit tricky. Any suggestions?
Another consideration : to
async onRuntimeInitialized() {
if (phpModuleArgs.onRuntimeInitialized) {
await phpModuleArgs.onRuntimeInitialized(PHPRuntime);
}
resolvePHP();
},
Is this relevant enough? |
|
About the hardcoded path, there are two stages to consider. Take the php_8_2.js file and the process of loading the related php_8_2.wasm asset:
Stage 1: Building a reusable npm package
php_8_2.js "imports" a wasm file but in reality it only gets a URL of that file. I think vite.config.js makes that happen.
import dependencyFilename from './8_2_10/php_8_2.wasm';
export { dependencyFilename };
When the
import dependencyFilename from './8_2_10/php_8_2.wasm';
export { dependencyFilename };
Stage 2: Building playground.wordpress.net with specific paths
The final build stage relies in part on the vite.config.js files shipped by the
var hi="/assets/php_8_2-5df719e6.wasm"; |
The built emscripten module doesn't await there so I'd say it cannot be async. But you could fetch earlier:
/*
* An ICU data file must be loaded to support Intl extension.
* To achieve this, a shared directory is mounted and referenced
* via the ICU_DATA environment variable.
* By default, this variable is set to `/shared`,
* which corresponds to the actual file location.
* The web version requires a `loaderOption` to load ICU data.
*/
if (options?.withICU === true) {
// fetch()
// Add the onRuntimeInitialized callback to emscriptenOptions
}
const phpLoaderModule = await getPHPLoaderModule(phpVersion);
options.onPhpLoaderModuleLoaded?.(phpLoaderModule);
const websocketExtension = options.tcpOverFetch
? tcpOverFetchWebsocket(options.tcpOverFetch)
: fakeWebsocket();
return await loadPHPRuntime(phpLoaderModule, {
...emscriptenOptions,
...websocketExtension,
});
Also, consider a separate async withIcu() helper to parallelize downloading the WASM file and the ICU data file. |
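A minimal sketch of the "fetch earlier" suggestion, under stated assumptions: `MemoryFS` is a tiny in-memory stand-in for the Emscripten filesystem, `fetchBytes` is an injected function standing in for `fetch()`, `startRuntime` simulates a module that does not await its callback, and the exact shape of the `withIcu` helper is hypothetical. The point is that the download is awaited before the runtime starts, so the `onRuntimeInitialized` callback itself stays synchronous.

```typescript
// Tiny stand-in for the parts of the Emscripten FS used below.
class MemoryFS {
	files = new Map<string, Uint8Array>();
	dirs = new Set<string>();
	mkdirTree(path: string): void {
		this.dirs.add(path);
	}
	writeFile(path: string, data: Uint8Array): void {
		this.files.set(path, data);
	}
}

interface Runtime {
	FS: MemoryFS;
	ENV: Record<string, string>;
}

interface Options {
	ENV?: Record<string, string>;
	onRuntimeInitialized?: (runtime: Runtime) => void;
}

// Downloads the data file first, then returns options whose callback only
// performs synchronous filesystem writes.
async function withIcu(
	fetchBytes: (url: string) => Promise<Uint8Array>,
	url: string,
	fileName: string = 'icudt74l.dat'
): Promise<Options> {
	const bytes = await fetchBytes(url);
	return {
		ENV: { ICU_DATA: '/shared' },
		onRuntimeInitialized: (runtime) => {
			const dir = runtime.ENV.ICU_DATA;
			runtime.FS.mkdirTree(dir);
			runtime.FS.writeFile(`${dir}/${fileName}`, bytes);
		},
	};
}

// Simulates a runtime that calls the callback without awaiting it.
function startRuntime(options: Options): Runtime {
	const runtime: Runtime = {
		FS: new MemoryFS(),
		ENV: { ...(options.ENV ?? {}) },
	};
	options.onRuntimeInitialized?.(runtime);
	return runtime;
}
```

Because `withIcu` returns a promise, it can be started alongside the WASM download and awaited together with it.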
|
You were right to be concerned about the canceled operation. It appears that |
|
Maybe separating php 7.2-8.0 from 8.1-8.4 could also help? I wonder if there's a single test somewhere that takes a lot of memory and we run it 10 times - once for each php version |
|
@adamziel Splitting
|
|
@mho22 1. Up to you :) 2. Yes please! |
|
Let's brush up the PR description with all the necessary details for posterity and we can merge |
|
The details section reads weird, e.g.:
Was it AI generated? Let's make it specific - what's the shared directory path? Why this design choice over another? What else was tried and didn't work? Etc. PR descriptions are a form of documentation and we'll be revisiting this one soon to learn about this or that aspect of intl support - let's provide a nice writeup for the next person. |
|
To be honest, I reviewed the files changed in this pull request and listed each step I took to complete it. I didn’t use AI, but it seems I wasn’t precise enough. Let me provide a more detailed explanation. |
|
@adamziel Done. Is this what you had in mind? I intentionally skipped some minor points and focused on explaining the most important ones, along with a bit of background context. |
|
This is great, thank you! And it seems like I was too hasty with my AI comment, sorry about that. The description is great and gives so much context now ❤️ I'd just add a few more things:
Crash risk!
Those functions are available because Edit: it didn't work here 🙈 maybe because I'm writing this comment from the iOS app? I meant this: https://github.com/orgs/community/discussions/16925 Other than that, I only have one nitpick - a matter of taste really: There's a lot of empty lines in the description. It doesn't really matter if it's this way or the other way, but since:
It would be nice to condense these empty lines for consistency. |
|
@adamziel I’ve updated the description with your suggestions. Let me know if there’s anything else that needs to be adjusted. Ah and sorry for the extra empty lines! I sometimes find my comments too condensed, so I add a bit of spacing for readability. I’ll keep that in mind for future PR descriptions. |
## Motivation for the change, related issues

This is a pull request to dynamically load Intl in @php-wasm Node JSPI.

## Related issues and pull requests

Issues
- #2466
- #2299
- #1295

Pull requests
- #2247
- #2187

## Implementation details

### Intl Dynamic Extension Compilation JSPI

- Creation of a dedicated `shared` directory in `php-wasm/compile` which will store the dynamic extensions build processes and files.
- Creation of a main `build.js` script with options related to the dynamic extensions
- Creation of a specific Dockerfile for the creation of the Intl extension `.so` file based on PHP versions and JSPI
- Creation of a dedicated `project.json` file which will store the list of compilation commands related to each dynamic extension for JSPI
- Compilation of every version of Intl Dynamic Extension JSPI

### PHP.wasm Node WithIntl option

- Loading of Intl extension based on the option `withIntl` [ same logic as Xdebug ]. This loads dynamically the needed version of the dynamic extension. Stores it in the filesystem. Prepare the related php ini file and load the related ICU data file.
- Test the correct use of the extension in the `php-dynamic-loading.spec.ts` file.
- Keep the Intl static extension working for PHP.wasm Node Asyncify.
- Keep the Intl static extension compilation process for PHP.wasm Node Asyncify and Web.

## Testing Instructions (or ideally a Blueprint)

`test.js`

```javascript
import { PHP } from '@php-wasm/universal';
import { loadNodeRuntime } from '@php-wasm/node';

const script = `<?php

$formatter = numfmt_create('en-US', NumberFormatter::CURRENCY);
echo numfmt_format($formatter, 100.00);
$formatter = numfmt_create('fr-FR', NumberFormatter::CURRENCY);
echo numfmt_format($formatter, 100.00);
?>`;

const php = new PHP( await loadNodeRuntime( '8.3', { withIntl : true } ) );
const result = await php.runStream( { code : script } );
console.log( await result.stdoutText );
```

```
> node --experimental-wasm-jspi scripts/example.js

//withIntl : true
$100.00100,00 €

//withIntl : false
<br />
<b>Fatal error</b>: Uncaught Error: Call to undefined function numfmt_create() in /internal/eval.php:3
Stack trace:
#0 {main}
thrown in <b>/internal/eval.php</b> on line <b>3</b><br />
```

## Next steps

- [x] Experimental PHP Node JSPI 8.3
- [x] PHP.wasm Node JSPI
- [ ] PHP.wasm Node Asyncify
- [ ] PHP.wasm Web
- [ ] Remove artifacts in PHP.wasm
- [ ] Remove artifacts in Playground
- [ ] Move Xdebug in shared directory alongside Intl
…2501 (#2557)

## Motivation for the change, related issues

This is a pull request to dynamically load Intl in @php-wasm Node ASYNCIFY.

## Related issues and pull requests

Issues
- #2466
- #2299
- #1295

Pull requests
- #2501
- #2247
- #2187

## Implementation details

### Intl Dynamic Extension Compilation ASYNCIFY

- Improvement to the specific Intl dynamic extension Dockerfile file based on PHP versions and ASYNCIFY
- Modification of the dedicated `project.json` file which will store the list of compilation commands related to each dynamic extension for asyncify.
- Compilation of every version of Intl Dynamic Extension For Asyncify

### PHP.wasm Node WithIntl option

- Add Intl extension file import for Asyncify
- Test the correct use of the extension in the `php-dynamic-loading.spec.ts` file.
- Keep the Intl static extension compilation process for PHP.wasm Web.

## Testing Instructions (or ideally a Blueprint)

`test.js`

```javascript
import { PHP } from '@php-wasm/universal';
import { loadNodeRuntime } from '@php-wasm/node';

const script = `<?php

$formatter = numfmt_create('en-US', NumberFormatter::CURRENCY);
echo numfmt_format($formatter, 100.00);
$formatter = numfmt_create('fr-FR', NumberFormatter::CURRENCY);
echo numfmt_format($formatter, 100.00);
?>`;

const php = new PHP( await loadNodeRuntime( '8.3', { withIntl : true } ) );
const result = await php.runStream( { code : script } );
console.log( await result.stdoutText );
```

```
> node scripts/example.js

//withIntl : true
$100.00100,00 €

//withIntl : false
<br />
<b>Fatal error</b>: Uncaught Error: Call to undefined function numfmt_create() in /internal/eval.php:3
Stack trace:
#0 {main}
thrown in <b>/internal/eval.php</b> on line <b>3</b><br />
```

## Next steps

- [x] Experimental PHP Node JSPI 8.3
- [x] PHP.wasm Node JSPI
- [x] PHP.wasm Node Asyncify
- [ ] PHP.wasm Web
- [ ] Remove artifacts in PHP.wasm
- [ ] Remove artifacts in Playground
- [ ] Move Xdebug in shared directory alongside Intl
## Motivation for the change, related issues

This is a pull request to dynamically load Intl in PHP.wasm Web.

## Related issues and pull requests

Issues
- #2466
- #2299
- #1295

Pull requests
- #2557
- #2501
- #2247
- #2187

## Implementation details

- Removal of static Intl options in PHP compilation
- Set up of PHP as a `MAIN_MODULE` in node and web
- Correction of #2318 by adding `worker` to the [`web`] environment
- Improvement of build file for shared libraries
- Implementation of Intl dynamic extension lazy loading logic in PHP.wasm web
- Creation of a `ignore-lib-imports` Vite plugin
- Playwright E2E tests implementation for PHP.wasm web by duplicating existing ones from PHP.wasm Node
- Creation of a virtual alias for `wasm-feature-detect` to simulate JSPI mode enabled based on Playwright ENV
- CI jobs implementation to test PHP.wasm web in JSPI and Asyncify mode

## Testing Instructions (or ideally a Blueprint)

CI
🧪 test-e2e-php-wasm-web-jspi
🧪 test-e2e-php-wasm-web-asyncify

## Next steps

- [x] Experimental PHP.wasm Node JSPI 8.3
- [x] PHP.wasm Node JSPI
- [x] PHP.wasm Node Asyncify
- [x] Experimental PHP.wasm Web JSPI 8.3
- [x] Experimental PHP.wasm Web Asyncify 8.3
- [x] PHP.wasm Web JSPI
- [x] PHP.wasm Web Asyncify
- [ ] Implement Intl in Blueprints
- [ ] Remove remaining Intl artifacts in PHP.wasm
- [ ] Remove remaining Intl artifacts in Playground

---------

Co-authored-by: Adam Zieliński <adam@adamziel.com>

Motivation for the change, related issues
Mostly based on @oskardydo's excellent work and pull request #2173
This pull request adds support for the intl extension in php-wasm from version 7.2 to version 8.4 on both Node and Web platforms. A new option, withICU, is now available on the Web platform to enable loading of the ICU data file.
Crash risk!
Warning
In Web platform, calling an intl function without loading the dat file will crash PHP.
These functions are available because Intl is built as a static extension and included in all php.wasm binaries. However, they cannot operate without the required data. To avoid accidental crashes, you can explicitly disable specific Intl functions using the disable_functions directive in your php.ini file.
Roadmap
Implementation details
This pull request is composed of :
src/ to dist/ on build time for Node and Web platforms.
with-icu-data files for ICU data file loading during PHP runtime, provided in Node but optional with withICU boolean option in Web platform.
.wasm files.
.wasm imports.
php.spec.ts
Some points deserve more explanation :
2. The Intl extension requires locale-specific data to function properly. This data is stored in an ICU data file named icudt74l.dat, which is generated during the static build process in the compile/libintl Dockerfile. Note that this file is quite large [ ~30MB ].
3. After PHP compilation, the ICU data file, similar to the .wasm files tied to the PHP version, is copied into a platform-specific newly created shared directory:
packages/php-wasm/node/src/lib/data/shared for Node
packages/php-wasm/web/public/shared for Web
Note: The original location for the Node version was at the root of the php-wasm/node directory, but this caused issues (see point 6 below).
5. During the build process, the ICU data file is copied from src to dist:
For Node, this is handled using esbuild in node/build.js.
For Web, since the shared directory is placed under public, Vite automatically copies its contents to the dist directory.
6. Once the file is in place, we need to load it into the PHP-WASM filesystem and keep track of its path. This must be done before completing the loadNodeRuntime or loadWebRuntime process.
This logic is handled by the data/with-icu-data.ts file, which exports a function returning a Promise that resolves to EmscriptenOptions:
Note: The original location of the data file couldn't remain in the root php-wasm/node directory because the filePath must remain consistent between src and dist:
The simplest solution was to move the shared directory next to the with-icu-data.ts file.
Next, this Promise is executed within the loadPHPRuntime method, called inside either loadWebRuntime or loadNodeRuntime. The key difference lies in how and when it’s executed:
For Node, loading the file is straightforward and synchronous.
For Web, the ICU data file must be loaded concurrently with the .wasm file due to its size and performance implications. Furthermore, this loading is optional: if the withICU option is false or absent from the loaderOptions, the file will not be loaded.
7. While loading the file in Node is easy via readFileSync, the Web version requires more setup due to the build system.
To load the data using fetch, a helper JavaScript file [icudt74l.js] is added to the shared directory :
web/public/shared/icudt74l.js
This enables Vite to locate the .dat file. However, since we want to load the data at runtime and not during build, we must tell Vite to ignore .dat file imports. This is done in vite.config.ts:
vite.config.ts at line 37
This ensures the .dat file is excluded from the bundle and instead loaded dynamically at runtime, only if the withICU option is enabled, thereby simulating the data import.
10. Due to resource limitations, several Node test files were split and distributed across multiple jobs to ensure tests pass reliably:
Increasing test-unit-asyncify to 8 jobs.
Build process
Testing Instructions
Node
scripts/node.js
commands
JSPI : node --experimental-wasm-stack-switching scripts/node.js
Asyncify : node scripts/node.js
results
Web
scripts/web.js
commands
JSPI : chrome://flags > Search for JSPI > Enable && npm run dev
Asyncify : npm run dev
results