Use `AutoTokenizer.from()` for faster tokenizer loading by DePasqualeOrg · Pull Request #33 · ml-explore/mlx-swift-lm

DePasqualeOrg · 2025-12-27T16:37:58Z

swift-transformers PR #303 offers significantly faster tokenizer loading when using AutoTokenizer.from(). It also covers the tokenizer remapping and registration that is currently done in mlx-swift-lm, so we can remove that and use the fast path here after that PR is merged.

Changes

- `MLXLMCommon/Tokenizer.swift`: `loadTokenizer` now uses `AutoTokenizer.from()` directly instead of manually loading configs and calling the `PreTrainedTokenizer` initializer
- `Embedders/Tokenizer.swift`: Same change; now also passes `revision` to `AutoTokenizer.from()`
- `Embedders/Models.swift`: Added a `revision` parameter to `ModelConfiguration` for consistency with MLXLMCommon
- `Embedders/Load.swift`: Now passes `revision` to `hub.snapshot()`
- `Embedders/EmbeddingModel.swift`: Uses `loadTokenizer` instead of inline config loading
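
To illustrate how a revision can travel with the model ID so that both the snapshot download and the tokenizer load see the same value, here is a minimal self-contained sketch. `ModelIdentifier`, `EmbedderConfiguration`, and `hubRequest` are hypothetical names chosen for this example and are not copied from the PR; the `"main"` default is an assumption.

```swift
import Foundation

/// Hypothetical model identifier: either a Hub repo with a revision,
/// or a local directory that needs no download.
enum ModelIdentifier: Equatable {
    case id(String, revision: String = "main")  // default revision is an assumption
    case directory(URL)
}

/// Hypothetical configuration wrapper for an embedding model.
struct EmbedderConfiguration {
    let id: ModelIdentifier

    /// The repo and revision to request from the Hub,
    /// or nil when the model lives in a local directory.
    var hubRequest: (repo: String, revision: String)? {
        switch id {
        case .id(let repo, let revision):
            return (repo, revision)
        case .directory:
            return nil
        }
    }
}
```

A caller would read `hubRequest` once and hand the same revision to both the snapshot step and the tokenizer step, so the two cannot drift apart.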

API Changes

Deprecated:

loadTokenizerConfig: Use LanguageModelConfigurationFromHub from swift-transformers directly, which allows users to opt in to the fast path with stripVocabForPerformance: true.

Unavailable (breaking change):

TokenizerReplacementRegistry / replacementTokenizers: Use AutoTokenizer.register(_:for:) from swift-transformers instead. These no longer function with the new AutoTokenizer.from() code path.
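
For readers unfamiliar with what a tokenizer replacement registry does, the idea can be modeled as a lock-protected lookup table from one tokenizer class name to another. This is a simplified sketch only: `TokenizerClassRemapper` and the class names used with it are hypothetical, and it does not reproduce the signature of `AutoTokenizer.register(_:for:)` or of the removed `TokenizerReplacementRegistry`.

```swift
import Foundation

/// Hypothetical stand-in for a tokenizer-class remapping table.
final class TokenizerClassRemapper {
    private let lock = NSLock()
    private var replacements: [String: String]

    init(defaults: [String: String] = [:]) {
        self.replacements = defaults
    }

    /// Registers (or overrides) the class name to use in place of `name`.
    func register(_ replacement: String, for name: String) {
        lock.lock()
        defer { lock.unlock() }
        replacements[name] = replacement
    }

    /// Returns the registered replacement, or `name` unchanged if none exists.
    func resolve(_ name: String) -> String {
        lock.lock()
        defer { lock.unlock() }
        return replacements[name] ?? name
    }
}
```

Unregistered names pass through unchanged, so only tokenizer classes that need special handling have to be listed.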

Offline Mode

The offline fallback logic has been removed, as it's handled automatically by the swift-transformers Hub API. When offline, HubApi.snapshot() detects the network state via NWPathMonitor and falls back to cached files if available.
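
The fallback behavior described above can be modeled as a small pure function. This only illustrates the decision, not the Hub implementation: `SnapshotSource`, `SnapshotError`, and `chooseSnapshotSource` are hypothetical names, the real code reads the network state from NWPathMonitor rather than taking a Boolean, and throwing when offline with no cache is an assumption.

```swift
import Foundation

/// Where the snapshot files come from (hypothetical).
enum SnapshotSource: Equatable {
    case network
    case cache
}

/// Failure when nothing can be loaded (hypothetical).
enum SnapshotError: Error, Equatable {
    case offlineWithoutCache
}

/// Picks the source for a snapshot given connectivity and cache state.
/// Online requests go to the network; offline requests fall back to
/// cached files and fail only if no cached copy is available.
func chooseSnapshotSource(isOnline: Bool, cacheExists: Bool) throws -> SnapshotSource {
    if isOnline {
        return .network
    }
    guard cacheExists else {
        throw SnapshotError.offlineWithoutCache
    }
    return .cache
}
```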

davidkoski · 2026-01-05T17:19:39Z

This looks good -- is it ready to merge?

DePasqualeOrg · 2026-01-05T17:24:25Z

I think we should wait for huggingface/swift-transformers#303 to be merged. I'll mark this as ready for review at that time.

davidkoski · 2026-01-05T17:25:50Z

> I think we should wait for huggingface/swift-transformers#303 to be merged. I'll mark this as ready for review at that time.

~~Awesome, looking at that one now!~~ Oops, that is on the swift-transformers side :-)

DePasqualeOrg mentioned this pull request Dec 28, 2025

Optimize model loading performance #34

Merged

DePasqualeOrg force-pushed the fast-tokenizer-loading branch from 8e741f3 to 607860b · January 6, 2026 20:06

DePasqualeOrg mentioned this pull request Feb 11, 2026

Optimizations for significantly faster tokenizer loading huggingface/swift-transformers#303

Closed

DePasqualeOrg force-pushed the fast-tokenizer-loading branch 3 times, most recently from 8d50a4d to e9f0f0d · February 17, 2026 17:08

DePasqualeOrg added 6 commits February 19, 2026 08:51

Add revision option to embedding model ID

e50b41e

Use AutoTokenizer.from() for faster tokenizer loading

6909ad9

Deprecate overrideTokenizer (now handled by swift-transformers)

58faf2f

Temporary pin to swift-transformers (remove before merging)

5704366

Remove overrideTokenizer (handled by swift-transformers)

d4ffcdf

Clean up

70a6635

DePasqualeOrg force-pushed the fast-tokenizer-loading branch from e9f0f0d to 70a6635 · February 19, 2026 07:51

DePasqualeOrg wants to merge 6 commits into ml-explore:main from DePasqualeOrg:fast-tokenizer-loading