
Commit 7749a4a

Add support for SmolLM3 (#1359)
1 parent 4b7a3aa commit 7749a4a

File tree: 4 files changed, +12 −0 lines changed


README.md

Lines changed: 1 addition & 0 deletions

@@ -416,6 +416,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://huggingface.co/papers/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
 1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://huggingface.co/papers/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
 1. **[SigLIP](https://huggingface.co/docs/transformers/main/model_doc/siglip)** (from Google AI) released with the paper [Sigmoid Loss for Language Image Pre-Training](https://huggingface.co/papers/2303.15343) by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
+1. **[SmolLM3](https://huggingface.co/docs/transformers/main/model_doc/smollm3)** (from Hugging Face) released with the blog post [SmolLM3: smol, multilingual, long-context reasoner](https://huggingface.co/blog/smollm3) by the Hugging Face TB Research team.
 1. **[SmolVLM](https://huggingface.co/docs/transformers/main/model_doc/smolvlm)** (from Hugging Face) released with the blog posts [SmolVLM - small yet mighty Vision Language Model](https://huggingface.co/blog/smolvlm) and [SmolVLM Grows Smaller – Introducing the 250M & 500M Models!](https://huggingface.co/blog/smolervlm) by the Hugging Face TB Research team.
 1. **SNAC** (from Papla Media, ETH Zurich) released with the paper [SNAC: Multi-Scale Neural Audio Codec](https://huggingface.co/papers/2410.14411) by Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer.
 1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://huggingface.co/papers/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions

@@ -130,6 +130,7 @@
 1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://huggingface.co/papers/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
 1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://huggingface.co/papers/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
 1. **[SigLIP](https://huggingface.co/docs/transformers/main/model_doc/siglip)** (from Google AI) released with the paper [Sigmoid Loss for Language Image Pre-Training](https://huggingface.co/papers/2303.15343) by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
+1. **[SmolLM3](https://huggingface.co/docs/transformers/main/model_doc/smollm3)** (from Hugging Face) released with the blog post [SmolLM3: smol, multilingual, long-context reasoner](https://huggingface.co/blog/smollm3) by the Hugging Face TB Research team.
 1. **[SmolVLM](https://huggingface.co/docs/transformers/main/model_doc/smolvlm)** (from Hugging Face) released with the blog posts [SmolVLM - small yet mighty Vision Language Model](https://huggingface.co/blog/smolvlm) and [SmolVLM Grows Smaller – Introducing the 250M & 500M Models!](https://huggingface.co/blog/smolervlm) by the Hugging Face TB Research team.
 1. **SNAC** (from Papla Media, ETH Zurich) released with the paper [SNAC: Multi-Scale Neural Audio Codec](https://huggingface.co/papers/2410.14411) by Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer.
 1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://huggingface.co/papers/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.

src/configs.js

Lines changed: 1 addition & 0 deletions

@@ -109,6 +109,7 @@ function getNormalizedConfig(config) {
             mapping['hidden_size'] = 'hidden_size';
             break;
         case 'llama':
+        case 'smollm3':
         case 'olmo':
         case 'olmo2':
         case 'mobilellm':
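
The one-line change above works because `getNormalizedConfig` dispatches on `config.model_type` with switch fall-through: adding `case 'smollm3':` next to `case 'llama':` makes SmolLM3 reuse the Llama-family config normalization. A minimal, self-contained sketch of that pattern (simplified function and keys chosen for illustration, not the actual transformers.js source):

```javascript
// Sketch of switch fall-through config normalization. The mapping keys
// here are illustrative; the real getNormalizedConfig handles many more.
function normalizeConfig(config) {
    const mapping = {};
    switch (config.model_type) {
        case 'llama':
        case 'smollm3': // new case label: falls through to the llama branch
        case 'olmo':
            // Llama-family models expose these config keys under the same names.
            mapping['num_heads'] = 'num_attention_heads';
            mapping['num_layers'] = 'num_hidden_layers';
            mapping['hidden_size'] = 'hidden_size';
            break;
        default:
            throw new Error(`Unsupported model type: ${config.model_type}`);
    }
    // Translate model-specific keys into the normalized names.
    const normalized = {};
    for (const [to, from] of Object.entries(mapping)) {
        normalized[to] = config[from];
    }
    return normalized;
}
```

Because `smollm3` shares the branch rather than duplicating it, any later fix to the Llama-family mapping automatically applies to SmolLM3 as well.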

src/models.js

Lines changed: 9 additions & 0 deletions

@@ -4586,6 +4586,13 @@ export class LlamaModel extends LlamaPreTrainedModel { }
 export class LlamaForCausalLM extends LlamaPreTrainedModel { }
 //////////////////////////////////////////////////

+//////////////////////////////////////////////////
+// SmolLM3 models
+export class SmolLM3PreTrainedModel extends PreTrainedModel { }
+export class SmolLM3Model extends SmolLM3PreTrainedModel { }
+export class SmolLM3ForCausalLM extends SmolLM3PreTrainedModel { }
+//////////////////////////////////////////////////
+
 //////////////////////////////////////////////////
 // Helium models
 export class HeliumPreTrainedModel extends PreTrainedModel { }

@@ -7796,6 +7803,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['gpt_neox', ['GPTNeoXModel', GPTNeoXModel]],
     ['codegen', ['CodeGenModel', CodeGenModel]],
     ['llama', ['LlamaModel', LlamaModel]],
+    ['smollm3', ['SmolLM3Model', SmolLM3Model]],
     ['exaone', ['ExaoneModel', ExaoneModel]],
     ['olmo', ['OlmoModel', OlmoModel]],
     ['olmo2', ['Olmo2Model', Olmo2Model]],

@@ -7900,6 +7908,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['gpt_neox', ['GPTNeoXForCausalLM', GPTNeoXForCausalLM]],
     ['codegen', ['CodeGenForCausalLM', CodeGenForCausalLM]],
     ['llama', ['LlamaForCausalLM', LlamaForCausalLM]],
+    ['smollm3', ['SmolLM3ForCausalLM', SmolLM3ForCausalLM]],
     ['exaone', ['ExaoneForCausalLM', ExaoneForCausalLM]],
     ['olmo', ['OlmoForCausalLM', OlmoForCausalLM]],
     ['olmo2', ['Olmo2ForCausalLM', Olmo2ForCausalLM]],
