* Improved Model card for Gemma2
* Made changes in gemma2 as suggested
* Made more changes in the doc (adding image, notes, closing hfoptions)
* minor fixes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
[Gemma 2](https://huggingface.co/papers/2408.00118) is a family of language models with pretrained and instruction-tuned variants, available in 2B, 9B, and 27B parameter sizes. The architecture is similar to the first Gemma, except it interleaves local attention (4096 tokens) with global attention (8192 tokens) and uses grouped-query attention (GQA) to increase inference performance.
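
The interleaved attention pattern and GQA settings are visible directly in the model config. A minimal inspection sketch (attribute names follow `Gemma2Config`; the 9B checkpoint is an illustrative choice):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-2-9b")
print(config.sliding_window)           # local attention window (4096 tokens)
print(config.max_position_embeddings)  # global attention span (8192 tokens)
# GQA: fewer key-value heads than query heads
print(config.num_attention_heads, config.num_key_value_heads)
```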
The 2B and 9B models are trained with knowledge distillation, and the instruction-tuned variants are post-trained with supervised fine-tuning and reinforcement learning.
You can find all the original Gemma 2 checkpoints under the [Gemma 2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) collection.
> [!TIP]
> Click on the Gemma 2 models in the right sidebar for more examples of how to apply Gemma to different language tasks.
The example below demonstrates how to chat with the model with [`Pipeline`] or the [`AutoModel`] class, and from the command line.
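
A minimal sketch with [`Pipeline`] (the instruction-tuned 9B checkpoint and the prompt are illustrative choices):

```python
import torch
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="google/gemma-2-9b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
output = pipe("Explain quantum computing in one sentence.", max_new_tokens=50)
print(output[0]["generated_text"])
```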
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize only the weights to int4.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights in 4-bit precision
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config,
)

inputs = tokenizer("Explain quantum computing in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
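
As a quick sanity check (not part of the original example), you can inspect the quantized model's memory footprint with [`~PreTrainedModel.get_memory_footprint`]:

```python
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```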
Use the [AttentionMaskVisualizer](https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139) to better understand what tokens the model can and cannot attend to.
```python
from transformers.utils.attention_visualizer import AttentionMaskVisualizer

visualizer = AttentionMaskVisualizer("google/gemma-2-2b")
visualizer("You are an assistant. Make sure you print me")
```
Use a [`HybridCache`] instance to enable caching in Gemma 2. Gemma 2 doesn't support kv-caching strategies like [`DynamicCache`] or tuples of tensors because it uses sliding window attention every second layer.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, HybridCache

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

inputs = tokenizer("My name is Gemma, and I am", return_tensors="pt")

# Size the cache for the prompt plus the tokens to be generated
max_generated_length = inputs.input_ids.shape[1] + 10
past_key_values = HybridCache(
    config=model.config,
    max_batch_size=1,
    max_cache_len=max_generated_length,
    device=model.device,
    dtype=model.dtype,
)
outputs = model.generate(**inputs, past_key_values=past_key_values, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If `past_key_values` already contains keys and values from a previous forward pass, prepare `cache_position` as well so the new entries are written at the correct offsets.