openvinotoolkit · eaidova · May 20, 2025
diff --git a/.ci/ignore_treon_docker.txt b/.ci/ignore_treon_docker.txt
@@ -86,4 +86,5 @@ notebooks/omniparser/omniparser.ipynb
 notebooks/olmocr-pdf-vlm/olmocr-pdf-vlm.ipynb
 notebooks/minicpm-o-omnimodal-chatbot/minicpm-o-omnimodal-chatbot.ipynb
 notebooks/kokoro/kokoro.ipynb
-notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
+notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
+notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
diff --git a/.ci/skipped_notebooks.yml b/.ci/skipped_notebooks.yml
@@ -530,9 +530,15 @@
         - macos-13
         - ubuntu-22.04
         - windows-2019
-- notebook: "notebooks/deepseek-vl2/deepseek-vl2.ipynb"
+- notebook: notebooks/deepseek-vl2/deepseek-vl2.ipynb
   skips:
     - os:
         - macos-13
         - ubuntu-22.04
-        - windows-2019
+        - windows-2019
+- notebook: notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
+  skips:
+    - os:
+        - macos-13
+        - ubuntu-22.04
+        - windows-2019
diff --git a/.ci/spellcheck/.pyspelling.wordlist.txt b/.ci/spellcheck/.pyspelling.wordlist.txt
@@ -85,6 +85,7 @@ BLACKBOX
 boolean
 CatVTON
 CentOS
+centric
 CFG
 charlist
 charlists
@@ -403,6 +404,7 @@ intel
 interactable
 InternLM
 internlm
+InternVideo
 Interpolative
 interpretable
 invertible
@@ -1074,6 +1076,7 @@ vec
 VegaRT
 verovio
 videpth
+ViFM
 VIO
 virtualenv
 VisCPM

diff --git a/notebooks/intern-video2-classiciation/README.md b/notebooks/intern-video2-classiciation/README.md
@@ -0,0 +1,26 @@
+# Video Classification with InternVideo2 and OpenVINO
+
+InternVideo2 is a family of video foundation models (ViFM) that achieve state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue.
+You can find more information about the model in the [model card](https://huggingface.co/OpenGVLab/InternVideo2-Stage2_6B), the [paper](https://arxiv.org/pdf/2403.15377), and the original [repository](https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/multi_modality).
+
+In this tutorial, we show how to convert, optimize, and run the InternVideo2 Stage2 model for video classification using OpenVINO.
+
+## Notebook contents
+The tutorial consists of the following steps:
+
+- Install requirements
+- Convert and Optimize model
+- Run OpenVINO model inference
+- Launch Interactive demo
+
+In this demonstration, you'll create a text-to-video retrieval pipeline that finds the most suitable text caption for the video content.
+
+The image below illustrates an example of a model inference result.
+![example.png](https://github.com/user-attachments/assets/6720efe0-ab24-4d73-a22f-a8a0499558d8)
+
+## Installation instructions
+This is a self-contained example that relies solely on its own code.<br>
+We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
+For details, please refer to the [Installation Guide](../../README.md).
+
+<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/intern-video2-classiciation/README.md" />
diff --git a/notebooks/intern-video2-classiciation/gradio_helper.py b/notebooks/intern-video2-classiciation/gradio_helper.py
@@ -0,0 +1,16 @@
+import gradio as gr
+
+
+def make_demo(classify):
+    demo = gr.Interface(
+        classify,
+        [
+            gr.Video(label="Video"),
+            gr.Textbox(label="Labels", info="Comma-separated list of class labels"),
+        ],
+        gr.Label(label="Result"),
+        examples=[["coco.mp4", "airplane, dog, car"]],
+        allow_flagging="never",
+    )
+
+    return demo
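
Note on the `classify` argument: `make_demo` only wires up the UI, so the callback must accept a video file path plus the comma-separated label string and return something `gr.Label` can render, such as a dict mapping each label to a confidence. The notebook's actual callback is not shown in this part of the diff, so the following is only a minimal sketch of that contract. `make_classify`, `parse_labels`, `softmax`, and `dummy_score_fn` are hypothetical names introduced for illustration, and the dummy scorer merely stands in for the OpenVINO model's video-text similarity scores.

```python
import math
from typing import Callable, Dict, List


def parse_labels(labels: str) -> List[str]:
    """Split the comma-separated textbox value, trimming blanks and empty items."""
    return [part.strip() for part in labels.split(",") if part.strip()]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw similarity scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def make_classify(
    score_fn: Callable[[str, List[str]], List[float]],
) -> Callable[[str, str], Dict[str, float]]:
    """Build a callback with the (video_path, labels) signature the demo expects.

    score_fn is an assumption: any callable that returns one raw score per label.
    """

    def classify(video_path: str, labels: str) -> Dict[str, float]:
        names = parse_labels(labels)
        if not names:
            return {}
        probabilities = softmax(score_fn(video_path, names))
        # A label -> confidence dict is a format gr.Label can display.
        return dict(zip(names, probabilities))

    return classify


def dummy_score_fn(video_path: str, names: List[str]) -> List[float]:
    """Placeholder scorer (NOT the real model): longer labels get higher scores."""
    return [float(len(name)) for name in names]


classify = make_classify(dummy_score_fn)
result = classify("coco.mp4", "airplane, dog, car")
```

With a real scorer in place of `dummy_score_fn`, the resulting function could be passed as `make_demo(classify)`. Keeping label parsing and normalization separate from the model call makes the callback easy to check without loading the model.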
diff --git a/notebooks/intern-video2-classiciation/intern-video2-classification.ipynb b/notebooks/intern-video2-classiciation/intern-video2-classification.ipynb