add intern video2 #2958

Merged: 2 commits, May 20, 2025
3 changes: 2 additions & 1 deletion .ci/ignore_treon_docker.txt
@@ -86,4 +86,5 @@ notebooks/omniparser/omniparser.ipynb
notebooks/olmocr-pdf-vlm/olmocr-pdf-vlm.ipynb
notebooks/minicpm-o-omnimodal-chatbot/minicpm-o-omnimodal-chatbot.ipynb
notebooks/kokoro/kokoro.ipynb
notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
10 changes: 8 additions & 2 deletions .ci/skipped_notebooks.yml
@@ -530,9 +530,15 @@
        - macos-13
        - ubuntu-22.04
        - windows-2019
- notebook: "notebooks/deepseek-vl2/deepseek-vl2.ipynb"
- notebook: notebooks/deepseek-vl2/deepseek-vl2.ipynb
  skips:
    - os:
        - macos-13
        - ubuntu-22.04
        - windows-2019
        - windows-2019
- notebook: notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
  skips:
    - os:
        - macos-13
        - ubuntu-22.04
        - windows-2019
3 changes: 3 additions & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -85,6 +85,7 @@ BLACKBOX
boolean
CatVTON
CentOS
centric
CFG
charlist
charlists
@@ -403,6 +404,7 @@ intel
interactable
InternLM
internlm
InternVideo
Interpolative
interpretable
invertible
@@ -1074,6 +1076,7 @@ vec
VegaRT
verovio
videpth
ViFM
VIO
virtualenv
VisCPM
26 changes: 26 additions & 0 deletions notebooks/intern-video2-classiciation/README.md
@@ -0,0 +1,26 @@
# Video Classification with InternVideo2 and OpenVINO

InternVideo2 is a family of video foundation models (ViFM) that achieve state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue.
You can find more information about the model in the [model card](https://huggingface.co/OpenGVLab/InternVideo2-Stage2_6B), the [paper](https://arxiv.org/pdf/2403.15377), and the original [repository](https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/multi_modality).

In this tutorial, we show how to convert, optimize, and run the InternVideo2 Stage2 model for video classification using OpenVINO.

## Notebook contents
The tutorial consists of the following steps:

- Install requirements
- Convert and Optimize model
- Run OpenVINO model inference
- Launch Interactive demo

In this demonstration, you'll create a text-to-video retrieval pipeline that finds the most suitable text caption for the video content.

The image below illustrates an example of the model inference result.
![example.png](https://github.com/user-attachments/assets/6720efe0-ab24-4d73-a22f-a8a0499558d8)
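
The caption matching described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes the model yields one embedding vector for the video and one per candidate caption, and that captions are ranked by cosine similarity followed by a softmax. The names `cosine` and `rank_captions`, the `temperature` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(video_emb, caption_embs, temperature=0.01):
    """Return (caption, probability) pairs sorted from best to worst match.

    video_emb: embedding of the video (assumed to come from a video encoder).
    caption_embs: dict mapping caption text to its embedding (assumed to come
        from a text encoder).
    temperature: hypothetical scaling factor applied before the softmax.
    """
    sims = {c: cosine(video_emb, e) / temperature for c, e in caption_embs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in sims.items()}
    total = sum(exps.values())
    probs = {c: v / total for c, v in exps.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Toy 3-dimensional embeddings, made up for illustration only.
video = [0.9, 0.1, 0.0]
captions = {
    "dog": [1.0, 0.0, 0.0],
    "airplane": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
ranking = rank_captions(video, captions)
best_caption = ranking[0][0]
```

In this toy setup the video vector points mostly along the "dog" axis, so that caption ranks first; with a real model the embeddings would have far more dimensions, but the ranking step could look much the same.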

## Installation instructions
This is a self-contained example that relies solely on its own code.<br/>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).

<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/intern-video2-classiciation/README.md" />
16 changes: 16 additions & 0 deletions notebooks/intern-video2-classiciation/gradio_helper.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
import gradio as gr


def make_demo(classify):
    demo = gr.Interface(
        classify,
        [
            gr.Video(label="Video"),
            gr.Textbox(label="Labels", info="Comma-separated list of class labels"),
        ],
        gr.Label(label="Result"),
        examples=[["coco.mp4", "airplane, dog, car"]],
        allow_flagging="never",
    )

    return demo
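
`make_demo` takes a `classify` callable that receives the video file path and the comma-separated label string, and returns something `gr.Label` can display, such as a dict mapping each label to a confidence. The sketch below shows one possible shape for such a callable. It is not the notebook's implementation: `parse_labels`, `make_classifier`, and `score_fn` are hypothetical names, and the dummy scorer stands in for the real model inference, which this diff does not show.

```python
def parse_labels(labels_text):
    """Split the comma-separated textbox value into clean, non-empty labels."""
    return [label.strip() for label in labels_text.split(",") if label.strip()]


def make_classifier(score_fn):
    """Build a classify(video_path, labels_text) callable.

    score_fn is a hypothetical hook: it receives the video path and the list
    of labels and returns one non-negative score per label.
    """

    def classify(video_path, labels_text):
        labels = parse_labels(labels_text)
        if not labels:
            return {}
        scores = score_fn(video_path, labels)
        total = sum(scores) or 1.0  # avoid dividing by zero
        # A dict of label -> confidence is one of the formats gr.Label accepts.
        return {label: score / total for label, score in zip(labels, scores)}

    return classify


# Dummy scorer for illustration only: it ignores the video and favors "dog".
def dummy_scores(video_path, labels):
    return [3.0 if label == "dog" else 1.0 for label in labels]


classify = make_classifier(dummy_scores)
result = classify("coco.mp4", "airplane, dog, car")
```

With Gradio installed, passing this callable to `make_demo(classify)` and calling `.launch()` on the returned interface would serve the demo.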
