- Shot detection using PySceneDetect
- Face detection with MediaPipe
- Object detection using YOLO (Ultralytics)
- Smooth camera path optimization to avoid jitter
- Output encoded with OpenCV
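For a sense of how the shot-detection step works, here is a minimal sketch using PySceneDetect's high-level API (v0.6+); it is an illustration of the technique, not FrameShift's internal code.

```python
from scenedetect import detect, ContentDetector

# Split a video into scenes; each scene later gets its own fixed crop window.
scene_list = detect("my_landscape_video.mp4", ContentDetector())
for i, (start, end) in enumerate(scene_list):
    print(f"Scene {i}: frames {start.get_frames()}-{end.get_frames()}")
```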
- Install Python Dependencies: The script requires Python 3.8+ and the packages listed in `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```
- Install FFmpeg (for audio processing): FrameShift uses FFmpeg to process and include audio from the original video in the reframed output. If FFmpeg is not installed or not found on your system's PATH, the script will still process the video, but the output will not contain audio (a warning is displayed).
  - Download FFmpeg: You can download FFmpeg from ffmpeg.org.
  - Installation: Follow the installation instructions for your operating system and ensure the directory containing the `ffmpeg` (or `ffmpeg.exe`) executable is added to your system's PATH environment variable.
  - Verify Installation: Open a new terminal/command prompt and run `ffmpeg -version`. If FFmpeg is installed correctly, you should see version information.
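To confirm programmatically that FFmpeg is reachable (roughly what FrameShift's audio warning path has to check), here is a minimal standard-library sketch; the function name is illustrative, not part of FrameShift.

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is on PATH and runs."""
    if shutil.which("ffmpeg") is None:
        return False
    # `ffmpeg -version` confirms the binary actually executes.
    result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True, check=False)
    return result.returncode == 0

if __name__ == "__main__":
    print("FFmpeg found" if ffmpeg_available() else "FFmpeg missing: output will be silent")
```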
FrameShift automatically reframes videos, using a stationary (fixed) crop for each detected scene.
1. Default Behavior: Fill Frame (Pan & Scan). Converts a landscape video to portrait, filling the new 9:16 frame by cropping and centering.

   ```bash
   python -m frameshift.main my_landscape_video.mp4 video_for_stories.mp4 --ratio 9:16
   ```
2. Padding with Black Bars: If you want to see the entire optimally cropped region without parts being cut off to fill the frame, use `--padding`. This adds black bars by default.

   ```bash
   python -m frameshift.main input_video.mp4 square_video_fit.mp4 --ratio 1:1 --padding
   ```
3. Padding with Blurred Background: To use blurred background bars instead of black:

   ```bash
   python -m frameshift.main input_video.mp4 output_blur_padded.mp4 --ratio 16:9 --padding --padding_type blur --blur_amount 5
   ```

   (Adjust `--blur_amount` from 0 (minimal) to 10 (maximal) for the desired blur intensity.)
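   For context, blurred padding of this kind is typically composited by scaling the frame twice: once to cover the output (then blurred, for the background) and once to fit inside it (for the content). The sketch below shows that technique with OpenCV; the 0-10 `blur_amount` to kernel-size mapping and the function name are assumptions, not FrameShift's actual implementation.

   ```python
   import cv2
   import numpy as np

   def blur_padded(frame, out_w, out_h, blur_amount=5):
       """Fit `frame` inside (out_w, out_h) and fill the bars with a blurred backdrop."""
       h, w = frame.shape[:2]
       # Background: scale to cover the output, center-crop, then blur.
       cover = max(out_w / w, out_h / h)
       bw, bh = max(out_w, round(w * cover)), max(out_h, round(h * cover))
       bg = cv2.resize(frame, (bw, bh))
       y0, x0 = (bh - out_h) // 2, (bw - out_w) // 2
       bg = bg[y0:y0 + out_h, x0:x0 + out_w]
       k = 2 * (blur_amount * 5) + 1            # odd Gaussian kernel size (assumed mapping)
       bg = cv2.GaussianBlur(bg, (k, k), 0)
       # Foreground: scale to fit entirely inside the output, centered over the blur.
       fit = min(out_w / w, out_h / h)
       fg = cv2.resize(frame, (int(w * fit), int(h * fit)))
       fy, fx = (out_h - fg.shape[0]) // 2, (out_w - fg.shape[1]) // 2
       bg[fy:fy + fg.shape[0], fx:fx + fg.shape[1]] = fg
       return bg

   demo = blur_padded(np.zeros((720, 1280, 3), dtype=np.uint8), 1080, 1920)
   print(demo.shape)                            # (1920, 1080, 3)
   ```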
4. Padding with a Custom Color: To use solid green bars:

   ```bash
   python -m frameshift.main input_video.mp4 output_color_padded.mp4 --ratio 4:3 --padding --padding_type color --padding_color_value green
   ```

   Or with a specific RGB color (e.g., blue):

   ```bash
   python -m frameshift.main input_video.mp4 output_rgb_padded.mp4 --ratio 4:3 --padding --padding_type color --padding_color_value "(0,0,255)"
   ```
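   Before the bars can be drawn, a value like `green` or `"(0,0,255)"` has to become a pixel value. The sketch below shows one way such a string might be parsed for OpenCV (which stores frames in BGR order); the function and the name table are illustrative, not FrameShift's actual code.

   ```python
   import ast

   # Illustrative name table; the exact RGB values FrameShift uses may differ.
   NAMED_COLORS_RGB = {
       "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
       "green": (0, 255, 0), "blue": (0, 0, 255), "yellow": (255, 255, 0),
       "cyan": (0, 255, 255), "magenta": (255, 0, 255),
   }

   def parse_padding_color(value: str) -> tuple:
       """Parse a color name or an "(R,G,B)" string into a BGR tuple for OpenCV."""
       value = value.strip().lower()
       if value in NAMED_COLORS_RGB:
           r, g, b = NAMED_COLORS_RGB[value]
       else:
           r, g, b = ast.literal_eval(value)   # e.g. "(0,0,255)" -> (0, 0, 255)
       return (b, g, r)                        # OpenCV expects BGR

   print(parse_padding_color("green"))         # (0, 255, 0)
   print(parse_padding_color("(0,0,255)"))     # RGB blue -> BGR (255, 0, 0)
   ```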
5. Batch Processing: Reframes all videos in `my_video_folder` to 4:5, saving to `output_folder`, using the default fill behavior.

   ```bash
   python -m frameshift.main my_video_folder/ output_folder/ --ratio 4:5 --batch
   ```
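   Batch mode presumably amounts to enumerating the directory and running the same pipeline per file; a minimal sketch of that discovery step (the extension list is an assumption, not FrameShift's exact filter):

   ```python
   from pathlib import Path

   VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi"}   # assumed set of video suffixes

   def find_videos(input_dir: str):
       """Yield video files in `input_dir`, roughly how a --batch run would enumerate inputs."""
       for path in sorted(Path(input_dir).iterdir()):
           if path.suffix.lower() in VIDEO_EXTENSIONS:
               yield path

   for video in find_videos("my_video_folder"):
       print(f"would reframe: {video}")
   ```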
6. Running the Test Suite: To run a predefined suite of tests using `my_video.mp4` as input, saving test outputs relative to `test_run_output.mp4`, and generating a detailed log:

   ```bash
   python -m frameshift.main my_video.mp4 test_run_output.mp4 --ratio 16:9 --test
   ```

   (Check the `test_run_output_test_outputs/` directory and `test_run_output_test_suite.log` or similar for results. If an error occurs in one test scenario, the suite logs it and continues to the next.)
FrameShift intelligently reframes videos using a stationary (fixed) crop-per-scene approach:
- Scene Detection: Divides the video into scenes using PySceneDetect.
- Content Analysis: Both detection models (`yolo11n.pt` and `yolov11n-face.pt`) are automatically downloaded to a local `models/` subdirectory if not found during the first run.
  - Faces: Detected using a specialized model, `yolov11n-face.pt`.
    - This model is trained to detect one primary class: `{0: 'face'}`.
    - If this model fails to load or operate, MediaPipe's face detection is used as a fallback.
  - Other Objects: Detected using the general-purpose `yolo11n.pt` model.
    - This model can identify a wide range of objects from the COCO dataset. Common detectable classes include: `person`, `bicycle`, `car`, `motorcycle`, `airplane`, `bus`, `train`, `truck`, `boat`, `traffic light`, `fire hydrant`, `stop sign`, `parking meter`, `bench`, `bird`, `cat`, `dog`, `horse`, `sheep`, `cow`, `elephant`, `bear`, `zebra`, `giraffe`, `backpack`, `umbrella`, `handbag`, `tie`, `suitcase`, `frisbee`, `skis`, `snowboard`, `sports ball`, `kite`, `baseball bat`, `baseball glove`, `skateboard`, `surfboard`, `tennis racket`, `bottle`, `wine glass`, `cup`, `fork`, `knife`, `spoon`, `bowl`, `banana`, `apple`, `sandwich`, `orange`, `broccoli`, `carrot`, `hot dog`, `pizza`, `donut`, `cake`, `chair`, `couch`, `potted plant`, `bed`, `dining table`, `toilet`, `tv`, `laptop`, `mouse`, `remote`, `keyboard`, `cell phone`, `microwave`, `oven`, `toaster`, `sink`, `refrigerator`, `book`, `clock`, `vase`, `scissors`, `teddy bear`, `hair drier`, `toothbrush`. (This covers the 80 standard COCO classes.)
    - This object detection step is performed only if you specify weights for object classes (other than `face` and `default`) with a weight greater than 0 in the `--object_weights` argument. This optimizes performance when only face-centric reframing is needed.
  - The importance of all detected elements in guiding the crop is determined by the `--object_weights` argument.
- Optimal Stationary Crop: For each scene, calculates a fixed crop window that best frames the weighted area of interest at the target aspect ratio (see the sketch after this list).
- Output Generation (Cropping/Padding):
  - Default (Fill/Pan & Scan): If `--padding` is NOT specified, the determined crop is scaled to completely fill the output frame. Excess parts of the crop are trimmed to match the target aspect ratio without deforming the image. No bars are added.
  - With Padding (`--padding` is specified): The determined crop is scaled to fit entirely within the output frame, preserving its aspect ratio.
    - If `--padding_type` is `black` (the default when `--padding` is used) or an unrecognized type: black bars are added.
    - If `--padding_type` is `blur`: blurred bars from the original video background are added. Intensity is controlled by `--blur_amount`.
    - If `--padding_type` is `color`: bars of the color specified by `--padding_color_value` are added.
  - The script will attempt to preserve the audio track from the original video using FFmpeg. If FFmpeg is not found, the output video will be silent.
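To make the crop-selection step concrete, here is a minimal sketch of how a fixed crop window could be derived for one scene from weighted detection boxes: take the weight-averaged center of the detections, size the window to the target ratio, and clamp it to the frame. This illustrates the general technique only; FrameShift's actual optimization may differ.

```python
def stationary_crop(frame_w, frame_h, detections, target_ratio):
    """Pick a fixed crop (x, y, w, h) at `target_ratio` (width/height) for one scene.

    `detections` is a list of (cx, cy, weight) tuples: detection box centers with
    the importance weights that --object_weights would assign. Sketch only.
    """
    # Largest crop with the target aspect ratio that fits inside the frame.
    crop_h = min(frame_h, int(frame_w / target_ratio))
    crop_w = int(crop_h * target_ratio)

    # Weighted centroid of the scene's detections; fall back to the frame center.
    total = sum(w for _, _, w in detections)
    if total > 0:
        cx = sum(x * w for x, _, w in detections) / total
        cy = sum(y * w for _, y, w in detections) / total
    else:
        cx, cy = frame_w / 2, frame_h / 2

    # Center the crop on the centroid, then clamp it so it stays inside the frame.
    x = int(min(max(cx - crop_w / 2, 0), frame_w - crop_w))
    y = int(min(max(cy - crop_h / 2, 0), frame_h - crop_h))
    return x, y, crop_w, crop_h

# Example: a 1920x1080 scene with a face (weight 1.0) and a person box (weight 0.8).
print(stationary_crop(1920, 1080, [(600, 400, 1.0), (900, 600, 0.8)], 9 / 16))
```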
Future Development Ideas: Enhancements could include automatic selection of reframing strategies (like dynamic tracking or panning) and more advanced path smoothing.
- `input`: Path to the input video file, or to an input directory if `--batch` is used.
- `output`: Path to the output video file, or to an output directory if `--batch` is used.
- `--ratio R`: Target aspect ratio for the output video (e.g., `9:16`, `1:1`, or `0.5625`). Default: `9/16`.
- `--padding`: (Flag, default: `False`) If set, enables padding so the entire optimally cropped content is visible; defaults to black bars if no other padding type is specified. If not set, the content is cropped to fill the output frame (Pan & Scan).
- `--padding_type TYPE`: (Default: `black`) Type of padding when `--padding` is enabled.
  - `black`: Solid black bars.
  - `blur`: Blurred background bars.
  - `color`: Solid color bars (use with `--padding_color_value`).
- `--blur_amount INT`: (Default: `5`) Integer from 0 (minimal) to 10 (maximal) controlling blur intensity when `padding_type` is `blur`.
- `--padding_color_value STR`: (Default: `black`) Color for padding when `padding_type` is `color`.
  - Accepts names: `black`, `white`, `red`, `green`, `blue`, `yellow`, `cyan`, `magenta`.
  - Accepts an RGB tuple as a string: `"(R,G,B)"` (e.g., `"(255,0,0)"` for red).
- `--output_height H`: (Default: `1080`, integer) Target height for the output video in pixels (e.g., 720, 1080, 1280, 1920). The width is calculated automatically from the target aspect ratio given by `--ratio`, which gives control over the final output resolution.
- `--interpolation METHOD`: (Default: `lanczos`) Interpolation algorithm used for resizing video frames.
  - Choices: `nearest`, `linear`, `cubic`, `area`, `lanczos`.
  - `lanczos` (Lanczos over an 8x8 neighborhood) or `cubic` are generally recommended for upscaling (enlarging images), as they can produce sharper results. `area` is often best for downscaling (shrinking images). `linear` is faster but may be less sharp than `cubic` or `lanczos`. `nearest` is the fastest but produces blocky results and is usually not recommended for video.
- `--content_opacity O`: (Default: `1.0`) Opacity of the main video content (0.0-1.0). If < 1.0, the content (including any padding) is blended with a blurred version of the full original frame.
- `--object_weights "label:w,..."`: (Default: `"face:1.0,person:0.8,default:0.5"`) Comma-separated `label:weight` pairs (see the parsing sketch after this argument list).
  - Assigns importance weights to detected elements. The label `face` refers to faces detected by `yolov11n-face.pt` (or the MediaPipe fallback). Other labels (e.g., `person`, `car`, `dog`; see the list under "Content Analysis") correspond to objects detected by `yolo11n.pt`.
  - Detection of these other objects with `yolo11n.pt` runs only if their labels are given a weight > 0.
  - Example: `--object_weights "face:1.0,dog:0.7,default:0.2"` (this would trigger `yolo11n.pt` to look for dogs).
- `--log_file FILE_PATH`: (Optional) Path to a file where verbose (DEBUG level) logs will be written. If not specified, logs (INFO level and above for FrameShift's own messages, plus any output from underlying libraries) are printed to the console.
- `--test`: (Flag, default: `False`) Run in test mode. Executes a predefined suite of scenarios based on the provided `input`, `output`, and `ratio` arguments while varying other parameters such as padding and interpolation. Test outputs are saved to a subdirectory (e.g., `*_test_outputs/`) relative to the specified output path, and a detailed log of the test suite is also generated (e.g., `*_test_suite.log`).
- `--batch`: (Flag) Process all videos in the input directory.
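As a concrete illustration of the `--object_weights` format, here is a minimal parser sketch (the function name and the fallback-to-`default` lookup are assumptions, not FrameShift's actual code):

```python
def parse_object_weights(spec: str) -> dict:
    """Parse an --object_weights string such as "face:1.0,person:0.8,default:0.5"."""
    weights = {}
    for pair in spec.split(","):
        label, _, value = pair.strip().partition(":")
        weights[label] = float(value)
    return weights

weights = parse_object_weights("face:1.0,dog:0.7,default:0.2")
print(weights)                                  # {'face': 1.0, 'dog': 0.7, 'default': 0.2}
# Any non-face label with weight > 0 (here 'dog') is what would trigger the
# yolo11n.pt object-detection pass described under "Content Analysis".
print(weights.get("cat", weights["default"]))   # unlisted labels fall back to 'default': 0.2
```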
The cropping logic determines an optimal stationary (fixed) crop for each scene, prioritizing important content based on `--object_weights`. How this crop is presented in the final output (filled, or with padding) is controlled by `--padding` and its related arguments.
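The fill-versus-fit distinction comes down to which scale factor is chosen before compositing; a minimal sketch of that decision (illustrative only, not FrameShift's code):

```python
def placement(src_w, src_h, out_w, out_h, padding: bool):
    """Return (scaled_w, scaled_h, x_offset, y_offset) for placing the crop in the output.

    Without --padding the crop is scaled to cover the output (negative offsets mean the
    overflow gets trimmed); with --padding it is scaled to fit inside it (bars fill the rest).
    """
    scale = min(out_w / src_w, out_h / src_h) if padding else max(out_w / src_w, out_h / src_h)
    w, h = round(src_w * scale), round(src_h * scale)
    return w, h, (out_w - w) // 2, (out_h - h) // 2

# A 1600x900 crop placed into a 1080x1920 (9:16) output:
print(placement(1600, 900, 1080, 1920, padding=False))  # covers the frame; sides trimmed
print(placement(1600, 900, 1080, 1920, padding=True))   # fits inside; bars above and below
```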
For users who prefer a graphical interface over the command line, FrameShift now includes a GUI.
- To Run: Ensure you are in the project's root directory and have installed all dependencies from `requirements.txt`, then run:

  ```bash
  python frameshift_gui.py
  ```

- Features: The GUI provides access to most of FrameShift's command-line functionality, including input/output selection, aspect ratio, padding options, object weights, and more, all through an intuitive interface.
- Documentation: For detailed instructions on using the GUI, please refer to the FrameShift GUI User Manual.
FrameShift is released under the MIT License.