VideoShop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Supplementary Material

 


This page was adapted from the excellent work by diffusion-motion-transfer, licensed under CC-BY-SA 4.0.


Introduction Video


Demo


Original video Edited video Instruction detail

 


More Results


Input video Edited video Instruction (text)

make her wear a baseball hat

remove the person

swap the cupcake with a piece of cake

replace the table with a bonfire

change it into a chocolate train

add zebras and river

Make the piece of paper hanging on the wall a mirror

change space suit

change the flag of the united states for that of england

Make it a black sheep.

remove the clouds

make the phone booth red and add a person

remove middle fruit and put a cat in place

Put sunglasses on the girl.

remove tie adjusting action, change outfit

make him hold a cup

remove the tomato from one sandwich

make the cat lick its nose

make the ramp cement

place a cat in the counter

Have the instructor's jacket say "4" on it

 


Comparison to Baselines


Input Mask Prompt Ours

add a mountain in the background

Fate/Zero Spacetime Pix2Video RAVE BDIA



Input Mask Prompt Ours

add chandeliers

Fate/Zero Spacetime Pix2Video RAVE BDIA



Input Mask Prompt Ours

add a parrot on his shoulder

Fate/Zero Spacetime Pix2Video RAVE BDIA



 


Ablation


Input video w/o Noise Extrapolation w/o Latent Norm w/o Latent Rescaling Full Method

 


Limitations


Limitation 1: Large groups of people with multiple heterogeneous movements. The model does not reproduce all of the movements.

Input video Edited video Instruction

change the people on the left, in the background

Limitation 2: Light flickering. The model does not reproduce the lighting effect accurately.

Input video Edited video Instruction

turn the light ball into a planet