PanFlow: Decoupled Motion Control for Panoramic Video Generation

AAAI 2026

Abstract

Panoramic video generation has attracted growing attention due to its applications in virtual reality and immersive media. However, existing methods lack explicit motion control and struggle to generate scenes with large and complex motions. We propose PanFlow, a novel approach that exploits the spherical nature of panoramas to decouple the highly dynamic camera rotation from the input optical flow condition, enabling more precise control over large and dynamic motions. We further introduce a spherical noise warping strategy to promote loop consistency in motion across panorama boundaries. To support effective training, we curate a large-scale, motion-rich panoramic video dataset with frame-level pose and flow annotations. We also showcase the effectiveness of our method in various applications, including motion transfer and video editing. Extensive experiments demonstrate that PanFlow significantly outperforms prior methods in motion fidelity, visual quality, and temporal coherence.


Method

**Spherical Camera Optical Flow.** The optical flow of a panoramic video (left) can be interpreted as optical flow on a spherical camera (right). For a complex motion **f**, the camera rotation induces an analytic rotation flow **f**<sub>r</sub> on the sphere. Subtracting **f**<sub>r</sub> from **f** leaves the derotated flow **f**<sub>d</sub>, which more clearly captures camera translation and object dynamics.
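For concreteness, the following is a minimal NumPy sketch of the analytic rotation flow **f**<sub>r</sub> and the derotation step on an equirectangular panorama. It assumes the usual longitude/latitude pixel mapping and a 3×3 rotation matrix; the names and conventions are illustrative and need not match the paper's implementation.

```python
import numpy as np

def rotation_flow(R, H, W):
    """Analytic flow f_r (in pixels) induced on an H x W equirectangular
    panorama by a pure camera rotation R (3x3). Illustrative sketch, not
    the paper's implementation."""
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Pixel grid -> spherical coordinates (longitude, latitude).
    lon = (u + 0.5) / W * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / H * np.pi
    # Spherical coordinates -> unit viewing directions.
    d = np.stack([np.cos(lat) * np.sin(lon),
                  np.sin(lat),
                  np.cos(lat) * np.cos(lon)], axis=-1)        # (H, W, 3)
    # Rotate every viewing ray by the camera rotation.
    d_rot = d @ R.T
    # Back to spherical coordinates, then to pixel positions.
    lon_r = np.arctan2(d_rot[..., 0], d_rot[..., 2])
    lat_r = np.arcsin(np.clip(d_rot[..., 1], -1.0, 1.0))
    u_r = (lon_r + np.pi) / (2.0 * np.pi) * W - 0.5
    v_r = (np.pi / 2.0 - lat_r) / np.pi * H - 0.5
    # Rotation flow, with wrap-around handled along the longitude axis.
    du = (u_r - u + W / 2.0) % W - W / 2.0
    dv = v_r - v
    return np.stack([du, dv], axis=-1)                        # f_r, (H, W, 2)

def derotate(f, R, H, W):
    """Derotated flow f_d = f - f_r: camera translation + object dynamics."""
    return f - rotation_flow(R, H, W)
```

Given a flow **f** estimated between consecutive frames and the relative rotation R from the pose annotations, `derotate(f, R, H, W)` would yield the **f**<sub>d</sub> visualized on the right.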
**Our proposed PanFlow pipeline.** Given an input image and a text prompt, PanFlow generates a panoramic video using decoupled motion from a reference video. We first estimate a decoupled optical flow from the reference video; its derotated component is used to generate latent noise via spherical noise warping. This latent noise then serves as the motion condition for a LoRA-fine-tuned video diffusion transformer, which generates derotated frames. Finally, the decoupled rotation is accumulated and applied to the generated frames to recover the full motion.
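The pipeline can be summarized, at the level of pseudocode, as below. Every helper name here (`estimate_flow_and_poses`, `spherical_noise_warp`, `video_dit_lora`, `rotate_panorama`) is a hypothetical placeholder for the corresponding stage in the caption, not PanFlow's actual API; `rotation_flow` refers to the sketch above.

```python
import numpy as np

def panflow_generate(image, prompt, ref_video, H, W):
    """Structural sketch of the PanFlow pipeline; all helpers are placeholders."""
    # 1. Estimate per-frame optical flow and camera rotations from the
    #    reference video (placeholder for a flow/pose estimator).
    flows, rotations = estimate_flow_and_poses(ref_video)

    # 2. Decouple motion: subtract the analytic rotation flow f_r (see the
    #    `rotation_flow` sketch above) to obtain derotated flows f_d.
    derotated_flows = [f - rotation_flow(R, H, W)
                       for f, R in zip(flows, rotations)]

    # 3. Spherical noise warping: propagate noise along the derotated flow
    #    so the latent noise itself carries the target motion.
    noise = np.random.randn(H, W)               # i.i.d. noise for frame 0
    motion_noise = [noise]
    for f in derotated_flows:
        noise = spherical_noise_warp(noise, f)  # placeholder, sketched below
        motion_noise.append(noise)

    # 4. The LoRA-fine-tuned video diffusion transformer, conditioned on the
    #    image and prompt, denoises the motion-carrying latents into
    #    derotated panoramic frames (placeholder for the DiT).
    derotated_frames = video_dit_lora(image, prompt, motion_noise)

    # 5. Accumulate the decoupled per-frame rotations and re-apply them to
    #    recover the full camera motion (rotate_panorama is a placeholder).
    frames, R_acc = [], np.eye(3)
    for frame, R in zip(derotated_frames, rotations):
        R_acc = R @ R_acc
        frames.append(rotate_panorama(frame, R_acc))
    return frames
```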

Applications

By conditioning the diffusion model on spherically warped motion noise, PanFlow enables precise motion control, produces loop-consistent panoramas, and supports applications such as motion transfer and panoramic video editing.
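To illustrate the wrap-around behind loop consistency, here is a minimal sketch of spherical noise warping. It assumes a backward flow convention and nearest-neighbor gathering, which may differ from the actual strategy; the key point is that horizontal sampling coordinates wrap modulo the panorama width, so noise stays correlated across the left/right seam.

```python
import numpy as np

def spherical_noise_warp(noise, flow):
    """Warp a noise map along a flow field on an equirectangular panorama.
    Hypothetical sketch: assumes `flow` gives each target pixel the
    displacement to its source (backward flow), gathered nearest-neighbor."""
    H, W = noise.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Horizontal source coordinates wrap around the panorama seam, so motion
    # crossing the left/right boundary carries its noise with it.
    src_u = np.mod(np.rint(u + flow[..., 0]), W).astype(int)
    src_v = np.clip(np.rint(v + flow[..., 1]), 0, H - 1).astype(int)
    return noise[src_v, src_u]
```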
