Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis

1East China Normal University
2Shanghai Jiao Tong University

An explicit policy is trained on paired motion-scene data to predict the final motion directly from the given task.
An implicit policy instead optimizes the motion from an initialization according to the task.
A diffusion policy gradually denoises the motion conditioned on the current scene, task, and noised motion.
Our diffusion implicit policy iteratively denoises and optimizes the motion, ensuring motion naturalness, diversity, and interaction plausibility simultaneously without any paired motion-scene data.

Abstract

Human motion generation is a long-standing problem, and scene-aware motion synthesis has been widely studied recently due to its numerous applications. Prevailing methods rely heavily on paired motion-scene data, whose quantity is limited. Moreover, such methods struggle to generalize to diverse scenes when trained on only a few specific ones. Thus, we propose a unified framework, termed Diffusion Implicit Policy (DIP), for scene-aware motion synthesis, where paired motion-scene data are no longer necessary. In this framework, we disentangle human-scene interaction from motion synthesis during training, and then introduce an interaction-based implicit policy into motion diffusion during inference. The synthesized motion is derived through iterative diffusion denoising and implicit policy optimization, so that motion naturalness and interaction plausibility can be maintained simultaneously. The implicit policy optimizes the intermediate noised motion in a GAN-inversion manner to preserve motion continuity and to control keyframe poses through a ControlNet branch and motion inpainting. For long-term motion synthesis, we introduce motion blending for stable transitions between multiple sub-tasks, where motions are fused in rotation power space and translation linear space. We evaluate the proposed method on synthesized scenes with ShapeNet furniture and on real scenes from PROX and Replica. Results show that our framework achieves better motion naturalness and interaction plausibility than cutting-edge methods, which also indicates the feasibility of DIP for motion synthesis in more general tasks and versatile scenes.
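As a concrete illustration of the motion blending mentioned above, the sketch below fuses two poses by raising the relative rotation to a fractional power (geodesic interpolation on SO(3)) and linearly interpolating the root translation. This is a minimal sketch, not the authors' released code; the function name blend_pose and the use of SciPy rotations are our own illustrative choices.

import numpy as np
from scipy.spatial.transform import Rotation as R

def blend_pose(rot_a, rot_b, trans_a, trans_b, t):
    """Blend two poses with weight t in [0, 1] (illustrative, not the official API)."""
    # Rotation power space: R_a * (R_a^{-1} R_b)^t, a slerp-like geodesic interpolation.
    rel = rot_a.inv() * rot_b
    rot_blend = rot_a * R.from_rotvec(t * rel.as_rotvec())
    # Translation linear space: plain linear interpolation of the root position.
    trans_blend = (1.0 - t) * trans_a + t * trans_b
    return rot_blend, trans_blend

# Example: blend the last pose of one sub-task into the first pose of the next.
rot_a = R.from_euler("y", 10, degrees=True)
rot_b = R.from_euler("y", 80, degrees=True)
trans_a, trans_b = np.zeros(3), np.array([0.5, 0.0, 0.2])
rot_mid, trans_mid = blend_pose(rot_a, rot_b, trans_a, trans_b, 0.5)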

Pipeline & Method


The left subfigure shows the overall pipeline of scene-aware motion synthesis. Any feasible command is first decomposed into sub-tasks with action-object pairs. We then synthesize future motion according to the historical motion and the current sub-task. Finally, the synthesized motion is fused with the historical motion to obtain the final long-term motion. The right subfigure presents the framework of Diffusion Implicit Policy (DIP). In each iteration of DIP, the diffusion model denoises the motion to make the synthesized motion more natural, while implicit policy optimization from the reward endows the motion with plausible interaction. The random sampling step helps the framework synthesize motions with diverse styles.
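To make this iteration concrete, below is a rough sketch of one DIP inference loop: a diffusion denoising step alternates with a few gradient steps on an interaction reward, after which noise is re-injected for diversity. The names denoiser, denoise_step, sigma, and reward are hypothetical placeholders for a pretrained motion diffusion model and a differentiable interaction reward (e.g., contact and non-penetration terms); this is a simplified sketch of the idea, not the released implementation.

import torch

def dip_sample(denoiser, reward, history, task, num_steps=1000, opt_steps=5, lr=1e-2):
    # Start from Gaussian noise with the same shape as the motion representation.
    x_t = torch.randn_like(history)
    for t in reversed(range(num_steps)):
        # 1) Diffusion denoising step keeps the motion natural.
        x_t = denoiser.denoise_step(x_t, t, cond=(history, task))
        # 2) Implicit policy: optimize the intermediate motion so the
        #    interaction reward (contact, non-penetration, ...) increases.
        x_t = x_t.detach().requires_grad_(True)
        opt = torch.optim.Adam([x_t], lr=lr)
        for _ in range(opt_steps):
            loss = -reward(x_t, task)
            opt.zero_grad()
            loss.backward()
            opt.step()
        x_t = x_t.detach()
        # 3) Random sampling step restores stochasticity for diverse motion styles.
        if t > 0:
            x_t = x_t + denoiser.sigma(t) * torch.randn_like(x_t)
    return x_t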

Qualitative Results

Stand up from the sofa and walk over to the stool, then sit on it.

Stand up from the sofa and walk over to another sofa, then sit on it.

Walk over to the sofa, and then sit on it.


Walk over to the bed and sit down on it.


Stand up from the sofa, walk to the door, then sit back down on the sofa.

Stand up from the bed and walk over to the chair, then sit down on it.

Walk over to the bed and sit down on it.


Stand up from the sofa, walk to the door, then sit back down on the sofa.

Stand up from the bed and walk over to the chair, then sit down on it.

Stand up from the sofa and go out.


Walk over to the bed and sit down on it.


Stand up from the sofa and walk over to the chair, then sit down on it.

BibTeX


@article{gong2024dip,
  title={Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis},
  author={Gong, Jingyu and Zhang, Chong and Liu, Fengqi and Fan, Ke and Zhou, Qianyu and Tan, Xin and Zhang, Zhizhong and Xie, Yuan and Ma, Lizhuang},
  journal={arXiv e-prints},
  year={2024}
}