Publications
You can also find my articles on my Google Scholar Profile.

Research Topics:
- Lifelong Learning
- Self-supervised Learning
- Robot Learning
- Visual Learning
- Multimodal Machine Learning
- Theoretical Machine Learning
- Immersive Computing
Multimodal Machine Learning
AlignNet: A Unifying Approach to Audio-Visual Alignment
Jianren Wang*, Zhaoyuan Fang*, Hang Zhao (* indicates equal contribution)
2020 Winter Conference on Applications of Computer Vision
[Project Page] [Code] [Data]

Abstract: We present AlignNet, a model designed to synchronize a video with a reference audio under non-uniform and irregular misalignment. AlignNet learns the end-to-end dense correspondence between each frame of a video and an audio. Our method is designed according to simple and well-established principles: attention, pyramidal processing, warping, and affinity function. Together with the model, we release a dancing dataset Dance50 for training and evaluation. Qualitative, quantitative and subjective evaluation results on dance-music alignment and speech-lip alignment demonstrate that our method far outperforms the state-of-the-art methods.

BibTeX:
@inproceedings{jianren20alignnet,
  Author = {Wang, Jianren and Fang, Zhaoyuan and Zhao, Hang},
  Title = {AlignNet: A Unifying Approach to Audio-Visual Alignment},
  Booktitle = {WACV},
  Year = {2020}
}