Gang YU (俞刚)

I am a Principal Research Scientist at StepFun (阶跃星辰). My research interests are in computer vision and artificial intelligence, specifically generative AI, object detection, segmentation, human keypoint detection, human action recognition, and 3D reconstruction. I obtained my PhD from NTU in 2014, supervised by Prof. Junsong Yuan. Before joining StepFun, I worked as a research director at Tencent for four years and spent another five years at Megvii (Face++).

Google Scholar / CV / Zhihu



News

I will serve as an Area Chair for CVPR2025.

One paper accepted by NeurIPS2024.

Two papers accepted by ECCV2024.

Two papers accepted by CVPR2024.

I will serve as an Innovation Program (Industry) Chair for ICME2024.

I will serve as an Area Chair for CVPR2024.

Three papers accepted by NeurIPS2023.

Three papers accepted by ICCV2023.

Three papers accepted by CVPR2023.

We organized a tutorial, Mobile Visual Analytics, at CVPR2021.

Our team obtained the first place in the Depth Estimation Challenge of the Mobile AI Challenge (CVPR 2021).

We organized a tutorial, Human Pose Estimation and Action Recognition [Skeleton, Action] (ICIP 2019).

We organized a tutorial, Object Detection in Recent Three Years [Detection, AutoML, Fine-Grained] (ICME 2019).

We organized the Detection In the Wild (DIW2019) Challenge in CVPR2019.

Our team obtained the first place of nuScenes 3D Detection and BDD100K & D²-City Detection Domain Adaptation in the Workshop on Autonomous Driving (CVPR 2019).

Our team obtained the first place of COCO Detection, COCO Keypoint Detection, COCO Panoptic Segmentation, and Mapillary Panoptic Segmentation (four champions) in the COCO + Mapillary Joint Challenge (ECCV 2018). [ChinaMedia 1][ChinaMedia 2]

Our team obtained the first place of WiderFace Detection in the Wider Challenge (ECCV 2018).

Our team obtained the first place of Video Instance Segmentation [Report][Slides] in the WAD2018 (Workshop on Autonomous Driving) Challenge (CVPR 2018).

Our team obtained the first place of AVA [Report][Slides] and second place of Moments in Time [Report] in the ActivityNet2018 Challenge (CVPR 2018).

Presentation: Beyond RetinaNet and Mask R-CNN, Jiangmen (将门), 2018

Presentation: Introduction to Object Detection, PKU & CAS, 2018

Our team obtained the first place of COCO 2017 Challenge (Detection Track & Keypoint Track) (ICCV 2017).



Pre-prints

AppAgent: Multimodal Agents as Smartphone Users
Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu
arXiv, 2023

ChartLlama: A Multimodal LLM for Chart Understanding and Generation
Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang
arXiv, 2023

FaceStudio: Put Your Face Everywhere in Seconds
Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu
arXiv, 2023

ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan, Gang Yu, Taihao Li, Tao Chen
arXiv, 2023

Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu
arXiv, 2021



Conference

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen
NeurIPS, 2024

M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions
Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen
ECCV, 2024

MotionChain: Conversational Motion Controllers via Multimodal Prompts
Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan
ECCV, 2024

Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space
Jiaxu Zhang, Xin Chen, Gang Yu, Zhigang Tu
ACM Multimedia, 2024

Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li, Chi Zhang, Gang Yu, Wanqi Yang, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei
ACL Findings, 2024

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu
CVPR, 2024

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen
CVPR, 2024

TapMo: Shape-aware Motion Generation of Skeleton-free Characters
Jiaxu Zhang, Shaoli Huang, Zhigang Tu, Xin Chen, Xiaohang Zhan, Gang Yu, Ying Shan
ICLR, 2024

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin
AAAI, 2024

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation
Yiying Yang, Fukun Yin, Wen Liu, Jiayuan Fan, Xin Chen, Gang Yu, Tao Chen
AAAI, 2024

MotionGPT: Human Motion as a Foreign Language
Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, Tao Chen
NeurIPS, 2023

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, Shenghua Gao
NeurIPS, 2023

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation
Yuhan Ding, Fukun Yin, Jiayuan Fan, Hui Li, Xin Chen, Wen Liu, Chongshan Lu, Gang Yu, Tao Chen
NeurIPS, 2023

A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction
Chongshan Lu, Fukun Yin, Xin Chen, Wen Liu, Tao Chen, Gang Yu, Jiayuan Fan
ICCV, 2023

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image
Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Xiaozhi Chen, Kaixuan Wang, Gang Yu, Chunhua Shen
ICCV, 2023

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering
Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Tianyi Zhou, Chunhua Shen
ICCV, 2023

Executing your Commands via Motion Diffusion in Latent Space
Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Gang Yu
CVPR, 2023

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection
Zhenglin Zhou, Huaxia Li, Hong Liu, Nanyang Wang, Gang Yu, Rongrong Ji
CVPR, 2023

End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu
CVPR, 2023

Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens
Sen Yang, Wen Heng, Gang Liu, Guozhong Luo, Wankou Yang, Gang Yu
ICLR, 2023

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang
ICLR, 2023

Hierarchical Normalization for Robust Monocular Depth Estimation
Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen
NeurIPS, 2022

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D Representations
Fukun Yin, Wen Liu, Zilong Huang, Pei Cheng, Tao Chen, Gang Yu
NeurIPS, 2022

D&D: Learning Human Dynamics from Dynamic Camera
Jiefeng Li, Siyuan Bian, Chao Xu, Gang Liu, Gang Yu, Cewu Lu
ECCV, 2022 (ORAL)

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
CVPR, 2022

Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification
Yike Wu, Bo Zhang, Gang Yu, Weixi Zhang, Bin Wang, Tao Chen, Jiayuan Fan
ACM MM, 2021

Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation
Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong Sang
ACM MM, 2021

State-Aware Tracker for Real-Time Video Object Segmentation
Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jian-Xin Shen, Donglian Qi
CVPR, 2020

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification
Guan'an Wang, Shuo Yang, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, Erjin Zhou, Jian Sun
CVPR, 2020

Context Prior for Scene Segmentation
Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang
CVPR, 2020

SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines
Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, Gang Yu
AAAI, 2020

Learnable Tree Filter for Structure-preserving Feature Transform
Lin Song, Yanwei Li, Zeming Li, Gang Yu, Hongbin Sun, Jian Sun, Nanning Zheng
NeurIPS, 2019

ThunderNet: Towards Real-time Generic Object Detection
Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun
ICCV, 2019

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
ICCV, 2019

Objects365: A Large-scale, High-quality Dataset for Object Detection
Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Jing Li, Xiangyu Zhang, Jian Sun
ICCV, 2019

An End-to-end Network for Panoptic Segmentation
Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei Jiang
CVPR, 2019

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection
Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun
CVPR, 2019

Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN
Shiyi Lan, Ruichi Yu, Gang Yu, Larry Davis
CVPR, 2019

Shape Robust Text Detection with Progressive Scale Expansion Network
Wenhai Wang, Xiang Li, Enze Xie, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
CVPR, 2019

Scene Text Detection with Supervised Pyramid Context Network
Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
AAAI, 2019

Attention-based Multi-Context Guiding for Few-Shot Semantic Segmentation
Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, Cees Snoek
AAAI, 2019

DetNet: A Backbone network for Object Detection
Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
ECCV, 2018

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
ECCV, 2018

Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation
Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, Shi-Min Hu
ECCV, 2018

MegDet: A Large Mini-Batch Object Detector
Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun
CVPR, 2018

Cascaded Pyramid Network for Multi-Person Pose Estimation [Code]
Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun
CVPR, 2018

Learning a Discriminative Feature Network for Semantic Segmentation
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
CVPR, 2018

R-FCN++: Towards Accurate Region-based Fully Convolutional Networks for Object Detection
Zeming Li, Yilun Chen, Gang Yu, Xiangyu Zhang, Jian Sun
AAAI, 2018

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network
Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
CVPR, 2017

Fast Action Proposals for Human Action Detection and Search
Gang Yu, Junsong Yuan
CVPR, 2015

Discriminative Orderlet Mining For Real-time Recognition of Human-Object Interaction [Project]
Gang Yu, Zicheng Liu, Junsong Yuan
ACCV, 2014

Scalable Forest Hashing for Fast Similarity Search
Gang Yu, Junsong Yuan
ICME, 2014

Propagative Hough Voting for Human Activity Recognition
Gang Yu, Junsong Yuan, Zicheng Liu
ECCV, 2012

Randomized Spatial Partition for Scene Recognition
Yuning Jiang, Junsong Yuan, Gang Yu
ECCV, 2012

Predicting Human Activities using Spatio-Temporal Structure of Interest Points
Gang Yu, Junsong Yuan, Zicheng Liu
ACM MM, 2012

Unsupervised Random Forest Indexing for Fast Action Search
Gang Yu, Junsong Yuan, Zicheng Liu
CVPR, 2011

Real-time Human Action Search using Random Forest based Hough Voting
Gang Yu, Junsong Yuan, Zicheng Liu
ACM MM, 2011



Journal

Lightweight Model Pre-Training Via Language Guided Knowledge Distillation
Mingsheng Li, Lin Zhang, Mingzhen Zhu, Zilong Huang, Gang Yu, Jiayuan Fan, Tao Chen
IEEE Trans. on Multimedia, 2024

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen
IEEE Trans. on Pattern Analysis and Machine Intelligence, 2024

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang
International Journal of Computer Vision, 2021

Propagative Hough Voting for Human Activity Detection and Recognition
Gang Yu, Junsong Yuan, Zicheng Liu
IEEE Trans. on Circuits and Systems for Video Technology, Vol.25, Issue 1, pp.87-98, 2014

Action Search by Example using Randomized Visual Vocabularies
Gang Yu, Junsong Yuan, Zicheng Liu
IEEE Trans. on Image Processing, Vol.22, Issue 1, pp. 377-390, 2013

Fast Action Detection via Discriminative Random Forest Voting and Top-K Subvolume Search
Gang Yu, Norberto A., Junsong Yuan, Zicheng Liu
IEEE Trans. on Multimedia, Vol.13, Issue 3, pp. 507-517, 2013

Book

Human Action Analysis with Randomized Trees
Gang Yu, Junsong Yuan, Zicheng Liu
SpringerBriefs, Springer, 2014