Gang YU (俞刚)
I am a Principal Research Scientist at StepFun (阶跃星辰). My research interests focus on computer vision and artificial intelligence, specifically generative AI, object detection, segmentation, human keypoint detection, human action recognition, and 3D reconstruction. I obtained my PhD from NTU in 2014, supervised by Prof. Junsong Yuan. Before joining StepFun, I worked as a research director at Tencent for four years and, before that, at Megvii (Face++) for five years.
Google Scholar / CV / Zhihu
|
|
News
I will serve as an Area Chair for CVPR2025.
One paper accepted by NeurIPS2024.
Two papers accepted by ECCV2024.
Two papers accepted by CVPR2024.
I will serve as an Innovation Program (Industry) Chair for ICME2024.
I will serve as an Area Chair for CVPR2024.
Three papers accepted by NeurIPS2023.
Three papers accepted by ICCV2023.
Three papers accepted by CVPR2023.
We organized a tutorial, Mobile Visual Analytics, at CVPR2021.
Our team obtained the first place of Mobile AI Challenge in the Depth Estimation Challenge (CVPR 2021).
We organized a tutorial, Human Pose Estimation and Action Recognition [Skeleton, Action] (ICIP 2019).
We organized a tutorial, Object Detection in Recent Three Years [Detection, AutoML, Fine-Grained] (ICME 2019).
We organized the Detection In the Wild (DIW2019) Challenge in CVPR2019.
Our team obtained the first place of nuScenes 3D Detection and BDD100K & D²-City Detection Domain Adaptation in the Workshop on Autonomous Driving (CVPR 2019).
Our team obtained the first place of COCO Detection, COCO Keypoint Detection, COCO Panoptic Segmentation, Mapillary Panoptic Segmentation (four Champions) in the COCO + Mapillary Joint Challenge. ChinaMedia 1, ChinaMedia 2 (ECCV 2018).
Our team obtained the first place of WiderFace Detection in the Wider Challenge (ECCV 2018).
Our team obtained the first place of Video Instance Segmentation [Report][Slides] in the WAD2018 (Workshop on Autonomous Driving) Challenge (CVPR 2018).
Our team obtained the first place of AVA [Report][Slides] and second place of Moments in time [Report] in the ActivityNet2018 Challenge (CVPR 2018).
Presentation: Beyond RetinaNet and Mask R-CNN, Jiangmen (将门), 2018
Presentation: Introduction to Object Detection, PKU & CAS, 2018
Our team obtained the first place of COCO 2017 Challenge (Detection Track & Keypoint Track) (ICCV 2017).
|
Pre-prints
|
AppAgent: Multimodal Agents as Smartphone Users
Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu
arXiv, 2023
|
ChartLlama: A Multimodal LLM for Chart Understanding and Generation
Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang
arXiv, 2023
|
FaceStudio: Put Your Face Everywhere in Seconds
Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu
arXiv, 2023
|
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan, Gang Yu, Taihao Li, Tao Chen
arXiv, 2023
|
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu
arXiv, 2021
|
Conference
|
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen
NeurIPS, 2024
|
M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions
Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen
ECCV, 2024
|
MotionChain: Conversational Motion Controllers via Multimodal Prompts
Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan
ECCV, 2024
|
Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space
Jiaxu Zhang, Xin Chen, Gang Yu, Zhigang Tu
ACM Multimedia, 2024
|
Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li, Chi Zhang, Gang Yu, Wanqi Yang, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei
ACL Findings, 2024
|
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu
CVPR, 2024
|
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen
CVPR, 2024
|
TapMo: Shape-aware Motion Generation of Skeleton-free Characters
Jiaxu Zhang, Shaoli Huang, Zhigang Tu, Xin Chen, Xiaohang Zhan, Gang Yu, Ying Shan
ICLR, 2024
|
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin
AAAI, 2024
|
PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation
Yiying Yang, Fukun Yin, Wen Liu, Jiayuan Fan, Xin Chen, Gang Yu, Tao Chen
AAAI, 2024
|
MotionGPT: Human Motion as a Foreign Language
Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, Tao Chen
NeurIPS, 2023
|
Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, Shenghua Gao
NeurIPS, 2023
|
PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation
Yuhan Ding, Fukun Yin, Jiayuan Fan, Hui Li, Xin Chen, Wen Liu, Chongshan Lu, Gang Yu, Tao Chen
NeurIPS, 2023
|
A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction
Chongshan Lu, Fukun Yin, Xin Chen, Wen Liu, Tao Chen, Gang Yu, Jiayuan Fan
ICCV, 2023
|
Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image
Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Xiaozhi Chen, Kaixuan Wang, Gang Yu, Chunhua Shen
ICCV, 2023
|
Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering
Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Tianyi Zhou, Chunhua Shen
ICCV, 2023
|
Executing your Commands via Motion Diffusion in Latent Space
Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Gang Yu
CVPR, 2023
|
STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection
Zhenglin Zhou, Huaxia Li, Hong Liu, Nanyang Wang, Gang Yu, Rongrong Ji
CVPR, 2023
|
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu
CVPR, 2023
|
Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens
Sen Yang, Wen Heng, Gang Liu, Guozhong Luo, Wankou Yang, Gang Yu
ICLR, 2023
|
SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang
ICLR, 2023
|
Hierarchical Normalization for Robust Monocular Depth Estimation
Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen
NeurIPS, 2022
|
Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D Representations
Fukun Yin, Wen Liu, Zilong Huang, Pei Cheng, Tao Chen, Gang Yu
NeurIPS, 2022
|
D&D: Learning Human Dynamics from Dynamic Camera
Jiefeng Li, Siyuan Bian, Chao Xu, Gang Liu, Gang Yu, Cewu Lu
ECCV, 2022 (ORAL)
|
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
CVPR, 2022
|
Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification
Yike Wu, Bo Zhang, Gang Yu, Weixi Zhang, Bin Wang, Tao Chen, Jiayuan Fan
ACM MM, 2021
|
Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation
Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong Sang
ACM MM, 2021
|
State-Aware Tracker for Real-Time Video Object Segmentation
Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jian-Xin Shen, Donglian Qi
CVPR, 2020
|
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification
Guan'an Wang, Shuo Yang, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, Erjin Zhou, Jian Sun
CVPR, 2020
|
Context Prior for Scene Segmentation
Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang
CVPR, 2020
|
SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines
Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, Gang Yu
AAAI, 2020
|
Learnable Tree Filter for Structure-preserving Feature Transform
Lin Song, Yanwei Li, Zeming Li, Gang Yu, Hongbin Sun, Jian Sun, Nanning Zheng
NeurIPS, 2019
|
ThunderNet: Towards Real-time Generic Object Detection
Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun
ICCV, 2019
|
Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
ICCV, 2019
|
Objects365: A Large-scale, High-quality Dataset for Object Detection
Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Jing Li, Xiangyu Zhang, Jian Sun
ICCV, 2019
|
An End-to-end Network for Panoptic Segmentation
Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei Jiang
CVPR, 2019
|
TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection
Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun
CVPR, 2019
|
Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN
Shiyi Lan, Ruichi Yu, Gang Yu, Larry Davis
CVPR, 2019
|
Shape Robust Text Detection with Progressive Scale Expansion Network
Wenhai Wang, Xiang Li, Enze Xie, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
CVPR, 2019
|
Scene Text Detection with Supervised Pyramid Context Network
Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
AAAI, 2019
|
Attention-based Multi-Context Guiding for Few-Shot Semantic Segmentation
Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, Cees Snoek
AAAI, 2019
|
DetNet: A Backbone network for Object Detection
Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
ECCV, 2018
|
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
ECCV, 2018
|
Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation
Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, Shi-Min Hu
ECCV, 2018
|
MegDet: A Large Mini-Batch Object Detector
Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun
CVPR, 2018
|
Cascaded Pyramid Network for Multi-Person Pose Estimation [Code]
Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun
CVPR, 2018
|
Learning a Discriminative Feature Network for Semantic Segmentation
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
CVPR, 2018
|
R-FCN++: Towards Accurate Region-based Fully Convolutional Networks for Object Detection
Zeming Li, Yilun Chen, Gang Yu, Xiangyu Zhang, Jian Sun
AAAI, 2018
|
Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network
Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
CVPR, 2017
|
Fast Action Proposals for Human Action Detection and Search
Gang Yu, Junsong Yuan
CVPR, 2015
|
Discriminative Orderlet Mining For Real-time Recognition of Human-Object Interaction [Project]
Gang Yu, Zicheng Liu, Junsong Yuan
ACCV, 2014
|
Scalable Forest Hashing for Fast Similarity Search
Gang Yu, Junsong Yuan
ICME, 2014
|
Propagative Hough Voting for Human Activity Recognition
Gang Yu, Junsong Yuan, Zicheng Liu
ECCV, 2012
|
Randomized Spatial Partition for Scene Recognition
Yuning Jiang, Junsong Yuan, Gang Yu
ECCV, 2012
|
Predicting Human Activities using Spatio-Temporal Structure of Interest Points
Gang Yu, Junsong Yuan, Zicheng Liu
ACM MM, 2012
|
Unsupervised Random Forest Indexing for Fast Action Search
Gang Yu, Junsong Yuan, Zicheng Liu
CVPR, 2011
|
Real-time Human Action Search using Random Forest based Hough Voting
Gang Yu, Junsong Yuan, Zicheng Liu
ACM MM, 2011
|
Journal
|
Lightweight Model Pre-Training Via Language Guided Knowledge Distillation
Mingsheng Li, Lin Zhang, Mingzhen Zhu, Zilong Huang, Gang Yu, Jiayuan Fan, Tao Chen
IEEE Trans. on Multimedia, 2024
|
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen
IEEE Trans. on Pattern Analysis and Machine Intelligence, 2024
|
BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang
International Journal of Computer Vision, 2021
|
Propagative Hough Voting for Human Activity Detection and Recognition
Gang Yu, Junsong Yuan, Zicheng Liu
IEEE Trans. on Circuits and Systems for Video Technology, Vol.25, Issue 1, pp.87-98, 2014
|
Action Search by Example using Randomized Visual Vocabularies
Gang Yu, Junsong Yuan, Zicheng Liu
IEEE Trans. on Image Processing, Vol.22, Issue 1, pp. 377-390, 2013
|
Fast Action Detection via Discriminative Random Forest Voting and Top-K Subvolume Search
Gang Yu, Norberto A., Junsong Yuan, Zicheng Liu
IEEE Trans. on Multimedia, Vol.13, Issue 3, pp. 507-517, 2013
|
|