|
Gang YU (俞刚)
I am a Principal Research Scientist at StepFun
(阶跃星辰). My research focuses on computer vision and artificial
intelligence, specifically generative AI, object detection,
segmentation, human keypoint detection, human action recognition, and 3D reconstruction. I
obtained my PhD from NTU in 2014,
supervised by Prof. Junsong Yuan. Before
joining StepFun, I worked as a research director at Tencent for four years and spent another five years at Megvii (Face++).
Google Scholar / CV / Zhihu
|
|
|
News
Five papers accepted by CVPR2026.
Six papers accepted by ICLR2026.
I will serve as an Area Chair for CVPR2026.
Five papers accepted by NeurIPS2025.
Three papers accepted by ICCV2025.
Two papers accepted by CVPR2025.
I will serve as an Area Chair for CVPR2025.
One paper accepted by NeurIPS2024.
Two papers accepted by ECCV2024.
Two papers accepted by CVPR2024.
I will serve as an Innovation Program (Industry) Chair for ICME2024.
I will serve as an Area Chair for CVPR2024.
Three papers accepted by NeurIPS2023.
Three papers accepted by ICCV2023.
Three papers accepted by CVPR2023.
We organized a tutorial, Mobile Visual Analytics, at CVPR2021.
Our team obtained first place in the Depth Estimation Challenge of the Mobile AI Challenge (CVPR 2021).
We organized a tutorial, Human Pose Estimation and Action Recognition [Skeleton, Action] (ICIP 2019).
We organized a tutorial, Object Detection in Recent Three Years [Detection, AutoML, Fine-Grained] (ICME 2019).
We organized the Detection In the Wild
(DIW2019) Challenge in CVPR2019.
Our team obtained first place in nuScenes 3D Detection and BDD100K & D²-City Detection Domain Adaptation in the Workshop on Autonomous Driving (CVPR 2019).
Our team obtained first place in COCO Detection, COCO Keypoint Detection, COCO Panoptic Segmentation, and Mapillary Panoptic Segmentation (four championships) in the COCO + Mapillary Joint Challenge. ChinaMedia 1 ChinaMedia 2 (ECCV 2018).
Our team obtained first place in WiderFace Detection in the Wider Challenge (ECCV 2018).
Our team obtained first place in Video Instance Segmentation [Report][Slides] in the WAD2018 (Workshop on Autonomous Driving) Challenge (CVPR 2018).
Our team obtained first place in AVA [Report][Slides] and second place in Moments in Time [Report] in the ActivityNet2018 Challenge (CVPR 2018).
Presentation: Beyond RetinaNet and Mask R-CNN, Jiangmen (将门), 2018
Presentation: Introduction to Object Detection, PKU & CAS, 2018
Our team obtained first place in the COCO 2017 Challenge (Detection Track & Keypoint Track) (ICCV 2017).
|
|
Pre-prints
|
|
Step-Audio-R1 Technical
Report
Fei Tian, Xiangyu Tony Zhang, Yuxin Zhang, Haoyang Zhang, Yuxin Li, Daijiao Liu, Yayue
Deng, Donghang Wu, Jun Chen, Liang Zhao, Chengyuan Yao, Hexin Liu, Eng Siong Chng,
Xuerui Yang, Xiangyu Zhang, Daxin Jiang, Gang Yu
arXiv, 2025
|
|
Step-Audio-EditX Technical
Report
Chao Yan, Boyong Wu, Peng Yang, Pengfei Tan, Guoqiang Hu, Li Xie, Yuxin Zhang, Xiangyu
(Tony) Zhang, Fei Tian, Xuerui Yang, Xiangyu Zhang, Daxin Jiang, Shuchang Zhou,
Gang Yu
arXiv, 2025
|
|
Step-Audio 2 Technical
Report
Boyong Wu, Chao Yan, Chen Hu, Cheng Yi, Chengli Feng, Fei Tian, Feiyu Shen, Gang
Yu, Haoyang Zhang, et al.
arXiv, 2025
|
|
Step1X-Edit: A Practical
Framework for General Image Editing
Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming
Wang, Honghao Fu, Chunrui Han, Guopeng Li, Yuang Peng, Quan Sun, Jingwei Wu, Yan Cai,
Zheng Ge, Ranchen Ming, Lei Xia, Xianfang Zeng, Yibo Zhu, Binxing Jiao, Xiangyu Zhang,
Gang Yu, Daxin Jiang
arXiv, 2025
|
|
Step-Video-T2V Technical
Report: The Practice, Challenges, and Future of Video Foundation Model
Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan,
Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun
Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie
Li, Bin Wang, Bizhu Huang, Bo Wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu,
Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang,
Guanzhe Huang, et al.
arXiv, 2025
|
|
AppAgent: Multimodal
Agents as Smartphone Users
Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu,
Gang Yu
arXiv, 2023
|
|
Conference
|
|
ReasonEdit: Towards
Reasoning-Enhanced Image Editing Models
Fukun Yin, Shiyu Liu, Yucheng Han, Zhibo Wang, Peng Xing, Rui Wang, Wei Cheng, Yingming
Wang, Aojie Li, Zixin Yin, Pengtao Chen, Xiangyu Zhang, Daxin Jiang, Xianfang Zeng,
Gang Yu
CVPR, 2026
|
|
OmniLottie: Generating Vector Animations via Parameterized
Lottie Tokens
Yiying Yang, Wei Cheng, Sijin Chen, Honghao Fu, Xianfang Zeng, Yujun Cai, Gang
Yu, Xingjun Ma
CVPR, 2026
|
|
ViStoryBench:
Comprehensive Benchmark Suite for Story Visualization
Cailin Zhuang, Ailin Huang, Yaoqi Hu, Jingwei Wu, Wei Cheng, Jiaqi Liao, Hongyuan Wang,
Xinyao Liao, Weiwei Cai, Hengyuan Xu, Xuanyang Zhang, Xianfang Zeng, Zhewei Huang,
Gang Yu, Chi Zhang
CVPR, 2026
|
|
Dropping Anchor and Spherical Harmonics for Gaussian
Splatting
Shuangkang Fang, I-Chao Shen, Xuanyang Zhang, ZeSheng Wang, Yufeng Wang, Wenrui Ding,
Gang Yu, Takeo Igarashi
CVPR, 2026
|
|
iMontage: Unified,
Versatile, Highly Dynamic Many-to-many Image Generation
Zhoujie Fu, Xianfang Zeng, Jinghong Lan, Xinyao Liao, Cheng Chen, Junyi Chen, Jiacheng
Wei, Wei Cheng, Shiyu Liu, Yunuo Chen, Gang Yu, Guosheng Lin
CVPR, 2026
|
|
RegionE: Adaptive
Region-Aware Generation for Efficient Image Editing
Pengtao Chen, Xianfang Zeng, Maosen Zhao, Mingzhu Shen, Peng Ye, Bangyin Xiang, Zhibo
Wang, Wei Cheng, Gang Yu, Tao Chen
ICLR, 2026
|
|
WithAnyone: Toward
Controllable and ID Consistent Image Generation
Hengyuan Xu, Wei Cheng, Peng Xing, Yixiao Fang, Shuhan Wu, Rui Wang, Xianfang Zeng,
Daxin Jiang, Gang Yu, Xingjun Ma, Yu-Gang Jiang
ICLR, 2026
|
|
Training-Free Text-Guided
Color Editing with Multi-Modal Diffusion Transformer
Zixin Yin, Xili Dai, Ling-Hao Chen, Deyu Zhou, Jianan Wang, Duomin Wang, Gang
Yu, Lionel Ni, Lei Zhang, Heung-Yeung Shum
ICLR, 2026
|
|
LazyDrag: Enabling Stable
Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit
Correspondence
Zixin Yin, Xili Dai, Duomin Wang, Xianfang Zeng, Lionel Ni, Gang Yu,
Heung-Yeung Shum
ICLR, 2026
|
|
IGGT: Instance-Grounded
Geometry Transformer for Semantic 3D Reconstruction
Hao Li, Zhengyu Zou, Fangfu Liu, Xuanyang Zhang, Fangzhou Hong, Yukang Cao, Yushi Lan,
Manyuan Zhang, Gang Yu, Dingwen Zhang, Ziwei Liu
ICLR, 2026
|
|
SpeakerVid-5M: A
Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human
Generation
Youliang Zhang, Zhaoyang Li, Duomin Wang, Jiahe Zhang, Deyu Zhou, Zixin Yin, Xili Dai,
Gang Yu, Xiu Li
ICLR, 2026
|
|
Sparse-vDiT: Unleashing
the Power of Sparse Attention to Accelerate Video Diffusion Transformers
Pengtao Chen, Xianfang Zeng, Maosen Zhao, Peng Ye, Mingzhu Shen, Wei Cheng, Gang
Yu, Tao Chen
AAAI, 2025
|
|
OmniSVG: A Unified
Scalable Vector Graphics Generation Model
Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Fukun Yin, Jiaxu Zhang, Liao Wang,
Gang Yu, Xingjun Ma, Yu-Gang Jiang
NeurIPS, 2025
|
|
Vision Foundation Models
as Effective Visual Tokenizers for Autoregressive Generation
Anlin Zheng, Xin Wen, Xuanyang Zhang, Chuofan Ma, Tiancai Wang, Gang
Yu, Xiangyu Zhang, Xiaojuan Qi
NeurIPS, 2025
|
|
KRIS-Bench: Benchmarking
Next-Level Intelligent Image Editing Models
Yongliang Wu, Zonghui Li, Xinting Hu, Xinyu Ye, Xianfang Zeng, Gang Yu,
Wenbo Zhu, Bernt Schiele, Ming-Hsuan Yang, Xu Yang
NeurIPS, DB Track, 2025
|
|
FAVOR-Bench: A
Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu, Lin Zhang, Pengtao Chen, Peng Ye, Xianfang Zeng, Wei Cheng, Gang
Yu, Tao Chen
NeurIPS, DB Track, 2025
|
|
OneIG-Bench:
Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang, Yixiao Fang, Peng Xing, Shuhan Wu, Wei Cheng, Rui Wang, Xianfang Zeng,
Gang Yu, Hai-Bao Chen
NeurIPS, DB Track, 2025
|
|
MeshAnything:
Artist-Created Mesh Generation with Autoregressive Transformers
Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang
Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang
ICLR, 2025
|
|
MotionAgent: Fine-grained
Controllable Video Generation via Motion Field Agent
Xinyao Liao, Xianfang Zeng, Liao Wang, Gang Yu, Guosheng Lin, Chi
Zhang
ICCV, 2025
|
|
MikuDance: Animating
Character Art with Mixed Motion Dynamics
Jiaxu Zhang, Xianfang Zeng, Xin Chen, Wei Zuo, Gang Yu, Zhigang Tu
ICCV, 2025 (ORAL)
|
|
SC-Captioner: Improving
Image Captioning with Self-Correction by Reinforcement Learning
Lin Zhang, Xianfang Zeng, Kangcong Li, Gang Yu, Tao Chen
ICCV, 2025
|
|
DeRS: Towards Extremely
Efficient Upcycled Mixture-of-Experts Models
Yongqi Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang
Yu, Tao Chen
CVPR, 2025
|
|
MVPaint: 3D Texture
Generation with Multi-View Consistency
Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin
Fu, Gang Yu, Ziwei Liu, Liang Pan
CVPR, 2025
|
|
MeshXL: Neural Coordinate
Field for Generative 3D Foundation Models
Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru
Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen
NeurIPS, 2024
|
|
M3DBench: Towards Omni 3D
Assistant with Interleaved Multi-modal Instructions
Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang
Yu, Tao Chen
ECCV, 2024
|
|
MotionChain:
Conversational Motion Controllers via Multimodal Prompts
Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu,
Jiayuan Fan
ECCV, 2024
|
|
Generative Motion
Stylization of Cross-structure Characters within Canonical Motion Space
Jiaxu Zhang, Xin Chen, Gang Yu, Zhigang Tu
ACM Multimedia, 2024
|
|
Enhanced Visual
Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li, Chi Zhang, Gang Yu, Wanqi Yang, Zhibin Wang, Bin Fu, Guosheng
Lin, Chunhua Shen, Ling Chen, Yunchao Wei
ACL Findings, 2024
|
|
Paint3D: Paint Anything 3D
with Lighting-Less Texture Diffusion Models
Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu,
Gang Yu
CVPR, 2024
|
|
LL3DA: Visual Interactive
Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei,
Hongyuan Zhu, Jiayuan Fan, Tao Chen
CVPR, 2024
|
|
TapMo: Shape-aware Motion
Generation of Skeleton-free Characters
Jiaxu Zhang, Shaoli Huang, Zhigang Tu, Xin Chen, Xiaohang Zhan, Gang
Yu, Ying Shan
ICLR, 2024
|
|
IT3D: Improved Text-to-3D
Generation with Explicit View Synthesis
Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang,
Guosheng Lin
AAAI, 2024
|
|
PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural
Representation
Yiying Yang, Fukun Yin, Wen Liu, Jiayuan Fan, Xin Chen, Gang Yu, Tao
Chen
AAAI, 2024
|
|
MotionGPT: Human Motion as
a Foreign Language
Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, Tao Chen
NeurIPS, 2023
|
|
Michelangelo: Conditional
3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen,
Gang Yu, Shenghua Gao
NeurIPS, 2023
|
|
PDF: Point Diffusion Implicit Function for Large-scale
Scene Neural Representation
Yuhan Ding, Fukun Yin, Jiayuan Fan, Hui Li, Xin Chen, Wen Liu, Chongshan Lu,
Gang Yu, Tao Chen
NeurIPS, 2023
|
|
A Large-Scale Outdoor
Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene
Reconstruction
Chongshan Lu, Fukun Yin, Xin Chen, Wen Liu, Tao Chen, Gang Yu, Jiayuan
Fan
ICCV, 2023
|
|
Metric3D: Towards
Zero-shot Metric 3D Prediction from A Single Image
Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Xiaozhi Chen, Kaixuan Wang, Gang
Yu, Chunhua Shen
ICCV, 2023
|
|
Robust Geometry-Preserving Depth Estimation Using
Differentiable Rendering
Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Tianyi
Zhou, Chunhua Shen
ICCV, 2023
|
|
Executing your Commands
via Motion Diffusion in Latent Space
Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Gang Yu
CVPR, 2023
|
|
STAR Loss: Reducing
Semantic Ambiguity in Facial Landmark Detection
Zhenglin Zhou, Huaxia Li, Hong Liu, Nanyang Wang, Gang Yu, Rongrong Ji
CVPR, 2023
|
|
End-to-End 3D Dense
Captioning with Vote2Cap-DETR
Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu
CVPR, 2023
|
|
Capturing the motion
of every joint: 3D human pose and shape estimation with independent tokens
Sen Yang, Wen Heng, Gang Liu, Guozhong Luo, Wankou Yang, Gang Yu
ICLR, 2023
|
|
SeaFormer:
Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang
ICLR, 2023
|
|
Hierarchical
Normalization for Robust Monocular Depth Estimation
Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen
NeurIPS, 2022
|
|
Coordinates Are NOT
Lonely - Codebook Prior Helps Implicit Neural 3D Representations
Fukun Yin, Wen Liu, Zilong Huang, Pei Cheng, Tao Chen, Gang Yu
NeurIPS, 2022
|
|
D&D: Learning Human
Dynamics from Dynamic Camera
Jiefeng Li, Siyuan Bian, Chao Xu, Gang Liu, Gang Yu, Cewu Lu
ECCV, 2022 (ORAL)
|
|
TopFormer: Token
Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu,
Gang Yu, Chunhua Shen
CVPR, 2022
|
|
Object-aware
Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image
Classification
Yike Wu, Bo Zhang, Gang Yu, Weixi Zhang, Bin Wang, Tao Chen, Jiayuan
Fan
ACM MM, 2021
|
|
Attribute-specific
Control Units in StyleGAN for Fine-grained Image Manipulation
Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong
Sang
ACM MM, 2021
|
|
State-Aware Tracker for
Real-Time Video Object Segmentation
Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jian-Xin Shen, Donglian Qi
CVPR, 2020
|
|
High-Order Information
Matters: Learning Relation and Topology for Occluded Person
Re-Identification
Guan'an Wang, Shuo Yang, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang,
Gang Yu, Erjin Zhou, Jian Sun
CVPR, 2020
|
|
Context Prior for Scene
Segmentation
Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong
Sang
CVPR, 2020
|
|
SiamFC++: Towards Robust
and Accurate Visual Tracking with Target Estimation Guidelines
Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, Gang Yu
AAAI, 2020
|
|
Learnable Tree Filter for
Structure-preserving Feature Transform
Lin Song, Yanwei Li, Zeming Li, Gang Yu, Hongbin Sun, Jian Sun, Nanning
Zheng
NeurIPS, 2019
|
|
ThunderNet: Towards
Real-time Generic Object Detection
Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng,
Jian Sun
ICCV, 2019
|
|
Efficient and
Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang
Yu, Chunhua Shen
ICCV, 2019
|
|
Objects365: A Large-scale,
High-quality Dataset for Object Detection
Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Jing Li,
Xiangyu Zhang, Jian Sun
ICCV, 2019
|
|
An End-to-end Network
for Panoptic Segmentation
Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei
Jiang
CVPR, 2019
|
|
TACNet: Transition-Aware Context
Network for Spatio-Temporal Action Detection
Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun
CVPR, 2019
|
|
Modeling Local
Geometric Structure of 3D Point Clouds using Geo-CNN
Shiyi Lan, Ruichi Yu, Gang Yu, Larry Davis
CVPR, 2019
|
|
Shape Robust Text
Detection with Progressive Scale Expansion Network
Wenhai Wang, Xiang Li, Enze Xie, Wenbo Hou, Tong Lu, Gang Yu, Shuai
Shao
CVPR, 2019
|
|
Scene Text Detection
with Supervised Pyramid Context Network
Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
AAAI, 2019
|
|
Attention-based
Multi-Context Guiding for Few-Shot Semantic Segmentation
Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, Cees Snoek
AAAI, 2019
|
|
DetNet: A Backbone
network for Object
Detection
Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
ECCV, 2018
|
|
BiSeNet: Bilateral
Segmentation Network for Real-time Semantic Segmentation
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong
Sang
ECCV, 2018
|
|
Associating
Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation
Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin,
Shi-Min Hu
ECCV, 2018
|
|
MegDet: A Large Mini-Batch Object
Detector
Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang
Yu, Jian Sun
CVPR, 2018
|
|
Cascaded Pyramid Network for
Multi-Person Pose Estimation [Code]
Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian
Sun
CVPR, 2018
|
|
Learning a Discriminative Feature
Network for Semantic Segmentation
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong
Sang
CVPR, 2018
|
|
R-FCN++: Towards Accurate
Region-based Fully Convolutional Networks for Object Detection
Zeming Li, Yilun Chen, Gang Yu, Xiangyu Zhang, Jian Sun
AAAI, 2018
|
|
Large Kernel Matters -- Improve
Semantic Segmentation by Global Convolutional Network
Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
CVPR, 2017
|
|
Fast
Action Proposals for Human Action Detection and Search
Gang Yu, Junsong Yuan
CVPR, 2015
|
|
Discriminative
Orderlet Mining For Real-time Recognition of Human-Object Interaction
[Project]
Gang Yu, Zicheng Liu, Junsong Yuan
ACCV, 2014
|
|
Scalable
Forest Hashing for Fast Similarity Search
Gang Yu, Junsong Yuan
ICME, 2014
|
|
Propagative
Hough Voting for Human Activity Recognition
Gang Yu, Junsong Yuan, Zicheng Liu
ECCV, 2012
|
|
Randomized
Spatial Partition for Scene Recognition
Yuning Jiang, Junsong Yuan, Gang Yu
ECCV, 2012
|
|
Predicting
Human Activities using Spatio-Temporal Structure of Interest Points
Gang Yu, Junsong Yuan, Zicheng Liu
ACM MM, 2012
|
|
Unsupervised
Random Forest Indexing for Fast Action Search
Gang Yu, Junsong Yuan, Zicheng Liu
CVPR, 2011
|
|
Real-time
Human Action Search using Random Forest based Hough Voting
Gang Yu, Junsong Yuan, Zicheng Liu
ACM MM, 2011
|
|
Journal
|
|
Lightweight Model
Pre-Training Via Language Guided Knowledge Distillation
Mingsheng Li, Lin Zhang, Mingzhen Zhu, Zilong Huang, Gang Yu, Jiayuan
Fan, Tao Chen
IEEE Trans. on Multimedia, 2024
|
|
Vote2Cap-DETR++:
Decoupling Localization and Describing for End-to-End 3D Dense
Captioning
Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang
Yu, Taihao Li, Tao Chen
IEEE Trans. on Pattern Analysis and Machine Intelligence, 2024
|
|
BiSeNet V2: Bilateral
Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong
Sang
International Journal of Computer Vision, 2021
|
|
Propagative
Hough Voting for Human Activity Detection and Recognition
Gang Yu, Junsong Yuan, Zicheng Liu
IEEE Trans. on Circuits and Systems for Video Technology, Vol.25, Issue 1,
pp.87-98, 2014
|
|
Action
Search by Example using Randomized Visual Vocabularies
Gang Yu, Junsong Yuan, Zicheng Liu
IEEE Trans. on Image Processing, Vol.22, Issue 1, pp. 377-390, 2013
|
|
Fast
Action Detection via Discriminative Random Forest Voting and Top-K Subvolume
Search
Gang Yu, Norberto A., Junsong Yuan, Zicheng Liu
IEEE Trans. on Multimedia, Vol.13, Issue 3, pp. 507-517, 2013
|
|