Beyond Self-Driving: Exploring Three Levels of Driving Automation
ICCV 2025 Tutorial
TBD
TBD
Introduction
Self-driving technologies have demonstrated significant potential to transform human mobility, but single-agent systems face inherent limitations in perception and decision-making. Transitioning from individual self-driving vehicles to cooperative multi-vehicle systems and large-scale intelligent transportation systems is essential for safer and more efficient mobility. Realizing such sophisticated mobility systems, however, introduces significant challenges, requiring comprehensive tools and models, simulation environments, real-world datasets, and deployment frameworks. This tutorial delves into key areas of driving automation, beginning with advanced end-to-end self-driving techniques such as vision-language-action (VLA) models, interactive prediction and planning, and scenario generation. It then emphasizes V2X communication and cooperative perception in real-world settings, including datasets such as V2X-Real and V2XPnP, and covers simulation and deployment frameworks for urban mobility, such as MetaDrive, MetaUrban, and UrbanSim. By bridging foundational research with real-world deployment, this tutorial offers practical insights into developing future-ready autonomous mobility systems.

Schedule
Time (GMT-10) | Programme |
---|---|
08:50 - 09:00 | Opening Remarks |
09:00 - 09:40 |
Placeholder Abstract_placeholder Zhiyu Huang is a postdoctoral scholar at the UCLA Mobility Lab, working under the guidance of Prof. Jiaqi Ma. He was previously a research intern at NVIDIA Research's Autonomous Vehicle Group and a visiting student researcher at UC Berkeley's Mechanical Systems Control (MSC) Lab. He received his Ph.D. from Nanyang Technological University (NTU), where he conducted research in the Automated Driving and Human-Machine System (AutoMan) Lab under the supervision of Prof. Chen Lyu.
Zhiyu Huang
Postdoctoral Researcher, UCLA |
09:40 - 10:20 |
Towards End-to-End Cooperative Systems across Multiple Agents and Temporal Frames Vehicle-to-Everything (V2X) technologies offer a promising paradigm for mitigating the constrained observability of single-vehicle systems through information exchange. However, existing cooperative systems are largely limited to single-frame multi-agent fusion for perception, yielding a constrained scene understanding without temporal cues. This tutorial will explore how cooperative systems can achieve comprehensive spatio-temporal scene understanding and be jointly optimized across the full autonomy stack: perception, prediction, and planning. It will begin by introducing V2XPnP-Seq, the first real-world sequential dataset supporting all V2X collaboration modes (vehicle-centric, infrastructure-centric, V2V, and I2I). Attendees will learn how to leverage this dataset and its comprehensive benchmark, which evaluates 11 distinct fusion methods, to validate their own cooperative models. Next, the tutorial will delve into V2XPnP, a novel intermediate-fusion end-to-end framework that operates within a single communication step. Compared to traditional multi-step strategies, this framework achieves a 12% gain in perception and prediction accuracy while reducing communication overhead by 5×. Training such a multi-agent, multi-frame, and multi-task system, however, poses significant challenges. To address this, TurboTrain will be presented: an efficient training paradigm that integrates spatio-temporal pretraining with balanced fine-tuning. Participants will gain insights into how this approach achieves 2× faster convergence and improved performance while preserving task-agnostic spatio-temporal features for deployment. Finally, the discussion will extend to the crucial task of planning.
Here, Risk Map as Middleware (RiskMM) is introduced as an interpretable cooperative end-to-end planning framework that explicitly models agent interactions and risks. This approach enhances the transparency and trustworthiness of autonomous driving systems. Through this progressive exploration, attendees will gain a holistic understanding of the state-of-the-art in multi-agent cooperative systems and will be equipped with the knowledge to build, train, and evaluate their own end-to-end spatio-temporal V2X solutions. Zewei Zhou is a Ph.D. student in the UCLA Mobility Lab at the University of California, Los Angeles (UCLA), advised by Prof. Jiaqi Ma. He received his master’s degree from Tongji University with the honor of Shanghai Outstanding Graduate, and conducted research at the Institute of Intelligent Vehicles (TJU-IIV) under the supervision of Prof. Yanjun Huang and Prof. Zhuoping Yu.
Zewei Zhou
PhD Candidate, UCLA |
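The communication-overhead claim above can be made concrete with a back-of-the-envelope sketch: multi-step fusion sends one feature message per historical frame per round, while single-step fusion sends one message carrying temporally aggregated features. The function names, frame count, per-frame size, and compression ratio below are illustrative assumptions, not the actual V2XPnP message format or measured values.

```python
# Hypothetical comparison of multi-step vs. single-step V2X communication,
# in the spirit of (but not reproducing) the V2XPnP framework.
# All sizes are assumptions chosen for illustration only.

def multi_step_bytes(num_frames: int, frame_feature_bytes: int) -> int:
    """Multi-step fusion: one message per past frame per communication round."""
    return num_frames * frame_feature_bytes

def single_step_bytes(num_frames: int, frame_feature_bytes: int,
                      temporal_compression: float) -> int:
    """Single-step fusion: one message carrying temporally aggregated
    (compressed) features covering the whole history."""
    return int(num_frames * frame_feature_bytes * temporal_compression)

frames = 5           # assumed history length
per_frame = 200_000  # assumed bytes per per-frame BEV feature map
multi = multi_step_bytes(frames, per_frame)
single = single_step_bytes(frames, per_frame, temporal_compression=0.2)
print(multi, single, multi / single)  # → 1000000 200000 5.0
```

Under these assumed numbers the single-step scheme transmits 5× fewer bytes per round, which is the kind of trade-off the talk quantifies on real data.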
10:20 - 10:40 | Coffee Break |
10:40 - 11:20 |
Towards Efficient and Scalable Cooperative Driving Systems Recent advances in cooperative perception have shown significant performance gains for autonomous systems via Vehicle-to-Everything (V2X) communication, yet real-world deployment remains constrained by data requirements, training cost, and real-time inference latency under bandwidth-limited conditions. This tutorial bridges algorithmic innovation with system-level feasibility, drawing on four recent lines of work. CooPre introduces data-efficient protocols that maximize perception benefits under limited annotation budgets, leveraging curriculum design and selective information sharing to reduce labeling costs while preserving multi-agent synergy. TurboTrain accelerates large-scale cooperative model training through architectural refinements and curriculum-based distributed learning, achieving up to 4× faster convergence with competitive accuracy and enabling rapid iteration across diverse agent configurations. QuantV2X pioneers a fully quantized multi-agent perception pipeline, compressing both computation and communication to reduce system-level latency by 2.7× while retaining 99.8% of full-precision accuracy via alignment modules robust to spatial and distributional shifts. Finally, V2X-ReaLO delivers the first real-world, open cooperative perception testbed, enabling rigorous evaluation of V2X algorithms under realistic network latency, packet loss, and heterogeneous agent capabilities, bridging the gap between simulation and deployment. Together, these contributions chart an end-to-end path, from data curation and training acceleration to real-time deployment, equipping attendees with practical insights and open-source tools for building scalable, resource-aware multi-agent systems ready for real-world, bandwidth-constrained environments.
Attendees will learn how to design data-efficient, resource-aware cooperative perception pipelines and gain hands-on experience with real-world evaluation and benchmarking techniques for V2X systems, including methods for handling network latency, edge memory constraints, and heterogeneous agent configurations. Seth Z. Zhao is a second-year Ph.D. student in Computer Science at UCLA, advised by Professors Bolei Zhou and Jiaqi Ma. He previously earned his M.S. and B.A. in Computer Science from UC Berkeley, where he conducted research under the guidance of Professors Masayoshi Tomizuka, Allen Yang, and Constance Chang-Hasnain.
Seth Z. Zhao
PhD Candidate, UCLA |
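The payload savings behind a fully quantized pipeline can be illustrated with a minimal symmetric int8 quantization sketch: float32 features cost 4 bytes per value, while int8 costs 1, at the price of a small reconstruction error bounded by the quantization scale. This is a generic textbook sketch, not the QuantV2X implementation; the function names and feature values are assumptions.

```python
# Minimal sketch of per-tensor symmetric int8 quantization for a shared
# feature vector, illustrating why quantized communication shrinks
# payloads roughly 4x versus float32. Illustrative only.
import struct

def quantize_int8(features):
    """Map float features to int8 using a single per-tensor scale."""
    scale = max(abs(x) for x in features) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(x / scale))) for x in features]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float features from int8 codes."""
    return [v * scale for v in q]

feats = [0.5, -1.27, 0.02, 1.0]          # assumed feature values
q, scale = quantize_int8(feats)
recovered = dequantize_int8(q, scale)

fp32_bytes = len(feats) * struct.calcsize("f")  # 4 bytes per float32 value
int8_bytes = len(q)                             # 1 byte per int8 value
print(fp32_bytes, int8_bytes)  # → 16 4
```

Real systems add calibration and alignment modules on top of this basic scheme precisely because the reconstruction error shown here can otherwise compound across agents.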
11:20 - 12:00 |
Placeholder Abstract_placeholder Wayne Wu is a Research Associate in the Department of Computer Science at the University of California, Los Angeles, working with Prof. Bolei Zhou. Prior to this, he was a Research Scientist at Shanghai AI Lab, where he led the Virtual Human Group. He also served as a Visiting Scholar at Nanyang Technological University, collaborating with Prof. Chen Change Loy. He earned his Ph.D. in June 2022 from the Department of Computer Science and Technology at Tsinghua University.
Wayne Wu
Research Associate, UCLA |
12:00 - 12:10 | Ending Remarks |
Resources
Project | Description | Link |
---|---|---|
OpenCDA | An open co-simulation-based research/engineering framework integrated with prototype cooperative driving automation pipelines. | github.com/ucla-mobility/OpenCDA |
V2X-Real | The first large-scale real-world dataset for Vehicle-to-Everything (V2X) cooperative perception. | mobility-lab.seas.ucla.edu/v2x-real |
V2XPnP | The first open-source V2X spatio-temporal fusion framework for cooperative perception and prediction. | mobility-lab.seas.ucla.edu/v2xpnp |
MetaDrive | An open-source driving simulator for AI and autonomy research. | github.com/metadriverse/metadrive |
MetaUrban | An embodied AI simulation platform for urban micromobility. | github.com/metadriverse/metaurban |
UrbanSim | A large-scale robot learning platform for urban spaces, built on NVIDIA Omniverse. | github.com/metadriverse/urban-sim |