Workshops & seminars

From Low-level Vision to Spatial Intelligence: A First-Principle Approach


Date & time
Monday, March 18, 2024
10 a.m. – 11:30 a.m.
Speaker(s)

Chengzhou Tang

Cost

This event is free

Contact

Tristan Glatard

Where

ER Building
2155 Guy St.
Room ER-1072

Wheelchair accessible

Yes

Abstract:



Recent advances led by large language models (LLMs) suggest that Artificial General Intelligence (AGI) is potentially within reach, as demonstrated in numerous applications such as AI agents and code generation. LLMs have also been leveraged for high-level task planning for robots in the real world. This progress is driven by the logical reasoning capabilities embedded in LLMs. However, it remains an open question how we can build intelligent systems that interact with the physical world and understand its underlying principles.


In this talk, I will first briefly review previous efforts in computer vision, focusing on how these pioneering works perceived the physical world computationally and how they advanced the development of intelligent autonomous systems. Then, I will introduce the idea of integrating first-principle rules within neural networks through optimization with a learned objective function and regularization. This idea, first adopted for the 3D reconstruction of static scenes, bridges the gap between end-to-end learning-based methods and conventional multi-view geometry methods. Following this, I will present subsequent works that generalize this idea to a variety of other low-level vision tasks. Specifically, I will discuss how to decouple the task-specific objective function from the model, making it possible to learn a single model for multiple tasks with shared weights, and even enabling zero-shot transfer between tasks.
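The core idea the abstract describes (embedding a first-principle objective inside a network as a differentiable optimization layer) can be illustrated with a minimal sketch. The example below is purely hypothetical and not from the speaker's papers: it unrolls gradient descent on a regularized least-squares energy, where the regularization weight `lam` is an ordinary differentiable parameter, so in a training framework gradients could flow through the solver back to `lam`.

```python
import numpy as np

def unrolled_solver(A, b, lam, steps=100, lr=0.1):
    """Minimize ||A x - b||^2 + lam * ||x||^2 by unrolled gradient descent.

    Every step is a differentiable function of lam, so the whole solver
    can sit inside a network and the regularization can be *learned*
    rather than hand-tuned -- the spirit of optimization with a learned
    objective/regularization described in the abstract.
    """
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = 2.0 * A.T @ (A @ x - b) + 2.0 * lam * x  # energy gradient
        x = x - lr * grad                               # one descent step
    return x

# Tiny overdetermined system standing in for a geometric data term.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.0])
x = unrolled_solver(A, b, lam=0.1)
```

After enough unrolled steps the result matches the closed-form ridge solution `(A^T A + lam*I)^{-1} A^T b`, which is how one can sanity-check such a layer.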


Biography:

Chengzhou Tang is a research scientist at Meta Reality Labs. He obtained his Ph.D. from Simon Fraser University under the guidance of Dr. Ping Tan in 2020. His research encompasses low-level and 3D computer vision, along with their applications such as SLAM (Simultaneous Localization and Mapping) in robotics. A cornerstone of his work is the integration of first-principle rules within neural networks through differentiable optimization. This technique facilitates both learning with physical inductive bias and optimization with learned regularization, offering a unified and efficient framework for tackling a variety of low- and mid-level computer vision tasks. His work, which sits at the intersection of computer vision, computer graphics, deep learning, and robotics, has been published in venues including CVPR, SIGGRAPH, ICLR, and ICRA.


© Concordia University