Figure: Multi-Agent Perception via Communication.
While significant advances have been made in single-agent perception (i.e., computer vision tasks performed from a single viewpoint), it remains vulnerable to occlusion and sensor degradation. This motivates us to develop a multi-agent perception system, in which an agent improves its perception accuracy by communicating and sharing information with other agents.
Bandwidth usage: Each agent must transmit encoded visual data (high-dimensional feature maps) to other agents, and high bandwidth usage can cause network congestion and latency. An efficient multi-agent perception system should therefore keep its bandwidth usage low.
Tradeoff between bandwidth and accuracy: Limiting the transmission bandwidth, however, may reduce the gain in perception accuracy.
Figure: A fully-connected communication network consumes high transmission bandwidth.
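To make the bandwidth cost concrete, here is a back-of-the-envelope estimate; the feature-map size, agent count, and frame rate below are illustrative assumptions, not values from our experiments.

# Rough cost of broadcasting raw feature maps over a fully-connected network.
C, H, W = 512, 32, 32           # assumed feature-map shape (channels x H x W)
bytes_per_map = C * H * W * 4   # float32
n_agents, fps = 6, 10           # assumed team size and perception rate

links = n_agents * (n_agents - 1)             # directed links, fully connected
total_mb_s = links * bytes_per_map * fps / 1e6
print(f"{bytes_per_map / 1e6:.1f} MB per map, {total_mb_s:.0f} MB/s overall")
# -> 2.1 MB per map, 629 MB/s overall

Even modest feature maps add up quickly, which is why the links themselves must be chosen sparingly.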
Learnable 3-way handshake communication:
Request: The degraded agent sends a compact query to the normal agents.
Match: The normal agents compute and return matching scores.
Connect: Based on the matching scores, the degraded agent decides whom to communicate with.
Therefore, under the assumption that the degraded agent(s) are known, our model learns who to communicate with [1]; a minimal sketch follows the figure below.
Figure: Learnable 3-way handshaking communication.
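The sketch below illustrates the handshake in PyTorch; the module names (HandshakeSelector, query_net, key_net) and dimensions are our own illustrative choices, not the released Who2com code.

import torch
import torch.nn as nn

class HandshakeSelector(nn.Module):
    """Request/match/connect selection for one degraded agent."""

    def __init__(self, feat_dim=512, msg_dim=32):
        super().__init__()
        # The query is kept compact so the "request" message is cheap to send.
        self.query_net = nn.Linear(feat_dim, msg_dim)  # degraded agent side
        self.key_net = nn.Linear(feat_dim, msg_dim)    # normal agent side

    def forward(self, degraded_feat, normal_feats):
        # Request: the degraded agent broadcasts a compact query vector.
        q = self.query_net(degraded_feat)              # (msg_dim,)
        # Match: each normal agent scores the query against its own key.
        k = self.key_net(normal_feats)                 # (n_agents, msg_dim)
        scores = torch.softmax(k @ q, dim=0)           # (n_agents,)
        # Connect: only the best-matching agent is asked to transmit
        # its full (high-dimensional) feature map.
        partner = int(torch.argmax(scores))
        return partner, scores

Only the selected partner then sends its high-dimensional feature map, so the expensive payload crosses a single link instead of all of them.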
An ideal communication mechanism, however, should switch transmission on when an agent needs information from other agents to improve its perception, and switch it off when the agent already has sufficient information for its own task. Thus, without assuming which agents are degraded, our model applies self-attention to learn when to communicate [2]; a sketch follows the figure below.
Figure: When2com model.
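Below is a sketch of the when-to-communicate decision, using plain scaled dot-product self-attention as a stand-in for the attention mechanism in [2]; all names are assumptions for illustration.

import torch

def communication_graph(queries, keys):
    # queries, keys: (n_agents, msg_dim) compact messages from every agent.
    # Row i of the returned matrix is agent i's attention over all agents,
    # itself included.
    n, d = queries.shape
    attn = torch.softmax(queries @ keys.T / d ** 0.5, dim=-1)

    # An agent that attends mostly to itself already has sufficient
    # information: switch off its incoming transmissions entirely.
    self_sufficient = attn.argmax(dim=-1) == torch.arange(n)
    attn = attn * ~self_sufficient.unsqueeze(-1)       # cut those rows
    attn = attn + torch.diag(self_sufficient.float())  # keep the self-loop
    return attn

When every agent is self-sufficient, no transmission happens at all, which is exactly the desired switch-off behavior.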
Reducing network complexity: As mentioned above, a fully-connected communication network incurs high bandwidth usage. We therefore prune the less important links according to their edge weights (i.e., matching scores); see the sketch after the figure below.
Figure: Network complexity reduction.
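A sketch of the pruning step, assuming a simple threshold rule on the learned edge weights (prune_thresh is an illustrative hyperparameter, not a value from the papers):

import torch

def prune_links(attn, prune_thresh=0.1):
    # attn: (n_agents, n_agents) edge weights (matching scores).
    # Keep only the links whose weight clears the threshold.
    pruned = attn * (attn >= prune_thresh)
    links = torch.nonzero(pruned)  # (n_links, 2) surviving [receiver, sender] pairs
    return pruned, links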
Figure: Visualization of our collected datasets.
[1] "Who2com: Collaborative Perception via Learnable Handshake Communication", Yen-Cheng Liu, Junjiao Tian, Chih-Yao Ma, Nathaniel Glaser, Chia-Wen Kuo, Zsolt Kira, ICRA 2020
[2] "When2com: Multi-Agent Perception via Communication Graph Grouping", Yen-Cheng Liu, Junjiao Tian*, Nathaniel Glaser*, Zsolt Kira, CVPR 2020 (* equal contribution)