State Estimation with YOLO

In this project, I built an MOT pipeline using YOLO models, a powerful Extended Kalman Filter (EKF), and an Intersection Over Union (IoU) association approach to track objects using public driving data. I deployed the algorithm in the Robot Operating System (ROS) middleware, leveraging its extensive ecosystem of packages and tools as an off-the-shelf C++ solution.

YOLO MOT Comparison Video

Check out the video above to see the working demonstration!

Background & Motivation

Where Kalman Filters Shine

YOLO MOT

In Multi-Object Tracking (MOT), we often deal with data from multiple sources, such as camera feeds or LiDAR scans shown above, where each object needs a unique identifier to be tracked across frames. However, using raw sensor data directly for tracking has several limitations:

To address these challenges, we introduce the Kalman Filter. This filter maintains a state (e.g., location, speed) for each tracked object and continuously updates this state as new detections are received. Here’s how the Kalman Filter enhances MOT:

MOT in Late-Stage Track Fusion

Late fusion, in the context of Multi-Object Tracking (MOT), is a technique for combining information from various tracks after each sensor has independently processed its data.

Late Fusion

Above is an example from my previous work. I used YOLO to detect and classify objects in video feeds and fused these with 3D tracks from LiDAR. By combining the tracks of each sensor we get a more refined understanding of the objects in the scene to allow for more complex decision making.

An advantage of late fusion is that it enables flexibility and adaptability, allowing each detection system to operate optimally in specific conditions.

Multi-Object Tracking Pipeline

Here’s a breakdown of the key steps in my MOT pipeline:

Tuning

A big advantage of integrating this system in ROS is that it allows developers to tune parameters on the fly using playback or live data. The picture above highlights the crucial configurations that were needed to tune the MOT system. I explain these in more detail in the relevant section of my project Github page.

Challenges, Alternatives, and Final Thoughts

Building a Multi-Object Tracking (MOT) pipeline with the Extended Kalman Filter (EKF) and YOLO was both rewarding and challenging. While the Kalman Filter provides robust tracking capabilities, it has limitations, particularly during extended occlusions or situations where objects exhibit irregular motion. These can lead to issues like ID swaps, where object identities are lost or confused during tracking.

Alternatives to classical methods like EKF include approaches such as SORT (Simple Online and Realtime Tracking), which leverages the Hungarian Algorithm for better association, reducing the likelihood of ID swaps. For a more resilient solution, DeepSORT incorporates feature embeddings from a Recurrent Neural Network (RNN), significantly enhancing tracking accuracy, especially in dense and complex scenes. You can learn more about DeepSORT here.

Implementing and refining these techniques gave me a deep appreciation for the challenges of real-world tracking and the powerful open-source tools that support innovation in this field. I hope my project can inspire others!