Learning about Kalman filters
This is my weekend project about Kalman filters, which I read about while looking into SLAM. SLAM is a family of algorithms used to figure out where a robot and the things around it are. You can do SLAM with Kalman filters, though in busy realtime environments you might prefer FastSLAM 2 or some other more scalable method. I chose to focus on Kalman filters as a starting point since they show up as components of those more advanced SLAM methods, but also in classic applications like radar, underscoring their ubiquity and generality. That's a good reason to invest some time in building an intuition for the math behind them, and a good excuse to make some fun visual demos/animations.
Some disclaimers:

- My background is in compilers and CPU architecture. If I come across as having an outsiderish point of view on this topic, it's because I do! I may use odd terminology or just plain get things wrong. Let me know about any bugs you spot.
- These notes are based on my Googling and Wikipedia diving: I'm not working through a specific paper or textbook.
Up next:

- Visual demo environment
- Kalman filter with nonlinear estimation (radar demo)
- Kalman filters and SLAM
- Wrap up
Visual demo environment
Throughout this post I've included simple 2D animations to illustrate key ideas. In the demos, there's a "mouse" agent that can move and rotate on a bounded `<canvas>`. The point of the agent is to give us a moving target whose state we can try to estimate.
Kalman filter with nonlinear estimation (radar demo)
State estimation
The Kalman filter is a loop that can quickly crush uncertainty about the state of a changing system, at least in theory. In order to use it, you need to write down two prediction functions:
- State prediction function: $\hat{x}_k = f(\hat{x}_{k-1}, u_{k-1})$. Here $x$ is your state vector (e.g., position of an object), $u$ is a control vector that your estimator knows about (say, velocity), and $f$ maps a previous state and control input to the next state. This function lets us ask: given some previous state and control input, $(\hat{x}_{k-1}, u_{k-1})$, what would I predict the next state to be? We don't try to predict $u$ though, so you can't stash things in there that you'd need to measure.
- Measurement prediction function: $\hat{z}_k = h(\hat{x}_k)$. The way to read this is: I've already predicted the state ($\hat{x}_k$) and I'm about to measure it for real, so what should my measurement be if $\hat{x}_k$ matched reality exactly? The slickest feature of $h$ is that the vector space for measurements, $\mathbb{R}^m$, isn't required to be the same as the state vector space, $\mathbb{R}^n$. That means it's OK to make measurements that don't live in state space: all you need is a way to relate the two. Later that'll let us define a change-of-basis matrix to "pull" measurements into state space, which is very handy for updating our knowledge of state uncertainty (covariances).
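To make the shapes concrete, here's a minimal sketch of the two predictors in Python for a hypothetical 1D tracking problem. The time step, units, and sensor are all made up for illustration; the point is just that $f$ advances state while $h$ maps state into a different measurement space.

```python
import numpy as np

def f(x_prev, u):
    """State prediction: advance position by the commanded velocity.

    x_prev: state vector [position]; u: control vector [velocity].
    """
    dt = 1.0  # fixed time step (an assumption for this sketch)
    return x_prev + u * dt

def h(x_pred):
    """Measurement prediction: our hypothetical sensor reports position
    in different units, so measurement space differs from state space."""
    meters_to_feet = 3.28084
    return x_pred * meters_to_feet
```

Even in this tiny example, the state lives in meters while the measurement lives in feet: relating the two spaces is $h$'s whole job.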
The advantages of having these specific predictors are:
- If measurements are temporarily unavailable or excessively noisy, $f$ lets you update state based on first principles (e.g., physics) instead of freezing state in place.
- If $f$ is temporarily inaccurate and we have relatively higher certainty in empirical measurements, the machinery around $h$ lets us prioritize those, and vice versa. It's like a self-balancing see-saw.
Kalman filters are theoretically optimal at reducing uncertainty when prediction functions are linear. I'm interested in nonlinear motion, so we'll look at an extended Kalman filter that trades away theoretical optimality for a little extra generality.
Aside: A downside of having two possibly-nonlinear prediction functions is that they can be complicated or expensive to compute. Sometimes a simpler alternative like the alpha-beta filter can be a better fit.
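For comparison, here's a minimal alpha-beta filter sketch: fixed, hand-picked gains in place of all the covariance machinery. The gain values below are illustrative, not tuned.

```python
def alpha_beta_step(pos, vel, z, dt=1.0, alpha=0.5, beta=0.1):
    """One alpha-beta filter step: predict with constant velocity,
    then nudge position and velocity toward the measurement z using
    fixed gains instead of a computed Kalman gain."""
    pos_pred = pos + vel * dt       # constant-velocity prediction
    residual = z - pos_pred         # how far off the prediction was
    pos = pos_pred + alpha * residual
    vel = vel + (beta / dt) * residual
    return pos, vel
```

There's no uncertainty tracking here at all, which is exactly why it's cheap: the trust split between prediction and measurement is frozen at design time.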
Writing down the prediction functions
Let's make the Kalman filter concrete by writing down the setup for a simple radar demo. We'll treat our mouse agent like a plane and have it move around in a circle in the sky. Then we'll point a radar gun at the mouse to estimate its position. Due to air turbulence, there's random jitter in both the mouse's true motion and in the radar sensor measurements.
The mouse's state includes position and orientation: $x = (p_x, p_y, \theta)^T$.

For smooth circular motion, we'll fix a constant angular speed ($\omega$) and linear speed ($v$), giving the state prediction

$$f(x) = \begin{pmatrix} p_x + v\,\Delta t \cos\theta \\ p_y + v\,\Delta t \sin\theta \\ \theta + \omega\,\Delta t \end{pmatrix}$$

When making a sensor reading, we'll get an angle and distance from the stationary radar gun, so given the true position the predicted measurement is

$$h(x) = \begin{pmatrix} \operatorname{atan2}(p_y - r_y,\ p_x - r_x) \\ \sqrt{(p_x - r_x)^2 + (p_y - r_y)^2} \end{pmatrix}$$

where $(r_x, r_y)$ is the radar gun's position.
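As a sketch, the motion and measurement models for a demo like this might look as follows in Python, assuming a state of (position, orientation), made-up values for the speed, angular speed, and time step, and a radar gun pinned at the origin:

```python
import numpy as np

# Assumed parameters for this sketch (not from a real demo config):
V, OMEGA, DT = 40.0, 0.5, 0.1      # linear speed, angular speed, time step
RADAR = np.array([0.0, 0.0])       # stationary radar gun position

def f(x):
    """Motion model: move forward at speed V while turning at OMEGA."""
    px, py, theta = x
    return np.array([px + V * DT * np.cos(theta),
                     py + V * DT * np.sin(theta),
                     theta + OMEGA * DT])

def h(x):
    """Measurement model: bearing and range from the radar gun."""
    dx, dy = x[0] - RADAR[0], x[1] - RADAR[1]
    return np.array([np.arctan2(dy, dx), np.hypot(dx, dy)])
```

Note that the state is 3-dimensional while the measurement is 2-dimensional: the two spaces only have to be relatable, not identical.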
Wiring it all up, the system looks like this. Try cranking the motion and radar noise up/down to see how the system reacts:
Expressing and updating uncertainty
In this section I'm aiming to develop intuition for both how and why the Kalman filter evolves state estimates and its own internal uncertainty. I value this because I like knowing where I might hit limitations or encounter tradeoffs. To that end I'm going to focus on the details that matter the most for a clean conceptual understanding, including the most interesting (to me) derivations, while avoiding exhaustively deriving everything.
To express our uncertainty about the state of the system, we need to make a distinction between the "true" state of the system, $x_k$, and our estimate of it, $\hat{x}_k$. The true state evolves with some process noise $w_k$ that we can't observe: $x_k = f(x_{k-1}, u_{k-1}) + w_k$.

But naturally, we don't add any noise to our predictions: $\hat{x}_k = f(\hat{x}_{k-1}, u_{k-1})$.
The state uncertainty $P_k$ is the covariance of the estimation error $e_k = x_k - \hat{x}_k$, i.e. $P_k = \mathbb{E}[e_k e_k^T]$.
We don't have a way to actually know the true state $x_k$, so instead we track how the estimation error evolves, linearizing $f$ to keep that tractable.
Technically, the Taylor series expansion of $f$ around $\hat{x}_{k-1}$ has infinitely many terms, but we keep only the first-order one: $f(x_{k-1}) \approx f(\hat{x}_{k-1}) + F_{k-1}(x_{k-1} - \hat{x}_{k-1})$, where $F_{k-1}$ is the Jacobian of $f$ evaluated at $\hat{x}_{k-1}$.
Thanks to the first-order approximation, there are no awkward power-of-$(x_{k-1} - \hat{x}_{k-1})$ terms to deal with: the estimation error propagates linearly, $e_k \approx F_{k-1} e_{k-1} + w_k$.
Now we make the reasonable assumption that the process noise $w_k$ is zero-mean, uncorrelated with the estimation error, and has covariance $Q$. Expanding $\mathbb{E}[e_k e_k^T]$ then gives the a priori covariance: $P_k^- = F_{k-1} P_{k-1} F_{k-1}^T + Q$.
At this point we've fully covered a priori state prediction for time step $k$: the state estimate $\hat{x}_k^- = f(\hat{x}_{k-1}, u_{k-1})$ and its uncertainty $P_k^- = F_{k-1} P_{k-1} F_{k-1}^T + Q$.
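The a priori step boils down to a couple of lines of linear algebra. A sketch, with the Jacobian and process noise passed in as plain matrices:

```python
import numpy as np

def ekf_predict(x_est, P, f, F, Q):
    """A priori prediction: push the state through f and the
    covariance through the linearized dynamics.

    F: Jacobian of f evaluated at x_est; Q: process noise covariance.
    """
    x_pred = f(x_est)
    P_pred = F @ P @ F.T + Q   # uncertainty grows by Q every step
    return x_pred, P_pred
```

Note that $Q$ makes the covariance grow every prediction step; only measurements (below) can shrink it again.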
If we take a measurement, $z_k$, we can compare it against the predicted measurement: the residual $y_k = z_k - h(\hat{x}_k^-)$ tells us how far off the prediction was, in measurement space.
If we add the residual (mapped back into state space) to the state estimate, we pull the estimate toward the measurement; the question is how far to pull. Ideally, we'd have a Kalman scaling factor for the residual that weights it by how much we trust the measurement relative to the prediction: $\hat{x}_k = \hat{x}_k^- + K_k y_k$.
To define the real Kalman scaling factor, we need a way to express measurement uncertainty: call the covariance of the sensor noise $R$. The optimal gain works out to $K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R)^{-1}$, where $H_k$ is the Jacobian of $h$. Squint and it reads like a fraction: state uncertainty over total (state plus measurement) uncertainty, expressed in measurement space.
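The measurement update, written out as a sketch ($h$, its Jacobian $H$, and the noise covariance $R$ are stand-ins for whatever your sensor model provides):

```python
import numpy as np

def ekf_update(x_pred, P_pred, z, h, H, R):
    """One measurement update: residual, Kalman gain, corrected state.

    x_pred, P_pred: a priori state and covariance.
    z: actual measurement; h: measurement prediction function;
    H: Jacobian of h at x_pred; R: measurement noise covariance.
    """
    y = z - h(x_pred)                      # residual
    S = H @ P_pred @ H.T + R               # total uncertainty (denominator)
    K = P_pred @ H.T @ np.linalg.inv(S)    # the "fraction of uncertainties"
    x_post = x_pred + K @ y                # slide estimate toward z
    P_post = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_post, P_post
```

In the 1D identity case with equal state and sensor uncertainty, $K$ comes out to exactly 0.5: the filter splits the difference between prediction and measurement.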
At this point I'm explicitly prioritizing an intuitive picture over a rigorous
formal derivation. It's possible to get the same result by writing down a
function to estimate the minimum mean-squared error for a linear Kalman filter,
taking the gradient, and solving for the critical point. For me, it was
more helpful to think about $K_k$ as exactly that fraction of uncertainties.
The fraction metaphor highlights an edge case: what happens if you divide by zero, i.e. when the innovation covariance $S_k = H_k P_k^- H_k^T + R$ isn't invertible? In practice a positive definite $R$ keeps $S_k$ invertible; if the noise model ever degenerates to zero along some direction, the gain becomes undefined and the filter falls over.
Measurement changes the state estimate
I think of the state update $\hat{x}_k = \hat{x}_k^- + K_k y_k$ as sliding the estimate some fraction of the way toward the measurement, with $K_k$ deciding how big that fraction is along each direction of state space.
Another way to think about the posterior uncertainty update is that the new covariance propagates both the remaining state uncertainty and the noise injected by the gain: $P_k = (I - K_k H_k) P_k^- (I - K_k H_k)^T + K_k R K_k^T$. This is called the "Joseph form", which is used sometimes if you have a custom Kalman scaling factor (the shorter $P_k = (I - K_k H_k) P_k^-$ is only valid for the optimal gain), or if you want the numerical safety of a form that keeps $P_k$ symmetric and positive semidefinite.
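Here's a sketch contrasting the Joseph form with the short form. For the optimal gain they agree; with any other gain, only the Joseph form still produces a valid covariance:

```python
import numpy as np

def joseph_update(P_pred, K, H, R):
    """Posterior covariance via the Joseph form, which stays
    symmetric positive semidefinite for any gain K."""
    I_KH = np.eye(P_pred.shape[0]) - K @ H
    return I_KH @ P_pred @ I_KH.T + K @ R @ K.T

def simple_update(P_pred, K, H):
    """Posterior covariance via the short form; only valid when
    K is the optimal Kalman gain."""
    return (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
```

The Joseph form costs a few extra matrix multiplies, which is the usual trade for its robustness.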
Radar demo
We've covered how to set up and update a Kalman filter. To make a demo we need the Jacobians for $f$ and $h$:

$$F_k = \frac{\partial f}{\partial x} = \begin{pmatrix} 1 & 0 & -v\,\Delta t \sin\theta \\ 0 & 1 & v\,\Delta t \cos\theta \\ 0 & 0 & 1 \end{pmatrix}, \qquad H_k = \frac{\partial h}{\partial x} = \begin{pmatrix} -\Delta y / d^2 & \Delta x / d^2 & 0 \\ \Delta x / d & \Delta y / d & 0 \end{pmatrix}$$

Here $v$ is the mouse's linear speed, $\Delta t$ the time step, $\Delta x$ and $\Delta y$ the offsets from the radar gun to the mouse, and $d = \sqrt{\Delta x^2 + \Delta y^2}$ the distance between them.
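In code, assuming the same (position, orientation) state layout and made-up speed and time-step values as the earlier sketches, the two Jacobians might look like:

```python
import numpy as np

def F_jacobian(x, v=40.0, dt=0.1):
    """Jacobian of the circular-motion model w.r.t. state (px, py, theta)."""
    theta = x[2]
    return np.array([[1.0, 0.0, -v * dt * np.sin(theta)],
                     [0.0, 1.0,  v * dt * np.cos(theta)],
                     [0.0, 0.0,  1.0]])

def H_jacobian(x, radar=(0.0, 0.0)):
    """Jacobian of the (bearing, range) measurement w.r.t. state."""
    dx, dy = x[0] - radar[0], x[1] - radar[1]
    d2 = dx * dx + dy * dy     # squared distance to the radar gun
    d = np.sqrt(d2)
    return np.array([[-dy / d2, dx / d2, 0.0],
                     [ dx / d,  dy / d,  0.0]])
```

The zero third column of `H_jacobian` reflects that the radar can't observe orientation at all; only the motion model constrains $\theta$.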
We also need to define the initial system state. Let's position the radar gun at a fixed spot on the canvas, start the mouse somewhere on its circular path, and seed the filter with a deliberately large initial covariance $P_0$ so the first few measurements do most of the work.
A cool feature here is that once the estimated uncertainty reaches a low level, you can ratchet up the radar noise really high and uncertainty tends to remain fixed. This is the see-saw in action, up-weighting the state prediction when the measurement becomes unreliable.
Kalman filters and SLAM
A robot can maintain Kalman filters for all the objects in its environment and itself. One way I can see this working is:

- Combine IMU and less-frequent GPS readings to establish the robot's position and orientation.
- Use camera, radar, and/or lidar sensors to observe the environment. Use mesh-building and segmentation to identify objects.
- Register a Kalman filter for each object in the environment.
This hand-waves away a fair bit of complexity related to sensor fusion and object registration. There are a few issues with maintaining a Kalman filter per object that I want to unpack first.
When we measure an object, we might get a reading whose error is correlated with our uncertainty about the robot's own pose; the robot's estimate and every object's estimate end up entangled, so their covariances have to be maintained jointly rather than independently.
While it seems plausible that you could find a neat, closed-form way to make all these updates, it also seems fairly computationally expensive. Quoting from this 2006 survey:
> The observation update step requires that all landmarks and the joint covariance matrix be updated every time an observation is made. Naively, this means computation grows quadratically with the number of landmarks.
In addition to the computational cost, you're restricted to a unimodal Gaussian estimate of the robot's state. In some contexts that's perfectly fine. But we might not always want that. There's a famous "kidnapped robot problem" where a Roomba gets dropped into a different room that looks identical to the one it was in. In that case, we want a probability spike for the Roomba being in either one of the two rooms, instead of a diffuse blob that could mean it's anywhere.
Particle filters are mentioned as one of the tools used to address these problems. I'd be interested in covering that next.
Here are some more open-ended problems I'd like to pull the thread on:
- Say you're driving behind Car A, and the two of you approach an intersection. You have the right of way, but you see another car perpendicular to your path, Car B, accelerate into the intersection. Car A momentarily accelerates as well and occludes your line of sight to Car B. If you maintain a Kalman filter for Car B, how do you update it while your line of sight is occluded? Do you expect Car B to yield and slow down, or is its driver likely to swerve 90°, parallel to your path, at speed? The answer affects whether you maintain your current speed or decelerate to reduce the risk of catastrophic collision.

  Within the Kalman filter framework, you might say you make a policy choice about how to handle this situation by encoding it into your state prediction model $f$. But how? Maybe you hardcode a detector for this scenario and plan for the worst case, or split the EKF into two to model the different outcomes probabilistically and follow the fork whose predictions best match subsequent measurements, or implement $f$ using a neural net.

- Edge cases pop up out of the woodwork when doing path planning. Classically, you can plan paths using A* search. Layering on real-world obstacle avoidance makes this significantly more involved than the basic A* implementation I wrote in school. For one, your cost function has to model the fact that you can't rotate the ideal amount at some positions because the rotation would induce collision. And if you get close to bumping into an object, you need to make a Kalman filter update that very quickly drives down uncertainty about that object's position. It can feel like we're hardcoding a lot of details. Is there an alternative?

- Since we mentioned algorithmic approaches to reducing computational cost, I wonder how much mileage we can get out of systems engineering optimizations. Chiefly, throwing a big GPU at the problem. We already saw that Kalman filter updates are fairly heavy on linear algebra. Are there enough divergent branches and data dependencies to make GPU offload impractical?
Wrap up
The big idea about Kalman filters is that they fuse predictions and measurements with quick (in some cases optimal) reduction in uncertainty. Their structure provides resiliency to noisy measurements and bogus state predictions, letting us describe an evolving system coherently.
Having written and debugged an extended Kalman filter, this feels like a good stopping point. We took the time to build deep intuition for how and why it works, and we saw where things can go wrong:
- Excessive measurement noise
- Linear approximation error (possibly compounding)
- Sudden loss of invertibility
- Unimodal assumption breaking down
This gives a decent feel for when Kalman filters are (or aren't) a good fit for a problem. That's a good foundation for chasing more advanced SLAM methods that build on the same ideas around managing uncertainty in tractable ways.