Where the hell am I, and where is stuff around me?
Core Idea: This lecture focuses on fusing information from various sensors (like GPS, IMU, odometry, and 2D/3D marker detectors) to estimate a 2D robot’s position and orientation (and potentially a map of landmarks) using a probabilistic framework. The key is to represent sensor data and motion as probability distributions and then find the robot’s pose (and map) that maximizes the posterior probability – the MAP estimate. The lecture also covers the basics of representing motion in SE(2) and introduces iterative closest point (ICP) algorithms.
1. Sensors for Localization
1.1. Sensors for SLAM (Proprioceptive)
These sensors provide information about the robot’s internal state and movements:
- Motor encoders: Measure wheel/joint position and velocity.
- Accelerometer: Measures linear acceleration.
- Gyroscope: Measures angular velocity.
- Magnetometer: Measures the angle relative to magnetic north.
- IMU (Inertial Measurement Unit): Combines accelerometer, gyroscope, and sometimes magnetometer (9DOF - 9 Degrees of Freedom).
1.2. Sensors for Localization (Exteroceptive)
These sensors provide information about the environment:
- Camera (RGB): Captures images (spectral responses projected onto an image plane).
- Stereo Camera: Two cameras for depth perception.
- RGBD Camera: Captures both color (RGB) and depth (D) information (e.g., Kinect, RealSense).
- Lidar: Uses laser pulses to measure distances to surrounding objects.
- Sonar: Uses sound waves to measure distances.
- Radar: Uses radio waves to detect objects and measure their distance and velocity.
- Satellite Navigation (GPS/GNSS): Provides absolute position estimates.
- UWB (Ultra Wideband Radio): Uses radio signals for precise localization, often with fixed beacons (also referred to as “2D markers” in the lecture).
2. Localization Problem Definition
The localization problem is formally defined as follows:
- States:
x_0, x_1, ..., x_t ∈ R^n(typically 2D robot poses in this lecture, son=2for position only, orn=3if orientation is included. Later, we’ll extend this to SE(2)). These are the unknowns we want to estimate. - Actions:
u_1, ..., u_t ∈ R^m. These are control inputs or commands given to the robot (e.g., desired velocity, steering angle). They are generated by an external source. - Measurements:
z_1, ..., z_t ∈ R^k. Sensor readings from various sensors. - Unknown The position of the robot and/or location of the landmarks.
- Goal: Find the MAP estimate of the states given the measurements and actions.
MAP Estimate:
The MAP estimate is the state x* that maximizes the posterior probability:
x* = arg max_x p(x | z, u)
where:
xrepresents the sequence of statesx_0, ..., x_t.zrepresents the sequence of measurementsz_1, ..., z_t.urepresents the sequence of actionsu_1, ..., u_t.
Using Bayes’ theorem, and due to conditional independence, this can be rewritten (up to a proportionality constant) as:
x* = arg max_{x_0...x_t} p(x_0, ..., x_t | z_1, ..., z_t, u_1, ..., u_t)
∝ arg max_{x_0...x_t} p(x_0, ..., x_t, z_1, ..., z_t, u_1, ..., u_t)
3. Localization with GPS and Prior Pose (Simple Example)
This section introduces a simplified example:
- Estimating robot position (
x_1) given a GPS measurement (z_1^{GPS}) and a prior on the initial position (x_0).
x_1* = arg max_{x_1} p(x_1 | z_1^{GPS}, x_0)
- Using Bayes’ theorem and assuming Gaussian prior and likelihood:
x_1* = arg max_{x_1} N(z_1^{GPS}; x_1, Σ_{GPS}) * N(x_1; x_0, Σ_0)
Where:
N(μ, Σ): Gaussian distribution (mean μ, covariance Σ).Σ_{GPS}: GPS measurement covariance.-
Σ_0: prior covariance. - Further assuming
Σ_{GPS} = Σ_0 = [[1, 0], [0, 1]](equal and isotropic uncertainty), the optimization becomes a least-squares problem:
x_1* = arg min_{x_1} ||x_1 - z_1^{GPS}||^2 + ||x_1 - x_0||^2
Factor Graph Representation: Two unary factors:
x_1toz_1^{GPS}(GPS likelihood).x_1tox_0(prior).
||x_1 - x_0||^2
|
x_1
|
||x_1 - z_1^{GPS}||^2
Mechanical Analogy (Springs): The core intuition.
||x_1 - z_1^{GPS}||^2: A spring between robot position (x_1) and GPS reading (z_1^{GPS}). Stiffness is inversely proportional toΣ_{GPS}.||x_1 - x_0||^2: A spring between robot position (x_1) and the prior (x_0). Stiffness is inversely proportional toΣ_0.
The MAP estimate is the equilibrium point where the spring system’s potential energy is minimized. The conserved energy is:
\[E = \frac{1}{2} \cdot k \cdot ||x_t-x_0||^2 + \frac{1}{2} \cdot k \cdot ||x_t - z_t^{GPS}||^2\]Where k represents the spring constant. The optimization finds the x_t that minimizes E.
4. Localization with Multiple GPS Measurements
Extending the example, multiple GPS measurements (each with its own covariance) are like multiple springs, each with different stiffness and potentially orientation, all pulling on the robot’s position.
5. Localization with GPS and IMU (Odometry)
Adding an IMU/odometry measurement introduces a binary factor in the factor graph:
||x_2 - x_1 - z_{12}^{odom}||^2_{Σ_{12}^{odom}}
/ \
x_1 x_2
| |
||h^{GPS}(x_1) - z_1^{GPS}||^2_{Σ_{1}^{GPS}} ||h^{GPS}(x_2) - z_2^{GPS}||^2_{Σ_{2}^{GPS}}
Where:
z_{12}^{odom}: Odometry measurement (relative motion) betweenx_1andx_2.Σ_{12}^{odom}: Covariance of the odometry measurement.h^{GPS}(x_t): Measurement function mapping the robot’s state to the expected GPS measurement. In the simplest case, this is just the identity function (i.e.,h^{GPS}(x_t) = x_t).
Mechanical Analogy: The odometry acts as a spring connecting consecutive robot poses.
6. Motion Model
Incorporates the robot’s actions (u_t).
State Transition Probability: p(x_t | x_{t-1}, u_t, x_{t-2}, ..., x_0, u_{t-1}, ..., u_1, z_{t-1}, ..., z_1)
Markov Assumption: Simplifies to p(x_t | x_{t-1}, u_t). The current state depends only on the previous state and current action.
Motion Model Equation: x_t = g(x_{t-1}, u_t) + ε_{noise}
g(x_{t-1}, u_t): Predicts the next state. Can be nonlinear.ε_{noise}: Gaussian noise.
Transition Probability (Gaussian):
p(x_t | x_{t-1}, u_t) = N(x_t; g(x_{t-1}, u_t), Σ_g)
Example (Linear): p(x_t | x_{t-1}, u_t) = N(x_t; x_{t-1} + u_t, Σ_g)
7. Combining GPS, IMU, and Motion Model
The factor graph now includes:
- Unary factors: GPS measurements.
- Binary factors: Odometry measurements.
- Binary factors: Motion model.
The overall MAP estimation becomes a larger least-squares problem:
\[x_{1}^{*},x_{2}^{*} = argmin_{x_{1},x_{2}} ||x_1 - z_1^{GPS}||^{2}_{Σ^{GPS}_{1}} + ||x_2 - z_2^{GPS}||^{2}_{Σ^{GPS}_{2}} + ||x_2 + z_{12}^{odom} - x_2||^{2}_{Σ^{odom}_{12}} + ||g(x_1,u_2) - x_2||^{2}_{Σ^{g}_{t}}\]Mechanical Analogy: A network of springs connecting poses to measurements and to each other via odometry and the motion model.
8. Factor Graphs - A Powerful Tool
- Sparse Model: The Markov and conditional independence assumptions result in a sparse model, which is computationally beneficial.
- Visualization: Factor graphs help visualize the probabilistic dependencies.
- Optimization: The MAP problem becomes a (non-linear) least squares optimization.
9. Examples and Practical Considerations
- Motion Sickness: Caused by inconsistencies between sensory inputs (visual vs. IMU).
- Optimizer: The lecture used
pinv(A)*bin MATLAB (linear least-squares). - Covariance: Lower covariance = stiffer spring = more influence on the estimate.
- Inconsistent Measurements: The lecture demonstrates the behavior of the system with conflicting measurements, highlighting the importance of robust norms.
- Mechanical Machine Analogy: The system can be viewed as a mechanical system reaching equilibrium.
- What does human do when his residuals are high? - Get sick, vomit.
10. Detailed Derivation of 2D Absolute Orientation (with Known Correspondences)
This section derives the closed-form solution for finding the optimal rotation and translation between two sets of 2D points, given known correspondences.
Given two point clouds, p (before motion) and q (after motion), with known correspondences (i.e., we know which point in p corresponds to which point in q):
-
Center the Point Clouds: Subtract the centroid (mean) from each point in both point clouds:
\[p_i' = p_i - \tilde{p}, \quad \text{where } \tilde{p} = \frac{1}{N} \sum_i p_i\] \[q_i' = q_i - \tilde{q}, \quad \text{where } \tilde{q} = \frac{1}{N} \sum_i q_i\] -
Optimal Rotation: Find the rotation angle
\[\theta^{\star}=\arg \min _\theta \sum_i\left\|\mathbf{R}_\theta \mathbf{p}_i^{\prime}-\mathbf{q}_i^{\prime}\right\|^2\]θ*that minimizes the sum of squared distances between the rotated pointsp_i'and the corresponding pointsq_i':where
\[\mathbf{R}_\theta = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}\]R_θis the 2D rotation matrix:The derivation proceeds by expanding the squared norm, taking the derivative with respect to
θ, setting it to zero, and simplifying using trigonometric identities. The final result is:\(\theta^{\star}=\arctan \left(\frac{\sum_{\mathbf{p}^{\prime}, \mathbf{q}^{\prime}} p_x^{\prime} q_y^{\prime}-p_y^{\prime} q_x^{\prime}}{\sum_{\mathbf{p}^{\prime}, \mathbf{q}^{\prime}} p_x^{\prime} q_x^{\prime}+p_y^{\prime} q_y^{\prime}}\right)=\arctan \left(\frac{H_{x y}-H_{y x}}{H_{x x}+H_{y y}}\right)\) Where covariance matrix \(H\) is defined as: \(H = \Sigma_i p'_i {q'}_{i}^{T} \\ H_{xx} = p'_x {q'}_{x}^{T} \\ H_{xy} = p'_x {q'}_{y}^{T} \\ H_{yx} = p'_y {q'}_{x}^{T} \\ H_{yy} = p'_y {q'}_{y}^{T} \\\)
-
Optimal Translation: Once the optimal rotation
\[\mathbf{t}^{\star}=\arg \min _{\mathbf{t}}\left\|\mathbf{R}_{\theta \star} \tilde{\mathbf{p}}+\mathbf{t}-\tilde{\mathbf{q}}\right\|^2=\tilde{\mathbf{q}}-\mathbf{R}_{\theta \star} \tilde{\mathbf{p}}\]θ*is found, the optimal translationt*is computed as:
11. Absolute Orientation in SE(3)
The problem is extended to 3D. The solution involves using Singular Value Decomposition (SVD) of the covariance matrix.
\[\mathbf{z}^v=\arg \min _{\mathbf{t}, \mathbf{R}} \sum_i\|\mathbf{R p}+\mathbf{t}-\mathbf{q}\|^2=\arg \min _{\mathbf{t}, \mathbf{R}} \sum_i\left\|\mathbf{R} \mathbf{p}_i^{\prime}-\mathbf{q}_i^{\prime}\right\|^2+\|\mathbf{R} \tilde{\mathbf{p}}+\mathbf{t}-\tilde{\mathbf{q}}\|^2\]Solution steps:
-
Calculate the covariance matrix H: \(\mathbf{H}=\sum_i \mathbf{p}_i^{\prime} \mathbf{q}_i^{\prime \top}\).
-
Compute the SVD of H: \(\mathbf{H}=\mathbf{U S V}^{\top}\).
-
Calculate the optimal rotation matrix (using SVD and potentially checking the determinant to ensure it’s a proper rotation): \(\mathbf{R}^{\star}=\mathbf{V} \mathbf{U}^{\top}\).
-
Calculate optimal translation: \(\mathbf{t}^{\star}=\arg \min _{\mathbf{t}}\left\|\mathbf{R}^{\star} \tilde{\mathbf{p}}+\mathbf{t}-\tilde{\mathbf{q}}\right\|^2=\tilde{\mathbf{q}}-\mathbf{R}^{\star} \tilde{\mathbf{p}}\)
12. Iterative Closest Point (ICP) - Relative Motion from Unknown Correspondences
When correspondences are unknown, we use the Iterative Closest Point (ICP) algorithm:
-
Initialize: Start with an initial guess for the rotation
R*and translationt*. -
Nearest Neighbor Search: For each point
\[c(i)^{\star}=\underset{c(i)}{\arg \min } \sum_i\left\|\mathbf{R}^{\star} \mathbf{p}_i+\mathbf{t}^{\star}-\mathbf{q}_{c(i)}\right\|^2\]p_iin the first point cloud, find the closest pointq_{c(i)}in the second point cloud, based on the current estimate ofR*andt*: -
Absolute Orientation: Solve the absolute orientation problem (as described above) using the found correspondences to update
\[\mathbf{R}^{\star}, \mathbf{t}^{\star},=\underset{\mathbf{R} \in S O(3), \mathbf{t}}{\arg \min } \sum_i\left\|\mathbf{R} \mathbf{p}_i+\mathbf{t}-\mathbf{q}_{c(i)^{\star}}\right\|^2\]R*andt*: -
Iterate: Repeat steps 2 and 3 until convergence (or a maximum number of iterations).
Improvements to ICP:
-
Outlier Rejection: Discard correspondences where the distance between the matched points is above a threshold:
if ||R* p_i + t* - q_{c(i)}||^2 >= Θ then reject correspondence -
Robust Norms: Use a robust norm (e.g., Huber norm) instead of the squared Euclidean norm to reduce the influence of outliers:
\[\mathbf{R}^{\star}, \mathbf{t}^{\star},=\underset{\mathbf{R} \in S O(3), \mathbf{t}}{\arg \min } \sum_i \rho \left\|\mathbf{R} \mathbf{p}_i+\mathbf{t}-\mathbf{q}_{c(i)^{\star}}\right\|^2\]
13. Point-to-Plane vs. Point-to-Point ICP
-
Point-to-Point: Simpler, more versatile, faster, works well with sparse data and unstructured environments. Matches individual points.
-
Point-to-Plane: Better for structured environments with planar surfaces, more accurate, better convergence, less sensitive to noise, but requires denser point clouds and more computation. Matches points to the tangent plane at the corresponding point. The point-to-plane distance is calculated using the normal vector of the plane.
14. Differential Drive Motion Model (SE(2))
This section elaborates on the motion model for a differential drive robot, moving in the SE(2) space (2D position and orientation).
- State:
x_t = [x_t, y_t, θ_t](position and orientation). - Control Input:
u_t = [v_t, ω_t](linear velocityv_tand angular velocityω_t).
Continuous-Time Model (Ideal):
The ideal continuous-time model involves integrating the velocities:
\(\begin{bmatrix} x_t \\ y_t \\ \Theta_t \end{bmatrix} = \begin{bmatrix} x_{t-1} + \int_0^{\Delta t} v_t cos(\Theta_{t-1} + \omega_t t) dt \\ y_{t-1} + \int_0^{\Delta t} v_t sin(\Theta_{t-1} + \omega_t t) dt\\ \Theta_{t-1} + \int_0^{\Delta t} \omega_t dt \end{bmatrix}\) Which solves to (for ω_t ≠ 0): \(x_t = x_{t-1} + \frac{v_t}{\omega_t}(sin (\Theta_{t-1}+\omega_t \Delta t) - sin(\Theta_{t-1}))\)
\[y_t = y_{t-1} + \frac{v_t}{\omega_t}(-cos (\Theta_{t-1}+\omega_t \Delta t) + cos(\Theta_{t-1}))\] \[\Theta_t = \Theta_{t-1} + \omega \Delta t\]Discrete-Time Approximations:
Because the integral is often difficult to compute analytically, approximations are used:
-
“Half Turn, Then Move, Then Half Turn” Approximation: This is a common and relatively accurate approximation for small time steps and angular velocities.
\[\begin{bmatrix} x_t \\ y_t \\ \Theta_t \end{bmatrix} = \begin{bmatrix} x_{t-1} + v_t \Delta t \cos (\Theta_{t-1} + \omega_t \frac{\Delta t}{2}) \\ y_{t-1} + v_t \Delta t \sin (\Theta_{t-1} + \omega_t \frac{\Delta t}{2}) \\ \Theta_{t-1} + \omega_t \Delta t \end{bmatrix}\]The robot first rotates by half the total rotation, then moves in a straight line, and then completes the remaining half of the rotation.
-
“Move then Turn” or “Turn then Move”: Simpler, but less accurate approximations. The “Move then Turn” is used in examples in the lecture.
15. Combining Motion Model and Measurements: An Extended Example
The lecture then presents a more complex example using the following:
- Robot Poses:
x_t = [x_t, y_t, θ_t] - Measurements:
- GPS measurements (unary factors).
- Odometry measurements (binary factors).
- Marker measurements (binary factors).
- Motion Model: “Move then Turn” approximation (binary factor).
The factor graph now has multiple nodes representing the robot’s pose at different time steps, and factors connecting them based on the motion model, odometry, and measurements.
Key Point: The covariance matrices associated with each measurement (GPS, odometry, marker detection) influence the “stiffness” of the corresponding springs and how much each measurement affects the final estimate.
16. The Issue of Ambiguity and Fixing a Reference
When the marker positions are unknown, there’s an inherent ambiguity in the problem. The solution forms a subspace – there are multiple equally valid solutions. This ambiguity can be resolved by:
- Fixing one point: Setting the position of one marker (or the initial robot pose) to a known value.
- Adding a prior: Adding a prior probability distribution on the initial robot pose.
17. Factor Graph and Optimization
- Factor Graph: The complete factor graph now includes factors for:
- GPS Measurements
- Odometry Measurements
- Motion Model
- Marker Measurements
- Optimization: The MAP estimation involves minimizing the sum of squared errors, where each error term corresponds to a factor in the factor graph. This is a nonlinear least-squares problem because the motion model and marker measurements involve trigonometric functions (sine and cosine) of the rotation angle.
18. Understanding the Coordinate Frames
- World Coordinate Frame (WCF): A global, fixed coordinate frame.
-
Robot Coordinate Frame (RCF): A coordinate frame attached to the robot. Its origin is typically at the robot’s center, and its orientation changes as the robot rotates.
- Transformations: We need to be able to transform measurements between the RCF and the WCF. The lecture introduces the notation
w2r(m, x_t)to represent a transformation from the world coordinate frame to the robot coordinate frame, andr2w(z,x_t)to represent a transformation from the robot coordinate frame to world frame. These transformations involve both translation and rotation.
Summary of Key Takeaways
- Probabilistic Framework: Localization and SLAM can be effectively formulated as MAP estimation problems within a probabilistic framework.
- Factor Graphs: Factor graphs are a powerful way to represent the structure of the probabilistic model and the dependencies between variables.
- Mechanical Analogy: The spring analogy provides a very helpful intuition for understanding how different measurements and constraints are fused.
- Nonlinear Optimization: The resulting optimization problem is often nonlinear due to rotations and nonlinear motion models, requiring iterative optimization techniques.
- Coordinate Frames: Careful consideration of coordinate frames (robot vs. world) and transformations between them is essential.
- Approximations: The “move then turn” motion model is a simplification, but it’s often sufficient for small time steps.
- Ambiguity: When landmark positions are unknown, there’s an ambiguity that needs to be resolved by fixing a reference point or adding a prior.
- ICP: ICP is a powerful tool for aligning point clouds even without known correspondences.
- Point-to-Point vs Point-to-Plane: The choice depends on the environment and sensor data characteristics.