Connecting Covariance and Rotational Inertia

September 2024

The covariance matrix in statistics represents the covariances between a set of random variables. It has a number of interesting properties, such as:

It is symmetric (equal to its transpose).
Elements along its major diagonal are variances. The variance of a random variable is, loosely speaking, the average of a squared quantity.
Its other elements are covariances between two different variables, which are, loosely speaking, an average product of two variables centered on their expectations.

The inertia tensor in classical mechanics represents the “rotational mass”, or the difficulty of rotating an object around an axis. It also has a number of interesting properties, such as:

It is symmetric (equal to its transpose).
Elements along its major diagonal are, loosely speaking, the average of a squared quantity. Namely, squared distances from an axis weighted by mass.
Its other elements are, loosely speaking, an average product of two variables. Namely, the negative product of two coordinates, weighted by mass.

🤔

Is there a connection here? Or is this similar form just a coincidence?

There are many ways to ask this question. Today, I'll put it this way: is the inertia tensor also the covariance matrix of some set of random variables? And if so, what are those random variables?

Definitions

Let's start by defining the covariance matrix between $n$ random variables, which we will label $Y_1,\dots,Y_n$. The elements of the covariance matrix are given by

$$\mathrm{Cov}_{i,j} = \mathrm{E}\left[(Y_i-\mathrm{E}[Y_i])(Y_j-\mathrm{E}[Y_j])\right] = \mathrm{Cov}[Y_i,Y_j],$$

where $\mathrm{E}$ represents the expected value of a random variable and $\mathrm{Cov}$ represents the covariance of two random variables. In the special case along the major diagonal, this definition becomes $$\mathrm{Cov}_{i,i} = \mathrm{E}\left[(Y_i-\mathrm{E}[Y_i])^2\right] = \mathrm{Var}[Y_i],$$ where $\mathrm{Var}$ represents the variance of a random variable.
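As a minimal sketch of this definition (assuming NumPy, with made-up sample data standing in for the random variables $Y_1,Y_2,Y_3$), the covariance matrix can be estimated by averaging products of centered samples:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data: 1000 joint samples of three random variables (one per column)
Y = rng.normal(size=(1000, 3))
Y[:, 1] += 0.5 * Y[:, 0]               # make the second variable depend on the first

# Sample version of E[(Y_i - E[Y_i]) (Y_j - E[Y_j])]
centered = Y - Y.mean(axis=0)
cov = centered.T @ centered / len(Y)

print(np.round(cov, 3))
```

The diagonal of `cov` holds the sample variances, and the matrix agrees with its transpose up to rounding, which are the two properties listed at the start.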

Now let's look at the definition of the inertia tensor $I$ of an object around a given point. Intuitively, the inertia tensor represents a kind of “rotational mass”: how difficult it is to rotate something. For example, $I_{yy}$ (also called $I_{22}$) measures the difficulty of rotating an object about the $y$-axis. The off-diagonal elements have a related meaning: for example $I_{xy}$ (also called $I_{12}$) represents the fact that when you attempt to rotate an object around the $x$-axis, it may acquire angular momentum around the $y$-axis as well, depending on its shape.[] As you might expect, the inertia tensor is proportional to mass: heavier things are harder to rotate.

The elements of the inertia tensor of a continuous solid object are given by $$I_{i,j} = \int_V \delta_{ij}|\mathbf{r}|^2 - x_ix_j \,dm,$$ where

The indices $i,j$ represent the $x,y,z$ coordinates for $i,j=1,2,3$ respectively.
The integral here is over the entire volume $V$ of the object.
$dm = \rho(x,y,z)\,dx\,dy\,dz $ is an infinitesimal mass element depending on the density distribution $\rho$ of the object.
$\mathbf{r}$ is the vector from the point around which we are computing $I$ to the current integration point, so $|\mathbf{r}|^2$ is the squared distance from the center point of rotation to the current point within the object.
$\delta_{ij}$ is the Kronecker delta, indicating that we only include the $|\mathbf{r}|^2$ term for elements along the major diagonal, where $i=j$.

Since we are integrating over the volume of the object, this is actually a triple integral. This is reflected in the definition of $dm$.

We can get rid of the Kronecker delta by expressing the on-diagonal and off-diagonal elements separately. Noting that $|\mathbf{r}|^2=x^2+y^2+z^2=x_1^2+x_2^2+x_3^2$, we find $$\begin{aligned} I_{i,i} &= \int_V x_1^2+x_2^2+x_3^2-x_i^2\,dm \\ I_{i,j} &= \int_V -x_ix_j\,dm \qquad (i\neq j) \end{aligned}$$ We have a bit of a fundamental problem here: the on-diagonal and off-diagonal elements of the inertia tensor are defined differently, which is not the case for the covariance matrix. However, at least in some cases, we'll be able to get around this.
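For intuition, here is a small sketch of the same formula for a collection of point masses, where the integral becomes a sum. The function name `inertia_tensor` and the two-mass example are illustrative assumptions, not part of the derivation above.

```python
import numpy as np

def inertia_tensor(points, masses):
    """Inertia tensor about the origin for point masses (a discrete stand-in for the integral)."""
    points = np.asarray(points, dtype=float)
    masses = np.asarray(masses, dtype=float)
    r2 = np.sum(points**2, axis=1)                     # |r|^2 for each point
    # sum over k of m_k * (|r_k|^2 * delta_ij - x_i * x_j)
    delta_part = np.sum(masses * r2) * np.eye(3)
    product_part = np.einsum("k,ki,kj->ij", masses, points, points)
    return delta_part - product_part

# Hypothetical object: two unit masses on the x-axis at x = +1 and x = -1
pts = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
m = np.array([1.0, 1.0])
I = inertia_tensor(pts, m)
print(I)
```

In this example the masses lie on the $x$-axis, so $I_{11}$ is zero (nothing resists rotation about that axis) while $I_{22}$ and $I_{33}$ are equal.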

Random Sampling by Mass

In order to compare the inertia tensor and the covariance matrix, it will help to reframe the inertia tensor in terms of random variables. Currently, we are integrating over the volume of an object with respect to a mass element $dm = \rho(x_1,x_2,x_3)\,dx_1\,dx_2\,dx_3$. This is strongly reminiscent of the expected value of a random variable.

Let's imagine a random process in which we randomly sample a point from an object weighted by density, so denser regions are more likely to get picked. For uniform density, this becomes uniformly sampling from an object. Then our probability density function is a literal density function $\rho$ divided by the total mass $M$ of the object! After sampling a point, we get three random variables for the coordinates $X_1,X_2,X_3$ of that point.

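Now, we can compute expected values in terms of these random variables. Consider, for example, the variance of $X_2+X_3$. Provided $X_2$ and $X_3$ are uncorrelated (a condition we'll return to when we look at the off-diagonal elements), this is equal to the variance of $X_2$ plus the variance of $X_3$, which we can express in integral form as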

$$\mathrm{Var}[X_2+X_3] = \int_V \left((x_2 - E[X_2])^2 + (x_3-E[X_3])^2\right)\frac{\rho}{M}\,dx_1\,dx_2\,dx_3.$$

To simplify things, let's place the origin of our coordinate system at the center of mass of our object. Then our expected value for each coordinate is zero: $E[X_1]=E[X_2]=E[X_3]=0$. We now have $$\mathrm{Var}[X_2+X_3] = \frac{1}{M}\int_V x_2^2 + x_3^2\,dm,$$ where $dm=\rho\,dx_1\,dx_2\,dx_3$.

Recall that

$$I_{11} = \int_V |\mathbf{r}|^2-x_1^2\,dm = \int_V x_1^2+x_2^2+x_3^2-x_1^2\,dm = \int_V x_2^2+x_3^2\,dm.$$

From the previous two lines, $\mathrm{Var}[X_2+X_3]$ is exactly equal to $I_{11}$ divided by $M$! By the same argument applied to the other two axes, we also have $\frac{1}{M}I_{22}=\mathrm{Var}[X_1+X_3]$ and $\frac{1}{M}I_{33}=\mathrm{Var}[X_1+X_2]$.
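A quick numerical check of this identity, as a sketch under stated assumptions: the object is a uniform solid box (so the coordinates of a sampled point are independent), the side lengths and mass are made-up values, and $I_{11}$ is taken from the standard textbook formula $\frac{M}{12}(b^2+c^2)$ for a box.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 2.0, 3.0, 4.0      # hypothetical side lengths along x1, x2, x3
M = 5.0                      # hypothetical total mass
n = 200_000                  # number of sampled points

# Uniform density: sample uniformly inside a box centered on its center of mass
X = rng.uniform(low=[-a / 2, -b / 2, -c / 2], high=[a / 2, b / 2, c / 2], size=(n, 3))

var_sum = np.var(X[:, 1] + X[:, 2])      # Monte Carlo estimate of Var[X2 + X3]
I11 = M * (b**2 + c**2) / 12             # moment of inertia of a solid box about x1

print(var_sum, I11 / M)
```

The two printed numbers should agree up to sampling noise, which shrinks as `n` grows.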

Since the elements along the major diagonal of the covariance matrix are variances, we might be tempted to say $\mathrm{Cov}(Y_1,Y_2,Y_3) = \frac{1}{M}I$ where $\mathrm{Cov}(Y_1,Y_2,Y_3)$ is the covariance matrix of $Y_1,Y_2,Y_3$. Then if we define $Y_1=X_2+X_3$, $Y_2=X_1+X_3$, and $Y_3=X_1+X_2$, the elements along the major diagonal are correct. Let's call these $Y_1,Y_2,Y_3$ our desired random variables, because they are physically interpretable in terms of our random coordinates $X_1,X_2,X_3$ and they also connect the covariance matrix to the inertia tensor. Physically, these represent all three distinct sums of two coordinates of a randomly sampled 3D point.

There's a problem though: if we define our random variables $Y_1,Y_2,Y_3$ in this way, are the off-diagonal elements of the covariance matrix also correct?

The Off-Diagonal Elements

With our desired random variables, the diagonal terms of the covariance matrix matched the inertia tensor! Now for the off-diagonal elements, we hope to find

$$\mathrm{Cov}(Y_1,Y_2,Y_3)_{i,j} = \frac{1}{M}I_{i,j} \implies \mathrm{E}\left[(Y_i-E[Y_i])(Y_j-E[Y_j])\right] = \frac{1}{M}\int_V -x_ix_j \,dm.$$

Since we chose to place our origin at the center of mass of the object, we have the expectation of all $X_i$ equal to $0$. And since each $Y_i$ is a sum of $X_i$ terms, their expectations will also be zero. Then the above simplifies to $$\mathrm{E}[Y_iY_j] = \frac{1}{M}\int_V -x_ix_j\,dm.$$ Let's convert the integral on the right into an expectation.

$$\frac{1}{M}\int_V -x_ix_j\,dm = \int_V -x_ix_j \cdot \frac{\rho(x_1,x_2,x_3)}{M}\,dx_1\,dx_2\,dx_3 = \mathrm{E}[-X_iX_j].$$

So, with the definitions previously chosen for each $Y_i$, we are forced to conclude $\mathrm{E}[Y_iY_j] = \mathrm{E}[-X_iX_j]$! This is not true in general.
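To see concretely why this fails, take $i=1$, $j=2$ and expand the product using $Y_1=X_2+X_3$ and $Y_2=X_1+X_3$ (a worked step added for illustration):

$$\mathrm{E}[Y_1Y_2] = \mathrm{E}[(X_2+X_3)(X_1+X_3)] = \mathrm{E}[X_1X_2] + \mathrm{E}[X_2X_3] + \mathrm{E}[X_1X_3] + \mathrm{E}[X_3^2].$$

The target value is $\mathrm{E}[-X_1X_2]$, but the expansion carries the extra term $\mathrm{E}[X_3^2]$, which is strictly positive for any object with nonzero extent along $x_3$. So even if every cross term $\mathrm{E}[X_iX_j]$ vanished, a single sampled point would leave $Y_1$ and $Y_2$ correlated through the coordinate they share.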

A cylinder, symmetric across the $xy$, $xz$, and $yz$ planes. Image source: Grendelkhan at the English Wikipedia, CC BY-SA 3.0.

There is a way around this. First, let's force $\mathrm{E}[X_iX_j] = 0$ for $i \neq j$. This isn't an unreasonable assumption. In fact, it holds for shapes that are mirror-symmetric across enough of the coordinate planes.

Consider an object that is symmetric across the $xy$ plane. Then $\mathrm{E}[X_1X_3] = \mathrm{E}[X_2X_3] = 0$ because for every $(x,z)$ pair, there is an equally probable $(x,-z)$ pair; and for every $(y,z)$ pair, there is an equally probable $(y,-z)$ pair. We can make $\mathrm{E}[X_1X_2] = 0$ by adding another coordinate plane of symmetry.

Our next modification will be to sample three independent points rather than just one. Then $Y_1$ is the sum of $y$ and $z$ coordinates of the first point, $Y_2$ is the sum of $x$ and $z$ coordinates for the second point, and so on. This keeps the variances of all $Y_i$ the same while making them all independent, so their covariances become zero.

However, for asymmetric shapes, the assumption of zeros off the diagonals is violated in general. So it looks like our desired random variables won't always work for asymmetric shapes.

So far, we have discovered:

Take an object that is symmetric across at least two of the $xy$, $yz$, and $xz$ planes.
Randomly, independently, and identically sample three points in the object, with sampling weighted by density.
Let $X_{1i},X_{2i},X_{3i}$ be random variables representing the $x$, $y$, and $z$ coordinates of each randomly sampled point number $i$.
Define new random variables $Y_1=X_{21}+X_{31}$, $Y_2=X_{12}+X_{32}$, and $Y_3=X_{13}+X_{23}$. That is, for the $i$th point, let $Y_i$ be the sum of the two coordinates excluding the $i$th.
The covariance matrix of $Y_1,Y_2,Y_3$ will be equal to $\frac{1}{M}I$, where $I$ is the inertia tensor of the object around its center of mass. All off-diagonal elements of $I$ and the covariance matrix will be zero.
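The steps above can be sketched numerically. This is an illustration under assumptions, not a definitive implementation: the object is a uniform solid cylinder with its axis along $x_3$, the radius, height, and mass are made-up values, and the comparison uses the standard textbook moments of inertia for a solid cylinder.

```python
import numpy as np

rng = np.random.default_rng(1)
R, h, M = 1.0, 2.0, 3.0      # hypothetical radius, height, and mass
n = 200_000

def sample_cylinder(n):
    """Uniform points in a solid cylinder centered at the origin, axis along x3."""
    r = R * np.sqrt(rng.uniform(size=n))           # sqrt makes the disk density uniform
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    z = rng.uniform(-h / 2, h / 2, size=n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta), z])

# Three independent points per trial
P1, P2, P3 = sample_cylinder(n), sample_cylinder(n), sample_cylinder(n)

# Y_i is the sum of the two coordinates of point i, excluding the i-th coordinate
Y = np.column_stack([
    P1[:, 1] + P1[:, 2],
    P2[:, 0] + P2[:, 2],
    P3[:, 0] + P3[:, 1],
])
cov_Y = np.cov(Y, rowvar=False)

# Textbook inertia tensor of a solid cylinder about its center of mass, divided by M
I_over_M = np.diag([(3 * R**2 + h**2) / 12, (3 * R**2 + h**2) / 12, R**2 / 2])

print(np.round(cov_Y, 3))
print(np.round(I_over_M, 3))
```

The off-diagonal entries of `cov_Y` hover near zero because each $Y_i$ comes from a different, independently sampled point.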

Asymmetric Objects

These definitions for each $Y_i$ don't work for asymmetric objects. But can we find definitions that do work? That is, given an asymmetric object of mass $M$ and its inertia tensor $I$, can we define random variables $Y_1,Y_2,Y_3$ such that the covariance matrix of $\mathbf{Y}=(Y_1,Y_2,Y_3)$ is equal to $\frac{1}{M}I$?

First of all, the cop-out, direct answer is yes. Given any vector of means and any valid covariance matrix (symmetric and positive semidefinite), you can quite easily sample random variables that have them, for example from a multivariate normal distribution. Many computing packages have this implemented. The inertia tensor is symmetric and positive semidefinite, so given an inertia tensor $I$, we can divide it by $M$ and then use the result as a covariance matrix to sample random variables with any means we like.
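As one possible sketch of this cop-out, NumPy's `Generator.multivariate_normal` can draw samples with a prescribed covariance. The inertia tensor and mass below are made-up numbers, and the choice of a normal distribution with zero means is an assumption made only for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

M = 2.0                                   # hypothetical mass
I = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  5.0, -0.5],
              [ 0.0, -0.5,  3.0]])        # hypothetical inertia tensor

# Draw (Y1, Y2, Y3) jointly with covariance matrix I / M and means of zero
samples = rng.multivariate_normal(mean=np.zeros(3), cov=I / M, size=200_000)
cov_est = np.cov(samples, rowvar=False)

print(np.round(cov_est, 2))
print(I / M)
```

The estimated covariance matches $\frac{1}{M}I$ up to sampling noise, but nothing about these samples refers back to the geometry of the object.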

But this is no fun. What we really want is a physical interpretation: which three interpretable physical random variables have the inertia tensor (up to a scaling factor) as their covariance matrix?

Using Eigenvectors

There's a way we can interpret the inertia tensor of any shape, even asymmetric ones, as a covariance matrix. This is done by choosing a different coordinate system in which the inertia tensor is diagonal: that is, all off-diagonal elements are zero. The good news is that this is always possible, for any shape, even asymmetric ones![]

How do we find such a coordinate system? By taking the eigenvectors of the inertia tensor. In a physical context, the eigenvectors of the inertia tensor are also called the principal axes of the object.

The three principal axes of a tennis racket. Note that the racket is not symmetric across the $\hat{e_2}$-$\hat{e_3}$ plane! Image source: CMG Lee, Wikimedia, CC BY-SA 4.0.

If we take the inertia tensor around the center of mass of the object, using the principal axes as our three coordinates, we will get a tensor $I$ whose off-diagonal elements are zero. Then we can find three independent random variables whose covariance matrix is $\frac{1}{M}I$. We can even use the independent random variables from our symmetric object procedure, because the covariances were already zero and the variances are still fine.

Now we've found a solution that works for any shape! That is, given any object, we can choose a coordinate system such that the inertia tensor of the object divided by its mass is equal to the covariance matrix of three random variables. In particular, these three random variables should be easy to obtain by randomly sampling three points within the object by density, and using the coordinates of those randomly sampled points.

To make things fully clear, let's go through a process by which we can make the inertia tensor of any object proportional to the covariance matrix of three easy-to-understand random variables.

Take any object.
Set coordinates $x_1,x_2,x_3$ to align with the principal axes of the object. You can find these by taking the eigenvectors of the object's inertia tensor in any coordinate system. Center the coordinate system on the object's center of mass.
Randomly, independently, and identically sample three points in the object, with sampling weighted by density.
Let $X_{1i},X_{2i},X_{3i}$ be random variables representing the $x_1$, $x_2$, and $x_3$ coordinates of each randomly sampled point number $i$.
Define new random variables $Y_1=X_{21}+X_{31}$, $Y_2=X_{12}+X_{32}$, and $Y_3=X_{13}+X_{23}$. That is, for the $i$th point, let $Y_i$ be the sum of the two coordinates excluding the $i$th.
The covariance matrix of $Y_1,Y_2,Y_3$ will be equal to $\frac{1}{M}I$, where $I$ is the inertia tensor of the object around its center of mass. All off-diagonal elements of $I$ and the covariance matrix will be zero.
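Here is a sketch of the whole procedure for an asymmetric object. To keep it short, the "object" is assumed to be a cloud of 400 point masses with made-up positions and masses, so sampling weighted by density becomes picking a point with probability proportional to its mass.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical asymmetric object: 400 point masses with skewed, correlated coordinates
A = np.array([[2.0,  0.0, 0.0],
              [0.8,  1.0, 0.0],
              [0.3, -0.5, 0.6]])
points = rng.uniform(-1.0, 1.0, size=(400, 3)) @ A.T
masses = rng.uniform(0.5, 1.5, size=400)
M = masses.sum()

# Center on the center of mass, then compute the inertia tensor
points = points - (masses @ points) / M
r2 = np.sum(points**2, axis=1)
I = np.sum(masses * r2) * np.eye(3) - np.einsum("k,ki,kj->ij", masses, points, points)

# Principal axes are the eigenvectors of I (columns of `axes`)
moments, axes = np.linalg.eigh(I)
coords = points @ axes                   # coordinates in the principal-axis frame

# Three independent draws per trial, each weighted by mass
n = 200_000
idx = rng.choice(len(masses), size=(3, n), p=masses / M)
P1, P2, P3 = coords[idx[0]], coords[idx[1]], coords[idx[2]]

# Y_i sums the two coordinates of point i other than the i-th
Y = np.column_stack([P1[:, 1] + P1[:, 2], P2[:, 0] + P2[:, 2], P3[:, 0] + P3[:, 1]])
cov_Y = np.cov(Y, rowvar=False)

print(np.round(cov_Y, 3))
print(np.round(np.diag(moments) / M, 3))
```

In the principal-axis frame the inertia tensor is `np.diag(moments)`, and the estimated covariance of the $Y_i$ should match it divided by $M$ up to sampling noise.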

Although there are a lot of mathematical steps here, there is nothing particularly convoluted: all the steps, such as choosing an appropriate coordinate system for the object, sampling random points, and using sums of coordinates to generate random variables, feel fairly natural. So we've finally found a covariance matrix equal to $\frac{1}{M}I$ for any object!

Decorrelation Transform

Although this technically works for any object, it's a shame we had to force the off-diagonal elements to be zero. In general, covariance matrices don't have all zeros off the diagonals. But there's good news: just as we used principal axes to transform the inertia tensor and remove its off-diagonal elements, we can use a decorrelation transform to remove the off-diagonal terms of the covariance matrix.

And there's even better news: the decorrelation transform consists of changing into the basis of the eigenvectors of the covariance matrix, just as we found a new coordinate system using the eigenvectors (principal axes) of the inertia tensor.[] And the reason we know we can use eigenvectors to remove off-diagonal terms (diagonalize the matrix) is that the inertia tensor and covariance matrix are both symmetric.[] This goes back to one of the similarities we observed at the beginning!
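A minimal sketch of a decorrelation transform, assuming NumPy and a made-up covariance matrix used only to generate correlated example data:

```python
import numpy as np

rng = np.random.default_rng(4)

cov_true = np.array([[2.0,  0.8,  0.3],
                     [0.8,  1.0, -0.2],
                     [0.3, -0.2,  0.5]])   # hypothetical covariance matrix
data = rng.multivariate_normal(np.zeros(3), cov_true, size=50_000)

# Eigen-decomposition of the sample covariance; eigh is intended for symmetric matrices
cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Change basis: express the centered data in the eigenvector coordinates
decorrelated = (data - data.mean(axis=0)) @ eigvecs
cov_new = np.cov(decorrelated, rowvar=False)

print(np.round(cov_new, 6))
```

The off-diagonal entries of `cov_new` vanish up to floating-point error, and its diagonal holds the eigenvalues, mirroring how the principal moments sit on the diagonal of the inertia tensor in the principal-axis frame.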

So while both inertia tensors and covariance matrices have nonzero elements off the diagonals in general, we can apply analogous transformations to both and remove those elements. Then we can find that with our desired random variables, the two are equal! (Although we might have to scale the random variables.)

Conclusion

Although they come from different fields, the inertia tensor and the covariance matrix have some superficially similar properties. Is there an interpretation in which the two are essentially the same?

We tried one interpretation by randomly sampling a point within an object, and using the coordinates of the random point to generate random variables. The only restriction on our coordinate system was that we set the origin at the object's center of mass. But we found that the covariance matrix of these random variables differed from the inertia tensor in its off-diagonal elements, so this was unreliable for objects that are asymmetric across the coordinate planes. We were forced to set the off-diagonal elements to zero (as with a symmetric object) and sample three independent points in order for the equality to hold.

Then we found a solution that works for any shape. By choosing the principal axes of the object as our coordinate system, we get all the off-diagonal elements equal to zero, which matches the working symmetric case. If we also set the object's center of mass as our origin, the same random sampling procedure as the symmetric case gives us a covariance matrix equal to $\frac{1}{M}I$, where $M$ is the object's mass and $I$ is its inertia tensor!

Just as we set the off-diagonal elements of the inertia tensor to zero using principal axes, we can do the same to a covariance matrix using eigenvectors. In fact, the principal axes are the eigenvectors of the inertia tensor, so these two transformations are intimately related. In the covariance case, this transformation is known as the decorrelation transform.

Even if we choose the wrong object or the wrong coordinate system, there is still a cop-out way to get the inertia tensor to match the covariance matrix. We can choose any means we like for our three random variables, and enforce the covariance matrix to be $\frac{1}{M}I$. Although these random variables aren't physically interpretable, it technically fits the bill of three random variables with a covariance matrix equal to $\frac{1}{M}I$ for any object!

Although the connection between the inertia tensor and the covariance matrix might seem suggestive at first, it actually required a lot of steps to get there: using physics, statistics, and even linear algebra. In the end, we found a connection that works for any shape, showing the power of picking the right coordinate system and transforming mathematical objects in the right way. I hope you enjoyed discovering this interesting curiosity of classical mechanics and statistics!

References

3D Rigid Body Dynamics: The Inertia Tensor (J. Peraire, S. Widnall, MIT OCW, 2008)
What's the physical significance of the off-diagonal element in the matrix of moment of inertia (safkan, Physics Stack Exchange, 2015)
Decorrelating and then Whitening data (Rosalind W. Picard, MIT, 2010)
Diagonalizing Symmetric Matrices (UC Davis)