PCA Correlation matrix and covariance matrix

Part 4: The Covariance and Correlation Matrix
by Mark Lawrence

We will often have many different investments to consider. When building a portfolio, the first thing you want to know is how the potential investments correlate with each other. We’ve already seen that if a pair of stocks, say Berkshire Hathaway A and Berkshire Hathaway B, have perfect correlation, then there’s no point in owning both. We’ve also seen that if a pair of stocks has a correlation of -1, then their risks cancel out, so this may be a particularly interesting pair of stocks to own.

It will prove central to portfolio theory to know the covariance of proposed investments with each other. Everyone knows that to be safe your investment portfolio should be “diversified.” What does diversified mean? We can see now that diversified means you have a bunch of investments that don’t correlate with each other very well.

How do you choose among possible different investments? You could just throw twenty darts at a stock market page and buy the same dollar value of the twenty stocks that you hit. Actually, this would not be a bad choice. However, we’re going to learn that there is an optimum choice of which stocks to buy and how much of your money should go into each stock. To figure out this optimum mix, we’re going to need to know the covariance of every investment with every other investment. We will keep track of these many covariances in a large table of numbers, called a covariance matrix.

Curiously, we’re going to find that there is one optimum portfolio which works for everyone. The mix and proportion of investments in this optimum portfolio depend only on the 91-day T-Bill rate. Of course, nearly everyone wants to make some choices, like more money in energy and technology, less money in large-cap stocks. We’re going to learn that if you make these choices you will be taking on more risk with less likelihood of reward than if you chose the optimum portfolio. You may be able to judge how to beat the optimum portfolio in the short run, but in the long run history assures us that very few people can consistently guess correctly about market segments or market timing.

Suppose we are considering a large number of stocks. We’ll call their price histories Sj(i). The variable j runs over the stocks, and the variable i runs over the n days of history we have for each. We already know how to calculate the correlation of any pair of stocks. Suppose we pick a particular pair, Sa and Sb. The covariance and correlation of these two stocks are:

Covariance( Sa, Sb ) = Σ (Sa(i) – Aa) * (Sb(i) – Ab) / n.

Correlation( Sa, Sb ) = Σ (Sa(i) – Aa) * (Sb(i) – Ab) / (σa * σb * n).

Correlation( Sa, Sb ) = Covariance( Sa, Sb ) / (σa * σb).

Here, just as Sj is the price history of the jth stock, Aj is the average price of the jth stock and σj is the standard deviation of the jth stock. The sum Σ runs over all the days i of history.
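To make the formulas concrete, here is a minimal Python sketch of the covariance and correlation of a pair of stocks, dividing by n as above. The function names and the two five-day price histories are made up for illustration; they are not taken from real market data.

```python
import math

def mean(prices):
    """Average price A of one stock's history."""
    return sum(prices) / len(prices)

def covariance(sa, sb):
    """Sum of (Sa(i) - Aa) * (Sb(i) - Ab) over all days, divided by n."""
    n = len(sa)
    aa, ab = mean(sa), mean(sb)
    return sum((x - aa) * (y - ab) for x, y in zip(sa, sb)) / n

def std_dev(s):
    """Standard deviation: square root of a stock's covariance with itself."""
    return math.sqrt(covariance(s, s))

def correlation(sa, sb):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(sa, sb) / (std_dev(sa) * std_dev(sb))

# Hypothetical price histories (made-up numbers).
sa = [10.0, 11.0, 12.0, 11.0, 13.0]
sb = [20.0, 19.0, 18.0, 19.0, 17.0]  # falls by a dollar whenever sa rises by one

print("covariance: ", covariance(sa, sb))
print("correlation:", correlation(sa, sb))
```

Because the second history moves exactly opposite to the first, the correlation comes out at -1 (up to floating-point rounding), which is the risk-cancelling case described earlier.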

We want to know the covariance of every stock with every other stock. To keep track of this we build a table of numbers; perhaps you would prefer to think of it as a spreadsheet. If we are working with N stocks, the table will have N rows and N columns. In row a column b we’ll put the covariance of stock a and stock b. Right away we notice that this table is symmetric: the covariance of stock a and stock b is exactly the same as the covariance of stock b and stock a.

This table of numbers, which has the covariance of stock a with stock b in row a column b and the variance of stock a in row a column a (the covariance of a stock with itself is its variance), is called the Covariance Matrix. In mathematics a table of numbers is often called a Matrix. It’s just a name, nothing to be afraid of. The formula for the Covariance Matrix is:

CovarianceMatrix(a,b) = Covariance( Sa, Sb )
CovarianceMatrix(a,b) = Σ (Sa(i) – Aa) * (Sb(i) – Ab) / n.
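As a rough sketch of how such a table might be filled in, the Python below builds the covariance matrix as a list of rows. The name covariance_matrix and the three price histories are assumptions made up for this example. Since the table is symmetric, each covariance is computed once and written into both row a column b and row b column a.

```python
def covariance_matrix(histories):
    """Build the table whose entry (a, b) is the covariance of stock a and stock b.

    histories is a list of price histories, one list of prices per stock,
    all covering the same days.
    """
    n_stocks = len(histories)
    n_days = len(histories[0])
    averages = [sum(h) / n_days for h in histories]

    matrix = [[0.0] * n_stocks for _ in range(n_stocks)]
    for a in range(n_stocks):
        for b in range(a, n_stocks):
            cov = sum(
                (histories[a][i] - averages[a]) * (histories[b][i] - averages[b])
                for i in range(n_days)
            ) / n_days
            matrix[a][b] = cov
            matrix[b][a] = cov  # symmetric: same value on the other side of the diagonal
    return matrix

# Hypothetical price histories for three stocks (made-up numbers).
prices = [
    [10.0, 11.0, 12.0, 11.0, 13.0],
    [20.0, 19.0, 18.0, 19.0, 17.0],
    [5.0, 5.0, 6.0, 7.0, 7.0],
]

cov = covariance_matrix(prices)
for row in cov:
    print(row)
```

The diagonal entries are the variances of the individual stocks, and the entries off the diagonal are the covariances of each pair.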

We will also be interested in the correlation of every stock with every other stock. The correlation matrix is just a table of numbers with one row and one column for each stock. In row a column b we’ll put the correlation of stock a and stock b. The correlation matrix is also symmetric, as the correlation of stock a with stock b is the same as the correlation of stock b with stock a. The correlation of any stock with itself is 1, so there will be a series of 1s running down the diagonal of the correlation table. The formula for the Correlation Matrix is:

CorrelationMatrix(a,b) = Correlation( Sa, Sb )
CorrelationMatrix(a,b) = Σ (Sa(i) – Aa) * (Sb(i) – Ab) / (σa * σb * n).
CorrelationMatrix(a,b) = Covariance( Sa, Sb ) / (σa * σb).
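The last formula suggests a shortcut: once a covariance matrix is in hand, the correlation matrix follows from it directly, because each standard deviation is the square root of a diagonal entry. The Python sketch below assumes a hypothetical 3 by 3 covariance matrix with made-up numbers; the function name correlation_matrix is also just an illustration.

```python
import math

def correlation_matrix(cov):
    """Divide each covariance by the product of the two standard deviations.

    The standard deviation of stock a is the square root of cov[a][a],
    the variance sitting on the diagonal of the covariance matrix.
    """
    size = len(cov)
    sigmas = [math.sqrt(cov[a][a]) for a in range(size)]
    return [
        [cov[a][b] / (sigmas[a] * sigmas[b]) for b in range(size)]
        for a in range(size)
    ]

# Hypothetical covariance matrix for three stocks (made-up numbers).
cov = [
    [1.04, -1.04, 0.6],
    [-1.04, 1.04, -0.6],
    [0.6, -0.6, 0.8],
]

corr = correlation_matrix(cov)
for row in corr:
    print([round(value, 4) for value in row])
```

The printed table has 1s down the diagonal and is symmetric, as described above. Every entry lies between -1 and 1.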

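If you are comfortable with vectors, think of each stock’s “difference vector,” whose entries are the daily differences Sa(i) – Aa. Then you’ll notice that the standard deviation is the “length” of the difference vector, divided by √n. The covariance is the “dot product” of the two difference vectors, divided by n. The correlation is the “dot product” of the unit versions of the difference vectors, which is the cosine of the angle between them.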
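For readers who like to check such statements numerically, here is a small Python sketch of the vector picture. The helper names and the two price histories are made up for illustration. With the division by n used in the formulas above, the standard deviation is the length of the difference vector divided by √n, and the covariance is the dot product divided by n; those factors cancel in the correlation.

```python
import math

def difference_vector(prices):
    """Each day's price minus the stock's average price."""
    average = sum(prices) / len(prices)
    return [p - average for p in prices]

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def length(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def unit(u):
    """Scale a vector to length 1."""
    norm = length(u)
    return [x / norm for x in u]

# Hypothetical price histories (made-up numbers).
sa = [10.0, 11.0, 12.0, 11.0, 13.0]
sb = [5.0, 5.0, 6.0, 7.0, 7.0]
n = len(sa)

da, db = difference_vector(sa), difference_vector(sb)

std_a = length(da) / math.sqrt(n)     # standard deviation of stock a
std_b = length(db) / math.sqrt(n)     # standard deviation of stock b
cov_ab = dot(da, db) / n              # covariance of the pair
corr_ab = dot(unit(da), unit(db))     # correlation: cosine of the angle between da and db

print("covariance: ", cov_ab)
print("correlation:", corr_ab)
print("cov / (std_a * std_b):", cov_ab / (std_a * std_b))
```

The last two printed values agree up to floating-point rounding, so the dot product of the unit vectors gives the same number as the correlation formula.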
