Johansen Cointegration Test Explained for Algorithmic Trading Strategies
To test for cointegration among several price series, we use the Johansen test. To understand this test, let’s generalize Equation 2.1 to the case where the price variables y(t) are actually vectors representing multiple price series, and the coefficients λ and α are actually matrices. (Because I do not think it is practical to allow for a constant drift in the price of a stationary portfolio, we will assume βt = 0 for simplicity.) Using English and Greek capital letters to represent vectors and matrices respectively, we can rewrite Equation 2.1 as:
ΔY(t) = ΛY(t − 1) + M + A₁ΔY(t − 1) + … + AₖΔY(t − k) + εₜ    (2.7)
Just as in the univariate case, if Λ = 0, we do not have cointegration. (Recall that if the next move of Y doesn’t depend on the current price level, there can be no mean reversion.) Let’s denote the rank (remember this quaint linear algebraic term?) of Λ as r, and the number of price series as n. The number of independent portfolios that can be formed by various linear combinations of the cointegrating price series is equal to r. The Johansen test will calculate r for us in two different ways, both based on the eigenvector decomposition of Λ. One test produces the so-called trace statistic, and the other produces the eigen statistic. (A good exposition can be found in Sorensen, 2005.) We need not worry about what they are exactly, since the jplv7 package will provide critical values for each statistic to allow us to test whether we can reject the null hypotheses that r = 0 (no cointegrating relationship), r ≤ 1, …, up to r ≤ n − 1. If all these hypotheses are rejected, then clearly we have r = n. As a useful by-product, the eigenvectors found can be used as our hedge ratios for the individual price series to form a stationary portfolio.
We show how to run this test on the EWA-EWC pair in Example 2.7, where we find that the Johansen test confirms the CADF test’s conclusion that this pair is cointegrating. But, more interestingly, we add another ETF to the mix: IGE, an ETF consisting of natural resource stocks. We will see how many cointegrating relations can be found from these three price series. We also use the eigenvectors to form a stationary portfolio, and find out its half-life for mean reversion.
Example 2.7: Using the Johansen Test for Cointegration
We take the EWA and EWC price series that we used in Example 2.6 and apply the Johansen test to them. There are three inputs to the johansen function of the jplv7 package: y, p, and k. y is the input matrix, with each column vector representing one price series. As in the ADF and CADF tests, we set p = 0 to allow Equation 2.7 to have a constant offset (M ≠ 0), but not a constant drift term (β = 0). The input k is the number of lags, which we again set to 1. (This code fragment is part of cointegrationTests.m.)
% Combine the two time series into a matrix y2 for input
% into Johansen test
y2=[y, x];
results=johansen(y2, 0, 1);
% Print out results
prt(results);
% Output:
% Johansen MLE estimates
% NULL:               Trace Statistic   Crit 90%   Crit 95%   Crit 99%
% r <= 0 variable 1       19.983         13.429     15.494     19.935
% r <= 1 variable 2        3.983          2.705      3.841      6.635
%
% NULL:               Eigen Statistic   Crit 90%   Crit 95%   Crit 99%
% r <= 0 variable 1       16.000         12.297     14.264     18.520
% r <= 1 variable 2        3.983          2.705      3.841      6.635
We see that for the Trace Statistic test, the hypothesis r = 0 is rejected at the 99 percent level, and r ≤ 1 is rejected at the 95 percent level. The Eigen Statistic test concludes that the hypothesis r = 0 is rejected at the 95 percent level, and r ≤ 1 is rejected at the 95 percent level as well. This means that from both tests, we conclude that there are two cointegrating relationships between EWA and EWC.
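The reading of the table above follows a simple sequential rule: walk down the null hypotheses r = 0, r ≤ 1, … and stop at the first one that cannot be rejected. A minimal sketch of that rule in base MATLAB, assuming the trace statistics and 95 percent critical values are typed in by hand from the output above (the variable names traceStat and crit95 are hypothetical and are not fields of the jplv7 results structure):

```matlab
% Sequential reading of a Johansen table (base MATLAB, no jplv7 needed).
% Values copied from the EWA-EWC trace output; names are hypothetical.
traceStat = [19.983; 3.983];   % rows: null r <= 0, null r <= 1
crit95    = [15.494; 3.841];   % matching 95 percent critical values

r = 0;
for i = 1:numel(traceStat)
    if traceStat(i) > crit95(i)
        r = i;      % null "r <= i-1" rejected, so r is at least i
    else
        break;      % first null we cannot reject: stop here
    end
end
fprintf('Estimated number of cointegrating relations: %d\n', r);
```

The same loop can be pointed at the Eigen Statistic column by swapping in its statistics and critical values.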
What does it mean to have two cointegrating relations when we have only two price series? Isn’t there just one hedge ratio that will allocate capital between EWA and EWC to form a stationary portfolio? Actually, no. Remember that when we discussed the CADF test, we pointed out that it is order dependent. If we switched the role of EWA from the independent to the dependent variable, we might get a different conclusion. Similarly, when we use EWA as the dependent variable in a regression against EWC, we will get a different hedge ratio than when we use EWA as the independent variable. These two different hedge ratios, which are not necessarily reciprocals of each other, allow us to form two independent stationary portfolios. With the Johansen test, we do not need to run the regression two times to get those portfolios: Running it once will generate all the independent cointegrating relations that exist. The Johansen test, in other words, is independent of the order of the price series.
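To see why the two regression hedge ratios are generally not reciprocals, here is a small sketch in base MATLAB. The prices are made-up numbers for illustration only, not EWA or EWC data, and the least-squares fit uses the backslash operator rather than any jplv7 function:

```matlab
% Hypothetical prices, for illustration only (not EWA/EWC data).
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];

% Regress y on x (with a constant offset), then x on y.
byx = [x ones(size(x))] \ y;   % byx(1): hedge ratio with y dependent
bxy = [y ones(size(y))] \ x;   % bxy(1): hedge ratio with x dependent

fprintf('y on x: %.4f, 1/(x on y): %.4f\n', byx(1), 1/bxy(1));

% The product of the two slopes equals the squared correlation, so the
% slopes are exact reciprocals only when the fit is perfect.
c = corrcoef([x y]);
fprintf('product: %.6f, R^2: %.6f\n', byx(1)*bxy(1), c(1,2)^2);
```

Because real price series never fit perfectly, the product of the two slopes falls below 1 and the two regressions imply different hedge ratios.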
Now let us introduce another ETF to the portfolio: IGE, which consists of natural resource stocks. Assuming that its price series is contained in an array z, we will run the Johansen test on all three price series to find out how many cointegrating relationships we can get out of this trio.
y3=[y2, z];
results=johansen(y3, 0, 1);
% Print out results
prt(results);
% Output:
% Johansen MLE estimates
% NULL: Trace Statistic Crit 90% Crit 95% Crit 99%
% r <= 0 variable 1 34.429 27.067 29.796 35.463
% r <= 1 variable 2 17.532 13.429 15.494 19.935
% r <= 2 variable 3 4.471 2.705 3.841 6.635
%
% NULL: Eigen Statistic Crit 90% Crit 95% Crit 99%
% r <= 0 variable 1 16.897 18.893 21.131 25.865
% r <= 1 variable 2 13.061 12.297 14.264 18.520
% r <= 2 variable 3 4.471 2.705 3.841 6.635
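The Trace statistic test rejects r = 0, r ≤ 1, and r ≤ 2 at the 95 percent level, so it concludes that we have three cointegrating relations with 95 percent certainty. The Eigen statistic output shown above is less clear-cut: r ≤ 2 is rejected at the 95 percent level, but r ≤ 1 is rejected only at the 90 percent level, and the statistic for r = 0 (16.897) falls below even the 90 percent critical value (18.893).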
The eigenvalues and eigenvectors are contained in the arrays results.eig and results.evec, respectively.
results.eig % Display the eigenvalues
% ans =
%
% 0.0112
% 0.0087
% 0.0030
results.evec % Display the eigenvectors
% ans =
%
% -1.0460 -0.5797 -0.2647
% 0.7600 -0.1120 -0.0790
% 0.2233 0.5316 0.0952
Notice that the eigenvectors (represented as column vectors in results.evec) are ordered in decreasing order of their corresponding eigenvalues. So we should expect the first cointegrating relation to be the “strongest”; that is, have the shortest half-life for mean reversion. Naturally, we pick this eigenvector to form our stationary portfolio (the eigenvector determines the shares of each ETF), and we can find its half-life by the same method as before when we were dealing with a stationary price series. The only difference is that we now have to compute the T × 1 array yport, which represents the net market value (price) of the portfolio, which is equal to the number of shares of each ETF multiplied by the share price of each ETF, then summed over all ETFs. yport takes the role of y in Example 2.4.
yport=smartsum(repmat(results.evec(:, 1)', [size(y3, 1) ...
    1]).*y3, 2);
% Find value of lambda and thus the half-life of mean
% reversion by linear regression fit
ylag=lag(yport, 1); % lag is a function in the jplv7
                    % (spatial-econometrics.com) package.
deltaY=yport-ylag;
deltaY(1)=[]; % Regression functions cannot handle the NaN
              % in the first bar of the time series.
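The fragment above stops before the regression itself. As a rough illustration of the idea, here is a self-contained sketch in base MATLAB. It is not the author's code. It assumes the usual relation half-life = −log(2)/λ, where λ is the slope from regressing the one-bar change on the lagged value. The series is a hypothetical, noise-free stand-in for yport, and the fit uses the backslash operator instead of a jplv7 regression function:

```matlab
% Hypothetical, noise-free mean-reverting series standing in for yport:
% each bar moves the value toward mu by the fraction -lambdaTrue.
lambdaTrue = -0.1;  mu = 10;  T = 50;
yport = zeros(T, 1);
yport(1) = 20;
for t = 2:T
    yport(t) = yport(t-1) + lambdaTrue*(yport(t-1) - mu);
end

% Regress the one-bar change on the lagged value plus a constant.
ylag   = yport(1:end-1);
deltaY = yport(2:end) - ylag;
beta   = [ylag ones(size(ylag))] \ deltaY;  % beta(1) estimates lambda

% Assumed relation between lambda and the half-life of mean reversion.
halflife = -log(2)/beta(1);
fprintf('lambda = %.4f, half-life = %.2f bars\n', beta(1), halflife);
```

Building ylag and deltaY by direct indexing avoids the NaN in the first bar that the lag function introduces, so no element needs to be deleted before the fit.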