MedCalc

Scripts example - Multiple regression using matrix arithmetic

This example shows the matrix formulation of multiple regression and illustrates how to calculate it using MedCalc script.

In general, the matrix formulation of multiple regression is $Y = Xb + e$ where Y is a vector containing the dependent data, X is a matrix containing the independent x data, and e is a vector containing the residuals.

In the example, we will use the following sample data:

MedCalc script:

<data>
varx1  varx2  vary
2.3    2.0    11.2
5.1    2.1    18.8
6.9    4.3    31.6
9.0    5.8    4.2
12.0   3.2    5.1
18.3   6.2    1.1
21.6   8.0    1.6
</data>


Dependent variable vector Y

Y is a (n,1) matrix containing the y values of the dependent variable:

$$Y=\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n} \end{bmatrix}$$

MedCalc script:

y={vary}; // Creates a matrix y from the data variable vary


Augmented matrix X

X is a (n,k+1) matrix containing a column of 1's and columns of the various x variables:

$$X=\begin{bmatrix} 1 & x_{11}&x_{12}&\vdots&x_{1k}\\ 1 & x_{21}&x_{22}&\vdots&x_{2k}\\ 1 & \vdots&\vdots&\vdots&\vdots\\ 1 & x_{n1}&x_{n2}&\vdots&x_{nk}\\ \end{bmatrix}$$

MedCalc script:

x={varx1,varx2}; // Creates a matrix x from the data variables varx1 and varx2
// counts
n = nrow(x);
k = ncol(x);
// add a column with 1s for the constant
// gives the augmented matrix xa
c=ones(n,1);
xa=cbind(c,x);


Vector of regression coefficients b

b is a (k+1,1) matrix containing the regression coefficients

$$b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{k} \end{bmatrix}$$

Vector of residuals e

e is a (n,1) matrix containing the residuals

$$e=\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{n} \end{bmatrix}$$

Computing b

$$b = (X^{'}X)^{-1}X^{'}Y$$

MedCalc script:

xt=transpose(xa);
xtx=xt*xa;
xtxI=inv(xtx);
b=xtxI*xt*y;


Computing predicted values

The (n,1) matrix $\hat{Y}$ contains the predicted values:

$$\hat{Y} = Xb$$

MedCalc script:

ypred=xa*b;


Computing the residuals e

$$e = Y - \hat{Y}$$

MedCalc script:

e=y-ypred;


Computing the Coefficient of determation R2

$$R^2 = y^{'}Xb(y^{'}y)^{-1}$$

using

$$y = Y - \bar{Y}$$

MedCalc script:

ya=y-avg(y);
yat=transpose(ya);
r=yat*xa*b*inv(yat*y);


Total Sum of Squares

$$SStotal = y^{'}y$$

MedCalc script:

sstotal=yat*y;


Residual Sums of Squares

$$SSresid = e^{'}e$$

MedCalc script:

et=transpose(e);
ssresid=et*e;


Regression Sum of Squares

$$SSregres = SStotal - SSresid$$

MedCalc script:

ssregres=sstotal-ssresid;


Covariance and Standard Errors of regression coefficients

$$C = \frac{SSresid}{n-k-1} (X^{'}X)^{-1}$$

The square roots of the diagonals of C are the standard errors of the regression coefficients.

MedCalc script:

C=ssresid/(n-k-1);
C=C*inv(transpose(xa)*xa);
C=sqrt(C);
se=diag(C);


Calculate T values and F value

MedCalc script:

T=b&/se; // elementwise division
pT=TDIST(T,n-k-1);
F=(ssregres/k)/(ssresid/(n-k-1));
pF=FDIST(f,k,n-k-1);


Test residuals for normal distribution

MedCalc script:

pR=testnorm(shapwilk,e);


Create report

MedCalc script:

print "Coefficient of determation R² : ",r:4; println;
println;
//
print "Coefficients, Standard errors, T, P "; println;
print "constant : ",b[1]:4:8,se[1]:4:8,T[1]:4:8,pT[1]:4:8;println;
print "b1       : ",b[2]:4:8,se[2]:4:8,T[2]:4:8,pT[2]:4:8;println;
print "b2       : ",b[3]:4:8,se[3]:4:8,T[3]:4:8,pT[3]:4:8;println;
println;
//
print "Analysis of variance"; println;
print "SStotal   : ",sstotal:4:8; println;
print "SSregres  : ",ssregres:4:8; println;
print "SSresid   : ",ssresid:4:8; println;
println;
//
print "F-ratio   : ",F:4:8; println;
print "P         : ",pF:4:8; println;
println;
//
print "Test residuals for normal distribution P = ",pR:4; println;


The script output

This script creates the following output:

Coefficient of determation R² : 0.4069262

Coefficients, Standard errors, T, P
constant :  19.6395  9.6496  2.0353  0.1115
b1       :  -1.2889  1.2139 -1.0618  0.3482
b2       :   1.0459  3.7869  0.2762  0.7961

Analysis of variance
SStotal   : 751.0086
SSregres  : 305.6051
SSresid   : 445.4035

F-ratio   :   1.3723
P         :   0.3517

Test residuals for normal distribution P = 0.4909


The complete script

// regional settings
$decimalsymbol=".";$listseparator=",";
//
cls;
clear data;
clear mem;
<data>
varx1 varx2   vary
2.3   2.0     11.2
5.1   2.1     18.8
6.9   4.3     31.6
9.0   5.8     4.2
12.0  3.2     5.1
18.3  6.2     1.1
21.6  8.0    1.6
</data>
// create matrices from data
y={vary};
x={varx1,varx2};
// counts
n = nrow(x);
k = ncol(x);
// add a column with 1s for the constant
// gives the augmented matrix xa
c=ones(n,1);
xa=cbind(c,x);
// b = (X’X)^-1 X’ Y
xt=transpose(xa);
xtx=xt*xa;
xtxI=inv(xtx);
// calculate coefficients b
b=xtxI*xt*y;
// calculate predicted values
ypred=xa*b;
// calculate residuals
e=y-ypred;
//Coefficient of determation R²
ya=y-avg(y);
yat=transpose(ya);
r=yat*xa*b*inv(yat*y);
// Total sum of squares
sstotal=yat*y;
// Residual sum of squares;
et=transpose(e);
ssresid=et*e;
// Regression sum of squares
ssregres=sstotal-ssresid;
// Covariance and
// Standard Errors of regression coefficients
C=ssresid/(n-k-1);
C=C*inv(transpose(xa)*xa);
C=sqrt(C);
se=diag(C);
// T-values and F-value;
T=b&/se; // elementwise division
pT=TDIST(T,n-k-1);
F=(ssregres/k)/(ssresid/(n-k-1));
pF=FDIST(f,k,n-k-1);
// Test residuals for normal distribution
pR=testnorm(shapwilk,e);
// report
print "Coefficient of determation R² : ",r:4; println;
println;
//
print "Coefficients, Standard errors, T, P "; println;
print "constant : ",b[1]:4:8,se[1]:4:8,T[1]:4:8,pT[1]:4:8;println;
print "b1       : ",b[2]:4:8,se[2]:4:8,T[2]:4:8,pT[2]:4:8;println;
print "b2       : ",b[3]:4:8,se[3]:4:8,T[3]:4:8,pT[3]:4:8;println;
println;
//
print "Analysis of variance"; println;
print "SStotal   : ",sstotal:4:8; println;
print "SSregres  : ",ssregres:4:8; println;
print "SSresid   : ",ssresid:4:8; println;
println;
//
print "F-ratio   : ",F:4:8; println;
print "P         : ",pF:4:8; println;
println;
//
print "Test residuals for normal distribution P = ",pR:4; println;