Scripts example - Multiple regression using matrix arithmetic
This example shows the matrix formulation of multiple regression and illustrates how to calculate it using MedCalc script.
In general, the matrix formulation of multiple regression is $ Y = Xb + e $ where Y is a vector containing the dependent data, X is a matrix containing the independent x data, and e is a vector containing the residuals.
In the example, we will use the following sample data:
MedCalc script:
<data> varx1 varx2 vary 2.3 2.0 11.2 5.1 2.1 18.8 6.9 4.3 31.6 9.0 5.8 4.2 12.0 3.2 5.1 18.3 6.2 1.1 21.6 8.0 1.6 </data>
Dependent variable vector Y
Y is a (n,1) matrix containing the y values of the dependent variable:
$$ Y=\begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_{n} \end{bmatrix}$$
MedCalc script:
y={vary}; // Creates a matrix y from the data variable vary
Augmented matrix X
X is a (n,k+1) matrix containing a column of 1's and columns of the various x variables:
$$ X=\begin{bmatrix}1 & x_{11}&x_{12}&\vdots&x_{1k}\\
1 & x_{21}&x_{22}&\vdots&x_{2k}\\
1 & \vdots&\vdots&\vdots&\vdots\\
1 & x_{n1}&x_{n2}&\vdots&x_{nk}\\
\end{bmatrix} $$
MedCalc script:
x={varx1,varx2}; // Creates a matrix x from the data variables varx1 and varx2 // counts n = nrow(x); k = ncol(x); // add a column with 1s for the constant // gives the augmented matrix xa c=ones(n,1); xa=cbind(c,x);
Vector of regression coefficients b
b is a (k+1,1) matrix containing the regression coefficients
$$ b = \begin{bmatrix}b_0 \\
b_1 \\
\vdots \\
b_{k} \end{bmatrix} $$
Vector of residuals e
e is a (n,1) matrix containing the residuals
$$ e=\begin{bmatrix}e_1 \\
e_2 \\
\vdots \\
e_{n} \end{bmatrix}$$
Computing b
$$ b = (X^{'}X)^{-1}X^{'}Y $$MedCalc script:
xt=transpose(xa); xtx=xt*xa; xtxI=inv(xtx); b=xtxI*xt*y;
Computing predicted values
The (n,1) matrix $ \hat{Y} $ contains the predicted values:
$$ \hat{Y} = Xb $$MedCalc script:
ypred=xa*b;
Computing the residuals e
$$ e = Y - \hat{Y} $$MedCalc script:
e=y-ypred;
Computing the Coefficient of determination R2
$$ R^2 = y^{'}Xb(y^{'}y)^{-1} $$using
$$ y = Y - \bar{Y} $$MedCalc script:
ya=y-avg(y); yat=transpose(ya); r=yat*xa*b*inv(yat*y);
Total Sum of Squares
$$ SStotal = y^{'}y $$MedCalc script:
sstotal=yat*y;
Residual Sums of Squares
$$ SSresid = e^{'}e $$MedCalc script:
et=transpose(e); ssresid=et*e;
Regression Sum of Squares
$$ SSregres = SStotal - SSresid $$MedCalc script:
ssregres=sstotal-ssresid;
Covariance and Standard Errors of regression coefficients
$$ C = \frac{SSresid}{n-k-1} (X^{'}X)^{-1} $$The square roots of the diagonals of C are the standard errors of the regression coefficients.
MedCalc script:
C=ssresid/(n-k-1); C=C*inv(transpose(xa)*xa); C=sqrt(C); se=diag(C);
Calculate T values and F value
MedCalc script:
T=b&/se; // elementwise division pT=TDIST(T,n-k-1); F=(ssregres/k)/(ssresid/(n-k-1)); pF=FDIST(f,k,n-k-1);
Test residuals for normal distribution
MedCalc script:
pR=testnorm(shapwilk,e);
Create report
MedCalc script:
print "Coefficient of determination R² : ",r:4; println; println; // print "Coefficients, Standard errors, T, P "; println; print "constant : ",b[1]:4:8,se[1]:4:8,T[1]:4:8,pT[1]:4:8;println; print "b1 : ",b[2]:4:8,se[2]:4:8,T[2]:4:8,pT[2]:4:8;println; print "b2 : ",b[3]:4:8,se[3]:4:8,T[3]:4:8,pT[3]:4:8;println; println; // print "Analysis of variance"; println; print "SStotal : ",sstotal:4:8; println; print "SSregres : ",ssregres:4:8; println; print "SSresid : ",ssresid:4:8; println; println; // print "F-ratio : ",F:4:8; println; print "P : ",pF:4:8; println; println; // print "Test residuals for normal distribution P = ",pR:4; println;
The script output
This script creates the following output:
Coefficient of determination R² : 0.4069262 Coefficients, Standard errors, T, P constant : 19.6395 9.6496 2.0353 0.1115 b1 : -1.2889 1.2139 -1.0618 0.3482 b2 : 1.0459 3.7869 0.2762 0.7961 Analysis of variance SStotal : 751.0086 SSregres : 305.6051 SSresid : 445.4035 F-ratio : 1.3723 P : 0.3517 Test residuals for normal distribution P = 0.4909
The complete script
// regional settings $decimalsymbol="."; $listseparator=","; // cls; clear data; clear mem; <data> varx1 varx2 vary 2.3 2.0 11.2 5.1 2.1 18.8 6.9 4.3 31.6 9.0 5.8 4.2 12.0 3.2 5.1 18.3 6.2 1.1 21.6 8.0 1.6 </data> // create matrices from data y={vary}; x={varx1,varx2}; // counts n = nrow(x); k = ncol(x); // add a column with 1s for the constant // gives the augmented matrix xa c=ones(n,1); xa=cbind(c,x); // b = (X’X)^-1 X’ Y xt=transpose(xa); xtx=xt*xa; xtxI=inv(xtx); // calculate coefficients b b=xtxI*xt*y; // calculate predicted values ypred=xa*b; // calculate residuals e=y-ypred; //Coefficient of determination R² ya=y-avg(y); yat=transpose(ya); r=yat*xa*b*inv(yat*y); // Total sum of squares sstotal=yat*y; // Residual sum of squares; et=transpose(e); ssresid=et*e; // Regression sum of squares ssregres=sstotal-ssresid; // Covariance and // Standard Errors of regression coefficients C=ssresid/(n-k-1); C=C*inv(transpose(xa)*xa); C=sqrt(C); se=diag(C); // T-values and F-value; T=b&/se; // elementwise division pT=TDIST(T,n-k-1); F=(ssregres/k)/(ssresid/(n-k-1)); pF=FDIST(f,k,n-k-1); // Test residuals for normal distribution pR=testnorm(shapwilk,e); // report print "Coefficient of determination R² : ",r:4; println; println; // print "Coefficients, Standard errors, T, P "; println; print "constant : ",b[1]:4:8,se[1]:4:8,T[1]:4:8,pT[1]:4:8;println; print "b1 : ",b[2]:4:8,se[2]:4:8,T[2]:4:8,pT[2]:4:8;println; print "b2 : ",b[3]:4:8,se[3]:4:8,T[3]:4:8,pT[3]:4:8;println; println; // print "Analysis of variance"; println; print "SStotal : ",sstotal:4:8; println; print "SSregres : ",ssregres:4:8; println; print "SSresid : ",ssresid:4:8; println; println; // print "F-ratio : ",F:4:8; println; print "P : ",pF:4:8; println; println; // print "Test residuals for normal distribution P = ",pR:4; println;