
# SAS Macro Programs: rsqdelta

\$Version:
Michael Friendly
York University

## The rsqdelta macro (rsqdelta.sas)

### Compute R-square change and F-statistics in regression

In a regression analysis, the rsqdelta macro computes the change in R-square and the associated F statistics and p-values as variables are added to the model.

### Method

If you specify the SCORR1 option on the MODEL statement, PROC REG reports the change in R-square as each variable is added to the model. These values are the squared semipartial correlations computed from the sequential Type I sums of squares. However, PROC REG does not report the corresponding partial F statistics and p-values, and none of these values is written to an output data set.

PROC REG writes the R-square statistic to the OUTEST= data set when the ADJRSQ option is specified on the MODEL statement. PROC GLM writes the sequential (Type I) sums of squares (SS1) to its OUTSTAT= data set, and these provide the F statistics and associated p-values. However, PROC GLM does not write the R-square statistic to an output data set.

The rsqdelta macro combines the information in PROC REG's OUTEST= data set and PROC GLM's OUTSTAT= data set and adds some DATA step programming. With these pieces it computes the change in R-square, the F statistics, and the p-values as variables are added to the model. The F statistics and p-values in the final table are partial F statistics from the general linear model testing approach, defined as follows:
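The following is a minimal sketch of the two ingredients described above. It is not the rsqdelta source, which is not reproduced here. To keep the sketch self-contained, it uses the SASHELP.CLASS data set shipped with SAS rather than the fitness data. The model labels and the output data set names RSQS and SEQSS are hypothetical. The sketch also assumes the EDF option on the PROC REG statement, which writes `_RSQ_` and the error degrees of freedom `_EDF_` to the OUTEST= data set. This is an alternative to the ADJRSQ route mentioned above.

```sas
/* Sketch only (not the rsqdelta source). SASHELP.CLASS is used so the
   code runs without the fitness data; names RSQS/SEQSS are hypothetical. */

/* 1. One labeled MODEL statement per step: each adds a row to RSQS.
      EDF (assumed here) writes _RSQ_ and _EDF_ to the OUTEST= data set. */
proc reg data=sashelp.class outest=rsqs edf noprint;
   height: model weight = height;
   age:    model weight = height age;
run;
quit;

/* 2. Sequential (Type I) sums of squares for the full model.
      OUTSTAT= has rows with _TYPE_='SS1' (one per effect) and 'ERROR'. */
proc glm data=sashelp.class outstat=seqss;
   model weight = height age / ss1;
run;
quit;

proc print data=rsqs;
   var _model_ _rsq_ _edf_;
run;

proc print data=seqss;
   var _source_ _type_ df ss f prob;
run;
```

The differences between successive `_RSQ_` values in RSQS are the R-square changes. The SS1 rows of SEQSS are the matching sequential sums of squares for the same order of entry.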

$$
F(C) = \frac{\left[\mathrm{SSE}(R) - \mathrm{SSE}(F)\right] / \left[\mathrm{df}(R) - \mathrm{df}(F)\right]}{\mathrm{SSE}(F) / \mathrm{df}(F)}
$$
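where SSE() is the error sum of squares and df() is the error degrees of freedom for the full (F) or reduced (R) model. In the rsqdelta table, the full model at each step contains the variable just added. The reduced model is the model from the previous step. At significance level alpha, the rejection region is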
$$
F(C) > F\left(\alpha,\ \mathrm{df}(R) - \mathrm{df}(F),\ \mathrm{df}(F)\right)
$$
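All the models are fit to the same data, so each error sum of squares can be written as SSE = SST(1 - R-square), where SST is the corrected total sum of squares. Substituting this into F(C) cancels SST and expresses the statistic in terms of the R-square values of the two models:

$$
F(C) = \frac{\left[\mathrm{SSE}(R) - \mathrm{SSE}(F)\right] / \left[\mathrm{df}(R) - \mathrm{df}(F)\right]}{\mathrm{SSE}(F) / \mathrm{df}(F)}
     = \frac{\left(R^2_F - R^2_R\right) / \left[\mathrm{df}(R) - \mathrm{df}(F)\right]}{\left(1 - R^2_F\right) / \mathrm{df}(F)}
$$

As a worked check, take the AGE step of the example output. There, $R^2_F = 0.76425$, $R^2_R = 0.74338$, and one variable is added. The sketch assumes the fitness data have 31 observations, so $\mathrm{df}(F) = 31 - 2 - 1 = 28$:

$$
F(C) = \frac{(0.76425 - 0.74338)/1}{(1 - 0.76425)/28} = \frac{0.02087}{0.0084196} \approx 2.48
$$

This agrees with the F value printed for AGE.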
Note that the F(C) statistic differs from the overall F test. The overall test asks whether there is a regression relationship between the dependent variable and the set of independent variables taken as a whole. Most regression textbooks discuss tests about the regression coefficients.

## Usage

rsqdelta is a macro program. Values must be supplied for the YVAR= and XVAR= parameters.

The arguments may be listed within parentheses in any order, separated by commas. For example:

```
   %rsqdelta(data=inputdataset, yvar=response, xvar=independentvars ...)
```

### Parameters

Default values are shown after the name of each parameter.

- `DATA=_LAST_`: The name of the input data set. If not specified, the most recently created data set is used.
- `YVAR=`: The name of the response (dependent) variable. Required.
- `XVAR=`: A list of the independent variables. Required. List the names of your independent variables in the order in which you want them added to the model. Variable list abbreviations (e.g., `X1-X10` or `FIRST--LAST`) are NOT allowed.

### Example

```
%include macros(rsqdelta);        *-- or include in an autocall library;
%include data(fitness);

%rsqdelta(data=fitness,
          yvar=oxy, xvar=runtime age runpulse maxpulse weight);
```

#### Output

The example creates the following output:
```
        Change in R-square & F statistics as variables are added

OBS    _MODEL_      _RSQ_     RSQDELTA          F      PROB

 1     INTERCEP    0.00000      .         2451.73    0.00000
 2     RUNTIME     0.74338     0.74338      84.01    0.00000
 3     AGE         0.76425     0.02087       2.48    0.12666
 4     RUNPULSE    0.81109     0.04685       6.70    0.01537
 5     MAXPULSE    0.83682     0.02572       4.10    0.05330
 6     WEIGHT      0.84800     0.01118       1.84    0.18714
```
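As a check on the table, the DATA step below recomputes RSQDELTA, F, and PROB from the printed `_RSQ_` column alone. It relies on SSE = SST(1 - R-square), so with one variable added per step, F(C) = (R-square change / 1) / ((1 - R-square of the current model) / df(F)). It assumes the fitness data have 31 observations, so the model after adding the k-th variable has df(F) = 31 - k - 1 error degrees of freedom. The data set name CHECK is hypothetical, and this is only an illustration, not how rsqdelta itself is coded.

```sas
/* Recompute RSQDELTA, F and PROB from the cumulative R-square values
   printed above. Assumes n = 31, so df_f = 31 - (predictors) - 1, and
   one variable (1 numerator df) is added at each step.               */
data check;
   input _model_ :$8. _rsq_ df_f;
   retain prev_rsq 0;                       /* R-square of reduced model */
   rsqdelta = _rsq_ - prev_rsq;             /* R2(F) - R2(R)             */
   f    = (rsqdelta / 1) / ((1 - _rsq_) / df_f);
   prob = 1 - probf(f, 1, df_f);            /* upper-tail p-value        */
   prev_rsq = _rsq_;                        /* becomes next step's R2(R) */
   datalines;
RUNTIME  0.74338 29
AGE      0.76425 28
RUNPULSE 0.81109 27
MAXPULSE 0.83682 26
WEIGHT   0.84800 25
;

proc print data=check noobs;
   var _model_ _rsq_ rsqdelta f prob;
   format rsqdelta 8.5 f 8.2 prob 8.5;
run;
```

The inputs are the rounded R-square values, so the recomputed RSQDELTA and F agree with the printed table only to rounding. For example, this step gives about 6.69 rather than 6.70 for RUNPULSE.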