SAS Macro Programs: rsqdelta

Michael Friendly
York University



The rsqdelta macro (download: rsqdelta.sas)

Compute R-square change and F-statistics in regression

In a regression analysis, the rsqdelta macro computes the change in R-square and the associated F statistics and p-values as variables are added to the model.

Method

If you specify the SCORR1 option on the MODEL statement, PROC REG prints the squared semipartial correlation (Type I) for each variable, which is the increase in R-square when that variable is added to the model. However, PROC REG does not print the corresponding partial F statistics and p-values, and none of this is written to an output data set.

PROC REG includes the R-square statistic in the OUTEST= data set when the ADJRSQ option is specified on the MODEL statement. PROC GLM includes the sequential (Type I) sums of squares (SS1) in its OUTSTAT= data set, and these provide the F statistic and associated p-value. However, PROC GLM does not include the R-square statistic in an output data set.
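
The two procedure calls described above might look roughly like the following. This is a sketch only, not the macro's actual source: it assumes the fitness data set and the variable names used in the Example section, and the output data set names est and stat are arbitrary.

```sas
/* Assumes the fitness data set from the Example section already exists. */

/* R-square for one model in the sequence, written to OUTEST= as _RSQ_.  */
/* The ADJRSQ option is the one named in the text above.                 */
proc reg data=fitness outest=est noprint;
   model oxy = runtime age / adjrsq;
run;
quit;

/* Sequential (Type I) sums of squares, written to OUTSTAT=.             */
proc glm data=fitness outstat=stat noprint;
   model oxy = runtime age runpulse maxpulse weight / ss1;
run;
quit;

proc print data=est;
   var _model_ _rsq_;
run;

proc print data=stat;   /* one ERROR row plus one SS1 row per variable */
run;
```

Getting one R-square per step would require one PROC REG fit for each nested model; the DATA step work that combines the two data sets is omitted here.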

By combining the information from PROC REG's OUTEST= data set and PROC GLM's OUTSTAT= data set with some DATA step programming, the macro computes the change in R-square, together with the F statistics and p-values, as variables are added to a model. The F statistics and p-values in the final table are partial F tests under the general linear test approach, defined as follows:

              [ SSE(R) - SSE(F) ] / [ df(R) - df(F) ]
      F(C) =  ---------------------------------------
                          SSE(F) / df(F)

where SSE() is the error sum of squares and df() is the error degrees of freedom for the full (F) or reduced (R) model. The rejection region is defined as

       F(C) > F(alpha, df(R)-df(F), df(F))

Note that the F(C) statistic is different from the overall F test, which tests whether there is any regression relationship between the dependent variable and the full set of independent variables. Most regression textbooks provide a discussion of tests about the regression coefficients.
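
As a worked check of the formula, the partial F for adding AGE to a model that already contains RUNTIME can be recomputed from the R-square values reported in the Output section. Because SSE = SST * (1 - R-square), SST cancels and the formula can be written in terms of R-square. The sample size n = 31 is an assumption here (it is inferred from the degrees of freedom implied by the reported F values, not stated on this page), and alpha = 0.05 is chosen only for illustration.

```sas
data partial_f;
   n     = 31;         /* assumed number of observations            */
   alpha = 0.05;       /* illustrative significance level           */
   rsq_r = 0.74338;    /* reduced model: RUNTIME                    */
   rsq_f = 0.76425;    /* full model:    RUNTIME AGE                */
   df_r  = n - 2;      /* error df, reduced model (1 predictor)     */
   df_f  = n - 3;      /* error df, full model    (2 predictors)    */

   /* F(C) = [(SSE(R)-SSE(F)) / (df(R)-df(F))] / [SSE(F)/df(F)],    */
   /* with each SSE replaced by (1 - R-square); SST cancels.        */
   f     = ((rsq_f - rsq_r) / (df_r - df_f)) / ((1 - rsq_f) / df_f);

   prob  = 1 - probf(f, df_r - df_f, df_f);      /* upper-tail p-value */
   fcrit = finv(1 - alpha, df_r - df_f, df_f);   /* F(alpha, ., .)     */
   reject = (f > fcrit);                         /* 1 = in rejection region */
run;

proc print data=partial_f noobs;
   var f prob fcrit reject;
run;
```

Under these assumptions the computed F should agree, up to rounding of the R-square inputs, with the value of 2.48 shown in the AGE row of the output table.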

Usage

rsqdelta is a macro program. Values must be supplied for the YVAR= and XVAR= parameters.

The arguments may be listed within parentheses in any order, separated by commas. For example:

   %rsqdelta(data=inputdataset, yvar=response, xvar=independentvars ...);

Parameters

Default values are shown after the name of each parameter.
DATA=_LAST_
The name of the input dataset. If not specified, the most recently created dataset is used.
YVAR=
The name of the response (dependent) variable.
XVAR=
A list of the independent variables. List the names of your independent variables in the order in which you want them added to the model. Variable list abbreviations (e.g., X1-X10 or FIRST--LAST) are NOT allowed.
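
To illustrate the restriction on XVAR=, a range such as X1-X3 has to be written out name by name. The data set mydata and the variables y, x1, x2 and x3 below are hypothetical, and the call assumes the macro has already been included.

```sas
/* Hypothetical data set and variable names, for illustration only. */
/* Not accepted:  xvar=x1-x3                                        */
%rsqdelta(data=mydata, yvar=y, xvar=x1 x2 x3);
```

Because DATA= defaults to _LAST_, data=mydata could be left out if mydata is the most recently created data set.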

Example

%include macros(rsqdelta);        *-- or include in an autocall library;
%include data(fitness);

%rsqdelta(data=fitness, 
    yvar=oxy, xvar=runtime age runpulse maxpulse weight);

Output

The example creates the following output:
        Change in R-square & F statistics as variables are added

     OBS    _MODEL_      _RSQ_     RSQDELTA          F      PROB

      1     INTERCEP    0.00000      .         2451.73    0.00000
      2     RUNTIME     0.74338     0.74338      84.01    0.00000
      3     AGE         0.76425     0.02087       2.48    0.12666
      4     RUNPULSE    0.81109     0.04685       6.70    0.01537
      5     MAXPULSE    0.83682     0.02572       4.10    0.05330
      6     WEIGHT      0.84800     0.01118       1.84    0.18714

See also

cpplot Plots of Mallows' C(p) and related statistics for model selection
dummy Construct dummy variables for regression models
resline Resistant line for bivariate data

Author

SAS Institute