SAS Macro Programs: boxglm
$Version: 1.1 (10 Dec 1991)
Michael Friendly
York University
 
Power transformations by Box-Cox method for GLMs
The boxglm macro finds power transformations of the response variable in a
general linear model by the Box-Cox method, with graphic display of the
maximum likelihood solution (RMSE plot), F-values for model effects (EFFECT
plot), and the
influence of observations on choice of power (INFL plot).  The program produces
printer plots by default, and can optionally produce
high-resolution versions of any of these plots.
      
Method
The program transforms the response to each power from
the LOPOWER= value to the HIPOWER= value, and fits a linear
model using PROC GLM for each, extracting values to an output dataset from
which the plots are drawn.
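A minimal sketch of one pass of this loop, written as ordinary SAS steps rather than macro code, may make the method concrete. It is not the macro's source: the dataset and variable names are borrowed from the Example section, while _trial, _stats, tsurv, and the macro variable power are hypothetical names chosen for illustration. The geometric-mean scaling usually applied so that RMSE values are comparable across powers is omitted for brevity.

```sas
/* Sketch only: a single power value, handled outside the macro.        */
/* _trial, _stats, tsurv and &power are hypothetical names.             */
%let power = -0.5;

data _trial;
   set poisons;
   if (&power) = 0 then tsurv = log(survival);         /* limit as power -> 0 */
   else tsurv = (survival**(&power) - 1) / (&power);   /* Box-Cox power form  */
run;

proc glm data=_trial outstat=_stats;   /* OUTSTAT= saves SS and F per effect */
   class antidote poison;
   model tsurv = antidote poison;
run;
quit;
```

Repeating these steps for each power and collecting the error mean square and F values would give the kind of dataset from which such plots can be drawn.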
The influence plot also implements a score test for the power transformation
due to Atkinson, which provides an alternative estimate of the power,
based on power = 1 - slope of the fitted line
in the partial regression plot for a constructed variable.
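For readers who want the formulas behind these statements, the standard textbook forms are sketched here. They are assumptions about what the macro computes, not taken from its source; in particular, the geometric-mean scaling and the dropping of additive constants in the constructed variable follow the usual presentation of Atkinson's test.

```latex
% Scaled Box-Cox family; \dot{y} is the geometric mean of the response
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda-1}}, & \lambda \neq 0 \\[1ex]
  \dot{y}\,\log y, & \lambda = 0
\end{cases}

% Constructed variable for the hypothesis \lambda = 1 (constants dropped)
w = y\left[\log\!\left(\frac{y}{\dot{y}}\right) - 1\right]

% Regress y on the model terms plus w; with \hat{\gamma} the slope on w,
\tilde{\lambda} = 1 - \hat{\gamma}
```

The slope in the last line is the slope of the partial regression plot for w, which is why observations that pull that line strongly are the ones that influence the choice of power.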
Usage 
boxglm is a macro program.  Values must be supplied for the
RESP= and MODEL= parameters. 
The arguments may be listed within parentheses in any order, separated
by commas. For example:
   %boxglm(resp=responsevariable, model=predictors, ..., )
Parameters
 - RESP=           
 - The name of the response variable for
                    analysis.
 - MODEL=          
 - The independent variables in the
                    model, i.e., the terms on the right side
                    of the = sign in the MODEL statement for PROC
                    GLM. The MODEL= argument should be spelled out
                    completely, rather than using abbreviated 'bar'
                    notation.
 - CLASS=          
 - Specifies the MODEL= variables which are
classification factors rather than continuous variables.
 - DATA=_LAST_     
 - The name of the data set holding the
                    response and predictor variables. (Default:
                    most recently created)
 - ID=             
 - The name of an ID variable for observations
 - OUT=_DATA_      
 - The name of an output dataset to contain
                    the transformed response.  This dataset
                    contains all original variables, with the
                    transformed response replacing the original
                    variable.
 - OUTPLOT=_PLOT_  
 - The name of the output data set containing
                    _RMSE_ and F-values for each effect in the
                    model, with one observation for each power
                    value tried.
 - PPLOT=RMSE EFFECT INFL
 - Which printer plots should be produced?
                    One or more of RMSE, EFFECT, and INFL, or NONE.
 - GPLOT=NONE      
 - Which high-resolution (PROC GPLOT) plots
                    should be produced?  One or more of RMSE,
                    EFFECT, and INFL, or NONE.
 - LOPOWER=-2      
 - low value for power
 - HIPOWER=2       
 - high value for power
 - NPOWER=21       
 - number of power values in the interval
                    LOPOWER to HIPOWER
 - CONF=.95        
 - confidence coefficient for the confidence
                    interval for the power.
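The source does not say how the NPOWER= values are spaced. Assuming they are evenly spaced and include both endpoints, the grid of powers can be reproduced with a short DATA step. The names _grid, step, and i are hypothetical and used only for illustration.

```sas
/* Sketch assuming an evenly spaced grid that includes both endpoints. */
data _grid;
   lopower = -2;  hipower = 2;  npower = 21;     /* macro defaults            */
   step = (hipower - lopower) / (npower - 1);    /* 0.2 for the defaults      */
   do i = 0 to npower - 1;
      power = round(lopower + i*step, 1e-8);     /* tidy floating-point noise */
      output;
   end;
   keep power;
run;

proc print data=_grid noobs;
run;
```

Under the same assumption, NPOWER=17 over the default range gives a step of 0.25, so the grid includes the familiar powers -1, -0.5, 0, 0.5, and 1.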
 
Example
The example finds power transformations for the SURVIVAL variable
in a dataset relating survival time to
type of poison and antidote used.
data poisons;
   input antidote $ poison @;
   label survival='Survival time';
   length id $4;
   do rep = 1 to 4;
      id = trim(antidote)||trim(put(poison,1.))||'-'||put(rep,1.);
      input survival @;
      output;
      end;
cards;
A 1  .31  .45  .46  .43
A 2  .36  .29  .40  .23
A 3  .22  .21  .18  .23
B 1  .82 1.10  .88  .72
B 2  .92  .61  .49 1.24
B 3  .30  .37  .38  .29
C 1  .43  .45  .63  .76
C 2  .44  .35  .31  .40
C 3  .23  .25  .24  .22
D 1  .45  .71  .66  .62
D 2  .56 1.02  .71  .38
D 3  .30  .36  .31  .33
;
*include macros(boxglm);
%boxglm(data=poisons,
        resp=Survival,
        model=antidote poison,
        class=antidote poison,
        id=id,
        gplot=RMSE EFFECT INFL,
        npower=17, conf=.99);
The plot of RMSE vs. lambda (power) indicates power = -0.5, i.e., -1 / sqrt(SURVIVAL),
as the maximum likelihood estimate, but power = -1, i.e., -1 / SURVIVAL (the survival
rate), is within the confidence interval.
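For reference, the usual likelihood-ratio construction of such an interval can be written in terms of RMSE. This is the textbook form, assuming the response has been scaled so that RMSE is comparable across powers; the source does not state that the macro computes its interval in exactly this way.

```latex
% n = number of observations, \hat{\lambda} = power minimizing RMSE,
% \chi^2_{1,\,c} = the c-quantile of chi-square with 1 df (c = CONF)
\left\{ \lambda :
  n \,\log\frac{\mathrm{RSS}(\lambda)}{\mathrm{RSS}(\hat{\lambda})}
  \le \chi^2_{1,\,c} \right\}
\quad\Longleftrightarrow\quad
\frac{\mathrm{RMSE}(\lambda)}{\mathrm{RMSE}(\hat{\lambda})}
  \le \exp\!\left(\frac{\chi^2_{1,\,c}}{2n}\right)
```

With the n = 48 observations and CONF=.99 used here, the chi-square quantile is about 6.63, so under this construction powers whose RMSE is within roughly 7% of the minimum fall inside the interval.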
The EFFECT plot indicates that the significance of the partial F-tests is
unaffected by the choice of power.
The influence plot indicates that a few observations have large leverage,
but none is influential in determining the choice of power.
See also
boxcox Power transformations by Box-Cox method
boxtid Power transformations by Box-Tidwell method
outlier Robust multivariate outlier detection
resline Resistant line for bivariate data
stars Diagnostic plots for transformations to symmetry