resline: Fit a resistant line to bivariate data

SAS Macro Programs: resline

Version: 2.3 (15 May 2006)
Michael Friendly
York University



resline macro (download: resline.sas)

Fit a resistant line to bivariate data

The resline macro fits a resistant line (one whose slope and intercept are largely unaffected by outliers) to bivariate data, using three median-based summary points. It also finds power transformations of both variables that make a curvilinear relation more nearly linear. The current version produces only printer plots.

Method

The resistant line technique follows Tukey (1977), Exploratory Data Analysis, and Velleman and Hoaglin (1981), ABCs of Exploratory Data Analysis.

The data points are divided into thirds, based on the sorted values of the X= variable. The median X and median Y values within each third define three "summary points", from which robust estimates of the slope and intercept are calculated.
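
The calculation can be illustrated with a minimal Python sketch (this is not the macro's SAS code). It assumes the simplest allocation of points to thirds, ignores ties in X and the ENDS= rule, and takes the intercept as the average of y - slope*x over the three summary points; the function names summary_points and line_from_summary are hypothetical. The last lines apply the final step to the three summary points printed in the Example section of this page.

```python
from statistics import median


def summary_points(x, y):
    """Median X and median Y in each third of the data, ordered by X.

    Assumption: a remainder of one point goes to the middle third and a
    remainder of two goes one to each end third. Ties in X and the
    macro's ENDS= rule are ignored. Needs at least three points.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    k, r = divmod(n, 3)
    n_end = k + (1 if r == 2 else 0)  # size of each end third
    groups = [pairs[:n_end], pairs[n_end:n - n_end], pairs[n - n_end:]]
    return [
        (median(px for px, _ in g), median(py for _, py in g))
        for g in groups
    ]


def line_from_summary(points):
    """Slope from the two outer summary points; intercept averaged over all three."""
    (x_lo, y_lo), (x_mid, y_mid), (x_hi, y_hi) = points
    slope = (y_hi - y_lo) / (x_hi - x_lo)
    intercept = (
        (y_lo - slope * x_lo) + (y_mid - slope * x_mid) + (y_hi - slope * x_hi)
    ) / 3
    return slope, intercept


# Summary values copied from the printed output in the Example section.
slope, intercept = line_from_summary(
    [(101.0, 131.15), (426.0, 51.7), (3574.5, 14.85)]
)
print(round(slope, 6), round(intercept, 5))
```

Under these assumptions the result agrees with the slope and intercept shown in the Example output to the digits printed there.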

Power transformations are found by calculating a "ratio of slopes" table. The X and Y coordinates of the summary points are transformed to each power in the list [-1.0, -0.5, log, sqrt, raw, 2.0]. For each combination of an X power and a Y power, the macro forms the ratio of two slopes: that of the line connecting the first pair of summary points and that of the line connecting the second pair. The optimal transformation is the one whose slope ratio is closest to 1 (equivalently, the one for which the log of the ratio is closest to zero).
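
A minimal Python sketch of the ratio-of-slopes idea follows. It is an illustration under stated assumptions, not the macro's code: the summary points are hypothetical (chosen so that log Y is exactly linear in raw X), the ratio is taken as the right-hand slope over the left-hand slope, and the names POWERS and slope_ratio are invented for this example. It does not attempt to reproduce the table printed in the Example section.

```python
import math

# Transformations in the order used by the ratio-of-slopes table.
# The ratio is unchanged by rescaling or sign changes, so plain powers suffice.
POWERS = {
    "-1.0": lambda v: 1.0 / v,
    "-0.5": lambda v: 1.0 / math.sqrt(v),
    "log": math.log10,
    "sqrt": math.sqrt,
    "raw": lambda v: v,
    "2.0": lambda v: v * v,
}


def slope_ratio(points, fx, fy):
    """Ratio of the right-hand slope to the left-hand slope after transforming."""
    (x1, y1), (x2, y2), (x3, y3) = [(fx(x), fy(y)) for x, y in points]
    left = (y2 - y1) / (x2 - x1)
    right = (y3 - y2) / (x3 - x2)
    return right / left


# Hypothetical summary points with Y = 10**X (all values positive).
points = [(1.0, 10.0), (2.0, 100.0), (4.0, 10000.0)]

# Keys are (power of X, power of Y).
table = {
    (px, py): slope_ratio(points, fx, fy)
    for px, fx in POWERS.items()
    for py, fy in POWERS.items()
}

# Best combination: slope ratio closest to 1, i.e. |log ratio| closest to 0.
best = min(table, key=lambda key: abs(math.log10(table[key])))
print(best, round(table[best], 3))
```

For these made-up points the sketch selects raw X with log Y, as expected by construction.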

The resline macro requires that all values of the X and Y variables be positive. If any data values are zero or negative, the recommended solution is to add a constant to all values of that variable to make them positive.

Parameters

DATA=_LAST_
Name of the input data set.
X=
The name of the independent variable.
Y=
The name of the response variable.
ID=
The name of a character variable to identify each observation, used to label points in the output.
ENDS=.5
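The greatest range of X values allowed in either end third, as a proportion of the total range of X (the "half-range rule" flagged by 'R' in the output).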
PLOT=FIT RESID
Keywords requesting one or more printer plots.
FIT requests a plot of observed and fitted values vs. X for the raw data. RESID requests a plot of residuals vs. X.
OUT=_FIT_
The name of an output data set containing fitted values (FIT) and residuals (RESIDUAL), in addition to the X=, Y=, and ID= variables, for all non-missing observations. Note that the ID= variable is named ID in this data set.
OUTSUM=_SUMVAL_
The name of an output data set containing median summary values for the thirds (THIRD) of the data.

Missing data

Observations with missing values on either the X= or Y= variable are removed from the dataset.

Example

The following example examines the relation between infant mortality rate (IMR) and per-capita income in the NATIONS dataset.
%include data(nations);
*include macros(resline);    *-- included in autocall library;

%resline(data=nations, x=income, y=imr, id=nation);
The printed output includes the following:
Warning:         4 row(s) with missing data have been removed.

        Summary Values

              X        Y      n   

Low     101.000  131.150     34  
Mid     426.000   51.700     51  
High   3574.500   14.850     16 R

('R' -> half-range rule;  '=' -> equal X value rule)

      Parameters of fitted resistant line
           slope  intercept

       -0.033482  111.67558
The output also includes tables of fitted values and residuals, and plots. In addition, the following table indicates that the log of IMR comes closest to having a linear relationship with (raw) INCOME.
     ----- Ratio of Slopes table ------
     Rows are powers of X, columns are powers of Y

         -1.0     -0.5      log     sqrt      raw      2.0

-1.0    2.163    1.921    1.708    1.521    1.356    1.081
-0.5    1.898    1.685    1.499    1.334    1.189    0.948
log     1.663    1.477    1.314    1.169    1.042    0.831
sqrt    1.457    1.294    1.151    1.024    0.913    0.728
raw     1.275    1.132    1.007    0.896    0.799    0.637
 2.0    0.975    0.866    0.770    0.685    0.611    0.487

     ------- 5 Best powers -------
      Power of X Power of Y   Slope Ratio   log Ratio

     raw        log                 1.007       0.003
     sqrt       sqrt                1.024       0.010
      2.0       -1.0                0.975      -0.011
     log        raw                 1.042       0.018
     -0.5        2.0                0.948      -0.023
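
As a reading aid, the short Python sketch below ranks the entries of the Ratio of Slopes table above by the absolute value of the log of each ratio and lists the five smallest. It assumes the "log Ratio" column is a base-10 logarithm; that is inferred from the printed numbers, not stated by the macro. The variable names are hypothetical.

```python
import math

labels = ["-1.0", "-0.5", "log", "sqrt", "raw", "2.0"]

# Ratio of Slopes table copied from the printed output
# (rows are powers of X, columns are powers of Y).
ratios = [
    [2.163, 1.921, 1.708, 1.521, 1.356, 1.081],
    [1.898, 1.685, 1.499, 1.334, 1.189, 0.948],
    [1.663, 1.477, 1.314, 1.169, 1.042, 0.831],
    [1.457, 1.294, 1.151, 1.024, 0.913, 0.728],
    [1.275, 1.132, 1.007, 0.896, 0.799, 0.637],
    [0.975, 0.866, 0.770, 0.685, 0.611, 0.487],
]

# Sort every (X power, Y power) cell by how close its log ratio is to zero.
ranked = sorted(
    (abs(math.log10(r)), px, py, r)
    for px, row in zip(labels, ratios)
    for py, r in zip(labels, row)
)

# Keep the five best cells, with the signed log ratio rounded as in the output.
best5 = [(px, py, r, round(math.log10(r), 3)) for _, px, py, r in ranked[:5]]
for row in best5:
    print(row)
```

With a base-10 log, the five rows and their log ratios match the "5 Best powers" listing, with raw X and log Y ranked first.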

See also

boxcox Power transformations by Box-Cox method
lowess Locally weighted scatterplot smoother