SAS Macro Programs: andrews

$Version: 1.1-0 (30 Sep 2011 09:31:49)
Michael Friendly
York University


The andrews macro (download: andrews.sas)

Andrews function plots for multivariate data

The ANDREWS macro calculates a periodic function, z(t), composed of sine and cosine components, to represent each observation in a multivariate sample. The macro plots this function against t, from -pi to +pi.

Two function types may be calculated and plotted. The original version, from Andrews (1972), defines

  z(t) = Y1/sqrt(2) + Y2 sin(t) + Y3 cos(t) + Y4 sin(2t) + Y5 cos(2t) + ...

A modified version, suggested by Khattree and Naik (2001), defines

  z(t) = [Y1 + Y2 (sin(t)+cos(t)) + Y3 (sin(t)-cos(t)) +
          Y4 (sin(2t)+cos(2t)) + Y5 (sin(2t)-cos(2t)) + ...]/sqrt(2)

and appears to separate distinct observations better than the original formulation.
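
For reference, the two definitions can be written with an explicit general term. This is only a restatement of the pattern shown above, assuming p variables Y_1, ..., Y_p, with the sum running over harmonics k = 1, 2, ... and any term whose index exceeds p dropped.

```latex
% Original form (Andrews, 1972)
z(t) = \frac{Y_1}{\sqrt{2}}
     + \sum_{k \ge 1} \bigl[ Y_{2k} \sin(kt) + Y_{2k+1} \cos(kt) \bigr]

% Modified form (Khattree and Naik, 2001)
z(t) = \frac{1}{\sqrt{2}} \Bigl[ Y_1
     + \sum_{k \ge 1} \bigl( Y_{2k} \, (\sin(kt) + \cos(kt))
     + Y_{2k+1} \, (\sin(kt) - \cos(kt)) \bigr) \Bigr]
```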

These plots assume that the variables are measured on the same scale. If they are not, it is usually worthwhile to standardize them first, either to mean 0 and standard deviation 1 with PROC STANDARD M=0 S=1, or to a [0,1] range with PROC STDIZE METHOD=RANGE.
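
As a minimal sketch of the two standardization options, assuming a hypothetical input data set MYDATA with numeric variables x1-x8 (the data set and output names here are placeholders, not part of the macro):

```sas
/* Hypothetical input: data set MYDATA with numeric variables x1-x8 */

/* Option 1: standardize to mean 0, standard deviation 1 */
proc standard data=mydata out=mydata_std mean=0 std=1;
  var x1-x8;
run;

/* Option 2: rescale each variable to the [0,1] range */
proc stdize data=mydata out=mydata_rng method=range;
  var x1-x8;
run;

/* Plot the standardized data rather than the raw data */
%andrews(data=mydata_std, var=x1-x8);
```

Either output data set can be passed to the macro through DATA=.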

Usage

The ANDREWS macro is defined with keyword parameters. The arguments may be listed within parentheses in any order, separated by commas. For example:

  %andrews(var=x1-x8, id=name);
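
Because the parameters are keywords, their order does not matter, and any parameter left out takes its default. The call below is an illustrative sketch only; MYDATA, NAME, and CURVES are hypothetical names, and the parameters used are described in the next section.

```sas
/* Same kind of call, with parameters in arbitrary order.          */
/* MYDATA, NAME and CURVES are placeholder names for illustration. */
%andrews(type=MOD, id=name, numpts=120, var=x1-x8,
         data=mydata, out=curves);
```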

Parameters

DATA=

Name of input data set [Default: DATA=_LAST_]

VAR=

List of variables to plot. You can list variables individually, or use any of the SAS shorthand notations, such as VAR=X1-X10 or VAR=INCOME--STATUS1. [Default: VAR=_NUMERIC_]

ID=

Name of an ID variable (character or numeric), used in the legend for the curves, or as a curve label [Default: ID=_N_]

TYPE=

Type of function: ORIGinal or MODified [Default: TYPE=ORIG]

NUMPTS=

Number of intervals into which the range -pi to pi is divided. The function is evaluated at NUMPTS+1 points for each observation. [Default: NUMPTS=80]

ANNO=

Name of an optional input annotate data set.

OUT=

Name of the output data set. The output data set contains the variables Z, T, and ID, and (NUMPTS+1)*nobs observations. [Default: OUT=ANDREWS]

PLOT=

Draw the plot? [Default: PLOT=Y]

VAXIS=

Custom vertical axis statement, e.g., VAXIS=AXIS1, where you have defined AXIS1 before calling the macro. If not specified, the macro uses

                  axis98 offset=(2) label=(angle=90 rotate=0);

HAXIS=

Custom horizontal axis statement, e.g., HAXIS=AXIS2. If not specified, the macro uses

                 axis99 order=(-3.1416 to 3.1416 by 1.5708)
                    value=(font=greek '-p' '-p/2' '0' 'p/2' 'p')
                    offset=(2);

SYMBOLS=

Plotting symbols for the observations. [Default: SYMBOLS=NONE]

COLORS=

List of colors to be used for the observations. [Default: COLORS=BLACK RED GREEN BLUE BROWN YELLOW ORANGE PURPLE]

LINES=

List of line styles for observations. [Default: LINES=1 20 41 21 7 14 33 12 5]

LEGEND=

Name of a LEGEND statement, to alter the default placement or characteristics of the legend for observations.

IDLOC=

If non-blank, the legend is suppressed, and the curves are labeled using the ID= variable to the right of the last point plotted for each observation.

YLABEL=

Y-axis label. [Default: YLABEL=z(t)]

XLABEL=

X-axis label. [Default: XLABEL=t]

NAME=

Name of graphics catalog entry. [Default: NAME=ANDREWS]

GOUT=

Name of graphics catalog. [Default: GOUT=GSEG]
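
To make the NUMPTS= and OUT= descriptions concrete, the following DATA step is a rough sketch of how the original (TYPE=ORIG) function could be evaluated on the grid of t values. It is not the macro's internal code. It assumes a data set TEST with variables x1-x10, as created in the Example section, and the names ANDREWS_SKETCH, id, j, k, and h are made up for illustration.

```sas
/* Hypothetical sketch, not the macro's own code:                  */
/* evaluate the original Andrews function at NUMPTS+1 grid points. */
data andrews_sketch;
  set test;
  array y{*} x1-x10;
  id = _n_;                          /* default ID: observation number */
  pi = constant('pi');
  numpts = 80;                       /* same as the NUMPTS= default    */
  do j = 0 to numpts;                /* NUMPTS+1 points per observation */
    t = -pi + j * (2*pi/numpts);
    z = y{1} / sqrt(2);
    do k = 2 to dim(y);
      h = floor(k/2);                /* harmonic: 1,1,2,2,3,3,...      */
      if mod(k,2) = 0
        then z = z + y{k} * sin(h*t);  /* even index: sine term        */
        else z = z + y{k} * cos(h*t);  /* odd index: cosine term       */
    end;
    output;
  end;
  keep id t z;
run;
```

With 16 input observations and NUMPTS=80, this produces (80+1)*16 = 1296 rows, matching the (NUMPTS+1)*nobs count given for OUT=.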

References

Andrews, D. F. (1972). Plots of high dimensional data. Biometrics, 28, 125-136.

Khattree, R. and Naik, D. N. (2001). Andrews plots for multivariate data: Some new suggestions and applications. J. Stat. Planning and Inference, 100(2), 411-425.

Example

%include macros(andrews);        *-- or include in an autocall library;
data test;
  array vars{*} x1-x10;
  do obs = 1 to 16;
    do i = 1 to dim(vars);
      *-- x1 is N(0,1) noise, each later variable is (x1 - previous variable) plus noise;
      if i=1
        then vars{i} = normal(141423);
        else vars{i} = sum(vars{1}-vars{i-1}) + normal(141423);
    end;
    output;
  end;
run;
%andrews(data=test, var=x1-x10, idloc=last, ylabel=Andrews Function);
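
As a variation on the example above, the curves can be computed without plotting and then drawn with another procedure. This is a sketch under assumptions: that PLOT=N suppresses the plot, that the OUT= data set carries the ID= variable under its own name (OBS here) along with T and Z, and that PROC SGPLOT is available in your SAS release. CURVES is a placeholder name.

```sas
/* Compute the modified-type curves only; assumes PLOT=N skips the plot */
%andrews(data=test, var=x1-x10, id=obs, type=MOD, plot=N, out=curves);

/* Inspect the first few rows of the output data set */
proc print data=curves(obs=5);
run;

/* Draw one curve per observation; assumes variables T, Z and OBS exist */
proc sgplot data=curves;
  series x=t y=z / group=obs;
run;
```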

See also

biplot Generalized biplot of observations and variables
faces Faces display for multivariate data
scale Rescale variables to a given range
scatmat Scatterplot matrix
stars Star plot for multivariate data
sunplot Sunflower plot for X-Y data