andrews: Andrews function plots for multivariate data
The ANDREWS macro calculates a periodic function, z(t), composed of sine and cosine components, to represent each observation in a multivariate sample. The macro plots this function against t, from -pi to +pi.
Two function types may be calculated and plotted. The original version, from Andrews (1972), defines
z(t) = Y1/sqrt(2) + Y2 sin(t) + Y3 cos(t) + Y4 sin(2t) + Y5 cos(2t) + ...
A modified version, suggested by Khattree and Naik (2001), defines
z(t) = [Y1 + Y2 (sin(t)+cos(t)) + Y3 (sin(t)-cos(t)) + Y4 (sin(2t)+cos(2t)) + Y5 (sin(2t)-cos(2t)) + ...]/sqrt(2)
and appears to separate distinct observations better than the original formulation.
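The two definitions can be sketched in a DATA step, which may help when checking the macro's output by hand. This is a minimal illustration, not the macro's own code: the data set MYDATA, the variables X1-X5, and the output names CURVES, Z_ORIG, and Z_MOD are assumptions made for the example.

```sas
/* Sketch only: evaluate both function types for each observation.   */
/* MYDATA and X1-X5 are hypothetical; the variables are assumed to   */
/* be on a common scale already.                                      */
data curves;
   set mydata;
   array y{*} x1-x5;
   obs  = _n_;                        /* curve identifier             */
   pi   = constant('pi');
   npts = 80;                         /* intervals on [-pi, pi]       */
   do k = 0 to npts;
      t = -pi + k * (2*pi / npts);
      z_orig = y{1} / sqrt(2);        /* Andrews (1972) first term    */
      z_mod  = y{1};                  /* modified form, scaled below  */
      do j = 2 to dim(y);
         h = floor(j/2);              /* harmonic: 1, 1, 2, 2, ...    */
         s = sin(h*t);
         c = cos(h*t);
         if mod(j, 2) = 0 then do;    /* Y2, Y4, ...                  */
            z_orig = z_orig + y{j} * s;
            z_mod  = z_mod  + y{j} * (s + c);
         end;
         else do;                     /* Y3, Y5, ...                  */
            z_orig = z_orig + y{j} * c;
            z_mod  = z_mod  + y{j} * (s - c);
         end;
      end;
      z_mod = z_mod / sqrt(2);
      output;
   end;
   keep obs t z_orig z_mod;
run;
```

Each input observation yields npts+1 rows, one per value of t, so the result can be plotted as one curve per value of OBS.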
These plots assume that the variables are measured on the same scale. If not, it is usually worthwhile to standardize them first, either with PROC STANDARD M=0 S=1, or by scaling each variable to a [0,1] range with PROC STDIZE METHOD=RANGE.
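The two standardization options might look like the following. The data set MYDATA, the variables X1-X8, the ID variable NAME, and the output data set names are hypothetical, chosen to match the sample call shown in this document.

```sas
/* Option 1: center each variable at 0 with standard deviation 1 */
proc standard data=mydata mean=0 std=1 out=mydata_std;
   var x1-x8;
run;

/* Option 2: rescale each variable to the range [0,1] */
proc stdize data=mydata method=range out=mydata_rng;
   var x1-x8;
run;

/* Plot whichever standardized version is preferred */
%andrews(data=mydata_std, var=x1-x8, id=name);
```

In both procedures, variables not listed in the VAR statement (such as NAME) are copied to the output data set unchanged, so they remain available as ID variables.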
The ANDREWS macro is defined with keyword parameters. The arguments may be listed within parentheses in any order, separated by commas. For example:
%andrews(var=x1-x8, id=name);
Name of input data set [Default: DATA=_LAST_]
List of variables to plot. You can list variables individually, or use any of the SAS shorthand notations, such as VAR=X1-X101 or VAR=INCOME--STATUS1. [Default: VAR=_NUMERIC_]
Name of an ID variable (character or numeric), used in the legend for the curves, or as a curve label [Default: ID=_N_]
Type of function: ORIGinal or MODified [Default: TYPE=ORIG]
Number of intervals on the range -pi to pi. The function is calculated at NUMPTS+1 points for each observation. [Default: NUMPTS=80]
Name of an optional input annotate data set.
Name of the output data set. The output data set contains the variables Z, T, and ID, and (NUMPTS+1)*nobs observations. [Default: OUT=ANDREWS]
Draw the plot? [Default: PLOT=Y]
Custom vertical axis statement, e.g., VAXIS=AXIS1, where you have defined AXIS1 before calling the macro. If not specified, the macro uses
axis98 offset=(2) label=(angle=90 rotate=0);
Custom horizontal axis statement, e.g., HAXIS=AXIS2. If not specified, the macro uses
axis99 order=(-3.1416 to 3.1416 by 1.5708) value=(font=greek '-p' '-p/2' '0' 'p/2' 'p') offset=(2);
Plotting symbols for the observations. [Default: SYMBOLS=NONE]
List of colors to be used for the observations. [Default: COLORS=BLACK RED GREEN BLUE BROWN YELLOW ORANGE PURPLE]
List of line styles for observations. [Default: LINES=1 20 41 21 7 14 33 12 5]
Name of a LEGEND statement, to alter the default placement or characteristics of the legend for observations.
If non-blank, the legend is suppressed, and the curves are labeled using the ID= variable to the right of the last point plotted for each observation.
Y-axis label. [Default: YLABEL=z(t)]
X-axis label. [Default: XLABEL=t]
Name of graphics catalog entry. [Default: NAME=ANDREWS]
Name of graphics catalog. [Default: GOUT=GSEG]
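As an illustration of how several of these parameters might be combined, the sketch below defines a custom vertical axis before the call, and then runs the macro a second time without plotting in order to inspect the output data set. MYDATA, X1-X8, NAME, and CURVES are hypothetical names. The values TYPE=MOD and PLOT=N are assumptions inferred from the capitalization and defaults shown above, not confirmed spellings.

```sas
/* Define the axis before calling the macro, as VAXIS= requires */
axis1 label=(angle=90) minor=none offset=(4);

/* Modified function type with the custom vertical axis */
%andrews(data=mydata, var=x1-x8, id=name, type=MOD, vaxis=axis1);

/* Compute the curves only, then look at the first few rows */
%andrews(data=mydata, var=x1-x8, id=name, plot=N, out=curves);
proc print data=curves(obs=5);
run;
```

With the default NUMPTS=80, the output data set described above should hold 81 rows per input observation.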
Andrews, D. F. (1972). Plots of high dimensional data. Biometrics, 28, 125-136.
Khattree, R. and Naik, D. N. (2001). Andrews plots for multivariate data: Some new suggestions and applications. J. Stat. Planning and Inference, 100(2), 411-425.
%include macros(andrews);    *-- or include in an autocall library;
data test;
   array vars{*} x1-x10;
   do obs = 1 to 16;
      do i=1 to dim(vars);
         if i=1 then vars{i} = normal(141423);
         else vars{i} = sum(vars{1}-vars{i-1}) + normal(141423);
      end;
      output;
   end;
run;
%andrews(data=test, var=x1-x10, idloc=last, ylabel=Andrews Function);