ellipses: Plot bivariate data ellipses

SAS Macro Programs: ellipses

Version: 2.4 (25 Oct 2006 08:48:13)
Michael Friendly
York University


The ellipses macro (ellipses.sas)

Plot bivariate data ellipses

The ELLIPSES macro draws a scatterplot of two variables together with a bivariate data ellipse for each of one or more groups. This macro was renamed from contour.sas.
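
For readers unfamiliar with the term, a data ellipse can be written down as follows. This is a sketch of the standard large-sample definition, not a statement of the macro's exact computation: the symbols below are introduced here for illustration, and the macro may apply a small-sample adjustment to the constant that this page does not describe.

```latex
% \bar{\mathbf{x}} : sample mean vector of (x, y)
% \mathbf{S}       : sample covariance matrix of (x, y)
% p                : coverage proportion (the PVALUE= parameter)
\[
  \mathcal{E}_p \;=\;
  \bigl\{ \mathbf{x} :
     (\mathbf{x}-\bar{\mathbf{x}})^{\top}\,\mathbf{S}^{-1}\,(\mathbf{x}-\bar{\mathbf{x}})
     \;\le\; c_p^{2} \bigr\},
  \qquad
  c_p^{2} \;=\; \chi^{2}_{2}(p) \;=\; -2\ln(1-p)
\]
% Example: p = 0.68 gives c_p^2 = -2 ln(0.32), about 2.28, so c_p is about 1.51
```

Under this approximation, the default coverage of 0.68 corresponds to an ellipse whose radius is about 1.5 standard deviations along each principal axis.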

Usage

The ELLIPSES macro is defined with keyword parameters. The X= and Y= variables are required, unless the variables are named with VAR= instead. The arguments may be listed within parentheses in any order, separated by commas. For example:

  %ellipses(data=auto, x=price, y=weight);
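
The VAR= parameter described under Parameters offers an alternative to X= and Y=. A minimal sketch of the equivalent call, assuming the first name in VAR= is taken as the X variable and the second as the Y variable, might look like this:

```sas
/* Sketch: same request as the call above, naming both variables through VAR= */
/* Assumption: the order in VAR= is x first, then y                          */
%ellipses(data=auto, var=price weight);
```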

Parameters

DATA=
The name of the input data set [Default: DATA=_LAST_]
X=
Name of the X variable
Y=
Name of the Y variable
Z=
Name of a Z variable (for G3D)
VAR=
Two (or three) variable names x y (z), separated by spaces. Instead of specifying X= and Y= separately, you can specify the names of the variables with the VAR= parameter.
WHERE=
WHERE clause to select observations
WEIGHT=
Numeric weight for observations. Together with the OUTLIER or ROBCOV macro, the WEIGHT= variable can be used to produce robust data ellipses.
GROUP=
Name of a Group variable (optional). If a GROUP= variable is specified, one ellipse is produced for each value of this variable in the data set. If no GROUP= variable is specified, a single ellipse is drawn for the entire sample. The GROUP= variable may be character or numeric.
CLASS=
Synonym for GROUP=
GPFMT=
Name of a Format for group labels.
PVALUE=
Confidence coefficient (1-alpha). This is the proportion of data from a bivariate normal distribution contained within the ellipse. Several values may be specified in a list (e.g., PVALUE=.5 .9), in which case one ellipse is generated for each value. [Default: PVALUE=0.68]
LINE=
Line style(s) for ellipse [Default: LINE=5]
WIDTH=
Line width(s) for ellipse [Default: WIDTH=1]
ANNOADD=
Additional annotations added to the plot. [Default: ANNOADD=MEAN GPLABEL]
INANNO=
Additional (input) annotations data set
STD=
Error bar metric: STD or STDERR. STD=STDERR gives error bars equal to each mean ± one standard error for both variables. STD=STD gives error bars whose length is one standard deviation for both variables. [Default: STD=STDERR]
POINTS=
Number of points on each ellipse [Default: POINTS=40]
ALL=
Include an ellipse for total sample? Specifies whether the ellipse for the total sample should be drawn in addition to those for each group. If there is no GROUP= variable, ALL=YES just draws the ellipse twice. [Default: ALL=NO]
OUT=
The name of the output Annotate data set used to draw the ellipses, error bars and group labels. [Default: OUT=ELLIPSES]
PLOT=
Plot the results? If PLOT=YES, the macro plots the data together with the generated ellipses. Otherwise, only the output Annotate data set is generated. [Default: PLOT=YES]
HAXIS=
Name of an AXIS statement for the horizontal axis. By default, the plot range of the X= variable is defined by the data, so the ellipses may be clipped and generate warnings in the PROC GPLOT step. To avoid this, define an AXIS statement that defines a suitable range and specify that with HAXIS=.
VAXIS=
Name of an AXIS statement for the vertical axis. See description for HAXIS=.
I=
SYMBOL statement interpolate option. Use I=RL to include the regression line, I=RQ to draw a quadratic, etc. [Default: I=NONE]
INTERP=
(synonym for I=)
COLORS=
List of colors for each of the groups. If there are g groups, specify g colors if ALL=NO, and g + 1 colors if ALL=YES. The colors specified are recycled as needed. [Default: COLORS=RED BLUE GREEN BLACK PURPLE BROWN ORANGE YELLOW]
SYMBOLS=
List of symbols, separated by spaces, used for plotting points in each of the groups. Recycled as needed. [Default: SYMBOLS=DOT SQUARE CIRCLE + STAR - PLUS : $ =]
HSYM=
Height of plot symbols [Default: HSYM=1.2]
HTEXT=
Height of text in the plot [Default: HTEXT=1.5]
NAME=
The name of the graph in the graphic catalog [Default: NAME=ELLIPSES]
GOUT=
The name of the graphics catalog [Default: GOUT=GSEG]
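
Several of the parameters above are typically combined in one call. The following is a sketch only: it requests two ellipses per group through a PVALUE= list, adds the total-sample ellipse with ALL=YES, and supplies AXIS statements through HAXIS= and VAXIS= so the ellipses are not clipped. The axis ranges are assumed values chosen for illustration and should be adjusted to the actual range of the data.

```sas
/* Assumed axis ranges for the auto data; adjust to your own variables */
axis1 order=(0 to 18000 by 3000);     /* horizontal axis: price */
axis2 order=(1000 to 6000 by 1000);   /* vertical axis: weight  */

/* 50% and 90% ellipses for each origin group, plus the total sample */
%ellipses(data=auto, x=price, y=weight, group=origin,
          pvalue=.5 .9, all=yes,
          haxis=axis1, vaxis=axis2);
```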

Examples

  %include macros(ellipses);        *-- or include in an autocall library;
  %include data(auto);
  %ellipses(data=auto,
            x=price, y=weight, group=origin,
            pvalue=.5);

Robust data ellipses may be obtained by first running the ROBCOV macro, then passing its output data set to ELLIPSES and specifying WEIGHT=WEIGHT.

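  %include data(hawkins);
  /* Robust covariance estimation; the OUT= data set is used with WEIGHT=WEIGHT below */
  %robcov(data=hawkins, var=x1-x3 y, id=case, outc=robcov, out=hawkinsr);

  /* Fixed axis ranges, so both plots share one scale and the ellipses are not clipped */
  axis1 order=(-5 to 11 by 2);
  axis2 order=(-3 to 13 by 2);

  /* Classical ellipse: all observations weighted equally */
  title 'Standard data ellipse';
  %ellipses(data=hawkinsr, y=y, x=x1, vaxis=axis1, haxis=axis2);
  %gskip;

  /* Robust ellipse: observations weighted by the WEIGHT variable */
  title 'Robust data ellipse';
  %ellipses(data=hawkinsr, y=y, x=x1, weight=weight, vaxis=axis1, haxis=axis2);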
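
The OUT= and PLOT= parameters suggest a further pattern: generate only the Annotate data set, then overlay it on a plot of your own. The following is a sketch under assumptions: the data set name ell50 is hypothetical, and it is assumed that the Annotate coordinates are expressed in data units so that they line up with a PROC GPLOT scatterplot of the same variables.

```sas
/* Build the Annotate data set only (no plot); ell50 is a hypothetical name */
%ellipses(data=auto, x=price, y=weight, group=origin,
          pvalue=.5, out=ell50, plot=no);

/* Overlay the saved ellipses on a user-written scatterplot */
symbol1 v=dot h=1.2 i=none;
proc gplot data=auto;
   plot weight * price / annotate=ell50;
run; quit;
```

As with the macro's own plot, supplying AXIS statements with a suitable range may be needed here to keep the ellipses from being clipped.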

See also

lowess Locally weighted scatterplot smoother
outlier Robust multivariate outlier detection
resline Resistant line for bivariate data
robcov Calculate robust covariance matrix via MCD or MVE
scatmat Scatterplot matrix
sunplot Sunflower plot for X-Y data