gtree: Draw a tree dendrogram from PROC CLUSTER/VARCLUS output

SAS Macro Programs: gtree

$Version: 1.5 (09 May 2006)
Michael Friendly
York University



The gtree macro (gtree.sas)

Draw a tree dendrogram from PROC CLUSTER/VARCLUS output

The gtree macro is applied to the OUTTREE= dataset produced by PROC CLUSTER (to cluster observations) or PROC VARCLUS (to cluster variables). It uses PROC GPLOT to draw the tree dendrogram of the clustering solution. Provisions have been made for:

Note: The graphic tree diagram produced by the gtree macro can often be obtained more simply using PROC TREE:
proc cluster data=... outtree=tree;
   id name;
   var x1-x10;
run;
proc tree graphics data=tree;
run;

Method

PROC TREE does not produce an output dataset suitable for drawing a high-resolution graphic dendrogram. However, using a method described by Buckner and Lotz, SAS SUGI, 1988, 1363-1368, the printed output from PROC TREE is captured to a file (using PROC PRINTTO), then read in and parsed to extract the parent-child information required to draw the tree.
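
The capture step described above can be sketched roughly as follows. This is an illustration of the PROC PRINTTO technique only, not the macro's actual code: the fileref treeout, the dataset names tree1 and treelines, and the choice of the LIST option are assumptions, and it presumes the LISTING destination is active. The parsing step is left out because it depends on the exact layout of the captured listing.

```sas
/* Illustrative sketch; all names here are hypothetical */
filename treeout temp;            /* temporary file to hold the captured listing */

proc printto print=treeout new;   /* route procedure output to that file */
run;

proc tree data=tree1 list;        /* LIST prints the height, parent and children of each node */
run;

proc printto;                     /* restore the default output destination */
run;

/* Read the captured listing back, one observation per printed line */
data treelines;
   infile treeout truncover;
   input line $char200.;
run;
```

A later DATA step would scan the text in treelines for the node, parent and height fields.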

The gtree macro allows item labels up to 16 characters in length. However, one limitation of the method used is that the first 8 characters of the item labels MUST be unique after removing blanks and '-', and must constitute a valid SAS name.
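
Before clustering, it may be worth checking the labels against this restriction. The following is a minimal sketch, assuming an input dataset called mydata with a character ID variable idname (both names are placeholders). Uppercasing the key before comparing is an extra precaution added here, on the grounds that SAS names are not case-sensitive; it is not stated in the macro's documentation.

```sas
/* Hypothetical pre-check; mydata and idname are placeholder names */
data _keys;
   set mydata(keep=idname);
   length key $8;
   /* remove blanks and hyphens; assigning to a $8 variable keeps the first 8 characters */
   key = upcase(compress(idname, ' -'));
   validname = nvalid(strip(key), 'v7');   /* 1 if key is a valid SAS name, else 0 */
run;

/* Count how often each 8-character key occurs; keep only the repeated ones */
proc freq data=_keys noprint;
   tables key / out=_dups(where=(count > 1));
run;

title 'Labels whose first 8 characters are not unique';
proc print data=_dups;
run;

title 'Labels whose first 8 characters are not a valid SAS name';
proc print data=_keys;
   where validname = 0;
run;
title;
```

If either listing is non-empty, the offending labels would need to be shortened or renamed before running the macro.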

Usage

First, use PROC CLUSTER or PROC VARCLUS to perform the cluster analysis, obtaining an output dataset with the OUTTREE= option. You must specify an ID variable identifying the name of the observation or variable. For example:
proc cluster noprint method=average outtree=tree1;
   id idname;
Then, invoke the gtree macro. Values must be supplied ...

The arguments may be listed within parentheses in any order, separated by commas. For example:

   %gtree(tree=inputdataset, out=outputdataset, ...)

Parameters

Default values are shown after the name of each parameter.
TREE=_LAST_
The name of the OUTTREE= data set from PROC CLUSTER or VARCLUS
OUT=OUT
The name of the output data set
HEIGHT=
The name of a variable in the TREE= dataset indicating height in the tree
METRIC=DIS
Metric for the tree: Should be either SIM (SIMilarity) or DIS (DISsimilarity)
LABEL=HEIGHT
label for similarity/dissimilarity axis
FONT=
font for item labels. If no FONT= is specified, the default SAS/GRAPH font (specified in a GOPTIONS statement) is used.
HLABEL=
Height for item labels. If HLABEL= is not specified, the program calculates a height based on the number of items.
ORIENT=V
orientation of the tree diagram: H (horizontal) or V (vertical)
CTREE='BLACK'
color for tree. Should be the name of a SAS/GRAPH color (in quotes), or the name of a variable in the TREE= dataset.
CITEM='BLACK'
color for item labels: quoted color or variable name
TRIMLO=
ignore height values less than this
TRIMHI=
ignore height values greater than this
SYM=NONE
plotting symbol for cluster joins
PRINT=NO
Printed output: NO means no printed output is produced; YES means the output from PROC TREE is printed; ALL prints information about node placement and the OUT= data set in addition.
NAME=GTREE
Name for the graphic output in the graphics catalog.
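
As a rough illustration of how several of these parameters combine, the call below asks for a vertical dissimilarity tree, trimmed at the top, with the PROC TREE output printed. The dataset names tree1 and treeplot, the axis label, the cutoff of 2000, and the catalog entry name CITIES are all hypothetical values chosen for the sketch.

```sas
/* Hypothetical invocation: tree1 is assumed to be an OUTTREE= data set  */
/* from an earlier PROC CLUSTER step; the other values are illustrative. */
%gtree(tree=tree1,
       out=treeplot,
       metric=DIS,
       orient=V,
       label=Distance,
       ctree='BLUE',
       citem='BLACK',
       trimhi=2000,
       print=YES,
       name=CITIES);
```

Parameters that are left out keep the defaults listed above.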

Example

This example uses the Average Linkage method to cluster some US cities in terms of intercity distances.
%include macros(gtree);        *-- or include in an autocall library;
data   mileages (type=distance);
input  (atlanta chicago denver houston losangel
       miami newyork sanfran seattle washdc city cityname)
       (10*5. @54 $4. @61 $15.);
CARDS;
    0                                                ATL    Atlanta
  587    0                                           CHI    Chicago
 1212  920    0                                      DEN    Denver
  701  940  879    0                                 HOU    Houston
 1936 1745  831 1374    0                            LA     Los Angeles
  604 1188 1726  968 2339    0                       MIA    Miami
  748  713 1631 1420 2451 1092    0                  NYC    New York
 2139 1858  949 1645  347 2594 2571    0             SF     San Francisco
 2182 1737 1021 1891  959 2734 2408  678    0        SEA    Seattle
  543  597 1494 1220 2300  923  205 2442 2329    0   WAS    Washington D.C.
;
 
proc cluster noprint method=average outtree=tree1;
   id cityname;
run;
title h=1.6 'Intercity Flying Mileage';
%gtree(tree=tree1, orient=H,label=Average Distance,sym=dot, ctree='red');
Output:

A similar tree diagram may be obtained directly with the TREE procedure:

axis1 label=none;
proc tree data=tree1
    dis horizontal
    lines=(color=red dots)
    vaxis=axis1;
run;

See also

biplot Biplot display of variables and observations
canplot Canonical discriminant structure plot
faces Faces display for multivariate data
outlier Robust multivariate outlier detection