Visualizing Categorical Data: mosaic
Version: 1.5-1 (31 Mar 2009)
Michael Friendly
York University
Macro interface for mosaic displays
The MOSAIC macro provides an easy-to-use macro interface to the MOSAICS,
MOSAICD and MOSPART SAS/IML programs. To use the SAS/IML programs directly,
you must compose a PROC IML step yourself and invoke the MOSAIC module
(or MOSPART, for partial mosaics).
The MOSAIC macro may be used with any SAS data set in frequency form (e.g.,
the output from PROC FREQ). The macro simply creates the PROC IML step,
reads the input data set, and runs either the mosaic module, the
mosaicd module, or the mospart module, depending on the options specified.
If your data is in case form, just use PROC FREQ first to construct the
contingency table.
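As a minimal sketch of that step, the following converts case-form data to frequency form and passes the result to the macro. The data set names (cases, freqtab) and the factor names (A, B, C) are hypothetical; COUNT is the frequency variable that PROC FREQ writes to an OUT= data set. The SPARSE option is used here on the assumption that zero-frequency cells should also appear, so that there is one observation per cell.

```sas
/* Hypothetical case-form data set CASES with factors A, B, C */
proc freq data=cases;
   /* NOPRINT suppresses the listing; SPARSE keeps zero-frequency cells; */
   /* OUT= writes one observation per cell with the variable COUNT       */
   tables A * B * C / noprint sparse out=freqtab;
run;

/* Assumes the fileref VCD points to the directory holding mosaic.sas */
%include vcd(mosaic);
%mosaic(data=freqtab, var=A B C, count=count);
```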
Ordinarily, the program fits a model (specified by the FITTYPE= parameter)
and displays residuals from this model in the mosaic for each marginal
subtable specified by the PLOTS= parameter. However, if you have already fit
a model and calculated residuals some other way (e.g., using PROC CATMOD or
PROC GENMOD), specify a RESID= variable in the macro call. The macro will
then call the mosaicd module.
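One way this might look with PROC GENMOD is sketched below. The data set freqtab, the factors A, B, C, and the output names fit and resid are hypothetical, and the choice of Pearson residuals (RESCHI=) is an assumption made for illustration; the macro documentation does not say which residual type is expected.

```sas
/* Fit the loglinear model with all two-way associations (hypothetical data) */
proc genmod data=freqtab;
   class A B C;
   model count = A|B|C @2 / dist=poisson link=log;
   /* Save Pearson residuals in the variable RESID (assumed residual type) */
   output out=fit reschi=resid;
run;

/* Display the externally calculated residuals in the mosaic */
%mosaic(data=fit, var=A B C, count=count, resid=resid);
```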
If a BY= variable is specified, the macro produces one (partial) mosaic plot
for each level of the BY= variable(s).
Requirements
The MOSAIC macro is unusual in that it requires you to have also downloaded and installed the
related SAS/IML programs, as described in the User's Guide to Mosaics.
In particular, you should download, edit (as required), and run the program mosaicm.sas
to install the SAS/IML modules used by the macro.
The parameters for the mosaic macro are like those of the SAS/IML MOSAICS program,
except:
- DATA=
-
Specifies the name of the input dataset. The data set should contain one
observation per cell, the variables listed in VAR= and COUNT=, and possibly RESID= and BY=.
- VAR=
-
Specifies the names of the factor variables for the contingency table.
Abbreviated variable lists (e.g., V1-V3) are not allowed. The levels of the
factor variables may be character or numeric, but are used 'as is' in the
input data. You may omit the VAR= variables if variable names are used in
the VORDER= parameter.
- BY=
-
Specifies the names of one (or more) BY variables. Partial mosaic plots are
produced for each combination of the levels of the BY= variables. The BY=
variable(s) *must* be listed among the VAR= variables.
- COUNT=
-
Specifies the name of the frequency variable in the data set.
- CONFIG=
-
For a user-specified model, CONFIG= gives the terms in the model, separated by '/'. For example, to fit the
model of no-three-way association, specify
config=1 2 / 1 3 / 2 3
or (using variable names)
config = A B / A C / B C
Note that the numbers refer to the variables after they have been
reordered, either by sorting the data set or by the
VORDER= parameter.
- VORDER=
-
Specifies either the names of the variables or their indices in the desired
order in the mosaic. Note that using the VORDER= parameter keeps the
factor levels in their order in the input data set.
- SORT=
-
Specifies whether and how the input data set is to be sorted to produce the
desired order of variables in the mosaic.
SORT=YES sorts the data in the reverse of the order in which the variables are listed in the VAR= parameter, so that the variables are entered in the order given in the VAR= parameter. Otherwise,
SORT= lists the variable names, possibly with the DESCENDING or NOTSORTED options,
in the reverse of the desired order, e.g., SORT=C DESCENDING B DESCENDING A.
The default is SORT=YES, unless VORDER= has been specified.
- RESID=
-
Specifies that a model has already been fit and that externally calculated
residuals are contained in the variable named by the RESID= parameter.
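The parameters above might be combined as in the following sketch. The data set freqtab and the factors A, B, C are hypothetical. FITTYPE=USER is an assumption taken from the SAS/IML MOSAICS program, where it selects a model given by CONFIG; it is not listed among the parameters on this page, so check it against your installed version.

```sas
/* Assumes the fileref VCD points to the directory holding mosaic.sas */
%include vcd(mosaic);

/* User-specified model of no three-way association, using variable names */
/* (FITTYPE=USER is assumed, as in the MOSAICS program)                    */
%mosaic(data=freqtab, var=A B C, count=count,
        fittype=USER, config=A B / A C / B C);

/* Explicit sort order, given in the reverse of the desired order */
%mosaic(data=freqtab, var=A B C, count=count,
        sort=C DESCENDING B DESCENDING A);

/* One partial mosaic per level of C; C must also appear in VAR= */
%mosaic(data=freqtab, var=A B C, by=C, count=count);
```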
Example
%include vcd(mosaic); *-- or include in an autocall library;
%mosaic();
See also
mosaics SAS/IML programs for mosaic displays
mosmat Macro interface for mosaic matrices