Starting with Version 3.0, this document is no longer maintained.
 The official documentation for MOSAICS is now on the WWW at
                http://www.math.yorku.ca/SCS/mosaics.html
 
 
 
 
                        User's Guide for MOSAICS:
 
                  A SAS/IML Program for Mosaic Displays
 
 
                             Michael Friendly
 
                          Psychology Department
                             York University
                             Toronto, Ontario
                             Canada  M3J 1P3
 
                         email: FRIENDLY@YorkU.CA
                           MOSAICS, Version 2.9
 
 
                              March 16, 1996
 
 
 
                               Introduction
 
   The mosaic display, proposed by Hartigan & Kleiner (1981), represents
   the counts in a contingency table directly by tiles whose area is
   proportional to the cell frequency.  This display generalizes readily
   to n-way tables.  Friendly (1991, 1992, 1994) extended the use of the
   mosaic display as a graphical tool for fitting log-linear models.
   The enhanced mosaic uses color and shading of the tiles to reflect
   the sign and magnitude of the residual from a specified log-linear
   model.  Friendly also shows how the understanding of patterns of
   association can be enhanced by reordering the rows and columns to
   make the pattern more coherent.  Refer to Friendly (1991, 1992, 1994)
   for details of the method and examples of its use in fitting
   log-linear models.
 
      This report describes MOSAICS SAS,  a SAS/IML program for produc-
   ing mosaic displays.  The program has the following features:
 
   *   It produces graphical displays of  an n-way contingency table of
       any size.   Experience shows that tables of  up to 5 or 6 dimen-
       sions can be usefully explored.   The  main limitation is in the
       resolution of the display with large, complex tables.
 
   *   The order of  variables in the mosaic is specified  by the user.
       Different orderings of the variables  can show different aspects
       of the data.
 
   *   The program can produce sequential displays of the marginal sub-
       tables, [A],  [AB],  [ABC],  and so forth,  up to the full n-way
       table, where A, B, C,  ...,  refer to the table variables in the
       order entered.
 
   *   For each display the program fits a log-linear model and depicts
       the residuals from  the model by the color and  shading of tiles
       in the mosaic.
 
   *   The program can  automatically construct and fit a  set of base-
       line models  of independence or  partial independence  among the
       table variables.   Alternatively,  the user  can specify and fit
       any log-linear model which can be estimated by iterative propor-
       tional fitting.
 
 
   Changes
 
|  The most recent changes to the  program and/or this user's guide are
|  flagged with change bars like this.
 
|  Version 2.9:
 
|  *   Installation   simplified   by   creating   a   separate   file,
|      MOSAICM.SAS, to install IML modules.
|  *   Filltypes changed  to allow  separate coding  for positive  and
|      negative residuals, and to provide grayscale shading levels.
|  *   Added ability (cellfill) to print a symbol in the cell symboliz-
|      ing the value of the residual.
 
 
 
                            Installation Guide
 
   How to obtain MOSAICS SAS
 
   The program, mosaics.sas,  and an example of its use,  mosademo.sas,
   are available by anonymous FTP from the host, HOTSPUR.PSYCH.YORKU.CA
   (Internet address  130.63.134.26).   Login  as user  'anonymous' and
   type your  full email address  as a  password.   Then change  to the
   directory shown  below and  issue the get  commands to  retrieve the
   files.
 
   >ftp hotspur.psych.yorku.ca
   220 hotspur.psych.yorku.ca FTP server ...
   Name (hotspur.psych.yorku.ca:userid): anonymous
   331 Guest login ok, send ident as password
   Password: userid@host
   ftp>cd /pub/sas/mosaics
   ftp>get mosaics.sas
   ftp>get mosademo.sas
|  ftp>get mosaicm.sas
|  ftp>get mosaics.doc
 
 
   Installing MOSAICS.SAS
 
   MOSAICS SAS  consists of a collection  of SAS/IML modules  which are
   designed  to be  called from  another program  in a  proc iml  step.
   Because the  program is  large,  the  modules are  most conveniently
   stored  in  compiled form  in  a  SAS/IML storage  catalog,   called
|  SASUSER.MOSAIC.   To install the program in this way, copy the files
|  MOSAICS.SAS and MOSAICM.SAS to  a directory,  ('~/sasuser/mosaics/',
|  say) and run the MOSAICM program, with the command,
 
|  sas mosaicm
 
   This step need only be done once.
 
      In applications,  the  modules are loaded into  the SAS/IML work-
   space with the load statement, as follows,
 
   proc iml;
     reset storage=mosaic;
     load module=_all_;
 
   On some platforms, a libname statement may be needed to specify the
   location of the SASUSER library in the operating system file
   structure.
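
   As a minimal sketch of that case, assume the MOSAIC catalog was
   stored in a library that SAS does not locate automatically.  The
   libref mylib and the directory path below are hypothetical and must
   be replaced by the values for your system:

```sas
libname mylib '/path/to/catalog/directory';   *-- hypothetical libref and path;
proc iml;
  reset storage=mylib.mosaic;                 *-- two-level storage catalog name;
  load module=_all_;
```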
 
      Alternatively,  it  is possible to store  and use the  program in
   source  form.   This  avoids the  need  to maintain  and access  the
   SAS/IML catalog, but means that the program is compiled each time it
|  is run.   To use the program in this way,  simply access the program
|  with a %include statement:
 
|  filename mosaics 'path/to/mosaics.sas';
|  proc iml;
|    %include mosaics;
 
   On some platforms  you may need to  add a path specification  to the
   %include statement or use a filename  statement to specify the loca-
   tion of the MOSAICS.SAS file in the operating system file structure.
 
 
 
                              Using MOSAICS
 
   Input parameters
 
   The frequency table  analyzed is specified in the  run mosaic state-
   ment.  Various options, all of which have default values, are speci-
   fied by global variables in the proc iml step.   Hence,  the program
   should be called as,
 
   proc iml;
     reset storage=mosaic;
     load module=_all_;
     *-- specify data parameters;
     levels = { ... };   *-- variable levels;
     table  = { ... };   *-- contingency table;
     vnames = { ... };   *-- variable names;
       ...
 
     *-- specify non-default global inputs;
     fittype='USER';
     config = { 1  1,
                2  3 };
 
     run mosaic(levels, table, vnames, lnames, plots, title);
 
   The parameters for the run mosaic statement are:
 
   Parameter
           Description
 
   levels  is a vector which specifies the  number of variables and the
           dimensions of the contingency table.   If levels is n  x  1,
           then the table has n dimensions, and the number of levels of
           variable i is levels[i].  The order of the variables in lev-
           els is the order they are entered into the mosaic display.
 
   table   is a matrix or vector giving the frequency, $f_{ij\ldots}$, of
           observations in each cell of the table.  The table variables
           are  arranged in  accordance  with  the conventions  of  the
           SAS/IML IPF and MARG functions, so the first variable varies
           most rapidly across the columns of  table and the last vari-
           able varies most slowly down the rows.
 
              In addition table must conform to levels as follows.   If
           table is I rows by J columns,  the product of all entries in
           levels must be IJ.   Moreover,  J  must equal the product of
           the first k entries of levels, for some k.
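
           The two conformity rules can be checked before calling the
           program.  The following is only a sketch, not part of
           MOSAICS; the names cum, sizeok and colok are made up for
           the illustration, and it assumes that a table with a single
           column (k = 0, as in a 16 x 1 table) is acceptable.

```sas
proc iml;
  levels = {4 4 2};                *-- as in Example 1: Hair, Eye, Sex;
  table  = j(2, 16, 1);            *-- placeholder counts, 2 rows by 16 columns;

  colok = (ncol(table) = 1);       *-- k = 0: a single column (assumed allowed);
  cum = 1;
  do k = 1 to nrow(levels) # ncol(levels);
     cum = cum # levels[k];        *-- product of the first k entries of levels;
     if cum = ncol(table) then colok = 1;
  end;

  *-- after the loop, cum is the product of all entries in levels;
  sizeok = (cum = nrow(table) # ncol(table));
  print sizeok colok;
quit;
```

           For these values both flags should be 1, since 4 x 4 x 2 = 32
           = 2 x 16 and the 16 columns equal the product of the first
           two entries of levels.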
 
   vnames  is a 1  x  n character vector of variable (factor) names, in
           an order corresponding to levels.
 
   lnames  is a character matrix of labels for the variable levels, one
           row for each variable.  The number of columns is the maximum
           value in levels.  When the variables have different numbers of
           levels, the rows for variables with fewer levels must be
           padded with blank entries.
 
   plots   is a vector containing any of the integers 1 to n, which
           specifies the list of marginal tables to be plotted.  If
           plots contains the value i, the marginal subtable for
           variables 1 to i will be displayed.  For a 3-way table,
           plots={1 2 3} displays each sequential plot, showing the
           [A], [AB] and [ABC] marginal tables, while plots=3 displays
           only the final 3-way [ABC] mosaic.
 
:  title   is a character string or vector of strings containing
:          title(s) for the plots.  If title is a single character
:          string, it is used as the title for all plots.  Otherwise,
:          title may be a vector of up to max(plots) strings, and
:          title[i] is used as the title for the plot of the marginal
:          table for variables 1 to i (that is, when plots contains
:          the value i).  If the number of strings is less than
:          max(plots), the last string is used for all remaining plots.
 
:             Moreover,  if  the title  for a  given plot  contains the
:          string &MODEL (upper case),  that  string is replaced by the
:          symbolic model  description.   For example,   the specifica-
:          tions,
 
:          plots = 2:3;
:          fittype='JOINT';
:          title = { '',
:                    'Hair-color Eye-color Data  Model (H)(E)',
:                    'Hair-color Eye-color Data  Model (HE)(S)'};
 
:          produce two plots with titles  from title[2] and title[3].
:          Equivalent results  (using substitution)  are  produced with
:          the single title,
 
:          title = 'Hair-color Eye-color Data  Model &MODEL';
 
 
   Global input variables
 
   The global  variables below allow many  of the details of  the model
   fitting and  mosaic display  to be  altered.   Since  they all  have
   default values,  it  is only necessary to specify those  you wish to
   change.
 
|  colors  is a character vector of one  or two elements specifying the
|          colors  used  for  positive and  negative  residuals.    The
|          default is {BLACK RED}.   For a monochrome display,  specify
|          colors='BLACK' and  use two distinct  fill patterns  for the
|          fill type, such as filltype={M0 M45}.
 
   config  is a numeric matrix specifying  which marginal totals to fit
           when fittype='USER'  is also specified.   config  is ignored
           for all other fit types.  Each column specifies a high-order
           marginal in the model.   For  example,  the log-linear model
           [AB] [AC] [BC]  for a three-way table is specified  by the 2
           by 3 matrix,
 
            config = { 1  1  2,
                       2  3  3};
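
           When the margins of a model involve different numbers of
           variables, the shorter columns can be padded with zeros, as
           Example 2 does for the model [GPE] [PM] [EM].  The two
           matrices below are illustrative only, for a three-way table
           with variables A, B, C; use one or the other.

```sas
*-- [AB][C]: C jointly independent of A and B;
config = { 1  3,
           2  0 };

*-- [AC][BC]: A and B conditionally independent, given C;
config = { 1  2,
           3  3 };

fittype = 'USER';        *-- config is ignored for any other fit type;
```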
 
   devtype  {GF | LR}
           is a character string which specifies the type of deviations
           (residuals)  to be represented by shading.   devtype='GF' is
           the default.
 
           GF      calculates components of the Pearson goodness-of-fit
                   chi-square,
                   $d_{ij} = (f_{ij} - \hat{m}_{ij}) / \sqrt{\hat{m}_{ij}}$,
                   where $\hat{m}_{ij}$ is the estimated expected
                   frequency under the model.
           LR      calculates components of the likelihood ratio
                   (deviance) chi-square,
                   $d_{ij} = \mathrm{sign}(f_{ij} - \hat{m}_{ij}) \,
                   [\, 2 \, | f_{ij} \log(f_{ij} / \hat{m}_{ij}) |
                   + (f_{ij} - \hat{m}_{ij}) \,]^{1/2}$.
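
           As a worked illustration of the GF deviations, the sketch
           below computes them for the two-way Hair by Eye margin
           printed in Figure 1.  It is not the program's own code: it
           assumes the independence model [Hair][Eye] and obtains the
           expected frequencies directly from the row and column
           totals rather than by iterative proportional fitting.

```sas
proc iml;
  *-- rows: Black Brown Red Blond hair; columns: Brown Hazel Green Blue eyes;
  f = { 68  15   5  20,
       119  54  29  84,
        26  14  14  17,
         7  10  16  94 };

  m = f[,+] * f[+,] / sum(f);   *-- expected counts: row total # column total / n;
  d = (f - m) / sqrt(m);        *-- Pearson (GF) deviations;
  print d[format=6.2];
quit;
```

           The result should agree with the "Standardized Pearson
           deviations" shown in Figure 1 for this model, for example
           about 4.40 for Black hair, Brown eyes and about 7.05 for
           Blond hair, Blue eyes.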
 
   fittype  {JOINT | MUTUAL | CONDIT | PARTIAL | USER}
           is a character string which specifies the type of sequential
           log-linear models to fit.  fittype='JOINT' is the default.
 
           JOINT   specifies sequential  models of  joint independence,
                   [A][B], [AB][C], [ABC][D], ...  These models specify
                   that the last  variable in a given  plot is indepen-
                   dent of all previous variables jointly.
           MUTUAL  specifies sequential models  of mutual independence,
                   [A][B], [A][B][C], [A][B][C][D], ...
           CONDIT  specifies sequential models  of conditional indepen-
                   dence which hypothesize that  all previous variables
                   are independent,   given the  last,  i.e.,   [A][B],
                   [AC][BC], [AD][BD][CD], ...  For the 3-way
                   model,  A and B are hypothesized to be conditionally
                   independent, given C; for the 4-way model, A, B, and
                   C are conditionally independent, given D.
:          PARTIAL specifies sequential models  of partial independence
:                  of the first pair of variables,  conditioning on all
:                  remaining  variables   one  at  a   time:    [A][B],
:                  [AC][BC],  [ACD][BCD],  ...   For the 3-way
:                  model,  A and B are hypothesized to be conditionally
:                  independent, given C;  for the 4-way model,  A and B
:                  are conditionally independent, given C and D.
           USER    If fittype='USER', specify the hypothesized model in
                   the global matrix  config.  The models for  plots of
                   marginal tables  are based  on reducing  the hypoth-
                   esized configuration,  eliminating all variables not
                   participating in the current plot.
 
|  filltype  {M45 | LR | M0 | GRAY}
|          is a character vector of one or two elements which specifies
|          the type of fill pattern to use for shading.  filltype[1] is
|          used for positive residuals;   filltype[2],  if present,  is
|          used for negative  residuals.   If only one  value is speci-
|          fied, a complementary value for negative residuals is gener-
|          ated internally.  filltype='M45' is the default.
 
           M45     uses SAS/GRAPH patterns MdN135 and MdN45, with
                   hatching at 135 and 45 degrees.  d is the density
                   value determined from the residual and the shade
                   parameter.
           LR      uses SAS/GRAPH patterns Ld and Rd.
           M0      uses SAS/GRAPH patterns MdN0 and MdN90, with hatching
                   at 0 and 90 degrees.
|          GRAYstep
|                  uses solid, greyscale fill using the patterns GRAYnn
|                  starting from  GRAYF0 for  density=1 and  increasing
|                  darkness by step for  each successive density level.
|                  The default for step is 16,  so 'GRAY' gives GRAYF0,
|                  GRAYE0, GRAYD0, and so forth.
 
|  cellfill  {NONE | SIGN | SIZE | DEV}
|          Provides the ability to display a  symbol in the cell repre-
|          senting the coded value of large residuals.  This is partic-
|          ularly useful for black and white output, where it is diffi-
|          cult to portray both sign and magnitude distinctly.
 
|          NONE    Nothing (default)
|          SIGN    Draws + or - symbols in the cell,  whose number cor-
|                  responds to the shading density.
|          SIZE    Draws + or - symbols in the cell,  whose size corre-
|                  sponds to the shading density.
|          DEV     Writes the value of the standardized residual in the
|                  cell.
 
   htext   is  a  numeric value  which  specifies  the height  of  text
           labels, in character cells.  The default is htext=1.3.   The
           program attempts to  avoid overlap of category  labels,  but
           this cannot always be achieved.    Adjust htext (or make the
           labels shorter) if they collide.
 
|  legend  {H | V | NONE}
|          Orientation  of legend  for shading  of  residual values  in
|          mosaic tiles.   'V' specifies a vertical legend at the right
|          of the display;   'H' specifies a horizontal  legend beneath
|          the display.  Default: 'NONE'.
 
   shade   is a vector of up to 5 values of $|d_{ij}|$, which specify
           the boundaries between shading levels.  If shade={2 4} (the
           default), then the shading density number d is:

            0    $0 \le |d_{ij}| < 2$
            1    $2 \le |d_{ij}| < 4$
            2    $4 \le |d_{ij}|$

           Standardized deviations are often referred to a standard
           Gaussian distribution; under the assumption that the model
           fits, these values roughly correspond to two-tailed
           probabilities p < .05 and p < .0001 that a given value of
           $|d_{ij}|$ exceeds 2 or 4, respectively.  To suppress all
           shading, set shade to a number larger than any $|d_{ij}|$.
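
           The mapping from deviations to density numbers can be
           sketched as follows.  This only illustrates the rule stated
           above and is not taken from the program source; the
           deviations are the first row of the Pearson deviations in
           Figure 1, and the name density is made up for the example.

```sas
proc iml;
  shade = {2 4};                       *-- default boundaries;
  d     = {4.40 -0.48 -1.95 -3.07};    *-- Black hair row, Figure 1;

  density = j(nrow(d), ncol(d), 0);
  do i = 1 to ncol(shade);
     *-- add 1 for every boundary that |d| reaches or exceeds;
     density = density + (abs(d) >= shade[i]);
  end;
  print d, density;
quit;
```

           With these values density is {2 0 0 1}: only the first
           cell reaches the darkest level, and the last cell gets the
           lighter shading.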
 
   space   is a vector of two values which specify the x, y percent
           of the plotting area reserved  for spacing between the tiles
           of the mosaic.   The default value is 10 times the number of
           variables allocated to  each of the vertical  and horizontal
           directions in the plot.
 
   split   is a  character vector  consisting of  the letters  V and  H
           which specifies the directions in which the variables divide
           the unit square of the mosaic display.   If split={H V} (the
           default),  the mosaic alternates between horizontal and ver-
           tical splitting.  If the number of elements in split is less
           than the maximum number in plots,  the elements in split are
           reused cyclically.
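
           Cyclic reuse means that, for example, split={V H} with four
           variables gives the directions V, H, V, H.  A minimal
           sketch of that rule (the names nvar and dir are made up,
           and this is not the program's own code):

```sas
proc iml;
  split = {V H};
  nvar  = 4;                          *-- number of variables plotted;
  dir   = j(1, nvar, ' ');            *-- one direction letter per variable;
  do i = 1 to nvar;
     dir[i] = split[1 + mod(i-1, ncol(split))];   *-- wrap around split;
  end;
  print dir;
quit;
```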
 
   verbose  {NONE | FIT | BOX}
           is a  character vector of one  or more words  which controls
           verbose  or detailed  output.   If  verbose contains  'FIT',
           additional details  of the fitting process  (fitted frequen-
           cies,  marginal proportions)  are printed.   If verbose con-
           tains 'BOX', additional details of the drawing process (tile
           dimensions, label placement) are printed.
 
      There is one caveat imposed by this use of global variables:  The
   mosaic module should not  be called from an IML module  with its own
   arguments,  since this would cause all variables defined within that
   module to be inaccessible as global variables.  The mosaic module may
   be called either in immediate mode,  as  in the examples in the next
   section, or from an IML module defined without arguments.
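
      A call from a module defined without arguments might look like
   the sketch below.  The module name doplots is hypothetical; the
   point is only that it has no argument list, so the variables it
   sets remain visible to mosaic as global variables.

```sas
proc iml;
  reset storage=mosaic;
  load module=_all_;

  start doplots;                 *-- hypothetical wrapper, no arguments;
     fittype = 'MUTUAL';         *-- seen by mosaic as a global variable;
     run mosaic(levels, table, vnames, lnames, plots, title);
  finish;

  *-- levels, table, vnames, lnames, plots and title must be
      defined in the workspace before the next statement;
  run doplots;
```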
 
 
   GOPTIONS
 
   MOSAICS assumes that  the vertical and horizontal  dimensions of the
   plot are equal,  so you should include a goptions statement specify-
   ing equal values for hsize and vsize  if the default values for your
   device are unequal.
 
      The program uses the colors black and  red to draw the tiles cor-
   responding to positive and negative residuals.  You can use the col-
   ors option on the goptions statement  to change these assignments if
   you wish.
 
 
   Multiple calls
 
   The mosaic  module may be  called repeatedly  in one proc  iml step.
   However, global variables which are set in one call remain in force.
   To restore these  values to their default setting,   use the SAS/IML
   free statement.   For example,  to revert to the default fit type of
   joint independence, use the statement,
 
   free fittype;
 
   before the next run mosaic statement.
 
 
 
                                 Examples
 
   Example 1
 
   The program below shows the use of  MOSAICS to produce a set of dif-
   ferent mosaic displays for a 4  x  4  x  2 table of 592 people clas-
   sified by hair color, eye color and sex.
 
      The module haireye creates the variables table,  levels,  vnames,
   lnames,  and title.   Since the variables are to be entered into the
   mosaic in the order hair color, eye color, and sex, the table
   variable is created as a 2  x  16 matrix with hair color varying most
   rapidly across the columns and sex varying down the two rows.   Note
   that the lnames variable is a 3  x  4 matrix,  and the last row con-
   tains two blank values.   The  statement run haireye;  creates these
   variables in the SAS/IML workspace.
 
      The first run mosaic statement  produces two plots,  whose tiles
   show the  [Hair][Eye] marginal table  and the full  three-way table.
   Since fittype is not specified, the model [HairEye] [Sex],  in which
   Sex is independent of  hair color and eye color jointly,   is fit to
   the three-way table.   split={V H} specifies that the first division
   of the mosaic is in the vertical direction.  The printed output pro-
   duced from this run is shown in Figure 1.
 
      The second run mosaic statement fits the same models, but reorders
   the eye colors in the table to better display the pattern of
   association between hair  color and eye color in  the two-way table.
   It is also necessary  to rearrange the eye color labels  in row 2 of
   lnames.   (This reordering is based  on a correspondence analysis of
   residuals in the two-way table described by Friendly (1994)  carried
   out separately.)    Note that the  global variables split  and htext
   specified in the first mosaic continue to be used here.
 
      The third run mosaic statement plots only the three-way display,
   showing residuals from the model in which hair color,  eye color and
   sex are mutually independent.
 
   goptions vsize=7 hsize=7 ;   *-- square plot environment;
 
   proc iml;
   start haireye;
      *-- Hair color, eye color data;
     table = {
     /* ----brown---   -----blue-----   ----hazel---   ---green--- */
       32  38  10  3   11  50  10  30   10  25  7  5   3  15  7  8,   /* M */
       36  81  16  4    9  34   7  64    5  29  7  5   2  14  7  8 }; /* F */
 
     levels= { 4 4 2 };
     vnames = {'Hair' 'Eye' 'Sex' };    /* Variable names */
     lnames = {                         /* Category names */
              'Black' 'Brown' 'Red' 'Blond',    /* hair color */
              'Brown' 'Blue' 'Hazel' 'Green',   /* eye color  */
              'Male' 'Female' ' '  ' ' };       /* sex        */
     title  = 'Hair color - Eye color data';
     finish;
 
     run haireye;
      reset storage=mosaic;
      load module=_all_;
      *-- Fit models of joint independence (fittype='JOINT');
      plots = 2:3;
      split={V H};
      htext=1.6;
      run mosaic(levels, table, vnames, lnames, plots, title);
 
      *-- reorder eye colors (brown, hazel, green, blue);
      table  = table[,((1:4) || (9:16) || (5:8))];
      lnames[2,] = lnames[2,{1 3 4 2}];
      plots=2:3;
      run mosaic(levels, table, vnames, lnames, plots, title);
 
      plots=3;
      fittype='MUTUAL';
      run mosaic(levels, table, vnames, lnames, plots, title);
   quit;
 
 
   +------------------------------------------------------------------+
   |                                                                  |
   |                                                                  |
   |                 +-------------------------------------------+    |
   |                 |  Generalized Mosaic Display, Version 2.9  |    |
   |                 +-------------------------------------------+    |
   |                                                                  |
   |                          TITLE                                   |
   |                          Hair color - Eye color data             |
   |                                                                  |
   |                VNAMES     LEVELS    LNAMES                       |
   |                Hair            4    Black  Brown  Red    Blond   |
   |                Eye             4    Brown  Hazel  Green  Blue    |
   |                Sex             2    Male   Female                |
   |                                                                  |
   |                                 Global options                   |
   |                                                                  |
   |                  FITTYPE  DEVTYPE  FILLTYPE  SPLIT  SHADE        |
   |                  JOINT    GF       M45       V H       2    4    |
   |                                                                  |
   |                             Factor:         1 Hair               |
   |                                                                  |
   |                                Marginal totals                   |
   |                                                                  |
   |                 MARGIN     Black     Brown       Red     Blond   |
   |                                                                  |
   |                              108       286        71       127   |
   |                                                                  |
   |                             Factor:         2 Eye                |
   |                                                                  |
   |                                Marginal totals                   |
   |                                                                  |
   |                 MARGIN     Brown     Hazel     Green      Blue   |
   |                                                                  |
   |                 Black         68        15         5        20   |
   |                 Brown        119        54        29        84   |
   |                 Red           26        14        14        17   |
   |                 Blond          7        10        16        94   |
   |                                                                  |
   |                                                                  |
   |                MODEL              DF   CHISQ               PROB  |
   |                {Hair}{Eye}         9   G.F.    138.290   0.0000  |
   |                                        L.R.    146.444   0.0000  |
   |                                                                  |
   |                        Standardized Pearson deviations           |
   |                                                                  |
   |                             Brown    Hazel    Green     Blue     |
   |                                                                  |
   |                   Black      4.40    -0.48    -1.95    -3.07     |
   |                   Brown      1.23     1.35    -0.35    -1.95     |
   |                   Red       -0.07     0.85     2.28    -1.73     |
   |                   Blond     -5.85    -2.23     0.61     7.05     |
   |                                                                  |
   |                             Factor:         3 Sex                |
   |                                                                  |
   |                                Marginal totals                   |
   |                                                                  |
   |                        MARGIN            Male    Female          |
   |                                                                  |
   |                        Black Brown         32        36          |
   |                        Black Hazel         10         5          |
   |                        Black Green          3         2          |
   |                        Black Blue          11         9          |
   |                        Brown Brown         38        81          |
   |                        Brown Hazel         25        29          |
   |                        Brown Green         15        14          |
   |                        Brown Blue          50        34          |
   |                        Red   Brown         10        16          |
   |                        Red   Hazel          7         7          |
   |                        Red   Green          7         7          |
   |                        Red   Blue          10         7          |
   |                        Blond Brown          3         4          |
   |                        Blond Hazel          5         5          |
   |                        Blond Green          8         8          |
   |                        Blond Blue          30        64          |
   |                                                                  |
   |                                                                  |
   |              MODEL                  DF   CHISQ               PROB|
   |              [Hair,Eye][Sex]        15   G.F.     28.993   0.0161|
   |                                          L.R.     29.350   0.0145|
   |                                                                  |
   |                        Standardized Pearson deviations           |
   |                                                                  |
   |                                          Male   Female           |
   |                                                                  |
   |                         Black Brown      0.30    -0.27           |
   |                         Black Hazel      1.28    -1.15           |
   |                         Black Green      0.52    -0.46           |
   |                         Black Blue       0.70    -0.63           |
   |                         Brown Brown     -2.07     1.86           |
   |                         Brown Hazel      0.19    -0.17           |
   |                         Brown Green      0.57    -0.52           |
   |                         Brown Blue       2.05    -1.84           |
   |                         Red   Brown     -0.47     0.42           |
   |                         Red   Hazel      0.30    -0.27           |
   |                         Red   Green      0.30    -0.27           |
   |                         Red   Blue       0.88    -0.79           |
   |                         Blond Brown     -0.07     0.06           |
   |                         Blond Hazel      0.26    -0.23           |
   |                         Blond Green      0.32    -0.29           |
   |                         Blond Blue      -1.84     1.65           |
   |                                                                  |
   |  Figure 1:  Printed output for hair color, eye color data,  run  |
   |             1                                                    |
   |                                                                  |
   +------------------------------------------------------------------+
 
 
   Example 2
 
   This example illustrates input  of data from a SAS data  set and the
   use of proc sort to rearrange the  variables in a table to the order
   desired in the mosaic displays.
 
      The data form a $2^4$ table classified by Gender, reported
   Pre-marital sex, Extra-marital sex and Marital Status, read in by the
   DATA step marital below.  Note that the variable marital varies most
   rapidly and the variable gender varies most slowly in the
   observations in the data set.  The desired order of the variables in
   the mosaic is Gender, Pre, Extra, and Marital.  In the table variable
   in SAS/IML, the first variable, Gender, must vary most rapidly.  This
   is accomplished by sorting the observations with the variables
   listed in the reverse order on the by statement in the proc sort
   step.
 
   data marital;
      input gender $ pre $ extra $ @;
      marital='Divorced';  input freq @;  output;
      marital='Married';   input freq @;  output;
   cards;
   Women  Yes  Yes   17   4
   Women  Yes  No    54  25
   Women  No   Yes   36   4
   Women  No   No   214 322
   Men    Yes  Yes   28  11
   Men    Yes  No    60  42
   Men    No   Yes   17   4
   Men    No   No    68 130
   ;
   proc sort data=marital;
      by marital extra pre gender;
 
      In the proc  iml step,  the statement use  marital;  accesses the
   data set.  The variable freq  from the data set is read into the IML
   table variable,  a  16  x  1 matrix.    Note that the levels  of the
   character variables gender, pre,  and extra are sorted alphabetical-
   ly, so the category labels in lnames must appear in this order.
 
   proc iml;
      use marital;
      read all var{freq} into table;
      levels = { 2 2 2 2 };
      vnames = {'Gender' 'Pre' 'Extra' 'Marital'};
      lnames = {'Men      '  'Women     ',
                'Pre Sex: No'  'Yes',
                'Extra Sex: No'   'Yes',
                'Divorced'   'Married' };
      title  = 'Pre/Extramarital Sex and Marital Status';
 
      reset storage=mosaic;
      load module=_all_;
      split = {V H};
      htext=1.6;
      plots = 2:4;
      run mosaic(levels, table, vnames, lnames, plots, title);
 
      plots = 4;
      fittype='USER';
      title ='Model (GPE, PM, EM)';
      config = { 1  2  3,
                 2  4  4,
                 3  0  0};
      run mosaic(levels, table, vnames, lnames, plots, title);
 
      The first run mosaic statement produces plots of the 2-way to
   4-way tables, fitting models of joint independence.  The second run
   mosaic statement produces a plot of the 4-way table, fitting the
   model [GPE] [PM] [EM] specified by the config variable and
   "fittype='USER'".
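
      Judging from this example, each column of config lists the
   variable numbers in one term of the model, padded with zeros.  The
   following sketch uses only base SAS/IML statements and is not part
   of MOSAICS; the names col, v, and term are illustrative.  It prints
   the variables named in each column, which should correspond to the
   terms [GPE], [PM], and [EM].

   ```sas
   proc iml;
      vnames = {'Gender' 'Pre' 'Extra' 'Marital'};
      config = { 1  2  3,
                 2  4  4,
                 3  0  0};
      do k = 1 to ncol(config);
         col  = config[,k];          /* variable numbers in term k */
         v    = col[loc(col > 0)];   /* drop the zero padding      */
         term = vnames[,v];          /* names of those variables   */
         print k term;
      end;
   quit;
   ```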
 
 
   Example 3
 
   This example  shows the use of  SAS/IML itself to reorder  the vari-
   ables in a  contingency table for the mosaic display.    It uses the
   same data as in the previous example.
 
      The variables  in a contingency table  are reordered by  the MARG
   function (which calculates marginal totals) when the model specified
   by the config parameter is the  saturated model,  with the variables
   listed in the desired order.  For example, for the four-way table of
   the previous example, the configuration "{ 4,3,2,1 }" gives the same
   order of the variables created by the proc sort step.
 
      MOSAICS.SAS includes an  IML module reorder (shown  below)  which
   will reorder  the variables in any  table.   It also  rearranges the
   values in  the levels,   vnames,  and lnames  variables in  the same
   order.
 
   start reorder(dim, table, vnames, lnames, order);
      *-- reorder the dimensions of an n-way table;
      if nrow(dim  ) =1 then dim  =dim`;
      if nrow(order) =1 then order=order`;
      if nrow(vnames)=1 then vnames=vnames`;
      run marg(loc,newtab,dim,table,order);
      table = newtab;
      dim = dim[order,];
      vnames = vnames[order,];
      lnames = lnames[order,];
      finish;
 
      The data table is defined, listing the observations in the same
   order as in the DATA step marital shown in Example 2.  Note that
   vnames and lnames conform to this order.  After the call to reorder,
   the variables table, levels, vnames, and lnames have been rearranged
   so that Gender is the first variable in the mosaic, and Marital
   status is last.
 
   proc iml;
     *-- define the data variables;
     table={ 17   4 ,  /* Women  Yes  Yes  */
             54  25 ,  /* Women  Yes  No   */
             36   4 ,  /* Women  No   Yes  */
            214 322 ,  /* Women  No   No   */
             28  11 ,  /* Men    Yes  Yes  */
             60  42 ,  /* Men    Yes  No   */
             17   4 ,  /* Men    No   Yes  */
             68 130 }; /* Men    No   No   */
      levels = { 2 2 2 2 };
      vnames = {'Marital' 'Extra' 'Pre' 'Gender'};
      lnames = {'Divorced'   'Married',
                'Extra Sex: Yes' 'No',
                'Pre Sex: Yes'   'No',
                'Women    '      'Men' };
      title  = 'Pre/Extramarital Sex and Marital Status';
 
      reset storage=mosaic;
      load module=_all_;
 
      order = { 4,3,2,1};
      run reorder(levels, table, vnames, lnames, order);
      split = {V H};
      plots = 2:4;
      run mosaic(levels, table, vnames, lnames, plots, title);
   quit;
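
      The effect of reordering is easiest to see in a two-way table.
   The sketch below uses only base SAS/IML functions (it does not call
   marg or reorder) and labels each cell with its levels.  It assumes,
   as in the examples above, that the table is stored as a vector with
   the first variable varying most rapidly.  Under that assumption,
   exchanging the two variables amounts to a transpose.

   ```sas
   proc iml;
      /* variables A (2 levels) and B (3 levels); A varies most rapidly */
      ab = {'a1b1', 'a2b1', 'a1b2', 'a2b2', 'a1b3', 'a2b3'};
      m  = shape(ab, 3, 2);      /* rows are levels of B, columns of A */
      ba = shape(t(m), 6, 1);    /* read the transpose row by row      */
      print ab ba;               /* in ba, B varies most rapidly       */
   quit;
   ```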
 
 
 
                              Implementation
 
   This section describes the algorithm  for the construction of mosaic
   displays and provides some notes on the structure of the program.
 
 
   Algorithm
 
   The process  is a naturally recursive  one which can  be implemented
   easily in a language which  supports recursion and multi-dimensional
   arrays, such as APL.  Wang (1985) describes a FORTRAN implementation
   of mosaic displays which simulates  multi-dimensional arrays by sub-
   scripting  a vector.    The following  algorithm,   which uses  two-
   dimensional arrays, is much simpler.
 
   1. Denote the number of levels of the n variables by l_1, ..., l_n,
      and let L_s = l_1 x l_2 x ... x l_s be the product of the first s
      of them.  At step s = 0, start with one tile, a square of size
      100 x 100, and let L_0 = 1.

   2. The tiles in the mosaic are represented by an array B of four
      columns (called boxes in the program), with one row for each
      tile.  Columns 1 and 2 give the (x, y) location of the lower left
      corner of the tile; columns 3 and 4 give the horizontal and
      vertical lengths of the tile.  At step 0, B = { 0 0 100 100 }.

   The following steps are repeated for each variable, s = 1, ..., n:

   3. For variable s, find the marginal frequencies of variables
      1, ..., s, a vector of length L_s, with the levels of variable s
      varying most rapidly.

   4. Reshape this vector row-wise to a matrix M = { m_gh } of L_(s-1)
      rows and l_s columns.  (The array M is called margin in the
      program.  See the arrays labelled "Marginal totals" in Figure 1.)
      The rows of M correspond to the tiles of the previous variables
      at step s - 1.

   5. Each old tile is then divided vertically (if s is odd) or
      horizontally (if s is even) into l_s tiles, with the width (s
      odd) or height (s even) of each tile proportional to
      m_gh / m_g+, where m_g+ is the total of row g of M.
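
      Steps 2 to 5 can be sketched in a few lines of SAS/IML.  The
   module splitTiles below is a hypothetical helper written for this
   illustration; it is not the divide1 module of the program, and it
   ignores spacing and model fitting.  The data are the Gender x Pre
   margin of the table in Example 2.  As a check, the area of each
   final tile should equal 10000 times the cell proportion.

   ```sas
   proc iml;
      /* splitTiles: hypothetical helper, not part of MOSAICS */
      start splitTiles(boxes, margin, s);
         nl   = ncol(margin);
         prop = margin / margin[,+];       /* m_gh / m_g+            */
         new  = j(nrow(margin)*nl, 4, 0);
         do g = 1 to nrow(margin);
            x = boxes[g,1];  y = boxes[g,2];
            w = boxes[g,3];  h = boxes[g,4];
            used = 0;                      /* share already assigned */
            do k = 1 to nl;
               r = (g-1)*nl + k;
               /* s odd: divide the width; s even: divide the height */
               if mod(s,2) = 1 then
                  new[r,] = (x + used*w) || y || (prop[g,k]*w) || h;
               else
                  new[r,] = x || (y + used*h) || w || (prop[g,k]*h);
               used = used + prop[g,k];
            end;
         end;
         return (new);
      finish;

      /* Gender (Men, Women) varies most rapidly, then Pre (No, Yes) */
      freq = {219, 576, 141, 100};
      gp   = t(shape(freq, 2, 2));   /* s=2: rows Gender, cols Pre   */
      g1   = t(gp[,+]);              /* s=1: 1 x 2 Gender totals     */

      b0 = {0 0 100 100};            /* step 0: one 100 x 100 tile   */
      b1 = splitTiles(b0, g1, 1);    /* vertical split by Gender     */
      b2 = splitTiles(b1, gp, 2);    /* horizontal split by Pre      */
      area  = b2[,3] # b2[,4];
      check = 10000 # shape(gp, 4, 1) / sum(gp);
      print b2 area check;
   quit;
   ```

      Each call returns one row per new tile, so the rows of b1 line up
   with the rows of the margin matrix passed to the next call, as step
   4 requires.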
 
   This computational scheme has several desirable properties:
 
   *   At any stage the division of  the tiles for the current variable
       is in proportion to the entries in  each row of M divided by the
       row totals.
 
   *   We can draw  the tiles representing the  marginal frequencies at
       any stage, not just the final stage as Hartigan and Kleiner do.
 
   *   Fitting the model of joint independence of the current variable
       with all previous variables jointly is equivalent to testing
       independence of the rows and columns of the matrix M.  For
       example, for a three-way table, the expected frequencies under
       the model [AB] [C] can be expressed in terms of the IJ x K
       matrix M as m_(ij)+ m_+k / m_++.
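
      The last property can be illustrated with the Gender, Pre, and
   Extra margin of the data in Example 2.  This is a minimal sketch in
   base SAS/IML, not the mfit or chisq modules of the program; the
   names m, e, resid, chisq, and df are illustrative, and the Pearson
   statistic is used here only as one common choice.

   ```sas
   proc iml;
      /* rows: (Gender, Pre) = Men/No, Men/Yes, Women/No, Women/Yes;
         columns: Extra = No, Yes                                   */
      m = {198 21,
           102 39,
           536 40,
            79 21};
      e     = (m[,+] * m[+,]) / sum(m);   /* m_(ij)+ m_+k / m_++ */
      resid = (m - e) / sqrt(e);          /* Pearson residuals   */
      chisq = sum(resid##2);              /* Pearson chi-square  */
      df    = (nrow(m) - 1) * (ncol(m) - 1);
      print e resid, chisq df;
   quit;
   ```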
 
   Spacing:  This procedure gives a mosaic of L_n = l_1 x l_2 x ... x
   l_n tiles with no spacing, in which cells with small frequencies are
   difficult to see.  Following Hartigan and Kleiner, the tiles are
   separated, with larger spacings at the earlier subdivisions, to help
   preserve the visual impact of small counts.  For a four-way table
   with vertical splitting on variables 1 and 3, the divisions of the
   first variable are spaced proportionally to 1 / (l_1 - 1); divisions
   between levels of the third variable are spaced proportionally to
   1 / (l_1 l_3 - 1).
 
      This  spacing of  the tiles  is accomplished  by constructing  an
   unspaced mosaic in  a reduced area (determined by  the space parame-
   ter), then expanding to include the necessary spacing.
 
 
   Program structure
 
   MOSAICS SAS  consists of 14  SAS/IML modules (subroutines  and func-
   tions).  The calling structure of the modules is shown in Figure 2.
 
 
   +------------------------------------------------------------------+
   |                                                                  |
   |   mosaic    *-- check inputs, assign default values;             |
   |   |                                                              |
   |   |-- divide   *-- fit models and draw the mosaic display;       |
   |       |                                                          |
   |       |--reduce   *-- find reduced model for factors 1:f;        |
   |       |                                                          |
   |       |--mfit     *-- fits a specified model;                    |
   |       |                                                          |
   |       |--chisq    *-- calculate chisquares;                      |
   |       |                                                          |
   |       |--df       *-- calculate degrees of freedom;              |
   |       |   |--terms    *-- find all terms in a loglinear model;   |
   |       |       |--vars_in    *-- find variables in a term;        |
   |       |                                                          |
   |       |--modname  *-- expand config into string for model label; |
   |       |                                                          |
   |       |--divide1  *-- divide the mosaic for the next variable;   |
   |       |                                                          |
   |       |--space    *-- space the tiles in the current display;    |
   |       |                                                          |
   |       |--labels   *-- calculate label placements;                |
   |       |                                                          |
   |       |--gboxes   *-- draw the current display;                  |
   |          |--fillbox   *-- custom shading;                        |
   |                                                                  |
   |  Figure 2:  Calling structure of the modules in MOSAICS          |
   |                                                                  |
   +------------------------------------------------------------------+
 
 
      The top-level module, mosaic, simply validates the input
   parameters, assigns default values for global variables, and calls
   the module divide.  The steps in the algorithm described above are
   carried out by divide; the calculation of the new tiles in step 5 is
   performed in divide1.
 
 
 
                                References
 
   Friendly, M. (1991).  Mosaic displays for multi-way contingency
      tables.  York Univ.: Dept. of Psychology Reports, 1991, No. 195.
 
   Friendly, M. (1992).  Mosaic displays for loglinear models.  Pro-
      ceedings of the Statistical Graphics Section, American Statisti-
      cal Association, 61-68.
 
   Friendly, M. (1994).  Mosaic displays for multi-way contingency
      tables.  Journal of the American Statistical Association, 89,
      190-200.
 
   Hartigan, J. A., and Kleiner, B. (1981).  Mosaics for contingency
      tables.  In W. F. Eddy (Ed.), Computer Science and Statistics:
      Proceedings of the 13th Symposium on the Interface, 268-273.  New
      York: Springer-Verlag.
 
   Wang, C. M. (1985).  Applications and computing of mosaics.  Compu-
      tational Statistics & Data Analysis, 3, 89-97.
 
   -----------------------
 
   ¶ SAS/GRAPH fonts do not produce brackets, [ ], or braces, { }.  Use
     parentheses instead in symbolic model formulae.