Starting with Version 3.0, this document is no longer maintained.
The official documentation for MOSAICS is now on the WWW at
http://www.math.yorku.ca/SCS/mosaics.html

User's Guide for MOSAICS:
A SAS/IML Program for Mosaic Displays

Michael Friendly
Psychology Department
York University
Toronto, Ontario
Canada M3J 1P3
email: FRIENDLY@YorkU.CA

MOSAICS, Version 2.9
March 16, 1996

Introduction

The mosaic display, proposed by Hartigan & Kleiner (1981) represents
the counts in a contingency table directly by tiles whose area is
proportional to the cell frequency. This display generalizes readily
to n-way tables. Friendly (1991, 1992, 1994) extended the use of the
mosaic display as a graphical tool for fitting log-linear models. The
enhanced mosaic uses color and shading of the tiles to reflect the
sign and magnitude of the residual from a specified log-linear model.
Friendly also shows how the understanding of patterns of association
can be enhanced by reordering the rows and columns to make the
pattern more coherent. Refer to Friendly (1991, 1992, 1994) for
details of the method and examples of its use in fitting log-linear
models.

This report describes MOSAICS SAS, a SAS/IML program for producing
mosaic displays. The program has the following features:

* It produces graphical displays of an n-way contingency table of any
  size. Experience shows that tables of up to 5 or 6 dimensions can
  be usefully explored. The main limitation is in the resolution of
  the display with large, complex tables.

* The order of variables in the mosaic is specified by the user.
  Different orderings of the variables can show different aspects of
  the data.

* The program can produce sequential displays of the marginal
  subtables, [A], [AB], [ABC], and so forth, up to the full n-way
  table, where A, B, C, ..., refer to the table variables in the
  order entered.

* For each display the program fits a log-linear model and depicts
  the residuals from the model by the color and shading of tiles in
  the mosaic.

* The program can automatically construct and fit a set of baseline
  models of independence or partial independence among the table
  variables. Alternatively, the user can specify and fit any
  log-linear model which can be estimated by iterative proportional
  fitting.

Changes

| The most recent changes to the program and/or this user's guide are
| flagged with change bars like this.

| Version 2.9:

| * Installation simplified by creating a separate file,
|   MOSAICM.SAS, to install IML modules.

| * Filltypes changed to allow separate coding for positive and
|   negative residuals, and to provide grayscale shading levels.

| * Added ability (cellfill) to print a symbol in the cell
|   symbolizing the value of the residual.

Installation Guide

How to obtain MOSAICS SAS

The program, mosaics.sas, and an example of its use, mosademo.sas,
are available by anonymous FTP from the host, HOTSPUR.PSYCH.YORKU.CA
(Internet address 130.63.134.26). Log in as user 'anonymous' and type
your full email address as a password. Then change to the directory
shown below and issue the get commands to retrieve the files.

     >ftp hotspur.psych.yorku.ca
     220 hotspur.psych.yorku.ca FTP server ...
     Name (hotspur.psych.yorku.ca:userid): anonymous
     331 Guest login ok, send ident as password
     Password: userid@host
     ftp>cd /pub/sas/mosaics
     ftp>get mosaics.sas
     ftp>get mosademo.sas
|    ftp>get mosaicm.sas
|    ftp>get mosaics.doc

Installing MOSAICS.SAS

MOSAICS SAS consists of a collection of SAS/IML modules which are
designed to be called from another program in a proc iml step.
Because the program is large, the modules are most conveniently
stored in compiled form in a SAS/IML storage catalog, called
| SASUSER.MOSAIC.
To install the program in this way, copy the files
| MOSAICS.SAS and MOSAICM.SAS to a directory, ('~/sasuser/mosaics/',
| say) and run the MOSAICM program, with the command,

|    sas mosaicm

This step need only be done once.

In applications, the modules are loaded into the SAS/IML workspace
with the load statement, as follows,

     proc iml;
        reset storage=mosaic;
        load module=_all_;

On some platforms, a libname statement may be needed to specify the
location of the SASUSER library in the operating system file
structure.

Alternatively, it is possible to store and use the program in source
form. This avoids the need to maintain and access the SAS/IML
catalog, but means that the program is compiled each time it
| is run. To use the program in this way, simply access the program
| with a %include statement:

|    filename mosaics 'path/to/mosaics.sas';
|    proc iml;
|       %include mosaics;

On some platforms you may need to add a path specification to the
%include statement or use a filename statement to specify the
location of the MOSAICS.SAS file in the operating system file
structure.

Using MOSAICS

Input parameters

The frequency table analyzed is specified in the run mosaic
statement. Various options, all of which have default values, are
specified by global variables in the proc iml step. Hence, the
program should be called as,

     proc iml;
        reset storage=mosaic;
        load module=_all_;
        *-- specify data parameters;
        levels = { ... };      *-- variable levels;
        table  = { ... };      *-- contingency table;
        vnames = { ... };      *-- variable names;
        ...
        *-- specify non-default global inputs;
        fittype='USER';
        config = { 1 1,
                   2 3 };
        run mosaic(levels, table, vnames, lnames, plots, title);

The parameters for the run mosaic statement are:

Parameter  Description

levels     is a vector which specifies the number of variables and
           the dimensions of the contingency table. If levels is
           n x 1, then the table has n dimensions, and the number of
           levels of variable i is levels[i].
           The order of the variables in levels is the order they are
           entered into the mosaic display.

table      is a matrix or vector giving the frequency, $f_{ij\ldots}$,
           of observations in each cell of the table. The table
           variables are arranged in accordance with the conventions
           of the SAS/IML IPF and MARG functions, so the first
           variable varies most rapidly across the columns of table
           and the last variable varies most slowly down the rows.

           In addition, table must conform to levels as follows. If
           table is I rows by J columns, the product of all entries
           in levels must be IJ. Moreover, J must equal the product
           of the first k entries of levels, for some k.

vnames     is a 1 x n character vector of variable (factor) names, in
           an order corresponding to levels.

lnames     is a character matrix of labels for the variable levels,
           one row for each variable. The number of columns is the
           maximum value in levels. When the numbers of levels are
           unequal, the rows for smaller factors must be padded with
           blank entries.

plots      is a vector containing any of the integers 1 to n which
           specifies the list of marginal tables to be plotted. If
           plots contains the value i, the marginal subtable for
           variables 1 to i will be displayed. For a 3-way table,
           plots={1 2 3} displays each sequential plot, showing the
           [A], [AB] and [ABC] marginal tables; while plots=3
           displays only the final 3-way [ABC] mosaic.

: title    is a character string or vector of strings containing
:          title(s) for the plots. If title is a single character
:          string, it is used as the title for all plots. Otherwise,
:          title may be a vector of up to max(plots) strings, and
:          title[i] is used as the title for the plot produced by
:          plots[] = i. If the number of strings is less than
:          max(plots), the last string is used for all remaining
:          plots. Moreover, if the title for a given plot contains
:          the string &MODEL (upper case), that string is replaced by
:          the symbolic model description.
:          For example, the specifications,

:             plots = 2:3;
:             fittype='JOINT';
:             title = { '',
:                'Hair-color Eye-color Data  Model (H)(E)',
:                'Hair-color Eye-color Data  Model (HE)(S)'};

:          produces two plots with titles from title[2] and title[3].¶
:          Equivalent results (using substitution) are produced with
:          the single title,

:             title = 'Hair-color Eye-color Data  Model &MODEL';

Global input variables

The global variables below allow many of the details of the model
fitting and mosaic display to be altered. Since they all have default
values, it is only necessary to specify those you wish to change.

| colors   is a character vector of one or two elements specifying the
|          colors used for positive and negative residuals. The
|          default is {BLACK RED}. For a monochrome display, specify
|          colors='BLACK' and use two distinct fill patterns for the
|          fill type, such as filltype={M0 M45}.

config     is a numeric matrix specifying which marginal totals to
           fit when fittype='USER' is also specified. config is
           ignored for all other fit types. Each column specifies a
           high-order marginal in the model. For example, the
           log-linear model [AB] [AC] [BC] for a three-way table is
           specified by the 2 by 3 matrix,

              config = { 1 1 2,
                         2 3 3};

devtype    {GF | LR}
           is a character string which specifies the type of
           deviations (residuals) to be represented by shading.
           devtype='GF' is the default.

           GF   calculates components of Pearson goodness of fit
                chisquare,

                $d_{ij} = ( f_{ij} - \hat{m}_{ij} ) / \sqrt{\hat{m}_{ij}}$,

                where $\hat{m}_{ij}$ is the estimated expected
                frequency under the model.

           LR   calculates components of the likelihood ratio
                (deviance) chisquare,

                $d_{ij} = \mathrm{sign}( f_{ij} - \hat{m}_{ij} ) \, [ \, 2 \, | f_{ij} \log ( f_{ij} / \hat{m}_{ij} ) | + ( f_{ij} - \hat{m}_{ij} ) \, ]^{1/2}$.

fittype    {JOINT | MUTUAL | CONDIT | PARTIAL | USER}
           is a character string which specifies the type of
           sequential log-linear models to fit. fittype='JOINT' is
           the default.

           JOINT    specifies sequential models of joint
                    independence, [A][B], [AB][C], [ABC][D], ...
                    These models specify that the last variable in a
                    given plot is independent of all previous
                    variables jointly.

           MUTUAL   specifies sequential models of mutual
                    independence, [A][B], [A][B][C], [A][B][C][D],
                    ...

           CONDIT   specifies sequential models of conditional
                    independence which hypothesize that all previous
                    variables are independent, given the last, i.e.,
                    [A][B], [AC][BC], [AD][BD][CD], ... For the 3-way
                    model, A and B are hypothesized to be
                    conditionally independent, given C; for the 4-way
                    model, A, B, and C are conditionally independent,
                    given D.

:          PARTIAL  specifies sequential models of partial
:                   independence of the first pair of variables,
:                   conditioning on all remaining variables one at a
:                   time: [A][B], [AC][BC], [ACD][BCD], ... For the
:                   3-way model, A and B are hypothesized to be
:                   conditionally independent, given C; for the 4-way
:                   model, A and B are conditionally independent,
:                   given C and D.

           USER     If fittype='USER', specify the hypothesized model
                    in the global matrix config. The models for plots
                    of marginal tables are based on reducing the
                    hypothesized configuration, eliminating all
                    variables not participating in the current plot.

| filltype {M45 | LR | M0 | GRAY}
|          is a character vector of one or two elements which
|          specifies the type of fill pattern to use for shading.
|          filltype[1] is used for positive residuals; filltype[2],
|          if present, is used for negative residuals. If only one
|          value is specified, a complementary value for negative
|          residuals is generated internally. filltype='M45' is the
|          default.

           M45      uses SAS/GRAPH patterns MdN135 and MdN45 with
                    hatching at 45 and 135 degrees. d is the density
                    value determined from the residual and the shade
                    parameter.

           LR       uses SAS/GRAPH patterns Ld and Rd.

           M0       uses SAS/GRAPH patterns MdN0 and MdN90 with
                    hatching at 0 and 90 degrees.

|          GRAYstep
|                   uses solid, greyscale fill using the patterns
|                   GRAYnn starting from GRAYF0 for density=1 and
|                   increasing darkness by step for each successive
|                   density level. The default for step is 16, so
|                   'GRAY' gives GRAYF0, GRAYE0, GRAYD0, and so
|                   forth.

| cellfill {NONE | SIGN | SIZE | DEV}
|          Provides the ability to display a symbol in the cell
|          representing the coded value of large residuals. This is
|          particularly useful for black and white output, where it
|          is difficult to portray both sign and magnitude
|          distinctly.

|          NONE     Nothing (default)

|          SIGN     Draws + or - symbols in the cell, whose number
|                   corresponds to the shading density.

|          SIZE     Draws + or - symbols in the cell, whose size
|                   corresponds to the shading density.

|          DEV      Writes the value of the standardized residual in
|                   the cell.

htext      is a numeric value which specifies the height of text
           labels, in character cells. The default is htext=1.3. The
           program attempts to avoid overlap of category labels, but
           this cannot always be achieved. Adjust htext (or make the
           labels shorter) if they collide.

| legend   {H | V | NONE}
|          Orientation of legend for shading of residual values in
|          mosaic tiles. 'V' specifies a vertical legend at the right
|          of the display; 'H' specifies a horizontal legend beneath
|          the display. Default: 'NONE'.

shade      is a vector of up to 5 values of $|d_{ij}|$, which specify
           the boundaries between shading levels. If shade={2 4} (the
           default), then the shading density number d is:

              d     range of residual
              0     $0 \le |d_{ij}| < 2$
              1     $2 \le |d_{ij}| < 4$
              2     $4 \le |d_{ij}|$

           Standardized deviations are often referred to a standard
           Gaussian distribution; under the assumption that the model
           fits, these values roughly correspond to two-tailed
           probabilities $p < .05$ and $p < .0001$ that a given value
           of $|d_{ij}|$ exceeds 2 or 4, respectively.

           Set shade to a large number to suppress all shading.

space      is a vector of two values which specify the x, y percent
           of the plotting area reserved for spacing between the
           tiles of the mosaic. The default value is 10 times the
           number of variables allocated to each of the vertical and
           horizontal directions in the plot.

split      is a character vector consisting of the letters V and H
           which specifies the directions in which the variables
           divide the unit square of the mosaic display. If
           split={H V} (the default), the mosaic alternates between
           horizontal and vertical splitting. If the number of
           elements in split is less than the maximum number in
           plots, the elements in split are reused cyclically.

verbose    {NONE | FIT | BOX}
           is a character vector of one or more words which controls
           verbose or detailed output. If verbose contains 'FIT',
           additional details of the fitting process (fitted
           frequencies, marginal proportions) are printed. If verbose
           contains 'BOX', additional details of the drawing process
           (tile dimensions, label placement) are printed.

There is one caveat imposed by this use of global variables: The
mosaic module should not be called from an IML module with its own
arguments, since this would cause all variables defined within that
module to be inaccessible as global variables. The mosaic module may
be called either in immediate mode, as in the examples in the next
section, or from an IML module defined without arguments.

GOPTIONS

MOSAICS assumes that the vertical and horizontal dimensions of the
plot are equal, so you should include a goptions statement specifying
equal values for hsize and vsize if the default values for your
device are unequal.

The program uses the colors black and red to draw the tiles
corresponding to positive and negative residuals. You can use the
colors option on the goptions statement to change these assignments
if you wish.

Multiple calls

The mosaic module may be called repeatedly in one proc iml step.
However, global variables which are set in one call remain in force.
To restore these values to their default setting, use the SAS/IML
free statement. For example, to revert to the default fit type of
joint independence, use the statement,

     free fittype;

before the next run mosaic statement.

Examples

Example 1

The program below shows the use of MOSAICS to produce a set of
different mosaic displays for a 4 x 4 x 2 table of 592 people
classified by hair color, eye color and sex.

The module haireye creates the variables table, levels, vnames,
lnames, and title. Since the variables are to be entered into the
mosaic in the order hair color, eye color, and sex, the table
variable is created as a 2 x 16 matrix with hair color varying most
rapidly across the columns and sex varying down the two rows. Note
that the lnames variable is a 3 x 4 matrix, and the last row contains
two blank values. The statement run haireye; creates these variables
in the SAS/IML workspace.

The first run mosaic statement produces two plots, whose tiles show
the [Hair][Eye] marginal table and the full three-way table. Since
fittype is not specified, the model [HairEye] [Sex], in which Sex is
independent of hair color and eye color jointly, is fit to the
three-way table. split={V H} specifies that the first division of the
mosaic is in the vertical direction. The printed output produced from
this run is shown in Figure 1.

The second run mosaic statement fits the same models, but reorders
the eye colors in the table to better display the pattern of
association between hair color and eye color in the two-way table. It
is also necessary to rearrange the eye color labels in row 2 of
lnames. (This reordering is based on a correspondence analysis of
residuals in the two-way table, described by Friendly (1994), which
was carried out separately.) Note that the global variables split and
htext specified in the first mosaic continue to be used here.

The third run mosaic statement plots only the three-way display,
showing residuals from the model in which hair color, eye color and
sex are mutually independent.

     goptions vsize=7 hsize=7 ;     *-- square plot environment;

     proc iml;
     start haireye;
        *-- Hair color, eye color data;
        table = {
        /* ----brown---  -----blue-----  ----hazel---  ---green--- */
           32 38 10  3   11 50 10 30     10 25  7  5    3 15  7  8,    /* M */
           36 81 16  4    9 34  7 64      5 29  7  5    2 14  7  8 };  /* F */
        levels= { 4 4 2 };
        vnames = {'Hair' 'Eye' 'Sex' };        /* Variable names */
        lnames = {                             /* Category names */
           'Black' 'Brown'  'Red'   'Blond',   /* hair color */
           'Brown' 'Blue'   'Hazel' 'Green',   /* eye color  */
           'Male'  'Female' ' '     ' '    };  /* sex        */
        title = 'Hair color - Eye color data';
     finish;

     run haireye;
     reset storage=mosaic;
     load module=_all_;

     *-- Fit models of joint independence (fittype='JOINT');
     plots = 2:3;
     split={V H};
     htext=1.6;
     run mosaic(levels, table, vnames, lnames, plots, title);

     *-- reorder eye colors (brown, hazel, green, blue);
     table = table[,((1:4) || (9:16) || (5:8))];
     lnames[2,] = lnames[2,{1 3 4 2}];
     plots=2:3;
     run mosaic(levels, table, vnames, lnames, plots, title);

     plots=3;
     fittype='MUTUAL';
     run mosaic(levels, table, vnames, lnames, plots, title);
     quit;

+------------------------------------------------------------------+

       +-------------------------------------------+
       |  Generalized Mosaic Display, Version 2.9  |
       +-------------------------------------------+

   TITLE
   Hair color - Eye color data

   VNAMES   LEVELS   LNAMES
   Hair        4     Black   Brown   Red     Blond
   Eye         4     Brown   Hazel   Green   Blue
   Sex         2     Male    Female

   Global options

   FITTYPE   DEVTYPE   FILLTYPE   SPLIT   SHADE
   JOINT     GF        M45        V H     2 4

   Factor: 1 Hair

   Marginal totals

   MARGIN          Black   Brown     Red   Blond

                     108     286      71     127

   Factor: 2 Eye

   Marginal totals

   MARGIN          Brown   Hazel   Green    Blue

   Black              68      15       5      20
   Brown             119      54      29      84
   Red                26      14      14      17
   Blond               7      10      16      94

   MODEL             DF            CHISQ     PROB
   {Hair}{Eye}        9   G.F.   138.290   0.0000
                          L.R.   146.444   0.0000

   Standardized Pearson deviations

                   Brown   Hazel   Green    Blue

   Black            4.40   -0.48   -1.95   -3.07
   Brown            1.23    1.35   -0.35   -1.95
   Red             -0.07    0.85    2.28   -1.73
   Blond           -5.85   -2.23    0.61    7.05

   Factor: 3 Sex

   Marginal totals

   MARGIN           Male   Female

   Black Brown        32       36
   Black Hazel        10        5
   Black Green         3        2
   Black Blue         11        9
   Brown Brown        38       81
   Brown Hazel        25       29
   Brown Green        15       14
   Brown Blue         50       34
   Red   Brown        10       16
   Red   Hazel         7        7
   Red   Green         7        7
   Red   Blue         10        7
   Blond Brown         3        4
   Blond Hazel         5        5
   Blond Green         8        8
   Blond Blue         30       64

   MODEL             DF            CHISQ     PROB
   [Hair,Eye][Sex]   15   G.F.    28.993   0.0161
                          L.R.    29.350   0.0145

   Standardized Pearson deviations

                    Male   Female

   Black Brown      0.30    -0.27
   Black Hazel      1.28    -1.15
   Black Green      0.52    -0.46
   Black Blue       0.70    -0.63
   Brown Brown     -2.07     1.86
   Brown Hazel      0.19    -0.17
   Brown Green      0.57    -0.52
   Brown Blue       2.05    -1.84
   Red   Brown     -0.47     0.42
   Red   Hazel      0.30    -0.27
   Red   Green      0.30    -0.27
   Red   Blue       0.88    -0.79
   Blond Brown     -0.07     0.06
   Blond Hazel      0.26    -0.23
   Blond Green      0.32    -0.29
   Blond Blue      -1.84     1.65

   Figure 1: Printed output for hair color, eye color data, run 1

+------------------------------------------------------------------+

Example 2

This example illustrates input of data from a SAS data set and the
use of proc sort to rearrange the variables in a table to the order
desired in the mosaic displays.

The data is a $2^4$ table classified by Gender, reported Premarital
sex, Extra-marital sex and Marital Status, read in by the DATA step
marital below. Note that the variable marital varies most rapidly and
the variable gender varies most slowly in the observations in the
data set.

The desired order of the variables in the mosaic is Gender, Pre,
Extra, and Marital.
In the table variable in SAS/IML, the first variable, Gender, must
vary most rapidly. This is accomplished by sorting the observations
with the variables listed in the reverse order on the by statement in
the proc sort step.

     data marital;
        input gender $ pre $ extra $ @;
        marital='Divorced'; input freq @; output;
        marital='Married';  input freq @; output;
     cards;
     Women Yes Yes    17    4
     Women Yes No     54   25
     Women No  Yes    36    4
     Women No  No    214  322
     Men   Yes Yes    28   11
     Men   Yes No     60   42
     Men   No  Yes    17    4
     Men   No  No     68  130
     ;
     proc sort data=marital;
        by marital extra pre gender;

In the proc iml step, the statement use marital; accesses the data
set. The variable freq from the data set is read into the IML table
variable, a 16 x 1 matrix. Note that the levels of the character
variables gender, pre, and extra are sorted alphabetically, so the
category labels in lnames must appear in this order.

     proc iml;
        use marital;
        read all var{freq} into table;
        levels = { 2 2 2 2 };
        vnames = {'Gender' 'Pre' 'Extra' 'Marital'};
        lnames = {'Men '          'Women ',
                  'Pre Sex: No'   'Yes',
                  'Extra Sex: No' 'Yes',
                  'Divorced'      'Married' };
        title = 'Pre/Extramarital Sex and Marital Status';

        reset storage=mosaic;
        load module=_all_;
        split = {V H};
        htext=1.6;
        plots = 2:4;
        run mosaic(levels, table, vnames, lnames, plots, title);

        plots = 4;
        fittype='USER';
        title ='Model (GPE, PM, EM)';
        config = { 1 2 3,
                   2 4 4,
                   3 0 0};
        run mosaic(levels, table, vnames, lnames, plots, title);

The first run mosaic statement produces plots of the 2-way to 4-way
tables, fitting models of joint independence. The second run mosaic
statement produces a plot of the 4-way table, fitting the model
[GPE] [PM] [EM] specified by the config variable and
"fittype='USER'".

Example 3

This example shows the use of SAS/IML itself to reorder the variables
in a contingency table for the mosaic display. It uses the same data
as in the previous example.

The variables in a contingency table are reordered by the MARG
function (which calculates marginal totals) when the model specified
by the config parameter is the saturated model, with the variables
listed in the desired order. For example, for the four-way table of
the previous example, the configuration "{ 4,3,2,1 }" gives the same
order of the variables created by the proc sort step.

MOSAICS.SAS includes an IML module reorder (shown below) which will
reorder the variables in any table. It also rearranges the values in
the levels, vnames, and lnames variables in the same order.

     start reorder(dim, table, vnames, lnames, order);
        *-- reorder the dimensions of an n-way table;
        if nrow(dim   )=1 then dim   =dim`;
        if nrow(order )=1 then order =order`;
        if nrow(vnames)=1 then vnames=vnames`;
        run marg(loc,newtab,dim,table,order);
        table  = newtab;
        dim    = dim[order,];
        vnames = vnames[order,];
        lnames = lnames[order,];
     finish;

The data table is defined, listing the observations in the same order
as in the DATA step marital shown in Example 2. Note that vnames and
lnames conform to this order. After the call to reorder, the
variables table, levels, vnames, and lnames have been rearranged so
that Gender is the first variable in the mosaic, and Marital status
is last.

     proc iml;
        *-- define the data variables;
        table={  17    4 ,    /* Women Yes Yes */
                 54   25 ,    /* Women Yes No  */
                 36    4 ,    /* Women No  Yes */
                214  322 ,    /* Women No  No  */
                 28   11 ,    /* Men   Yes Yes */
                 60   42 ,    /* Men   Yes No  */
                 17    4 ,    /* Men   No  Yes */
                 68  130 };   /* Men   No  No  */
        levels = { 2 2 2 2 };
        vnames = {'Marital' 'Extra' 'Pre' 'Gender'};
        lnames = {'Divorced'       'Married',
                  'Extra Sex: Yes' 'No',
                  'Pre Sex: Yes'   'No',
                  'Women '         'Men' };
        title = 'Pre/Extramarital Sex and Marital Status';

        reset storage=mosaic;
        load module=_all_;
        order = { 4,3,2,1};
        run reorder(levels, table, vnames, lnames, order);

        split = {V H};
        plots = 2:4;
        run mosaic(levels, table, vnames, lnames, plots, title);
     quit;

Implementation

This section describes the algorithm for the construction of mosaic
displays and provides some notes on the structure of the program.

Algorithm

The process is a naturally recursive one which can be implemented
easily in a language which supports recursion and multi-dimensional
arrays, such as APL. Wang (1985) describes a FORTRAN implementation
of mosaic displays which simulates multi-dimensional arrays by
subscripting a vector. The following algorithm, which uses
two-dimensional arrays, is much simpler.

1. Denote the number of levels of the n variables by
   $l_1, \ldots, l_n$, and let $L_s = \prod_{i=1}^{s} l_i$. At step
   s = 0, start with one tile, a square of size 100 x 100, and let
   $L_0 = 1$.

2. The tiles in the mosaic are represented by an array B of four
   columns (called boxes in the program). Columns 1 and 2 give the
   x, y location of the lower left corner of the tile; columns 3 and
   4 give the horizontal and vertical lengths of the tile. At step 0,
   B = { 0 0 100 100 }. There is one row for each tile.

The following steps are repeated for each variable,
$s = 1, \ldots, n$:

3. For variable s find the marginal frequencies of variables
   1, ..., s, a vector of length $L_s$, with the levels of variable s
   varying most rapidly.

4. Reshape this vector row-wise to a matrix $M = \{ m_{gh} \}$ of
   $L_{s-1}$ rows and $l_s$ columns. (The array M is called margin in
   the program. See the arrays labelled "Marginal totals" in
   Figure 1.) The rows of M correspond to the tiles of the previous
   variables at step s - 1.

5. Each old tile is then divided vertically (if s is odd) or
   horizontally (s even) into $l_s$ tiles, with the width (s odd) or
   height (s even) of each tile proportional to $m_{gh} / m_{g+}$.

This computational scheme has several desirable properties:

* At any stage the division of the tiles for the current variable is
  in proportion to the entries in each row of M divided by the row
  totals.

* We can draw the tiles representing the marginal frequencies at any
  stage, not just the final stage as Hartigan and Kleiner do.

* Fitting the model of joint independence of the current variable
  with all previous variables jointly is equivalent to testing
  independence of the rows and columns of the matrix M. For example,
  for a three-way table, the expected frequencies under the model
  [AB] [C] can be expressed in terms of the IJ x K matrix M as
  $m_{(ij)+} \, m_{+k} / m_{++}$.

Spacing: This procedure gives a mosaic of
$L_n = l_1 \times l_2 \times \cdots \times l_n$ tiles with no
spacing, in which cells with small frequencies are difficult to see.
Following Hartigan and Kleiner the tiles are separated, with larger
spacings at the earlier subdivisions, to help preserve the visual
impact of small counts. For a four-way table with vertical splitting
on variables 1 and 3, the divisions of the first variable are spaced
proportionally to $1 / ( l_1 - 1 )$; divisions between levels of the
third variable are spaced proportionally to $1 / ( l_1 l_3 - 1 )$.

This spacing of the tiles is accomplished by constructing an unspaced
mosaic in a reduced area (determined by the space parameter), then
expanding to include the necessary spacing.
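
As a rough illustration of steps 1 to 5 above (without the spacing
adjustment), the tile division can be sketched as follows. The sketch
is written in Python rather than SAS/IML and is not taken from
MOSAICS.SAS; the function name divide_tiles and its arguments are
hypothetical. The marginal totals are those printed in Figure 1, and
the split directions V then H follow the odd/even rule of step 5.

```python
def divide_tiles(boxes, margin, direction):
    """Split each old tile among the levels of the next variable.

    boxes     -- list of (x, y, width, height), one per tile from step s-1
    margin    -- marginal frequencies M; row g belongs to boxes[g]
    direction -- 'V' divides the width, 'H' divides the height
    (All names here are illustrative assumptions, not MOSAICS modules.)
    """
    new_boxes = []
    for (x, y, w, h), row in zip(boxes, margin):
        total = sum(row)                      # row total, m_g+
        offset = 0.0                          # share already used in this tile
        for count in row:
            share = count / total if total else 0.0   # m_gh / m_g+
            if direction == 'V':
                new_boxes.append((x + offset * w, y, share * w, h))
            else:
                new_boxes.append((x, y + offset * h, w, share * h))
            offset += share
    return new_boxes


# Marginal totals from Figure 1: hair color, then hair color by eye color.
hair = [[108, 286, 71, 127]]
hair_eye = [[68, 15, 5, 20],
            [119, 54, 29, 84],
            [26, 14, 14, 17],
            [7, 10, 16, 94]]

boxes = [(0.0, 0.0, 100.0, 100.0)]           # step 0: one 100 x 100 tile
boxes = divide_tiles(boxes, hair, 'V')       # s = 1 (odd): divide widths
boxes = divide_tiles(boxes, hair_eye, 'H')   # s = 2 (even): divide heights

# Area of each tile is proportional to its cell frequency.
total_area = sum(w * h for _, _, w, h in boxes)
```

Because each row of M is divided by its own row total, the area of
every tile stays proportional to the corresponding marginal
frequency, which is the first property listed above.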

Program structure

MOSAICS SAS consists of 14 SAS/IML modules (subroutines and
functions). The calling structure of the modules is shown in
Figure 2.

+------------------------------------------------------------------+

   mosaic           *-- check inputs, assign default values;
    |
    |-- divide      *-- fit models and draw the mosaic display;
         |
         |--reduce      *-- find reduced model for factors 1:f;
         |
         |--mfit        *-- fits a specified model;
         |
         |--chisq       *-- calculate chisquares;
         |
         |--df          *-- calculate degrees of freedom;
         |   |--terms       *-- find all terms in a loglinear model;
         |   |--vars_in     *-- find variables in a term;
         |
         |--modname     *-- expand config into string for model label;
         |
         |--divide1     *-- divide the mosaic for the next variable;
         |
         |--space       *-- space the tiles in the current display;
         |
         |--labels      *-- calculate label placements;
         |
         |--gboxes      *-- draw the current display;
         |   |--fillbox     *-- custom shading;

   Figure 2: Calling structure of the modules in MOSAICS

+------------------------------------------------------------------+

The top-level module, mosaic, simply validates the input parameters,
assigns default values for global variables, and calls the module
divide. The steps in the algorithm described above are carried out by
divide; the calculation of the new tiles in step 5 is performed in
divide1.

References

Friendly, M. (1991). Mosaic displays for multi-way contingency
   tables. York Univ.: Dept. of Psychology Reports, 1991, No. 195.

Friendly, M. (1992). Mosaic displays for loglinear models.
   Proceedings of the Statistical Graphics Section, American
   Statistical Association, 61-68.

Friendly, M. (1994). Mosaic displays for multi-way contingency
   tables. Journal of the American Statistical Association, 89,
   190-200.

Hartigan, J. A., and Kleiner, B. (1981). Mosaics for contingency
   tables. In W. F. Eddy (Ed.), Computer Science and Statistics:
   Proceedings of the 13th Symposium on the Interface, 268-273. New
   York: Springer-Verlag.

Wang, C. M. (1985). Applications and computing of mosaics.
   Computational Statistics & Data Analysis, 3, 89-97.

-----------------------
: ¶ SAS/GRAPH fonts do not produce brackets, [ ] and braces, { }. Use
:   parentheses instead in model symbolic formulae.