/*-------------------------------------------------------------------*/
 /*                  Visualizing Categorical Data                     */
 /*                      by Michael Friendly                          */
 /*       Copyright(c) 2000 by SAS Institute Inc., Cary, NC, USA      */
 /*                    ISBN 978-1-58025-660-5                         */
 /*-------------------------------------------------------------------*/
 /*                                                                   */
 /* This material is provided "as is" by SAS Institute Inc.  There    */
 /* are no warranties, expressed or implied, as to merchantability or */
 /* fitness for a particular purpose regarding the materials or code  */
 /* contained herein. The Institute is not responsible for errors     */
 /* in this material as it now exists or will exist, nor does the     */
 /* Institute provide technical support for it.                       */
 /*                                                                   */
 /*-------------------------------------------------------------------*/
 /* Questions or problem reports concerning this material may be      */
 /* addressed to the author:                                          */
 /*                                                                   */
 /* SAS Institute Inc.                                                */
 /* SAS Press                                                         */
 /* Attn: Michael Friendly 	                                          */
 /* SAS Campus Drive                                                  */
 /* Cary, NC   27513                                                  */
 /*                                                                   */
 /*                                                                   */
 /* If you prefer, you can send email to:  saspress@sas.com           */
 /* Use this for subject field:                                       */
 /*     Comments for Michael Friendly                                 */
 /*                                                                   */
 /*-------------------------------------------------------------------*/
 /* Date Last Updated: 22May08                                        */
 /*-------------------------------------------------------------------*/

/* This archive contains the programs and data sets from ``Visualizing
Categorical Data''.  These programs are maintained by me at the
VCD web site, http://www.math.yorku.ca/SCS/vcd/, where any updated
versions may be found.

The files are grouped into the following directories:

   catdata- data sets from Appendix B
   doc    - some documentation, in pdf form
   iml    - all the SAS/IML programs
   macros - all the macro programs (Appendix A)
   mosaics- SAS/IML programs for mosaic displays, and examples
   sample - sample programs

INSTALLATION

For ease of use, you should copy these directories to similarly-named
directories under your !SASUSER directory.  In a DOS or Windows
environment, for example, this might be C:\SAS\SASUSER; under a
Unix-like system, this might be ~/sasuser/ where ~ refers to your
home directory.

The macro programs are most easily used if you add the name of the
macros directory to the list of directories recognized by the SAS
Autocall Facility.  Then, SAS will search this directory automatically
for macro programs which you invoke.  You can do this by adding a
statement like the following to your AUTOEXEC.SAS file:

   options sasautos=('vcdmacros', SASAUTOS);

substituting for 'vcdmacros' the directory to which you copied the 
macros; for example under Windows,

   options sasautos=('c:\sasuser\macros', SASAUTOS);

If you are running SAS from a networked installation, you may need to
modify the -autoexec option for SAS invocation so that
your local AUTOEXEC.SAS is used, rather than the system-wide
version.

It is also convenient to define (in your AUTOEXEC.SAS file) FILENAME
statements to point to these directories, for use in %INCLUDE
statements, so that you do not need to refer to full path filenames.
The following FILEREFs are assumed to be defined in most of the
sample programs.  They are illustrated here using pathnames for a
Unix system.

filename iml      '~/sasuser/iml';
filename macros   '~/sasuser/macros';
filename mosaics  '~/sasuser/mosaics';
filename catdata  '~/sasuser/catdata';

Thus, the statement,

   %include catdata(icu);

will find the file ~/sasuser/catdata/icu.sas.

In addition, all of my graphics application programs (in the samples/
and mosaics/ directories) use a single file, goptions.sas, to set
global graphics options the way I want, and to make these applications
portable.  I define this file in a FILENAME statement as,

filename goptions    '~/sasuser/goptions.sas';

and start each program with

   %include goptions;

For Windows systems, this file might contain, for example,

   goptions reset=all device=win target=winprtc;

or might simply be an empty file.  On a Unix system, for direct output
to PostScript files, I use something like,

   goptions lfactor=3 device=pscolor ftext=hwpsl009 htext=1.5 htitle=2;

Instructions for installing the SAS/IML programs for mosaic displays
are given in the INSTALL file in the mosaics/ directory.

REQUIREMENTS:

All of the programs require Base/SAS and SAS/GRAPH.  Many also require
SAS/IML.

LIMITATIONS:

All of the macros and other programs were developed and tested under
SAS Versions 6.07 - 6.12.  Initial testing and modification
for Versions 7 - 8.1 has begun, but I cannot yet say that subtle changes
have all been caught and accounted for.

In addition, all SAS graphics depend on the graphics parameters for fonts,
graph size, etc. set in the graphics device driver and in goptions statements.
*/

/*
title 'Arthritis treatment data';
*/
proc format;
   value outcome 0 = 'not improved'
                 1 = 'improved';
data arthrit;
   length treat $7. sex $6. ;
   input id treat $ sex $ age improve @@ ;
   case = _n_;
   better  = (improve > 0);
   _treat_ = (treat ='Treated') ;     /* dummy variables */
   _sex_   = (sex = 'Female');
  cards ;
57 Treated Male   27 1   9 Placebo Male   37 0
46 Treated Male   29 0  14 Placebo Male   44 0
77 Treated Male   30 0  73 Placebo Male   50 0
17 Treated Male   32 2  74 Placebo Male   51 0
36 Treated Male   46 2  25 Placebo Male   52 0
23 Treated Male   58 2  18 Placebo Male   53 0
75 Treated Male   59 0  21 Placebo Male   59 0
39 Treated Male   59 2  52 Placebo Male   59 0
33 Treated Male   63 0  45 Placebo Male   62 0
55 Treated Male   63 0  41 Placebo Male   62 0
30 Treated Male   64 0   8 Placebo Male   63 2
 5 Treated Male   64 1  80 Placebo Female 23 0
63 Treated Male   69 0  12 Placebo Female 30 0
83 Treated Male   70 2  29 Placebo Female 30 0
66 Treated Female 23 0  50 Placebo Female 31 1
40 Treated Female 32 0  38 Placebo Female 32 0
 6 Treated Female 37 1  35 Placebo Female 33 2
 7 Treated Female 41 0  51 Placebo Female 37 0
72 Treated Female 41 2  54 Placebo Female 44 0
37 Treated Female 48 0  76 Placebo Female 45 0
82 Treated Female 48 2  16 Placebo Female 46 0
53 Treated Female 55 2  69 Placebo Female 48 0
79 Treated Female 55 2  31 Placebo Female 49 0
26 Treated Female 56 2  20 Placebo Female 51 0
28 Treated Female 57 2  68 Placebo Female 53 0
60 Treated Female 57 2  81 Placebo Female 54 0
22 Treated Female 57 2   4 Placebo Female 54 0
27 Treated Female 58 0  78 Placebo Female 54 2
 2 Treated Female 59 2  70 Placebo Female 55 2
59 Treated Female 59 2  49 Placebo Female 57 0
62 Treated Female 60 2  10 Placebo Female 57 1
84 Treated Female 61 2  47 Placebo Female 58 1
64 Treated Female 62 1  44 Placebo Female 59 1
34 Treated Female 62 2  24 Placebo Female 59 2
58 Treated Female 66 2  48 Placebo Female 61 0
13 Treated Female 67 2  19 Placebo Female 63 1
61 Treated Female 68 1   3 Placebo Female 64 0
65 Treated Female 68 2  67 Placebo Female 65 2
11 Treated Female 69 0  32 Placebo Female 66 0
56 Treated Female 69 1  42 Placebo Female 66 0
43 Treated Female 70 1  15 Placebo Female 66 1
                        71 Placebo Female 68 1
                         1 Placebo Female 74 2
;
title 'Berkeley Admissions data';
proc format;
   value admit 1="Admitted" 0="Rejected"          ;
   value yn    1="+"        0="-"                 ;
   value dept  1="A" 2="B" 3="C" 4="D" 5="E" 6="F";
        value $sex  'M'='Male'   'F'='Female';
data berkeley;
   do dept = 1 to 6;
      do gender = 'M', 'F';
         do admit = 1, 0;
            input freq @@;
            output;
   end; end; end;
/* Admit  Rej  Admit Rej */
cards;
     512  313    89   19
     353  207    17    8
     120  205   202  391
     138  279   131  244
      53  138    94  299
      22  351    24  317
;
title 'Hair - Eye color data';
data haireye;
   length  hair $8 eye $6 sex $6;
   drop c i black brown red blond;
   array h{*} black brown red blond;
   c='Black Brown Red Blond';
   input sex $ eye $ black brown red blond;
   do i=1 to dim(h);
      count = h(i); hair=scan(c,i);
      output;
      end;
cards;
M  Brown        32        53        10         3
M  Blue         11        50        10        30
M  Hazel        10        25         7         5
M  Green         3        15         7         8
F  Brown        36        66        16         4
F  Blue          9        34         7        64
F  Hazel         5        29         7         5
F  Green         2        14         7         8
;
/*
Name:      icu.sas
Title:     The ICU data
KEYWORDS:  Logistic Regression
SIZE:  200 observations, 21 variables

NOTE:
        These data come from Appendix 2 of Hosmer and Lemeshow (1989).
These data are copyrighted and must be acknowledged and used accordingly.

DESCRIPTIVE ABSTRACT:
        The ICU data set consists of a sample of 200 subjects who were part of
a much larger study on survival of patients following admission to an adult
intensive care unit (ICU).  The major goal of this study was to develop a
logistic regression model to predict the probability of survival to hospital
discharge of these patients and to study the risk factors associated with 
ICU mortality.  A number of publications have appeared which have focused on
various facets of the problem.  The reader wishing to learn more about the
clinical aspects of this study should start with Lemeshow, Teres, Avrunin,
and Pastides (1988).

SOURCE:  Data were collected at Baystate Medical Center in Springfield,
Massachusetts.

REFERENCES:

1.  Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).

2.  Lemeshow, S., Teres, D., Avrunin, J. S., Pastides, H. (1988). Predicting
    the Outcome of Intensive Care Unit Patients. Journal of the American
    Statistical Association, 83, 348-356.
*/

proc format;
        value yn    0='No' 1='Yes';
        value sex   0='Male'  1='Female';
        value race  1='White' 2='Black'  3='Other';
        value ser   0='Medical' 1='Surgery';
        value admit 0='Elective' 1='Emergency';
        value po    0='>60'     1='<=60';
        value ph    0='>=7.25'  1='<7.25';
        value pco   0='<=45'   1='>45';
        value cre   0='<=2'    1='>2';

data icu;
        input id died age sex race service cancer renal infect
                cpr systolic hrtrate previcu admit fracture po2 ph pco bic
                creatin coma;
        label
                id = 'Patient id code'
                died = 'Died before discharge'      /* 0=No, 1=Yes */
                age = 'Age'                         /* years */
                sex = 'Sex'                         /* 0 = Male, 1 = Female */
                race = 'Race'                       /* 1 = White, 2=Black, 3 = Other */
                service = 'Service at Admission'    /* 0 = Medical, 1 = Surgical */
                cancer = 'Cancer Part of Problem'   /* 0=No, 1=Yes */
                renal = 'History of Chronic Renal'  /* 0=No, 1=Yes */
                infect = 'Infection Probable'       /* 0=No, 1=Yes */
                cpr = 'CPR Prior to ICU Admission'  /* 0=No, 1=Yes */
                systolic = 'Systolic Blood Pressure' /* mm Hg */
                hrtrate = 'Heart Rate at Admission' /* beats/min */
                previcu = 'Previous Admit to ICU'   /* 0=No, 1=Yes */
                admit = 'Type of Admission'         /* 0=Elec 1=Emerg */
                fracture = 'Fracture'               /* 0=No, 1=Yes */
                po2 = 'PO2, inital Blood Gas'       /* 0=>60, 1=<=60 */
                ph = 'PH, inital Blood Gas'         /* 0=7.25, 1= <7.25 */
                pco = 'PCO2, inital Blood Gas'      /* 0=45, 1= >45 */
                bic = 'Bicarbonate, inital Blood'   /* 0=18, 1= <18 */
                creatin = 'Creatinine, inital Blood' /* 0=2, 1= >2 */
                coma = 'Consciousness at ICU'       /* 0=None 1=Stupor 2=Coma */
                uncons = 'Stupor or coma at ICU';

    white = (race=1);
    uncons= (coma>0);

    format died cancer renal infect cpr previcu fracture yn.;
    format sex sex. race race. admit admit. ph ph. pco pco. creatin cre.;
/*
          D         R                                                   C
          I   A  S  A  S  C  C  I  C   S    H   P  T  F  P     P  B  C  O
       I  E   G  E  C  E  A  R  N  P   Y    R   R  Y  R  O  P  C  I  R  M
       D  D   E  X  E  R  N  N  F  R   S    A   E  P  A  2  H  O  C  E  A
*/
cards;
       8  0  27  1  1  0  0  0  1  0  142   88  0  1  0  0  0  0  0  0  0
      12  0  59  0  1  0  0  0  0  0  112   80  1  1  0  0  0  0  0  0  0
      14  0  77  0  1  1  0  0  0  0  100   70  0  0  0  0  0  0  0  0  0
      28  0  54  0  1  0  0  0  1  0  142  103  0  1  1  0  0  0  0  0  0
      32  0  87  1  1  1  0  0  1  0  110  154  1  1  0  0  0  0  0  0  0
      38  0  69  0  1  0  0  0  1  0  110  132  0  1  0  1  0  0  1  0  0
      40  0  63  0  1  1  0  0  0  0  104   66  0  0  0  0  0  0  0  0  0
      41  0  30  1  1  0  0  0  0  0  144  110  0  1  0  0  0  0  0  0  0
      42  0  35  0  2  0  0  0  0  0  108   60  0  1  0  0  0  0  0  0  0
      50  0  70  1  1  1  1  0  0  0  138  103  0  0  0  0  0  0  0  0  0
      51  0  55  1  1  1  0  0  1  0  188   86  1  0  0  0  0  0  0  0  0
      53  0  48  0  2  1  1  0  0  0  162  100  0  0  0  0  0  0  0  0  0
      58  0  66  1  1  1  0  0  0  0  160   80  1  0  0  0  0  0  0  0  0
      61  0  61  1  1  0  0  1  0  0  174   99  0  1  0  0  1  0  1  1  0
      73  0  66  0  1  0  0  0  0  0  206   90  0  1  0  0  0  0  0  1  0
      75  0  52  0  1  1  0  0  1  0  150   71  1  0  0  0  0  0  0  0  0
      82  0  55  0  1  1  0  0  1  0  140  116  0  0  0  0  0  0  0  0  0
      84  0  59  0  1  0  0  0  1  0   48   39  0  1  0  1  0  1  1  0  2
      92  0  63  0  1  0  0  0  0  0  132  128  1  1  0  0  0  0  0  0  0
      96  0  72  0  1  1  0  0  0  0  120   80  1  0  0  0  0  0  0  0  0
      98  0  60  0  1  0  0  0  1  1  114  110  0  1  0  0  0  0  0  0  0
     100  0  78  0  1  1  0  0  0  0  180   75  0  0  0  0  0  0  0  0  0
     102  0  16  1  1  0  0  0  0  0  104  111  0  1  0  0  0  0  0  0  0
     111  0  62  0  1  1  0  1  0  0  200  120  0  0  0  0  0  0  0  0  0
     112  0  61  0  1  0  0  0  1  0  110  120  0  1  0  0  0  0  0  0  0
     136  0  35  0  1  0  0  0  0  0  150   98  0  1  0  0  0  0  0  0  0
     137  0  74  1  1  1  0  0  0  0  170   92  0  0  0  0  0  1  0  0  0
     143  0  68  0  1  1  0  0  0  0  158   96  0  0  0  0  0  0  0  0  0
     153  0  69  1  1  1  0  0  0  0  132   60  0  1  0  0  0  0  0  0  0
     170  0  51  0  1  0  0  0  0  0  110   99  0  1  0  0  0  0  0  0  0
     173  0  55  0  1  1  0  0  0  0  128   92  0  0  0  0  0  0  0  0  0
     180  0  64  1  3  1  0  0  1  0  158   90  1  1  0  0  0  0  0  0  0
     184  0  88  1  1  1  0  0  1  0  140   88  1  1  0  0  0  0  0  0  0
     186  0  23  1  1  1  0  0  0  0  112   64  0  1  1  0  0  0  0  0  0
     187  0  73  1  1  1  1  0  0  0  134   60  0  0  0  0  0  1  0  0  0
     190  0  53  0  3  1  0  0  0  0  110   70  1  0  0  0  0  0  0  0  0
     191  0  74  0  1  1  0  0  0  0  174   86  0  0  0  0  0  0  0  0  0
     207  0  68  0  1  1  0  0  0  0  142   89  0  0  0  0  0  0  0  0  0
     211  0  66  1  1  0  0  0  1  0  170   95  1  1  0  0  0  0  0  0  0
     214  0  60  0  1  1  1  0  1  0  110   92  0  0  0  0  0  0  0  0  0
     219  0  64  0  1  1  0  0  1  0  160  120  0  0  0  0  0  0  0  0  0
     225  0  66  0  2  1  1  0  1  0  150  120  0  0  0  0  0  1  0  0  0
     237  0  19  1  1  1  0  0  1  0  142  106  0  1  1  0  0  0  0  0  0
     247  0  18  1  1  0  0  0  0  0  146  112  0  1  0  0  0  0  0  0  0
     249  0  63  0  1  1  0  0  1  0  162   84  1  1  0  0  0  0  0  0  0
     260  0  45  0  1  0  0  0  0  0  126  110  0  1  0  0  0  0  0  0  0
     266  0  64  0  1  0  0  0  0  0  162  114  0  1  0  0  0  0  0  0  0
     271  0  68  1  1  0  0  0  1  0  200  170  1  1  0  0  0  0  0  0  0
     276  0  64  1  1  0  0  0  1  0  126  122  0  1  0  1  0  1  0  0  0
     277  0  82  0  1  1  0  0  0  0  135   70  0  0  0  0  0  0  0  0  0
     278  0  73  0  1  1  0  0  0  0  170   88  0  0  0  0  0  0  0  0  0
     282  0  70  0  1  0  0  0  0  0   86  153  1  1  0  0  0  1  0  0  0
     292  0  61  0  1  1  0  0  1  0   68  124  0  1  0  0  0  0  0  0  0
     295  0  64  0  1  1  1  0  1  0  116   88  0  0  0  0  0  0  0  0  0
     297  0  47  0  1  1  1  0  1  0  120   83  0  0  0  0  0  0  0  0  0
     298  0  69  0  1  1  0  0  0  0  170  100  0  0  0  0  0  0  0  0  0
     308  0  67  1  1  0  0  0  1  0  190  125  0  1  0  0  0  0  0  0  0
     310  0  18  0  1  1  1  0  0  0  156   99  0  0  0  0  0  0  0  0  0
     319  0  77  0  1  1  0  0  1  0  158  107  0  0  0  0  0  0  0  0  0
     327  0  32  0  2  1  0  0  0  0  120   84  0  1  0  0  0  0  0  0  0
     333  0  19  1  1  1  0  0  1  0  104  121  1  0  0  0  0  0  0  0  0
     335  0  72  1  1  1  0  0  0  0  130   86  0  1  0  0  0  0  0  0  0
     343  0  49  0  1  0  0  0  1  0  112  112  0  1  0  0  0  0  0  0  0
     357  0  68  1  1  1  0  0  0  0  154   74  0  0  0  0  0  0  0  0  0
     362  0  82  0  1  1  0  1  1  0  130  131  0  1  0  0  0  0  0  0  0
     365  0  32  1  3  0  0  0  1  1  110  118  0  1  0  0  0  0  0  0  0
     369  0  78  1  1  1  0  0  1  0  126   96  0  1  0  0  0  0  0  0  0
     370  0  57  0  1  0  0  0  1  0  128  104  0  1  0  0  0  1  0  0  0
     371  0  46  1  1  1  1  0  0  0  132   90  0  1  0  0  0  0  0  0  0
     376  0  23  0  1  0  0  0  1  0  144   88  0  1  0  0  0  0  0  0  0
     378  0  55  0  1  0  0  0  0  0  132  112  0  1  0  0  0  0  0  0  0
     379  0  18  0  1  1  0  0  0  0  112   76  0  1  1  0  0  0  0  0  0
     381  0  20  0  1  1  0  0  0  0  164  108  0  1  0  0  0  0  0  0  0
     382  0  75  1  1  1  0  0  0  0  100   48  0  0  0  0  0  0  0  0  0
     398  0  79  0  1  1  0  0  1  0  112   67  0  0  0  0  0  0  0  0  0
     401  0  40  0  1  1  0  0  0  0  140   65  0  1  1  0  0  0  0  0  0
     409  0  76  0  1  1  0  0  1  0  110   70  0  1  0  0  0  0  0  0  0
     413  0  66  1  1  1  0  0  1  0  139   92  0  0  0  0  0  0  0  0  0
     416  0  76  0  1  0  0  0  1  0  190  100  0  1  0  0  0  0  0  0  0
     438  0  80  1  1  1  0  0  0  0  162   44  0  1  0  0  0  0  0  0  0
     439  0  23  1  1  0  0  0  1  0  120   88  0  1  0  0  0  0  0  0  0
     440  0  48  0  2  1  0  0  1  0   92  162  1  1  0  0  0  0  0  0  0
     455  0  67  0  2  1  0  0  0  0   90   92  1  0  0  0  0  0  0  0  0
     462  0  69  1  1  1  0  0  0  0  150   85  0  1  0  0  0  0  0  0  0
     495  0  65  0  3  1  0  0  0  0  208  124  0  0  0  0  0  0  0  0  0
     498  0  72  0  1  1  0  0  0  0  126   88  0  0  0  0  0  0  0  0  0
     502  0  55  0  1  0  0  0  0  0  190  136  0  1  0  1  1  1  0  0  0
     505  0  40  0  1  0  0  0  0  0  130   65  0  1  0  0  0  0  0  0  0
     508  0  55  1  1  0  0  0  1  0  110   86  0  1  0  0  0  0  0  0  0
     517  0  34  0  1  1  0  0  0  0  110   80  0  1  1  0  0  0  0  0  0
     522  0  47  1  1  1  0  0  0  0  132   68  0  1  0  0  0  0  0  0  0
     525  0  41  1  1  0  0  0  1  0  118  145  0  1  0  0  1  0  1  0  0
     526  0  84  1  1  0  0  1  1  0  100  103  0  1  0  0  0  0  1  1  0
     546  0  88  1  1  1  0  0  0  0  110   46  1  0  0  0  0  0  0  0  0
     548  0  77  1  1  1  1  0  0  0  212   87  0  0  0  0  0  1  0  0  0
     550  0  80  0  1  0  0  0  0  0  122  126  0  1  0  1  0  0  1  0  0
     552  0  16  0  1  1  0  0  0  0  100  140  0  1  1  0  0  0  0  0  0
     560  0  70  0  1  1  0  0  0  0  160   60  0  0  0  0  0  0  0  0  0
     563  0  83  1  1  1  0  0  1  0  138   91  0  1  0  0  0  0  0  0  0
     573  0  23  0  2  0  0  0  0  0  130   52  0  1  0  0  0  0  0  0  0
     575  0  67  1  1  0  0  0  0  1  120  120  0  1  0  0  1  1  0  0  0
     584  0  18  0  1  1  1  0  0  0  130  140  0  0  0  0  0  0  0  0  0
     597  0  77  1  1  0  0  0  1  0  136  138  0  0  0  1  1  1  0  0  0
     598  0  48  1  1  0  0  0  0  1  128   96  0  1  0  0  0  0  0  0  0
     601  0  24  1  2  0  0  0  0  0  140   86  0  1  0  0  0  0  0  0  0
     605  0  71  1  1  0  0  0  1  0  124  106  0  1  0  0  0  0  0  0  0
     607  0  72  0  1  1  0  0  0  0  134   60  0  1  0  0  0  0  0  0  0
     619  0  77  1  1  1  0  1  0  0  170  115  1  0  0  0  0  0  0  0  0
     620  0  60  0  1  1  0  0  1  0  124  135  0  1  0  0  0  0  0  0  0
     639  0  46  0  1  1  1  0  0  0  110  128  0  0  0  0  0  0  0  0  0
     644  0  65  1  1  0  0  0  0  0  100  105  0  1  0  0  0  0  0  0  0
     645  0  36  0  1  0  0  0  0  0  224  125  0  1  0  0  0  0  0  0  0
     648  0  68  0  1  1  0  0  0  0  112   64  0  0  0  0  0  0  0  0  0
     655  0  58  0  1  0  0  0  0  0  154   98  0  1  0  0  0  0  0  0  0
     659  0  76  1  1  0  0  0  1  0   92  112  0  1  0  0  0  0  0  0  0
     669  0  41  1  2  0  0  0  0  0  110  144  0  1  0  0  0  0  1  1  0
     670  0  20  0  3  0  0  0  0  0  120   68  0  1  0  0  0  0  0  0  0
     674  0  91  0  1  0  0  1  1  0  152  125  0  1  0  0  0  0  0  0  0
     675  0  75  0  1  1  0  0  0  0  140   90  0  1  0  0  0  0  0  0  0
     676  0  25  1  1  0  0  0  0  0  131  135  0  1  0  0  0  0  1  0  0
     709  0  70  0  1  0  0  0  1  0   78  143  0  1  0  1  0  0  0  0  0
     713  0  47  0  1  1  0  0  0  0  156  112  0  1  0  0  0  0  0  0  0
     727  0  75  0  3  1  0  0  0  0  144  120  0  1  0  0  0  0  0  1  0
     728  0  40  0  2  0  0  0  1  0  160  150  1  1  1  0  0  0  0  0  0
     732  0  71  0  1  0  0  0  1  0  148  192  0  1  0  1  1  1  0  0  0
     746  0  70  1  1  0  0  0  1  0   90  140  0  1  0  1  0  0  1  0  0
     749  0  58  0  1  1  0  0  0  0  148   95  1  1  0  0  0  0  0  0  0
     754  0  54  0  1  1  0  0  0  0  136   80  0  0  0  0  0  0  0  0  0
     761  0  77  0  1  1  0  0  0  0  128   59  0  0  0  0  0  0  0  0  0
     763  0  55  0  1  1  1  0  1  0  138  140  0  0  0  0  0  0  0  0  0
     764  0  21  0  1  1  0  0  0  0  120   62  0  1  0  0  0  0  0  0  0
     765  0  53  0  2  0  0  1  0  1  170  115  0  1  0  0  0  0  0  0  0
     766  0  31  1  1  0  1  1  1  1  146  100  0  1  0  0  1  1  0  0  0
     772  0  71  0  1  1  1  0  0  0  204   52  0  0  0  0  0  0  0  0  0
     776  0  49  0  2  0  0  0  0  0  150  100  0  1  0  0  0  0  0  0  0
     784  0  60  1  2  0  0  0  1  0  116   92  1  1  0  0  0  0  0  0  0
     794  0  50  0  1  0  0  0  1  0  156   99  0  1  0  1  0  1  0  0  0
     796  0  45  1  1  1  0  0  0  0  132  109  0  1  1  0  0  0  0  0  0
     809  0  21  0  1  1  0  0  0  0  110   90  0  1  0  0  0  0  0  0  0
     814  0  73  1  1  1  0  0  0  0  130   83  0  1  0  0  0  0  0  0  0
     816  0  28  0  1  1  0  0  1  0  122   80  1  0  1  0  0  0  0  0  0
     829  0  17  0  1  1  0  0  0  0  140   78  0  1  1  0  0  0  0  0  0
     837  0  17  1  3  0  0  0  0  0  130  140  0  1  0  0  0  0  0  0  0
     846  0  21  1  1  1  0  0  0  0  142   79  0  1  0  0  0  0  0  0  0
     847  0  68  1  1  1  1  0  0  0   91   79  0  0  0  0  0  0  0  0  0
     863  0  17  0  3  1  0  0  0  0  136   78  0  1  0  0  0  0  0  0  0
     867  0  60  0  1  0  0  0  1  0  108  120  0  1  0  0  0  0  0  0  0
     875  0  69  0  1  1  0  0  0  0  169   73  0  1  0  0  0  0  0  0  0
     877  0  88  1  1  0  0  1  0  0  190   88  0  1  0  0  0  0  0  0  0
     880  0  20  0  1  1  0  0  0  0  120   80  0  1  0  0  0  0  0  0  0
     881  0  89  1  1  1  0  0  0  0  190  114  0  1  0  0  0  1  0  0  2
     889  0  62  1  1  0  0  0  0  0  110   78  0  1  0  0  0  0  0  0  0
     893  0  46  0  1  0  0  1  1  0  142   89  0  1  0  0  1  0  1  0  0
     906  0  19  0  1  1  0  0  1  0  100  137  0  1  0  0  0  0  0  0  0
     912  0  71  0  1  0  0  0  1  0  124  124  0  1  0  1  1  1  0  0  0
     915  0  67  0  1  1  0  0  0  0  152   78  0  0  0  0  0  0  0  0  0
     923  0  20  0  1  1  0  0  0  0  104   83  0  1  0  0  0  0  0  0  0
     924  0  73  1  2  0  0  1  0  0  162  100  0  1  0  0  0  0  0  0  0
     925  0  59  0  1  0  0  0  0  0  100   88  0  1  0  0  0  0  0  0  0
     929  0  42  0  1  1  0  0  0  0  122   84  0  1  1  0  0  0  0  0  0
       4  1  87  1  1  1  0  0  1  0   80   96  0  1  1  1  1  1  0  0  0
      27  1  76  1  1  1  0  0  1  0  128   90  1  1  0  0  0  0  0  0  0
      47  1  78  0  1  0  0  0  1  0  130  132  0  1  0  0  0  0  1  0  0
      52  1  63  0  1  0  0  1  1  0  112  106  1  1  0  1  0  0  0  0  0
     127  1  19  0  1  1  0  0  0  0  140   76  0  1  0  0  0  0  0  0  0
     145  1  67  1  1  0  0  0  1  0   62  145  0  1  0  0  0  0  0  1  0
     154  1  53  1  1  0  0  0  1  0  148  128  0  1  0  0  1  1  0  0  0
     165  1  92  0  1  0  0  0  1  0  124   80  0  1  0  0  0  0  1  0  0
     195  1  57  0  1  0  0  0  1  1  110  124  0  1  0  0  0  0  0  0  2
     202  1  75  1  1  1  1  0  0  0  130  136  0  0  0  0  0  0  0  0  0
     204  1  91  0  1  0  0  0  1  0   64  125  0  1  0  0  0  1  0  0  0
     208  1  70  0  1  1  0  0  0  0  168  122  0  0  0  1  0  0  0  0  1
     222  1  88  0  1  0  0  0  1  1  141  140  0  1  0  0  0  0  0  0  0
     238  1  41  0  1  1  0  0  1  0  140   58  0  1  0  0  0  0  0  0  2
     241  1  61  0  1  0  0  0  0  0  140   81  0  1  0  0  0  0  0  0  0
     273  1  80  0  1  1  0  0  0  0  100   85  0  1  0  0  0  0  0  0  0
     285  1  40  0  1  0  0  0  1  0   86   80  1  1  0  0  0  0  0  0  0
     299  1  75  0  1  0  0  0  1  0   90  100  0  1  0  0  0  0  0  0  1
     331  1  63  1  1  1  0  1  1  1   36   86  0  1  1  0  0  0  0  1  2
     346  1  75  1  1  0  1  0  0  0  190   94  0  1  0  0  0  0  0  0  0
     380  1  20  0  1  1  0  0  0  0  148   72  0  1  1  0  0  0  0  0  0
     384  1  71  0  1  0  0  0  0  0  142   95  0  1  0  0  0  0  0  0  0
     412  1  51  1  1  1  0  0  1  0  134  100  1  1  0  0  0  0  0  0  1
     427  1  65  0  1  0  0  0  0  0   66   94  0  1  0  0  0  0  0  0  2
     442  1  69  1  3  0  0  1  0  0  170   60  1  1  0  1  0  0  0  0  0
     461  1  55  0  1  1  0  1  1  0  122  100  1  1  0  0  0  0  0  0  0
     468  1  50  1  1  1  1  0  0  0  120   96  0  1  0  0  0  0  0  0  0
     490  1  78  0  1  0  0  0  1  0  110   81  0  1  0  0  0  0  0  0  0
     518  1  71  1  1  0  0  0  0  1   70  112  0  1  0  0  0  0  0  0  2
     611  1  85  1  1  1  0  0  0  0  136   96  0  1  0  0  0  0  0  0  0
     613  1  75  0  1  0  0  1  1  0  130  119  0  1  0  0  1  0  1  1  0
     666  1  65  1  1  0  0  0  1  1  104  150  0  1  0  0  0  1  0  0  2
     671  1  49  0  1  0  0  0  1  1  140  108  0  1  0  0  0  0  1  0  0
     706  1  75  1  1  0  0  1  1  1  150   66  0  1  0  0  0  0  0  1  2
     740  1  72  1  1  0  0  0  0  0   90  160  0  1  0  0  0  0  0  0  0
     751  1  69  0  1  0  0  1  0  0   80   81  0  1  0  0  0  0  0  0  2
     752  1  64  0  1  0  1  0  1  0   80  118  0  1  0  1  0  0  0  1  0
     789  1  60  0  1  0  0  0  1  0   56  114  1  1  0  0  1  0  1  0  0
     871  1  60  0  3  1  0  1  1  0  130   55  0  1  0  0  0  0  0  0  1
     921  1  50  1  2  0  0  0  0  0  256   64  0  1  0  0  0  0  0  0  1
;
proc sort;
        by descending died age;
/*
  Name:  marital.sas
 Title:  Pre-marital sex, extra-marital sex, and divorce
Source:  Thornes and Collard 1979, Gilbert 1981
*/ 
data marital;
   input gender $ pre $ extra $ @;
        pre = 'Pre:' || pre;
        extra = 'X:' || extra;
   marital='Divorced';  input count @;  output;
   marital='Married';   input count @;  output;
cards;
Women  Yes  Yes   17   4
Women  Yes  No    54  25
Women  No   Yes   36   4
Women  No   No   214 322
Men    Yes  Yes   28  11
Men    Yes  No    60  42
Men    No   Yes   17   4
Men    No   No    68 130
;
proc sort;
        by marital extra pre gender;
title 'Lifeboats on the Titanic';
/* from the Board of Trade (1912)
        "Report on the Loss of the S.S. Titanic", p, 38
*/
proc format;
        value $side 'p'='Port'  's'='Starboard';
        value period 0='Early'  1='Middle'  2='Late';
        
data lifeboat;
        input  launch time5.2 side $ boat $ crew men women;
        total = sum(crew, men, women);
        format launch hhmm. side $side.;
        port = (side='p');
        int  = launch * port;
        select (boat);
                when ('C', 'D')  cap=47;
                when ('1', '2')  cap=40;
                otherwise        cap=65;
        end;
        label launch='Launch Time'
           side = 'Side'
                boat = 'Boat label'
                crew = 'Men of crew'
                men = 'Men passengers'
                women = 'Women and Children'
                cap = 'Boat capacity'
                total = 'Total loaded';
datalines;
 0:45  p  7   3  4  20
 0:55  p  5   5  6  30
 1:00  p  3  15 10  25
 1:10  p  1   7  3   2
 1:20  p  9   8  6  42
 1:25  p 11   9  1  60
 1:35  p 13   5  0  59
 1:35  p 15  13  4  53
 1:40  p  C   5  2  64
 0:55  s  6   2  2  24
 1:10  s  8   4  0  35
 1:20  s 10   5  0  50
 1:25  s 12   2  0  40
 1:30  s 14   8  2  53
 1:35  s 16   6  0  50
 1:45  s  2   4  1  21
 1:55  s  4   4  0  36
 2:05  s  D   2  2  40
;

proc rank out=lifeboat groups=3;
        var launch;
        ranks period;
        by side;
run;
/*
 Name:   lifeboa2.sas
 Title:  Lifeboats on the Titanic- data set 2
 Source: Extracted from the Encyclopedia Titanica Web Site 
              http://atschool.eduweb.co.uk/phind
1. Boat (1 to 16, and four collapsible boats, A to D)
2. Launch (rank order of departing from Titanic)
3. Side (side of Titanic boat was launched from)
4. Males (just of the passengers, not including servants or crew)
5. Females (same as 4.)
6. First (first class passengers and their servants)
7. Second (second class passengers and their servants)
8. Third (third class passengers)
9. Crew
10. Other (number of lifeboat occupants (6+7+8+9) that got onto the
  lifeboat by other means, e.g. stowaway, pulled from water, jumped etc.)
11. Launch (reported launch time)
*/

proc format;
        value $side 'p'='Port'  's'='Starboard';
data lifeboa2;
        input boat $ order side $ men women class1-class3 crew other
              launch time5.2;
        format launch hhmm. side $side.;
        total=sum(of class1-class3 crew);
        label launch='Launch Time'  order='Launch order'
                boat = 'Boat label'      side='Side'
                men = 'Men passengers'   women = 'Women and Children'
                class1 = '1st Class passengers'  class2='2nd Class passengers'
                class3 = '3rd Class passengers'  other ='Other lifeboat occupants'
                crew = 'Men of crew'
                cap = 'Boat capacity'    total = 'Total loaded';
        port = (side='p');
        select (boat);
                when ('C', 'D')  cap=47;
                when ('1', '2')  cap=40;
                otherwise        cap=65;
        end;
cards; 
 1   5 s   3  1   5  0  0  7 0  1:10
 2  15 p   3  9   8  0  6  4 0  1:45
 3   4 s  11  8  26  0  0 13 0  1:00
 4  16 p   3 16  24  2  0 12 9  1:50
 5   2 s  13 14  27  0  0  8 2  0:55
 6   3 p   2 16  19  0  1  4 1  0:55
 7   1 s  13 12  24  1  0  3 0  0:45
 8   5 p   0 17  23  0  0  4 0  1:10
 9  10 s   9 16   6 17  3 15 0  1:30
10   7 p   5 28   9 18  6  4 0  1:20
11  12 s   7 16   6 14  5 26 0  1:35
12  10 p   1 18   0 17  2  3 1  1:30
13  13 s  15 24   1 12 26 24 2  1:40
14   8 p  10 23   5 21  7  9 4  1:25
15  13 s  23 15   1  1 36 25 0  1:40
16   . p   2 23   0  3 22 12 0  1:35
 A   . s   9  2   3  0  8  5 0  .
 B   . p  10  0   3  1  6 18 0  .
 C  17 s  13 25   2  0 36  6 4  1:40
 D   . s   6 13   8  2  9  5 3  2:05
;
/*
Title:  Mental impariment and parents SES
Source: Haberman, 1979 [p.375], from Srole etal,(1978) p.289
        also, Agresti:90, Lindsey p.99;  
*/
proc format;
        value mental 1='Well' 2='Mild' 3='Moderate' 4='Impaired'; 
        value ses    1='High' 2='2' 3='3' 4='4' 5='5' 6='Low';   
data mental;
   input ses mental count @@;
        label ses="Parents SES"
           mental='Mental Impairment';
cards;
1  1  64   1  2  94   1  3  58   1  4  46
2  1  57   2  2  94   2  3  54   2  4  40
3  1  57   3  2 105   3  3  65   3  4  60
4  1  72   4  2 141   4  3  77   4  4  94
5  1  36   5  2  97   5  3  54   5  4  78
6  1  21   6  2  71   6  3  54   6  4  71
;
*newsas(msdiag);
/*-------------------------------------------------------------------------*
Title:    Diagnosis of multiple sclerosis
Diagnostic classification of mulitiple sclerosis by two neurologists
for two populations.
Source:   Landis, J.R. & Koch, G.G. (1977) "The measurement of observer
          agreement for categorical data." Biometrics 33: 159-174
*--------------------------------------------------------------------------*/
proc format;
     value rating 1="Certain MS" 2="Probable" 3="Possible" 4="Doubtful MS";
data msdiag;
     do patients='Winnipeg  ', 'New Orleans';
        do N_rating = 1 to 4;
           do W_rating = 1 to 4;
              input count @;
              output;
              end;
           end;
        end;
   format N_rating W_rating rating.;
   label N_rating = 'New Orleans neurologist'
         W_rating = 'Winnipeg nurologist';
cards;
38  5  0  1
33 11  3  0
10 14  5  6
 3  7  3 10
 5  3  0  0
 3 11  4  0
 2 13  3  4
 1  2  4 14
;

*-- Agreement, separately, and conrolling for Patients;
proc freq data=msdiag;
     weight count;
     tables patients * N_rating * W_rating / norow nocol nopct agree;
run;
/*
  Title:  NASA space shuttle O-ring failures
  Source: Table 1, Dalal etal, JASA 1989, 84, 945-957, field joint data;
          Damage index from Tufte 1997.
*/
data orings;
        flt_num = _n_;
        input flight $ temp pressure fail failures damage;
        orings = 6;
        label temp='Temperature'  pressure='Leak check pressure'
                fail = 'Any failure?'  failures='Number of O-ring failures'
                damage = 'Damage index';
cards;
  1   66    50   0   0   0
  2   70    50   1   1   4
  3   69    50   0   0   0
  4   80    50   .   .   .
  5   68    50   0   0   0
  6   67    50   0   0   0
  7   72    50   0   0   0
  8   73    50   0   0   0
  9   70   100   0   0   0
41B   57   100   1   1   4
41C   63   200   1   1   2
41D   70   200   1   1   4
41G   78   200   0   0   0
51A   67   200   0   0   0
51C   53   200   1   2  11
51D   67   200   0   0   0
51B   75   200   0   0   0
51G   70   200   0   0   0
51F   81   200   0   0   0
51I   76   200   0   0   0
51J   79   200   0   0   0
61A   75   200   1   2   4
61C   58   200   1   1   4
61I   76   200   0   0   4
;
title 'Suicide Rates by Age, Sex and Method';
data suicide0;
   input sex $1 age poison cookgas toxicgas hang drown gun knife
                    jump other;
   length sexage $ 4;
   sexage=trim(sex)||trim(left(put(age,2.)));
cards;
M 10     4   0   0  247   1  17   1   6   0
M 15   348   7  67  578  22 179  11  74 175
M 20   808  32 229  699  44 316  35 109 289
M 25   789  26 243  648  52 268  38 109 226
M 30   916  17 257  825  74 291  52 123 281
M 35  1118  27 313 1278  87 293  49 134 268
M 40   926  13 250 1273  89 299  53  78 198
M 45   855   9 203 1381  71 347  68 103 190
M 50   684  14 136 1282  87 229  62  63 146
M 55   502   6  77  972  49 151  46  66  77
M 60   516   5  74 1249  83 162  52  92 122
M 65   513   8  31 1360  75 164  56 115  95
M 70   425   5  21 1268  90 121  44 119  82
M 75   266   4   9  866  63  78  30  79  34
M 80   159   2   2  479  39  18  18  46  19
M 85    70   1   0  259  16  10   9  18  10
M 90    18   0   1   76   4   2   4   6   2
F 10    28   0   3   20   0   1   0  10   6
F 15   353   2  11   81   6  15   2  43  47
F 20   540   4  20  111  24   9   9  78  47
F 25   454   6  27  125  33  26   7  86  75
F 30   530   2  29  178  42  14  20  92  78
F 35   688   5  44  272  64  24  14  98 110
F 40   566   4  24  343  76  18  22 103  86
F 45   716   6  24  447  94  13  21  95  88
F 50   942   7  26  691 184  21  37 129 131
F 55   723   3  14  527 163  14  30  92  92
F 60   820   8   8  702 245  11  35 140 114
F 65   740   8   4  785 271   4  38 156  90
F 70   624   6   4  610 244   1  27 129  46
F 75   495   8   1  420 161   2  29 129  35
F 80   292   3   2  223  78   0  10  84  23
F 85   113   4   0   83  14   0   6  34   2
F 90    24   1   0   19   4   0   2   7   0
;
*title 'Suicide rates in Germany'; 
data suicide;
        input sex $ age $ @;
        do method = 'Poison', 'Gas', 'Hang', 'Drown', 'Gun', 'Jump';
                input count @;
                output;
                end;
        input;
cards;
    M  10-20     1160     335    1524      67     512     189
    M  25-35     2823     883    2751     213     852     366
    M  40-50     2465     625    3936     247     875     244
    M  55-65     1531     201    3581     207     477     273
    M  70-90      938      45    2948     212     229     268

    F  10-20      921      40     212      30      25     131
    F  25-35     1672     113     575     139      64     276
    F  40-50     2224      91    1481     354      52     327
    F  55-65     2283      45    2014     679      29     388
    F  70-90     1548      29    1355     501       3     383
;
title 'Survival on the Titanic';

proc format;
  value class 1='1st' 2='2nd' 3='3rd' 4='crew';
  value age   0='Child' 1='Adult';
  value sex   0='Female' 1='Male';
  value surv  1='Survived' 0='Died';

data titanic;
        input survive  age  sex  @;
        format age age. class class. sex sex. survive surv.;
        do class = 1 to 4;
                input count @;
                output;
                end;
cards;
1   1    1         57        14        75       192
1   1    0        140        80        76        20
1   0    1          5        11        13         0
1   0    0          1        13        14         0
0   1    1        118       154       387       670
0   1    0          4        13        89         3
0   0    1          0         0        35         0
0   0    0          0         0        17         0
;
title 'Womens labor-force participation, Canada 1977';
/*
Source: Social Change in Canada Project, York Institute for
Social Research.
*/

proc format;
   value labor    /* labor-force participation */
      1 ='working full-time'
      2 ='working part-time'
      3 ='not working';
   value kids     /* presence of children in the household */
      0 ='Children absent'
      1 ='Children present';
   value region   /* region of Canada */
      1 ='Atlantic Canada'
      2 ='Quebec'
      3 ='Ontario'
      4 ='Prairie provinces'
      5 ='British Columbia';

data wlfpart;
   input case labor husinc children region @@;
   working = labor < 3;
   if working then
      fulltime = (labor = 1);
   /* dummy variables for region */
   r1 = (region=1);
   r2 = (region=2);
   r3 = (region=3);
   r4 = (region=4);
        label husinc="Husband's Income";
cards;
  1  3  15  1  3    2  3  13  1  3    3  3  45  1  3    4  3  23  1  3
  5  3  19  1  3    6  3   7  1  3    7  3  15  1  3    8  1   7  1  3
  9  3  15  1  3   10  3  23  1  3   11  3  23  1  3   12  1  13  1  3
 13  3   9  1  4   14  3   9  1  4   15  3  45  1  1   16  3  15  1  1
 17  3   5  1  3   18  3   9  1  3   19  3  13  1  3   20  3  13  0  3
 21  2  19  0  3   22  3  23  1  4   23  1  10  0  4   24  1  11  0  3
 25  3  23  1  3   26  3  23  1  3   27  3  19  1  3   28  3  19  1  3
 29  3  17  1  4   30  1  14  1  4   31  3  13  1  3   32  3  13  1  3
 33  3  15  1  3   34  3   9  0  3   35  3   9  0  3   36  3  19  0  3
 37  3  15  1  3   38  1  20  0  3   39  3   9  1  1   40  2   6  0  1
 41  3   9  1  5   42  2   4  1  3   43  2  28  0  3   44  3  23  1  3
 45  2   5  1  3   46  3  28  1  3   47  3   7  1  3   48  3   7  1  3
 49  3  23  1  4   50  1  15  0  4   51  2  10  1  4   52  2  10  1  4
 53  3   9  0  3   54  3   9  0  3   55  2   9  1  1   56  3  17  0  1
 57  3  23  1  1   58  3  23  1  1   59  3   9  1  3   60  3   9  1  3
 61  1   9  0  3   62  1  28  0  3   63  2  10  1  3   64  2  23  0  4
 65  3  11  1  4   66  3  15  1  3   67  3  15  1  3   68  3  19  1  3
 69  3  19  1  3   70  3  23  1  3   71  3  17  1  3   72  3  17  1  3
 73  3  17  1  3   74  3  17  1  3   75  3  17  1  3   76  2  38  1  3
 77  2  38  1  3   78  3   7  1  1   79  3  19  1  4   80  2  19  1  5
 81  1  13  0  3   82  2  15  1  3   83  1  17  1  3   84  1  17  1  3
 85  2  23  1  3   86  1  27  0  5   87  1  16  1  5   88  1  27  0  3
 89  3  35  0  3   90  3  35  0  3   91  3  35  0  3   92  2   9  1  3
 93  2   9  1  3   94  2   9  1  3   95  3  13  1  3   96  3  17  1  3
 97  3  17  1  3   98  1  15  0  3   99  1  15  0  3  100  3  15  1  3
101  1  11  0  1  102  3  23  1  1  103  3  15  1  1  104  3  15  0  5
105  2  12  0  5  106  2  12  0  5  107  3  13  1  4  108  3  19  1  3
109  3  19  1  1  110  3   3  1  1  111  3   9  1  1  112  1  17  1  1
113  3   1  1  1  114  3   1  1  1  115  2  13  1  4  116  3  13  1  4
117  3  19  0  5  118  3  19  0  5  119  1  15  0  5  120  2  30  1  3
121  3   9  1  1  122  3  23  1  1  123  1   9  0  3  124  1   9  0  3
125  3  13  1  4  126  2  13  1  3  127  3  17  1  1  128  2  13  1  4
129  2  13  1  4  130  2  19  1  3  131  2  19  1  3  132  3   3  1  3
133  1  14  0  3  134  1  14  0  3  135  1  11  1  3  136  1  11  1  3
137  2  14  1  3  138  3  13  1  3  139  3  28  1  3  140  3  28  1  3
141  3  14  1  3  142  3  14  1  3  143  3  11  1  4  144  3  13  1  4
145  3  13  1  4  146  2  11  1  1  147  2  11  1  1  148  3  19  1  5
149  1   6  0  5  150  3  28  0  5  151  1  13  0  5  152  1  13  0  5
153  3   5  0  5  154  2  28  1  5  155  2  11  1  5  156  3  23  1  5
157  2  15  1  5  158  3  13  1  5  159  1  22  0  3  160  1  15  0  3
161  3  15  1  3  162  3  15  1  1  163  1   5  1  1  164  1   1  0  4
165  1   1  0  4  166  3   9  1  1  167  3  15  1  3  168  1  13  0  3
169  3  19  1  1  170  2   8  1  5  171  1   7  1  4  172  3  19  1  3
173  3   7  1  3  174  1   9  0  3  175  1   9  0  3  176  1  24  0  3
177  3  15  1  3  178  1  13  0  3  179  3  13  0  5  180  1  13  0  5
181  1  17  1  1  182  1  16  0  1  183  1  18  0  3  184  1  18  0  3
185  3  13  0  3  186  2  15  1  5  187  3  13  1  5  188  3   7  1  5
189  1   9  1  1  190  3  23  1  5  191  3  17  1  4  192  3  15  1  5
193  3  11  1  4  194  3  17  1  4  195  3  17  1  4  196  1   5  1  4
197  1   5  1  4  198  3  26  1  3  199  1  10  0  2  200  1  11  0  2
201  1  20  1  2  202  3  13  1  2  203  3  15  1  2  204  3  28  1  2
205  2   9  1  2  206  3  19  1  2  207  3  11  1  2  208  1  11  0  2
209  3   9  1  2  210  1  10  0  2  211  3  19  1  2  212  3  13  1  2
213  1   3  0  2  214  3  15  1  2  215  3  15  1  2  216  2  17  1  2
217  3   7  1  2  218  2  15  0  2  219  3  19  1  2  220  1  16  0  2
221  3   5  0  2  222  3  11  1  2  223  3  11  1  2  224  3  19  1  2
225  3  15  1  2  226  3  15  1  2  227  3  11  1  2  228  1   5  0  2
229  2  23  1  2  230  2  23  1  2  231  3   7  1  2  232  3  13  1  2
233  1  15  0  2  234  1   5  0  2  235  3   7  1  2  236  1   6  0  2
237  1   5  1  2  238  1   5  1  2  239  3  13  1  2  240  3  13  1  2
241  3  13  1  2  242  3  13  0  2  243  3  17  1  2  244  1   6  1  2
245  3   5  1  2  246  2  19  1  2  247  1   3  1  2  248  3  23  0  2
249  3  23  0  2  250  1  15  0  2  251  3  11  0  2  252  3  23  0  2
253  3  13  1  2  254  2  23  1  2  255  1  11  0  2  256  3   9  0  2
257  1   2  0  2  258  3  15  1  2  259  3  15  0  2  260  3  15  1  2
261  3  11  1  2  262  3  11  0  2  263  3  15  1  2  
;
/*
  Name:  vision.sas
 Title:  Visual acuity in left and right eyes
Source:  Kendall and Stuart 1961 [Tables 33.2, 33.5]
*/

data women;
        input right left count @@;
        cards;
1 1 1520    1 2  266    1 3  124    1 4  66
2 1  234    2 2 1512    2 3  432    2 4  78
3 1  117    3 2  362    3 3 1772    3 4 205
4 1   36    4 2   82    4 3  179    4 4 492
;

data men;
        input right left count @@;
        cards;
1 1  821    1 2 112    1 3  85    1 4  35
2 1  116    2 2 494    2 3 145    2 4  27
3 1   72    3 2 151    3 3 583    4 4  87
4 1   43    4 2  34    4 3 106    4 4 331
;
*-- Join the two data sets;
data vision;
        set women (in=w)
            men   (in=m);
        if w then gender='F';
                  else gender='M';
title  'Race and Politics in the 1980 Presidential vote';
proc format;
  value race 0='NonWhite'
             1='White';
data vote;
        input @10 race cons @;
        do votefor='Reagan', 'Carter';
                input count @;
                output;
                end;
cards;
White    1 1    1   12
White    1 2   13   57
White    1 3   44   71
White    1 4  155  146
White    1 5   92   61
White    1 6  100   41
White    1 7   18    8
NonWhite 0 1    0    6
NonWhite 0 2    0   16
NonWhite 0 3    2   23
NonWhite 0 4    1   31
NonWhite 0 5    0    8
NonWhite 0 6    2    7
NonWhite 0 7    0    4
;
title 'von Bortkiewicz data';
data vonbort;
   input year @;
   do corps = 1 to 14;
      input deaths @;
      output;
      end;
/*  1 2  3  4   5 6  7  8    9  10 11 12 13 14 */
/*  G I II III IV V VI VII VIII IX X XI XIV XV */
cards;
75  0  0  0  0  0  0  0  1  1  0  0  0  1  0
76  2  0  0  0  1  0  0  0  0  0  0  0  1  1
77  2  0  0  0  0  0  1  1  0  0  1  0  2  0
78  1  2  2  1  1  0  0  0  0  0  1  0  1  0
79  0  0  0  1  1  2  2  0  1  0  0  2  1  0
80  0  3  2  1  1  1  0  0  0  2  1  4  3  0
81  1  0  0  2  1  0  0  1  0  1  0  0  0  0
82  1  2  0  0  0  0  1  0  1  1  2  1  4  1
83  0  0  1  2  0  1  2  1  0  1  0  3  0  0
84  3  0  1  0  0  0  0  1  0  0  2  0  1  1
85  0  0  0  0  0  0  1  0  0  2  0  1  0  1
86  2  1  0  0  1  1  1  0  0  1  0  1  3  0
87  1  1  2  1  0  0  3  2  1  1  0  1  2  0
88  0  1  1  0  0  1  1  0  0  0  0  1  1  0
89  0  0  1  1  0  1  1  0  0  1  2  2  0  2
90  1  2  0  2  0  1  1  2  0  2  1  1  2  2
91  0  0  0  1  1  1  0  1  1  0  3  3  1  0
92  1  3  2  0  1  1  3  0  1  1  0  1  1  0
93  0  1  0  0  0  1  0  2  0  0  1  3  0  0
94  1  0  0  0  0  0  0  0  1  0  1  1  0  0
;
data vonbort2;
        set vonbort;
        where corps not in (1,2,7,12);

proc freq data=vonbort2;
        tables deaths / out=horskick;
arthrit.sas     Arthritis treatment data
berkeley.sas    Berkeley Admissions data
chi2tab.sas     Generate a Chi-Square table
haireye.sas     Hair - Eye color data
icu.sas The ICU data
lifeboa2.sas    Lifeboats on the Titanic- data set 2
lifeboat.sas    Lifeboats on the Titanic
marital.sas     Pre-marital sex, extra-marital sex, and divorce
mental.sas      Mental impariment and parents SES
msdiag.sas      Diagnosis of multiple sclerosis
orings.sas      NASA space shuttle O-ring failures
suicide.sas     Suicide rates in Germany
suicide0.sas    Suicide Rates by Age, Sex and Method
titanic.sas     Survival on the Titanic
vietnam.sas     Student opinion about the war in Vietnam
vision.sas      Visual acuity in left and right eyes
vonbort.sas     von Bortkiewicz data
vote.sas        Race and Politics in the 1980 Presidential vote
wlfpart.sas     Womens labor-force participation, Canada 1977
/*
  Name:  vietnam.sas
 Title:  Student opinion about the war in Vietnam
 
From Aitken-etal:89  ``Statistical Modelling in GLIM''.
A survey of student opinion on the Vietnam War was taken at the
University of North Carolina at Chapel Hill in May 1967 and published
in the student newspaper. Students were asked to fill in ballot
papers stating which policy out of A,B,C or D they supported.
Responses were cross-classified by gender/year:

Responses:
  1 -- defeat North Vietnam by widespread bombing and land invasion
  2 -- follow the present policy
  3 -- withdraw troops to strong points and open negotiations on elections involving the Viet Cong
  4 -- immediate withdrawal of all U.S. troops
*/

proc format;
        value resp   1='Defeat North Vietnam'  2='Present policy'
                     3='Negotiate'             4='Immediate withdrawal';
        value letter 1='A'  2='B'  3='C'  4='D';
        value yr     1='Freshmen'  2='Sophomore'  3='Junior'
                     4='Senior'    5='Grad student';
        value $sex  'M'='Male'    'F'='Female';

data vietnam;
        do sex = 'F', 'M';
                do year = 1 to 5;
                        do response = 1 to 4;
                                input count @;
                                output;
                                end;
                        end;
                end;
        label year= 'Year of Study'
              sex = 'Sex';
cards;
 13  19  40   5  
  5   9  33   3  
 22  29 110   6  
 12  21  58  10  
 19  27 128  13  
175 116 131  17  
160 126 135  21  
132 120 154  29  
145  95 185  44  
118 176 345 141  
;
title 'Generate a Chi-Square table';
options ls=110;
%let df=  1 to 20,
         25 to 50 by 5,
         60 to 100 by 10,
        200 to 400 by 50;
%let np=12;   %*-- Number of p-values;
%let pvalue=.25 .10 .09 .08 .07 .06 .05 .025 .01 .005 .0025 .001;
%let divisor = 1;     *-- Chi-square values;
*let divisor = df;    *-- Chi-square / df values;

data chisq;
        array pr(*) p1-p&np   (&pvalue); 
        keep df p c;
        label p='Upper Tail Prob'
                        df='df';
   do k = 1 to dim(pr);        /* for each P-value */
                p = 100*pr(k);
                do df = &df;
         c = cinv(1-pr(k), df) / &divisor;
                        output;
         end;
      end;

proc sort;
        by df;
proc transpose out=chi2tab;
        by df; var c;

%*-- Generate variable labels;
%macro lab(k, prefix, values);
        %do i=1 %to &k;
                &&prefix.&i = "%scan(&values,&i,%str( ))"
                %end;
%mend;

proc format;
        picture cvalue low-<100  = '00.00'
                            100-high  = '000.0';
data chi2tab;
        set chi2tab;
        drop _name_;
        format col1-col&np cvalue. df 4.;
        label   %lab(&np, COL, &pvalue);
proc print label;
        id df;
*snip;
*sas2tex(data=chi2prob, 
        id=df, var=col1-col12,
        texalone=Y, tabenv=\small);
/*-------------------------------------------------------------------*
  *    Name: addvar.sas                                               *
  *   Title: Added variable plots for logistic regression             *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/addvar.html             *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 15 Apr 98 11:16                                          *
  * Revised:  6 Nov 2000 13:04:16                                     *
  * Version: 1.1                                                      *
  *  1.1   Fixed validvarname for V7+                                 *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------*/
 /*Description:
 
 The ADDVAR macro produces added variable plots (TYPE=AVP) for the
 effect of adding a variable to a logistic regression model, or a
 constructed variable plot (TYPE=CVP) for the effect of transforming
 a variable.
 
 For a model with a binary response, Y, and predictors in the list
 X, an added variable plot may be constructed for a new predictor,
 Z, by plotting the residuals of Y given X against the residuals
 of Z given X.  A linear relation in this plot indicates that Z
 should be included in the model, but observations with extreme
 Z-residuals would be highly influential in this decision.  A line
 fitted to this plot should have an intercept approximately zero,
 and a slope approximating the coefficient of Z in the full model.

 The constructed variable plot is designed to detect nonlinear dependence
 of Y on one of the X variables, say X[j].  It is an added variable
 plot for the constructed variable, Z = X[j] log X[j].
 
 Usage:

 The addvar macro is called with keyword parameters.  The X=, Y=,
 and Z= parameters must be specified.  A TRIALS= variable may be
 specified if the data are in events/trials form.  The arguments
 may be listed within parentheses in any order, separated by commas.
 For example:
 
        %addvar(data=icu, y=Died,  x=age admit cancer uncons, z=Systolic,
                id=patient, loptions=order=data noprint);

 This gives an AVP for the variable Systolic, when added to the X=
 variables in the model predicting Y=DIED.
 
Parameters:

* DATA=   Specifies the name of the input data set to be analyzed.
              [Default: DATA=_LAST_]

* Y=              Specifies the name of the response variable.

* TRIALS=     Name of trials variable for event/trial

* X=              Specifies the names of the predictor variables in the model

* Z=          Name of the added variable

* ID=         Name of observation ID variable (char)

* LOPTIONS=   Options for PROC LOGISTIC [Default: LOPTIONS=NOPRINT]

* SMOOTH=     Lowess smoothing parameter [Default: SMOOTH=0.5]

* SUBSET=     Subset of points to label [Default: SUBSET=ABS(STUDRES)>2]

* OUT=        Specifies the name of the output data set [Default: OUT=_RES_]

* SYMBOL=     Plotting symbol for points [Default: SYMBOL=DOT]

* INTERP=     Interpolation options for points [Default: INTERP=RL CI=RED]

* TYPE=       Type of plot: AVP or CVP [Default: TYPE=AVP]

* NAME=       Name of graph in graphic catalog [Default: NAME=ADDVAR]

* GOUT=       Name of the graphics catalog

*/
 
%macro addvar(
        data=_last_,       /* Name of input data set                  */
        y=,                /* Name of response variable               */
        trials=,           /* Name of trials variable for event/trial  */
        x=,                /* Names of predictors                     */
        z=,                /* Name of the added variable              */
        id=,               /* Name of observation ID variable (char)  */
        loptions=noprint,  /* options for PROC LOGISTIC               */
        smooth=0.5,        /* lowess smoothing parameter              */
        subset=abs(studres)>2,   /* subset of points to label         */
        out=_res_,         /* output data set                         */
        symbol=dot,        /* plotting symbol for points              */
        interp=rl ci=red,  /* interpolation options for points        */
        type=AVP,          /* Type of plot: AVP or CVP                */
        name=addvar,       /* Name of graph in graphic catalog        */
        gout=
        );
          
        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=V6;
                %end;
        %else %do;
           options nonotes;
                %end;

%let type=%upcase(&type);

%let abort=0;
%if %length(&y)=0 | %length(&x)=0
   %then %do;
      %put ERROR: The Y= and X= variables must be specified;
      %let abort=1;
      %goto DONE;
   %end;


*-- Fit the original model, get fit quantities in an ODS;
proc logistic nosimple data=&data &loptions outest=parms;
        %if %length(&trials)=0 %then %do;
           model &y         = &x;
                %end;
        %else %do;
           model &y/&trials = &x;
                %end;
   output out=_diag_ pred=p
                        difdev=difdev difchisq=difchisq c=c cbar=cbar h=h
                        resdev=resdev reschi=reschi;

proc print;
%let zl=&z Residual;
%if &type=CVP %then %do;
        *-- protect against negative values;
        proc univariate data=_diag_ noprint;
                var &z;
                output out=_min_ min=min;
        data _diag_;
                set _diag_;
                if _n_=1 then do;
                        set _min_(keep=min);
                        set parms(rename=(&z = beta));
                        end;
*               _z_ = beta * &z * log(&z);
                _z_ =  &z * log(&z);
        %let zl = Constructed &z*log(&z) Residual;
        %let z = _z_;
        %end; 

data _diag_;
   set _diag_;
   label h = 'Leverage (Hat value)'
                        studres = 'Studentized deviance residual';
   format h 3.2;
        studres = resdev / sqrt(1-h);
        drop p n;
        %if %length(&trials)=0 %then %do;
                n=1;
                %end;
        %else %do;
                n=&trials;
                %end;
        yhat = n * p;
        weight = n * p * (1-p);

proc reg data=_diag_ noprint;
        weight weight;
        model &z = &x;
        output out=&out r=zres;

*-- Find slope and intercept in the plot;               
proc reg data=&out outest=_parm_;
*       weight weight;
        model reschi = zres;

data &out;
        set &out;
        zres = zres * sqrt(weight);
        label zres="&zl"
                reschi = "&y Residual";

proc print data=&out;
        %if %length(&id) %then %do;
                id &id;
                %end;
        var &y &trials &x h reschi resdev studres zres;
        format resdev reschi studres zres 6.3;

%label(data=&out, x=zres, y=reschi, text=&id, out=_labels_, pos=-,
        subset=&subset);

*-- Label plot with slope value;
data _parm_;
        set _parm_(keep=zres);
   xsys='1'; ysys='1';
   length text $14 function $8;
   x = 2;   y=4; position='F';
   function = 'LABEL'; color='red';
   text = 'Slope: ' || left(put(zres,7.3));  output;
        %if &type=CVP %then %do;
        power = round(1+zres, 0.5);
        position='C';
   text = 'Power: ' || left(put(power,9.1));  output;
                %end;

data _cross_;
        xsys='2'; ysys='2'; color='red';
        x =0; y=0;  function='move'; output;
        xsys='7';  x= -5; function='draw'; output;
        xsys='7';  x=+10; function='draw'; output;
        xsys='2';  x=  0; function='move'; output;
        ysys='7';  y= -5; function='draw'; output;
        ysys='7';  y=+10; function='draw'; output;

data _labels_;
        length text $14;
        set _labels_ _parm_ _cross_;

%if &smooth>0 %then %do;
        %lowess(data=&out, x=zres, y=reschi, gplot=NO, pplot=NO,
                outanno=_smooth_, silent=YES, f=&smooth, line=20);
        
        data _labels_;
                set _labels_ _smooth_;
        %end;

proc gplot data=&out;
        plot reschi * zres / 
                anno=_labels_ vaxis=axis1 frame vm=1 hm=1
                name="&name" des="Added variable plot for &z in &data";
        symbol1 v=&symbol c=black i=&interp;
        axis1 label=(a=90) offset=(2);
run; quit;
%done:
        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;

%mend;

/*-------------------------------------------------------------------*
  *    Name: dummy.sas                                                *
  *  Title:  Macro to create dummy variables                          *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/dummy.html              *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 03 Feb 98 11:32                                          *
  * Revised: 06 Aug 98 17:12                                          *
  * Version: 1.2                                                      *
  * 1.1  Added FULLRANK parameter                                     *
  * 1.2  Now handles multiple VARiables                               *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------*/
 /*Description:

 Given a character or discrete numerical variable, the DUMMY macro creates
 dummy (0/1) variables to represent the levels of the original variable
 in a regression model.  If the original variable has c levels, (c-1)
 new variables are produced (or c variables, if FULLRANK=0)
 
Usage:

 The DUMMY macro takes the following named parameters.  The arguments
 may be listed within parentheses in any order, separated by commas.
 For example:
 
   %dummy(var=sex group,prefix=);

 
Parameters:

* DATA=    The name of the input dataset.  If not specified, the most
           recently created dataset is used.
* OUT=     The name of the output dataset.  If not specified, the new
           variables are appended to the input dataset.
* VAR=     The name(s) of the input variable(s) to be dummy coded.  Must be
           specified.  The variable(s) can be character or numeric.
* PREFIX=  Prefix(s) used to create the names of dummy variables.  The
           default is 'D_'.  
* NAME=    If NAME=VAL, the dummy variables are named by appending the
           value of the VAR= variable to the prefix.  Otherwise, the
                          dummy variables are named by appending numbers, 1, 2, ...
                          to the prefix.  The resulting name must be 8 characters or
                          less.
* BASE=    Indicates the level of the baseline category, which is given
           values of 0 on all the dummy variables.  BASE=_FIRST_ 
                          specifies that the lowest value of the VAR= variable is the
                          baseline group; BASE=_LAST_ specifies the highest value of
                          the variable.  Otherwise, you can specify BASE=value to
                          make a different value the baseline group.
* FULLRANK=  0/1, where 1 indicates that the indicator for the BASE category
           is eliminated.  

Example:

 With the input data set,
 
  data test;
    input y group $ @@;
   cards;
   10  A  12  A   13  A   18  B  19  B  16  C  21  C  19  C  
   ;

 The macro statement:

         %dummy ( data = test, var = group) ;

 produces two new variables, D_A and D_B.  Group C is the baseline
 category (corresponding to BASE=_LAST_)

                OBS     Y    GROUP    D_A    D_B
                        1     10      A       1      0
                        2     12      A       1      0
                        3     13      A       1      0
                        4     18      B       0      1
                        5     19      B       0      1
                        6     16      C       0      0
                        7     21      C       0      0
                        8     19      C       0      0

*/
 
%macro dummy( 
        data=_last_ ,    /* name of input dataset */
        out=&data,       /* name of output dataset */
        var= ,           /* variable to be dummied */
        base=_last_,     /* base category */
        prefix = D_,     /* prefix for dummy variable names */
        name  = VAL,     /* VAL: variable names are D_value */
        fullrank=1 
        );

   %if (%length(&var) = 0) %then %do;
       %put ERROR: DUMMY: VAR= must be specified;
       %goto done;
       %end;

%let abort = 0;
%let base = %upcase(&base);
%let name = %upcase(&name);

%if %upcase(&data) = _LAST_ %then %let data = &syslast;
%if %upcase(&data) = _NULL_ %then %do;
        %put ERROR: There is no default input data set (_LAST_ is _NULL_);
        %goto DONE;
        %end;
        
%let prefix = %upcase(&prefix);

%local j vari;
%let j=1;
%let vari= %scan(&var,    &j, %str( ));
%let pre = %scan(&prefix, &j, %str( ));

options nonotes;

%if &out ^= &data %then %do;
        data &out;
                set &data;
        %end;
        
%do %while(&vari ^= );

*-- determine values of variable to be dummied;
proc summary data = &out nway ;
     class &vari ;
     output out = _cats_ ( keep = &vari ) ;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

        %if &fullrank %then %do;
        *-- Eliminate the base category;
        data _cats_;
                set _cats_ end=_eof_;
                %if &base = _FIRST_ %then %str( if _n_ = 1 then delete;);
                %else %if &base = _LAST_ %then %str( if _eof_ then delete;);
                %else                    %str(if &vari = &base then delete;);
        
        run;
        %end;

data _null_ ;
 set _cats_ nobs = numvals ;

 if _n_ = 1 then do;
  call symput('abort',trim( left( put( (numvals=0), best. ) ) ) ) ;
  call symput( 'num', trim( left( put( numvals, best. ) ) ) ) ;
  end;
  call symput ( 'c' || trim ( left ( put ( _n_,     best. ) ) ),
                       trim ( left ( &vari ) ) ) ;
run ;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

%let name = %upcase(&name);
%if "&name" = "VAL" %then %do ;
        %*-- Names by variable value;
%let vl&j =; %do k=1 %to &num; %let vl&j = &&vl&j  &pre&&c&k; %end; ;
%*put vl&j = &&&vl&j;
data &out;
        set &out ;
        
        array __d ( &num ) %do k=1 %to &num ;   &pre&&c&k
                                                        %end ; ;
        %put DUMMY: Creating dummy variables &pre&&c1 .. &pre&&c&num for &vari;
        %end ;

%else %do ;
        %*-- Numeric suffix names;
%let vl&j =; %do k=1 %to &num; %let vl&j = &&vl&j  &pre.&k; %end; ;
%*put vl&j = &&&vl&j;

data &out  ( rename = ( %do k=1 %to &num ;
                                                d&k = &pre.&k
                                                %end ; ) ) ;
        set &out ;
        %put DUMMY: Creating dummy variables &pre.1 .. &pre.&num;
        array __d ( &num ) d1-d&num ;
        %end ;

        drop j;
        do j = 1 to &num ; /* initilaize to 0 */
                __d(j) = 0 ;
        end ;

        if &vari = "&c1" then __d ( 1 ) = 1 ;  /* create dummies */

        %do i = 2 %to &num ;
                else if &vari="&&c&i" then __d ( &i ) = 1 ;
        %end;
run ;

%let j=%eval(&j+1);
%let vari = %scan(&var, &j, %str( ));
%let pre = %scan(&prefix, &j, %str( ));

%*put End of loop(&i): vari = &vari  pre=&pre;
%end;  /* %do %while */

%done:
%if &abort %then %put ERROR: The DUMMY macro ended abnormally.;
options notes;
%mend dummy ;
/*-------------------------------------------------------------------*
  *    Name: halfnorm.sas                                             *
  *   Title: Half normal plot for generalized linear models           *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/halfnorm.html           *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 08 Nov 1998  9:51                                        *
  * Revised:  9 Nov 2000 13:36:44                                     *
  * Version: 1.1                                                      *
  *  1.1   Fixed make ... noprint for V7+                             *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------*/
 /* Description:
 
 The HALFNORM macro plots the ordered absolute values of residuals
 from a generalized linear model against expected values of normal
 order statistics.  A simulated envelope, correponding to an
 approximate 95% confidence interval, is added to the plot to aid
 assessment of whether the distribution of residuals corresponds
 to a good-fitting model.

Usage:

 The HALFNORM macro is called with keyword parameters.  The RESP=
 and MODEL= parameters are required.  The arguments may be listed
 within parentheses in any order, separated by commas. For example:

        %halfnorm(resp=count, class=sex response, model=sex|year|response@2);

 
Parameters:

* DATA= Specifies the name of the input data set to be analyzed.
            The default is the last data set created.

* Y=
* RESP=     Specifies the name of the response variable to be analyzed

* TRIALS=   The name of a trials variable, for DIST=BIN, with the data
            in events/trials form.

* MODEL=    Specifies the model formula, the right-had-side of the
            MODEL statement.  You can use the | and @ shorthands.

* CLASS=    Names of any class variables in the model.

* DIST=     Error distribution.  [Default: DIST=NORMAL].

* LINK=     Link function.  The default is the canonical link for the
            DIST= error distribution.

* OFFSET=   The name(s) of any offset variables in the model.

* MOPT=     Other model options (e.g., NOINT)

* FREQ=     The name of a frequency variable, when the data are in grouped
            form.

* ID=       The name of a character variable used as an observation
            identifier in the plot.

* OUT=          Specifies the name of the output data set. The output data
            set contains the input variables, absolute residuals (_ARES_),
                                half-normal expected value (_Z_),
                                [Default: _RES_].

* LABEL=    Specifies whether and how to label observations in the plot.
            LABEL=ALL means that all observations are labelled with the
                                ID= variable value; LABEL=NONE means that no observations are
                                labelled; LABEL=ABOVE means that observations above the mean
                                of the simulations are labelled; LABEL=TOP n means that the
                                highest n observations are labelled. [Default: TOP 5]
                                
* SEED=     Specifies the seed for the random number generators. SEED=0
            (the default) uses the time-of-day as the seeed, so a
                                different set of simulated observations is drawn each time
                                the program is run.

* RES=      The type of residual to plot.  Possible values are:
            STRESCHI (adjusted Pearson residual), STRESDEV (adj. deviance
                                residual).
            
* NRES=     Number of simulations for the confidence envelope. [Default: 19]

* SYMBOL=   Plotting symbol for residuals. [Default: dot]

* INTERP=   Interpolation for residuals. [Default: none]

* COLOR     Color for residuals. [Default: red]

* NAME=     Graph name in graphics catalog. [Default: halfnorm]

* GOUT=     The name of the graphics catalog. [Default: GSEG]

 */
 
%macro halfnorm(
   data=_last_,    /* Name of input data set                     */
        y=,             /* Name of response variable                  */
   resp=,          /* Name of response variable                  */
        trials=,        /* Name of trials variable (dist=bin only)    */
   model=,         /* Model specification                        */
   class=,         /* Names of class variables                   */
   dist=,          /* Error distribution                         */
   link=,          /* Link function                              */
        offset=,        /* Offset variable(s)                         */
   mopt=,          /* other model options (e.g., NOINT)          */
   freq=,          /* Freq variable                              */
   id=,            /* Name of observation ID variable (char)     */
        out=_res_,      /* output data set                            */
        label=top 5,    /* NONE|ALL|ABOVE|TOP n                       */
        seed=0,         /* Seed for simulated residuals               */
        res=stresdev,   /* Type of residual to use: streschi/stresdev */
        nres=19,        /* Number of simulations for envelope         */
        symbol=dot,     /* plotting symbol for residuals              */
        interp=none,    /* interpolation for residuals                */
        color=red,      /* color for residuals                        */
        name=halfnorm,  /* graph name in graphics catalog             */
        gout=
);

%let label=%upcase(&label);
%let abort=0;
%if %length(&model) = 0 %then %do;
    %put ERROR: List of model terms (MODEL=) is empty.;
    %let abort=1;
    %goto done;
    %end;

%if %length(&resp) = 0 %then %let resp=&y;
%if %length(&resp) = 0 %then %do;
    %put ERROR: No response (RESP= or Y=) has been specified.;
    %let abort=1;
    %goto done;
    %end;

%let dist=%upcase(&dist);
%if %length(&dist) = 0 %then %do;
    %put WARNING: No distribution (DIST=) has been specified.;
    %put WARNING: GENMOD will use DIST=NORMAL.;
    %end;

%let lres=&res;
      %if %upcase(&res)=STRESDEV %then %let lres=Std Deviance Residual;
%else %if %upcase(&res)=STRESCHI %then %let lres=Std Pearson Residual;
%else %if %upcase(&res)=RESDEV   %then %let lres=Deviance Residual;
%else %if %upcase(&res)=RESCHI   %then %let lres=Pearson Residual;
%else %if %upcase(&res)=RESLIK   %then %let lres=Likelihood Residual;
%else %if %upcase(&res)=RESRAW   %then %let lres=Raw Residual;
%else %do;
    %put WARNING: Residual type &res is unknown.  Using RES=STRESDEV;
    %let res=stresdev;
         %let lres=Std Deviance Residual;
    %goto done;
        %end;

%put HALFNORM: Fitting initial model: &resp = &model.;
%let _print_=OFF;
proc genmod data=&data;
   %if %length(&class)>0 %then %do; class &class; %end;
   %if %length(&freq)>0 %then %do;  freq &freq;   %end;
        %if %length(&trials)=0 %then %do;
           model &resp         = &model /
                %end;
        %else %do;
           model &resp/&trials = &model /
                %end;
        %if %length(&dist)>0 %then %do;  dist=&dist %end;
        %if %length(&link)>0 %then %do;  link=&link %end;
        %if %length(&offset)>0 %then %do; offset=&offset  %end;
        %if %length(&mopt)>0 %then %do;  %str(&mopt) %end;
                obstats residuals;
  make 'obstats'  out=_obstat_ %if &sysver<7 %then noprint;;
  run;

%*-- Find variables listed in model statment;
data _null_;
        length xv $200;
        xv = translate("&model", '    ', '(|)*');
        at= index(xv,'@');
        if at then xv=substr(xv,1,at-1);
        call symput('xvars', trim(left(xv)));
run;
%*put xvars=&xvars;

%*-- Generate simulated response values from the error distribution;
data _obstat_;
   merge &data(keep=&resp &trials &freq &class &xvars &offset &id)
              _obstat_(keep=pred &res);
        array _y_{&nres} _y1 - _y&nres;
        drop i seed;
        retain seed &seed;
        do i=1 to dim(_y_);
                %if %substr(&dist,1,1)=N %then %do;
                        call rannor(seed, _y_[i]);
                        _y_[i] = pred + _y_[i];
                        %end;
                %else %if %substr(&dist,1,1)=B %then %do;
                        n=&trials;
                        call ranbin(seed, n, pred, _y_[i]);
                        %end;
                %else %if %substr(&dist,1,1)=P %then %do;
                        call ranpoi(seed, pred, _y_[i]);
                        %end;
                %else %if %substr(&dist,1,1)=G %then %do;
                        call rangam(seed, pred, _y_[i]);
                        %end;
                end;
run;

%if %length(&id)=0 %then %do;
%put WARNING: No ID= was given.  Using observation number.;
data _obstat_;
        set _obstat_;
        _id_ = left(put(_n_,10.));
run;
%let id=_id_;
%end;

%put HALFNORM: Generating &nres simulated residual sets...;
%if &sysver >= 7 %then %do;
        ods listing exclude all;
        %end;
%do i=1 %to &nres;
options nonotes;
%*put Generating residual set &i;
        
proc genmod data=_obstat_;
   %if %length(&class)>0 %then %do; class &class; %end;
   %if %length(&freq)>0 %then %do;  freq &freq;   %end;
        %if %length(&trials)=0 %then %do;
           model _y&i         = &model /
                %end;
        %else %do;
           model _y&i/&trials = &model /
                %end;
        %if %length(&dist)>0 %then %do;  dist=&dist %end;
        %if %length(&link)>0 %then %do;  link=&link %end;
        %if %length(&offset)>0 %then %do; offset=&offset  %end;
        %if %length(&mopt)>0 %then %do;  %str(&mopt) %end;
                obstats residuals;
  make 'obstats'  out=_hres&i(keep=&res rename=(&res=res&i));
  run;

%end;  /* End %do i */
%let _print_=ON;
%if &sysver >= 7 %then %do;
        ods listing exclude none;
        %end;

%*-- Merge residuals, calculate absolute values;
data _obstat_;
        merge _obstat_
        %do i=1 %to &nres;
                _hres&i
                %end;
                ;
        drop i _y1-_y&nres;
        array _res_{&nres}  res1-res&nres;
        do i=1 to dim(_res_);
                if _res_[i]^=. then _res_[i] = abs(_res_[i]);;
                end;
        _ares_ = abs(&res);

proc sort data=_obstat_;
        by _ares_;

%*-- Sort each set of residuals;
proc iml;
start sortcols(X);
        *-- Sort columns, allowing for missing values;
        do i=1 to ncol(X);
                xi = x[,i];
                if any(xi=.) then do;
                        mi = xi[loc(xi=.),];
                        xi = xi[loc(xi^=.),];
                        end;
                else free mi;
                t = xi; r = rank(xi);
                t[r] = xi;
                x[,i] = mi//t;
                end;
        finish;

start symput(macnm,scal);
  *-- give macro variable &"macnm" the value of the scalar ;
  call execute('%let ',macnm,'=',char(scal),';');
finish;

        use _obstat_;
        read all var( "res1" : "res&nres" ) into X;
        nc=0;
        do i=1 to ncol(X);
                if ^all(X[,i]=.) then do;
                        Y = Y || X[,i];
                        nc = nc+1;
                        end;
                end;
        run symput('nc', nc);
        run sortcols(Y);
        create _sorted_  from Y;
        append from Y;
        quit;
%put NOTE: There are &nc sorted columns of simulated residuals;
%if &nc=0 %then %do;
        %let abort=1;
        %goto done;
        %end;

data _obstat_;
        merge _obstat_ _sorted_;
        array _res_{*} col1-col&nc;
        drop res1-res&nres;
        
        resmin = min(of col1-col&nc);           
        resmax = max(of col1-col&nc);
        resmean = mean(of col1-col&nc);         

*proc print data=_obstat_(obs=20);
run;
                
options notes;
data &out;
        set _obstat_ nobs=nobs end=eof;
        drop col1-col&nres;
        _z_ = probit((_n_ + nobs - .125)/(2*nobs + .5));
        label _z_='Expected value of half normal quantile'
                _ares_="Absolute &lres";
        if eof then call symput('nobs', left(put(nobs,best8.))); 
run;
options nonotes;

%if &label ^= NONE %then %do;
        %if &label=ALL   %then %let subset=1;
        %if &label=ABOVE %then %let subset=(_ares_>resmax);
        %if %scan(&label,1)=TOP %then %do;
                %let which = %scan(&label,2);
                %if %length(&which)=0 %then %let which=5;
                %let subset = (_n_ > %eval(&nobs-&which));
                %end;
%label(data=&out, x=_z_, y=_ares_, text=&id, out=_labels_, pos=4,
        subset=&subset, xoff=-.04);
%end;

proc gplot data=&out;
        plot _ares_  * _z_ = 1
             resmean * _z_ = 2 
             resmin  * _z_ = 3 
             resmax  * _z_ = 3 / overlay  
                vaxis=axis1 frame vm=1 hm=1 
                %if &label ^= NONE %then %do;
                anno=_labels_
                %end;
                name="&name" des="Half normal plot for &resp in &data";
        symbol1 v=&symbol c=&color i=&interp;
        symbol2 v=none    c=black  i=join;
        symbol3 v=none    c=gray60 i=join l=3;
        axis1 label=(a=90);
run; quit;

 /*------------------------------------*
  | Clean up datasets no longer needed |
  *------------------------------------*/
proc datasets nofs nolist nowarn library=work memtype=(data);
    delete _obstat_
        %do i=1 %to &nres;
                _hres&i
                %end;
         ;
         run; quit;

%done:
options notes;
%if &abort %then %put ERROR: The HALFNORM macro ended abnormally.;
%mend;

/*-------------------------------------------------------------------*
  *    Name: mosaic.sas                                               *
  *   Title: Macro interface for mosaic displays                      *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/mosaic.html             *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:   9 Sep 1997 17:04:26                                    *
  * Revised:  14 Dec 1998 11:17:28                                    *
  * Version: 1.3                                                      *
  *  - added BY= variables (calling MOSPART)                          *
  *  - fixed bug with multiple BY= variables                          *
  *  - added check for non-existant vars                              *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------*/
 /*Description:
 
 The MOSAIC macro provides an easily used macro interface to the
 MOSAICS, MOSAICD and MOSPART SAS/IML programs.  Using the
 SAS/IML programs directly means that you must compose a PROC IML
 step and invoke the mosaic module (or mospart, for partial mosaics).

 The MOSAIC macro may be used with any SAS data set in frequency
 form (e.g., the output from PROC FREQ).  The macro simply creates
 the PROC IML step, reads the input data set, and runs the either
 the mosaic module, the mosaicd module, or the mospart module,
 depending on the options specified.  
 
 If your data is in case form, or you wish to collapse over some
 table variables, you must use PROC FREQ first to construct the
 contingency table to be analyzed.  The TABLE macro may be used
 for this purpose.  It has the advantage of allowing formatted
 values of the table factors to be used by the mosaics program.

 Ordinarily, the program fits a model (specified by the FITTYPE=
 parameter) and displays residuals from this model in the mosaic
 for each marginal subtable specified by the PLOTS= parameter.
 However, if you have already fit a model and calculated residuals
 some other way (e.g., using PROC CATMOD or PROC GENMOD), specify
 a RESID= variable in the macro call.  The macro will then call
 the mosaicd module.
 
 If a BY= variable is specified, the macro produces one (partial)
 mosaic plot for each level of the BY variable(s).

Usage:

 The parameters for the mosaic macro are like those of the SAS/IML
 program, except:
 
* DATA=    Specifies the name of the input dataset.  The data set
           should contain one observation per cell, the variables
           listed in VAR= and COUNT=, and possibly RESID= and BY=.

* VAR=     Specifies the names of the factor variables for the 
           contingency table. Abbreviated variable lists (e.g., V1-V3) 
           are not allowed. The levels of the factor variables may be
           character or numeric, but are used `as is' in the input data.
           You may omit the VAR= variables if variable names are used in
           the VORDER= parameter.

* BY=      Specifies the names of one (or more) By variables.  Partial
           mosaic plots are produced for each combination of the levels
           of the BY= variables.  The BY= variable(s) *must* be listed among
           the VAR= variables.

* COUNT=   Specifies the names of the frequency variable in the dataset

* CONFIG=  For a user-specified model, CONFIG= gives the terms in the
           model, separated by '/'.  For example, to fit the model of
           no-three-way association, specify config=1 2 / 1 3 / 2 3,
           or (using variable names) config = A B / A C / B C.
           Note that the numbers refer to the variables after they
           have been reordered, either sorting the data set, or by the
           VORDER= parameter.

* VORDER=  Specifies either the names of the variables or their indices
           in the desired order in the mosaic.  Note that the using the
           VORDER parameter keeps the factor levels in their order in
           the input data set.

* SORT=    Specifies whether and how the input data set is to be sorted
           to produce the desired order of variables in the mosaic.
           SORT=YES sorts the data in the reverse order that they are
           listed in the VAR= paraemter, so that the variables are 
           entered in the order given in the VAR= parameter.  Otherwise,
           SORT= lists the variable names, possibly with the DESENDING
           or NOTSORTED options in the reverse of the desired order.
           e.g., SORT=C DESCENDING B DESCENDING A.  The default is
           SORT=YES, unless VORDER= has been specified.

* RESID=   Specifies that a model has already been fit and that externally
           calculated residuals are contained in the variable named 
           by the RESID= parameter.
 */
 
%macro mosaic(
   data=_last_,     /* Name of input dataset                        */
   var=,            /* Names of all factor variable                 */
   count=count,     /* Name of the frequency variable               */
   by=,             /* Name(s) of BY variables                      */
   fittype=joint,   /* Type of models to fit                        */
   config=,         /* User model for fittype='USER'                */
   devtype=gf,      /* Residual type                                */
   shade=2 4,       /* shading levels for residuals                 */
   plots=,          /* which plots to produce                       */
   colors=blue red, /* colors for + and - residuals                 */
   fill=HLS HLS,    /* fill type for + and - residuals              */
   split=V H,       /* split directions                             */
   vorder=,         /* order of variables in mosaic                 */
   htext=1.5,       /* height of text labels                        */
   font=,           /* font for text labels                         */
   title=,          /* title for plot(s)                            */
   space=,          /* room for spacing the tiles                   */
   cellfill=,       /* write residual in the cell?                  */
   vlabels=,        /* Number of variable names used as plot labels */
   sort=,           /* Pre-sort variables?                          */
   resid=,          /* Name of residual variable                    */
        fuzz=
   );


%if %length(&var)=0 & %length(&vorder)>0 %then %do;
        %if %verify(&vorder, %str(0123456789 ))>0 %then %let var=&vorder;
        %end;

%if %length(&var)=0 %then %do;
   %put ERROR: You must specify the VAR= classification variables;
   %goto done;
   %end;

%if %upcase(&data)=_LAST_ %then %let data = &syslast;

%if %length(&sort)=0 %then %do;
        %if %length(&vorder)>0
                %then %let sort=NO;
                %else %let sort=YES;
        %end;

%let sort=%upcase(&sort);
%if &sort^=NO %then %do;
   %if &sort=YES %then %let sort=%reverse(&var);
   proc sort data=&data;
      by &sort;
%end;
   
%if %upcase(&fittype)=USER and %length(&config)=0 %then %do;
   %put ERROR: You must specify the USER model with the CONFIG= argument;
   %goto done;
   %end;
   
%if %length(&config) %then %do;
   data _null_;
      length config $ 200;
      config = "&config";
      config = translate(config, ',', '/');
      call symput('config', trim(config));
   run;
   %*put config: &config;
%end;

%*-- Get variable labels for use as vnames;
        %*** use summary to reorder the variables in order of var list;
/*
proc summary data=&data(firstobs=1 obs=1);
        id &var;
        output out=_tmp_(drop=_TYPE_ _FREQ_);
proc contents data=_tmp_ out=_work_(keep=name type label npos) noprint;
proc sort data=_work_;
        by npos;

data _null_;
        set _work_;
        call symput("name"||left(put(_n_,5.)),trim(name));
        call symput("type"||left(put(_n_,5.)),put(type,1.));
        call symput("lab"||left(put(_n_,5.)),trim(label));
*/

        %*--Becuase of the large number of modules loaded, it may be
            necessary to adjust the symsize value;
proc iml  symsize=256  /* worksize=10000 */;
   reset storage=mosaic.mosaic;
   load module=_all_;
   *include mosaics(mosaics); 
   %if %length(&resid)>0 %then %do;
      %include mosaics(mosaicd);
      %end;
        %if %length(&by)>0 %then %do;
                %include mosaics(mospart);
                %end;

start str2vec(string);
        *-- String to character vector;
   free out;
   i=1;
   sub = scan(string,i,' ');
   do while(sub ^=' ');
      out = out || sub;
      i = i+1;
      sub = scan(string,i,' ');
   end;
        return(out);
        finish;

   vnames = str2vec("&var");          *-- Preserve case of var names;
        vars = t(contents("&data"));
        ok=1; 
        vn=upcase(vnames) || upcase("&count");
        %if %length(&by)>0 %then %do;
                vn = vn || upcase(str2vec("&by"));
                %end;
        if ncol(union(vars, vn)) > ncol(vars) then ok=0;

        do;
        if ok=0 then do;
                print 'One or more variables are not contained in the input data set';
                print "Data set &data contains", vars;
                print "You asked for" vn;
                goto done; 
                end;

   %*-- Read and reorder residuals if specified;
   %if %length(&resid)>0 %then %do;
      vn = vnames;
      run readtab("&data","&resid", vn, dev,   lev, ln);
                if any(dev = .) then dev[loc(dev=.)] = 0;
      %if %length(&vorder) %then %do;
         vorder  = { &vorder };
         *-- marg bug workaround: subtract min value, then add back in;
         mdev = min(dev);
         dev = dev - mdev;
         run reorder(lev, dev, vn, ln, vorder);
         dev = dev + mdev;
         %end;
   %end;

   %*-- Read and reorder counts;
   run readtab("&data","&count", vnames, table, levels, lnames);

   %if %length(&vorder) %then %do;
      vorder  = { &vorder };
      run reorder(levels, table, vnames, lnames, vorder);
      %end;

   shade={&shade};
   colors={&colors};
   filltype={&fill};
   split={&split};

   htext=&htext;
   title = "&title";
   
   %if %length(&space)>0    %then %do; space={&space};   %end;
   %if %length(&font)>0     %then %do; font = "&font";   %end;
   %if %length(&fuzz)>0     %then %do; fuzz = "&fuzz";   %end;
   %if %length(&vlabels)>0  %then %do; vlabels=&vlabels; %end;
   %if %length(&cellfill)>0 %then %do; cellfill="&cellfill";  %end;


   %if %length(&resid)>0 %then %do;
      run mosaicd(levels, table, vnames, lnames, dev, title); 
      %end;

   %else %do;
      fittype = "&fittype";
      devtype = "&devtype";
      %if %length(&config)>0 %then %do;
         config=t({&config});
         %end;

                %if %length(&by)>0 %then %do;
                        %if %verify(&by, %str(0123456789 ))>0 %then %let by={&by};
                        %*put verify: %verify(&by, %str(0123456789 ));
                        show levels vnames lnames;
                        run mospart(levels, table, vnames, lnames, title, &by); 
                        %end;

                %else %do;
                        %if %length(&plots)=0 %then %do;
                                plots = 1:nrow(vnames);
                                %end;
                        %else %do;
                                %if %index(&plots,:)
                                        %then %str(plots = &plots;);
                                        %else %str(plots = {&plots};);
                                %end;
                        run mosaic(levels, table, vnames, lnames, plots, title);
                        %end;
                %end;
done:
        end;
   quit;
%done:

%mend;

%macro reverse(list);
   %local result i v;
   %let result =;
   %let i = 1;
   %let v = %scan(&list,&i,%str( ));
   %do %while (%length(&v) > 0);
      %let result = &v &result;
      %let i = %eval(&i + 1);
      %let v = %scan(&list,&i,%str( ));
      %end;
   &result
%mend;
/*-------------------------------------------------------------------*
  |     Name: power2x2.sas                                            |
  |    Title: Power for testing two independent proportions           |
  |      Doc: http://www.math.yorku.ca/SCS/vcd/power2x2.html          |
  | ------------------------------------------------------------------|
  |    Procs: print tabulate sort plot gplot                          |
  |  Macdefs: power2x2                                                |
  | ------------------------------------------------------------------|
  | Original: Modified from POWER2x2.SAS by SAS Institute             |
  |   Author: Michael Friendly               <friendly@yorku.ca>      |
  |  Created: 12 May 1999 10:16:12                                    |
  |  Revised: 19 Aug 1999 09:33:12                                    |
  |  Version: 1.1                                                     |
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------*/
 /* Description:

The POWER2X2 macro computes the power of a test comparing proportions
from two, equal-sized, independent samples.  Power is given for
various sizes of the total sample, or required sample size is given
for various power values, allowing you to pick the sample size that
achieves the desired power.                                                                      

Usage:

 The POWER2X2 macro takes 9 keyword arguments.  You must supply the
 DIFF= parameter.  By default the macro computes power for a range
 of sample sizes (given by NMIN= and NMAX=).  Alternatively, you may
 specify a range of power values (given by POWER=) for which the
 required sample size is calculated.

Parameters:

* P1=.5      Specifies an estimate of the "success" rate in one
                            group, the baseline group. [Default: P1=.50]

* DIFF=      Specifies the difference in the proportions that you 
             want to detect.  This is the specification of the 
             alternative hypothesis at which power is computed.
                            The difference MUST be specified; there is NO default.
                                 You may specify a list of values separated by commas, a
                                 range of the form x TO y BY z, or a combination of these.
             However, you must surround the DIFF= value with
             %STR() if any commas appear in it.  For example,

                     diff=.10 to .30 by .05
                     diff=%str(.10, .13, .20)

* ALPHA=.05  Specifies the significance level or size of the test. 
             It is a decimal value less that 1.  
             For example, ALPHA=.05 sets the probability of a Type
             1 error at 0.05.  You may specify a single value, or
                                 a list of values separated by commas, or a range of the
                                 form x TO y by z.  [Default: ALPHA=.05]

* POWER=     Values of power for sample size calculation.
             You may specify a list of values separated by commas, a
                            range of the form x TO y BY z, or a combination of these,
                                 as in a DO statement.
             However, you must surround the POWER= value with
             %STR() if any commas appear in it.

* NMIN=10    Specifies the minimum total sample size at which power 
             will be computed.  [Default: NMIN=10]

* NMAX=200   Specifies the minimum total sample size at which power
             will be computed.  [Default: NMAX=200]
             To get power for a single total sample size, set NMIN and
             NMAX to half of the total sample size.

* PLOT=      is a specification for plotting the results, in the form
             Y * X or Y * X = Z, where X, Y, and Z may be any of the
                                 variables N, DIFF, P2, POWER or OR.  No plots are produced 
                                 if PLOT=  is blank.  [Default: PLOT=POWER * N=DIFF]

* PLOTBY=    is another variable in the OUT= data set.  Separate plots are
             drawn for each level of the PLOTBY= variable.

* OUT=       The name of the output data set. [Default: OUT=_POWER_]

Example:

                %power2x2( p1=.6,  diff=.10 to .20 by .05,  nmin=50);

 With the settings above, the expected baseline success rate is 60%.
 Power for detecting a difference of 10-20% in the two proportions will
 be computed for a .05 level test and for sample sizes ranging from
 50 to 200.

 PRINTED OUTPUT:
 Using the settings shown, the following output is generated:


                 Power for testing two independent proportions
           Two-tailed test, alpha=.05, p1=0.6 diff=.10 to .20 by .05

                   -----------------------------------------
                   |                  |     Diff p1-p2     |
                   |                  |--------------------|
                   |                  | 0.1  | 0.15 | 0.2  |
                   |------------------+------+------+------|
                   |Total Sample Size |      |      |      |
                   |------------------|      |      |      |
                   |50                | 0.116| 0.209| 0.353|
                   |------------------+------+------+------|
                   |60                | 0.129| 0.242| 0.410|
                   |------------------+------+------+------|
                   |70                | 0.143| 0.274| 0.465|
                   |------------------+------+------+------|
                   |80                | 0.156| 0.306| 0.516|
                   |------------------+------+------+------|
                   |90                | 0.170| 0.337| 0.564|
                   |------------------+------+------+------|
                   |100               | 0.184| 0.368| 0.609|
                   |------------------+------+------+------|
                   |110               | 0.198| 0.398| 0.650|
                   |------------------+------+------+------|
                   |120               | 0.211| 0.428| 0.688|
                   |------------------+------+------+------|
                   |130               | 0.225| 0.456| 0.722|
                   |------------------+------+------+------|
                   |140               | 0.239| 0.484| 0.754|
                   |------------------+------+------+------|
                   |150               | 0.252| 0.511| 0.782|
                   |------------------+------+------+------|
                   |160               | 0.266| 0.537| 0.807|
                   |------------------+------+------+------|
                   |170               | 0.280| 0.562| 0.830|
                   |------------------+------+------+------|
                   |180               | 0.293| 0.586| 0.851|
                   |------------------+------+------+------|
                   |190               | 0.306| 0.609| 0.869|
                   |------------------+------+------+------|
                   |200               | 0.320| 0.631| 0.885|
                   -----------------------------------------

Details:

 Hypotheses in the test are:                                          

        H0: p1 = p2                                                       
        Ha: p1 ne p2                                                      

 where p1 and p2 are the success probabilities in the two
 populations.  The Pearson chi-square statistic tests the null
 hypothesis (H0) against the alternative hypothesis (Ha) and is
 available in the FREQ procedure when the CHISQ option is specified
 on the TABLES statement.
                                                                      
 The power is the probability of rejecting H0 and is a function of
 the true difference in proportions.  Power is often computed
 assuming many different settings of the true proportions.  The type
 2 error rate (denoted beta) is the probability of accepting H0 for
 some non-zero true difference and is equal to 1-power.  The power
 and beta are computed for a range of total sample sizes at a
 particular alternative hypothesis that you specify.  It is assumed
 that the total sample size will be split equally between the two
 samples.

References:

* Agresti, A. (1990), Categorical Data Analysis, New York: John Wiley
       & Sons, Inc.
* Agresti, A. (1996), An Introduction to Categorical Data Analysis,
            New York: John Wiley & Sons, Inc.

 =*/

/************************************************************************

                                POWER2x2

   DISCLAIMER:
     THIS INFORMATION IS PROVIDED BY SAS INSTITUTE INC. AS A SERVICE TO
     ITS USERS.  IT IS PROVIDED "AS IS".  THERE ARE NO WARRANTIES,
     EXPRESSED OR IMPLIED, AS TO MERCHANTABILITY OR FITNESS FOR A
     PARTICULAR PURPOSE REGARDING THE ACCURACY OF THE MATERIALS OR CODE
     CONTAINED HEREIN.

   REQUIRES:
     POWER2x2 requires only Version 6 base SAS Software.

   USAGE:
     POWER2x2 is a macro program.  The options and allowable values are:

   SEE ALSO:
     PWR2x2un -- Computes the power of a test comparing proportions from
                 two, unequally-sized, independent samples.

     POWERRxC -- Computes power for Pearson and Likelihood Ratio
                 Chi-square tests of independence in FREQ.  Handles any 
                 number of rows and columns in a two-way table.

************************************************************************/

%macro power2x2(
        p1 = .5,     /* Success probability in baseline group       */
        diff =,      /* difference in proportions to be detected    */ 
        alpha = .05, /* alpha-level of test                         */
        power=,      /* Values of power for sample size calculation */
        nmin = 10,   /* Minimum sample size to consider             */
        nmax = 200,  /* Maximum sample size to consider             */
        plot =power * n=diff,    /* plot request                    */
        plotby =,
        print =diff n power or,  /* variables to be printed         */    
        out=_power_
        );
        
%if %length(&diff)=0 %then %do;
        %put ERROR: The required difference in proportions must be specified;
        %goto done;
        %end;
        
data &out;

  p1=&p1;
  do alpha=&alpha;

  /********************** Compute power ************************/

        za = probit(alpha/2);
        %if %length(&power)>0 %then %do;
                do diff = &diff;
                        p2 = p1 + diff;
                        if (0 < p2 < 1) then do;
                                or = (p2 / (1-p2)) / (p1 / (1-p1));
                                do power = &power;
                                        zb = probit(1-power);
                                        n = 2 * ( (za+zb)**2 * (p1*(1-p1) + (p2*(1-p2))) ) / diff**2;
                                        n = round(n);
                                        output;
                                end;
                        end;
                end;
                drop za zb;
        %end;
        %else %do;    /* determine power for specified n */
                nmin=&nmin;
                nmax=&nmax;
                /* Select 'nice' range of total sample sizes */
                diforder=10**(max(floor(log10(nmax-nmin+1e-8)),1)-1);
                normlen=(nmax-nmin)/diforder;
                step=diforder*((normlen<=20)+2*(20<normlen<=40)+5*(40<normlen<=100));
                
                do n=nmin, nmin+step-mod(nmin+step,step) to nmax by step, nmax;
                        do diff = &diff;
                                /* power and beta computation */
                                p2 = p1 + diff;
                                or = (p2 / (1-p2)) / (p1 / (1-p1));
                                var = 2*( p1*(1-p1) + p2*(1-p2) ) / n;
*                               power=1-probnorm(-za - diff*sqrt(n/(4*p*(1-p)))) +
                                                        probnorm( za - diff*sqrt(n/(4*p*(1-p))));
                                power=1-probnorm(-za - diff/sqrt(var)) +
                                                  probnorm( za - diff/sqrt(var));
                                output;
                        end;
                        if n=round(nmax) then stop;
                end;
        end; /* do &alpha */
                drop diforder normlen step nmin nmax za var;
  %end;
  
  label n='Total Sample Size'
        power='Power'
        diff='Diff p1-p2'
        or = 'Odds ratio';
  run;

%if %length(&print)>0 %then %do;
proc print data=&out noobs split=' ';
  var &print;
  %if %length(&power)=0 %then %do;
        title  "Power for testing two independent proportions";
        %end;
        %else %do;
        title  "Sample size for testing two independent proportions";
        %end;
  title2 "Total sample size to be split equally between the groups";
  title3 "Baseline p1=&p1; p1-p2=&diff; alpha=&alpha";
  run;
%end;

proc tabulate data=&out format=6.0;
        class diff n;
        var power;
        table n,  diff *power=' '*f=6.3  * sum=' ';
        title2 "Two-tailed test, alpha=&alpha, p1=&p1 diff=&diff";
run;

%if %length(&plot)>0 %then %do;
%if %length(&plotby) %then %do;
proc sort data=&out;
        by &plotby;
%end;

title2 "Baseline: p1=&p1; p1-p2=&diff; alpha=&alpha";
proc plot data=&out;
        plot &plot /box ;
        run;

proc gplot data=&out uniform;
        plot &plot / frame hminor=1 vaxis=axis1 haxis=axis2;
        %if %length(&plotby) %then %do;
                by &plotby;
        %end;

        axis1 label=(a=90);
        axis2 offset=(3);
        symbol1 v=circle   i=join l=1 c=black;
        symbol2 v=dot      i=join l=3 c=red;
        symbol3 v=square   i=join l=5 c=blue;
        symbol4 v=triangle i=join l=7 c=green;
        symbol5 v=hash     i=join l=9 c=black;
        symbol6 v=diamond  i=join l=11 c=red;
        symbol7 v=star     i=join l=13 c=blue;
        format n 5.;
run; quit;
        title2;
        goptions reset=symbol;

%end;   
title;
%done:
%mend;

/*-----------------------------------------------------------------*
  |     Name: table.sas                                             |
  |    Title: Construct a grouped frequency table, with recoding    |
  |     Doc: http://www.math.yorku.ca/SCS/vcd/table.html            |
  | ----------------------------------------------------------------|
  |    Procs: freq printto                                          |
  |  Macdefs: table join tempfile tempdel                           |
  | ----------------------------------------------------------------|
  |   Author: Michael Friendly               <friendly@yorku.ca>    |
  |  Created: 09 Jul 1999 16:52:05                                  |
  |  Revised: 17 Nov 2000 12:10:58                                  |
  |  Version: 1.1                                                   |
  *  1.1  Inlined %tempfile %tempdel                                *
  *                                                                 *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)  *         
  *-----------------------------------------------------------------*/
/* Description:

 The TABLE macro constructs a grouped frequency table suitable for 
 input to %mosaics.  The input data may be individual observations,
 or a contingency table,  which may be collapsed to fewer variables.
 Factor variables may be converted to character using user-supplied
 formats.

Usage:

 The TABLE macro takes 7 keyword arguments.  The VAR= parameter
 is required.
 
Parameters:

* DATA=              The name of the input dataset.  [Default: DATA=_LAST_]

* VAR=               Names of all factor (classification) variables to be
                     included in the
                     output dataset.  The observations are summed over
                     any other factors, weighted by the WEIGHT= variable,
                     if any.

* CHAR=              If non-blank, forces the VAR= variables to be
                     converted to character variables (using formatted
                     values) in the output dataset.  If CHAR= a
                     numeric value (e.g., CHAR=8), it specifies the
                     length of each character variable; otherwise, the
                     character variables default to length 16.

* WEIGHT=            Name of a frequency variable, if the input dataset
                     is already in frequency form.

* ORDER=             Specifies the order of the variable levels used
                     in the PROC FREQ step.  ORDER=INTERNAL|FREQ|DATA|
                     FORMATTED are valid option values.

* FORMAT=            List of variable(s), format pairs (suitable for a
                     format statement).  The FORMAT= option may be used
                     to recode the values of any of the VAR= variables.

* OUT=               Name of output dataset.  The variables in the output
                     dataset are the VAR= variables, plus COUNT, the
                     frequency variable for each cell. [Default: OUT=TABLE]

Limitations:

None of the factor variables may be named COUNT.

Example:

 This example reads a three-way frequency table (gender x admit x dept),
 where admit and dept are numeric variables, and collapses it to a
 two-way table, with Gender and Admit as character variables.
 
   %include data(berkeley);
   %table(data=berkeley, var=gender admit, weight=freq, char=Y,
          format=admit admit. gender $sex., order=data, out=berk2);
   %mosaic(data=berk2, var=Gender Admit);

 The formats admit. and $sex. are created with proc format:

   proc format;
      value admit 1="Admitted" 0="Rejected";
      value $sex  'M'='Male'   'F'='Female';

*/ 

%macro table (
   data=_last_,     /* Name of input dataset                        */
   var=,            /* Names of all factor variables                */
   char=,           /* Force factor variables to character?         */
   weight=,         /* Name of a frequency variable                 */
   order=,          /* Specifies the order of the variable levels   */
   format=,         /* List of var, format pairs                    */
   out=table        /* Name of output dataset                       */
   );

%let abort=0;
%let ls=120;
%*-- Save original linesize;
%if &sysver>6.10 %then %do;
   %let lso=%sysfunc(getoption(ls,keyword));
   %let pso=%sysfunc(getoption(ps,keyword));
   %let dto=%sysfunc(getoption(date));
   %let cto=%sysfunc(getoption(center));
   %end;
%else %do; %let lso=; %let pso=; %let dto=; %let cto=; %end;


%if %length(&var)=0 %then %do;
   %put ERROR:  The VAR= variables must be specified.;
   %end;

%let table = %join(&var, *);

proc freq data=&data
   %if %length(&order) %then order=&order;
   ;
   %if %length(&weight) %then %do;
      weight &weight;
      %end;
   tables &table / noprint sparse out=&out(drop=percent);
   %if %length(&format) %then %do;
      format &format;
      %end;

%if %length(&char)>0 %then %do;
   /*
    * Force the VAR= variables to character.  To do this cleanly, we
    * resort to printing the &out dataset, then reading it back as
    * character.  
    */
   %tempfile(table,&ls);
   proc printto new print=table;
   options nodate nocenter nonumber ls=&ls;
   proc print data=&out;
      id &var;
      var count;
      run;
   %if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;
   proc printto print=print;


   %let tvar = %join(&var, $) $;
   %*put tvar=&tvar;

   %if %verify(&char,  %str(0123456789))=0
      %then %let clen=&char;
      %else %let clen=16;
   data &out;
      infile table length=len;
      length string $&ls &var $&clen;
      retain skipping 1;
      drop string skipping;
      input @1 string $varying. len @;
      if skipping=0 & string ^= ' ' then do;
         input @1 &tvar count;
         output;
         end;
      else input;
      if index(string,'COUNT')>0 then skipping=0;
   run;
   *proc contents data=&out;

%tempdel(table);
%end;

%done:;
%if &abort %then %put ERROR: The TABLE macro ended abnormally.;
options notes &lso &pso &dto &cto;
%if &sysver<=6.10 %then %do;
        options center date ls=80;
        %end;

%mend;

 /*
Description:

Join the &delim-separated words in &string with &sep.

Usage:

 %let result = %join(A B C, *);   *-- returns: A * B * C;

Parameters:

* STRING        A &delim-separated string of 'words'    

* SEP           The separator character(s) used between each pair

* DELIM         The delimiters [Default: %str( )]

 */

%macro join(string, sep, delim);

   %local count word;
   %if %length(&delim) %then %let delim=%str( );
   %let count=1;
   %let word = %scan(&string,&count,%str( ));
   %let result = &word;
   %do %while(&word^= );
       %let count = %eval(&count+1);
       %let word = %scan(&string,&count,%str( ));
       %if %length(&word)
          %then %let result = &result &sep &word;
   %end;
   &result
   
%mend;

%macro tempfile(fileref,ls);
   %global tempfn;
   %if %length(&ls)=0 %then %let ls=80;
   %if &sysscp = CMS
      %then %let tempfn=&fileref output a;
   %else %if &sysscp = WIN 
      %then %let tempfn=c:\temp\&fileref..out;
   %else /* %if &sysscp = NEXT | &sysscp = RS6000 
      %then  */ %let tempfn=/tmp/&fileref..out;
   filename &fileref "&tempfn" lrecl=&ls;
%mend;

%macro tempdel(fileref);
   %global tempfn;
    *-- Avoid annoying flash with X commands;
    %if &sysver > 6.10 %then %do;
        %let rc=%sysfunc(fdelete(&fileref));
        %let rc=%sysfunc(filename(&fileref,''));
    %end;

    %else %do;
   %if &sysscp = CMS
      %then cms erase &tempfn;
   %else %if &sysscp = WIN
      %then %do;
         options noxsync noxwait; run;
         %sysexec(erase &tempfn); run;
         options   xsync   xwait; run;
      %end;
   %else /* assume flavor of UNIX */
       %sysexec(rm -f &tempfn);
    %end;
%mend;
/*-----------------------------------------------------------------*
  |     Name: equate.sas                                            |
  |    Title: Creates AXIS statements for a GPLOT with equated axes |
  |      Doc: http://www.math.yorku.ca/SCS/vcd/equate.html          |
  | ----------------------------------------------------------------|
  |    Procs: means gplot                                           |
  |  Macdefs: equate                                                |
  | ----------------------------------------------------------------|
  | Original: Warren Kuhfeld (SAS Sample Library)                   |
  |      Ref: P-179, PROC CORRESP, EXAMPLE 3.                       |
  |   Author: Michael Friendly               <friendly@yorku.ca>    |
  |  Created:  28 Jul 1998 14:13:25                                 |
  |  Revised:  10 Dec 1999 15:40:52                                 |
  |  Version:  1.2                                                  |
  |  1.1 Generates AXIS stmts for use by other macros               |
  |    - Determines XMAX, YMAX from DGSI if not specified           |
  |    - Optionally plots the data (if PLOT=YES)                    |
  |  1.2 Added XINC= and YINC= calculation from data                |
  |                                                                 |
  | From ``Visualizing Categorical Data'', Michael Friendly (2000)  |         
  *-----------------------------------------------------------------*/
 /*
Description:

 The EQUATE macro creates AXIS statements for a GPLOT with equated
 axes, and optionally produces a plot using point labels (supplied
 in an input annotate data set).  It is a modified version of the
 macro appearing in the SAS Sample Library.
                                                            
 It creates an AXIS statement for the vertical variable Y and
 an AXIS statement for horizontal variable X such that an inch
 on the vertical axis represents the same data range as an inch on
 the horizontal axis.  Equated axes are necessary whenever
 distances between points, or angles between vectors from the origin
 are to be interpreted.
          
Usage:

 The EQUATE macro takes 15 keyword arguments.  The X= and Y=
 parameters are required.  

 You may wish to reset the defaults below to be more
 suited to your devices.  As well, use GOPTIONS HSIZE= VSIZE=; to
 allow the maximum plot size if you  specify the XMAX= and
 YMAX= parameters as null values.

 As an additional convenience (particularly for use within other
 macros) EQUATE will calculate reasonable tick mark increments
 from the data, to give about 6 tick marks on an axis, if the
 XINC= or YINC= parameters are specified as null values.

Parameters:

* DATA=       Name of the input data set [Default: DATA=_LAST_]

* ANNO=       Name of an Annotate data set (used only if PLOT=YES).
              [Default: ANNO=&DATA]

* X=          Name of the X variable [Default: X=X]

* Y=          Name of the Y variable [Default: Y=Y]

* XMAX=       Maximum X axis length (inches).  If XMAX= (a null value)
              the macro queries the device driver (using the DSGI)
                                  to determine the maximum axis length. [Default: XMAX=6.5]

* YMAX=       Maximum Y axis length (inches).  If YMAX= (a null value)
              the macro queries the device driver (using the DSGI)
                                  to determine the maximum axis length. [Default: YMAX=8.5]

* XINC=       X axis tick increment.  If XINC= (a null value),
              the macro calculates an increment from the data
                                  which is 1, 2, 2.5, 4, or 5 times a power of 10
                                  so that about 6 tick marks will appear on the X
                                  axis. [Default: XINC=0.1]

* YINC=       Y axis tick increment.  If XINC= (a null value),
              the macro calculates an increment from the data
                                  which is 1, 2, 2.5, 4, or 5 times a power of 10
                                  so that about 6 tick marks will appear on the X
                                  axis. [Default: YINC=0.1]

* XPEXTRA=    Number of extra X axis tick marks at the high end.
              Use the XPEXTRA= and XMEXTRA= parameters to extend the
              range of the X variable beyond the data values, e.g.,
                                  to accommodate labels for points in a plot.
                                  [Default: XPEXTRA=0]

* XMEXTRA=    Number of extra X axis tick marks at the low end.
                                  [Default: XMEXTRA=0]

* YPEXTRA=    Number of extra Y axis tick marks at the high end.
              Use the YPEYTRA= and YMEYTRA= parameters to extend the
              range of the Y variable beyond the data values, e.g.,
                                  to accommodate additional annotations in a plot.
                                  [Default: YPEXTRA=0]

* YMEXTRA=    Number of extra Y axis tick marks at the low end.
                                  [Default: XMEXTRA=0]

* VAXIS=      Name of the AXIS statement for Y axis [Default: VAXIS=AXIS98]

* HAXIS=      Name of the AXIS statement for X axis [Default: HAXIS=AXIS99]

* PLOT=       Draw the plot? [Default: PLOT=NO]
                

    This macro performs no error checking.                  

 */
                                                            
   /*---------------------------------------------------------*/
%macro equate(
                        data=_last_,  /* Name of input data set             */
                        anno=&data,   /* Name of Annotate data set          */
                        x=x,          /* Name of X variable                 */
                        y=y,          /* Name of Y variable                 */
                        xmax=6.5,     /* maximum x axis inches              */
                        ymax=8.5,     /* maximum y axis inches              */
                        xinc=0.1,     /* x axis tick increment              */
                        yinc=0.1,     /* y axis tick increment              */
                        xpextra=0,    /* include extra + end x axis ticks   */
                        xmextra=0,    /* include extra - end x axis ticks   */
                        ypextra=0,    /* include extra + end y axis ticks   */
                        ymextra=0,    /* include extra - end y axis ticks   */
                        vaxis=axis98, /* AXIS statement for Y axis          */
                        haxis=axis99, /* AXIS statement for X axis          */
                        plot=NO       /* Draw the plot?                     */
                                  );
 
 
   %if %upcase(&data)=_LAST_ %then %let data=&syslast;
        
   *---Find the Minima and Maxima---;
        options nonotes;
   proc means noprint data=&data;
      var &y &x;
      output out=__temp__ min=ymin xmin max=ymax xmax;
      run;
 
   data _null_;
      set __temp__;
 
      *-- Select increments if values are empty --;
                %if %length(&xinc)=0 %then %do;
                        min=xmin;  max=xmax; link doinc; xinc=inc;
         %end;
                %else %do;
              xinc = &xinc;
                        %end;
                
                %if %length(&yinc)=0 %then %do;
                        min=ymin;  max=ymax; link doinc; yinc=inc;
         %end;
                %else %do;
              yinc = &yinc;
                        %end;
                
      *---Scale Minima and Maxima to Multiples of the Increments---;
      yinc = &yinc;
      ymin = (floor(ymin / yinc) - (&ymextra)) * yinc;
      xmin = (floor(xmin / xinc) - (&xmextra)) * xinc;
      ymax = (ceil (ymax / yinc) + (&ypextra)) * yinc;
      xmax = (ceil (xmax / xinc) + (&xpextra)) * xinc;
 
                *-- Should check that # tics is reasonable;
                xtic = (xmax - xmin) / xinc;
                ytic = (ymax - ymin) / yinc;
                
                *-- Determine XMAX, YMAX if not specified;
                %if %length(&xmax)=0 or %length(&ymax)=0 %then %do;
                        rc=ginit();
                        call gask('maxdisp',units,_xmax_,_ymax_,xpix,ypix,rc2);
                        rc3 = gterm();
                        *-- Convert to inches;
                        _xmax_ = _xmax_ * 100 / 2.54;
                        _ymax_ = _ymax_ * 100 / 2.54;
                        %end;
                %else %do;
                        _xmax_ = &xmax;
                        _ymax_ = &ymax;
                        %end;
                        
      *---Compute the Axis Lengths---;
 
      ytox = (ymax - ymin) / (xmax - xmin);
      if ytox le ((_ymax_) / (_xmax_)) then do;
         xlen =  _xmax_;
         ylen = (_xmax_) * ytox;
         end;
      else do;
         ylen = _ymax_;
         xlen = (_ymax_) / ytox;
         end;
 
      *---Write Results to Symbolic Variables---;
 
      call symput('len1',compress(put(ylen, best6.)));
      call symput('len2',compress(put(xlen, best6.)));
      call symput('min1',compress(put(ymin, best6.)));
      call symput('min2',compress(put(xmin, best6.)));
      call symput('max1',compress(put(ymax, best6.)));
      call symput('max2',compress(put(xmax, best6.)));
      call symput('inc1',compress(put(yinc, best6.)));
      call symput('inc2',compress(put(xinc, best6.)));
      return;

doinc:
                *-- Determine increment to give a nice number with about 6 ticks --;
                inc= abs(max - min)/6;
                pow = 10**floor( log10(inc) );
                nice=1000;
                do in = 1, 2, 2.5, 4, 5;
                        ut = in * pow;
                        if abs(inc-ut) < nice then do;
                                nice = abs(inc-ut);
                                best = ut;
                        end;
                end;
                inc=best;
                return;
run;
 
   *options notes;
 
   *---Write the Generated AXIS Statements to the Log---;
 
   %put EQUATE: The following statements were generated.;
   %put &vaxis length=&len1 IN order=(&min1 to &max1 by &inc1) label=(a=90)%str(;);
   %put &haxis length=&len2 IN order=(&min2 to &max2 by &inc2)%str(;);
        %put;
 
        *-- Create the AXIS statements;
        &vaxis length=&len1 IN order=&min1 to &max1 by &inc1 label=(a=90);
        &haxis length=&len2 IN order=&min2 to &max2 by &inc2;

   *---Create the GPLOT---;
   %if %upcase(&plot)=YES %then %do;
   proc gplot data=&data;
      symbol1 v=none;
      plot &y*&x=1 / annotate=&anno frame haxis=&haxis vaxis=&vaxis
                     href=0 vref=0 lvref=3 lhref=3;
      run;
   %end;
%mend equate;

/*-------------------------------------------------------------------*
  *    Name: inflglim.sas                                             *
  *   Title: Influence plots for generalized linear models            *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/inflglim.html           *
  *                                                                   *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:  24 Nov 1997 10:36:05                                    *
  * Revised:  10 Nov 2000 09:29:04                                    *
  * Version:  1.4                                                     *
  *  - Fixed error if DIST= not specified. Added FREQ= parm           *
  *  - Added MOPT= parm, INFL= parm (what's influential?)             *
  *  1.4   Fixed make ... noprint for V7+                             *
  *    Fixed numerous problems with GENMOD for V7+ (sigh)             *
  *                                                                   *
  * Dependencies:  %gskip (needed for eps/gif only)                   *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /*
Description:
 
 The INFLGLIM macro produces various influence plots for a generalized
 linear model fit by PROC GENMOD.  Each of these is a bubble plot of 
one
 diagnostic measure (specified by the GY= parameter) against another
 (GX=), with the bubble size proportional to a measure of influence
 (usually, BUBBLE=COOKD).  One plot is produced for each combination
 of the GY= and GX= parameters.
 
Usage:

 The macro normally takes an input data set of raw data and fits the
 GLM specified by the RESP=, and MODEL= parameters, using an error
 distribution given by the DIST= parameter.  It fits the model,
 obtains the OBSTATS and PARMEST data sets, and uses these to compute
 some additional influence diagnostics (HAT, COOKD, DIFCHI, DIFDEV,
 SERES), any of which may be used as the GY= and GX= variables.

 Alternatively, if you have fit a model with PROC GENMOD and saved
 the OBSTATS and PARMEST data sets, you may specify these with the
 OBSTATS= and PARMEST= parameters.  The same additional diagnostics
 are calculated and plotted.

 The INFLGLIM macro is called with keyword parameters.  The MODEL=
 and RESP= parameters are required, and you must supply the DIST=
 parameter for any model with non-normal errors.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
   %inflglim(data=berkeley,
      class=dept gender admit,
      resp=freq, model=dept|gender dept|admit,
      dist=poisson,
      id=cell,
      gx=hat, gy=streschi);

 
Parameters:

* DATA=       Name of input (raw data) data set. [Default: DATA=_LAST_]

* RESP=       The name of response variable.  For a loglin model, this
              is usually the frequency or cell count variable when the
              data are in grouped form (specify DIST=POISSON in this
              case).

* MODEL=      Gives the model specification.  You may use the '|' and
              '@' symbols to specify the model.

* CLASS=      Specifies the names of any class variables used in the 
model.

* DIST=       The name of the PROC GENMOD error distribution.  If you
              don't specify the error distribution, PROC GENMOD uses
              DIST=NORMAL.

* LINK=       The name of the link function.  The default is the 
canonical
              link function for the error distribution given by the
                                  DIST= parameter.

* MOPT=       Other options on the MODEL statement (e.g., MOPT=NOINT
              to fit a model without an intercept).

* FREQ=       The name of a frequency variable, when the data are in
              grouped form.

* WEIGHT=     The name of an observation weight (SCWGT) variable, used, 
for
              example, to specify structural zeros in a loglin model.

* ID=         Gives the name of a character observation ID variable
              which is used to label influential observations in the
              plots. Usually you will want to construct a character
              variable which combines the CLASS= variables into a
              compact cell identifier.

* GY=         The names of variables in the OBSTATS data set used as
              ordinates for in the plot(s).  One plot is produced for
                                  each combination of the words in GY by the 
words in GX.
                                  [Default: GY=DIFCHI STRESCHI]

* GX=         Abscissa(s) for plot, usually  PRED or HAT. [Default: 
GX=HAT]

* OUT=        Name of output data set, containing the observation
              statistics. [Default: OUT=COOKD]

* OBSTATS=    Specifies the name of the OBSTATS data set (containing
              residuala and other observation statistics) for a model 
              already fitted.

* PARMEST=    Specifies the name of the PARMEST data set (containing
              parameter estimates) for a model already fitted.

* BUBBLE=     Gives the name of the variable to which the bubble size 
is
              proportional. [Default: BUBBLE=COOKD]

* LABEL=      Determines which observations, if any, are labeled in the
              plots.  If LABEL=NONE, no observations are labeled; if
              LABEL=ALL, all are labeled; if LABEL=INFL, only possibly
              influential points are labeled, as determined by the
              INFL= parameter. [Default: LABEL=INFL]

* INFL=       Specifies the criterion used to determine which 
observations
              are influential (when used with LABEL=INFL).
              [Default: INFL=%STR(DIFCHI > 4 OR HAT > &HCRIT OR &BUBBLE 
> 1)]

* LSIZE=      Observation label size. [Default: LSIZE=1.5]. The height 
of
              other text (e.g., axis labels) is controlled by the 
HTEXT=
              goption.

* LCOLOR=     Observation label color. [Default: LCOLOR=BLACK]

* LPOS=       Observation label position, relative to the point.
              [Default: LPOS=5]

* BSIZE=      Bubble size scale factor. [Default: BSIZE=10]

* BSCALE=     Specifies whether the bubble size is proportional to AREA 
              or RADIUS. [Default: BSCALE=AREA]

* BCOLOR=     The color of the bubble symbol. [Default: BCOLOR=RED]

* REFCOL=     Color of reference lines.  Reference
              lines are drawn at nominally 'large' values for HAT 
values,
              standardized residuals, and change in chi square values.
                                  [Default: REFCOL=BLACK]

* REFLIN=     Line style for reference lines. Use REFLIN=0 to suppress
              these reference lines. [Default: REFLIN=33]

* NAME=       Name of the graph in the graphic catalog [Default:
              NAME=INFLGLIM]

* GOUT=       Name of the graphics catalog.

 =*/
 

%macro inflglim(
     data=_last_,    /* Name of input data set                  */
     resp=,          /* Name of criterion variable              */
     model=,         /* Model specification                     */
     class=,         /* Names of class variables                */
     dist=,          /* Error distribution                      */
     link=,          /* Link function                           */
     mopt=,          /* other model options (e.g., NOINT)       */
     freq=,          /* Freq variable                           */
     weight=,        /* Observation weight variable (zeros)     */
     id=,            /* Name of observation ID variable (char)  */
     gy=DIFCHI STRESCHI,      /* Ordinate(s) for plot(s)        */
     gx=HAT,         /* Abscissa(s_ for plot: PRED or HAT       */
     out=cookd,      /* Name of output data set                 */
     obstats=,       /* For a model already fitted              */
     parmest=,       /*  "      "      "      "                 */
     bubble=COOKD,   /* Bubble proportional to: COOKD           */
     label=INFL,     /* Points to label: ALL, NONE, or INFL     */
     infl=%str(difchi > 4 or hat > &hcrit or &bubble > 1),
     lsize=1.5,      /* obs label size.  The height of other    */
                     /* text is controlled by the HTEXT= goption*/
     lcolor=BLACK,   /* obs label color                         */
     lpos=5,         /* obs label position                      */
     bsize=10,       /* bubble size scale factor                */
     bscale=AREA,    /* bubble size proportional to AREA or RADIUS */
     bcolor=RED,     /* bubble color                            */
     refcol=BLACK,   /* color of reference lines                */
     reflin=33,      /* line style for reference lines; 0->NONE */
     name=INFLGLIM,  /* Name of the graph in the graphic catalog */
     gout=
);

        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=V6;
                %end;
        %else %do;
           options nonotes;
                %end;

%let abort=0;
%let gx=%upcase(&gx);
%let gy=%upcase(&gy);
%let dist=%upcase(&dist);
%let label=%upcase(&label);
%let bubble=%upcase(&bubble);

%if %length(&model) = 0 %then %do;
    %put ERROR: List of model terms (MODEL=) is empty.;
    %let abort=1;
    %goto done;
    %end;

%if %length(&resp) = 0 %then %do;
    %put ERROR: No response (RESP=) has been specified.;
    %let abort=1;
    %goto done;
    %end;

%if %length(&dist) = 0 %then %do;
    %put WARNING: No distribution (DIST=) has been specified.;
    %put WARNING: GENMOD will use DIST=NORMAL.;
    %end;

%let nx = %numwords(&gx);       /* number of abscissa vars */
%let ny = %numwords(&gy);       /* number of ordinate vars */

%if &sysver < 6.12 %then %do;
   %if %upcase(&dist)=BINOMIAL %then %do;
      %if %length(%scan(&resp,2,/))=0 %then %do;
         %put ERROR: Response must be specified as RESP=events/trials 
for DIST=BINOMIAL;
         %let abort=1;
         %goto done;
         %end;
      %if %length(&link)=0 %then %let link=logit;
      %end;
   %end;

%if %length(&obstats)=0 or %length(&parmest)=0 %then %do;
proc genmod data=&data;
  class &class;
        %if %length(&freq)>0 %then %do;  freq &freq; %end;
        %if %length(&weight)>0 %then %do;  scwgt &weight; %end;
  model &resp = &model /
        %if %length(&dist)>0 %then %do;  dist=&dist %end;
        %if %length(&link)>0 %then %do;  link=&link %end;
        %if %length(&mopt)>0 %then %do;  %str(&mopt) %end;
        obstats residuals;
        %if &sysver<7 %then %do;
        make 'obstats'  out=_obstat_ noprint;
        make 'parmest'  out=_parms_ noprint;
                %end;
        %else %do;
                ods output ObStats=_obstat_;
                ods output ParameterEstimates=_parms_;
                *proc print data=_parms_;
                %end;
  run;

%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;
%let obstats=_obstat_;
%let parmest=_parms_;
%end;

options nonotes;
%let parms=0;
data _null_;
   set &parmest end=eof;
   parms + df;
   if eof then do;
      call symput('parms', left(put(parms,5.)));
      end;
run;
%*put parms=&parms;
%if &parms=0 %then %let abort=1; %if &abort %then %goto DONE;

data &out;
  /* GENMOD seems to make all class variables character */
  /* keep only the GENMOD computed variables */
  merge &data 
                &obstats(keep=pred--reslik)
                end=eof;
  drop hcrit obs;
  obs=_N_;
  label hat = 'Leverage (H value)'
        cookd = "Cook's Distance"
        difchi = 'Change in Pearson ChiSquare'
        difdev = 'Change in Deviance'
        reschi = 'Pearson residual'
        resdev = 'Deviance residual'
        streschi = 'Adjusted Pearson residual'
        stresdev = 'Adjusted Deviance residual'
        seres = 'Residual Std. Error'
        pred = 'Fitted value';

  /* hat is the leverage */
  hat = Std*Hesswgt*Std;
  if hat<1 then do;
     cookd = hat*Streschi**2/((&parms)*(1-hat));
     seres = sqrt(1-hat);
     end;

  difchi = streschi**2;
  difdev = stresdev**2;

  if eof then do;
     hcrit =  &parms / obs;
     call symput('hcrit', put(hcrit,4.3));
     end;
  run;

proc print data=&out noobs label;
   id &id ;
   format pred hesswgt lower upper 6.2 
          xbeta std resraw--reslik hat cookd difchi difdev seres 7.3 ;
  run;

%do i=1 %to &ny;
   %let gyi = %scan(&gy, &i);
   %do j=1 %to &nx;
   %let gxj = %scan(&gx, &j);
   %put Plotting &gyi vs &gxj ;

   %if &label ^= NONE %then %do;
   data _label_;
      set &out nobs=n;
      length xsys $1 ysys $1 function $8 position $1 text $12 color $8;
      retain xsys '2' ysys '2' function 'LABEL' color "&lcolor";
      keep &id &class x y xsys ysys function position text color size
           position hat difchi &bubble;
      x = &gxj;
      y = &gyi;
      %if &id ^= %str() %then %do;
         text = left( &id );
         %end;
      %else %if %length(&class) %then %do;
         %let c = 1;
         %let v = %scan(&class,&c);
         text = '';
         %do %while (%length(&v) > 0);
            text = trim(text) || trim(&v);
            %let c = %eval(&c + 1);
            %let v = %scan(&class,&c);
            %end;
            %end;
      %else %do;
         text = put(_n_,3.0);
         %end;
      size=&lsize;
      position="&lpos";
      %if &label = INFL %then %do;
         if &infl then output;
         %end;
   run;
   %end;  /* &label ^= NONE */

   proc gplot data=&out &GOUT ;
     bubble &gyi * &gxj = &bubble /
      %if &label ^= NONE %then %do;
      annotate=_label_
      %end;
      frame
      vaxis=axis1 vminor=1 hminor=1
      %if &reflin ^= 0 %then %do;
      %if (&gyi=DIFCHI) or (&gyi=DIFDEV) %then %do;
         vref=4       lvref=&reflin  cvref=&refcol
         %end;
      %else %if (&gyi=STRESCHI) or (&gyi=STRESDEV) %then %do;
         vref=0 -2 2       lvref=&reflin  cvref=&refcol
         %end;
      %if (&gxj = HAT) %then %do;
         href= &hcrit lhref=&reflin  chref=&refcol
         %end;
      %end;
      bsize=&bsize  bcolor=&bcolor  bscale=&bscale
      name="&name"
      Des="Influence plot for &resp (&gyi vs. &gxj)";
     axis1 label=(a=90 r=0);
   run; quit;
   %gskip;
   %end;   /* gx loop */
%end;   /* gy loop */


%done:
%if &abort %then %put ERROR: The INFLGLIM macro ended abnormally.;
        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;
%mend;

%macro numwords(lst);
   %let i = 1;
   %let v = %scan(&lst,&i);
   %do %while (%length(&v) > 0);
      %let i = %eval(&i + 1);
      %let v = %scan(&lst,&i);
      %end;
   %eval(&i - 1)
%mend;
INDEX   
addvar.sas      Added variable plots for logistic regression
bars.sas        Create an annotate data set to draw error bars
biplot.sas      Generalized biplot of observations and variables
catplot.sas     Plot observed and predicted logits for logit models
corresp.sas     Correspondence analysis of contingency tables
distplot.sas    Plots for discrete distributions
dummy.sas       Macro to create dummy variables
equate.sas      Creates AXIS statements for a GPLOT with equated axes
gdispla.sas     Device-independent DISPLAY/NODISPLAY control
gensym.sas      Macro to generate SYMBOL statement for each GROUP
goodfit.sas     Goodness of fit tests for discrete distributions
gskip.sas       Device-independent macro for multiple plots
halfnorm.sas    Half normal plot for generalized linear models
inflglim.sas    Influence plots for generalized linear models
inflogis.sas    Influence plot for logistic regression models
interact.sas    Create interaction variables
label.sas       Create an Annotate dataset to label observations
lags.sas        Macro for lag sequential analysis
logodds.sas     Plot empirical log-odds for logistic regression
mosaic.sas      Macro interface for mosaic displays
mosmat.sas      Macro interface for mosaic matrices
ordplot.sas     Diagnose form of discrete frequency distribution
panels.sas      Macro to display a set of plots in rectangular panels
points.sas      Create an Annotate dataset to draw points in a plot
poisplot.sas    Poissonness plot for discrete distributions
power2x2.sas    Power for testing two independent proportions
powerlog.sas    Power for logistic regression, quantitative predictor
pscale.sas      Construct annotations for a probability scale
robust.sas      M-estimation for robust models fitting via IRLS
rootgram.sas    Hanging rootograms for discrete distributions
sort.sas        Generalized dataset sorting by format or statistic
table.sas       Construct a grouped frequency table, with recoding
triplot.sas     Macro for trilinear plots
/*-------------------------------------------------------------------*
  *    Name: powerlog.sas                                             *
  *   Title: Power for logistic regression, quantitative predictor    *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/powerlog.html           *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 17 Apr 98 16:43                                          *
  * Revised: 17 Apr 98 16:43                                          *
  * Version: 1.0                                                      *
  *                                                                   *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /*
Description:
 
 The POWERLOG macro calculates sample size required to achived given
 power values for a logistic regression model with one or more 
 quantitative predictors.  Results are displayed as a table of
 sample sizes required for a range of power values, and as a graph.

Usage:

 The POWERLOG macro is called with keyword parameters.  The arguments 
 may be listed within parentheses in any order, separated by commas. 
 You must supply either an input data set containing the 
 variables P1, P2, ALPHA, POWER, and RSQ (one observation for each
 combination for which power is desired), or the macro parameters
 P1= and P2=.
 For example: 
 
        %powerlog(p1=.08, p2=%str(.16, .24));
 
Parameters:

* DATA=  Specifies the name of an input data set containing the 
          variables P1, P2, ALPHA, POWER, and RSQ in all combinations
                    for which power is desired.  If an input DATA= data set 
is
                    specified, the program ignores values for the P1=, P2=, 
ALPHA=, 
                    POWER=, and RSQ= parameters.

* P1=     is the estimated probability of the event at the mean value
          of the quantitative predictor.

* P2=     is the estimated probability of the event at an X-value equal
          to the X-mean plus one standard deviation.
                         You may specify a list of values separated by 
commas, a
                         range of the form x TO y BY z, or a combination of 
these.
                         However, you must surround the P2= value with
                         %STR() if any commas appear in it.  For example,
                        
                                p2=.10 to .30 by .05
                                p2=%str(.10, .13, .20)


* ALPHA=  is the desired Type I error probability for a *one-sided* 
test
          of H0: beta(x) = 0

* POWER=  is the desired power of the test.

* RSQ=    is the squared multiple correlation of the predictor with all
          other predictors.  Use RSQ=0 for a 1-predictor model.

* PLOT=   is a specification for plotting the results.  The default
          is PLOT=N * POWER=RSQ.  No plots are produced if  PLOT=  is
                    blank.

* PLOTBY= is another variable in the OUT= data set.  Separate plots are
          drawn for each level of the PLOTBY= variable.

* OUT=   Specifies the name of the output data set

Reference:  Agresti, 'Introduction to Categorical Data Analysis', p.131

Example:  

Modelling the relation of the probability of heart disease
on X = cholesterol.  If previous studies suggest that heart disease
occurs with P1=0.08 at the mean level of cholesterol, what is the
sample size required to detect a 50% increase (P2 = 1.5*.08 = .12),
or an 87.5% increase (P2 = 1.875*.08 = .15) in the probability of
heart disease, when cholesterol increases by one standard deviation?
        
If age is another predictor, how does sample size vary with the RSQ
between cholesterol and age?
        
   %powerlog(p1=.08, p2=%str(.12, .15), rsq=%str(.2, .4) );

*/

%macro powerlog(
        data=,
        p1=,
        p2=,
        alpha=.05,
        power=.7 to .9 by .05,
        rsq=0 to .6 by .2,
        plot=N * power = rsq,
        plotby=theta,
        out=_power_
        );
         
%if %length(&data)=0 %then %do;
        %if %length(&p1)=0 or %length(&p2) =0 %then %do;
                %put ERROR:  P1= and P2= must be specified.;
                %goto done;
        %end;

        %let data=_in_;
        data _in_;
                label p1='Pr(event) at X-mean'
                                p2='Pr(event) at X-mean+std'
                                alpha = 'Type I risk'
                                power = 'Desired power'
                                rsq   = 'R**2 (X, other Xs)';
                alpha = &alpha;
                do p1 = &p1;
                        do p2 = &p2;
                                do rsq = &rsq;
                                        do power = &power;
                                        output;
                                        end;
                                end;
                        end;
                end;
%end;

data &out;
        set &data;
        drop l1 l2 za zb lambda;
        label theta = 'Odds ratio'
              lambda= 'Log(odds ratio)'
                        delta = 'Delta'
                        N     = 'Sample size';
        format theta 6.3 delta 6.2 n 6.1;
        l1 = p1 / (1-p1);
        l2 = p2 / (1-p2);
        theta = l2 / l1;
        lambda = log(theta);
        
        za = probit(1-alpha);
        zb = probit(power);
        
        delta = (1 + (1+lambda**2) * exp(5*lambda**2/4))
              / (1 + exp(- lambda**2 / 4));
        
        N =  ( (za + zb * exp(- lambda**2 / 4))**2  * (1 + 2*p1*delta))
          / (p1 * lambda**2); 
        
        N = N / (1-rsq);
/*      
proc print label;
        id p1 p2;
        by p1 p2;
*/
proc tabulate data=&out format=6.0;
        class alpha p2 rsq power;
        var n;
        table power='Power',  p2 *n=' '*f=5. * rsq * sum=' ';
        title2 "One-tailed test, alpha=&alpha, p1=&p1 p2=&p2";

%if %length(&plot) %then %do;
%if %length(&plotby) %then %do;
proc sort data=&out;
        by &plotby;
%end;
proc gplot data=&out uniform;
        plot &plot / frame hminor=1 vaxis=axis1 haxis=axis2;
        %if %length(&plotby) %then %do;
                by &plotby;
        %end;

        axis1 label=(a=90);
        axis2 offset=(3);
        symbol1 v=circle   i=join l=1 c=black w=3;
        symbol2 v=dot      i=join l=3 c=red;
        symbol3 v=square   i=join l=5 c=blue;
        symbol4 v=triangle i=join l=7 c=green;
        symbol5 v=hash     i=join l=9 c=black;
        symbol6 v=diamond  i=join l=11 c=red;
        symbol7 v=star     i=join l=13 c=blue;
        format n 5.;
run; quit;
        title2;
        goptions reset=symbol;
%end;
%done:
%mend;

/*-------------------------------------------------------------------*
  *    Name: triplot.sas                                              *
  *   Title: Macro for trilinear plots                                *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/triplot.html            *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:  6 Aug 1996 09:11:33                                     *
  * Revised:  7 Apr 1998 13:05:32                                     *
  * Version:  1.2                                                     *
  *  - Fixed legend bug; added label location (LABLOC=) parameter     *
  *    Fixed xsys/ysys error                                          *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /*
Description:
 
 The TRIPLOT macro plots three variables (rows of an n x 3 table)
 in an equilateral triangle, so that each point represents the
 proportions of each variable to the total for that observation.

Usage:

 The TRIPLOT macro is called with keyword parameters.
 The names of three variables must be given in the VAR= parameter.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        data tridemo;
                input A B C point $12.;
                label point='Point';
        cards;
        40 30 30  (40,30,30)
        20 60 20  (20,60,20)
        10 10 80  (10,10,80)
        ;
        %triplot(var=A B C, class=Point, id=point, gridby=25,
                symbols=dot dot dot, idht=1.6, axes=bot,
                symht=4, gridclr=gray);
 
Parameters:

* DATA=       The name of data set to be plotted. [Default: 
DATA=_LAST_]

* VAR=        The names of three variables used as the axes in the
              plot.  The values of each observation are normally all
                                  non-negative.  Missing values are treated as 
0.

* CLASS=      The name of a class variable determining plotting symbol.
              Different values of the CLASS= variable are represented
                                  by the values in the COLORS= and SYMBOLS= 
lists, used
                                  sequentially.

* ID=         The name of an observation identifier (label) variable

* BY=         The name of a BY variable, for separate plots

* WHERE=      WHERE-clause to subset observations to be plotted.

* IDHT=       Height of ID label [Default: IDHT=2]

* IDCLR=      Color of ID label [Default: IDCLR='BLACK']

* IDPOS=      Position of ID label [Default: IDPOS=8]

* IDSUBSET=   A SAS expression (which may use any data set variables
              used to subset ID labels.  If an ID= variable is given,
                                  and the IDSUBSET= expression evaluates to 
non-zero,
                                  the observation is labelled in the plot.
                                  [Default: IDSUBSET=1]

* INTERP=     Interpolation between points, a SYMBOL statement option.
              If INTERP=JOIN, points within the same CLASS= value are
                                  connected by lines.  Most other SYMBOL 
statement
                                  interpolation options would give bizare 
results.
                                  [Default: INTERP=NONE]

* SYMHT=      Height of point symbols [Default: SYMHT=2]

* SYMBOLS=    A list of one or more symbols for points, corresponding
              to the levels of the CLASS= variable.  The symbols are
                                  reused cyclically if there are more class 
levels than
                                  symbols.
                                  [Default: SYMBOLS=%STR(DOT CIRCLE SQUARE $ : 
TRIANGLE = X _ Y)]

* COLORS=     A list of one or more colors for points, corresponding
              to the levels of the CLASS= variable.  The colors are
                                  also reused cyclically as required.
              [Default: COLORS=BLACK RED BLUE GREEN BROWN ORANGE PURPLE 
YELLOW]

* BACKCLR=    Background color inside the trilinear plot.
              [Default: BACKCLR=WHITE]

* BACKPAT=    Background fill pattern. For a plot with a light gray
              background, for example, specify BACKPAT=SOLID and
                                  BACKCLR=GRAYD0. [Default: BACKPAT=EMPTY]

* GRIDBY=     Grid line interval. For grid lines at 25, 50, and 75%,
              for example, specify GRIDBY=25. [Default: GRIDBY=20]

* GRIDCLR=    Grid line color [Default: GRIDCLR=GRAY]

* GRIDLINE=   Style of grid lines [Default: GRIDLINE=34]

* AXES=       Type of axes, one of NONE, FULL, TOP, or BOT.
              AXES=NONE draws no coordinate axes;  AXES=FULL draws
                                  a line from 0 to 100% for each of the three 
coordinates;
                                  AXES=TOP draws a line from the apex to the 
centroid only;
                                  AXES=BOT draws a line from the centroid to 
the base only. 
              [Default: AXES=NONE]

* AXISCLR=    Color of axis lines [Default: AXISCLR=BLUE]

* AXISLINE=   Style of axis lines [Default: AXISLINE=1]

* XOFF=       X offset, in %, for adjusting the plot [Default: XOFF=2]

* XSCALE=     X scale factor for adjusting the plot.  Before plotting
              the X coordinates are adjusted by X = XOFF + XSCALE * X.
                                  [Default: XSCALE=.96]

* YOFF=       X offset, in %, for adjusting the plot [Default: YOFF=2]

* YSCALE=     Y scale factor for adjusting the plot.  Before plotting
              the Y coordinates are adjusted by Y = YOFF + YSCALE * Y.
                                  [Default: YSCALE=.96]

* LEGEND=     The name of legend statement or 'NONE'.  If LEGEND= is
              not specified, and there is more than one group defined
                                  by a CLASS= variable, a legend statement is 
constructed
                                  internally. If LEGEND=NONE, no legend is 
drawn; otherwise
                                  the LEGEND= value is used as the name of a 
legend statement.

* LABHT=      Height of variable labels, in GUNITs [Default: LABHT=2]

* LABLOC=     Location of variable label: 0 or 100 [Default: 
LABLOC=100]

* NAME=       Name of the graphics catalog entry [Default: NAME=TRIPLT]
                

 */
 /*

Ideas from: Fedencuk & Bercov, "TERNPLOT - SAS creation of ternary 
plots",
        SUGI 16, 1991, 771-778.
*/

%macro triplot(
        data=_last_,    /* name of data set to be plotted           */
        var=,           /* 3 variable names for plot axes           */
        class=,         /* class variable defining plotting symbol  */
        id=,            /* point identifier (label) variable        */
        by=,            /* BY variable for separate plots           */
        where=,         /* where-clause to subset observations      */

        idht=2,         /* height of ID label                       */
        idclr='BLACK',  /* color of ID label                        */
        idpos=8,        /* position of ID label                     */
        idsubset=1,     /* expression to subset ID labels           */
        interp=none,    /* interpolation between points             */
        symht=2,        /* height of point symbols                  */
        symbols=%str(dot circle square $ : triangle = X _ Y),
        symfont=,
        colors=BLACK RED BLUE GREEN BROWN ORANGE PURPLE YELLOW,

        backclr=WHITE,  /* background color                         */
        backpat=EMPTY,  /* background fill pattern                  */
        gridby=20,      /* grid line interval                       */
        gridclr=gray,   /* grid line color                          */
        gridline=34,    /* style of grid lines                      */
        axes=none,      /* type of axes: NONE, FULL, TOP, BOT       */
        axisclr=blue,   /* color of axis lines                      */
        axisline=1,     /* style of axis lines                      */

        xoff=2,         /* X offset, in %, for adjusting the plot   */
        xscale=.96,     /* X scale factor, for adjusting the plot   */
        yoff=2,         /* Y offset, in %, for adjusting the plot   */
        yscale=.96,     /* Y scale factor, for adjusting the plot   */
        legend=,        /* name of legend statement or 'NONE'       */
        labht=2,        /* height of variable labels, in GUNITs     */
        labloc=100,     /* location of variable label: 0 or 100     */
        name=triplot    /* name of graphic output in the catalog    */
        );

%let scale = 1;
%let lab = 100;
%let lbc = 100;
%let lac = 100;
%let abort=0;
%let legend=%upcase(&legend);
%let axes  =%upcase(&axes);

%local xvar yavr zvar xlab ylab zlab;

%*-- parse variable names and labels;
%let pre=x y z;
%do i=1 %to 3;
        %let l = %scan(&pre, &i);
        %let &l.var = %scan(&var, &i);
        %end;
%if %length(&zvar)=0 %then %do;
        %put ERROR:  VAR= must give three variable names;
        %let abort=1;
        %goto done;
%end;
%*put xvar=&xvar yvar=&yvar zvar=&zvar;

   %*-- make &data reusable if _LAST_ was specified;
%if %bquote(&data) = %bquote(_last_) %then %let data = &syslast;

*options nonotes;
proc contents data=&data out=_work_ noprint;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

   %*-- get variable labels, assign names if no labels;
data _null_;
   set _work_(keep=name type label);
                select (name);
                        when (upcase("&xvar")) do;
                                if label = ' ' then call symput('xlab', 
"&xvar");
                                                        else call symput('xlab', 
trim(label));
                                end;
                        when (upcase("&yvar")) do;
                                if label = ' ' then call symput('ylab', 
"&yvar");
                                                        else call symput('ylab', 
trim(label));
                                end;
                        when (upcase("&zvar")) do;
                                if label = ' ' then call symput('zlab', 
"&zvar");
                                                        else call symput('zlab', 
trim(label));
                                end;
                        otherwise ;
                        end;
run;
%*put xlab=&xlab ylab=&ylab zlab=&zlab;

data trianno;
        retain xsys ysys  '1';
        drop triht;
        length text $20;
        format x y 6.1;

        line  = 1;
        size  = 4;

        triht = &scale * sqrt(3) * &lab /2      ;
        %if %upcase(&backpat) ^= EMPTY %then %do;
                style = "&backpat";
                color = "&backclr";
                %end;

        x = 0;      y = 0;     function='poly    '; output;

        color = 'BLACK';       /* color of triangular frame */
        x = &lab;   y = 0;     function='polycont'; output;
        x = &lab/2; y = triht; function='polycont'; output;
        x = 0;      y = 0;     function='polycont'; output;

        size = &labht;
        style = ' ';
        function='label';
        %if &labloc=100 %then %do;    /* labels at corners */
                position='F';
                text = "&xlab";  x = 0;      y = 0;     output;
                position='D';
                text = "&ylab";  x =100;     y = 0;     output;
                position='2';
                text = "&zlab";  x = &lab/2; y = triht; output;
                %end;
        %else %do;                    /* labels at sides  */
                position='2'; angle=-60;
                text = "&xlab";  x = .75*&lab;  y = triht/2; output;
                position='2'; angle=60;
                text = "&ylab";  x = .25*&lab;  y = triht/2; output;
                position='E'; angle=0;
                text = "&zlab";  x = &lab/2;    y = 0;       output;
                %end;
        
run;

%if &gridby > 0 %then %do;
%let mg=9;     /* max number of grid positions */
data trigrid;
        retain xsys ysys  '1';
        length function $8;
        drop triht i gridby ngrid ni;
        array gabx [&mg] _temporary_ ;
        array gaby [&mg] _temporary_ ;
        array gacx [&mg] _temporary_ ;
        array gacy [&mg] _temporary_ ;
        array gbcx [&mg] _temporary_ ;
        array gbcy [&mg] _temporary_ ;

        array tick [&mg] $2 _temporary_ ;
        array tax [&mg]     _temporary_ ;
        array tbx [&mg]     _temporary_ ;
        array tcx [&mg]     _temporary_ ;
        array tay [&mg]     _temporary_ ;
        array tby [&mg]     _temporary_ ;
        array tcy [&mg]     _temporary_ ;

        triht = &scale * sqrt(3) * &lab /2      ;

        gridby = &gridby;
        ngrid  = min(&mg, (100/gridby) - 1);
        do i = 1 to ngrid;
                gabx[i] = gridby * i;
                gaby[i] = 0;
                gacx[i] = gridby * i / 2;
                gacy[i] = triht * i * gridby / 100;
                gbcx[i] = 50 + (gridby * i) / 2;
                gbcy[i] = triht * (ngrid+1 - i) * gridby / 100;

                tick[i] = put( (gridby * i), 2.);

                tax[i]  = 75 - (.75 * gridby * i);
                tay[i]  = (triht - (triht * (i*gridby/100))) / 2;
                tbx[i]  = 25 + (.75 * gridby * i);
                tby[i]  = tay[i];
                tcx[i]  = 50;
                tcy[i]  = triht * i * gridby / 100;

                end;

        line = &gridline;
        color = "&gridclr";
        size = 1;
        position = 'B';
        rotate = 0;

        *-- grid lines parallel to AB;
        angle  = 0;
        do i = 1 to ngrid;
                ni = ngrid+1 - i;
                x = gbcx[i];    y = gbcy[i];     function='move';  output;
                x = gacx[ni];   y = gacy[ni];    function='draw';  output;
                text = tick[i];
                x = tcx[i];     y = tcy[i];      function='label'; output;
                end;

        *-- grid lines parallel to AC;
        angle  = -120;
        do i = 1 to ngrid;
                x = gabx[i];    y = gaby[i];     function='move';  output;
                x = gbcx[i];    y = gbcy[i];     function='draw';  output;
                text = tick[i];
                x = tbx[i];     y = tby[i];      function='label'; output;
                end;

        *-- grid lines parallel to BC;
        angle  = 120;
        do i = 1 to ngrid;
                x = gabx[i];    y = gaby[i];     function='move';  output;
                x = gacx[i];    y = gacy[i];     function='draw';  output;
                text = tick[i];
                x = tax[i];     y = tay[i];      function='label'; output;
                end;
%end; /*-- &gridby > 0 */

*let axes=BOT;
%if &axes ^= NONE %then %do;
                        %if &axes=TOP %then %do; %let f1=draw; %let f2=move; 
%end;
        %else %if &axes=BOT %then %do; %let f1=move; %let f2=draw; %end;
        %else  /* &axes=FULL */   %do; %let f1=draw; %let f2=draw; %end;

data triaxes;
        retain xsys ysys  '1' line &axisline color "&axisclr";
        length function $8;

        root3 = sqrt(3);
        triht = &scale * root3 * &lab /2;
        cx = &lab/2; cy = triht/3;
        drop root3 triht cx cy;

        x = &lab/2;  y = triht;         function='move'; output;
        y = cy;                         function="&f1";  output;
        y = 0;                          function="&f2";  output;

        x = 0;  y = 0;                  function='move'; output;
        x = cx; y = cy;                 function="&f1";  output;
        x = &lab * .75; y=triht/2;      function="&f2";  output;

        x = &lab; y=0;                  function='move'; output;
        x = cx; y = cy;                 function="&f1";  output;
        x = &lab * .25; y=triht/2;      function="&f2";  output;
%end;  /* &axes ^= NONE */

data trianno;
        set trianno
                %if &axes ^=NONE %then
                triaxes ;
                %if &gridby > 0 %then
                trigrid ;
                                ;
        x = &xoff + &xscale * x;
        y = &yoff + &yscale * y;

%if %length(&by) or %length(&class) %then %do;
        proc sort data=&data;
        by &by &class;
        %end;

data tridata;
        retain xsys ysys  '1'
                xa  ya
                xb  yb
                xc  yc
                root3 xa1 ya1 coef_bc _class_ 0;
        drop xa ya xb yb xc yc root3 coef_bc sum xa1 ya1 xaa yaa;
        drop &xvar &yvar &zvar;

        if _n_ = 1 then do;
                root3 = sqrt(3);
                xa = 0;   ya=0;
                xb = 100; yb=0;
                xc = (xb - xa) / 2;
                yc = root3 * xc;
                ya1= yc / 2;
                xa1= root3 * ya1;
                coef_bc = (yc - yb) / (xc - xb);
                end;

        set &data end=eof;
        %if %length(&class)
                %then %do; by &class; if first.&class then _class_+1; %end;
                %else %do;   _class_=1; %end;

        %if %length(&where) %then
                where (&where)%str(;);

        if (&xvar = .) then &xvar=0;
        if (&yvar = .) then &yvar=0;
        if (&zvar = .) then &zvar=0;
        if &xvar < 0 | &yvar < 0 | &zvar < 0
                then put 'WARNING: One or more values are negative in obs' 
_n_
                &xvar= &yvar= &zvar=;

        sum = &xvar + &yvar + &zvar;
        if (sum=0) then delete;

        &xvar = &xvar / sum;
        &yvar = &yvar / sum;
        &zvar = &zvar / sum;

        xaa = xa1 * (1 - &xvar);
        yaa = xaa / root3;

        y = yc * &zvar;
        x = ( (y-yaa)/coef_bc ) + xaa;

        %if %length(&id) %then %do;
                if (&idsubset) then do;
                        text = &id;
                        size = &idht;
                        color= &idclr;
                        position = "&idpos";;
                        function = 'label';
                        end;
        %end;

        x = &xoff + &xscale * x;
        y = &yoff + &yscale * y;

        if eof then do;
                call symput('ngroups', put(_class_, 3.));
        end;
run;
%*put ngroups=&ngroups;
%if %length(&class)=0 %then %let class=_class_;

data trianno;
        set trianno
        %if %length(&id) %then %do;
        tridata(keep=xsys ysys x y text size color position function
                                where=(function='label'))
        %end;
        ;

*proc print;

%gensym(n=&ngroups,
        h=&symht, symbols=&symbols, colors=&colors, interp=&interp,
                  font=&symfont);

axis1 offset=(0) order=(0 to 100)
        label=none value=none major=none minor=none style=0;

%if %length(&legend)=0 & &ngroups>1 %then %do;
        legend1  position=(top right inside) across=1 offset=(0,-2) 
                mode=share frame;
        %let legend=legend=legend1;
        %end;
%else %if &legend=NONE or &ngroups=1 %then %do;
        %let legend=nolegend;
        %end;
%else %do;
        %let legend=legend=&legend;
        %end;

proc gplot data=tridata anno=trianno;
        plot y * x = &class
                                / vaxis=axis1 haxis=axis1 noaxes &legend 
                                name="&name"
                                des="Triplot of &var";
        %if %length(&by) %then %do;
                                by &by ;
                                %end;
run; quit;

%DONE:
%if &abort %then %put ERROR: The TRIPLOT macro ended abnormally.;
options notes;

%mend;
/*-------------------------------------------------------------------*
  *    Name: bars.sas                                                 *
  *   Title: Create an annotate data set to draw error bars           *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/bars.html               *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 23 Nov 1997 11:32                                        *
  * Revised:  9 Nov 2000 11:30:08                                     *
  * Version: 1.2                                                      *
  * 1.2  Fixed bug with CLASS= not uppercase                          *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /*=
Description:
 
 The BARS macro creates an Annotate data set to draw error bars
 in a plot.  The error bars may be drawn for a response variable 
 displayed on the Y axis or on the X axis.  The other (CLASS=)
 variable may be character or numeric.  

Usage:

 The BARS macro is called with keyword parameters.  The VAR= and CLASS=
 variables must be specified.  The length of the error bars should be
 specified with either the BARLEN= parameter or the LOWER= and UPPER=
 parameters.

 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %bars(class=age, var=logodds, lower=lower, upper=upper);
        proc gplot data=mydata;
                plot logodds * age / anno=_bars_;
         
Parameters:

* DATA=       Name of input data set [Default: DATA=_LAST_]

* VAR=        Name of the response variable, to be plotted on the
              axis given by the BAXIS= parameter.

* CLASS=      Name of the independent variable, plotted on the
              other axis.

* CVAR=       Name of a curve variable, when PROC GPLOT is used with
              the statement PLOT &VAR * &CLASS = &CVAR.

* BY=         Name of a BY variable for multiple plots, when PROC GPLOT
              is used with the statement BY &BY;.

* BAXIS=      One of X or Y, indicating the axis along which error bars
              are drawn [Default: BAXIS=Y]

* BARLEN=     A numeric variable or constant giving the error bar 
length,
              for example, when the input data set contains a standard
                                  error variable or multiple thereof.  If 
BARLEN= is given,
                                  the LOWER= and UPPER= values are ignored, and 
error bars
                                  are drawn at the values &VAR +- &Z * &BARLEN.

* Z=          A numeric value giving the multiplier of the BARLEN=
              value used to determine the lower and upper error bar
                                  values.

* LOWER=      A numeric variable or constant giving the lower error bar 
value.
              Use the LOWER= and UPPER= parameters if the error bars 
are
                                  non-symmetric or if the lower and upper 
values are contained
                                  as separate variables in the input data set.

* UPPER=      A numeric variable or constant giving the upper error bar 
value.

* TYPE=       Type of error bars to be drawn: one of UPPER, LOWER, or 
BOTH
              and possibly one of ALT or ALTBY.  TYPE=LOWER draws only 
the
                                  lower error bars; TYPE=UPPER draws only the 
upper error bars;
                                  TYPE=BOTH draws both upper and lower error 
bars.  Use
                                  TYPE=ALT BOTH to have the error bars 
alternate (lower, upper)
                                  over observations in the input data set;  use
                                  TYPE=ALTBY BOTH to have the error bars 
alternate over values
                                  of the BY= variable. [Default: TYPE=BOTH]

* SYMBOL=     The plotting symbol, drawn at (&CLASS, &var).  If not 
specified,
              no symbols are drawn.

* COLOR=      Color for lines and symbols, a character constant 
(enclosed
              in quotes), or variable name [Default: COLOR='BLACK']

* LINE=       The Annotate line style used for error bars [Default: 
LINE=1]

* SIZE=       Size of symbols and thickness of lines [Default: SIZE=1]

* BARWIDTH=   The width of error bar tops, in data units [Default:
              BARWIDTH=.5]

* OUT=        Name of the output data set, to be used as an Annotate
              data set with PROC GPLOT [Default: OUT=_BARS_]
                

 */
 

%macro bars(
        data=_LAST_,   /* name of input data set                       */
        var=,          /* name of response variable                    */
        class=,        /* name of independent variable                 */
        cvar=,         /* name of a curve variable                     */
        by=,           /* name of a BY variable, for multiple curves   */
        baxis=y,       /* axis along which error bars are drawn        */
        barlen=,       /* variable or constant giving error bar length */
        z=1,           /* barlen multiplier                            */
        lower=,        /* or, var or constant giving lower bar value   */
        upper=,        /* + var or constant giving upperr bar value    */
        type=both,         /* type of bars: UPPER, LOWER, BOTH, ALT, ALTBY */
        symbol=,       /* plotting symbol, placed at (&class, &var)    */
        color='black', /* color for lines and symbols, 'const' or var  */
        line=1,        /* line style for error bars                    */
        size=1,        /* thickness of lines, size of symbols          */
        barwidth=.5,   /* width of bar tops                            */
        out=_bars_     /* name of output data set                      */
        );

%let abort=0;
%if &var=%str() | &class=%str()
   %then %do;
      %put ERROR: The VAR= and CLASS= parameters must be specified;
      %let abort=1;
      %goto DONE;
   %end;
%let class=%upcase(&class);

%if %upcase(&baxis) = Y
        %then %let oaxis = X;
        %else %let oaxis = Y;

%let type = %upcase(&type);
%let alt1=1; %let alt2=1;
%if %index(&type,ALTBY)  %then %do;
        %let alt1=mod(nby,2)=1;
        %let alt2=mod(nby,2)=0;
        %end;
%else %if %index(&type,ALT) %then %do;
        %let alt1=mod(_n_,2)=1;
        %let alt2=mod(_n_,2)=0;
        %end;

%let ay = &baxis;
%let ax = &oaxis;
%let gp = &class;

%if %length(&barlen) %then %do;
        %let lower = &var - &z * &barlen;
        %let upper = &var + &z * &barlen;
        %end;

/* determine if class variable is char or num */
proc contents data=&data out=_work_ noprint;
data _null_;
        set _work_;
        where (upcase(name)="&CLASS");
        if type=1 then call symput('gtype', 'NUM');
                                else call symput('gtype', 'CHAR');
run; 
%if "&gtype" = "CHAR" %then %let ax = &ax.c;

proc sort data=&data out=_work_;
        by &by &cvar &class;

data &out;
   length function $8 color $8;
   retain xsys ysys '2';
   set _work_ /*(keep=&var &class &by &upper &lower &barlen) */;
        by &by &cvar &class;
        %if %length(&cvar) %then %do;
                if first.&by then ncv+1;
                %end;
        %if %length(&by) %then %do;
                if first.&by then nby+1;
                %end;

   color = &color;
   line  = &line;
   size  = &size;
        
        &ax = &class;   *-- sets x or xc (or y or yc);
        
   &ay= &var              ; function='MOVE'; output;
        %if %length(&symbol) %then %do;
        text="&symbol"      ; function='SYMBOL'; output;
                %end;

        %if %index(&type,UPPER)=0 %then %do;
                if (&alt1) then do;
                &ay= &lower            ; function='DRAW'; output;
                %if "&gtype" = "NUM" %then %do;
                &ax = &gp - &barwidth  ; function='MOVE'; output;
                &ax = &gp + &barwidth  ; function='DRAW'; output;
                &ax = &gp              ; function='MOVE'; output;
                %end;
                end;
        %end;
         
        %if %index(&type,LOWER)=0 %then %do;
                if (&alt2) then do;
                &ay= &upper            ; function='DRAW'; output;
                %if "&gtype" = "NUM" %then %do;
                &ax = &gp - &barwidth  ; function='MOVE'; output;
                &ax = &gp + &barwidth  ; function='DRAW'; output;
                %end;
                end;
        %end;
%done:
%mend;
/*-------------------------------------------------------------------*
  *    Name: inflogis.sas                                             *
  *   Title: Influence plot for logistic regression models            *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/inflogis.html           *
  *                                                                   *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:  14 Nov 1993 10:42:11                                    *
  * Revised:  24 May 2000 14:49:18                                    *
  * Version:  1.3                                                     *
  *  - Added TRIALS= parameter(for event/trials syntax)               *
  *  - Added OUT= parameter                                           *
  *  - Added INFL= parameter (what's influential?)                    *
  *                                                                   *
  * Dependencies:  %gskip (needed for eps/gif only)                   *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /*
Description:

This SAS macro produces influence plots for a logistic regression
model.  The plot shows a measure of badness of fit for a given
case (DIFDEV or DIFCHISQ) vs.  the fitted probability (PRED) or
leverage (HAT), using an influence measure (C or CBAR) as the
size of a bubble symbol.

Usage:

 The inflogis macro is called with keyword parameters.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example:

        %include data(arthrit);
        %inflogis(data=arthrit,
                        y=better,
                        x=_sex_ _treat_ age,
                        id=case,
                        );

Parameters:

* DATA=       Specifies the name of the input data set to be analyzed.
              [Default: DATA=_LAST_]

* Y=          Name of the response variable

* TRIALS=     Name of trials variable (for event/trials syntax)

* X=          Names of predictors

* CLASS=      Names of class variables among predictors (V8)

* ID=         Name of observation ID variable (char)

* OUT=        Name of the output data set [Default: OUT=_DIAG_]

* GY=         Ordinate for plot: DIFDEV or DIFCHISQ [Default: 
GY=DIFDEV]

* GX=         Abscissa for plot: PRED or HAT [Default: GX=PRED]

* BUBBLE=     Bubble proportional to: C or CBAR [Default: BUBBLE=C]

* LABEL=      Points to label: ALL, NONE, or INFL [Default: LABEL=INFL]

* DEV=        DIFDEV/DIFCHISQ criterion for infl pts [Default: DEV=4]

* INFL=       Specifies the criterion used to determine which 
observations
              are influential (when used with LABEL=INFL).
              [Default: INFL=%STR(DIFCHISQ > &DEV OR &BUBBLE > 1)]

* LSIZE=      Observation label size.  The height of other text is 
controlled
              by the HTEXT= goption. [Default: LSIZE=1.5]

* LCOLOR=     Observation label color [Default: LCOLOR=BLACK]

* LPOS=       Observation label position [Default: LPOS=5]

* BSIZE=      Bubble size scale factor [Default: BSIZE=10]

* BSCALE=     Bubble size proportional to AREA or RADIUS [Default: 
BSCALE=AREA]

* BCOLOR=     Bubble color [Default: BCOLOR=BLACK]

* REFCOL=     Color of reference lines [Default: REFCOL=BLACK]

* REFLIN=     Line style for reference lines; 0->NONE [Default: 
REFLIN=33]

* LOPTIONS=   Options for PROC LOGISTIC [Default: LOPTIONS=NOPRINT]

* NAME=       Name of the graph in the graphic catalog [Default: 
NAME=INFLOGIS]

* GOUT=       Name of the graphics catalog

 */

%macro inflogis(
   data=_last_,    /* Name of input data set                  */
   y=,             /* Name of criterion variable              */
   trials=,        /* Name of trials variable                 */
   x=,             /* Names of predictors                     */
        class=,         /* Names of class variables (V8+)          */
   id=,            /* Name of observation ID variable (char)  */
   out=_diag_,     /* Name of the output data set             */
   gy=DIFDEV,      /* Ordinate for plot: DIFDEV or DIFCHISQ   */
   gx=PRED,        /* Abscissa for plot: PRED or HAT          */
   bubble=C,       /* Bubble proportional to: C or CBAR       */
   label=INFL,     /* Points to label: ALL, NONE, or INFL     */
   infl=%str(difchisq > &dev or &bubble > 1),
   dev=4,          /* DIFDEV/DIFCHISQ criterion for infl pts  */
   lsize=1.5,      /* obs label size.  The height of other    */
                   /* text is controlled by the HTEXT= goption*/
   lcolor=BLACK,   /* obs label color                         */
   lpos=5,         /* obs label position                      */
   bsize=10,       /* bubble size scale factor                */
   bscale=AREA,    /* bubble size proportional to AREA or RADIUS */
   bcolor=BLACK,   /* bubble color                            */
   refcol=BLACK,   /* color of reference lines                */
   reflin=33,      /* line style for reference lines; 0->NONE */
   loptions=noprint,/* options for PROC LOGISTIC              */
   name=INFLOGIS,
   gout=
   );

%let nv = %numwords(&x);        /* number of predictors */
%let nx = %numwords(&gx);       /* number of abscissa vars */
%let ny = %numwords(&gy);       /* number of ordinate vars */
%if &nv = 0 %then %do;
    %put ERROR: List of predictors (X=) is empty;
    %goto done;
    %end;

%let gx=%upcase(&gx);
%let gy=%upcase(&gy);
%let label=%upcase(&label);
%let bubble=%upcase(&bubble);
%if not ((%bquote(&bubble) = C)
    or   (%bquote(&bubble) = CBAR)) %then %do;
    %put BUBBLE=%bquote(&bubble) is not valid. BUBBLE=C will be used;
    %let bubble=C;
    %end;

%if %length(&class) > 0 and &sysver < 8 %then %do;
        %let class=;
        %put INFLOGIS:  The CLASS= parameter is not supported in SAS 
&sysver;
        %end;

proc logistic nosimple data=&data &loptions ;
   %if %length(&class)>0 %then %do;
                class &class;
                %end;
   %if %length(&trials)=0 %then %do;
      model &y         = &x / influence;
      %end;
   %else %do;
      model &y/&trials = &x / influence;
      %end;
   output out=&out h=hat pred=pred
                     difdev=difdev
                     difchisq=difchisq
                     c=c  cbar=cbar
                     resdev=resdev;

data &out;
   set &out;
   label difdev='Change in Deviance'
         difchisq='Change in Pearson Chi Square'
         hat = 'Leverage (Hat value)'
         studres = 'Studentized deviance residual';
   studres = resdev / sqrt(1-hat);

%do i=1 %to &ny;
   %let gyi = %scan(&gy, &i);
   %do j=1 %to &nx;
   %let gxj = %scan(&gx, &j);
   %put Plotting &gyi vs &gxj ;

   %if &label ^= NONE %then %do;
   data _label_;
      set &out nobs=n;
      length xsys $1 ysys $1 function $8 position $1 text $16 color $8;
      retain xsys '2' ysys '2' function 'LABEL' color "&lcolor";
      retain hcrit;
      drop hcrit;
      *keep &id x y xsys ysys function position text color size 
position
      hat difchisq difdev &bubble;
      x = &gxj;
      y = &gyi;
      %if &id ^= %str() %then %do;
         text = left( &id );
         %end;
      %else %do;
         text = put(_n_,3.0);
         %end;
      if _n_=1 then do;
         hcrit = 2 * (&nv+1)/n;
         put 'Hatvalue criterion: ' hcrit;
         call symput('hcrit',put(hcrit,4.3));
         end;
      size=&lsize;
      position="&lpos";
      %if &label = INFL %then %do;
/*         if %scan(&gy,1) > &dev
                   or difchisq > &dev
         or hat > hcrit  
         or &bubble > 1
            then output;   */
         if &infl then output;
         %end;
   run;
   %if &i=1 and &j=1 %then %do;
      proc print data=_label_;
         var &y &x pred studres hat difchisq difdev &bubble;
                   format hat 3.2 pred 4.3 studres 6.3 difdev difchisq 6.3;
      %if &id ^= %str() %then %do;
         id &id;
         %end;
      %else %do;
         id text;
         %put WARNING:  Observations are identified by sequential 
number (TEXT) because no ID= variable was specified.;
         %end;
      %end;
   %end;  /* &label ^= NONE */

   proc gplot data=&out &GOUT ;
     bubble &gyi * &gxj = &bubble /
           %if &label ^= NONE %then %do;
           annotate=_label_
           %end;
           frame
           vaxis=axis1 vminor=1 hminor=1
           %if &reflin ^= 0 %then %do;
           %if (&gyi = DIFDEV) or (&gyi = DIFCHISQ) %then %do;
              vref=4       lvref=&reflin  cvref=&refcol
              %end;
           %if (&gxj = HAT) %then %do;
              href= &hcrit lhref=&reflin  chref=&refcol
              %end;
           %end;
           bsize=&bsize  bcolor=&bcolor  bscale=&bscale
           name="&name"
           Des="Logistic influence plot for &y";
     axis1 label=(a=90 r=0);
   run; quit;
   %gskip;
   %end;   /* gx loop */
%end;   /* gy loop */
%done:
quit;
%mend;

%macro numwords(lst);
   %let i = 1;
   %let v = %scan(&lst,&i);
   %do %while (%length(&v) > 0);
      %let i = %eval(&i + 1);
      %let v = %scan(&lst,&i);
      %end;
   %eval(&i - 1)
%mend;

/*-------------------------------------------------------------------*
  *    Name: mosmat.sas                                               *
  *   Title: Macro interface for mosaic matrices                      *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/mosmat.html             *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:   3 Nov 1998 09:27:20                                    *
  * Revised:  10 Nov 2000 10:27:37                                    *
  * Version: 1.1                                                      *
  * 1.1  Fixed bug in call to panels when mosmat was called more      *
  *      than once in a job or session.                               *
  *                                                                   *
  * Requires: %gdispla, %panels                                       *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /*
Description:
 
 The MOSMAT macro provides an easily used macro interface to the
 MOSAICS and MOSMAT SAS/IML programs, to create a scatterplot
 matrix of mosaic displays for all pairs of categorical variables.

 Each pairwise plot shows the marginal frequencies to the order 
specified
 by the PLOTS= parameter.  When PLOTS=2, these are the bivariate 
margins,
 and the residuals from marginal independence are shown by shading.
 When PLOTS>2, the observed frequencies in a higher-order marginal
 table are displayed, and the model fit to that marginal table is
 determined by the FITTYPE= parameter.
 
Usage:

 The parameters for the mosaic macro are like those of the SAS/IML
 programs, except:
 
* DATA=    Specifies the name of the input dataset.  Should contain
           one observation per cell, the variables listed in VAR=
           and COUNT=.  [Default: DATA=_LAST_]

* VAR=     Specifies the names of the factor variables for the 
           contingency table. Abbreviated variable lists (e.g., V1-V3) 
           are not allowed. The levels of the factor variables may be
           character or numeric, but are used as is in the input data.
           Upper/lower case in the variable names is respected in the
           diagonal label panels.
           You may omit the VAR= variables if variable names are used 
in
           the VORDER= parameter.

* COUNT=   Specifies the name of the frequency variable in the dataset.
           The COUNT= variable must be specified.

* PLOTS=   The PLOTS= parameter determines the number of table 
variables
           displayed in each pairwise mosaic. [Default: PLOTS=2]

* CONFIG=  For a user-specified model, config= gives the terms in the
           model, separated by '/'.  For example, to fit the model of
           no-three-way association, specify CONFIG=1 2 / 1 3 / 2 3,
           or (using variable names) CONFIG = A B / A C / B C.
           Note that the numbers refer to the variables after they
           have been reordered, either sorting the data set, or by the
           VORDER= parameter.

* VORDER=  Specifies either the names of the variables or their indices
           in the desired order in the mosaic.  Note that the using the
           VORDER= parameter keeps the factor levels in their order in
           the data.

* SORT=    Specifies whether and how the input data set is to be sorted
           to produce the desired order of variables in the mosaic.
           SORT=YES sorts the data in the reverse order that they are
           listed in the VAR= paraemter, so that the variables are 
           entered in the order given in the VAR= parameter.  
Otherwise,
           SORT= lists the variable names, possibly with the DESENDING
           or NOTSORTED options in the reverse of the desired order.
           e.g., SORT=C DESCENDING B DESCENDING A.
*/

%macro mosmat(
   data=_last_,     /* Name of input dataset              */
   var=,            /* Names of factor variables          */
   count=count,     /* Name of the frequency variable     */
   fittype=joint,   /* Type of models to fit              */
   config=,         /* User model for fittype='USER'      */
   devtype=gf,      /* Residual type                      */
   shade=,          /* shading levels for residuals       */
   plots=2,         /* which plots to produce             */
   colors=blue red, /* colors for + and - residuals       */
   fill=HLS HLS,    /* fill type for + and - residuals    */
   split=V H,       /* split directions                   */
   vorder=,         /* order of variables in mosaic       */
   htext=,          /* height of text labels              */
   font=,           /* font for text labels               */
   title=,          /* title for plot(s)                  */
   space=,          /* room for spacing the tiles         */
   fuzz=,           /* smallest abs resid treated as zero */
        abbrev=,         /* abbreviate variable names in model */
   sort=YES,        /* Sort variables first?              */
   );

%if %length(&var)=0 & %length(&vorder)>0 %then %do;
        %if %verify(&vorder, %str(0123456789 ))>0 %then %let var=&vorder;
        %end;

%if %length(&var)=0 %then %do;
   %put ERROR: You must specify the VAR= classification variables.;
   %goto done;
   %end;

%if %length(&count)=0 %then %do;
   %put ERROR: You must specify the COUNT= frequency variable.;
   %goto done;
   %end;

%if %upcase(&data)=_LAST_ %then %let data = &syslast;
%let sort=%upcase(&sort);
%if &sort^=NO %then %do;
   %if &sort=YES %then %let sort=%reverse(&var);
   proc sort data=&data out=_sorted_;
      by &sort;
        %let data=_sorted_;
%end;
   
%if %upcase(&fittype)=USER and %length(&config)=0 %then %do;
   %put ERROR: You must specify the USER model with the CONFIG= 
argument;
   %goto done;
   %end;
   
%if %length(&config) %then %do;
   %*-- Translate / in config to , for iml;
   data _null_;
      length config $ 200;
      config = "&config";
      config = translate(config, ',', '/');
      call symput('config', trim(config));
   run;
   %put config: &config;
%end;

%gdispla(OFF);

   %*--Becuase of the large number of modules loaded, it may be
       necessary to adjust the symsize value;
proc iml /* worksize=10000 */ symsize=256;
   reset storage=mosaic.mosaic;
   load module=_all_;

   %include mosaics(mosmat); 

start str2vec(string);
   *-- String to character vector;
   free out;
   i=1;
   sub = scan(string,i,' ');
   do while(sub ^=' ');
      out = out || sub;
      i = i+1;
      sub = scan(string,i,' ');
   end;
   return(out);
   finish;

start symput(name, val);
   *-- Create a macro variable from a char/numeric scalar;
   if type(val) ='N'
      then value = trim(char(val));
      else value = val;
   call execute('%let ', name, '=', value, ';');
   finish;

   vnames = str2vec("&var");    *-- Preserve case of var names;

   %*-- Read and reorder counts;
   run readtab("&data","&count", vnames, table, levels, lnames);
   nv = ncol(levels);
   run symput('nv', nv);
   %if %length(&vorder) %then %do;
      vorder  = { &vorder };
      run reorder(levels, table, vnames, lnames, vorder);
      %end;

   %*-- These variables have their defaults set in mosaics(globals);
        %*   (Dont set them here unless passed as parameters.);
   %if %length(&space)>0 %then %do;
      space={&space};
      %end;
   %if %length(&shade)>0 %then %do;
      shade={&shade};
      %end;

   %*-- These variables have their defaults set in mosmat module;
   %if %length(&htext)>0 %then %do;
      htext=&htext;
      %end;
   %if %length(&font)>0 %then %do;
      font = "&font";
      %end;

   colors={&colors};
   filltype={&fill};
   split={&split};
   title = "&title";
   fittype = "&fittype";
   devtype = "&devtype";
   %if %length(&config)>0 %then %do; config=t({&config}); %end;
   %if %length(&fuzz)>0   %then %do; fuzz=&fuzz;  %end;
   %if %length(&abbrev)>0 %then %do; abbrev=&abbrev;  %end;

   plots = &plots;
   run mosmat(levels, table, vnames, lnames, plots, title);
quit;
%gdispla(ON);
%if &nv>0 %then %do;
        %let first = %eval(1 - &nv*&nv);
   %panels(rows=&nv, cols=&nv, order=down, first=&first, last=0);
        %end;

%done:

%mend;

%macro reverse(list);
   %local result i v;
   %let result =;
   %let i = 1;
   %let v = %scan(&list,&i,%str( ));
   %do %while (%length(&v) > 0);
      %let result = &v &result;
      %let i = %eval(&i + 1);
      %let v = %scan(&list,&i,%str( ));
      %end;
   &result
%mend;

/*-------------------------------------------------------------------*
  *    Name: pscale.sas                                               *
  *   Title: Construct annotations for a probability scale            *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/pscale.html             *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:  2 Nov 1995 12:07:41                                     *
  * Revised:  4 Dec 1997 12:07:41                                     *
  * Version: 1.0                                                      *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 The PSCALE macro constructs an annotate data set to draw an unequally-
 spaced scale of probability values on the vertical axis of a plot
 (at either the left or right).  The probabilities are assumed to 
 correspond to equally-spaced values on a scale corresponding to
 Normal quantiles (using the probit transformation) or Logistic
 quantiles (using the logit transformation).

Usage:

 The PSCALE macro is called with keyword parameters.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %pscale(out=pscale);
        proc gplot;
                plot logit * X / anno=pscale;
 
Parameters:

* ANNO=       Name of annotate data set [Default: ANNO=PSCALE]

* OUT=        Synonym for ANNO=

* SCALE=      Linear scale: logit or probit [Default: SCALE=LOGIT]

* LO=         Low scale value [Default: LO=-3]

* HI=         High scale value [Default: HI=3]

* PROB=       List of probability values to be displayed on the axis,
              in the form of a list acceptable in a DO statement.
              [Default: PROB=%str(.05, .1 to .9 by .1, .95)]

* AT=         X-axis percent for the axis.  AT=100 plots the axis at
              the right; AT=0 plots the axis at the left. [Default: 
AT=100]

* TICKLEN=    Length of tick marks [Default: TICKLEN=1.3]

* SIZE=       Size of value labels

* FONT=       Font for value labels
                
 */
 
%macro pscale(
        anno=pscale,      /* name of annotate data set     */
        out=,             /* synonym for anno=             */
        scale=logit,      /* linear scale: logit or probit */
        lo=-3,            /* low scale value               */
        hi=3,             /* high scale value              */
        prob=%str(.05, .1 to .9 by .1, .95),
        at=100,           /* x-axis percent for the axis   */
        ticklen=1.3,      /* length of tick marks          */
        size=,            /* size of value labels          */
        font=             /* font for value labels         */
        );

%let scale=%upcase(&scale);
%if %length(&out) %then %let anno=&out;

data &anno;
   xsys = '1';            * percent values for x;
   ysys = '2';            * data values for y;
   length text $4 function $8;
        %if %length(&font) %then %do;
           style="&font";
                %end;
        %if %length(&size) %then %do;
           size=&size;
                %end;
        drop prob scale;
        loc = &at;
   do prob = &prob ;
                %if &scale=LOGIT %then %do;
        scale = log( prob / (1-prob) );   * convert to logit;
                        %end;
                %else %if &scale=PROBIT %then %do;
        scale = probit( prob );           * convert to normal 
quantile;
                        %end;
                %else %do;
                        scale = &scale;
                        %end;

      if (&lo <= scale <= &hi) then do;
         y = scale;
         x = &at + sign(50-&at)*&ticklen; function='MOVE';
                                output;   * tick marks;
         x = &at;  function='DRAW ';
                                output;
         text = put(prob,3.2);
                        position='6';  * values;
         function='LABEL  '; output;
         end;
      end;
        run;
%mend;

/*-------------------------------------------------------------------*         
  *    Name: biplot.sas                                               *
  *   Title: Generalized biplot of observations and variables         *
  *          Uses IML.                                                *         
  *     Doc: http://www.math.yorku.ca/SCS/vcd/biplot.html             *
  *-------------------------------------------------------------------*         
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *         
  * Created:   1 Mar 1989 13:16:36                                    *         
  * Revised:   9 Nov 2000 11:33:21                                    *         
  * Version:  1.9                                                     *         
  * 1.5  Added dimension labels, fixed problem with dim=3,            *
  *      Added colors option, Fixed problem with var=_NUM_            *
  * 1.6  Added power transformation (for log(freq))                   *
  *      Added point symbols, marker styles (interp=)                 *
  *      Made ID optional, can be char or numeric                     *
  *      Fixed bug introduced with ID                                 *
  * 1.7  Added code to equate axes if HAXIS= and VAXIS= are omitted   *
  *      Added code to preserve case of variable names                *
  *      Fixed positioning of variable names                          *
  * 1.8  Allow abbreviated variable lists (X1-X5, etc.)               *
  *      Allow glm-style input (var=A B, response=Y, id=)             *
  *      Added VTOH for PPLOT printer plots                           *
  *      Added FACTYPE=COV and VARDEF=N-1 (Tokuhisa SUZUKI)           *
  * 1.9  Aded POWER= for analysis of log freq & other generalizations *
  *      Added HTEXT= to control size of obs/var labels               *
  *                                                                   *         
  *      From ``SAS System for Statistical Graphics, First Edition''  *         
  *      Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA       *         
  *      ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/        

 /* Description:
 
 The BIPLOT macro produces generalized biplot displays for multivariate
 data, and for two-way and multi-way tables of either quantitative
 or frequency data.  It also produces labeled plots of the row and
 column points in 2 dimensions, with a variety of graphic options,
 and the facility to equate the axes automatically.


Input data:

 The macro takes input in one of two forms:

 (a) A data set in table form, where the columns are separate
 variables and the rows are separate observations (identified by
 a row ID variable).    In this arrangment, use the VAR= argument
 to specify this list of variables and the ID= variable to specify
 an additional variable whose values are labels for the rows.
 
 Assume a dataset of reaction times to 4 topics in 3 experimental 
tasks,
 in a SAS dataset like this:

     TASK   TOPIC1   TOPIC2   TOPIC3   TOPIC4
          Easy     2.43     3.12     3.68     4.04
          Medium   3.41     3.91     4.07     5.10
          Hard     4.21     4.65     5.87     5.69
             
 For this arrangment, the macro would be invoked as follows:
   %biplot(var=topic1-topic4, id=task);

 (b) A contingency table in frequency form (e.g., the output from
 PROC FREQ), or multi-way data in the univariate format used as
 input to PROC GLM.  In this case, there will be two or more factor
 (class) variables, and one response variable, with one observation
 per cell.  For this form, you must use the VAR= argument to
 specify the two (or more) factor (class) variables, and specify
 the name of response variable as the RESPONSE= parameter. Do not 
specify
 an ID= variable for this form.

 For contingency table data, the response will be the cell frequency, 
and
 you will usually use the POWER=0 parameter to perform an analysis of
 the log frequency.
  
 The same data in this format would have 12 observations, and look 
like:

                TASK  TOPIC    RT
                Easy    1     2.43
                Easy    2     3.12
                Easy    3     3.68
                ...
                Hard    4     5.69
                
 For this arrangment, the macro would be invoked as follows:
   %biplot(var=topic task, response=RT);

 In this arrangement, the order of the VAR= variables does not matter.
 The columns of the two-way table are determined by the variable which
 varies most rapidly in the input dataset (topic, in the example).

Usage:

 The BIPLOT macro is defined with keyword parameters.  The VAR=
 parameter must be specified, together with either one ID= variable
 or one RESPONSE= variable.

 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %biplot();
 
 The plot may be re-drawn or customized using the output OUT=
 data set of coordinates and the ANNO= Annotate data set.

 The graphical representation of biplots requires that the axes in the
 plot are equated, so that equal distances on the ordinate and abscissa
 represent equal data units (to perserve distances and angles in the 
plot).
 A '+', whose vertical and horizontal lengths should be equal,
 is drawn at the origin to indicate whether this has been achieved.

 If you do not specifiy the HAXIS= and YAXIS= parameters, the EQUATE
 macro is called to generate the AXIS statements to equate the
 axes.  In this case the INC=, XEXTRA=, and YEXTRA=, parameters
 may be used to control the details of the generated AXIS statements.

 By default, the macro produces and plots a two-dimensional solution.

Parameters:

* DATA=          Specifies the name of the input data set to be 
analyzed.
             [Default: DATA=_LAST_]

* VAR=           Specifies the names of the column variables 
             when the data are in table form, or the names of the
                                 factor variables when the data are in 
frequency form
                                 or GLM form. [Default: VAR=_NUM_]

* ID=        Observation ID variable when the data are in table form.

* RESPONSE=  Name of the response variable (for GLM form)    

* DIM=       Specifies the number of dimensions of the CA/MCA solution.
             Only two dimensions are plotted by the PPLOT and GPLOT 
options,
                                 however. [Default: DIM=2]

* FACTYPE=   Biplot factor type: GH, SYM, JK or COV [Default: 
FACTYPE=SYM]

* VARDEF=    Variance def for FACTYPE=COV: DF | N [Default: VARDEF=DF]

* SCALE=     Scale factor for variable vectors [Default: SCALE=1]

* POWER=     Power transform of response [Default: POWER=1]

* OUT=           Specifies the name of the output data set of 
coordinates.
             [Default: OUT=BIPLOT]

* ANNO=          Specifies the name of the annotate data set of 
labels
             produced by the macro.  [Default: ANNO=BIANNO]

* STD=       How to standardize columns: NONE|MEAN|STD [Default: 
STD=MEAN]

* COLORS=    Colors for OBS and VARS [Default: COLORS=BLUE RED]

* SYMBOLS=   Symbols for OBS and VARS [Default: SYMBOLS=NONE NONE]

* INTERP=    Markers/interpolation for OBS and VARS. [Default: 
INTERP=NONE VEC]

* LINES=     Lines for OBS and VARS interpolation [Default: LINES=33 
20]

* PPLOT=     Produce a printer plot? [Default: PPLOT=NO]

* VTOH=      The vertical to horizontal aspect ratio (height of one
             character divided by the width of one character) of the
                                 printer device, used to equate axes for a 
printer plot,
                                 when PPLOT=YES.  [Default: VTOH=2]

* GPLOT=     Produce a graphics plot? [Default: GPLOT=YES]

* PLOTREQ=   The dimensions to be plotted [Default: PLOTREQ=DIM2*DIM1]

* HAXIS=     AXIS statement for horizontal axis.  If both HAXIS= and
             VAXIS= are omitted, the program calls the EQUATE macro to
                                 define suitable axis statements.  This creates 
the axis
                                 statements AXIS98 and AXIS99, whether or not a 
graph
                                 is produced.

* VAXIS=     The name of an AXIS statement for the vertical axis.

* INC=       The length of X and Y axis tick increments, in data units
             (for the EQUATE macro).  Ignored if HAXIS= and VAXIS= are
             specified. [Default: INC=0.5 0.5]

* XEXTRA=    # of extra X axis tick marks at the left and right.  Use 
to
             allow extra space for labels. [Default: XEXTRA=0 0]

* YEXTRA=    # of extra Y axis tick marks at the bottom and top.
             [Default: YEXTRA=0 0]

* M0=        Length of origin marker, in data units. [Default: M0=0.5]

* DIMLAB=    Prefix for dimension labels [Default: DIMLAB=Dimension]

* NAME=      Name of the graphics catalog entry [Default: NAME=BIPLOT]        

 */
                                                                                 
%macro biplot(                                                                  
        data=_LAST_,       /* Data set for biplot                       
*/         
        var=_NUM_,         /* Variables for biplot                      
*/         
        id=,               /* Observation ID variable (obs x var input) 
*/
        response=,         /* Name of response variable (glm input)     
*/    
        dim=2,             /* Number of biplot dimensions               
*/         
   factype=SYM,       /* Biplot factor type: GH, SYM, JK or COV    */
        vardef=DF,         /* Variance def for factype=COV: DF | N      
*/
        scale=1,           /* Scale factor for variable vectors         
*/
        power=1,           /* Power transform of response               
*/
        out=BIPLOT,        /* Output dataset: biplot coordinates        
*/         
        anno=BIANNO,       /* Output dataset: annotate labels           
*/         
        std=MEAN,          /* How to standardize columns: NONE|MEAN|STD 
*/
        colors=BLUE RED,   /* Colors for OBS and VARS                   
*/
        symbols=none none, /* Symbols for OBS and VARS                  
*/
        interp=none vec,   /* Markers/interpolation for OBS and VARS    
*/          
        lines=33 20,       /* Lines for OBS and VARS interpolation      
*/
        pplot=NO,          /* Produce printer plot?                     
*/
        vtoh=2,            /* PPLOT cell aspect ratio                   
*/
        gplot=YES,         /* Produce hi-res plot?                      
*/
        plotreq=,          /* dimensions to be plotted                  
*/
        haxis=,            /* AXIS statement for horizontal axis        
*/
        vaxis=,            /* and for vertical axis- use to equate axes 
*/
        inc=0.5 0.5,       /* x, y axis tick increments                 
*/
        xextra=0 0,        /* # of extra x axis tick marks              
*/
        yextra=0 0,        /* # of extra y axis tick marks              
*/
        m0=0.5,            /* Length of origin marker                   
*/
        dimlab=,           /* Dimension label                           
*/
        htext=1.5,
        name=biplot        /* Name for graphics catalog entry           
*/
        );        
                                                                                
%let abort=0;
%let std=%upcase(&std);
%let pplot=%upcase(&pplot);
%let gplot=%upcase(&gplot);
                                              
%if %length(&vardef) = 0 %then %let vardef=N;
%if %upcase(&vardef) = DF %then %let vardef = %str( N - 1 ) ;

%let factype=%upcase(&factype);                                                 
      %if &factype=GH  %then %let p=0;                                          
%else %if &factype=SYM %then %let p=.5;                                         
%else %if &factype=JK  %then %let p=1;                                          
%else %if &factype=COV %then %let p=0 ;
%else %do;                                                                      
   %put BIPLOT: FACTYPE must be GH, SYM, JK, or COV "&factype" is not 
valid.;
        %let abort=1;       
   %goto done;                                                                  
   %end;                                                                        
%if &data=_LAST_ %then %let data=&syslast;

%* --- Transform variable lists (X1-X10) into expanded form for IML ---
;
%if %index(&var,-) >0 or 
        "%upcase(&var)"="_NUM_" or 
        "%upcase(&var)"="_NUMERIC_"  %then
 %do;
 %let ovar = &var;
 data _null_;
    set &data (obs=1);
       %*-- convert shorthand variable list to long form;
     length _vname_ $ 8 _vlist_ $ 200;
     array _xx_ &var;
     _vname_ = ' ';
          i=0;
     do over _xx_;
        call vname(_xx_,_vname_);
        _vlist_ = trim(_vlist_)|| ' ' || trim(_vname_);
                  i+1;
     end;
     call symput( 'VAR', trim(_vlist_) );
     call symput( 'NV', trim(put(i,2.0)) );
          put "NOTE:  Variable list (&ovar) expanded to VAR=" _vlist_;
  run;
  %if &nv=0 %then %do;
    %put ERROR: No variables were found in the VAR=&ovar list;
    %goto DONE;
                %end;
  %end;

%if %length(&id) = 0 %then %do;
    %if %bquote(%scan(&var,2,%str( ))) = %str() or
             %length(&response)=0 %then %do;
    %put ERROR: When no ID= variable is specified, you must supply
              two+ VAR= variable names, and the name of the RESPONSE=
                        variable.;
    %goto DONE;
         %end;
%end;

                                                                                
%*-- Set defaults which depend on other options;
%if %length(&plotreq)=0 %then %do;
        %if &dim=2 %then %let plotreq =  dim2 * dim1;
        %if &dim=3 %then %let plotreq =  dim2 * dim1 = dim3;
        %else %let plotreq =  dim2 * dim1;
        %end;

%if %length(&dimlab)=0 %then %do;
        %if &dim=2 %then %let dimlab = Dimension;
        %if &dim=3 %then %let dimlab = Dim;
        %end;

proc iml;                                                                       
start biplot(y,id,vars,out, g, scale);                                      
   N = nrow(Y);                                                                 
   P = ncol(Y);                                                                 
   %if &std = NONE                                                              
       %then Y = Y - Y[:] %str(;);             /* remove grand mean */          
       %else Y = Y - J(N,1,1)*Y[:,] %str(;);   /* remove column means 
*/        
   %if &std = STD %then %do;                                                    
      S = sqrt(Y[##,] / ( &vardef ) );
      Y = Y * diag (1 / S );                                                    
   %end;                                                                        
   print "Standardization Type: &std  (VARDEF = &vardef) " ;
                                                                                
   *-- Singular value decomposition:                                            
        Y is expressed as U diag(Q) V prime                                     
        Q contains singular values, in descending order;                        
   call svd(u,q,v,y);                                                           
                                                                                
   reset fw=8 noname;                                                           
   percent = 100*q##2 / q[##];                                                  
   cum = cusum(percent);                                                           
   c1={'Singular Values'};                                                      
   c2={'Percent'};                                                              
   c3={'Cum % '};                                                               
        ls = 40;
        do i=1 to nrow(q);
                row = cshape('*', 1, 1, round(ls#percent[i]/max(percent)));
                hist = hist // cshape(row,1,1,ls,' ');
                end;
   print "Singular values and variance accounted for",,                         
         q       [colname=c1 format=9.4 ]                                       
         percent [colname=c2 format=8.2 ]                                       
         cum     [colname=c3 format=8.2 ]
                        hist    [colname={'Histogram of %'}];                                     
                                                                                
   d = &dim ;
        *-- Assign macro variables for dimension labels;
        lab = '%let p' + char(t(1:d),1) + '=' + 
left(char(percent[t(1:d)],8,1)) + ';';
        call execute(lab);

   *-- Extract first  d  columns of U & V, and first  d  elements of Q;         
   U = U[,1:d];                                                                 
   V = V[,1:d];                                                                 
   Q = Q[1:d];                                                                  
                                                                                
   *-- Scale the vectors by QL, QR;                                             
   * Scale factor 'scale' allows expanding or contracting the variable          
     vectors to plot in the same space as the observations;                     
   QL= diag(Q ## g );                                                       
   QR= diag(Q ## (1-g));                                                    
   A = U * QL;                                                                  
   B = V * QR;
        ratio = max(sqrt(A[,##])) /  max(sqrt(B[,##]));
        if scale=0 then scale=ratio;                                                         
        print 'OBS / VARS ratio:' ratio 'Scale:' scale;
   B = B # scale;                                                          

  %if %upcase( &factype ) = COV %then
    %do ;
      A = sqrt( &vardef       ) # A ;
      B = ( 1 / sqrt(&vardef) ) # B ;
    %end ;
   OUT=A // B;                                                                  
                                                                                
   *-- Create observation labels;                                               
   id = shape(id,n,1) // shape(vars,p,1);                                                            
   type = repeat({"OBS "},n,1) // repeat({"VAR "},p,1);                         
   id  = concat(type, id);                                                      
                                                                                
        if upcase("&factype")='COV'
                then factype='COV';
        else factype = {"GH" "Symmetric" "JK"}[1 + 2#g];                              
   print "Biplot Factor Type", factype;                                         
                                                                                
   cvar = concat(shape({"DIM"},1,d), char(1:d,1.));                             
   print "Biplot coordinates",                                                  
         out[rowname=id colname=cvar f=9.4];

   create &out  from out[rowname=id colname=cvar];                              
   append from out[rowname=id];                                                 
finish;                                                                         

start power(x, pow);
                if pow=1 then return(x);
                if any(x <= 0) then x = x + ceil(min(x)+.5);
                if abs(pow)<.001 then xt =  log(x);
                        else xt = ((x##pow)-1) / pow;
        return (xt);
        finish;

start str2vec(string);
        *-- String to character vector;
   free out;
   i=1;
   sub = scan(string,i,' ');
   do while(sub ^=' ');
      out = out || sub;
      i = i+1;
      sub = scan(string,i,' ');
   end;
        return(out);
        finish;

/* --------------------------------------------------------------------
 Routine to read frequency and index/label variables from a SAS dataset
 and construct the appropriate levels, and lnames variables

 Input:  dataset - name of SAS dataset (e.g., 'mydata' or 'lib.nydata')
         variable - name of variable containing the response
                   vnames - character vector of names of index variables
 Output: dim (numeric levels vector)
         lnames (K x max(dim)) 
  --------------------------------------------------------------------
*/
 
start readtab(dataset, variable, vnames, table, dim, lnames);
        if type(vnames)^='C'
                then do;
                        print 'VNAMES argument must be a character vector';
                        show vnames;
                        return;
                end;            
   if nrow(vnames)=1 then vnames=vnames`;

        call execute('use ', dataset, ';');
        read all var variable into table;
        run readlab(dim, lnames, vnames);
        call execute('close ', dataset, ';');
        reset noname;
        print 'Variable' variable 'read from dataset' dataset,
                'Factors ordered:' vnames lnames;
        reset name;
        finish;

/* Read variable index labels from an open dataset, construct a dim 
   vector and lnames matrix so that variables are ordered correctly
   for mosaics and ipf (first varying most rapidly).

        The data set is assumed to be sorted by all index variables.  If
        the observations were sorted by A B C, the output will place
        C first, then B, then A.
   Input:    vnames (character K-vector)
 */

start readlab(  dim, lnames, vnames);
        free span lnames dim;
        nv = nrow(vnames);

        spc = '        ';
        do i=1 to nv;
                vi = vnames[i,];
                read all var vi into cli;
                if type(cli) = 'N'
                        then do;
                                tmp = trim(left(char(cli,8)));
                                tmp = substr(tmp,1,max(length(tmp)));
                                cli = tmp;
                                end; 
                cli = trim(cli); 
                span = span || loc(0=(cli[1,] = cli))[1];
                d=design( cli );
                dim = dim || ncol(d);
                free row1;
                *-- find position of each first distinct value;
                do j=1 to ncol(d);
                        row1 = row1 || loc(d[,j]=1)[1];
                        end;
                *-- sort elements in row1 so that var labels are in data 
order;
                order = rank(row1);
                tmp = row1;
                row1[,order]=tmp;       

                li = t(cli[row1]);
                if i=1 then lnames = li;
                        else do;
                                if ncol(lnames) < ncol(row1) 
                                        then lnames=lnames || repeat(spc, i-1, 
ncol(row1)-ncol(lnames));
                                if ncol(lnames) > ncol(row1)
                                        then li = li || repeat(spc, 1, 
ncol(lnames)-ncol(li));
                                lnames = lnames // li;
                        end;
                end;

        *-- sort index variables by span so that last varies most slowly;
        order = rank(span);
        tmp = span;    span[,order] = tmp;
        tmp = dim;     dim[,order] = tmp;
        tmp = lnames;  lnames[order,] = tmp;
        tmp = vnames;  vnames[order,] = tmp;
        finish;
 
start cellname(dim,lnames);
   cn = '';
        d = dim;
   if nrow(dim)=1 then d = dim`;
   do f=nrow(d) to 1 by -1;
      r  = nrow(cn);
      ol = repeat( cn, 1, d[f]);
      ol = shape(ol, r#d[f], 1);
      nl = repeat( (lnames[f,(1:d[f])])`, r,1);
                if f=nrow(d)
                        then cn = trim(nl);
        else cn = trim(nl)+':'+trim(ol);
      end;
   return(cn);
   finish;
 

        /*--- Main routine */

        %if %length(&id) = 0 %then %do;
                run readtab("&data", "&response", {&var}, y, dim, lnames);
                cvar = 1;
                rvar = (cvar+1):ncol(dim);
                y = shape(y, (dim[rvar])[#], dim[1:cvar]);
                vars = lnames[1,1:dim[1]];
                if ncol(dim)=2 
                        then id = t(lnames[2,1:dim[2]]);
                        else id = cellname( dim[rvar], lnames[rvar,]);
                %end;
   %else %do; 
          use &data;
     read all var{&var} into y;
     read all var{&id} into id;
     vars = str2vec("&var");          *-- Preserve case of var names;
          %end;

        %if &power ^= 1 %then %do;
                y = power(y, &power);
                %end;

   scale = &scale;                                                              
   run biplot(y, id, vars, out, &p, scale );                                  
   quit;                                                                        
                                                                                
 /*----------------------------------*                                          
  |  Split ID into _TYPE_ and _NAME_ |                                          
  *----------------------------------*/                                         

data &out;                                                                      
   set &out;                                                                    
   drop id;                                                                     
   length _type_ $3 _name_ $16;                                                 
   _type_ = substr(id,1,3);                                                     
   _name_ = substr(id,5);                                                       
   label                                                                        
   %do i=1 %to &dim;                                                            
       dim&i = "&dimlab &i (&&p&i%str(%%))"                                                   
       %end;                                                                    
   ;                                                                            
 /*--------------------------------------------------*                          
  | Annotate observation labels and variable vectors |                          
  *--------------------------------------------------*/                         
        %local c1 c2 v1 v2 i1 i2 h1 h2;
        %*-- Assign colors and symbols;
        %let c1= %scan(&colors,1);                                                      
        %let c2= %scan(&colors,2);
        %if &c2=%str() %then %let c2=&c1;                                                   

        %let v1= %upcase(%scan(&symbols,1));                                                      
        %let v2= %upcase(%scan(&symbols,2));
        %if &v2=%str() %then %let v2=&v1;                                                   

        %let i1= %upcase(%scan(&interp,1));                                                      
        %let i2= %upcase(%scan(&interp,2));
        %if &i2=%str() %then %let i2=&i1;                                                   

        %let l1= %upcase(%scan(&lines,1));                                                      
        %let l2= %upcase(%scan(&lines,2));
        %if &l2=%str() %then %let l2=&l1;                                                   

        %if %length(&htext) %then %do;
                %let h1= %upcase(%scan(&htext,1));                                                      
                %let h2= %upcase(%scan(&htext,2,%str( )));
                %if &h2=%str() %then %let h2=&h1;                                                   
                %end;

        %*-- Plot increments;
        %let n1= %scan(&inc,1,%str( ));                                                      
        %let n2= %scan(&inc,2,%str( ));
        %if &n2=%str() %then %let n2=&n1;

        %*-- Find dimensions to be ploted;
        %let ya = %scan(&plotreq,1,%str(* ));
        %let xa = %scan(&plotreq,2,%str(* ));
        %let za = %scan(&plotreq,3,%str(=* ));

%if &pplot = YES %then %do;
        %put WARNING: Printer plots may not equate axes (using 
VTOH=&vtoh);                                                     
   %if &sysver < 6.08
       %then %do;
          %put WARNING: BIPLOT cannot label points adequately using
               PROC PLOT in SAS &sysver - use SAS 6.08 or later;
          %let symbol = %str( = _name_ );
          %let place =;
                         %let axes=;
       %end;
       %else %do;
           %let symbol = $ _name_ = '*';
           %let place = placement=((h=2 -2 : s=right left)
                                   (v=1 -1 * h=0 -1 to -3 by alt)) ;
                          %let axes = haxis = by &n1 vaxis = by &n2 ;
       %end;
 
        proc plot data=&out vtoh=&vtoh;
                plot &ya * &xa &symbol
                        / &axes &place box;
%end;


data &anno;                                                                     
   set &out;                                                                    
   length function color $8 text $16;                                                 
   xsys='2'; ysys='2'; %if &dim > 2 %then %str(zsys='2';);                                               
   text = _name_;                                                               
                                                                                
   if _type_ = 'OBS' then do;         /* Label observations (row 
points) */             
      color="&c1";                                                            
%*              if "&i1" = 'NIL' then return;                                                            
                if "&i1" = 'VEC' then link vec;                                                            
      x = &xa; y = &ya;                                                       
      %if &dim > 2 %then %str(z = &za;);
                %if &v1=NONE %then                                     
        %str(position='5';);
                %else %do;
      if &xa >=0                                                               
         then position='>';             /* rt justify         */                
         else position='<';             /* lt justify         */                
      if &ya >=0                                                               
         then position='2';             /* up justify         */                
         else position='E';             /* down justify       */
                        %end;
                size = &h1;                
      function='LABEL   '; output;                                              
      end;                                                                      
                                                                                
   if _type_ = 'VAR' then do;           /* Label variables (col points) 
*/                
      color="&c2";
                if "&i2" = 'VEC' then link vec;                                                            
      x = &xa; y = &ya;                
      %if &dim > 2 %then %str(z = &za;);
                %if &v2=NONE %then                                     
        %str(position='5';);
                %else %do;
      if &ya >=0                                                               
         then position='2';             /* up justify         */                
         else position='E';             /* down justify       */
                        %end;                
                size = &h2;                
      function='LABEL   '; output;      /* variable name      */                
      end;                                                                      
                return;

vec:                    /* Draw line from the origin to point */
      x = 0; y = 0;                
      %if &dim > 2 %then %str(z = 0;);                                          
      function='MOVE'    ; output;                                              
      x = &xa; y = &ya;                
      %if &dim > 2 %then %str(z = &za;);                                       
      function='DRAW'    ; output;                                              
                return;

 /*--------------------------------------------------*
  | Mark the origin                                  |
  *--------------------------------------------------*/
%if &m0 > 0 %then %do;
data _zero_;
        xsys='2';  ysys='2';
        %if &dim=3 %then %do; zsys='2'; z=0; %end;             
        x = -&m0;  y=0;   function='move'; output;
        x =  &m0;         function='draw'; output;
        x = 0;  y = -&m0; function='move'; output;
                y =  &m0; function='draw'; output;

data &anno;                                                                     
   set &anno _zero_;                                                                    
%end;

%if %length(&vaxis)=0 and %length(&haxis)=0 %then %do;
        %let x1= %scan(&xextra,1);                                                      
        %let x2= %scan(&xextra,2);
        %if &x2=%str() %then %let x2=&x1;
        %let y1= %scan(&yextra,1);                                                      
        %let y2= %scan(&yextra,2);
        %if &y2=%str() %then %let y2=&y1;

        %equate(data=&out, x=&xa, y=&ya, plot=no,
                vaxis=axis98, haxis=axis99, xinc=&n1, yinc=&n2,
                xmextra=&x1, xpextra=&x2, ymextra=&y1, ypextra=&y2);
        %let vaxis=axis98;
        %let haxis=axis99;
        options nonotes;
        %end;
%else %do;
%if %length(&vaxis)=0 %then %do;
        %let vaxis=axis98;
        %put WARNING:  You should use an AXISn statement and specify 
VAXIS=AXISn to equate axis units and length;
   axis98 label=(a=90);
        %end;
%if %length(&haxis)=0 %then %do;
        %let haxis=axis99;
        %put WARNING:  You should use an AXISm statement and specify 
HAXIS=AXISm to equate axis units and length;
   axis99 offset=(2);
        %end;
%end;


symbol1 v=&v1 c=&c1 i=&i1 l=&l1;
symbol2 v=&v2 c=&c2 i=&i2 l=&l2;
%if &gplot = YES %then %do;
        %if &i1=VEC %then %let i1=NONE;
        %if &i2=VEC %then %let i2=NONE;
        %let legend=nolegend;

/*
        %let warn=0;
   %if %length(&haxis)=0 %then %do;
                %let warn=1;
                axis2 offset=(1,5) ;
      %let haxis=axis2;
      %end;
   %if %length(&vaxis)=0 %then %do;
                %let warn=1;
                axis1 offset=(1,5) label=(a=90 r=0);
      %let vaxis=axis1;
      %end;
        %if &warn %then %do;
                %put WARNING:  No VAXIS= or HAXIS= parameter was specified, 
so the biplot axes have not;
                %put WARNING:  been equated.  This may lead to incorrect 
interpretation of distance and;
                %put WARNING:  angles.  See the documentation.;
                %end;
*/

        proc gplot data=&out &GOUT;
                plot &ya * &xa = _type_/ 
                        anno=&anno frame &legend
                        %if &m0=0 %then %do;
                        href=0 vref=0 lvref=3 lhref=3
                        %end;
                        vaxis=&vaxis haxis=&haxis
                        vminor=1 hminor=1
                        name="&name" des="Biplot of &data";
        run; quit;
*       goptions reset=symbol;
        %end;  /* %if &gplot=YES */

%done:                                                                          
%mend BIPLOT;                                                                   

/*-------------------------------------------------------------------*
  *    Name: gdispla.sas                                              *
  *   Title: Device-independent DISPLAY/NODISPLAY control             *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/gdispla.html            *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 14 Feb 91 11:19                                          *
  * Revised: 04 Jan 99 12:39                                          *
  * Version: 1.0                                                      *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 The GDISPLA macro is used to switch graphics display off or on in
 a device-independent way.  It allows for the fact that for direct
 output to the display device, the required GOPTIONS are NODISPLAY
 or DISPLAY, whereas for output to a GSF, GIF, or EPS file, the options
 are GSFMODE=NONE or GSFMODE=APPEND.  It is usually used with the
 PANELS macro or the SCATMAT macro, or other programs which produce
 multiple plots and then join those plots in a template using
 PROC GREPLAY.

Usage:

 The GDISPLA macro is called with positional parameters.  The first
 (SWITCH) parameter must be specified.
 
        %let devtype=SCREEN;
        %gdispla(OFF);
        proc gplot;
                plot y * x;
                by group;
        %gdispla(ON);
        %panels(rows=1, cols=3);
 
Parameters:

* SWITCH        A string value, OFF or ON.

* IML           Specify any non-blank value to use the GDISPLA macro
                within SAS/IML.

Global parameters:

The macro uses one global macro parameter, DEVTYP, to determine the
appropriate action. This parameter is normally initialized either in 
the AUTOEXEC.SAS file, or in device-specific macros

* DEVTYP    String value, the type of graphic device driver.  The
            values EPS, GIF, CGM and WMF cause the macro to use
                                the GSMODE option; the value DEVTYP=SCREEN 
causes the
                                macro to use the DISPLAY or NODISPLAY option.
                                All other values are ignored.

 */

%macro gdispla(
          switch,
          iml
                         );

%global DISPLAY DEVTYP;

   %let switch=%upcase(&switch);
   %let devtyp=%upcase(&devtyp);
   %if &switch=ON %then %do;
                %let DISPLAY=ON;
       %if &devtyp=SCREEN /*and &sysver=5.18 */
           %then %let cmd= %str(GOPTIONS DISPLAY;);
           %else
                          %if &devtyp=EPS or &devtyp=GIF
                 or &devtyp=CGM or &devtyp=WMF 
                                        %then %let cmd= %str(GOPTIONS DISPLAY 
GSFMODE=REPLACE;);
                                        %else %let cmd= %str(GOPTIONS DISPLAY 
GSFMODE=APPEND;);
       %end;
   %else %if &switch=OFF %then %do;
                %let DISPLAY=OFF;
       %if &devtyp=SCREEN /*and &sysver=5.18  */
           %then %let cmd= %str(GOPTIONS NODISPLAY;);
           %else %let cmd= %str(GOPTIONS NODISPLAY GSFMODE=NONE;);
       %end;
   %else %put ERROR in GDISPLA: SWITCH must be ON or OFF;

        %if &iml = %str() %then %do;
                run;
                &cmd;
                %*put GDISPLA: &cmd;
                %end;
        %else %do;              /* Called from IML */
                start command(cmd);
                        call execute(cmd);
                finish;
                run command("&cmd");
                %end;
        
%mend gdispla;

/*-------------------------------------------------------------------*
  *    Name: interact.sas                                             *
  *   Title: Create interaction variables                             *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/interact.html           *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 18 Aug 98  8:32                                          *
  * Revised: 10 Jan 2000 08:50:52                                     *
  * Version: 1.0                                                      *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 The INTERACT macro creates interaction variables, formed as the 
product
 of each of the variables given in one set (V1=) with each of the
 variables given in a second set (V2=).

Usage:

 The INTERACT macro is called with keyword parameters.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
   %interact(v1=age sex, v2=I1 I2 I3);
 
Parameters:

* DATA=         The name of the input dataset.  If not specified, the 
most
            recently created dataset is used.

* V1=           Specifies the name(s) of the first set of 
variable(s).

* V2=           Specifies the name(s) of the second set of 
variable(s).

* OUT=          The name of the output dataset.  If not specified, 
the new
            variables are appended to the input dataset.

* PREFIX=   Prefix(s) used to create the names of interaction 
variables.
            The default is 'I_'. The names are of the form I_11 I_12 
...
                                I_1m I_21 I_22 ... I_nm, where there are n 
variables in V1
                                and m variables in V2.

* CENTER=   If non-blank, the V1 and V2 variables are mean-centered
            prior to forming their interaction products.
 */
 
%macro interact( 
        data=_last_ ,    /* name of input dataset */
        out=&data,       /* name of output dataset */
        v1= ,            /* first variable(s) */
        v2= ,            /* second variable(s) */
        prefix = I_,     /* prefix for interaction variable names */
        names=,          /* or, a list of n*m names */
        center=,
        );

%let abort = 0;
%if (%length(&v1) = 0 or %length(&v2) = 0) %then %do;
                %put ERROR: INTERACT: V1=  and V2= must be specified;
                %goto done;
                %end;

%if %bquote(&data) = _last_ %then %let data = &syslast;
%if %bquote(&data) = _NULL_ %then %do;
        %put ERROR: There is no default input data set (_LAST_ is 
_NULL_);
        %goto DONE;
        %end;
        
%if %length(&center) %then %do;
proc standard data=&data out=&data m=0;
        var &v1 &v2;
        %end;

data &out;
        set &data;

        %local i j k w1 w2;
        
        %let k=0;
        %let i=1;
        %let w1 = %scan(&v1, &i, %str( ));
        %do %while(&w1 ^= );
        
                %let j=1;
                %let w2 = %scan(&v2, &j, %str( ));
        
                %do %while(&w2 ^= );
                        %* put i=&i j=&j;
                        %let k=%eval(&k+1);
                        %let name = %scan(&names, &k, %str( ));
                        %if %length(&name) %then %do;
                                &name = &w1 * &w2;
                                %end;
                        %else %do;
                                &prefix.&i.&j = &w1 * &w2;
                                %end;
                        %let j=%eval(&j+1);
                        %let w2 = %scan(&v2, &j, %str( ));
                        %end;
        
                %let i=%eval(&i+1);
                %let w1 = %scan(&v1, &i, %str( ));
                %end;
run;

%done:
%if &abort %then %put ERROR: The INTERACT macro ended abnormally.;

%mend;

/*-------------------------------------------------------------------*
  *   Name: ordplot.sas                                               *
  *  Title: Diagnose form of discrete frequency distribution          *
  *         Poisson, Binomial, Neg. Binomial, Log series              *
  *    Doc: http://www.math.yorku.ca/SCS/vcd/ordplot.html             *
  *                                                                   *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:   9 May 1993 11:30:54                                    *
  * Revised:   9 Nov 2000 11:26:00                                    *
  * Version:  1.3                                                     *
  *  1.1  Plot y * &count so label will not be required               *
  *  1.2  Wtd LS line in red, default LEGCLR='RED'                    *
  *  1.3   Fixed validvarname for V7+                                 *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/

 /* Description:
 
 The ORDPLOT macro constructs a plot whose slope and intercept can
 diagnose the form of a discrete frequency distribution.  This is
 a plot of k n(k) / n(k-1) against k, where k is the basic count
 and n(k) is the frequency of occurrence of k. The macro displays
 both a weighted and unweighted least squares line and uses the
 slope and intercept of the weighted line to determine the form
 of the distribution. Rough estimates of the parameters of the
 distribution are also computed from the slope and intercept.

Usage:

 The ORDPLOT macro is called with keyword parameters. The COUNT=
 and FREQ= variables are required.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        data horskick;
                input deaths corpsyrs;
                label deaths='Number of Deaths'
                        corpsyrs='Number of Corps-Years';
        cards;
                        0    109
                        1     65
                        2     22
                        3      3
                        4      1
        ;
        %ordplot(count=Deaths, freq=corpsyrs);
 
Parameters:

* DATA=       Name of the input data set. [Default: DATA=_LAST_]

* COUNT=      The name of the basic count variable.

* FREQ=       The name of the variable giving the number of occurrences
              of COUNT.

* LABEL=      Label for the horizontal (COUNT=) variable.  If not 
specified
              the variable label for the COUNT= variable in the input
                                  data set is used.

* LEGLOC=     X,Y location for interpretive legend [Default: LEGLOC=3 
88]

* LEGCLR=     legend color [Default: LEGCLR=RED]

* OUT=        The name of the output data set. [Default: OUT=ORDPLOT]

* NAME=       Name of the graphics catalog entry. [Default: 
NAME=ORDPLOT]

 */

%macro ordplot(
     data=_last_,       /* input data set                           */
     count=,            /* basic count variable                     */
     freq=,             /* number of occurrences of count           */
     label=,            /* Horizontal (count) label                 */
          legloc=3 88,       /* x,y location for interpretive legend     
*/
          legclr=red,        /* legend color                             
*/
          out=ordplot,       /* The name of the output data set          
*/
     name=ordplot
     );
 
        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=V6;
                %end;
        %else %do;
           options nonotes;
                %end;

%let abort=0;
%if %length(&count)=0 | %length(&freq)=0
   %then %do;
      %put ERROR: The COUNT= and FREQ= variables must be specified;
      %let abort=1;
      %goto DONE;
   %end;

%*if &label=%str() %then %let label=&count;
proc means data=&data noprint;
   var &count;
   weight &freq;
   output out=sum sumwgt=N sum=sum mean=mean min=min max=max;
*proc print data=sum;

data &out;
   set &data;
   if _n_=1 then set sum(drop=_type_ _freq_);
   k = &count;
   nk= &freq;                  * n(k);
   nk1 = lag(&freq);           * n(k-1);
   y =  k * nk / nk1;
   if nk > 1
      then wk = sqrt(nk-1);            * weight for regression line;
      else wk = 0;

proc print data=ordplot;
   id &count;
   var nk nk1 wk y;
   sum nk;
 
proc reg data=&out outest=parms;
   weight wk;
   model y =  k;
%local lx ly;
%let lx=%scan(&legloc,1);
%let ly=%scan(&legloc,2);

data stats;
   set parms (keep=k intercep);
   set sum   (keep=mean min max);
   drop k intercep;
   length text type parm $30 function $8;
   xsys='1'; ysys='1';
   x=&lx;   y=&ly;
   function = 'LABEL';
*   size = 1.4;
   color = "&legclr";
   position='3'; text ='slope =   '||put(k,f6.3);      output;
   position='6'; text ='intercept='||put(intercep,f6.3); output;
 
   *-- Determine type of distribution;
   select;
      when (abs(k) < .1) do;
         type = 'Poisson';
         parm = 'lambda = '||put(intercep,6.3);
         end;
      when (k < -.1) do;
         type = 'Binomial';
         p = (k/(k-1));
         parm = 'p = '||put(p,6.3);
         end;
      otherwise do;  * positive slope;
         if intercep >-.05 then do;
            type = 'Negative binomial';
            parm = 'p ='||put(1-k,6.3);
            end;
         else do;
            type = 'Logarithmic series';
            parm = 'theta ='||put(k,6.3);
            end;
         end;
      end;
   y = &ly - 7;
   position='3'; text ='type: ' ||type;   output;
   position='6'; text ='parm: ' ||parm;   output;
 
   *-- Draw (weighted) regression line;
   xsys='2'; ysys='2';  size=3; color='red';
   x=min; y = intercep + k * x; function='MOVE'; output;
   x=max; y = intercep + k * x; function='DRAW'; output;

proc gplot data=&out;
   plot y * &count / anno=stats vaxis=axis1 haxis=axis2 vm=1
      name="&name"
      des="Ord plot of &count";
   symbol v=- h=2 i=rl l=34 c=black;
   axis1 label=(a=90 r=0
          'Frequency Ratio, (k n(k) / n(k-1))' )
         offset=(3);
   axis2 offset=(3) minor=none
        %if %length(&label) %then %do;
         label=("k  (&label)")
                        %end;
                        ;
   run;quit;
%done:
        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;
%mend;

/*-------------------------------------------------------------------*
  * Name:   robust.sas                                                *
  * Title:  M-estimation for robust models fitting via IRLS           *
  *   Doc:  http://www.math.yorku.ca/SCS/vcd/robust.html              *
  *                                                                   *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@YorkU.ca>         *
  * Created:   2 Dec 1996 11:34:22                                    *
  * Revised:  23 Jun 2000 10:33:51                                    *
  * Version:  1.2                                                     *
  *  1.2  Fixed some errors with LOGISTIC                             *
  *       Allow CLASS= with LOGISTIC for V8+                          *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 The ROBUST macro uses iteratively reweighted least squares to  
 fit linear models by M-estimation.  The weights are determined  
 by the BISQUARE, HUBER, LAV or OLS function.  The fitting procedure 
 can be PROC REG, GLM or LOGISTIC               

Usage:

 The ROBUST macro is called with keyword parameters.
 The RESPONSE= and MODEL= parameters are required.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %include data(icu);
        %robust(data=icu, response=died, model=age cancer uncons admit,
                proc=logistic, id=id, iter=3);
 
Parameters:

* DATA=       The name of the input data set [Default: DATA=_LAST_]

* RESPONSE=   The name of the response variable in the model

* MODEL=      The right-hand-side of the MODEL statement

* PROC=       The name of the estimation procedure to be used,
              one of REG, GLM, or LOGISTIC.  
                                  [Default: PROC=LOGISTIC]

* CLASS=      The names of any CLASS variables in the MODEL (for GLM 
              or LOGISTIC in V8+)

* ID=         The names of any observation ID variables.  These are
              simply copied to the OUT= data set.

* OUT=        The name of the output data set of observation 
statistics.
              [Default: OUT=RESIDS]

* OUTPARM=    The name of the output data set of parameter estimates
              on the final iteration.

* FUNCTION=   Weight function, one of HUBER, LAV (least absolute 
value),
              BISQUARE, or OLS. [Default: FUNCTION=BISQUARE]

* TUNE=       Tuning constant for BISQUARE or HUBER.  The weighting
              function is applied to the value _RESID_ / (&TUNE * MAD)
                                  where MAD is the median absolute value of the 
residuals.
              The default is TUNE=6 for the BISQUARE function, and 
                                  TUNE=2 for the HUBER function.

* ITER=       The maximum number of iterations [Default: ITER=10]

* CONVERGE=   The maximum change in observation weights for 
convergence.
              The value must have a leading 0. [Default: CONVERGE=0.05]

* PRINT=      Controls printing of intermediate and final results
              [Default: PRINT=NOPRINT].

 */

%macro robust(
                data=_LAST_, 
                response=,        /* response variable                         
*/
                model=,           /* RHS of model statement                    
*/
                proc=REG,         /* estimation procedure: GLM, REG, 
LOGISTIC  */ 
                class=,           /* class variables (GLM only)                
*/
                id=,              /* ID variables                              
*/
                out=resids,       /* output observations data set              
*/
                outparm=,         /* output parameters data set                
*/
                function=bisquare, /* weight function: BISQUARE, HUBER or 
LAV  */
                tune=,            /* tuning constant for bisquare/huber        
*/
                iter=,            /* max number of iterations                  
*/ 
                converge=0.05,  /* max change in weight for convergence.     
*/
                        /* NB: must have leading 0                   */
                print=noprint
                );

%let abort=0;
%let proc = %upcase(&proc);
%let doparm = %index(REG LOGISTIC,&proc) ;      %* Getting parameter 
estimates?;

%if %index(REG LOGISTIC,&proc)
        %then %let outparm = outest;
        %else %let outparm = outstat;


%let r=r;
%if &proc = GLM %then %let r=rstudent;
%if &proc = LOGISTIC %then %let r=resdev;

%if %length(&iter)=0 %then %do;
        %let iter=10;
        %if &proc = LOGISTIC %then %let iter=4;
        %end;
                
%let function = %upcase(&function);
%if &tune = %str() %then %do;
        %if &function = BISQUARE %then %let tune = 6;
        %else %let tune = 2;
%end;

%let print = %upcase(&print);

data resids;
        set &data;
        _weight_ = 1;
        lastwt = .;
                
%do it = 1 %to &iter;

        %let pr=noprint;
        %if &print = PRINT %then %let pr=;
        %else %if %index(&print,NOPRINT) %then %let pr=NOPRINT;
        %else %if %index(&print,&it)  %then %let pr=;
        
        %*-- Remove parmest data set from a prior run;
        %if &it=1 %then %do;
        proc datasets nolist nowarn;
                delete parmest;
        %end;
        
        %*-- Fit the model, using current weights;
        proc &proc data=resids %if &it > 1 %then (drop=_resid_ _fit_ 
_hat_);
                 &outparm=parms
                 &pr;
                weight _weight_;                        %*-- observation weights;
                %if %length(&class)>0 & (&proc=GLM or (&proc=LOGISTIC and 
&sysver>=8))
                  %then %do;
                        class &class;
                  %end;
                model &response = &model;
                output out=newres &r=_resid_ p=_fit_ h=_hat_;
                title3 "Iteration &it";
        run;
        %if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

        options nonotes;
        %*-- Find the median absolute residual;
        data resids;
                set newres;
                absres = abs(_resid_);
        
        %*-- Find median absolute deviation (MAD);
        proc univariate data=resids noprint;
                var absres;
                output out=sumry median=mad;
        
        %*-- Calculate new weights;
        data &out;
                set resids end=eof;
                drop w mad _maxdif_ absres lastwt;
                retain _maxdif_ 0;
                lastwt = _weight_;
                if _n_=1 then set sumry(keep=mad);

                if _resid_ ^= . then do;
                        %*-- scaled residual;
                        w = _resid_ / (&tune * mad);
                        %if &function = BISQUARE    %then %bisquare(w);
                        %else %if &function = HUBER %then %huber(w);
                        %else %if &function = LAV   %then %lav(w);
                        %else _weight_=1;        /* OLS */
                        
                        _maxdif_ = max(_maxdif_, abs(_weight_-lastwt));
                end;
                
                if eof then do;
                *       file print;
                        put "NOTE: iteration &it  " _maxdif_=;
                        call symput('maxdif',left(put(_maxdif_,6.4)));
                        end;    
        run;
        %*if &doparm %then %do;
        data parms;
                iter = &it;
                set parms;
                _maxdif_ = input("&maxdif", best.);
        proc append base=parmest new=parms;
        run;
        %*end;
                        
        %if &maxdif < &converge %then %goto fini;

%end;
%fini:;
        data parmest;
                set parmest;
        %if &doparm %then %do;
                drop _type_ 
                %if &proc=REG %then _model_ _depvar_  &response;
                ;
                title3 'Iteration history and parameter estimates';
        %end;
        %else %do;
                drop _name_ prob;
                if _type_='SS1' then delete;
                title3 'Iteration history and test statistics';
        %end;           
        proc print data=parmest;
                id iter;
                run;
        
        %if %length(&outparm)>0 %then %do;
        data &outparm;
                set parmest end=eof;
                drop iter _maxdif_;
                if eof then output;
        %end;

%if %index(&print,NONE)=0 %then %do;
proc print data=&out;
        %if &id ^= %str() | &class ^= %str() %then %do;
                id &class &id;
                var &response _fit_ _weight_ _resid_ _hat_ flag;
                title3 'Residuals, fitted values and weights';
                run;
        %end;
title3;
%end;
%done:
options notes;
%mend;

%macro bisquare(w);
        if abs(&w) < 1 
                then do; _weight_ = (1 - &w**2) **2;  flag=' '; end;
                else do; _weight_ = 0;                flag='*'; end;
%mend;

%macro huber(w);
        if abs(&w) < 1 
                then do; _weight_ = 1;               flag=' '; end;
                else do; _weight_ = 1/abs(&w);       flag='*'; end;
%mend;

%macro lav(w);
        _weight_ = 1/(absres +(absres=0));
%mend;

/*-------------------------------------------------------------------*
 *    Name: catplot.sas                                              *
 *   Title: Plot observed and predicted logits for logit models      *
 *          fit by PROC CATMOD.                                      *
 *     Doc: http://www.math.yorku.ca/SCS/vcd/catplot.html            *
 *                                                                   *
 *-------------------------------------------------------------------*
 *  Author:  Michael Friendly            <friendly@YorkU.CA>         *
 * Created:  9 May 1991 12:20:09         Copyright (c) 1992          *
 * Revised:  9 Nov 2000 11:37:13                                     *
 * Version:  1.4                                                     *
 *  1.4   Fixed validvarname for V7+                                 *
 * Requires: %gensym                                                 *
 *                                                                   *
 * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
 *-------------------------------------------------------------------*/
 /* Description:
 
 The CATPLOT macro is designed to plot observed and/or predicted
 values for logit models fit by the CATMOD procedure.  The macro uses
 the output data set produced with the OUT= option on the RESPONSE
 statement.  This data set normally contains both logit values
 (_TYPE_='FUNCTION') and probability values (_TYPE_='PROB').  Either
 set may be plotted, as specified by the TYPE= parameter.

 The horizontal variable may be character (XC=) or numeric (X=).
 A separate curve is drawn for each value of the CLASS= variable,
 connecting predicted values, with optional standard error bars,
 and separate plots are drawn for each value of the BYVAR= variable.
 
Usage:

 The catplot macro is called with keyword parameters. Either the
 X= or the XC= parameters are required. Use the CLASS= parameter
 to give multiple curves in each plot for the levels of the CLASS
 variable. Use the BYVAR= parameter to give multiple plots for the
 levels of the BYVAR variable. The arguments may be listed within
 parentheses in any order, separated by commas.  For example:

        proc catmod;
                direct husinc;
                response / out=logits;
                model labour = husinc children; 
        %catplot(data=logits, x=husinc, y=_pred_, class=labor, 
byvar=children);
 
Parameters:

* DATA=  The name of the SAS dataset to be plotted, which must be
                         an output data set from PROC CATMOD.  If DATA= is 
not
                         specified, the most recently created data set is 
used.

* X=             Name of a numeric factor variable to be used as the 
horizontal
                         variable in plots.  Use the XC= parameter to specify 
a
                         character variable.  You must specify either the X= 
or XC=
                         variable.

* XC=            Name of a character factor variable used as the horizontal
                         variable in plots.

* Y=             Name of the ordinate variable.  Y=_PRED_ plots the 
predicted
                         value; Y=_OBS_ plots the observed value.  The 
default is
                         Y=_OBS_, but the predicted values are also drawn, 
connected
                         by lines. [Default: Y=_OBS_]

* CLASS=  The name of a factor variable, used to define separate curves
                         which are plotted for each level of this variable.

* BYVAR=         Name of one or more factor variables to be used to define 
                         multiple panels in plots.  

* BYFMT=  Name of a SAS format used to format the value of BYVARs
                         for display in one panel of the plot(s). [Default: 
BYFMT=$16.]

* TYPE=   The type of observations to be plotted.  TYPE=FUNCTION (the
                         default) gives plots of the logit value; TYPE=PROB 
gives
                         plots of the probability value. [Default: 
TYPE=FUNCTION]

* Z=      Standard error multiple for confidence intervals around
                         predicted values, e.g., Z=1.96 gives 95% CI. To 
suppress error
                         bars, use Z=0.  The default is Z=1, giving 67% CI.

* CLFMT=         Name of a SAS format used to format the value the CLASS=
                         variable for display in each panel of the plot(s).

* CLSIDE=       Specifies whether the values of the CLASS= variable should
                         be labelled by annotation in the plot or by a 
legend.  If
                         CLSIDE=LEFT or CLSIDE=FIRST, CLASS= values are 
written at the
                         left side of each curve.  If CLSIDE=RIGHT or 
CLSIDE=LAST,
                         CLASS= values are written at the right side of each 
curve.
                         If CLSIDE=NONE, or if a LEGEND= legend is specified, 
the 
                         CLASS= values appear in the legend.  You should
                         then define a LEGEND statment and use the LEGEND= 
parameter.
                         [Default: CLSIDE=LAST]

* XFMT=  Name of a SAS format used to format the values of the 
horizontal
                         variable.
                        
* POSFMT=   Format to translate the value of the CLASS variable to a 
          SAS/GRAPH annotate position.  This will almost always be a
                         user-specified format created with PROC FORMAT.

* ANNO=  Name of an additional input annotate data set

* SYMBOLS=      List of SAS/GRAPH symbols for the levels of the CLASS= 
variable.  
                         The specified symbols are reused cyclically if the 
number of 
                         distinct values of the \texttt{CLASS=} variable 
exceeds the 
                         number of symbols. [Default: SYMBOLS=CIRCLE SQUARE 
TRIANGLE]

* COLORS=       List of SAS/GRAPH colors for the levels of the CLASS= 
variable.
          The specified colors are reused cyclically if the number of 
                         distinct values of the \texttt{CLASS=} variable 
exceeds the 
                         number of colors. [Default: COLORS=BLACK RED BLUE 
GREEN]

* LINES=         List of SAS/GRAPH line styles for the levels of the CLASS= 
          variable.  The specified line styles are reused cyclically if 
the 
                         number of distinct values of the \texttt{CLASS=} 
variable
                         exceeds the number of line styles. [Default: LINES=1 
20 41 21 7 14 33 12]

* VAXIS=         Axis statement for custom response axis, e.g., 
VAXIS=AXIS1.
          [Default: VAXIS=AXIS1]

* HAXIS=         Axis statement for custom horizontal axis, e.g., 
HAXIS=AXIS2
          [Default: HAXIS=AXIS2]

* LEGEND=       Legend statement for custom CLASS legend, e.g., 
LEGEND=LEGEND1

* PLOC=   For multiple plots (with a BYVAR), PLOC defines the X,Y 
position
          of the panel label, in graph percentage units. [Default: 
PLOC=5 95]

* PRINT=  Print summarized input data set? [Default: PRINT=NO]

* NAME=   Name of graphic catalog entry. [Default: NANME=CATPLOT]

 */
 
%macro catplot(
    data=_last_,  /* OUT= data set from PROC CATMOD                 */
    x=,           /* horizontal value for plot (NUMERIC)            */
    xc=,          /* horizontal value for plot (CHAR)               */
    y=_obs_,      /* ordinate for plotted points (_PRED_ or _OBS_)  */
         ylab=,        /* ordinate label                                 
*/
    class=,       /* variable for curves within each plot           */
    byvar=,       /* one plot for each level of by variable(s)      */
    byfmt=$16.,   /* format for by variable                         */
         type=FUNCTION,/* type of obs. plotted: FUNCTION or PROB         
*/
    z=1,          /* std. error multiple for confidence intervals   */
                  /* e.g., z=1.96 gives 95% CI. No error bars: z=0  */
    anno=,        /* additional input annotate data set             */
    clfmt=,       /* how to format values of class variable         */
    clside=last,  /* side for labels of class var (FIRST|LAST|NONE) */
    xfmt=,        /* format for X variable                          */
    posfmt=,      /* format to translate class var to position      */
    vaxis=axis1,  /* axis statement for logit axis                  */
    haxis=axis2,  /* axis statement for horizontal axis             */
    legend=,      /* legend statement for custom CLASS legend       */
    colors=BLACK RED BLUE GREEN,   /* colors for class levels       */
         symbols=circle square triangle,  /* symbols for class levels    
*/
    lines=1 20 41 21 7 14 33 12,     /* line styles for class levels */
         ploc=5 95,  /* location of panel variable label               */
         print=NO,     /* print summarized input data set?               
*/
         name=catplot
    );
 
        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=V6;
                %end;
        %else %do;
           options nonotes;
                %end;

%let type=%upcase(&type);
%let print=%upcase(&print);
%let legend=%upcase(&legend);
%let clside=%upcase(&clside);
%if &clside=LEFT %then %let clside=FIRST;
%if &clside=RIGHT %then %let clside=LAST;

%let abort=0;

%if &x ^= %str() %then %do;
   %let px = &x;
        %let ax = x;
   %end;
%else %do;
   %if &xc = %str() %then %do;
       %put CATPLOT: Either X= or XC= variable must be specified;
                 %let abort=1;
       %goto DONE;
       %end;
   %let px = &xc;
        %let ax = xc;
   %end;
 
%*-- Find the last by-variable;
%if %length(&byvar) > 0 %then %do;
   %let _byvars=;
   %let _bylast=;
   %let n=1;
   %let token=%qupcase(%qscan(&byvar,&n,%str( )));

   %do %while(&token^=);
      %if %index(&token,-) %then
         %put WARNING: Abbreviated BY list &token.  Specify by= 
individually.;
      %else %do;
         %let token=%unquote(&token);
         %let _byvars=&_byvars &token;
         %let _bylast=&token;
      %end;
      %let n=%eval(&n+1);
      %let token=%qupcase(%scan(&byvar,&n,%str( )));
   %end;
%let nby = %eval(&n-1);
%if %index(&byfmt,%str(.))=0 %then %let byfmt = &byfmt..;
%end;  /* %if &byvar */

 %*-- Select logit (_type_='FUNCTION'), or probability (_type_='PROB') 
obs. ;
/*
data _pred_;
   set &data;
   drop  _type_ ;
   if _type_="&type";
        %if &type=PROB %then %do;
        label _obs_ = 'Observed probability'
                _pred_ = "Predicted probability';
                %end;
        %else %do;
        label _obs_ = 'Observed logit'
                _pred_ = "Predicted logit';
                %end;
*/
 %*-- Average over any other factors not given in &byvar or &class;
proc summary data=&data nway;
   class &byvar &class &px;
   var _pred_ _obs_ _seobs_ _sepred_ _resid_;
        where (_type_="&type");
   output out=_pred_(drop=_type_) mean=;

proc sort;
   by &byvar &class &px;

%if %substr(&print,1,1)=Y %then %do;
proc print data=_pred_;
   id &byvar &class &px;
   var _obs_ _seobs_ _pred_ _sepred_ _resid_;
   format _obs_ _pred_ 8.3 _seobs_ _sepred_ _resid_ 8.4;
%end;

proc contents data=&data out=_work_ noprint;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

data _null_;
   set _work_(keep=name type format);
        %if %length(&clfmt)=0 %then %do;
        if upcase(name) = upcase("&class") then do;
                if format=' ' then do;
                        if type = 2
                                then format='$16.';
                                else format='best.';
                        end;
                if index(format,'.')=0 then format=trim(format)||'.';
                call symput('clfmt', format);
                put name= format=;
                end;
        %end;
 
%let plx = %scan(&ploc,1);
%let ply = %scan(&ploc,2);
%if %length(&posfmt) 
        %then %if %index(&posfmt,%str(.))=0 %then %let posfmt = 
&posfmt..;
data _anno_;
   set _pred_;
   by &byvar &class;
   length function color $8 text $100;
   retain cl 0;
   drop _seobs_ _sepred_ _resid_ cl;
   %if &byvar ^= %str() %then %do;
                %*-- Label for byvar(s) in this plot;
      goptions hby=0;
      if first.&_bylast then do;
         xsys='1'; ysys='1';
                        x = &plx; y=&ply; 
                        position='6';
                        %if &nby=1 %then %do;
                                text = put(&byvar,&byfmt);
                                %end;
                        %else %do;
                                text=' ';
                                %do i=1 %to &nby;
                                        text = trim(text) || %scan(&byvar, &i) || 
' ';
                                        %end;
                                %end;
         function = 'LABEL'; output;
         end;
      if first.&_bylast then cl=0;
      %end;
 
   xsys = '2'; ysys='2';
   %*-- Set X or XC variable ;
        &ax = &px;
 
        *-- Index for line/color;
        %if &class = %str()
                %then %do; cl=1; %end;
        %else %do; if first.&class then cl+1; %end;
   line=input(scan("&lines", cl),5.);
   color = scan("&colors",cl);

        %if (&clside=FIRST or &clside=LAST) & %length(&legend)=0 %then 
%do;
                if &clside..&class then do;
                        y=_pred_;
                        %if %length(&clfmt)
                                %then %str(text = put(&class,&clfmt););
                                %else %str(text = trim(left(&class)););
                        *-- Use a null char to move label a bit;
                        %if %upcase(&clside) = LAST %then %do;
                                position = '6';  text = '00'x || ' ' || text;
                                %end;
                        %else %do;
                                position='4';  text = trim(text) || '00'x;
                                %end;
                        %if &posfmt ^= %str() %then %do;
                        position = put(&class,&posfmt);
                        %end;
                        function = 'LABEL'; output;
                        end;
                %end;

        %if &class = %str()
                %then %do; if _n_=1 then do; %end;
        %else %do; if first.&class then do; %end;
      y = _pred_; function='MOVE'; output;
      end;
   else do;
      y = _pred_; function='DRAW'; output;
      end;
 
    %if &z > 0 %then %do;
    %*-- plot value +- &z * std error;
    line = 33;
    y = _pred_ + &z * _sepred_ ; function='MOVE'; output;
    y = _pred_ ;                 function='DRAW'; output;
    y = _pred_ - &z * _sepred_ ; function='DRAW'; output;
    y = _pred_ ;                 function='MOVE'; output;
    %end;
 
%if &anno ^= %str() %then %do;
   data _anno_;
      set _anno_ &anno;
   %end;
*proc print data=_anno_;
 
%if &class = %str()
        %then %do;
                %let sym = 1;
                symbol1 i=none v=%scan(&symbols,1) h=1.8 
c=%scan(&colors,1);
        %end;
        %else %do;
                %let sym = &class;
                %if %length(&symbols) %then %do;
                *-- How many levels of class variable? --;
                proc freq data = _pred_;
                        tables &class / noprint out=_levels_;
                data _null_;
                        set _levels_(obs=1) nobs=ngroups;
                        call symput( 'NGROUPS', put(ngroups,3.) );
                        run;
                %gensym(n=&ngroups, interp=none, symbols=&symbols, 
colors=&colors);
                %end;
        %end;

%if %length(&legend) %then %let legend=legend=&legend;
%else %if &legend=NONE | &clside ^= NONE %then %let legend=nolegend;


proc gplot data=_pred_;
   plot &y * &px = &sym
        / anno=_anno_ frame
                         &legend
          haxis=&haxis hminor=0
          vaxis=&vaxis vminor=1 name="&name"
                         des="catplot of &data";
   %if &byvar ^= %str() %then %do;
      by &byvar;
      %end;
   %if &xfmt ^= %str() %then %do;
      format &px &xfmt;
      %end;
   %if &ylab ^= %str() %then %do;
      label &y="&ylab";
      %end;
run; quit;

%done:
%if &abort %then %put ERROR: The CATPLOT macro ended abnormally.;

   goptions hby=;
        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;
%mend catplot;

/*-------------------------------------------------------------------*
  *    Name: gensym.sas                                               *
  *   Title: Macro to generate SYMBOL statement for each GROUP        *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/gensym.html             *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 05 Jan 1999 12:55                                        *
  * Revised:  8 Jun 2000 15:43:36                                     *
  * Version: 1.1                                                      *
  * 1.1 - Added FONT= for those special symbol fonts (only one)       *
  *       Added START= for first SYMBOL stmt.                         *
  *       Fixed bug with large N=                                     *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 The GENSYM macro generates a series of SYMBOL statements for multiple
 group plots of the form
 
        proc gplot;
                plot y * x = group;

 Separate plot symbols, colors, line styles and interpolation options
 may be generated for each group.

Usage:

 The GENSYM macro is called with keyword parameters.  All parameters
 have default values, but the N= parameter must usually be specified.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %gensym(n=4);
 
 The INTERP=, LINE=, SYMBOLS=, and COLORS= parameters are each lists
 of one or more values. If fewer than N (blank delimited) values are
 given,  the available values are reused cyclically as needed.

Parameters:

* N=           The number of symbol statements constructed,
               named SYMBOL&start, ..., SYMBOL&N+&start.

* START=       Number of the first symbol statement. [Default: START=1]

* H=           The height of the plotting symbol. The same H= value is
               used for all SYMBOL statements. [Default: H=1.5]

* INTERP=      List of one or more interpolation options. 
               [Default: INTERP=NONE]

* LINE=        List of one or more numbers in the range 1..46 giving
               SAS/GRAPH line styles [Default: LINE=1]

* SYMBOLS=     A list of one or more names of SAS/GRAPH plotting 
symbols.
               [Default: SYMBOLS=%STR(SQUARE TRIANGLE : $ = X _ Y)]

* COLORS=      A list of one or more names of SAS/GRAPH colors.
               [Default: COLORS=BLACK RED GREEN BLUE BROWN YELLOW 
ORANGE PURPLE]

* FONT=        Font used for the symbols (same for all SYMBOL 
statements)
                
Example:

 To plot the four combinations of age group (old, young) and sex, with
 separate plotting symbols (circle, dot) for old vs. young, and
 separate colors (red, blue) for females vs. males, use the macro as
 follows:
 
        proc gplot;
                plot y * x = agesex;
                %gensym(n=4, symbols=circle circle dot dot, colors=red 
blue,
                        interp=rl);

 This generates the following symbol statements:
 
        symbol1 v=circle h=1.5 i=rl c=red;
        symbol2 v=circle h=1.5 i=rl c=blue;
        symbol3 v=dot    h=1.5 i=rl c=red;
        symbol4 v=dot    h=1.5 i=rl c=blue;
 */

%macro gensym(
      n=1,
                start=1, 
                h=1.5,
                interp=none,
                line=1,
                symbols=%str(square triangle : $ = X _ Y),
                colors=BLACK RED GREEN BLUE BROWN YELLOW ORANGE PURPLE,
                font=,
                );

    %*--  if more than 8 groups symbols and colors are recycled;
  %local chr col int lin k;
  %do k=&start %to &n ;
     %if %length(%scan(&symbols, &k, %str( ))) = 0 
              %then %let symbols = &symbols &symbols;
     %if %length(%scan(&colors, &k, %str( ))) = 0 
              %then %let colors = &colors &colors;
     %if %length(%scan(&interp, &k, %str( ))) = 0 
              %then %let interp = &interp &interp;
     %if %length(%scan(&line,   &k, %str( ))) = 0 
              %then %let line = &line &line;
     %let chr =%scan(&symbols, &k,%str( ));
     %let col =%scan(&colors, &k, %str( ));
     %let int =%scan(&interp, &k, %str( ));
     %let lin =%scan(&line,   &k, %str( ));
     symbol&k
          %if %length(&font) %then font=&font;
          height=&h value=&chr color=&col i=&int l=&lin;
%*put symbol&k h=&h v=&chr c=&col i=&int l=&lin;

  %end;
%mend gensym;


/*------------------------------------------------------------------*
  *    Name: label.sas                                               *
  *   Title: Create an Annotate dataset to label observations        *
  *      in a scatterplot                                            *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/label.html             *
  *------------------------------------------------------------------*
  *  Author:  Michael Friendly         <friendly@yorku.ca>           *
  * Created:  15 May 1991 12:27:15                                   *
  * Revised:  28 Aug 2000 12:56:56                                   *
  * Version:  1.6                                                    *
  *  - Added BY= parameter                                           *
  *  - Added POS=- to position up/down wrt mean y                    *
  *  - Added copy= parameter; fixed bug with subset & pos            *
  *  - Added angle= and rotate=, IN= (then fixed keep= bug)          *
  *                                                                  *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)   *         
  *------------------------------------------------------------------*/
 /* Description:
 
 The LABEL macro creates an Annotate data set used to label
 observations in a 2D (PROC GPLOT) or 3D (PROC G3D) scatterplot.
 The points which are labeled may be selected by an arbitrary
 logical expression from those in the input dataset. The macro
 offers flexible ways to position the text label relative to either
 the data point or the center of the plot. The resulting Annotate
 data set would then be used with the ANNO= option of PROC GPLOT
 or PROC G3D.


Usage:

 Values must be supplied for the X=, Y= and TEXT= parameters.  For a 
PROC
 G3D plot, supply a value for the Z= parameter as well.
 The label macro is called with keyword parameters.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %label(x=age, y=response, text=name);
 
Parameters:

* DATA=       The name of the input data set [Default: DATA=_LAST_]

* X=          The name of the X variable for the scatterplot

* Y=          The name of the Y variable for the scatterplot

* Z=          The name of the Z variable for a 3D scatterplot

* BY=         The name(s) of any BY variable(s) to be used for multiple
              plots.

* XOFF=       An X-offset for the text label. You may specify a 
              numeric constant
              (XOFF=-1) in data units, or the name of a variable in the
                                  input data set. Positive values move the 
label
              to the right relative to the point; negative values move 
it
                                  to the left.


* YOFF=       A Y-offset for the text label. Positive values move the 
label
              towards larger Y values.


* ZOFF=       A Z-offset for the text label, for a 3D plot.

* TEXT=       The text used to label each point. TEXT= may be
                   specified as a variable in the data set or a SAS
                                  expression involving dataset variables (e.g.,
                                  TEXT=SCAN(MODEL,1)) and/or string 
                                  constants.  If you supply an expression, use 
the 
                                  C<%str()> macro function,  e.g.,
                                  C<TEXT=%str(trim(name || '-' || place))> to 
protect special
                                  characters.

* LEN=        Length of the TEXT variable [Default: LEN=16]

* POS=        Specifies the position of the label relative to the
              data point.       The POS= value can be a character 
constant 
                                  (one of the characters in 
"123456789ABCDEF<+>", as used 
                                  by the Annotate POSITION variable), an 
expression involving 
                                  dataset variables which evaluates to one of 
these characters
                                  (e.g., POS=SCAN('9 1 3', _NUMBER_)) or one of 
the special
                                  characters, "/", "|", or "-".  The special 
position values 
                                  cause the point label to be out-justified 
(moved outward 
                                  toward the edges of the plot relative to the 
data point.)
                                  by comparing the coordinates of the pointto 
the mean of 
                                  X and Y (/), or to the mean of X only (|), or 
to the
                                  mean of Y only (-).
        
* SYS=        Specifies the Annotate XSYS & YSYS value [Default: SYS=2]

* COLOR=      Label color (the name of a dataset character variable or 
a
              string constant enclosed in quotes. [Default: 
COLOR='BLACK']

* SIZE=       The size of label (in whatever units are given by the
              GUNIT goption). There is no 
              default, which means that the labels inherit the 
              global HTEXT setting.

* FONT=       The name of the font used for the label.  There is no 
              default, which means that the labels inherit the 
              global FTEXT setting.

* ANGLE=      Baseline angle for label.

* ROTATE=     Character rotate for label

* SUBSET=     An expression (which may involve any dataset variables) 
to
              select points.  A point will be labeled if the expression 
                                  evaluates to non-zero for the current 
observation.
              [Default: SUBSET=1]

* COPY=       The names of any variables to be copied to output dataset

* IN=         The name of an optional input annotate data set.  If
              specified, the IN= data set is concatenated with the
                                  OUT= data set.

* OUT=        The name of the annotate data set produced.  [Default: 
OUT=_LABEL_]
                
Example:

This example plots Weight against Price for American cars in the Auto 
data,
labeling the most expensive cars.

        %label(data=auto, x=price, y=weight,
                                color='red', size=1.2,
                                subset=origin='A' and price>10000,
                                pos=1, text=scan(model,1));
        
        proc gplot data=auto(where=(origin='A'));
                plot weight * price / frame anno=_label_;
                symbol1 v='+'  i=none color=black h=1.5;

 */

%macro label(
   data=_LAST_,
   x=,             /* X variable for scatterplot       */
   y=,             /* Y variable for scatterplot       */
   z=,             /* Z variable for G3D (optional)    */
   by=,            /* BY variable(s) (mult plots)      */
   xoff=0,         /* X-offset for label (constant     */
   yoff=0,         /* Y-offset for label    or         */
   zoff=0,         /* Z-offset for label  variable)    */
   text=,          /* text variable or expression      */
   len=16,         /* length of text variable          */
   pos=,           /* position for label (/=out-just)  */
   sys=2,          /* XSYS & YSYS value                */
   color='BLACK',  /* label color (quote if const)     */
   size=,          /* size of label                    */
   font=,          /* font for label                   */
   angle=,         /* baseline angle for label         */
   rotate=,        /* character rotate for label       */
   subset=1,       /* expression to select points      */
   copy=,          /* vars copied to output dataset    */
   in=,            /* input annotate data set          */
   out=_label_     /* annotate data set produced       */
        );

options nonotes; 
%* -- pos can be a constant, an expression, or / or -;
%*    if a character constant, put "" around it;
%if %index(|/-,&pos) %then
   %do;  %*-- Out-justify wrt means of x,y;
      proc summary data=&data;
         var &x &y;
         output out=_means_ mean=mx my;
   %end;
%else %if "&pos" ^= "" %then %do;
   %if %verify(&pos,%str(123456789ABCDEF<+>)) = 0
       %then %let pos="&pos" ;
   %end;
%else %let pos = "5";

%if %length(&by) %then %do;
   proc sort data=&data;
   by &by;
   %end;
run;

options notes;
data &out;
   set &data;
   %if %length(&by) %then %do;
      by &by;
      %end;
   keep x y xsys ysys position function
      %if %length(&size) %then  size ;
      %if %length(&angle) %then angle ;
      %if %length(&rotate) %then rotate ;
        color text &by &copy;
   length function color $8 text $ &len position $1;
   xsys = "&sys"; ysys = "&sys"; function='LABEL';
   x = &x + &xoff ;
   y = &y + &yoff ;
   %if &z ^= %str() %then %do;
          retain zsys "&sys"; keep z zsys;
          z = &z + &zoff;
       %end;
   %if "&text" ^= ""
      %then %do; text=&text;  %end;
      %else %do; text=left(put(_n_,5.)); %end;
   %if %length(&size)   %then %str(size=&size;);
   %if %length(&angle)  %then %str(angle=&angle;);
   %if %length(&rotate) %then %str(rotatee=&rotate;);
   color=&color;
   %if &font ^= %str() %then %do;
      keep style;
      style = "&font";
      %end;
   %if "&pos" = "/" %then
      %do;
         retain mx my;
         if _n_=1 then set _means_(keep=mx my);
         if x > mx then
            if y > my then position = '3';
                      else position = '9';
            else
            if y > my then position = '1';
                      else position = '7';
      %end;
   %else %if "&pos" = "-" %then
      %do;
         retain mx my;
         if _n_=1 then set _means_(keep=mx my);
         if y > my then position = '2';
                   else position = '8';
      %end;
   %else %if "&pos" = "|" %then
      %do;
         retain mx my;
         if _n_=1 then set _means_(keep=mx my);
         if x > mx then position = '6';
                   else position = '4';
      %end;
      /* if pos has more than one character, use them cyclically */
   %else %if %qsubstr(&pos,1,1) eq %str(%") %then 
      %str(position=substr(&pos,1+mod(_n_,length(&pos)),1););
   %else %str(position = &pos;);
   if (&subset);
run;

%if %length(&in) %then %do;
   data &out;
      set &in &out;
      %if %length(&by) %then %do;
         by &by;
         %end;
   %end;

%mend label;

/*-------------------------------------------------------------------*
  *    Name: panels.sas                                               *
  *   Title: Macro to display a set of plots in rectangular panels    *
  *           using PROC GREPLAY.                                     *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/panels.html             *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *         
  * Created:  1 Mar 1994 13:16:36                                     *         
  * Revised:  28 May 2000 10:36:45                                    *         
  * Version:  1.6                                                     *
  *  - EQUATE default changed to Y                                    *
  *  - Added ORDER=BYROWS support                                     *         
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/

/* Description:

 The PANELS macro constructs a template in which to replay a series of
 graphs, assumed all the same size, in a rectangular array of R
 rows and C columns. By default, the panels are displayed
 left-to-right across rows,  starting either from the top
 (ORDER=DOWN) or bottom (ORDER=UP).  If the number of rows and
 columns are unequal, the aspect ratio of individual panels can
 be maintained by setting equate=Y.  It is assumed that all the
 plots have already been created, and stored in a graphics catalog
 (the default, WORK.GSEG is used automatically by SAS/GRAPH
 procedures).

 For interactive use within the SAS Session Manager you should be
 aware that all plots are stored cumulatively in the graphics
 catalog throughout your session, unless explicitly changed with
 the GOUT= option in graphics procedures or macros  To create multiple
 panelled plots you can use the FIRST= and LAST= parameters or a 
REPLAY=
 list to specify which plots are used in a given call.
   
Usage:

 Call the PANELS macro after the steps which create the graphs in the
 graphics catalog.  The GDISPLA macro may be used to suppress the
 display of the original full-sized graphs.
 The ROWS= and COLS= parameters must be specified.
 
   goptions hsize=7in vsize=5in;
        %gdispla(OFF);
        proc gplot data=mydata;
                plot y * x = group;
                by sex;
        
        %gdispla(ON);
        %panels(rows=1, cols=2);
        
Parameters:
   
* ROWS=
* COLS=
   The ROWS= and COLS= arguments are required, and specify the
   size of the array of plots to be displayed. These are the only
   required arguments.
   
* PLOTS=
   If there are fewer than &ROWS*&COLS plots, specify the number
   as the PLOTS= argument. Optionally, there can be an additional
   plot, which is displayed (as a GSLIDE title, for example) in
   the top nn% of the display, as specified by the TOP= argument.
   
* TOP=
   If TOP=nn is specified, the top nn% of the display is reserved
        for one additional panel (of width 100%), to serve as the plot
        title or annotation.

* ORDER=
   The ORDER= argument specifies the order of the panels in the
   REPLAY= list, when REPLAY= is not specified. 
        Typically, the panels are displayed across the columns.
        ORDER=UP means that the panels in the bottom row are
        are drawn first, and numbered 1, 2, ..., &COLs. ORDER=DOWN means
        that the panels in the top row are drawn first, numbered 1, 2, 
..., 
        &COLs.  If you add the keyword BYROWS to ORDER=, the panels are
        displayed up or down the rows.  For example, when ROWS=3, COLS=5,
        ORDER=DOWN BYROWS generates the REPLAY= list as,

       replay=1:1  2:4  3:7  4:10  5:13
              6:2  7:5  8:8  9:11 10:14
             11:3 12:6 13:9 14:12 15:15
   
* EQUATE=
        The EQUATE= argument determines if the size of the panels is 
adjusted
        so that the aspect ratio of the plots is preserved.  If EQUATE=Y,
        the size of each plot is adjusted to the maximum of &ROWS and 
&COLS.
        This is usually desired, as long as the graphic options HSIZE and
        VSIZE are the same when the plots are replayed in the panels
        template as when they were originally generated.  The default is
        EQUATE=Y.
        
* REPLAY=
   The REPLAY= argument specifies the list of plots to be replayed
   in the constructed template, in one of the forms used with the
   PROC GREPLAY REPLAY statement, for example, REPLAY=1:1 2:3 3:2
   4:4 or REPLAY=1:plot1 2:plot3 3:plot2 4:plot4
   
* TEMPLATE=
   The name of the template constructed to display the plots. The
        default is TEMPLATE=PANEL&ROWS.&COLS.

* TC=
   The name of the template catalog used to store the template.
        You may use a two-part SAS data set name to save the template
        permanently.

* FIRST=
* LAST=
   By default, the REPLAY= argument is constructed to replay plot
   i in panel i.  If the REPLAY= argument is not specified, you can
   override this default assignment by specifying FIRST= the sequential
   number of the first graph in the graphics catalog to plot (default:
   first=1), where:                       
                >0 means absolute number of first graph,          
                <1 means number of first graph relative to last (i.e. 0 
means 
                 last graph only, -1 means first is one before last, etc.) 
                                            
   last=0          Number of last graph to plot                        
                >0 means absolute number of last graph,           
                <1 means number of last graph relative to number  
                of graphs in the catalog (i.e. 0 means last graph 
                in the catalog, -1 means one before last, etc.)

* GIN=
        GIN specifies the name of the input graphics catalog, from which 
the
        plots to be replayed are taken. The default is GIN=WORK.GSEG.

* GOUT=
        GOUT= specifies the name of the graphics catalog in which the 
panelled
        plot is stored. The default is GOUT=WORK.GSEG.
 */

%macro panels(
        rows=,                              /* number of rows of plots                
*/
        cols=,                              /* number of columns of plots             
*/ 
        plots=&rows * &cols,  /* total number of plots                  
*/
        top=0,                              /* percent of display for top 
panel       */ 
        order=UP,             /* start at bottom (UP) or top (DOWN)?    
*/
        replay=,                                    /* List of plots to replay                
*/
        equate=Y,                           /* Adjust sizes to maintain aspect 
ratio? */ 
        template=panel&rows.&cols,   /* name of template                
*/
        tc=panels,                          /* name of template catalog         
                         */
        first=1,              /* number of the first catalog entry used 
*/
        last=0,               /* number of the last catalog entry used  
*/
        gin=gseg,
        gout=gseg);

%local i j panl panl1 lx ly tx ty sx sy;
%*put PANELS: plots= &plots;

%let order=%upcase(&order);
%let equate=%substr(%upcase(&equate),1,1);

%let abort=0;
%let showgout=0;
%if &rows=%str() | &cols=%str()
        %then %do;
                %put ERROR: The ROWS= and COLS= parameters must be 
specified;
      %let abort=1;
                %goto DONE;
        %end;

%if %length(&top)=0 %then %let top=0;
        
  /* determine how many graphs are in the catalog */
%let npics=0;
  proc catalog catalog=&gin et=grseg;
        contents out=_cont_;
        run;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;
  data _null_; 
   set _cont_ end=last; 
        if (last) then call symput('npics',trim(left(put(_n_,5.)))); 
        run;
%put PANELS: The graphics catalog &gin contains &npics graphic entries;
%if &npics=0 %then %goto DONE;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

  %if (&last<1) %then %do;
    %if (%eval(&npics+&last)>=1) %then %let last = %eval(&npics + 
&last);
    %else %do;
      %put NOTE (panels): last=&last too low - setting last=1.;
      %let last = 1;
    %end;
  %end;
  %else %if (&last>&npics) %then %do;
    %put WARNING: last=&last too high - setting last=&npics.;
    %let last = &npics;
  %end;

  %if (&first>&last) %then %do;
        %put WARNING: first=&first too high - no plots done.;
        %goto done;
        %end;

  %if (&first<1) %then %do;
      %if (%eval(&last+&first)>=1) %then %let first = %eval(&last + 
&first);
      %else %do;
        %put WARNING: first=&first too low - setting first=1.;
        %let first = 1;
                  %let showgout=1;
      %end;
  %end;

    %put PANELS: plotting entries &first to &last .... to catalog 
&gout;
  

%*-- If the replay list is not given, construct it, ordering the
        plots across each row, but starting at either the bottom row
        (order=UP) or the top row (order=DOWN);
        
%if &replay=%str() %then %do;
        %if %index(&order,BYROWS)=0 %then %do; 
                %do i = 1 %to &plots;
                        %let j = %eval(&first+&i-1);
                        %let replay = &replay &i:&j ;
                        %end;
                %end;
        %else %do;  /* order= ... BYROWS */
                data _null_;
                        array r(&rows, &cols) $8 _temporary_;
                        length replay $ 200;
                
                        plot = &first;  
                        do j=1 to &cols;
                                do i=1 to &rows;
                                        r[i,j] = left(put(plot,2.));
                                        plot+1;
                                        end;
                                end;
                
                        do i=1 to &rows;
                                do j=1 to &cols;
                                        panel+1;
                                        r[i,j] = trim(left(put(panel,2.))) || ':' 
|| trim(r[i,j]);
                                        replay = trim(replay)  || ' ' || r[i,j];
                                        end;
                                end;
                        call symput('replay', replay);
                run;
                %end;
        %if &top>0 
                %then %let replay = &replay 
%eval(&plots+1):%eval(&first+&plots+1);
        %put PANELS: replay=&replay;
        %end;
        
%*-- Calculate panel size and starting location;
data _null_;
   %if &top>0
       %then %do;  ty=100;      %end;
       %else %do;  ty=100-&top; %end;
   tx=100;
        %if &equate=N %then %do;
                hsize = tx/&cols;
                vsize = ty/&rows;
                sx = 0; sy = 0;
        %end;
        %else %do;
                np = max(&rows,&cols);
                hsize = round(tx/np,.01);
                vsize = round(ty/np,.01);
                sx = round((np - &cols) * hsize/2,.01);
                sy = round((np - &rows) * vsize/2,.01);
        %end;
        lx = round(hsize + sx,.01);
        ly = round(vsize + sy,.01);
*       put hsize= vsize= tx= ty= sx= sy= lx= ly=;

        %if %index(&order,DOWN) %then %do;
                ly = round(ty - sy,.01);
                sy = ly - vsize;
                vsize = -vsize;
        %end;                   

        put 'PANELS: ' hsize= vsize= tx= ty= sx= sy= lx= ly=;
   call symput('hsize', put(hsize,6.2));
   call symput('vsize', put(vsize,6.2));
   call symput('lx', put(lx,6.2));
   call symput('ly', put(ly,6.2));
   call symput('tx', put(tx,6.2));
   call symput('ty', put(ty,6.2));
   call symput('sx', put(sx,6.2));
   call symput('sy', put(sy,6.2));
run;

proc greplay igout=&gin
              gout=&gout  nofs
             template=&template
             tc=&tc ;

%* ---------------------------------------------------------------;
%* Generate a TDEF statement for a plot matrix                    ;
%* Start with (1,1) panel in lower left, and copy it across & up  ;
%* ---------------------------------------------------------------;
   TDEF &template DES="panels template &rows x &cols"
   %let panl=0;
   %* let lx = &hsize;
   %* let ly = &vsize;
   %do i = 1 %to &rows;
   %do j = 1 %to &cols;
                 %if &panl > %eval(&plots) %then %goto fini;
       %let panl  = %eval(&panl + 1);
       %if &j=1 %then
          %do;
             %if &i=1 %then %do;      %* (1,1) panel;
               &panl/
                ULX=&sx  ULY=&ly     URX=&lx  URY=&ly
                LLX=&sx  LLY=&sy     LRX=&lx  LRY=&sy
                %end;
             %else
                %do;                  %* (i,1) panel;
                   %let panl1 = %eval(&panl - &cols );
               &panl/ copy= &panl1 xlatey= &vsize
                %end;
          %end;
       %else
          %do;
               %let panl1 = %eval(&panl - 1);
               &panl/ copy= &panl1 xlatex= &hsize
          %end;
   %end;
   %end;
%fini:
   %if &top>0 %then %do;
       %let panl = %eval(&panl + 1);
         &panl/
          ULX=0  ULY=100     URX=&tx URY=100
          LLX=0  LLY=&ty     LRX=&tx LRY=&ty
   %end;
     %str(;);      %* end the TDEF statement;
   %if &replay ^= %str() %then %do;
      TREPLAY &replay;
                LIST template;
   %end;
        %if &showgout %then %str(LIST IGOUT;);
run;    quit;

%DONE:
%if &abort %then %put ERROR: The PANELS macro ended abnormally.;
options notes;
%mend;

/*-------------------------------------------------------------------*
  *    Name: rootgram.sas                                             *
  *   Title: Hanging rootograms for discrete distributions            *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/rootgram.html           *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 23 Dec 97 16:28                                          *
  * Revised: 17 Nov 2000 11:40:07                                     *
  * Version: 1.0                                                      *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 The ROOTGRAM macro produces histograms, rootograms, and hanging
 rootograms for the distribution of a discrete variable compared
 with expected frequencies according to a theoretical distribution.

Usage:

 The VAR= and  OBS= variables  must be specified.  The expected
 frequencies may be obtained with the GOODFIT macro.
 
 The ROOTGRAM macro is called with keyword parameters.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %include catdata(madison);
        %goodfit(data=madison, var=count, freq=blocks, dist=poisson);
        %rootgram(data=fit, var=count, obs=blocks);
 
Parameters:

* DATA=       Specifies the name of the input data set [Default: 
DATA=_LAST_]

* VAR=        Specifies the name of the analysis variable, used as the
              abscissa in the plot.

* OBS=        Specifies the observed frequency variable

* EXP=        Expected/fitted frequency [Default: EXP=EXP]

* FUNC=       Function applied to ordinate [Default: FUNC=SQRT]

* BWIDTH=     Bar width [Default: BWIDTH=.5]

* BCOLOR=     Bar color [Default: BCOLOR=GRAYB0]

* BTYPE=      Bar type: One of HANG, DEV, or NEEDLE. [Default: 
BTYPE=HANG]

* ANNO=       The name of an input annotate dataset

* NAME=       Name of the graphics catalog entry

 */

%macro rootgram(
        data=_last_,   /* input dataset                */
        var=,          /* Analysis variable            */
        obs=,          /* observed frequency           */
        exp=exp,       /* expected/fitted frequency    */
        func=Sqrt,     /* function applied to ordinate */
        bwidth=.5,     /* bar width                    */
        bcolor=grayb0, /* bar color                    */
        btype=hang,    /* bar type: HANG, DEV, NEEDLE  */
        anno=,         /* input annotate dataset       */
        name=rootgram  /* graphics catalog entry name  */
        );

%let btype=%upcase(&btype);
data roots;
        set &data;
        
        %if %upcase(&func)^=NONE %then %do;
                &obs = &func(&obs+.000001);
                &exp = &func(&exp+.000001);
                label &exp = "&func(frequency)";
        %end;
        %else %do;      
                label &exp = "Frequency";
        %end;

data bars;
        set roots(keep=&var &obs &exp) end=eof;
        xsys='2'; ysys='2';
        retain min 0 max 0;
        drop inc;
        style = 'solid  ';
        color = "&bcolor";

        %*-- top of bar;
        x = &var - &bwidth/2;
        %if &btype=HANG %then %do;
                y = &exp;
                %end;
        %else %if &btype=DEV %then %do;
                y = &exp - &obs ;
                min = min(y,min);
                max = max(&exp,max);
                %end; 
        %else %do;
                y = &obs;
                %end; 
        function = 'move    '; output;
        max = max(y,max);

        %*-- bottom of bar;
        x = &var + &bwidth/2;
        %if &btype=HANG %then %do;
                y = &exp - &obs;
                %end;
        %else %do;
                y = 0;
                %end; 
        function = 'bar     '; output;
        min = min(y,min);

        if eof then do;
                drop pow nice ut best;
                inc= abs(max - min)/6;
                pow = 10**floor( log10(inc) );
                nice=1000;
                do in = 1, 2, 2.5, 4, 5;
                        ut = in * pow;
                        if abs(inc-ut) < nice then do;
                                nice = abs(inc-ut);
                                best = ut;
                        end;
                end;
                inc=best;
                min = inc * floor(min/inc);
                max = inc * ceil (max/inc);
                put min= max= inc=;
                call symput('max', left(put(max,3.1)));
                call symput('min', left(put(min,3.1)));
                call symput('inc', left(put(inc,3.1)));
                end;
        run;
%put min=&min max=&max inc=&inc;

%if %length(&anno) %then %do;
data bars;
        set bars &anno;
%end;
        
proc gplot data=roots;
        plot &exp * &var / vaxis=axis1 haxis=axis2
                anno=bars hminor=0 vminor=1 vref=0 lvref=7;
        symbol i=spline v=dot c=red h=1.5;
        axis1 label=(a=90) order=(&min to &max by &inc);
        axis2 offset=(5,5);
        run; quit;
%done:
goptions reset=symbol;
%mend;

/*-------------------------------------------------------------------*
  *    Name: corresp.sas                                              *
  *   Title: Correspondence analysis of contingency tables            *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/corresp.html            *
  *                                                                   *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:  19 Jan 1990 15:23:09                                    *
  * Revised:   9 Nov 2000 11:45:28                                    *
  * Version:  1.8                                                     *
  * 1.2  Added dim parameter and colors                               *
  * 1.3  Added graphics GPLOT plot, annotation controls               *
  * 1.5  Uses PROC CORRESP rather than IML , equate axes              *
  *      Now handles MCA, stacked analysis, and other options.        *
  * 1.6  Added 3D plotting (but cant equate axes or control rotation  *
  *      tilt, etc.)                                                  *
  * 1.7  Revised syntax to be more compatible with PROC CORRESP       *
  *      Added Version 8 MCA warning to use BENZECRI/GREENACRE        *
  * 1.8  Fixed validvarname for V7+                                   *
  *                                                                   *
  * Requires: %label, %equate                                         *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
/* Description:
 
 The CORRESP macro carries out simple correspondence analysis of a
 two-way contingency table, and various extensions (stacked analysis,
 MCA) for a multiway table, as in the CORRESP procedures.  It also
 produces labeled plots of the category points in either 2 or 3
 dimensions, with a variety of graphic options, and the facility to
 equate the axes automatically. 

 The macro takes input in one of two forms:

(a) A data set in contingency table form, where the columns are
 separate variables and the rows are separate observations (identified
 by a row ID variable).  That is, the input data set contains R
 observations, and C variables (whose values are cell frequencies)
 for an R x C table.  For this form, specify:

                ID=ROWVAR, VAR=C1 C2 C3 C4 C5

(b) A contingency table in frequency form (e.g., the output from PROC 
FREQ),
 or raw data, where there is one variable for each factor.  In 
frequency
 form, there will be one observation for each cell.
 For this form, specify:

                TABLES=A B C

 Include the WEIGHT= parameter when the observations are in frequency
 form.
         
Usage:

 The CORRESP macro is called with keyword parameters.  Either the
 VAR= parameter or the TABLES= parameter (but not both) must be
 specified, but other parameters or OPTIONS may be needed to carry
 out the analysis you want.  The arguments may be listed within
 parentheses in any order, separated by commas.  For example:
 
        %corresp(var=response, id=sex year);
 
 The plot may be re-drawn or customized using the output OUT=
 data set of coordinates and the ANNO= Annotate data set.

 The graphical representation of CA plots requires that the axes in the
 plot are equated, so that equal distances on the ordinate and abscissa
 represent equal data units (to perserve distances and angles in the 
plot).
 A '+', whose vertical and horizontal lengths should be equal,
 is drawn at the origin to indicate whether this has been achieved.

 If you do not specifiy the HAXIS= and YAXIS= parameters, the EQUATE
 macro is called to generate the AXIS statements to equate the
 axes.  In this case the INC=, XEXTRA=, and YEXTRA=, parameters
 may be used to control the details of the generated AXIS statements.

 By default, the macro produces and plots a two-dimensional solution.
 
Parameters:

* DATA=         Specifies the name of the input data set to be 
analyzed.
            [Default: DATA=_LAST_]

* VAR=          Specifies the names of the column variables for 
simple
            CA, when the data are in contingency table form.
                                Not used for MCA (use the TABLES= parameter 
instead).

* ID=           Specifies the name(s) of the row variable(s) for simple
            CA.  Not used for MCA.

* TABLES=   Specifies the names of the factor variables used to create
            the rows and columns of the contingency table.  For a 
simple
            CA or stacked analysis, use a ',' or '/' to separate the
                                the row and column variables.

* WEIGHT=   Specifies the name of the frequency (WEIGHT) variable when
            the data set is in frequency form.  If WEIGHT= is omitted,
            the observations in the input data set are not weighted.

* SUP=      Specifies the name(s) of any variables treated as 
supplementary.
            The categories of these variables are included in the 
output,
            but not otherwise used in the computations.  
            These must be included among the variables in the VAR= or
                                TABLES= option.  

* DIM=      Specifies the number of dimensions of the CA/MCA solution.
            Only two dimensions are plotted by the PPLOT option,
                                however. [Default: DIM=2]

* OPTIONS=  Specifies options for PROC CORRESP.  Include MCA for an
            MCA analysis, CROSS=ROW|COL|BOTH for stacked analysis of
                                multiway tables, PROFILE=BOTH|ROW|COLUMN for 
various
                                coordinate scalings, etc.  [Default: 
OPTIONS=SHORT]

* OUT=          Specifies the name of the output data set of 
coordinates.
            [Default: OUT=COORD]

* ANNO=         Specifies the name of the annotate data set of labels
            produced by the macro.  [Default: ANNO=LABEL]

* PPLOT=    Produce a printer plot? [Default: PPLOT=NO]

* GPLOT=    Produce a graphics plot? [Default: GPLOT=YES]

* PLOTREQ=  The dimensions to be plotted [Default: PLOTREQ=DIM2*DIM1
            when DIM=2, PLOTREQ=DIM2*DIM1=DIM3 when DIM=3]

* HTEXT=    Height for row/col labels.  If not specified, the global
            HTEXT goption is used.  Otherwise, specify one or two 
numbers
                                to be used as the height for row and column 
labels.
                                The HTEXT= option overrides the separate ROWHT= 
and COLHT=
                                parameters (maintained for backward 
compatibility).

* ROWHT=    Height for row labels

* COLHT=    Height for col labels

* COLORS=   Colors for row and column points, labels, and 
interpolations.
            In an MCA analysis, only one color is used.
            [Default: COLORS=BLUE RED]

* POS=      Positions for row/col labels relative to the points.
            In addition to the standard Annotate position values, the
                                CORRESP macro also understands the special 
characters "/", 
                                "|", or "-".  [Default: POS=5 5]

* SYMBOLS=  Symbols for row and column points, as in a SYMBOL 
statement.
            [Default: SYMBOLS=NONE NONE]

* INTERP=   Interpolation options for row/column points. In addition to 
the
            standard interpolation options provided by the SYMBOL 
statement,
                                the CORRESP macro also understands the option 
VEC to mean
                                a vector from the origin to the row or column 
point.
                           The option JOIN may be useful for an ordered 
factor, and 
                                the option NEEDLE may be useful to focus on the 
positions 
                                of the row/column points on the horizontal 
variable.
                                [Default: INTERP=NONE NONE, INTERP=VEC for MCA]

* HAXIS=    AXIS statement for horizontal axis.  If both HAXIS= and
            VAXIS= are omitted, the program calls the EQUATE macro to
                                define suitable axis statements.  This creates 
the axis
                                statements AXIS98 and AXIS99, whether or not a 
graph
                                is produced.

* VAXIS=    The name of an AXIS statement for the vertical axis.

* VTOH=     The vertical to horizontal aspect ratio (height of one
            character divided by the width of one character) of the
                                printer device, used to equate axes for a 
printer plot,
                                when PPLOT=YES.  [Default: VTOH=2]

* INC=      The length of X and Y axis tick increments, in data units
            (for the EQUATE macro).  Ignored
            if HAXIS= and VAXIS= are specified. [Default: INC=0.1 0.1]

* XEXTRA=   # of extra X axis tick marks at the left and right.  Use to
            allow extra space for labels. [Default: XEXTRA=0 0]

* YEXTRA=   # of extra Y axis tick marks at the bottom and top.
            [Default: YEXTRA=0 0]

* M0=       Length of origin marker, in data units. [Default: M0=0.05]

* DIMLAB=   Prefix for dimension labels [Default: DIMLAB=Dimension]

* NAME=     Name of the graphics catalog entry [Default: NAME=Corresp]        

Dependencies:
 
 The CORRESP macro calls several other macros not included here.
 It is assumed these are stored in an autocall library.  If not,
 you'll have to %include them in your SAS session or batch program.
 
 LABEL macro - label points 
 EQUATE macro - equate axes
 
 These are all available by ftp://hotspur.psych.yorku.ca/pub/sas
 (though in different subdirectories).

Bugs:

 Using SUP= variables messes up the assignment of symbols and colors
 to the row and column coordinates.
 
*/

%macro CORRESP(
        data=_LAST_,        /* Name of input data set                */
        var=,               /* Column variable(s)                    */
        tables=,            /* TABLES statement variables            */
        id=,                /* Row variable or row labels            */
        weight=,            /* Frequency variable (obs. weight)      */
        count=,             /* Frequency variable (obs. weight)      */
        sup=,               /* Supplementary variable(s)             */
        dim=2,              /* Number of CA dimensions               */
        options=short,      /* options for PROC CORRESP              */       
        out=COORD,          /* output data set for coordinates       */
        anno=LABEL,         /* name of annotate data set for labels  */
        pplot=NO,           /* Produce printer plot?                 */
        gplot=YES,          /* Produce graphics plot?                */
        plotreq=,           /* dimensions to be plotted              */
        htext=,             /* height for row/col labels             */
        rowht=,             /* height for row labels                 */
        colht=,             /* height for col labels                 */
        colors=BLUE RED,    /* Colors for rows and cols              */
        pos=5 5,            /* positions for row/col labels          */
        symbols=none none,  /* symbols for row and column points     */
        interp=,            /* interpolations for row/column points  */
        haxis=,             /* AXIS statement for horizontal axis    */
        vaxis=,             /* and vertical axis- use to equate axes */
        vtoh=2,             /* PPLOT cell aspect ratio               */
        inc=0.1 0.1,        /* x, y axis tick increments             */
        xextra=0 0,         /* # of extra x axis tick marks          */
        yextra=0 0,         /* # of extra y axis tick marks          */
        m0=0.05,            /* Length of origin marker               */
        dimlab=,            /* Dimension label                       */
        name=corresp        /* Name for graphics catalog entry       */
     ); 

        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=V6;
                %end;
        %else %do;
           options nonotes;
                %end;


%let abort=0;
%*-- Check for required parameters;
%if %length(&var)=0 and %length(&tables)=0
   %then %do;
      %put ERROR: Either the VAR= or TABLES= parameter must be 
specified;
      %let abort=1;
      %goto DONE;
   %end;

%*-- Use weight and count as synonyms;
%if %length(&count)=0 and %length(&weight)>0
        %then %let count=&weight;
        
%*-- Set defaults which depend on other options;
%if %length(&plotreq)=0 %then %do;
        %if &dim=2 %then %let plotreq =  dim2 * dim1;
        %if &dim=3 %then %let plotreq =  dim2 * dim1 = dim3;
        %else %let plotreq =  dim2 * dim1;
        %end;

%if %length(&dimlab)=0 %then %do;
        %if &dim=2 %then %let dimlab = Dimension;
        %if &dim=3 %then %let dimlab = Dim;
        %end;

%if %length(&interp)=0 %then %do;
        %if %index(&options, MCA) %then %let interp=vec;
        %else %let interp=none none;
        %end;

%if %index(&options, MCA) & &sysver>7 %then %do;
        %if %index(&options, BENZECRI)=0 & %index(&options, GREENACRE)=0 
%then %do;
                %put WARNING: For MCA in Version &sasver, you should use 
the BENZECRI or GREENACRE options.;
                %end;
        %end;

%*-- Make character options case-insensitive;
%let pplot=%upcase(&pplot);
%let gplot=%upcase(&gplot);
%let interp=%upcase(&interp);
%let options=%upcase(&options);
options nonotes;

%if %length(&tables) %then %do;
        %let i=%index(&tables,/);               %*-- allow '/' rather than 
',' in tables;
        %if &i>0 %then %do;
   data _null_;
      length tables $ 200;
      tables = "&tables";
      tables = translate(tables, ',', '/');
      call symput('tables', trim(tables));
   run;
                %end;
                
        proc corresp data=&data outc=&out dimens=&dim &options;
                %if %length(&count) %then %str(weight &count;);
                tables &tables;
                %if %length(&sup) %then %str(sup &sup;);
        %end;
%else %do;
        proc corresp data=&data outc=&out dimens=&dim &options;
                %if %length(&count) %then %str(weight &count;);
                var &var;
                id &id;
                %if %length(&sup) %then %str(sup &sup;);
        %end;

%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

%if %index(&options,MCA) %then %do;
        %*-- Find number of table variables;
        data _null_;
                set &data;
                array tab{*} &tables;
                if _n_=1 then do;
                        nv = dim(tab);
                        call symput('nv', left(put(nv,2.)));
                        end;
        run; 
        %end;

 /*-----------------------------------------------*
  |  Add % inertia to DIM labels (fix for MCA)    |
  *-----------------------------------------------*/
data &out;
   set &out nobs=nobs;
   drop i percent label;
   length label $30;
        array dimen{*} dim1-dim&dim;
        array contr{*} contr1-contr&dim;
        
        if nobs=0 then do;     *-- Check for empty output data set;
                call symput('abort', '1');
                stop;
                end;

        %if %length(&id)>0 %then %do;
                rename &id = _name_;
                %end;

        if _type_='INERTIA' then do;
                do i=1 to &dim;
                        %if %index(&options,MCA) %then %do;
                                %*-- Benzecri formula for %inertia in MCA;
                                dimen{i} = (&nv/(&nv-1)) * (contr{i}-(1/&nv));
                                %end;
                        percent = ' (' ||compress(put((100*contr{i} / 
inertia), 5.1)) || '%)';
                        label = "&dimlab" || ' ' || put(i,1.) 
                        %if %index(&options,MCA) = 0
                                %then %str(||  trim(percent));
                                ;
                        call symput('p'||trim(left(put(i,2.))), label);
                        end;
                end;
run;
%if &abort %then %goto DONE;

data &out;
        set &out;
   label                                                                        
   %do i=1 %to &dim;                                                            
       dim&i = "&&p&i"                                                   
       %end;                                                                    
   ;
                                                                       
proc print data=&out;
   id _type_ _name_;
        var dim1-dim&dim quality;

%*-- Plot increments;
%let n1= %scan(&inc,1,%str( ));                                                      
%let n2= %scan(&inc,2,%str( ));
%if &n2=%str() %then %let n2=&n1;

%*-- Find dimensions to be ploted;
%let ya = %scan(&plotreq,1,%str(* ));
%let xa = %scan(&plotreq,2,%str(* ));
%let za = %scan(&plotreq,3,%str(=* ));
%*put Plotting &ya * &xa = &za;
        

%if &pplot = YES %then %do;
        %put WARNING: Printer plots may not equate axes (using 
VTOH=&vtoh);                                                     
   %if &sysver < 6.08
       %then %do;
          %put WARNING: CORRESP cannot label points adequately using
               PROC PLOT in SAS &sysver - use SAS 6.08 or later;
          %let symbol = %str( = _name_ );
          %let place =;
       %end;
       %else %do;
           %let symbol = $ _name_ = '*';
           %let place = placement=((h=2 -2 : s=right left)
                                   (v=1 -1 * h=0 -1 to -3 by alt)) ;
       %end;
 
        proc plot data=&out vtoh=&vtoh;
                where (_type_ ^= 'INERTIA');
                plot &ya * &xa &symbol
                        / haxis =by &n1 vaxis=by &n2 &place box;
%end;

 /*--------------------------------------------------*
  | Annotate row and column labels                   |
  *--------------------------------------------------*/
        %*-- Assign colors / positions;
        %let c1= %scan(&colors,1);                                                      
        %let c2= %scan(&colors,2);
        %if &c2=%str() %then %let c2=&c1;
        %let p1 = %scan(&pos,1,%str( ));
        %let p2 = %scan(&pos,2,%str( ));
        %if "&p2"="" %then %let p2=&p1;

        %if %length(&htext)>0 %then %do;
                %let rowht = %scan(&htext,1,%str( ));
                %let colht = %scan(&htext,2,%str( ));
                %if &colht=%str() %then %let colht=&rowht;
                %end;

        %*-- Assign symbols and interpolations;
        %let s1= %scan(&symbols,1);                                                      
        %let s2= %scan(&symbols,2);
        %if &s2=%str() %then %let s2=&s1;
        %let i1= %upcase(%scan(&interp,1));                                                      
        %let i2= %upcase(%scan(&interp,2));
        %if &i2=%str() %then %let i2=&i1;

data _lab_;
        set &out(keep=_type_ _name_ dim1-dim&dim);
        where (_type_ ^= 'INERTIA');

        %label(data=_lab_, x=&xa, y=&ya, z=&za, text=_name_, size=&rowht,
      color="&c1", subset=_type_='OBS', pos=&p1, out=_lab1_, len=16,
                copy=_type_);
        %label(data=_lab_, x=&xa, y=&ya, z=&za, text=_name_, size=&colht,
      color="&c2", subset=_type_='VAR', pos=&p2, out=_lab2_, len=16,
                copy=_type_);
        %if %length(&sup) %then %do;
        %label(data=_lab_, x=&xa, y=&ya, z=&za, text=_name_, size=&colht,
      color='black', subset=_type_=:'SUP', pos=&p2, out=_lab3_, len=16,
                copy=_type_);
                %end;
        
        options nonotes;

 /*--------------------------------------------------*
  | Handle vector interpolation                      |
  *--------------------------------------------------*/
                                               
%if &i1=VEC or &i2=VEC %then %do;
data _vector_;
        set &out(keep=_type_ _name_ dim1-dim&dim);
        where (_type_ ^= 'INERTIA');
        drop dim1-dim&dim;
        retain xsys ysys '2';
        %if &dim=3 %then %do; retain zsys '2'; %end;             
        %if &i1=VEC %then %do;
      color="&c1";
                if _type_ = 'OBS' then link vec;
                %end;
        %if &i2=VEC %then %do;
      color="&c2";
                if _type_ = 'VAR' then link vec;
                %end;
                return;
vec:                    /* Draw line from the origin to point */
      x = 0; y = 0;
                %if &dim=3 %then %do; z=0; %end;             
      function='MOVE'    ; output;                                              
      x = &xa; y = &ya;                
                %if &dim=3 %then %do; z=&za; %end;             
      function='DRAW'    ; output;                                              
                return;
%end;

 /*--------------------------------------------------*
  | Mark the origin                                  |
  *--------------------------------------------------*/
%if &m0 > 0 %then %do;
data _zero_;
        xsys='2';  ysys='2';
        %if &dim=3 %then %do; zsys='2'; z=0; %end;             
        x = -&m0;  y=0;   function='move'; output;
        x =  &m0;         function='draw'; output;
        x = 0;  y = -&m0; function='move'; output;
                y =  &m0; function='draw'; output;
%end;

 /*--------------------------------------------------*
  | Concatenate anotate data sets                    |
  *--------------------------------------------------*/
data &anno;
        set _lab1_ _lab2_ 
        %if %length(&sup) %then %str(_lab3_);
        %if &m0 > 0 %then _zero_ ;
        %if &i1=VEC or &i2=VEC %then _vector_;
        ;
%if &i1=VEC %then %let i1=none;
%if &i2=VEC %then %let i2=none;

%if %length(&vaxis)=0 and %length(&haxis)=0 %then %do;
        %let x1= %scan(&xextra,1);                                                      
        %let x2= %scan(&xextra,2);
        %if &x2=%str() %then %let x2=&x1;
        %let y1= %scan(&yextra,1);                                                      
        %let y2= %scan(&yextra,2);
        %if &y2=%str() %then %let y2=&y1;

        %equate(data=&out, x=&xa, y=&ya, plot=no,
                vaxis=axis98, haxis=axis99, xinc=&n1, yinc=&n2,
                xmextra=&x1, xpextra=&x2, ymextra=&y1, ypextra=&y2);
        %let vaxis=axis98;
        %let haxis=axis99;
        options nonotes;
        %end;
%else %do;
%if %length(&vaxis)=0 %then %do;
        %let vaxis=axis98;
        %put WARNING:  You should use an AXISn statement and specify 
VAXIS=AXISn to equate axis units and length;
   axis98 label=(a=90);
        %end;
%if %length(&haxis)=0 %then %do;
        %let haxis=axis99;
        %put WARNING:  You should use an AXISm statement and specify 
HAXIS=AXISm to equate axis units and length;
   axis99 offset=(2);
        %end;
%end;

symbol1 v=&s1 i=&i1 l=33 c=&c1;
symbol2 v=&s2 i=&i2 l=20 c=&c2;
symbol3 v=none;
%if &gplot = YES %then %do;
        %if &dim=2 or %length(&za)=0 %then %do;
        proc gplot data=&out  ;
                where (_type_ ^= 'INERTIA');
                plot &ya * &xa = _type_
                                / anno=&anno frame nolegend
                                        vaxis=&vaxis haxis=&haxis vminor=1 
hminor=1
                                        name="&name"
                                        des="CORRESP plot of &data";
        run;quit;
        %end;

        %else %if &dim=3 %then %do;
        %put WARNING: 3D plots do not equate axes.  Try GOPTIONS HSIZE 
and VSIZE.;                                                     
        proc g3d data=&out  ;
                where (_type_ ^= 'INERTIA');
                plot &ya * &xa = &za
                                / anno=&anno  
                                        xticknum=2 yticknum=2 zticknum=2 grid
                                        name="&name"
                                        des="3D CORRESP plot of &data";
        run;quit;
        %end;

%end;   /* %if &gplot = YES */

 /*------------------------------------*
  | Clean up datasets no longer needed |
  *------------------------------------*/
proc datasets nofs nolist library=work memtype=(data);
    delete _lab1_ _lab2_
        %if %length(&sup) %then _lab3_;
         %if &i1=VEC or &i2=VEC %then _vector_;
         ;
         run; quit;

%done:
%if &abort %then %put ERROR: The CORRESP macro ended abnormally.;
        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;
%mend;

/*-------------------------------------------------------------------*
  *    Name: goodfit.sas                                              *
  *   Title: Goodness of fit tests for discrete distributions         *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/goodfit.html            *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 09 Dec 1997 13:29                                        *
  * Revised: 9 Nov 2000 12:01:48                                     *
  * Version: 1.3                                                      *
  * 1.2  Corrected error in DF calculation with SUMAT= given          *
  * 1.3  Fixed validvarname for V7+                                   *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:

 The GOODFIT macro carries out Chi-square goodness-of-fit tests for
 discrete distributions.  These include the uniform, binomial,
 Poisson, negative binomial, geometric, and logarithmic series
 distributions, as well as any discrete (multinomial) distribution
 whose probabilities you can specify.  Both the Pearson chi-square
 and likelihood-ratio chi-square are computed.

 The data may consist either of individual observations on a single
 variable, or a grouped frequency distribution.  

 The parameter(s) of the distribution may be specified as constants
 or may be estimated from the data.

Usage:
 
 The GOODFIT macro is called with keyword parameters.
 The arguments may be listed within parentheses in any order, separated 
by commas. For example: 
 
   %goodfit(var=k, freq=freq, dist=binomial);

 You must specify a VAR= analysis variable and the keyword for the
 distribution to be fit with the DIST= parameter.  All other parameters
 are optional.

Parameters:

* DATA= Specifies the name of the input data set to be analyzed.

* VAR=  Specifies the name of the variable to be analyzed, the 
basic
                        count variable.

* FREQ= Specifies the name of a frequency variable for a grouped 
data
                        set.  If no FREQ= variable is specified, the program 
assumes
                        the data set is ungrouped, and calculates frequencies 
using
                        PROC FREQ.  In this case you can specify a SAS format 
with the
                        FORMAT= parameter to control the way the observations 
are
                        grouped.

* DIST=  Specifies the name of the discrete distribution to be fit.
                        The allowable values are:
                        UNIFORM, DISCRETE, BINOMIAL, POISSON, NEGBIN, 
GEOMETRIC,
                        LOGSERIES.

* PARM= Specifies the value of parameter(s) for the distribution 
being
                        fit.  If PARM= is not specified, the parameter(s) are 
estimated
                        using maximum likelihood or method of moment 
estimators.

* SUMAT=        For a distribution where frequencies for values of the VAR=
                        variable >= k have been lumped into a single 
category, specify
                        SUMAT=k causes the macro to sum the probabilities and 
fitted
                        frequencies for all values >=k.

* FORMAT= The name of a SAS format used when no FREQ= variable has been
                        specified.

* OUT=  Name of the output data set containing the grouped 
frequency
                        distribution, estimated fitted frequencies (EXP) and 
the values
                        of the Pearson (CHI) and deviance (DEV) residuals.

* OUTSTAT= Name of the output data set containing goodness-of-fit
                        statistics.

Bugs:

See also:

* ROOTGRAM

 */
 

%macro goodfit(
        data=_last_,       /* name of the input data set             */
        var=,              /* analysis variable                      */
        freq=,             /* frequency variable                     */
        dist=,             /* distribution to be fit                 */
        parm=,             /* required distribution parameters       */
        sumat=100000,
        format=,           /* format for ungrouped analysis variable */
        out=fit,           /* output fit data set                    */
        outstat=stats);    /* output statistics data set             */


        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=V6;
                %end;
        %else %do;
           options nonotes;
                %end;

%let usedata=&data;
%let dist=%upcase(&dist);
%let abort=0;

%if &var=%str() | &dist=%str()
   %then %do;
      %put ERROR: The VAR= and DIST= parameters must be specified.;
      %let abort=1;
      %goto DONE;
   %end;

%if %length(%scan(&var,2))
   %then %do;
      %put ERROR: Only one VAR= variable is allowed.;
      %let abort=1;
      %goto DONE;
   %end;

%if %index(UNIFORM DISCRETE BINOMIAL POISSON NEGBIN GEOMETRIC 
LOGSERIES,&dist)=0
   %then %do;
      %put ERROR: The DIST=&DIST is not a recognized distribution.;
      %let abort=1;
      %goto DONE;
   %end;

%*-- Assume individual observations if no freq was given;
%if %length(&freq)=0 %then %do;
        proc freq data=&data;
                tables &var / noprint out=_counts_;
                %if %length(&format) %then %do;
                        %if %index(&format,.)=0 %then %let format = 
&format..;
                        format &var &format;
                        %end;
                run;
        
        %let usedata=_counts_;
        %let freq=count;
%end;

%*-- Find total frequency, number of cells, mean, var;
proc summary data=&usedata vardef=weight;
        var &var;
        weight &freq;
        output out=_total_ sum=_total_ sumwgt=n mean=_mean_ max=_max_ 
var=_var_;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

%*-- Find number of cells summed, if any;
data _total_;
        set &usedata end=eof nobs=_cells_;
        if _n_=1 then set _total_(drop=_type_);
        keep _total_ n _mean_ _max_ _var_ _sumd_;
        retain _sumd_ 0;
        
        if _cells_ > &sumat then do;
                if &var > &sumat then _sumd_+1;
                end;
        if eof then output;
        
*proc print;

%*-- Determine if any parameters were passed;
%if %length(&parm) > 0
        %then; %let nparm = %words(&parm);
%*put nparm= &nparm;
%let pname=;
%let eparm=;

data &out;
        set &usedata end=eof nobs=_cells_;
        if _n_=1 then set _total_;
        drop _max_ _total_ n _mean_ _sumd_;

        %if &nparm>0 %then %do;
                %*-- Store parameters in an array;
                array _xp_  _xp1-_xp&nparm ( &parm );
                drop _xp1-_xp&nparm;
        %end;
        
        %if &dist=UNIFORM %then %do;
                df = (_cells_ - _sumd_) - 1;
                if &var < &sumat
                        then phat = 1 / _cells_;
                        else phat = (_cells_ - &sumat + 1)/_cells_;
                %end;           

        %else %if &dist=DISCRETE %then %do;
                %let pname=cell proportions;
                %if &nparm=0 %then %do;
            phat = &freq * &var /n;
                        %end;
        %else %do;
                %*-- Parameters are proportional to cell probabilities; 
                if _n_=1 then do;
                        drop _tot_;
                        _tot_ = sum(of _xp1-_xp&nparm);
                        do _i_=1 to &nparm;
                                _xp_[_i_] = _xp_[_i_] / _tot_;
                                end;
                        end;
                        
                phat = _xp_[_n_];
        %end;
                df = (_cells_ - _sumd_) - 1;
                %end;

        %else %if &dist=POISSON %then %do;
                %let pname=lambda;
                %if &nparm=0 %then %do;
                        df = (_cells_ - _sumd_) - 2;
                        parm = _mean_;
                        call symput('eparm', left(put(parm, 7.4)));
                        %end;
                %else %do;
                        parm = _xp_[1];
                        df = (_cells_ - _sumd_) - 1;
                        %end;
                if &var < &sumat
         then phat = exp(-parm) * parm**&var / gamma(&var+1);
                        else phat = 1 - poisson(parm, &var-1);
                %end; 

        %else %if &dist=BINOMIAL %then %do;
                %let pname=p;
                %if &nparm=0 %then %do;
                        p = _mean_ / _max_;
                        call symput('eparm', left(put(p, 7.4)));
                        df = (_cells_ - _sumd_) - 2;
                        %end;
                %else  /* %if &nparm>0 %then */ %do;
                        p = _xp_[1];
                        df = (_cells_ - _sumd_) - 1;
                        %end;
                if &var=0
                        then phat = probbnml(p, _max_, &var);
                        else if &var < &sumat
                 then phat = probbnml(p, _max_, &var) -
                                     probbnml(p, _max_, &var-1);
            else phat =  1 - probbnml(p, _max_, &var-1);
                %end;

        %else %if &dist=NEGBIN %then %do;
                %let pname=n, p;
                %if &nparm=0 %then %do;
                        df = (_cells_ - _sumd_) - 3;
                p = _mean_ / _var_;
                parm = _mean_**2 / (_var_ - _mean_);
                        call symput('eparm', trim(left(put(parm, 7.4))) || ', 
' || left(put(p, 7.4)));
                %end;
        %else %do;              *-- parameters are: n, p;
                        parm = _xp_[1];
                        p =    _xp_[2];
                        df = (_cells_ - _sumd_) - 1;
           %end;
                if &var < &sumat then
                        phat = (gamma(parm+&var)/(gamma(&var+1)*gamma(parm))) 
*
               (p**parm)*(1-p)**&var;
                else do v=&var by 1 until (term < .00001); 
                        term = (gamma(parm+v)/(gamma(v+1)*gamma(parm))) *
               (p**parm)*(1-p)**v;
                        phat = sum(phat, term);
                        end;
                drop term v;
                %end;

        %else %if &dist=GEOMETRIC %then %do;    **-- INCOMPLETE --;
                %let pname=p;
                %if &nparm=0 %then %do;
                        df = (_cells_ - _sumd_) - 2;
         parm = 1/(_mean_);
                        call symput('eparm', left(put(parm, 7.4)));
                        %end;
                %else %do;
                        df = (_cells_ - _sumd_) - 1;
                        parm = _xp_[1];
                        %end;
*               phat = ((_mean_-1)**(&var-1)) /(_mean_**&var);
                if &var < &sumat then
                        phat = parm * (1-parm)**(&var-1);
                else do v=&var by 1 until (term < .00001); 
                        term =  parm * (1-parm)**(v-1);
                        phat = sum(phat, term);
                        end;
                drop term v;
                %end;

        %else %if &dist=LOGSERIES %then %do;
                %let pname=theta;
                %if &nparm=0 %then %do;
                        df = (_cells_ - _sumd_) - 2;
                        *Birch estimator;
                        parm = 1 - (1 / (1 + ((5/3)- log(_mean_)/16)*(_mean_-
1)+2)*log(_mean_));
                        call symput('eparm', left(put(parm, 7.4)));
                        %end;
                %else %do;
                        parm = _xp_[1];
                        df = (_cells_ - _sumd_) - 1;
                        %end;
                if &var < &sumat then
                        phat = parm**&var/(-&var*log(1-parm));
                else do v=&var by 1 until (term < .00001); 
                        term =  parm**v/(-v*log(1-parm));
                        phat = sum(phat, term);
                        end;
                drop term v;
                %end;

        exp  = n * phat;        
        chi = (&freq - exp) / sqrt(exp);
        
        if &freq = 0 
                then dev = 0;
                else dev = 2* &freq * log(&freq/(exp + (exp=0)));
        dev = sign(&freq - exp) * sqrt(abs(dev));
        if &var <= &sumat 
                then output;

        label exp='Fitted frequency'
        phat= 'Fitted probability'
                chi = 'Pearson residual'
                dev = 'Deviance residual';

proc print;             
        id &var;
        var &freq phat exp chi dev;
        sum &freq phat exp;

data &outstat;
        keep dist stat value df prob;
        set &out end=eof;
        chisq + (chi**2);
        g2 + sign(dev)*(dev**2);

        *-- Output statistics to dataset;
        if eof then do;
                pchisq = 1-probchi(chisq,df);
                pg2 = 1-probchi(g2,df);

                dist = "&dist";
                stat = 'Pearson Chi-square  ';
                value= chisq;
                prob = pchisq;
                output;

                stat = 'Likelihood ratio G2';
                value= g2;
                prob = pg2;
                output;

        *-- Prepare printed output summary;
                file print;
                length label $40;
                call label(&var, label);
                if upcase("&var")=label then label='';
                put /  @10 "Goodness-of-fit test for data set 
%upcase(&data)"
                    // @10 "Analysis variable:       %upcase(&var) " label
                    /  @10 "Distribution:            &dist";
                %if &nparm>0 %then %do;
                put    @10 "Specified Parameters:    &pname = &parm";
                %end;
                %if %length(&eparm)>0 %then %do;
                put    @10 "Estimated Parameters:    &pname = &eparm";
                %end;
                                
                put /  @10 'Pearson chi-square    = ' chisq 
                    /  @10 'Prob > chi-square     = ' pchisq
                    // @10 'Likelihood ratio G2   = ' g2 
                    /  @10 'Prob > chi-square     = ' pg2
                         // @10 'Degrees of freedom    = ' df;
                end;
   run;
*proc print;

%done:
%if &abort %then %put ERROR: The GOODFIT macro ended abnormally.;

        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;

%mend;

/*-------------------------------------------------------------------*
  *    Name: lags.sas                                                 *
  *   Title: Macro for lag sequential analysis                        *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/lags.html               *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@YorkU.ca>         *
  * Created: Mar 21 14:03:21 EST 1996                                 *
  * Revised: Apr 26 09:45:34 EDT 1996                                 *
  * Version:  1.1                                                     *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/

 /* Description:

Given a variable containing event codes (char or numeric), the LAGS 
macro
creates:
(a) a dataset containing n+1 lagged variables, _lag0 - _lagN (_lag0
                is just a copy of the input event variable)
(b) optionally, an (n+1)-way contingency table containing frequencies 
of
        all combinations of events at lag0 -- lagN

Either or both of these datasets may be used for subsequent analysis
of sequential dependencies.  One or more BY= variables may be 
specified,
in which case separate lags and frequencies are produced for each 
value of the BY variables.

Usage:

One event variable must be specified with the VAR= option.  All other 
options have default values.  If one or more BY= variables are 
specified,
lags and frequencies are calculated separately for each combination of
values of the BY= variable(s).

The arguments may be listed within parentheses in any order, separated
by commas. For example:

   %lags(data=codes, var=event, nlag=2)

Parameters:

* DATA= The name of the SAS dataset to be lagged.  If DATA= is not
                        specified, the most recently created data set is 
used.

* VAR=   The name of the event variable to be lagged.  The variable may
         be either character or numeric.

* BY=    The name of one or more BY variables.  Lags will be restarted
         for each level of the BY variable(s).  The BY variables may
                        be character or numeric.

* VARFMT=  An optional format for the event VAR= variable.  If the 
codes
         are numeric, and a format specifying what each number means
                        is used (e.g., 1='Active' 2='Passive'), the output 
lag variables
                        will be given the character values. 

* NLAG=   Number of lags to compute.  Default = 1.

* OUTLAG=  Name of the output dataset containing the lagged variables.  
This
         dataset contains the original variables plus the lagged 
variables,
                        named according to the PREFIX= option.

* PREFIX=   Prefix for the name of the created lag variables.  The 
default
         is PREFIX=_LAG, so the variables created are named _LAG1, 
_LAG2,
                        ..., up to _LAG&nlag.  For convenience, a copy of the 
event
                        variable is created as _LAG0.

* FREQOPT=  Options for the TABLES statement used in PROC FREQ for the
         frequencies of each of lag1-lagN vs lag0 (the event variable).
                        The default is FREQOPT= NOROW NOCOL NOPERCENT CHISQ.
                        
Arguments pertaining to the n-way frequency table:

* OUTFREQ=  Name of the output dataset containing the n-way frequency
         table.  The table is not produced if this argument is not
                        specified.

* COMPLETE=    NO, or ALL specifies whether the n-way frequency table
         is to be made 'complete', by filling in 0 frequencies for
                        lag combinations which do not occur in the data.

Example:

 Assume a series of 16 events have been coded with the 3 codes, a, b, 
c,
 for 2 subjects as follows:
 
  Sub1:   c   a   a   b   a   c   a   c   b   b   a   b   a   a   b   c
  Sub2:   c   c   b   b   a   c   a   c   c   a   c   b   c   b   c   c
  
 and these have been entered as the 2 variables SEQ (subject) and CODE
 in the dataset CODES:

                SEQ    CODE
                
                        1      c
                        1      a
                        1      a
                        1      b
                        ....
                        2      c
                        2      c
                        2      b
                        2      b
                        ....
 Then the macro call:

    %lags(data=codes, var=code, by=seq, outfreq=freq);

 produces the lags dataset _lags_ for NLAG=1 that looks like this:

   SEQ    CODE    _LAG0    _LAG1
   
    1      c        c         
           a        a        c
           a        a        a
           b        b        a
           a        a        b
           ....
   
    2      c        c         
           c        c        c
           b        b        c
           b        b        b
           a        a        b
            ....

 The output 2-way frequency table (outfreq=freq) looks liks this:

   SEQ    _LAG0    _LAG1    COUNT
   
    1       a        a        2
            b        a        3
            c        a        2
            a        b        3
            b        b        1
            c        b        1
            a        c        2
            b        c        1
            c        c        0
   
    2       a        a        0
            b        a        0
            c        a        3
            a        b        1
            b        b        1
            c        b        2
            a        c        2
            b        c        3
            c        c        3

 */

%macro lags(data=_last_, 
        outlag=_lags_,   /* output dataset containing lag variables */
        outfreq=,        /* output dataset containing nlag-way 
frequencies */
        var=,            /* variable containing codes for events */
        varfmt=,         /* format for event variable */
        nlag=1,          /* number of lags to compute in the outlag 
dataset */
        by=,             /* by variable: separate lags for each */
        freqopt=norow nocol nopercent chisq,
        complete=ALL,    /* Should the contingency table be made 
complete?  */
        prefix=_lag);    /* prefix for names of lag variables */
        
%if &nlag = %str() %then %do;
        %put NLAG= must be specified;
        %goto done;
        %end;

%let abort=0;
%let complete = %upcase(&complete);
%if %upcase(&data) = _LAST_  %then %let data=&syslast;

%if %bquote(&by) ^=  %then %do;
   %let _byvars=;
   %let _bylast=;
   %let n=1;
   %let token=%qupcase(%qscan(&by,&n,%str( )));

   %do %while(&token^=);
      %if %index(&token,-) %then
         %put WARNING: Abbreviated BY list &token.  Specify by= 
individually.;
      %else %do;
         %let token=%unquote(&token);
         %let _byvars=&_byvars &token;
         %let _bylast=&token;
      %end;
      %let n=%eval(&n+1);
      %let token=%qupcase(%scan(&by,&n,%str( )));
   %end;
%let nby = %eval(&n-1);

%* put found &nby by variable(s) : &_byvars;

        proc sort data=&data;
                by &by;

%end;
        
%*-- Find type/missing value code for &var ;
proc contents data=&data out=_work_ noprint;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

options nonotes;
data _null_;
        set _work_(keep=name type);
        *-- set a missing macro variable for char or numeric variables;
        if name = upcase("&var") then do;
                if type = 2 then miss="' '";
                else miss='.';
                call symput('missing', miss);
                call symput('type', left(put(type,1.0)));
                put type=;
        end;

data &outlag;
        set &data;
        %if %bquote(&by) ^=  %then %do;
                by &by;
                drop cnt;
        %end;
        %do i= &nlag %to 1 %by -1;
                &&prefix.&i = lag&i( &var);
                %end;
   &prefix.0 = &var;

        %if %bquote(&by) ^=  %then %do;
                if first.&_bylast then cnt=0;
                cnt+1;
                %do i = 1 %to &nlag;
                        if cnt <= &i then &&prefix.&i = &missing ;
                        %end;           
        %end;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;
        

*-- Frequencies for each lag vs lag0;
proc freq data=&outlag;
        %if &varfmt ^= %str() %then %do;
        format
        %do i = 0 %to &nlag;
                &&prefix.&i
                %end;  &varfmt %str(;);
        %end;
        %do i = 1 %to &nlag;
                tables &&prefix.&i * &prefix.0   / &freqopt;
                %end;           
        %if %bquote(&by) ^=  %then %do;
                by &by;
                %end;
        run;
        
*-- Output nlag-way data set containg lag frequencies;
%if &outfreq ^= %str() %then %do;

        %let sparse=;
        %if &complete ^= NO %then %let sparse = sparse;
        
        proc freq data=&outlag;
                *-- generate a tables lagn * lagn-1 * ... lag 0 statement;
                tables  
                %do i= &nlag %to 1 %by -1;
                        &&prefix.&i   *
                        %end;
                        &prefix.0 / noprint &sparse out=&outfreq;
                %if %bquote(&by) ^=  %then %do;
                        by &by;
                        %end;
        
        %*-- delete any missing lags;
        data &outfreq;
                set &outfreq(drop=percent) ;
                %do i= &nlag %to 1 %by -1; 
                        if &&prefix.&i ^= &missing ;
                         %end;
                         
        %*-- Resort to put BY variable(s) first;
        %if %bquote(&by) ^=  %then %do;
        proc sort data=&outfreq;
                by &by 
                        %do i= &nlag %to 0 %by -1; &&prefix.&i 
                         %end;
                         %str(;);
        %end;
                                
%end;

%done:
%if &abort %then %put ERROR: The LAGS macro ended abnormally.;
options notes;
%mend;

/*-------------------------------------------------------------------*
  *    Name: points.sas                                               *
  *   Title: Create an Annotate dataset to draw points in a plot      *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/points.html             *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly         <friendly@yorku.ca>            *
  * Created:  12 Nov 1998 10:26:09                                    *
  * Revised:  18 Nov 1998 08:37:44                                    *
  * Version:  1.1                                                     *
  *  - Added BY= parameter, IN=                                       *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:

 The POINTS macro creates an annotate data set to draw point symbols
 in a 2D or 3D scatterplot.  This is useful when you need to plot two
 variables (e.g, observed, predicted) against a common X, with separate
 curves for the levels of a class variable.  In PROC GPLOT, for 
example,
 you cannot do

    proc gplot;
       plot (obs fit) * X = group;

 However, you can add the OBS points to a plot of fit*X:

   %points(x=X, y=obs);
    proc gplot;
       plot fit * X = group / anno=_pts_;

Usage:

 The POINTS macro is called with keyword parameters.  The X= and Y=
 parameters are required.  For a plot with PROC G3D, you must also
 give the Z= variable.
 The arguments may be listed within parentheses in any order, separated
 by commas. 
 
Parameters:

* DATA=       The name of the input data set [Default: DATA=_LAST_]

* X=          The name of the X variable for the scatterplot

* Y=          The name of the Y variable for the scatterplot

* Z=          The name of the Z variable for a 3D scatterplot

* BY=         The name(s) of any BY variable(s) to be used for multiple
              plots.

* CLASS=      The name of a class variable, to be used with PROC GPLOT
              in the PLOT statement for multiple curves, in the form
                                  
                                     plot Y * X = CLASS;

* SYS=        Specifies the Annotate XSYS & YSYS value [Default: SYS=2]

* COLOR=      Point color(s): the name of a dataset character variable,
              or an expression which evaluates to a SAS/GRAPH color, or
              string constant enclosed in quotes. [Default: 
COLOR='BLACK']

* SYMBOL=     Point symbol(s): the name of a dataset character 
variable,
              or an expression which evaluates to a SAS/GRAPH color, or
              string constant enclosed in quotes.  [Default: 
SYMBOL='DOT']

* SIZE=       The size of the symbol (in GUNIT units).  If not 
specified,
              the global graphics option HTEXT value is used.

* FONT=       Font for symbol(s): the name of a dataset character 
variable,
              or an expression which evaluates to a SAS/GRAPH color, or
              string constant enclosed in quotes.  Use for special
                                  symbols, e.g., FONT='MARKER'.  If not 
specified, the
                                  standard symbol font is used.

* SUBSET=     An expression (which may involve any dataset variables) 
to
              select points.  A point will be plotted if the expression 
                                  evaluates to non-zero for the current 
observation.
              [Default: SUBSET=1]

* COPY=       The names of any variables to be copied to output dataset

* IN=         The name of an optional input annotate data set.  If
              specified, the IN= data set is concatenated with the
                                  OUT= data set.

* OUT=        Name of the annotate data set produced. [Default: 
OUT=_PTS_]
                

 */

%macro points(
   data=_LAST_,
   x=,             /* X variable for scatterplot       */
   y=,             /* Y variable for scatterplot       */
   z=,             /* Z variable for G3D (optional)    */
   by=,            /* BY variable(s) (mult plots)      */
   class=,         /* CLASS variable (mult curves)     */
   sys=2,          /* XSYS & YSYS value                */
   color='BLACK',  /* symbol color (quote if const)    */
   symbol='dot',   /* plot symbol                      */
   size=,          /* size of symbol                   */
   font=,          /* font for symbol                  */
   subset=1,       /* expression to select points      */
   copy=,          /* vars copied to output dataset    */
   in=,            /* input annotate data set          */
   out=_pts_       /* annotate data set produced       */
   );

options nonotes; 
%if %length(&by) or %length(&class) %then %do;
        proc sort data=&data;
        by &by &class;
        %end;
run;

options notes;
data &out;
   set &data;
        %if %length(&by) or %length(&class) %then %do;
                by &by &class;
                %end;
   keep x y function text
                %if %length(&size) %then  size ;
        color &by &class &copy xsys ysys ;
   length function $8 text $ 8 ;
   xsys = "&sys"; ysys = "&sys"; function='SYMBOL';
   x = &x;
   y = &y;
   %if &z ^= %str() %then %do;
          zsys = "&sys"; keep z zsys;
          z = &z;
       %end;
        %if %length(&size)   %then %str(size=&size;);
   color=&color;
        text=&symbol;
   %if &font ^= %str() %then %do;
      keep style;
      style = &font;
      %end;

   if (&subset);
run;
%if %length(&in) %then %do;
        data &out;
                set &in &out;
                %if %length(&by) %then %do;
                        by &by;
                        %end;
        %end;

%mend;

/*-------------------------------------------------------------------*
  *    Name: distplot.sas                                             *
  *   Title: Plots for discrete distributions                         *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/distplot.html           *
  *                                                                   *
  *  Hoaglin & Tukey, Checking the shape of discrete distributions.   *
  *   In Hoaglin, Mosteller & Tukey (Eds.), Exploring data tables,    *
  *   trends, and shapes, NY: Wiley, 1985.                            *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:  19 Mar 1991 08:48:26                                    *
  * Revised:   3 Nov 2000 10:09:20                                    *
  * Version:  1.2                                                     *
  *  1.1  Plot y * &count so label will not be required               *
  *       Allow for 0 frequencies                                     *
  *  1.2  Added indicated parameter change plots                      *
  *       Fixed bugs in ngebin and geometric                          *
  *       Fixed validvarname for V7+                                  *
  * Requires: %words %label %gskip                                    *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/

 /* Description:
 
 The DISTPLOT macro constructs plots of a discrete distribution
 designed to diagnose whether the data follows one of the standard
 distributions: the Poisson, Binomial, Negative Binomial, Geometric,
 or Log Series, specified by the DIST= parameter.  The usual
 (PLOT=DIST) plot is constructed so that the points lie along a
 straight line when the data follows that distribution.  An influence
 plot (PLOT=INFL) shows the influence of each observation on the
 choice of the distribution parameter(s).

Usage:

 The DISTPLOT macro is called with keyword parameters.
 You must specify the distribution to be fit (DIST=). and the
 COUNT= and FREQ= variables.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %distplot(data=queues, count=women, freq=queues, dist=binomial,
            parm=0.435);
 
Parameters:

* DATA=       The name of the input data set [Default: DATA=_LAST_]

* COUNT=      Basic count variable

* FREQ=       Number of occurrences of count

* LABEL=      Horizontal (count) label

* DIST=       Name of distribution, one of POISSON, BINOMIAL, 
              GEOMETRIC, LOGSERIES, or NEGBIN.

* PARM=       Trial value of the distribution parameter(s) to level the 
              plot.  For the Binomial distribution, PARM=p, the 
binomial
                                  probability of success; for the Poisson, 
PARM=lambda,
                                  the Poisson mean.  For the Geometric and 
Negative binomial,
                                  PARM=p

* Z=          Multiplier for error bars in the PLOT=DIST plot.
              [Default: Z=1.96]

* PLOT=       What to plot: DIST and/or INFL [Default: PLOT=DIST]

* HTEXT=      Height of text labels in the plots [Default: HTEXT=1.4]

* OUT=        The name of the output data set [Default: OUT=DISTPLOT]

* NAME=       Name of the graphics catalog entry [Default: 
NAME=DISTPLT]

 */
%macro distplot(
     data=_last_,       /* name of input data set                   */
     count=,            /* basic count variable                     */
     freq=,             /* number of occurrences of count           */
     label=,            /* Horizontal (count) label                 */
     dist=,             /* name of distribution                     */
     parm=,             /* trial value of parm(s) to level the plot */
     z=1.96,            /* multiplier for error bars                */
     plot=DIST,         /* What to plot: DIST and/or INFL           */
          htext=1.4,         /* Height of text labels                    
*/
          out=distplot,      /* name of the output data set              
*/
     name=distplt       /* graphics catalog entry                   */
          );
 
        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=V6;
                %end;
        %else %do;
           options nonotes;
                %end;

%*if &label=%str() %then %let label=&count;
%let plot=%upcase(&plot);
%let dist=%upcase(&dist);
%if %length(&z)=0 %then %let z=0;

%*-- Determine if any parameters were passed;
%global parm1 parm2;
%let parm1=0;
%let parm2=0;
%let nparm = %words(&parm, root=parm);
%* put nparm= &nparm parm1=&parm1 parm2=&parm2;

proc means data=&data N sum sumwgt mean var noprint vardef=weight;
   var &count;
   weight &freq;
   output out=_sum_ sumwgt=N sum=sum mean=_mean_ var=_var_ max=_max_;

data &out;
   set &data nobs=_cells_;
   if _n_=1 then set _sum_(drop=_type_ _freq_);
        drop kf k;
   k = &count;
   nk= &freq;                     * n(k);
        kf= gamma(k+1);                * k!  ;
  
   *-- centered value n*(k), Hoaglin & Tukey, Eqn 9 ;
   p = nk / N;
   if nk >=2 then nkc = nk - .67 - .8*p;
             else nkc = exp(-1);

        %if &dist=POISSON %then %do;
                *-- if levelling the plot, subtract centering value;
                %if &nparm = 0
                                %then %str( level =  0; );
                                %else %str( level = &parm1 - k * log( &parm1 ); 
);
           y =  log(kf * nk / N) + level ;
           yc=  log(kf * nkc/ N) + level ;
                parm = _mean_;       *-- estimate of lambda;
                call symput('eparm', left(put(parm, 6.3)));
           phat = exp(-_mean_) * _mean_**&count / kf;
                if nk<=1 then do;
                        ylo = log(kf*nkc/N) - 2.677 + level;
                        yhi = log(kf*nkc/N) + 2.717 - 2.3/N + level;
                        h = (yhi-ylo)/2;
                        end;
                %end;
        %else %if &dist=BINOMIAL %then %do;
                %if &nparm = 0
                                %then %str( level =  0; );
                                %else %do;
                                        level = -(_max_ * log (1-&parm1) + 
k*log(&parm1/(1-&parm1)));
                                        %end;
                bnk = gamma(_max_+1) / ( gamma(k+1)*gamma(_max_-k+1) );
                y =  log(nk / (N * bnk)) + level ;
                yc=  log(nkc/ (N * bnk)) + level ;

                parm = _mean_ / _max_;     *-- estimate of p;
                call symput('eparm', left(put(parm, 6.3)));
                phat = bnk * (parm**k) * parm**(_max_-k);

                if nk<=1 then do;
                        ylo = log(nkc/(N * bnk)) - 2.677 + level;
                        yhi = log(nkc/(N * bnk)) + 2.717 - 2.3/N + level;
                        h = (yhi-ylo)/2;
                        end;
                %end;
        %else %if &dist=NEGBIN %then %do;
                %if &nparm=0 %then %do;
                        parmn = _mean_**2 / (_var_-_mean_);     *-- n, moment 
est;
                        parm  = _mean_/_var_;                   *-- p, moment 
est;
                        level = 0;
                        %end;
                %else %if &nparm=1 %then %do;
                        parmn = &parm1;
                        parm = _mean_/_var_;                    *-- p, moment 
est;
                        level = -(parmn * log(parm) + k*log(1-parm));
                        %end;
                %else %do;
                        parmn = &parm1;
                        parm = &parm2;
                        level = -(parmn * log(parm) + k*log(1-parm));
                        %end;
*               parm  = (_var_/_mean_**2) -1;           *-- p, moment est;
                bnk = gamma(parmn+k) / (gamma(k+1) * gamma(parmn));
                phat = bnk * (parm**parmn) * (1-parm)**k;
                y =  log(nk / (N * bnk)) + level ;
                yc=  log(nkc/ (N * bnk)) + level ;

                if nk<=1 then do;
                        ylo = log(nkc/(N * bnk)) - 2.677 + level;
                        yhi = log(nkc/(N * bnk)) + 2.717 - 2.3/N + level;
                        h = (yhi-ylo)/2;
                        end;
                %end;
        %else %if &dist=GEOMETRIC %then %do;
                %if &nparm = 0
                        %then %str( level =  0; );
                        %else %str( level = -(log(&parm1) + k*log(1-
&parm1)););
                y =  log(nk / N ) + level ;
                yc=  log(nkc/ N ) + level ;

                parm = 1/_mean_;
                call symput('eparm', left(put(parm, 6.3)));
                phat = parm*(1-parm)**(k-1);

                if nk<=1 then do;
                        ylo = log(nkc/ N ) - 2.677 + level;
                        yhi = log(nkc/ N ) + 2.717 - 2.3/N + level;
                        h = (yhi-ylo)/2;
                        end;
                %end;
        %else %if &dist=LOGSERIES %then %do;
                %if &nparm = 0
                                %then %str( level =  0; );
                y =  log(k * nk / N ) + level ;
                yc=  log(k * nkc/ N ) + level ;

                *Birch estimator;
                parm = 1 - (1 / (1 + ((5/3)- log(mean)/16)*(mean-
1)+2)*log(mean));
                call symput('eparm', left(put(parm, 6.3)));
                if nk<=1 then do;
                        ylo = log(k * nkc/ N ) - 2.677 + level;
                        yhi = log(k * nkc/ N ) + 2.717 - 2.3/N + level;
                        h = (yhi-ylo)/2;
                        end;
                %end;
 
   *-- half-length of confidence interval for log(eta-k) [Eqn 10];
        if nk>1 then do;
                h = &z * sqrt( (1-p) / ( nk-((.47+.25*p)*sqrt(nk)) ) );
                ylo = yc - h;
                yhi = yc + h;
                end;

        *-- Estimated prob and expected frequency;
   exp  = N * phat;

        *-- Leverage and apparent parameter values (p.402);
/*
   %if &parm1 = 0
                %then %str(lev = (&count / _mean_) - 1;);
                %else %str(lev = (&count / &parm1) - 1;);
        hc = sign(lev) * lev / h;
        vc = sign(lev) * (log(nkc) - log(exp)) / h;
        slope = vc / hc;   *-- (lambda - lambda0);
        label y = 'Count metameter'
                hc = 'Scaled Leverage'
                vc = 'Relative parameter change'
                slope = 'Parameter change';
*/

proc print data=&out;
   id &count;
*   var nk y nkc yc h ylo yhi lev hc vc;
   sum nk nkc;
*   format y yc h yhi ylo 6.3 lev hc vc 6.2;

/* 
*-- Calculate goodness of fit chisquare;
data fit;
   set &out;
   chisq= (nk - exp)**2 / exp;
proc print data=fit;
   id &count;
   var nk p phat exp chisq;
   sum nk exp chisq;
*/

*-- Find slope, intercept of line;
proc reg data=&out outest=_parms_ noprint;
   model y =  &count;
data _stats_;        *-- Annotate data set to label the plot;
   set _parms_ (keep=&count intercep);
   set _sum_   (keep=_mean_);
        b = &count;
        a = intercep;
   drop &count intercep ek a b;
   length text $30 function $8;
   xsys='1'; ysys='1';
   x=15;
        *-- set label y location based on slope;
        if &count > 0
                then y=96;
                else y=16; 
   function = 'LABEL';
   size = &htext;
   color = 'RED';
   position='3'; text ='slope(b) = '||left(put(b,f6.3)); output;
   position='6'; text ='intercept= '||left(put(a,f6.3)); output;

   y=y-6;
        %if &dist=POISSON %then %do;
                ek = exp(b - &parm1);
                position='6'; text ="lambda:  mean = &eparm"; output;
                position='9'; text ='       exp(b) = '||put(ek,5.3); 
output;
        %end;
        %else %if &dist=BINOMIAL %then %do;
                %if &parm1>0 %then 
                        %str( b = b - log (&parm1/(1-&parm1)); );
                ek = exp(b)/(1+exp(b));
                position='6'; text ="p:      mean/n = &eparm"; output;
                position='9'; text ='   e(b)/1+e(b) = '||put(ek,5.3); 
output;
        %end;
        %else %if &dist=NEGBIN %then %do;
                %if &parm1>0 %then 
                        %str( b = b - log (1-&parm1); );
                ek = 1 - exp(b);
                en = a / log(ek);
                position='6'; text ='n:  a/log(p) = '||put(en,5.3); output; 
                position='9'; text ='p:    1-e(b) = '||put(ek,5.3); output;
        %end;
        %else %if &dist=GEOMETRIC %then %do;
                %if &parm1>0 %then 
                        %str( b = b - log (1-&parm1); );
                ek = 1 - exp(b);
                en = exp(a);
                position='3'; text ="p:   1/mean = &eparm";        output; 
                position='6'; text ='p:   1-e(b) = '||put(ek,5.3); output;
*               y = y-4;
                position='9'; text ='p:   e(a)   = '||put(en,5.3); output; 
        %end;
        %else %if &dist=LOGSERIES %then %do;
                %if &parm1>0 %then 
                        %str( b = b - log (&parm1); );
                ek = exp(b);
/*              position='6'; text ='p:   e(a)   = '||put(en,5.3); output;   
*/
                position='9'; text ='p:   1-e(b) = '||put(ek,5.3); output;
        %end;

%let order=;
%if &z > 0 %then %do;
data _conf_;
   set &out;
   drop yc;
   xsys='2'; ysys='2';
   x = &count;    line=33;
   y = yc;   function='MOVE  '; output;
   text='+'; function='SYMBOL'; output;
   y = yhi;  function='DRAW  '; output;
   y = yc;   function='MOVE  '; output;
   y = ylo;  function='DRAW  '; output;
data _stats_;
   set _stats_ _conf_;

*-- find range of confidence limits to set y axis extrema;
proc means data=_conf_ noprint;
   var y;
   output out=_range_ min=min max=max;
data _null_;
   set _range_;
        inc = 1;
        if (max-min)>10 then inc=2;
   min = inc * floor(min/inc);
   max = inc * ceil(max/inc);
   call symput('MIN', left(put(min,3.)));
   call symput('MAX', left(put(max,3.)));
   call symput('INC', left(put(inc,3.)));
run;
%let order = order=(&min to &max by &inc);
%end; /* %if &z */

*-- Poissonness-style distribution plot;
proc gplot data=&out;
   plot y * &count / anno=_stats_ vaxis=axis1 haxis=axis2 hminor=0 
vminor=1
                name="&name"
                des="Distribution plot of &count";
   symbol v=- h=2 i=rl c=black;
   axis1 &order
         label=(a=90 r=0
          'Count metameter')
         value=(h=&htext);
   axis2 offset=(3) minor=none
        %if %length(&label) %then %do;
         label=("k  (&label)")
                        %end;
         value=(h=&htext);
        run;

*-- Indicated parameter change (infl) plot;
        %if %index(&plot,INFL) %then %do;
        %label(data=&out, y=vc,    x=hc, text=&count, out=_anno1_, 
size=);
        *-- Draw lines from origin to each point;
        data _lines_;
                set &out(keep=hc vc);
                xsys='2'; ysys ='2';
                x=0;  y=0;  function='move    '; output;
                x=hc; y=vc; function='draw    '; output;
        data _anno1_;
                set _anno1_ _lines_;

        %label(data=&out, y=slope, x=hc, text=left(put(&count,2.)),
                out=anno2, size=);
        %gskip;
        symbol v=- h=2 color=black i=none;
        axis1 label=(a=90);

        proc gplot data=&out;
                plot vc * hc  /
                        hzero vref=0 lvref=33 anno=_anno1_ vaxis=axis1 
hminor=1 vminor=1
                        name="&name"
                        des="Parameter change plot of &count";
                run;
        %gskip;
        proc gplot data=&out;
                bubble slope * hc =hc / 
                        vref=0 lvref=33 anno=anno2 vaxis=axis1  hminor=1 
vminor=1
                        bsize=40 bcolor=red bscale=radius
                        name="&name"
                        des="Parameter change plot of &count";

        run; quit;
%end;
*-- Clean up datasets no longer needed;
proc datasets lib=work memtype=data nolist;
   delete _sum_ _stats_ _conf_ _parms_;
        run; quit;

        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;

%mend;

/*-------------------------------------------------------------------*
  *    Name: gskip.sas                                                *
  *   Title: Device-independent macro for multiple plots              *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/gskip.html              *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 12 Jul 96 16:43                                          *
  * Revised: 02 Jan 99 12:41                                          *
  * Version: 1.0                                                      *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 The GSKIP macro is designed to handle difficulties in producing
 multiple plots in one SAS job.  For EPS, GIF, CGM and WMF drivers,
 it assigns a new output filename for the next plot.  For FOILS
 (on continuous forms) it skips the normally blank non-foil
 separator page.  Otherwise, it has no effect.

Usage:

 The GSKIP macro has one optional positional parameter.  It relies on 
global
 macro parameters, DISPLAY, DEVTYP, FIG, GSASFILE, and GSASDIR.
 These parameters are normally initialized either in the 
F<AUTOEXEC.SAS>
 file, or in device-specific macros.  For example, for normal graphic
 output to the Graph Window, assign DISPLAY and DEVTYP as
 
   %let devtyp=SCREEN;
        %let displa=ON;

 For EPS file output,
 
   %let devtyp=EPS;
        %let fig=1;
        %let gsasfile=myfig;
 
 GSKIP is normally used after each graphic procedure or macro to 
advance
 the FIG counter and open a new graphic output file.  For example,
 
   proc gplot;
                plot y * x;
        %gskip();

Parameters:

* INC       The value by which the FIG counter is incremented, normally
            1 (the default).  Use the INC parameter after a plot with
                                a BY statement.
 
Global Parameters:

* DISPLAY       String value, ON or OFF, usually set by the GDISPLA macro.
            The GISKP macro takes no action if DISPLAY=OFF.

* DEVTYP    String value, the type of graphic device driver.  The
            values EPS, GIF, CGM and WMF cause FIG= to be incremented
                                and a new output filename assigned.  If 
DEVTYP=FOILS,
                                a blank graphic page is produced.  All others 
are ignored.

* FIG       A numeric value, the number of the current figure.

* GSASFILE  String value, the basename of the graphic output file(s).
            The output files are named according to the macro 
expression
                                
               %scan(&gsasfile,1,.)&fig..%lowcase(&devtyp)

            e.g.,  myfile1.eps, myfile2.eps, ....
                                
* GSASDIR   String value, the output directory in which the graphic
            files are written.  If not specified, output goes to the
                                current directory.
 
 */

%global fig gsasfile gsasdir display devtyp;
%macro gskip(inc);
/*  quit; run; */
        %if &DISPLAY = OFF %then %goto done;    %*-- Only if we are 
displaying;
        %if %upcase(&devtyp)=EPS or %upcase(&devtyp)=GIF
         or %upcase(&devtyp)=CGM or %upcase(&devtyp)=WMF 
                %then %do;
        %if %length(&inc)=0 %then %let inc=1;
   %if %defined(gsasdir)=0 %then %let gsasdir=;
                %let fig = %eval(&fig + &inc);
                %let gsas = %scan(&gsasfile,1,.)&fig..%lowcase(&devtyp);
                %put GSKIP: gsasfile now: "&gsasdir.&gsas";
                filename gsas&fig  "&gsasdir.&gsas";
                goptions gsfname=gsas&fig;
        %end;
        
/* Skip the blank page in Zeta foils */
   %if &devtyp=FOILS %then %do;
   Proc Gslide;
      note j=c 'Page skip for FOILS';
   run;
   %end;
%done:;
%mend gskip;

/*-------------------------------------------------------------------*
  *    Name: logodds.sas                                              *
  *   Title: Plot empirical log-odds for logistic regression          *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/logodds.html            *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created:  6 Nov 1997 08:45:22                                     *
  * Revised:  23 Jun 2000 10:46:42                                    *
  * Version: 1.0                                                      *
  *  Updated for V7+ (VALIDVARNAME)                                   *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 For a binary response variable, Y, taking values 0 or 1, and a
 continuous independent variable, X, the LOGODDS macro groups the
 X variable into some number of ordered, non-overlapping intervals.
 It plots the empirical log-odds of Y=1 (and/or Pr{Y=1}) against
 X for each interval of X, together with the fitted linear logistic
 relation, an optional smoothed curve (using the LOWESS macro),
 and the observed binary responses.
 

Usage:

 The input data to be plotted must be in case form.
 The LOGODDS macro is called with keyword parameters.  The X= and Y=
 variables are required.
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        %include catdata(icu);
        %logodds(data=icu, x=age, y=died, smooth=0.25, ncat=16,
                options=order=data);
 
Parameters:

* X=          Name of the continuous independent variable

* Y=          Name of the binary response variable

* EVENT=      Value of Y for the event of interest [Default: EVENT=1]

* DATA=       The name of the input data set [Default: DATA=_LAST_]

* OPTIONS=    Options for PROC LOGISTIC, for example, 
OPTIONS=DESCENDING.

* NCAT=       Number of categories of the X variable.
              For example, if deciles of X are desired, use NCAT=10. 
                                  [Default: NCAT=10]

* PLOT=       Scale(s) for the response. PLOT=LOGIT gives a plot on the
              logit scale, PLOT=PROB on the probability scale.
              [Default: PLOT=LOGIT PROB]

* SMOOTH=     Smoothing parameter for a lowess smooth, in the interval
              (0-1).  No smooth curve is produced unless a SMOOTH=
                                  value is specified.  

* SHOW=       Specifies whether to plot the binary observations.
              [Default: SHOW=OBS]

* OBS=        Specifies how to display the binary observations.  If
              OBS=STACK, the observations are plotted in vertical
                                  columns at the top (Y=1) or bottom (Y=0) of 
the plot.
                                  If OBS=JITTER a small random quantity is 
added (Y=0)
                                  or subtracted (Y=1) to the Y value.  
[Default: OBS=STACK]

* NAME=       The name of the graph in the graphic catalog [Default: 
NAME=LOGODDS]

* GOUT=       The name of the graphic catalog [Default: GOUT=GSEG]
                
 */
 
%macro logodds(
        x=,               /* Name of the continuous independent variable 
*/
        y=,               /* Name of the binary response variable        
*/
        event=1,          /* Value of Y for the event of interest        
*/
        data=_last_,
        options=,         /* Options for PROC LOGISTIC                   
*/
        ncat=10,          /* Number of categories of the X variable      
*/
        plot=logit prob,  /* Scale(s) for the response                   
*/
        smooth=,          /* Smoothing parameter for a lowess smooth     
*/
        hsym=1.5,
        show=obs,         /* How and whether to plot the binary obs.     
*/
        obs=stack,
        name=logodds,
        gout=gseg
        );

%let plot=%upcase(&plot);
%let show=%upcase(&show);

        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=upcase;
                %end;
        %else %do;
           options nonotes;
                %end;

%let abort=0;
%if &x=%str() or &y=%str() %then %do;
   %put ERROR: The X= and Y= variables must be specified;
        %let abort=1;
   %goto DONE;
   %end;
        

proc logistic /*noprint*/ data=&data &options;
   model  &y = &x ;
   output out=results p=predict l=lower u=upper xbeta=plogit 
stdxbeta=selogit;

data results;
        set results;
        uplogit = plogit + selogit;
        lologit = plogit - selogit;
        logit = log(((&y=&event)+.25)/((1-(&y=&event))+.25));
proc sort;
        by &x;

proc sort data=&data;
   by &x &y;
%if &syserr > 4 %then %let abort=1; %if &abort %then %goto DONE;

proc rank data=&data groups=&ncat out=_grouped;
        var &x;
        ranks _gp_;

proc summary data=_grouped nway;
        class _gp_;
        var &x &y;
        output out=_groups_ mean(&x)=xmean sum(&y)=ysum min(&x)=xmin;

data _logits_;
        set _groups_(rename=(_freq_=n) drop=_type_) end=eof;
        label xmean="Mean &x"
              logit="Log Odds &y=&event";
        p = (ysum+.5) / (n+.5);
        logit = log( (ysum+.5) / (n-ysum+.5)  );
        nobs + n;
        if eof then call symput('nobs', put(nobs,8.));
run;
proc print;
        id _gp_;
        var xmin ysum n logit p;

data _pts_;
        set _logits_;
        xsys = '2'; ysys='2';
        x = xmean; y=logit; function='symbol'; size=2; text='square';

%*-- Mark the boundaries between adjacent groups;
data _marks_;
        set _logits_;
        xsys = '2'; ysys = '1';
        color = 'red'; when='A';
        x = xmin; y=0;    function='MOVE    '; output;
        x = xmin; y=2.25; function='DRAW    '; output;

%if %index(&show,OBS) %then %do;
%let otype = %upcase(%scan(&obs,1));
%let oparm = %scan(&obs,2,%str( ));
%let osym  = %scan(&obs,3,%str( ));
%if %length(&osym)=0 %then %let osym=dot;
data _obs_;
        set &data(keep=&x &y);
        by &x &y;
        drop i;
        length text $8;
        if first.&y then i=0;
        xsys = '2'; ysys='1';
        x = &x;
        %if &otype=STACK %then %do;
                %if %length(&oparm)=0 %then %let oparm=2;
                y=100*&y + (3+(&oparm)*i)*sign(.5-&y);
                %end;
        %else %do; /* &otype=JITTER */
                %if %length(&oparm)=0 %then %let oparm=10;
                y=100*&y + (3+(&oparm)*uniform(0))*sign(.5-&y);
                %end;
        function='symbol';
        size=1.3; text="&osym"; color='green  ';
        i+1;
%end;

data _pts_;
        set _pts_
                 %if %index(&show,OBS) %then _obs_ ;
                 _marks_;


%if %length(&smooth)>0 %then %do;
%lowess(data=results, x=&x, y=logit, gplot=NO, pplot=NO, 
outanno=_smooth_,
        silent=YES, robust=0, iter=1, f=&smooth, line=22);

data _pts_;
        set _pts_ _smooth_;
%end;

%if %index(&PLOT,LOGIT) %then %do;
proc gplot data=results gout=&gout;
   plot
        plogit  * &x = 1
        uplogit * &x = 2
        lologit * &x = 2
        / frame overlay vaxis=axis1 anno=_pts_ hminor=1 vminor=1
                  name="&name"
                  des="Empirical log-odds plot of &data";
   axis1 label=(a=90) offset=(3pct);
   symbol1 v=none   i=join l=1  w=3 c=blue;
   symbol2 v=none   i=join l=20 w=2 c=blue;
        label  plogit="Log Odds &y=1";
run; quit;
%gskip;
%end;

%if %index(&PLOT,PROB) %then %do;
data _pts_;
        set _logits_;
        xsys = '2'; ysys='2';
        x = xmean; y=p; function='symbol'; size=2; text='square';

data _pts_;
        set _pts_
                 %if %index(&show,OBS) %then _obs_ ;
                 _marks_;

%if %length(&smooth)>0 %then %do;
%lowess(data=&data, x=&x, y=&y, gplot=NO, pplot=NO, outanno=_smooth_,
        silent=YES, robust=0, iter=1, f=&smooth, line=22);

data _pts_;
        set _pts_ _smooth_;
%end;

proc gplot data=results gout=&gout;
   plot 
        predict * &x = 1
        upper * &x = 2
        lower * &x = 2
        / frame overlay vaxis=axis1 anno=_pts_ hminor=1 vminor=1
                  name="&name"
                  des="Empirical probability plot of &data";
                 
   axis1 label=(a=90) offset=(3pct) order=(0 to 1 by .2);
   symbol1 v=none   i=join l=1  w=3 c=blue;
   symbol2 v=none   i=join l=20 w=2 c=blue;
        label predict = "Probability &y=&event";
        format &y 4.1;
        run; quit;
        goptions reset=symbol;
%end;

%done:
        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;

%if &abort %then %put ERROR: The LOGODDS macro ended abnormally.;
%mend;


/*-------------------------------------------------------------------*
 *    Name: poisplot.sas                                             *
 *   Title: Poissonness plot for discrete distributions              *
 *     Doc: http://www.math.yorku.ca/SCS/vcd/poisplot.html           *
 *     Ref:                                                          *
 *  Hoaglin & Tukey, Checking the shape of discrete distributions.   *
 *   In Hoaglin, Mosteller & Tukey (Eds.), Exploring data tables,    *
 *   trends, and shapes, NY: Wiley, 1985.                            *
 *-------------------------------------------------------------------*
 *  Author:  Michael Friendly            <friendly@yorku.ca>         *
 * Created:  19 Mar 1991 08:48:26                                    *
 * Revised:   9 Nov 2000 11:24:23                                    *
 * Version:  1.3                                                     *
 *  1.1  Plot y * &count so label will not be required               *
 *       Allow for 0 frequencies                                     *
 *  1.2  Added indicated parameter change plots                      *
 *  1.3  Fixed validvarname for V7+                                  *
 *                                                                   *
 *  From ``Visualizing Categorical Data'', Michael Friendly (2000)   *         
 *-------------------------------------------------------------------*/
 /* Description:
 
 The POISPLOT macro constructs a ``Poissonness plot'' for determining
 if discrete data follows the Poisson distribution.  The plot has a
 linear relation between the count metameter n(k) and the basic
 count, k, when the distribution is Poisson.  An influence
 plot displays the effect of each observed frequency on the choice of
 the Poisson parameter, lambda.
 
Usage:

 The POISPLOT macro is called with keyword parameters.  The COUNT=
 and FREQ= parameters are required.  
 The arguments may be listed within parentheses in any order, separated
 by commas. For example: 
 
        data horskick;
                input deaths corpsyrs;
                label deaths='Number of Deaths'
                        corpsyrs='Number of Corps-Years';
        cards;
                        0    109
                        1     65
                        2     22
                        3      3
                        4      1
        ;
        %poisplot(count=Deaths,freq=corpsyrs, plot=dist);
 
Parameters:

* DATA=       The name of the input data set [Default: DATA=_LAST_]

* COUNT=      The name of the basic count variable

* FREQ=       The name of the variable giving the number of occurrences
              of COUNT

* LABEL=      Label for the horizontal (COUNT=) variable.  If not 
specified
              the variable label for the COUNT= variable in the input
                                  data set is used.

* LAMBDA=     Trial value  of the Poisson parameter lambda to level the 
plot.
              If LAMBDA=0 (the default) the plot is not levelled.

* Z=          Multiplier for error bars [Default: Z=1.96]

* PLOT=       What to plot: DIST and/or INFL [Default: PLOT=DIST INFL]

* HTEXT=      Height of text labels [Default: HTEXT=1.4]

* OUT=        The name of the output data set [Default: OUT=POISPLOT]

* NAME=       Name of the graphics catalog entry [Default: 
NAME=POISPLT]
                

*/
 
%macro poisplot(
     data=_last_,
     count=,            /* basic count variable                     */
     freq=,             /* number of occurrences of count           */
     label=,            /* Horizontal (count) label                 */
     lambda=0,          /* trial value of lambda to level the plot  */
     z=1.96,            /* multiplier for error bars                */
          plot=DIST INFL,    /* What to plot: DIST and/or INFL           
*/
          htext=1.4,
          out=poisplot,
          name=poisplt
          );
 
        %*-- Reset required global options;
        %if &sysver >= 7 %then %do;
                %local o1 o2;
                %let o1 = %sysfunc(getoption(notes));
                %let o2 = %sysfunc(getoption(validvarname,keyword));
                options nonotes validvarname=V6;
                %end;
        %else %do;
           options nonotes;
                %end;

%let abort=0;
%if %length(&count)=0 | %length(&freq)=0
   %then %do;
      %put ERROR: The COUNT= and FREQ= variables must be specified;
      %let abort=1;
      %goto DONE;
   %end;

%*if &label=%str() %then %let label=&count;
%let plot=%upcase(&plot);
%if %length(&z)=0 %then %let z=0;

proc means data=&data N sum sumwgt mean var noprint;
   var &count;
   weight &freq;
   output out=sum sumwgt=N sum=sum mean=mean;

data &out;
   set &data;
   if _n_=1 then set sum(drop=_type_ _freq_);
        drop kf k;
   k = &count;
   nk= &freq;                     * n(k);
   kf= gamma(k+1);                * k!  ;
 
   *-- if levelling the plot, subtract centering value;
   %if &lambda = 0
       %then %str( level =  0; );
       %else %str( level = &lambda - k * log( &lambda ); );
 
   y =  log(kf * nk / N) + level ;    * poisson metameter;
 
   *-- centered value n*(k), Hoaglin & Tukey, Eqn 9 ;
   p = nk / N;
   if nk >=2 then nkc = nk - .67 - .8*p;
             else nkc = exp(-1);
*       if nk > 0 then;
           yc=  log(kf * nkc/ N) + level ;
 
   *-- half-length of confidence interval for log(eta-k) [Eqn 10];
        if nk<=1 then do;
                ylo = log(kf*nkc/N) - 2.677 + level;
                yhi = log(kf*nkc/N) + 2.717 - 2.3/N + level;
                h = (yhi-ylo)/2;
                end;
        else do;
                h = &z * sqrt( (1-p) / ( nk-((.47+.25*p)*sqrt(nk)) ) );
                ylo = yc - h;
                yhi = yc + h;
                end;

        *-- Estimated prob and expected frequency;
   phat = exp(-mean) * mean**&count / kf;
   exp  = N * phat;

        *-- Leverage and apparent parameter values (p.402);
   %if &lambda = 0
                %then %str(lev = (&count / mean) - 1;);
                %else %str(lev = (&count / &lambda) - 1;);
        hc = sign(lev) * lev / h;
        vc = sign(lev) * (log(nkc) - log(exp)) / h;
        slope = vc / hc;   *-- (lambda - lambda0);
        label y = 'Count metameter'
                hc = 'Scaled Leverage'
                vc = 'Relative metameter change'
                slope = 'Parameter change';

proc print data=&out;
   id &count;
   var nk y nkc yc h ylo yhi lev hc vc;
   sum nk nkc;
   format y yc h yhi ylo 6.3 lev hc vc 6.2;
 
*-- Calculate goodness of fit chisquare;
data fit;
   set &out;
   chisq= (nk - exp)**2 / exp;
proc print data=fit;
   id &count;
   var nk p phat exp chisq;
   sum nk exp chisq;

*-- Find slope, intercept of line;
proc reg data=&out outest=parms noprint;
   model y =  &count;
data stats;        *-- Annotate data set to label the plot;
   set parms (keep=&count intercep);
   set sum   (keep=mean);
   drop &count intercep;
   length text $30 function $8;
   xsys='1'; ysys='1';
   x=15;
        *-- set label y location based on slope;
        if &count > 0
                then y=96;
                else y=16; 
   function = 'LABEL';
   size = &htext;
   color = 'RED';
   position='3'; text ='slope =   '||put(&count,f6.3);      output;
   position='6'; text ='intercept='||put(intercep,f6.3); output;
   ek = exp(&count - &lambda);
   y=y-6;;
   position='6'; text ='lambda:  mean = '||put(mean,5.3); output;
   position='9'; text ='       exp(slope) = '||put(ek,5.3); output;

%let order=;
%if &z > 0 %then %do;
data conf;
   set &out;
   drop yc;
   xsys='2'; ysys='2';
   x = &count;    line=33;
   y = yc;   function='MOVE  '; output;
   text='+'; function='SYMBOL'; output;
   y = yhi;  function='DRAW  '; output;
   y = yc;   function='MOVE  '; output;
   y = ylo;  function='DRAW  '; output;
data stats;
   set stats conf;

*-- find range of confidence limits to set y axis extrema;
proc means data=conf noprint;
   var y;
   output out=range min=min max=max;
data _null_;
   set range;
        inc = 1;
        if (max-min)>10 then inc=2;
   min = inc * floor(min/inc);
   max = inc * ceil(max/inc);
   call symput('MIN', left(put(min,2.)));
   call symput('MAX', left(put(max,2.)));
   call symput('INC', left(put(inc,2.)));
run;
%let order = order=(&min to &max by &inc);
%end; /* %if &z */

*-- Poissonness plot;
proc gplot data=&out;
   plot y * &count / anno=stats vaxis=axis1 haxis=axis2
                name="&name"
                des="Poissonness plot of &count";
   symbol v=- h=2 i=rl c=black;
   axis1 &order
         label=(a=90 r=0 h=1.4
          'Poisson metameter, ln(k!  n(k) / N)')
         value=(h=&htext);
   axis2 offset=(3) minor=none
        %if %length(&label) %then %do;
         label=("k  (&label)")
                        %end;
         value=(h=&htext);
        run; quit;

*-- Indicated parameter change (infl) plot;
        %if %index(&plot,INFL) %then %do;
        %label(data=&out, y=vc,    x=hc, text=&count, out=anno1, size=);
        *-- Draw lines from origin to each point;
        data lines;
                set &out(keep=hc vc);
                xsys='2'; ysys ='2';
                x=0;  y=0;  function='move    '; output;
                x=hc; y=vc; function='draw    '; output;
        data anno1;
                set anno1 lines;

        %label(data=&out, y=slope, x=hc, text=left(put(&count,2.)),
                out=anno2, size=);
        %gskip;
        symbol v=- h=2 color=black i=none;
        axis1 label=(a=90);

        proc gplot data=&out;
                plot vc * hc  /
                        hzero vref=0 lvref=33 anno=anno1 vaxis=axis1 hminor=1 
vminor=1
                        name="&name"
                        des="Parameter change plot of &count";
                run;
        %gskip;
        proc gplot data=&out;
                bubble slope * hc =hc / 
                        vref=0 lvref=33 anno=anno2 vaxis=axis1  hminor=1 
vminor=1
                        bsize=40 bcolor=red bscale=radius
                        name="&name"
                        des="Parameter change plot of &count";

        run; quit;
%end;
%done:
        %*-- Restore global options;
        %if &sysver >= 7 %then %do;
                options &o1 &o2;
                %end;
        %else %do;
           options notes;
                %end;
%mend;

/*-------------------------------------------------------------------*
  *    Name: sort.sas                                                 *
  *   Title: Generalized dataset sorting by format or statistic       *
  *     Doc: http://www.math.yorku.ca/SCS/vcd/sort.html               *
  *-------------------------------------------------------------------*
  *  Author:  Michael Friendly            <friendly@yorku.ca>         *
  * Created: 04 Nov 98 17:17                                          *
  * Revised: 19 Nov 1998 12:26:55                                     *
  * Version: 1.1                                                      *
  *                                                                   *
  * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
  *-------------------------------------------------------------------
*/
 /* Description:
 
 The SORT macro generalizes the idea of sorting the observations in
 a dataset to include:

 - sorting according to the values of a user-specified format.  
 With appropriate user-defined formats, this may be used to arrange the
 observations in a dataset in any desired order.

 - reordering according to the values of a summary statistic computed 
        on the values in each of serveral groups, for example, the mean 
or 
        median of an analysis variable.  Any statistic computed by 
        PROC UNIVARIATE may be used.

Usage:

 You must specify one or more BY= variables.  To sort by the value
 of a statistic, specify name the statistic with the BYSTAT= parameter,
 and specify the analysis variable with VAR=.  To sort by formatted
 values, specify the variable names and associated formats with
 BYFMT=.  
 
 If neither the BYSTAT= or BYFMT= parameters are specified,
 an ordinary sort is performed.

 The sort macro is called with keyword parameters.  The arguments
 may be listed within parentheses in any order, separated by commas.
 For example:
 
        %sort(by=age sex, bystat=mean, var=income);

(sorting observations by mean INCOME, for AGE, SEX groups) or

        proc format;
                value age   0='Child' 1='Adult';
        %sort(by=age decending sex,  byfmt=age:age.);

(sorting by the formatted values of AGE).
 
Parameters:

* DATA=         Name of the input dataset to be sorted.  The default 
is the
            most recently created data set.

* VAR=          Specifies the name of the analysis variable used for 
BYSTAT
            sorting.

* OUT=          Name of the output dataset.  If not specified, the 
            output dataset replaces the input dataset.

* BY=       Names of one or more classification (factor, grouping)
            variables to be used in sorting.  The BY= argument may
                                contain the keyword DESCENDING before a 
variable name
                                for ordinary or formatted-value sorting.  For 
BYSTAT
                                sorting, use ORDER=DESCENDING.  The BY= 
variables may
                                be character or numeric.

* BYFMT=    A list of one or more terms, of the form, VAR:FMT or
            VAR=FMT, where VAR is one of the BY= variables, and FMT is
            a SAS format.  Do not specify BYSTAT= when sorting by
                                formatted values.

* VAR=      Name of the analysis variable to be used in determining
            the sorted order.

* BYSTAT=   Name of the statistic, calculated for the VAR= variable
            for each level of the BY= variables.  BYSTAT may be the
                                name of any statistic computed by PROC 
UNIVARIATE.

* FREQ=     For BYSTAT sorting, specify the name of a frequency 
variable
            if the input data consists of grouped frequency counts.

* ORDER=    Specify ORDER=DESCENDING to sort in descending order
            when sorting by a BYSTAT.  The ORDER= parameter applies
                                to all BY= variables in this case.

Example:
 Given a frequency table of Faculty by Income, sort the faculties
 so they are arranged by mean income:
        
        %sort(data=salary, by=Faculty, bystat=mean, var=income, 
freq=count);

*/
 
%macro sort(
        data=_last_,  /* data set to be sorted           */
        out=&data,    /* output dataset                  */
        by=,          /* name of BY/CLASS variable(s)    */
        byfmt=,       /* variable:format list            */
        var=,         /* name of analysis variable       */
        freq=,        /* frequency variable, for bystat  */        
        bystat=,      /* statistic to sort by            */
        order=        /* DESCENDING for decreasing order */
        );

%if %upcase(&data) = _LAST_ %then %let data = &syslast;

%local abort;
%let abort=0;
%if %length(&by)=0 %then %do;
        %put ERROR: At least one BY= variable must be specified for 
sorting;
        $let abort=1;
        %goto done;
%end;


%*-- If there is no bystat, check byfmt;
%if %length(&bystat)=0 %then %do;
        %if %length(&byfmt)>0 %then %do;
                %sortfmt(data=&data, by=&by, byfmt=&byfmt, out=&out);
                %end;
        %else %do;      
        %*-- There is no bystat, and no byfmt: just an ordinary sort 
step;
        proc sort data=&data out=&out;
                by &order &by;
                run;
                %end;
%end;

%else %do;    /* BYSTAT sorting */
        %if %length(&var)=0 %then %do;
                %put ERROR: Exaxtly one VAR= variable must be specified for 
sorting by &bystat;
                $let abort=1;
                %goto done;     
        %end;
        %if %length(%scan(&var,2)) > 0 %then %do;
                %let ovar=&var;
                %let var = %scan(&var,1);
                %put WARNING: VAR=&ovar was specified.  Using VAR=&var;
        %end;

        %*-- Reorder according to &by variables in reverse order;
        %let rby = %reverse(&by);
        %*put Reversed by list: &rby;

   %let count=1;
   %let word = %unquote(%qscan(&rby,&count,%str( )));
        %*-- Do the first one;
        %reorder(data=&data, out=&out, var=&var, class=&word, 
bystat=&bystat,
                        order=&order, freq=&freq);
   %*-- Do the rest;
   %do %while(&word^= );
       %let count = %eval(&count+1);
       %let word = %unquote(%qscan(&rby,&count,%str( )));
                 %if &word ^= %then %do;
                        %reorder(data=&out, out=&out, var=&var, class=&word, 
bystat=&bystat,
                        order=&order, freq=&freq);
                        %end;
   %end;
 
%end;

%done:
%if &abort %then %put ERROR: The SORT macro ended abnormally.;
%mend;


 /*------------------------------------------------------------*
  *    Name: reorder                                           *
  *   Title: Sort a dataset by the value of a statistic        *
  *------------------------------------------------------------*/

%macro reorder(
        data=_last_,   /* name of input dataset                */
        out=&data,     /* name of output dataset               */
        var=,          /* name of analysis variable            */
        freq=,         /* frequency variable, for bystat  */        
        class=,        /* name of class variable for this sort */
        bystat=,       /* statistic to sort by                 */
        order=,        /* DESCENDING for decreasing order      */
        outvar=,       /* Name of output statistic variable    */
        prefix=_
        );
        
%if %upcase(&data) = _LAST_ %then %let data = &syslast;
*put REORDER: class=&class;

proc sort data=&data;
   by &class ;
        run;
        
%if %length(&outvar)=0
   %then %let outvar =  &&prefix.&class;

proc univariate noprint data=&data;
   by &class ;
   var &var;
        %if %length(&freq) %then %str(freq &freq;) ;
   output out=_stat_ &bystat = &outvar;
   run;
*proc print;


%*-- Merge the statistics with the input dataset;
data &out;
   merge &data _stat_(keep=&class  &outvar);
        by &class;

%*-- Sort them by the statistic;
proc sort data=&out;
        by &order  &outvar;
        run;
%mend;


 /*------------------------------------------------------------*
  *    Name: reverse                                           *
  *   Title: Reverse the words in a string                     *
  *------------------------------------------------------------*/

%macro reverse(string);
   %local count word result;
   %let count=1;
   %let word = %qscan(&string,&count,%str( ));
        %let result=&word;
   %do %while(&word^= );
       %let count = %eval(&count+1);
       %let word = %qscan(&string,&count,%str( ));
                 %let result = &word &result;
   %end;
   %unquote(&result)
%mend;

 /*------------------------------------------------------------*
  *    Name: sorftmt                                           *
  *   Title: Sort variables by formatted values                *
  *------------------------------------------------------------*/

%macro sortfmt(data=, by=, byfmt=, out=&data);

%let tempvar=;
%if %length(&byfmt)>0 %then %do;
        data _temp_;
                set &data;
                %let i=1;
                        %*-- terms are separated by spaces.  Each is var:fmt 
or var=fmt;
                %let term = %scan(&byfmt,1,%str( ));
                %do %while(&term^= );
        %let i = %eval(&i+1);
                        %let var = %scan(&term,1, %str(=:));
                        %let fmt = %scan(&term,2, %str(=:));
                        %if %index(&fmt,%str(.))=0 %then %let fmt=&fmt..;

                                %*-- Create surrogate variable;
                        _&var = put( &var, &fmt );

                        %let tempvar = &tempvar _&var;
                                %*-- Replace the by variable with its 
surrogate;
                        %let by = %replace(&by, &var, _&var);
                        %put var=&var fmt=&fmt by=&by;
        %let term = %scan(&byfmt,&i,%str( ));
                %end;
%*      proc print data=_temp_;
        
        proc sort data=_temp_ out=&out
                %if %length(&tempvar)>0 %then (drop=&tempvar);;
                by &by;
        %end;
%mend;

%*-- Replace a substring with a new string;
%macro replace(string, old, new);
%local i;
%let i=%index( %upcase(&string), %upcase(&old) );
%if &i=0
        %then %let result = &string;
        %else %do;
                %let len = %length(&old);
                %let pre=;
                %if &i>1 %then %let pre=%substr( &string, 1, %eval(&i-1));
                %let result = &pre.&new.%substr( &string,%eval(&i+&len));
                %end;
        &result
%mend;


/*-------------------------------------------------------------------*
 *    Name:  agree.sas                                               *  
 *   Title:  Agreement chart for n x n table                         *
 *                                                                   *
 *-------------------------------------------------------------------*
 *  Author:  Michael Friendly            <friendly@yorku.ca>         *
 * Created:  13 Mar 1991 10:23:12                                    *
 * Revised:  12 Jan 1998 09:15:33                                    *
 * Version:  1.1                                                     *
 *                                                                   *
 * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
 *-------------------------------------------------------------------*/
 /* Description:

The AGREE program is a collection of SAS/IML modules for preparing
observer agreement charts which portray the agreement between two
raters.

Usage:

The modules are typically loaded into the SAS/IML workspace with
the %include statement.  The required input parameters are specified
with IML statements, and the AGREE module is called as follows,
 
   proc iml;
      %include 'path/to/agree.sas';
      *-- set global variables, if desired;
      font = 'hwpsl009';
      htext=1.3;

      *-- data, labels, title, weights;
      freq =
         {  5     3     0      0,
            3    11     4      0,
            2    13     3      4,
            1     2     4     14 };
      vnames = {'New Orleans Neurologist' 'Winnipeg Neurologist'};
      lnames = {'Certain' 'Probable' 'Possible' 'Doubtful'};
      title  = 'Multiple Sclerosis: New Orleans patients';
      w = 1;         *-- diagonals only   , or ...;
      w = {1 (8/9)}; *-- diagonals + 1-off;
      run agree(freq, w, vnames, lnames, title);

Parameters:

The required parameters for the RUN AGREE statement are:

* table     A square numeric matrix containing the contingency table
            to be analyzed.

* weight    A vector of one or more weights used to give ``partial
            credit'' for disagreements by one or more categories.
            To ignore all but exact agreements, let weight=1.  
            To take into account agreements one-step apart (with a 
            weight of 5/6), let weight={1 5/6}.

* vnames    A character vector of two elements, containing the names
            of the row and column variables.

* lnames    A character vector containing the names of the row and
            column categories.  If table is $n \times n$, then
            lnames should contain $n$ elements.

* title    A character string containing the title for the plot.

==Global input variables:

The program uses two global variables to determine the font and
character height for text in the agreement chart.

* font     A character string specifying the font used.
           The default is Helvetica ('hwpsl009') if a PostScript
           driver is being used, SWISS otherwise.

* htext    A numeric value specifying the height of text characters.

 */
 *-- Bangdiwala Observer Agreement Chart [SAS SUGI, 1987, 1083-1088];
 
start agree(freq, w, vnames, lnames, title )
   global (font, htext);

   if type(font ) ^= 'C' then do;
      call execute('device  = upcase("&sysdevic");');
      if index(device,'PS') > 0 then
         font= 'hwpsl009';         /* Helvetica for PS drivers */
         else font = 'SWISS';
      end;
   if type(htext ) ^= 'N' then htext=1.2;

   row_sum = freq[,+];
   col_sum = freq[+,];
   n       = freq[+,+];
   k       = nrow(freq);

   reset noname;
   print (( freq || row_sum ) //
          ( col_sum || n ) )[r=(lnames ||'Total') c=(lnames 
||'Total')];
   obs_agr = ssq( vecdiag(freq) );
   tot_agr = col_sum * row_sum ;

   call gstart;
   call gwindow( (-.15#J(1,2,n)) // (1.1#J(1,2,n)) );
   call gset('FONT', font);
   height = htext;
   call gstrlen(len,lnames,height);

   corner= { 0 0 };
   fill  = 'EMPTY';
   *-- construct marginal rectangles and locate row/col labels --;
   do s = 1 to k;
      thisbox = corner || row_sum[s] || col_sum[s];
      boxes   = boxes  // thisbox;
      fill    = fill   // 'EMPTY';
      center  = corner +((row_sum[s] || col_sum[s])
                       - (len[s]     || len[s]    ))/2;
      labelx  = labelx //(center[1] || (-.06#n)  || 0 )
                       //((-.04#n)  || center[2] || 90);
      labels  = labels // lnames[s]
                       // lnames[s];
      corner  = corner + (row_sum[s] || col_sum[s]);
      ht      = ht     // height // height;
      end;

   *-- variable names;
   height = 1.4#htext;
   call gstrlen(len,vnames,height);
   center = ((1.0#n) - len)/2;
   labelx  = labelx // ( center[1] || (-.12#n)  || 0 )
                    // ( (-.10#n)  || center[2] || 90);
   labels  = labels // vnames[1]
                    // vnames[2];
   ht      = ht     // height // height;

   *-- surrounding frame, for all observations;
   boxes = boxes // ( { 0 0 } || n || n ) ;

   corner= { 0 0 };
   *-- construct agreement squares and scores;
   q = ncol(w) - 1;
   a = J(q+1,k,0);
   *-- b indexes distance from main diagonal for agreement;
   do b = 0 to q;
   do s = 1 to k;
      agr = max(1, s-b) : min(k, s+b) ;    * cells which agree;
      dis = 1 : max(1, s-b-1) ;            * disagre;
      box_loc = choose( (s-b-1)>0,
                        (sum(freq[s,dis]) || sum(freq[dis,s])),
                        { 0 0 } );
 /*   box_size= choose( (s-b) > 0,
                        (sum(freq[s,agr] ... */
      if s=1 then corner = {0 0};
             else corner = boxes[s,1:2] + box_loc;
      thisbox = corner || sum(freq[s,agr]) || sum(freq[agr,s]);
      boxes   = boxes  // thisbox;
      if b>0 then a[b+1,s] =thisbox[3] # thisbox[4];
 
      if b=0 then fill    = fill   // 'SOLID';
      else do;
         if mod(b,2)=1 then dir='L';
                       else dir='R';
         dens = int((b+1) / 2);
         fill    = fill   // (dir + char(dens,1));
         end;
      end;
   end;
   print 'Bangdiwala agreement scores';
   part = diag(w) * A;
   weights = shape(w,0,1);
   steps = 0:q;
   BN =  1 - ( ( tot_agr - obs_agr - part[,+] ) / tot_agr );
   reset name;
   print steps weights[f=8.5] BN[f=8.4];
*  print boxes[c={'BotX' 'BotY' 'LenX' 'LenY'}] fill;
*  print labels labelx[c={X Y ANGLE}] ht;
 
   run gboxes( boxes, labels, labelx, fill, ht, title );
   call gstop;
finish;
 
*-- Draw and label the agreement display --;
start gboxes( boxes, labels, labelx, fill, ht, title )
   global ( htext );
   call gopen('AGREEMT');
   *-- locate the 4 corners of each box;
   ll = boxes[,{1 2}];
   lr = boxes[,{1 3}][,+] || boxes[,2] ;
   ul = boxes[,1] || boxes[,{2 4}][,+] ;
   ur = boxes[,{1 3}][,+] || boxes[,{2 4}][,+];

   xy = ll || ul || ur || lr;
   max = max(ur[,1]) || max(ur[,2]);
   do i=1 to nrow(boxes);
      box = shape(xy[i,], 4);
      color='BLACK';
      pat = fill[i];
      call gpoly( box[,1], box[,2], 1, color, pat, color);
   end;
 
   *-- Draw dotted diagonal line to show marginal homogeneity--;
   call gdrawl( {0 0}, max, 3 , 'RED' );
 
   do f=1 to nrow(labels);
      lxya = labelx[f,];
      labl = labels[f ];
      height = ht[f];
      call gscript( lxya[,1], lxya[,2], labl, lxya[,3], 0, height);
   end;
   height = 1.2#htext;
   call gstrlen(len, title, height);
   tx = (max[1] - len)/2;
   call gscript(tx, max[2]#1.05, title, 0, 0, height);
   call gshow;
finish;

/*-------------------------------------------------------------------*
 |    Name:  sieve.sas                                               |  
 |   Title:  Sieve diagrams for two-way tables                       |
 |                                                                   |
 | Ref: Reidwyl & Schuepbach (1983).                                 |
 |                                                                   |
 *-------------------------------------------------------------------*
 *  Author:  Michael Friendly            <friendly@yorku.ca>         *
 * Created: 14 Apr 1991 14:12:14              (c) 1991-1998          *
 * Revised: 17 Nov 2000 12:21:07                                     *
 * Version:  1.4                                                     *
 *  1.2  Add colors global variable                                  *
 *  1.3  Added filltype='OBSP' to print cell freq in cell            *
 *  1.4  Made colors consistent with mosaics                         *
 *       Default font now depends on device driver                   *
 *                                                                   *
 * From ``Visualizing Categorical Data'', Michael Friendly (2000)    *         
 *-------------------------------------------------------------------*/

 /* Description:

The SIEVE program is a collection of SAS/IML modules for
drawing sieve (or parquet) diagrams for a two-way contingency table.

Usage:

   proc iml;
   %include iml(sieve);
   run sieve( f, vnames, lnames, title );

Parameters:

The required parameters for the RUN SIEVE statement are:

* f        two-way contingency table, size r x c

* vnames   variable names, a 1x2 character matrix.
           vnames[,1]=row variable, vnames[,2]=column variable

* lnames   category names, a 2 x max(r,c) character matrix.
           lnames[1,]=row categories, vnames[2,]=column categories.

* title    plot title

Global variables:

* filltype  'OBS': fill cells in proportion to observed frequencies
            'OBSP': like OBS, but also write obs. freq. in cell
            'DEV': fill in proportion to deviations from independence.
            'EXL': no fill, write expected frequency in cell
            'EXP': expfill, write expected frequency in cell

* margins   ''   : nothing in margins
            'TOTALS': row/col totals in margins

* font      font for text

* colors    names of two colors to use for the positive and
            negative residuals.  [Default: {BLUE RED}]
*/
 
start sieve (f, vnames, lnames, title )
      global(filltype, margins, colors, gout, name, font);
   if type(filltype) ^= 'C' then filltype='OBS';
   if type(margins ) ^= 'C' then margins ='';
   if type(colors)   ^='C'  then colors= {BLUE RED};
   if type(gout    ) ^= 'C' then gout='WORK.GSEG';
   if type(name    ) ^= 'C' then name='SEIVE';
   if type(lnames  )  = 'N' then lnames = trim(left(char(lnames)));

        *-- Set default font based on device driver name;
   if type(font    ) ^= 'C' then do;
                call execute('device  = upcase("&sysdevic");');
                if index(device,'PS') > 0 then
         font= 'hwpsl009';         /* Helvetica for PS drivers */
                else /* if device='WIN' then */
                        font = 'SWISS';
                end;

   filltype = upcase(filltype);
   margins = upcase(margins);
   verbose=0;
 
   nr= nrow(f);
   nc= ncol(f);
   nrc=nr#nc;
   r = f[,+];              * row totals;
   c = f[+,];              * col totals;
   n = c[+];               * grand total;
 
   e = r * c / n;          * expected frequencies;
   d = (f - e) / sqrt(e);  * standard deviates;
   print "Observed Frequencies",
         f [r=(lnames[1,]) c=(lnames[2,]) format=7.0];
   print "Expected Frequencies",
         e [r=(lnames[1,]) c=(lnames[2,]) format=7.1];
   print "Standardized Pearson Deviates",
         d [r=(lnames[1,]) c=(lnames[2,]) format=7.2];
 
   chisq = chisq(f,e);
   df = (nr-1)#(nc-1);
   prob = 1 - probchi(chisq,df);
   print "Test of Independence",
         df ' ' chisq[r={'G.F.' 'L.R.'} format=9.3]
         prob [format=8.4];
 
   rl= (100-nr+1) # r / n;
   cl= (100-nc+1) # c / n;
   cy = 101;
   do i = 1 to nr;
      cx = 0;
      cy = cy - rl[i,] - 1;
      do j = 1 to nc;
         if j= 1 then cx = 0;
         else cx = sum(cl[,1:(j-1)]) + j-1;
         boxes = boxes // ( cx || cy || cl[j] || rl[i] );
      end;
   end;
*  print boxes[c={'BotX' 'BotY' 'LenX' 'LenY'} format=8.3];
 
   call gstart(gout);
   call gopen(name,1);
   call gwindow({ -20 -20 110 110});
 
   if margins = 'TOTALS'          /* y-values for col labels */
      then lcl = {-11 -17 -6};    /* lnames, vnames, totals  */
      else lcl = {-6  -13};
   *-- category names;
   height = 1.1;
   call gset('FONT',font);
   lrx = 15;
   run gheight(lnames,font,lrx-1,1.75, len,height);
*  print lnames len[format=8.2] ;
 
      *-- row labels;
   lry = boxes[nc#(1:nr),2] + rl/2;
   lrx = repeat((-lrx), nr, 1);
   labelx = labelx // ( lrx || lry || repeat((0||height),nr,1) );
   labels = labels // ( shape(lnames[1,], nr,1) );
 
      *-- col labels;
   lcx = boxes[1:nc,1] + (cl/2)`;
   lcy = repeat(lcl[1], nc, 1);
   labelx = labelx // ( (lcx-(len[2,1:nc]`)/2) || lcy
                   || repeat((0||height),nc,1) );
   labels = labels // ( shape(lnames[2,], nc,1) );
 
   *-- variable names;
   ht = 1.1 # height;
   call gstrlen(len,vnames,ht,font);
   center = ( 100 - len)/2;
   labelx  = labelx // (  -16.5    || center[1] || 90 || ht)
                    // ( center[2] || lcl[2]    ||  0 || ht);
   labels  = labels // vnames[1]
                    // vnames[2];
   *-- title;
   if length(title)>1 then do;
      ht = 1.8;
      run gheight(title,font,100,2.2,len,ht);
      center = ( 100 - len)/2;
      labelx = labelx // ( center || 105 ||  0 || ht );
      labels = labels // title;
   end;
 
   *-- cell frequencies;
   if filltype='OBS' then do;
      f1 = loc(f = 1);
      if ncol(f1)>0  then value = repeat({"1"}, ncol(f1), 1);
   end;
   if filltype='EXL' | filltype='EXP' then do;
      f1 = 1:nrc;
      value = compress(char(shape(e, nrc, 1),5,1));
   end;
   if filltype = 'OBSP' then do;
      f1 = 1:nrc;
      value = compress(char(shape(f, nrc, 1),5,0));
   end;

   if /*filltype^='EXP' & */ ncol(f1)>0 then do;
      ht = height - .05;
      center = boxes[f1,] * ( I(2) // .5#I(2) );
      call gstrlen(len, value,ht,font);
      center[,1] = center[,1] - len/2;
      labelx = labelx // ( center || repeat((0||ht), ncol(f1), 1) );
      labels = labels // value;
   end;
   if margins = 'TOTALS' then do;
      ht = height - .05;
      center = repeat({101},nr,1) || lry;
      labelx = labelx // ( center || repeat((0||ht),nr,1) );
      labels = labels // char(shape(r, nr, 1),4,0);
 
      value  = char( shape( c||n , nc+1, 1), 4,0 );
      call gstrlen(len, value,ht,font);
      lcx = lcx // {101};  len[nc+1,]=0;
      center = (lcx -len/2)  || repeat(lcl[3],nc+1,1);
      labelx = labelx // ( center || repeat((0||ht),nc+1,1) );
      labels = labels // value;
   end;
 
   reset fw=7;
   labelx = round(labelx,.001);
*  print labelx[c={'X' 'Y' 'Angle' 'Ht'}] labels[format=$25.];
   d    = shape(d, nrc,1);
   if filltype = 'EXL'    then fill = shape({0}, nrc, 1);
   if filltype = 'DEV'    then fill = shape( d , nrc, 1);
   if filltype = 'OBS'    then fill = shape( f , nrc, 1);
   if filltype = 'OBSP'   then fill = shape( f , nrc, 1);
   if filltype = 'EXP' then do;
      d = j(nrc,1,0);          fill = shape( e , nrc, 1);
      end;
   run gboxes(boxes, labels, labelx, fill, d );
   call gshow;
   finish;
 
start gheight(label, font, maxlen, maxht, len, height);
   *-- determine height<maxht, to not exceed maxlen in length;
   *-- returns len and height;
   call gstrlen(len,label,height,font);
   max = max(len);
   if max > maxlen | max < .80#maxlen then do;
      height = min(maxht, height # .1 # int(10 # maxlen / max));
      call gstrlen(len,label,height,font);
      end;
   finish;
 
start gboxes( boxes, labels, labelx, fill,dev);
   call gopen('sieve');
   *-- locate the 4 corners of each box;
   ll = boxes[,{1 2}];
   lr = boxes[,{1 3}][,+] || boxes[,2] ;
   ul = boxes[,1] || boxes[,{2 4}][,+] ;
   ur = boxes[,{1 3}][,+] || boxes[,{2 4}][,+];
 
   xy = ll || ul || ur || lr;
   max = max(ur[,1]) || max(ur[,2]);
   do i=1 to nrow(boxes);
      box = shape(xy[i,], 4);
      color='BLACK';
      pat = 'EMPTY';
      call gpoly( box[,1], box[,2], 1, color, pat, color);
   end;
   run fillbox(boxes,fill,dev);
   height = 1;
   do f=1 to nrow(labels);
      lxya = labelx[f,];
      ht   = lxya[,4];
      labl = labels[f ];
      call gscript( lxya[,1], lxya[,2], labl, lxya[,3], 0, ht);
   end;
   finish;
 
start fillbox(boxes, fill, sign)
      global(filltype, colors);
   *-- fill each box proportional to abs(fill);
   w = boxes[,3];
   h = boxes[,4];
   totarea = sum(h#w);
   totfill = sum(abs(fill));
   scale = 1;
   if totfill>.1 & filltype='DEV' /* ^all( fill = int(fill) ) */
      then scale = sqrt(totarea / totfill);
*  print totarea totfill scale;
 
   if filltype='DEV'         /* standardized deviations */
      then do;
      s = sqrt(100 / (abs(fill)) ) ;
      s = s + 100#(abs(fill)<2) ;
      end;
      else
      s = sqrt(h # w / (abs(fill)#scale) );    *space between lines;
   nxl = int(w/s);
   nyl = int(h/s);
*  print 'Width, Height, Square', w h s fill nxl nyl;
   do i = 1 to nrow(boxes);
      if sign[i] >= 0
      then do; line = 1; color = colors[1];   /* 'BLUE' */
      end;
      else do; line = 3; color = colors[2];   /* 'RED ' */
      end;
      if filltype='EXP' then line=34;
      if nxl[i]>0 then do;
         from  = ((boxes[i,1] + (1:nxl[i]) # s[i]))`||
                 shape(boxes[i,2], nxl[i], 1);
         to    = from[,1] || shape(sum(boxes[i,{2 4}]), nxl[i], 1);
         call gdrawl( from, to, line, color );
      end;
      if nyl[i]>0 then do;
         from  = shape(boxes[i,1], nyl[i], 1) ||
                 ((boxes[i,2] + (1:nyl[i]) # s[i]))` ;
         to    = shape(sum(boxes[i,{1 3}]), nyl[i], 1) || from[,2] ;
         call gdrawl( from, to, line, color );
      end;
   end;
   finish;
 
start chisq(obs, fit);
   *-- Find Pearson and likelihood ratio chisquares;
   gf = sum ( (obs - fit)##2 / ( fit + (fit=0) ) );
   lr = 2 # sum ( obs # log ( (obs+(obs=0)) / (fit + (fit=0)) ) );
   return (gf // lr);
   finish;


/*----------------------------------------------------------------*
 |    Name:  fourfold.sas                                         |  
 |   Title:  IML modules for fourfold display of 2x2xK tables     |
 *----------------------------------------------------------------*
 *  Author:  Michael Friendly         <friendly@YorkU.CA>         *
 * Citation: SAS/IML graphics for fourfold displays, Observations,*
 *           3rd Quarter 1994, 47-56.                             *
 * Created:  9 May 1991 19:12:12      Copyright (c) 1992-2000     *
 * Revised:  21 Apr 2000 13:16:20                                 *
 * Version:  2.0                                                  *
 * Changes:  Added tests for homogeneity and cond. independence   *
 *  1.6  Added options for colors, patterns, font                 *
 *  1.7  Added confidence rings for odds ratios                   *
 *  1.8  Added option for order of tables (down vs. across)       *
 *  1.9  Fixed problems with zero subtables and zero odds ratios  *
 *       Added std='TOTAL' ??  Added htext=                       *
 *  2.0  Added fillpat routine to select color based on Z val     *
 *       Switched colors/patterns to conform with mosaics         *
 *       Handles 3 or more factors                                *
 *       Default font now depends on device driver                *
 *       Added outstat data set containing OR, LOGOR, etc         *
 *                                                                *
 * From ``Visualizing Categorical Data'', Michael Friendly (2000) *         
 *----------------------------------------------------------------*/
/* Usage:
   proc iml;
   %include fourfold;
   run fourfold( dim, table, vnames, lnames );

Required parameters:
   dim      table dimensions: {2 2 k}
   table    two- or three-way contingency table,
            of size (2k) x 2
   vnames   variable names, 1x3 character matrix.
            vnames[,1]=column variable
            vnames[,2]=row variable
            vnames[,3]=panel variable
   lnames   category names, 3 x k character matrix.
            vnames[1,]=col categories,
            lnames[2,]=row categories,
            lnames[3,]=panel categories.

Global input variables:
   std      ='MARG' standardizes each 2x2 table to equal
            margins, keeping the odds ratio fixed (see config)
            ='MAX' standardizes each table to a maximum
            cell frequency of 100.
            ='MAXALL' standardizes all tables to max(f[i,j,k])=100.
                                ='TOTAL' standardizes to the maximum total over 
all tables.
   config   ={1 2}|{1}|{2} specifies the margins to standardize
   down     number of panels down each page
   across   number of panels across each page
   sangle   angle for side labels (0|90)
   colors   names of two colors to use for the smaller and
            larger diagonals of each 2x2 table.
   patterns names of two fill patterns.  For grayscale, use
            {SOLID SOLID} and colors={GRAYC0 GRAY80}.
        shade    shading levels, corresponding to values of the Z-value
                 for the log odds ration.
   alpha    error rate for confidence rings on odds ratios;
            0 to suppress
   conf     type of confidence rings ='Individual'|'Joint'
   font     font used for labels.
        htext    base height of text labels
   frame    line style for boxed frame (0=none)
        name     name of graphics catalog entry
        order    DOWN|ACROSS - how to arrange multiple plots on a page
        ptitle   panel title, for multiple plots.  The name(s) of the 
panel
                 variable(s) are substituted for '&V'; the levels of the
                                panel variables are substituted for '&L'. 
Default: '&V : &L'.
        outstat  name of output data set containing odds ratio, log odds 
ratio,
                 etc.
*/

goptions gunit=pct;
start fourfold(dim, table, vnames, lnames)
  global (std, config, down, across, name, sangle, colors, patterns,
          alpha, conf, font, order, odds, bounds, verbose, htext, 
frame,
                         filltype, shade, ptitle, outstat );

   if nrow(vnames)=1 then vnames=vnames`;
   if nrow(dim)=1 then dim=dim`;
   print  vnames dim '  ' lnames;
 
   *-- Check conformability of arguments --;
   f = nrow(vnames)||nrow(lnames);
   if dim[#] ^= nrow(table)#ncol(table)
      then do;
         print 'ERROR: TABLE and LEVEL arguments not conformable';
                        show dim table;
         goto done;
      end;
   if ^all(f = nrow(dim))
      then do;
         print 'ERROR: VNAMES or LNAMES not conformable with dim';
                        show dim vnames lnames;
         goto done;
      end;
 
  /*-- Set global defaults --*/
  if type(std)     ^='C' then std='MARG';
  if type(config)  ^='N' then config={1 2};
  if type(name)    ^='C' then name='FFOLD';
  if type(sangle)  ^='N' then sangle=90;
  if type(colors)  ^='C' then colors= {BLUE RED};
  if type(patterns)^='C' then patterns={solid solid};
  if type(filltype) ^= 'C' then filltype = {HLS HLS};
  if type(shade) ^= 'N'
      then shade = {2 4};            /* shading levels            */
  if type(alpha)   ^='N' then alpha=.05;
  if type(conf)    ^='C' then conf='Individual';
  if type(htext)   ^='N' then htext=2;
  if type(order)   ^='C' then order='DOWN';
  if type(verbose) ^='C' then verbose = 'NONE';
  if type(frame)   ^='N' then frame=1;        *-- line style for frame;
  if type(ptitle)  ^='C' then ptitle = '&V : &L';
  if type(outstat)  ^='C' then outstat = '';

        *-- Set default font based on device driver name;
   if type(font ) ^= 'C' then do;
                call execute('device  = upcase("&sysdevic");');
                if index(device,'PS') > 0 then
         font= 'hwpsl009';         /* Helvetica for PS drivers */
                else /* if device='WIN' then */
                        font = 'SWISS';
                end;

  nf = nrow(dim);
  if nf<3 then k=1;     * number of panels;
          else k = (dim[3:nf])[#];

        sysver=&sysver;
        if sysver<6.08 then do;
                *-  Viewports are broken in SAS 6.07;
                if type(down)    ^='N' then down=1;
                if type(across)  ^='N' then across=1;
                end;
        else do; *(;
                if type(across)    ^='N' & type(down)='N' then 
down=ceil(k/down);
        else if type(down)    ^='N' & type(across)='N' then 
down=ceil(k/across);
        else if nf>3 then do; 
                across = dim[3];
                down = (dim[4:nf])[#];
                end;
        else do;
                across = int(sqrt(k));
                down = ceil(sqrt(k));
                end;
        end; *);
   
  print 'Global Options',
         std config down across name sangle,
                        colors patterns alpha conf font order verbose htext 
frame
                        filltype shade ;

  /*-- Establish viewports --*/
  np = max(down,across);
  pd = np - (across||down);
  size = int(100 / np);
  if order='DOWN' then
     do;
        do i = 1 to across;
           px = size # ((i-1) // i) + (pd[1] # size/2);
           do j = 1 to down;
              py = 100 - (size # (j//(j-1)) + (pd[2] # size/2));
              ports = ports // shape( (px||py), 1);
           end;
        end;
     end;
     else do;
        do j = 1 to down;
           py = 100 - (size # (j//(j-1)) + (pd[2] # size/2));
           do i = 1 to across;
              px = size # ((i-1) // i) + (pd[1] # size/2);
              ports = ports // shape( (px||py), 1);
           end;
        end;
     end;
  nport=nrow(ports);
 
  if ncol(table) ^= 2 then table=shape(table,0,2);
        if nf>2 then run facnames(dim, lnames, nf, 3, ':', plab);

  run odds(dim, table, lnames);
  if k>1 then run tests(dim, table, vnames);
  if type(filltype) = 'C' then do; 
        run fillpat(fcolors, fpattern);
        * print 'Selecting colors from', fcolors fpattern;
        end;
  
  page = 0;                    * number of pages;

  pvar = '';
  if k>1 then do j=nf to 3 by -1;
        if pvar ^= '' then pvar = pvar + ':';
        pvar = pvar + trim(vnames[j]);
        end;
         
  do i=1 to k;
     r = 2#i;                  * row index, this table;
     t=table[((r-1):r),];      * current 2x2 table;
 
     /* construct top label for this panel */
     title='';
     if k > 1 & length(ptitle)>1 then do;
                title = ptitle;
                call change(title, '&V', pvar);
                call change(title, '&L', plab[i,]);
                end;
 
     /* standardize table to fit 100x100 square */
     run stdize(fit, t, table);
 
     if verbose ^= 'NONE' then do;
     print title;
     print fit[c=(lnames[1,]) r=(lnames[2,]) f=8.2] ' '
             t[c=(lnames[1,]) r=(lnames[2,]) f=8.0] ;
     end;
 
     /*-- start new page if needed --*/
     if nport=1 | mod(i,nport)=1 then do;
        call gstart;
        page = page+1;                * count pages;
        gname =rowcatc(trim(name) || char(page,2,0));
        call gopen(gname);            * name uniquely;
        end;
 
     /*-- set viewport --*/
     if nport>1 then do;
     ip = 1 + mod(i-1,nport);         * viewport number;
     port = ports[ip,];               * coordinates;
     call gport(port);
     end;
 
                colsave = colors; patsave=patterns;
                colors = fcolors[{2 1}, 1+sum(abs(odds[i,4])>shade)]`;
                patterns = fpattern[{2 1}, 1+sum(abs(odds[i,4])>shade)]`;
        *        print 'Using colors:' colors patterns;

     /*-- draw this panel, display if end-of page --*/
     call gpie2x2(fit, t, lnames, vnames, title, np, odds[i,2]);
          colors = colsave; patterns = patsave;
     if alpha>0 then run conflim(dim, t, bounds[i,], table);
     if mod(i,nport)=0 | i=k then call gshow;
   end;
   call gclose;

        done:
finish;
 
start stdize(fit, t, table) global(std, config);
  /*-- standardize table to equal margins --*/
  if std='MARG' then do;
     newtab = {50 50 , 50 50 };
          if any(t ^=0)
        then call ipf(fit,status,{2 2},newtab,config,t);
                  else fit = j(2,2,0);
  end;
 
  /*-- standardize to largest cell in EACH table --*/
  else if std='MAX' then do;
     n = t[+,+];
          if n>0 then do;
                        fit = (t/n)#200 ;
                        fit = fit# 100/max(fit);
          end;
          else fit = j(2,2,0);
  end;
 
  /*-- standardize to largest cell in ALL tables --*/
  else if std='MAXALL' then do;
          fit = t # 100 / max(table);
  end;
  
  /*-- Standardize to total in largest table --*/
  else if std='TOTAL' then do;
                tot = (shape(table[,+],0,2))[,+];
                fit = t # 200 / max(tot);
  end;
  
  else fit = t;   /* raw counts */
finish;
 
start gpie2x2(tab,freq,lnames,vnames,title,np,d)
      global(sangle, colors, patterns, font, htext, frame);
  /*-- Draw one fourfold display --*/
  t = shape(tab,1,4);     * vector of scaled frequencies;
  r = 5 * sqrt(t);        * radii of each quarter circle;
 
  /*-- set graphic window, font, and text height */
  call gwindow({-16 -16 120 120});
  call gset('FONT',font);
  ht = htext # max(np,2);
  call gset('HEIGHT',ht);
 
  /*-- set shading patterns for each quadrant */
  /* cell:[1,1] [1,2] [2,1] [2,2] */
  angle1 = { 90    0   180   270 };
  angle2 = {180   90   270   360 };
  patt   = (shape(patterns[,{1 2 2 1 2 1 1 2}],2))[1+(d>0),];
  color  = (shape(  colors[,{1 2 2 1 2 1 1 2}],2))[1+(d>0),];
        *print patt color;
 
  /*-- draw quarter circles, with color and shading */
  do i = 1 to 4;
     call gpie(50,50, r[i], angle1[i], angle2[i],
               color[,i], 3, patt[,i]);
     if int(abs(i-2.5)) = (d>0) then do;
                        rad = (r[i] + {0, 10}) / 50;
                        ang = repeat( (angle1[i]+angle2[i])/2, 2, 1 );
                        call gpiexy(xx,yy, rad, ang,{50 50}, 50);
                *       print 'Slice markers' i rad ang xx yy;
                        call gdraw(xx, yy);
                        end;
     end;
 
  /*-- draw frame and axes --*/
  call gxaxis({0 50},100,11,1,1);
  call gyaxis({50 0},100,11,1,1);
  if frame>0 then call ggrid({0 100}, {0 100}, frame);
 
  /*-- set label coordinates, angles --*/
  lx = { 50, -.5, 50, 101};
  ly = { 99, 50, -1,  50};
  ang= {  0,  0,  0,   0};
  /*-- label justification position --*/
       /*  ab  lt  bl   rt  */
  posn = {  2,  4,  8,   6};
  if vnames[1] = " " then vl1 = '';
     else vl1= trim(vnames[1])+': ';
  vl2='';
 
  /*-- are side labels rotated? --*/
  if sangle=90 then do;
     ang[{2 4}] = sangle;
     posn[ {2 4}] = {2 8};
     if vnames[2] ^= " "
        then vl2= trim(vnames[2])+': ';
     end;
  labels = (vl2 + lnames[2,1])//
           (vl1 + lnames[1,1])//
           (vl2 + lnames[2,2])//
           (vl1 + lnames[1,2]);
  run justify(lx,ly,labels,ang,posn,ht,xnew,ynew,len);
 
  /*-- write actual frequencies in the corners --*/
  cells = compress(char(shape(freq,4,1),6,0));
  lx = {  5, 95,  5, 95};
  ly = { 94, 94,  4,  4};
  ang= {  0,  0,  0,  0};
  posn={  6,  4,  6,  4};
  run justify(lx,ly,cells,ang,posn,ht,xnew,ynew,len);
 
  /*-- write panel title centered above */
  if length(title)>1 then do;
     ht=1.25#ht;
     call gstrlen(len,trim(title),ht);
     call gscript((50-len/2),112,title,,,ht);
  end;
finish;
 
start justify(x, y, labels, ang, pos, ht, xnew, ynew, len);
  /* Justify strings a la Annotate POSITION variable.
     x, y, labels, ang and pos are equal-length vectors.
     Returns justified coordinates (xnew, ynew)
  */
 
  n = nrow(x);
  call gstrlen(len,labels);
*print len labels;
  xnew = x; ynew =  y;
 
  /* x and y offset factors for each position   */
  /*       1   2   3   4   5   6   7   8    9   */
  off1 = {-1 -.5   0  -1 -.5   0  -1   -.5   0  };
  off2 = { 1   1   1   0   0   0  -1.6 -1.6 -1.6};
 
  do i=1 to n;
     if ang[i]=0 then do;
        xnew[i] = x[i] + (off1[pos[i]]#len[i]);
        ynew[i] = y[i] + (off2[pos[i]]#ht);
     end;
     else if ang[i]=90 then do;
        ynew[i] = y[i] + (off1[pos[i]]#len[i]);
        xnew[i] = x[i] - (off2[pos[i]]#ht);
     end;
  call gscript(xnew[i], ynew[i], trim(labels[i]), ang[i]);
  end;
  len = round(len,.01);
finish;
 
start odds(dim, table, lnames)
      global(alpha, conf, odds, bounds, outstat);
  /*-- Calculate odds ratios for 2x2xK table --*/
  free odds bounds;

  nf = max(ncol(dim), nrow(dim));
  if nf<3 
        then do;  k = 1;       rl='';  end;
   else do;  
                k = (dim[3:nf])[#];
                run facnames(dim, lnames, nf, 3, ':', rl);
                end;

  do i=1 to k;
     r = 2#i;
     t=table[((r-1):r),];
          if any(t=0) then t=t+0.5;
     d = (t[1,2]||t[2,1]);
     or = or // ( t[1,1]#t[2,2])/ ((d + .01#(d=0))[#]);
     selogor = selogor // sqrt( sum( 1 / (t + .01#(t=0)) ) );
     end;
  logor = log(or);
  z = log(or)/selogor;
  prz = 2#(1 - probnorm(abs(z)));
  odds = or || logor || selogor || z || prz;
 
  title= 'Odds (' + trim(lnames[1,1]) + '|' + trim(lnames[2,1])
         + ') / ('+ trim(lnames[1,1]) + '|' + trim(lnames[2,2]) + ')';
  reset noname;
  print title;
  cl={'Odds Ratio' 'Log Odds' 'SE(LogOdds)' 'Z' 'Pr>|Z|'};
  print odds[r=rl c=cl format=9.3];

  if outstat ^= '' then do;
                vn = 'rl or logor selogor z prz';
                call execute('%let outstat=', outstat, ';');
                call execute("create &outstat var{", vn,  "};");
                append;
     end;
 
  /* Find confidence intervals for log(odds) and odds            */
  if nrow(alpha)>1 then alpha=shape(alpha,1);
  cl={'Lower Odds' 'Upper Odds' 'Lower Log' 'Upper Log'};
  if substr(conf,1,1)='I'
     then conf='Individual';
     else conf='Joint';
  do i=1 to ncol(alpha);
     if conf='Individual'
        then pr=alpha[i];
        else pr= 1 - (1-alpha[i])##(1/k);
     if pr>0 then do;
      z = probit(1 - pr/2);
      ci= (odds[,2] * j(1,2)) + (z # selogor * {-1 1});
      co= exp(ci);
      bounds = bounds || co;
      ci= co || ci;
      print conf 'Confidence Intervals, alpha=' pr[f=6.3]  ' z=' 
z[f=6.3  ],
            ci[r=rl c=cl f=9.3];
      end;
     end;
  reset name;
finish;
 
start tests(dim,table, vnames);
        dm = dim;
        vn = vnames;
        nf = max( nrow(dm), ncol(dm) );
        if nf>3 then do;
                dm[3] = (dim[3:nf])[#];  dm=dm[1:3];
                vn[3] = rowcat( trim(shape(vn[3:nf],1)) );
                end;

  config = {1 1 2,           /*  Test homogeneity of odds ratios */
            2 3 3};          /*  (no three-way association)      */
  call ipf(m, status, dm, table, config);
  chisq = sum ( (table - m)##2 / ( m + (m=0) ) );
  chisq = 2 # sum ( table # log ( (table+(table=0)) / (m + (m=0)) ) );
  df  = dm[3] - 1;
  prob= 1- probchi(chisq, df);
  out = chisq || df || prob;
  test={'Homogeneity of Odds Ratios'};
  print 'Test of Homogeneity of Odds Ratios (no 3-Way Association)',
         test chisq[f=8.3] df prob[f=8.4];
 
  config = {  1 2,           /*  Conditional independence */
              3 3};          /*  (Given no 3-way)         */
  call ipf(m, status, dm, table, config);
  chisq = 2 # sum ( table # log ( (table+(table=0)) / (m + (m=0)) ) )
          - chisq;
  df  = 1;
  prob= 1- probchi(chisq, df);
 
  *-- CMH test (assuming no 3-way);
  free m;
  k = dm[3];
  do i=1 to k;
     r = 2#i;                  * row index, this table;
     t=table[((r-1):r),];      * current 2x2 table;
     n = n // t[1,1];
     s = t[+,+];
     m = m // t[+,1] # t[1,+] / (s + (s=0));
     v = v // (t[+,][#]) # (t[,+][#]) / (s#s#s-1);
     end;
  cmh = (abs(sum(n) - sum(m)) - .5)##2 / sum(v);
 
  chisq = chisq // cmh;
  df = df // df;
  prob = prob // (1 - probchi(cmh,1));
  out = out // (chisq || df || prob);
  test = {'Likelihood-Ratio','Cochran-Mantel-Haenszel    '};
  head = 'Conditional Independence of '+trim(vn[1])
         +' and '+trim(vn[2])+' | '+trim(vn[3]);
  reset noname;
  print head, '(assuming Homogeneity)';
  reset name;
  print test chisq[f=8.3] df prob[f=8.4];
finish;
 
start conflim(dim, t, bounds, table)
      global(std);
  do l=1 to ncol(bounds);
  run limtab(dim, t, bounds[l], tbound);
     *-- standardize the fitted table the same way as data;
  run stdize(fit, tbound, table);
  s = shape(fit,1,4);     * vector of scaled frequencies;
  r = 5 * sqrt(s);        * radii of each quarter circle;
  angle1 = { 90    0   180   270 };
  angle2 = {180   90   270     0 };
  pat = 'EMPTY'; color='BLACK';
  do i = 1 to 4;
     call gpie(50,50, r[i], angle1[i], angle2[i],
               color, 3, pat);
     end;
  end;
finish;
 
start limtab(dim, t, bounds, tbound)
      global(std);
  /* find 2x2 tables whose odds ratios are at the limits of the
     confidence interval for log(odds)=0 */
     odds = bounds[1];
     a = sqrt(odds)/(sqrt(odds)+1);
     b = 1 - a;
     ltab =  (a || b) // (b || a);
     *-- construct a fitted table with given odds ratio,
         but same marginals as data;
     config = {1 2};
     call ipf(tbound,status,{2 2},t,config,ltab);
  *  print odds[f=7.3] ltab[f=6.3] t[f=5.0] tbound[f=8.2];
finish;

start fillpat(fcolors, fpattern)
      global(filltype, shade, colors);
   *-- Set colors and patterns for shading fourfold areas;
   ns = 1+ncol(shade);
*       ns  = 2;
   if ns<5
      then dens=char({1 3 4 5}[1:ns]`,1,0);
      else dens={'1' '2' '3' '4' '5'};
   fpattern = J(2,ns,'EMPTY   ');
   if ncol(colors)=1 then colors=shape(colors,1,2);
   colors = substr(colors[,1:2] + repeat('       ',1,2),1,8);
   fcolors = repeat(colors[1],1,ns) // repeat(colors[2],1,ns);
 
   if ncol(filltype)=1 then      /* if only one filltype */
      do;
         if filltype='M0'  then filltype={M0 M90};
         else if filltype='M45' then filltype={M135 M45};
                        else if filltype='LR' then 
filltype=cshape(filltype,2,1,1);
         else filltype=shape(filltype,1,2);
      end;
 
   do dir=1 to 2;                 * for + and - ;
      ftype = upcase(filltype[dir]);      * fill type for this 
direction;
      if substr(ftype,1,1)='M' then
         do;
            angle = substr(ftype,2);
            if angle=' ' then angle='0';
            if ns=2 then fpattern[dir,] = {M1N M1X}+angle;
               else fpattern[dir,] = {M1N M3N M3X M4X 
M5X}[1:ns]`+angle;
         end;
      else if ftype='L' | ftype='R' then
            fpattern[dir,] = trim(ftype)+dens;
      else if substr(ftype,1,4)='GRAY' then
         do;
            if length(ftype)=4
               then step=16;
               else step = num(substr(ftype,5));
            dark = 256 - step#(1:ns)`;
            *-- convert darkness values to hex;
            fcolors = substr(fcolors,1,max(6,length(fcolors)));
            fcolors[dir,] = compress('GRAY'+hex(dark)`);
            fpattern[dir,] = repeat('SOLID',1,ns);
         end;
      else if substr(ftype,1,3)='HLS' then
         do;
         /* lightness steps for varying numbers of scale values */
         step={ 0 100 0 0 0,
               0 90 100 0 0,
               0 25 70 100 0,
               0 20 45 70 100}[ns-1,];
         /* make step=100 map to '80'x, =0 map to 75% of the way to 
'FF' */
         step =  25 + 0.75#step;
         step = step[1:ns]#255/100;
                        dark = 255-ceil(step/2);
         lval= hex(dark)`;

         clr = upcase(colors[dir]);
         hue = {RED GREEN BLUE MAGENTA CYAN YELLOW,
               '070' '100' '001' '03C' '12C' '0B4'  };
         col = loc(hue[1,]=clr);
         if ncol(col)=0 then col=dir;
         hval= hue[2,col];
          fcolors[dir,] =  compress('H'+hval+lval+'FF');
         fpattern[dir,] = repeat('SOLID',1,ns);
         end;
      end;
   finish;
 
start hex(num);
   *-- Convert 0<num<256 to hex;
   chars = cshape('0123456789ABCDEF',16,1,1);
   if num>255 then num = mod(num,256);
   h = rowcat(chars[1+floor(num/16),] || chars[1+mod(num,16),]);
   return(h);
   finish;
   
*-- Construct factorial combinations of level names;
start facnames (levels, lnames, first, last, sep, rl);
        free rl;
        do i=first to last by sign(last-first)+(first=last);
                cl = lnames[i,1:levels[i]]`;
                if i=first then rl=cl;
                else do;        /* construct row labels for prior factors 
*/
                        ol = repeat(rl, 1, levels[i]);
                        ol = shape(ol,  levels[i]#nrow(rl), 1);
                        nl = shape( (cl[1:(levels[i])]), nrow(ol), 1);
                        rl = trim(rowcatc(ol || shape(sep,nrow(ol),1) || 
nl));
                        end;
                end;
        finish;

/*--------------------------------------------------------------------*
 |    Name:  mosaics.sas                                              |  
 |   Title:  IML modules for general n-way mosaic display             |
 |                                                                    |
 |  This program defines the modules. Use mosaicm.sas to install them |
 |  in a SAS/IML storage catalog.                                     |
 *--------------------------------------------------------------------*
 | Original documentation:  ``Users Guide to MOSAICS: A SAS/IML       |
 |   program for Mosaic Displays'', Dept of Psychology Report 206     |
 | For current version, see:                                          |
 |    http://www.math.yorku.ca/SCS/mosaics/mosaics.html               |
 *--------------------------------------------------------------------*
 *  Author:  Michael Friendly              <friendly@YorkU.ca>        *
 * Created:  11 Aug 1990 10:27:11          (c) 1990-1998              *
 * Revised:  15 Jul 1999 14:59:29                                     *
 * Version:  3.6                                                      *
 * Changes:                                                           *
 *   Added fittype='PARTIAL'                                          *
 *   Stop fitting after last plot                                     *
 * (2.7)                                                              *
 *   Filltype changed to allow separate coding for + and - residuals  *
 *     and grayscale shading (filltype='GRAY')                        *
 *   Colors made global                                               *
 * (2.9)  Added cellfill to print residual symbol                     *
 * (3.0)  Added MARKOVk fittype; fit equiprobability model for f=1    *
 * (3.1)  Added readtab routine to read freq, labels from a SAS       *
 *     dataset; devtype='FT' for Freeman-Tukey residuals;             *
 * (3.2)  Added handling of structural zeros                          *
 *     Changed default values to filltype=HLS, colors={BLUE RED}      *
 * (3.3)  Added gskip module, for EPS output. Added &X2 for title     *
 *     Added makemap stub, fuzz value for 0 residuals                 *
 * (3.4)  Added vlabels control, fuzz now sets line style solid.      *
 *     Global variables in separate module to make changing defaults  *
 *     easier.  In transpos module, can specify the variable names    *
 *     in the new order, rather than indices. Same for config.        *
 *     Added JOINTk and CONDITk models (1<=k<=n)                      *
 * (3.5)  Fixed conflict between global var DEVTYPE and macro var     *
 *     Changed circle blanking for CELLFILL to white/black text       *
 *     Added control of threshold for CELLFILL                        *
 *     Added calculation of adjusted residuals                        *
 *     Default font now depends on device driver                      *
 *     Added NAME global for grseg graph names; fixed adj res bug     *
 *     Added CELLFILL='FREQ' to display cell frequency                *
 *     Added ABBREV=# to abbreviate variable names in model           *
 * (3.6)  Added outstat global variable to generate output data set   *
 *     'reorder' changed to 'transpos'                                *
 *--------------------------------------------------------------------
*/
/* Usage:
  proc iml;
   reset storage=mosaic.mosaic;
   load module=_all_;
   shade = {   }; space = {   }; verbose = ;  see global inputs
   run mosaic(levels, table, vnames, lnames, plots, title);

 where:
   levels    Vector of number of levels of each factor.
   table     Table of cell frequencies, as in IPF: first factor
             varies most rapidly along cols, last factor varies
             most slowly down the rows.  If elements of table are
             a single variable in a SAS data set (e.g., output from
             PROC FREQ), with factors={A B C}, the rows (obs.)
             should be sorted BY C B A to obtain IPF ordering.

   vnames    Vector of factor names, order corresponding to levels
   lnames    Matrix of level names: rows=factors, cols=max(levels)
             lnames[i,1:levels[i]] gives level names for factor i.

   plots     List of margins to be plotted: vector of any of the
             integers 1 - ncol(levels).  If plots contains  i  the
             var1 x var2 x ... var i margin will be plotted.
   title     Character string(s) title for plots.  If title is a
             vector, then title[i] is used for plots = i.  If title
             contains '&MODEL', the model fitted is substituted.

Global input variables:
   colors    Colors used for positive and negative residuals.
             Default: {BLUE RED}.
   config    IPF-style model configuration for fittype='USER'.
             Ignored for other fittypes.
   devtype   Type of deviations to be represented by shading. 'GF'
             calculates components of Pearson goodness of fit
             chisquare; 'LR' calculates components of likelihood
             ratio chisquare. 'FT' calculates Freeman-Tukey
             residuals.  devtype='GF' is the default.
   fittype   Type of sequential models to fit: JOINT,MUTUAL,PARTIAL,
             CONDIT, or USER.  If fittype='USER', specify the model
             in the matrix config.
   filltype  Type of fill pattern to use for shading. A vector of
             one or two character strings. filltype[1] is used for
             positive residuals; filltype[2], if present, is used
             for negative residuals.
               'LR'  --> patterns Ld, Rd, where d=density value
               'M0'  --> patterns MdN0, MdN90
               'M45' --> patterns MdN135, Md45 [default]
               'GRAY'--> patterns GRAYnn
               'HLS' --> solid HLS colors, varying in lightness
   cellfill  What to write about residuals in cells with large ones.
                'NONE' --> nothing
                                         'FREQ' --> cell frequency
                'SIGN' --> draws + or - symbols, # = shading level
                'SIZE' --> draws + or - symbols, size ~ shading
                'DEV'  --> write residual value
   htext     Height of text labels [default: 1.3]
   font      Font for text labels [default: DUPLEX]
   legend    Orientation of legend for shading of residual values
             in mosaic tiles. Possible values are 'H', 'V', and
             'NONE'. Default: 'NONE'.
   order     Specifes whether/how to do a correspondence analysis
             on each marginal subtable.  See the Users Guide.
   outstat   Name of output data set
   shade     Vector of up to 5 values of abs(dev[i]) for boundaries
             between shading levels. If shade={2 4}; (the default),
             then shading density is:
                  0 <= |dev[i]| < 2  ->  0 (empty)
                  2 <= |dev[i]| < 4  ->  1
                  4 <= |dev[i]|      ->  2
             Use shade= a big number to suppress all shading.
   fuzz      Values |dev[i]|< fuzz are outlined in black.
   space     2-vector of x,y: amount of plotting area reserved for
             spacing. Typically {20 20}.
   verbose   Controls verbose or detailed output: 'FIT' and/or 'BOX'
   zeros     A 0/1 matrix of the same size/shape as table where 0
             indicates that the corresponding value in table is to
             be treated as missing or a structural zero.
 */
 
*title 'SAS/IML modules for mosaic displays';
*version(6.06);  *-- Requires SAS Version 6.06 or later;
 
start mosaic(levels, table, vnames, lnames, plots, title)
      global(config, devtype, fittype, filltype, shade, space, split,
             legend, colors, htext, font, verbose, cellfill, order,
                                 zeros, fuzz, vlabels, name, abbrev, outstat);
 
   print / '+-------------------------------------------+',
           '|  Generalized Mosaic Display, Version 3.5  |',
           '+-------------------------------------------+';
        if rowcatc(type(levels)||type(table)||type(vnames)||type(lnames)) 
^= 'NNCC'
                then do;
         print 'ERROR: One or more arguments are not defined or of the 
wrong type';
                        show levels table vnames lnames;
         goto done;
      end;
                
   if nrow(vnames)=1 then vnames=vnames`;
   if nrow(levels)=1 then levels=levels`;
   print title, vnames levels '  ' lnames;
 
   *-- Check conformability of arguments --;
   f = nrow(vnames)||nrow(lnames);
   if levels[#] ^= nrow(table)#ncol(table)
      then do;
         print 'ERROR: TABLE and LEVEL arguments not conformable';
                        show levels table;
         goto done;
      end;
   if ^all(f = nrow(levels))
      then do;
         print 'ERROR: VNAMES or LNAMES not conformable with LEVELS';
                        show levels vnames lnames;
         goto done;
      end;
 
        run globals;    
   print 'Global options', fittype devtype filltype split
                           shade[f=3.0] colors ,htext
                                                                        font legend 
cellfill verbose fuzz;
   if fittype='USER' then do;
      if type(CONFIG) = 'C' then
                        config = name2num(config, vnames);
      if type(CONFIG) = 'N' then do;
         print 'Fitting user specified model', CONFIG;
                        end;
      else do;
                        print "CONFIG must be specified for fittype='USER'";
                        goto done;
         end;
   end;
        
   f = nrow(levels);
   whichway = num(translate(split,'10','HV'));
   dir = shape(whichway,f,1);
   if type(space) ^= 'N'
      then space = 10# (sum(dir=0) || sum(dir=1));
   savspace = space;
 
   call gstart;
   *-- divide the plot into boxes  --;
   run divide(levels, table, vnames, lnames, plots, dir, title);
   call gstop;
   space = savspace;
   done:
   finish;
 
start globals;
   *-- Check global inputs, assign default values if not assigned --;
   *   If you dont like these default values, change them;
   if type(filltype) ^= 'C' then filltype = {HLS HLS};
   if type(fittype) ^= 'C'
      then fittype = 'JOINT';
   if type(devtype) ^= 'C'
      then devtype = 'GF';           /* type of deviations: GF,LR */
   if type(shade) ^= 'N'
      then shade = {2 4};            /* shading levels            */
   if type(split) ^= 'C'
      then split = {H V};            /* divide H V H V ...        */
   if type(htext) ^= 'N'
      then htext = 1.4;              /* height of text labels     */
   if type(colors) ^= 'C'
      then colors= {BLUE RED};
   if type(legend) ^= 'C'
      then legend= 'NONE';
   if type(cellfill) ^= 'C'
           then cellfill = 'NONE'; 
   if type(fuzz) ^= 'N'
      then fuzz = .20;               /* fuzz value for 0 residuals */
        if type(order) ^= 'C' then order='NONE';
   if type(verbose) ^='C'
      then verbose = 'NONE';         /* verbose output?           */
        if type(vlabels) ^= 'N'
                then vlabels = 2;              /* variable labels up to 2 
vars */
        if type(name) ^= 'C' then name='MOSAIC';
        if type(abbrev) ^= 'N' then abbrev=0;
        if type(outstat) ^= 'C' then outstat='';
 
        *-- Set default font based on device driver name;
   if type(font ) ^= 'C' then do;
                call execute('device  = upcase("&sysdevic");');
                if index(device,'PS') > 0 then
         font= 'hwpsl009';         /* Helvetica for PS drivers */
                else /* if device='WIN' then */
                        font = 'SWISS';
                end;

        fittype = upcase(fittype);
        devtype = upcase(devtype);
        cellfill= upcase(cellfill);
        colors  = upcase(colors);
        order   = upcase(order);
        split   = upcase(split);
        verbose = upcase(verbose);
        finish; 

start divide(levels, table, vnames, names, plots, dir, title)
      global(config, devtype, fittype , shade, space, verbose, order, 
zeros,
                abbrev);
   *-- start with origin in lower left corner --;
   length= {100 100};                /* x,y length of box area */
   boxes = {0 0}                     /* lowerleft  x,y  */
         ||( length - space );       /* length     x,y  */
   factors = nrow( levels );

        *-- structural zeros or missing values?;
        if type(zeros) = 'U'
                then zeros = j(1,levels[#]);
                else zeros = shape(zeros,1, levels[#]);
        miss = loc(table = .);
        if ncol(miss)>0 then table[miss] = 0;
        miss = loc( row((table = .)) | (zeros = 0) );
        missing = (ncol(miss) > 0);
        tab = shape(table, 1, levels[#]);
        if missing then tab[miss] = 0;

   rows = 1;                 /* number of rows in margins */
   do f = 1 to factors while (f <= max(plots) );
      whichway =dir[f];
      cols = levels[f];      /* number of cols in margins */
      reset noname;
      print  "Factor:" f (vnames[f]);
      reset name;
      mconfig = (f:1)`;       * n sub f, (f-1), ..., 1 ;
      call marg(loc,margin,levels,tab, mconfig);    * margin, with 
zeros;
      call marg(loc,mtab,levels,table,mconfig);     * margin, original 
data;
      call marg(loc,mzero,levels,zeros,mconfig);    * marginal zeros;
                mzero = mzero > 0;

      *-- Construct row and column labels for marginal table;
      cl = names[f,];
      if f=1 then rl='';
         else do;        /* construct row labels for prior factors */
            ol = repeat(rl, 1, levels[f-1]);
            ol = shape(ol,  levels[f-1]#nrow(rl), 1);
            nl = repeat((names[f-1,1:(levels[f-1])])`, nrow(rl));
            rl = concat(ol,nl);
         end;
      margin = (shape(margin, rows)) ;
      mzero  = (shape(mzero, rows)) ;
 
      if f=1 then do;
                *       dev = J(1,ncol(margin),0);
                        *-- fit equiprobability model;
                        fconfig = 1;
                        model = compress("(="+vnames[1]+")");
                        fit = margin[+]/ncol(margin);
                        fit = repeat(fit, 1, ncol(margin));
                        dev = (margin - fit) / sqrt(fit);
                        end;
      else do;
         if fittype='JOINT' then do;
            fconfig = (1 : (f-1))`  || shape(f, f-1, 1, 0);
            dim = ncol(margin) || rows;
            call ipf(fit,status,dim,margin,{1 2},mzero);
                                run mfix(mzero,fit,margin,mtab);
         end;
         else if substr(fittype,1,5)='JOINT' then do;
            if f=2 then fconfig= 1:2;
            else do;
                                        if length(fittype)=5
                                                then k=f;
                                                else k = num(substr(fittype,6));
                                        fconfig = remove(1:f, k)` || shape(k, f-
1, 1, 0);
                                        end;
            run mfit(margin, fconfig, levels, f, fit, status);
                                run mfix(mzero,fit,margin,mtab);
         end;
         else if fittype='MUTUAL' then do;
            fconfig = 1:f;
            dim = levels[f:1];
            call ipf(fit,status,dim,margin,fconfig,mzero);
                                run mfix(mzero,fit,margin,mtab);
         end;
         else if substr(fittype,1,6)='CONDIT' then do;
            if f=2 then fconfig= 1:2;
            else do;
                                        if length(fittype)=6
                                                then k=f;
                                                else k = num(substr(fittype,7));
                                        fconfig = remove(1:f, k) // j(1, f-1, k);
*                                       fconfig = (1 : (f-1)) // j(1, f-1, k);
                                        end;
            run mfit(margin, fconfig, levels, f, fit, status);
                                run mfix(mzero,fit,margin,mtab);
         end;
         else if fittype='PARTIAL' then do;
            if f=2 then fconfig= 1:2;
                   else fconfig =(1:2) // ((3:f)`|| (3:f)`);
            run mfit(margin, fconfig, levels, f, fit, status);
                                run mfix(mzero,fit,margin,mtab);
         end;

         else if substr(fittype,1,6)='MARKOV' then do;
                                *-- determine order of Markov chain requested;
            if length(fittype)=6
               then k=1;
               else k = num(substr(fittype,7));
                                if factors < (k+2) then do;
                                        print 'Warning: Not enough factors for 
order' k 'Markov chain';
                                        if f=2 then fconfig= 1:2;
                                                                else fconfig =(1:(f-1)) 
// (2:f);
                                end;
                                else do;
                                        if f <= (k+1) then fconfig= 1:f;
                                        else do;
                                                free fconfig;
                                                do i=1 to k+1;
                                                fconfig = fconfig // (i:(f-k+i-1));
                                                end;
                                        end;
                                end;
            run mfit(margin, fconfig, levels, f, fit, status);
                                run mfix(mzero,fit,margin,mtab);
         end;

         else if fittype='USER' then do;
            if f=factors then fconfig=config;
                         else fconfig = reduce(config, f);
            run mfit(margin, fconfig, levels, f, fit, status);
                                run mfix(mzero,fit,margin,mtab);
         end;
         if status[1]^=0 then
            print "IPF ended abnormally", status[c={Error MaxDev 
Iterations}];
      end;
 
                *-- Calculate residuals --;
                if index(devtype, 'GF')              /* Pearson residuals       
*/
                        then do;
                                dtyp='GF';
                                dev = (margin - fit)/sqrt(fit + (0=fit));
                                end;
                        else if index(devtype,'LR')       /* likelihood ratio 
resids */
                        then do;
                                dtyp='LR';
                                dev =    sign(margin-fit) #
                                        sqrt(2#( margin # 
(log((margin+(margin=0))
                                                        / (fit+(fit=0))))
                                                        -margin+fit     ));
                                end;
                                                                                        
        /* Freeman-Tukey resids    */
                        else do;
                                dev = sqrt(margin) + sqrt(margin+1)
                                - sqrt(4#fit + 1);
                                dtyp='FT';
                                end;
                if (index(devtype,'ADJ') | index(devtype,'STD')) & 
any(plots=f) then do;
*                       print dtyp 'Residuals',
                                        dev[r=rl c=cl format=8.2];
                        run adjres(levels[1:f], fconfig, shape(fit`,0,1), 
shape(dev`,0,1), adj);
                        dev = t(shape(adj,0,nrow(fit)));
                        end;

                *-- Chisquare for current model --;
                chisq = chisq(margin,fit);
                df = df(levels[1:f,], fconfig) - sum(mzero=0);
                if df>0 & all(chisq>.0001)
                        then prob = 1 - probchi(chisq,df);
                        else prob = 1;
                if f>1 then run modname(fconfig,vnames,model);
      if any(f=plots) | any(verbose = 'CHISQ') then do;
                        print model
                                        df ' ' chisq[c={'ChiSq'} r={'G.F.' 
'L.R.'} format=9.3]
                                        prob [c={'Prob'} format=8.4];
                        end;

                *-- print fit and deviations matrices;
                reset noname;
                if any(verbose = 'FIT') then do;
                        print 'Marginal totals' ,
                                        margin[r=rl c=cl format=8.0];
                        print 'Fitted frequencies' ,
                                        fit[r=rl c=cl format=8.2];
                        end;
                if prob < .999 & any(f=plots) then do;
                        ltx = {'Pearson residuals' ,
                                        'Likelihood Ratio residuals',
                                        'Freeman-Tukey residuals'}[loc(dtyp={GF 
LR FT})];
                        if (index(devtype,'ADJ') | index(devtype,'STD'))
                                then ltx = 'Adjusted ' + ltx;
                        print (ltx),
                                        dev[r=rl c=cl format=8.2];
                        end;

                *-- Correspondence analysis to reorder this factor?;
                if f>1 & order ^= 'NONE' & missing=0
                        then run corresp( margin, dev, rl, cl);

      *-- Calculate proportions for each row over row totals, allowing
          for 0 margins;
      mar = row( margin );
      margin = margin / ( ( margin[,+] + (0=margin[,+]) )
                            * J(1,levels[f]) );
      if any(verbose = 'FIT') then
         print 'Marginal proportions',
            margin[r=rl  c=cl format=8.4];
 
      *-- Divide boxes for this factor;
      run divide1(levels, margin, boxes, f, whichway);
      reset name;
      if any(verbose = 'BOX') then do;
         bl = shape( (rl || J(rows,cols-1,' ')),rows#cols,1);
         print boxes[r=bl c={'BotX' 'BotY' 'LenX' 'LenY'} format=8.2];
      end;
 
      run  space( levels, boxes, f, dir );
      run labels( levels, vnames, names, f, dir, boxes, labelx, labels 
);
 
      if any(verbose = 'BOX') then
         print labelx[r=labels c={x y 'Angle' 'Height' 'Width'}] ;
 
      *-- display the mosaic for current margins ;
      if any(f=plots) then do;
         if nrow(title) < f
            then titl = title[nrow(title),];
            else titl = title[f,];
         if index(titl, '&MODEL') > 0
            then do;
               modl = compress(model," ,");
               modl = translate(modl,"()()","{}[]");
               call change(titl, '&MODEL', modl);
            end;
         if index(titl, '&G2') > 0
            then do;
               modl = 'G2 (' + trim(left(char(df,4,0))) + ') = '
                                                        + 
trim(left(char(chisq[2],10,2))) ;
               call change(titl, '&G2', modl);
            end;
         else if index(titl, '&X2') > 0
            then do;
               modl = 'X2 (' + trim(left(char(df,4,0))) + ') = '
                                                        + 
trim(left(char(chisq[1],10,2))) ;
               call change(titl, '&X2', modl);
            end;

         *-- ravel dev by rows, so positions match new boxes;
         dev = row( dev );
         run gboxes( boxes, labels, labelx, dev, f, titl, mar );
                        run makemap( boxes, labelx, names, levels, dev, mtab, 
fit, f);
         run gskip;
         end;
 
                mod = model;
                run outstat( vnames, names, levels, dev, mtab, fit, f, 
mod);
      *-- prepare for next factor;
      rows = rows * levels[f];
   end; /* do f=1 to factors */
   finish;
 
start mfit(margin, config, levels, f, fit, status);
   *-- Fit the model given by config to marginal in margin;
                mconfig = (f:1)`;       * n sub f, (f-1), ..., 1 ;
                dim = levels[f:1];
                rows = nrow(margin);
                
                *--turn margin inside-out to match config;
                call marg(loc,cmargin,dim,margin,mconfig);
                call ipf(cfit,status,levels[1:f],cmargin,config);

                *--turn fit inside-out to match margin;
                call marg(loc,fit,levels[1:f],cfit,mconfig);
      fit = shape(fit,rows);
   finish;

start mfix(mzero,fit,margin,mtab);
        *-- Fix up fitted values and margin if missing or structural 
zeros;
*       if missing=0 then return;
        miss = loc(mzero = 0);
        if ncol(miss) > 0 then do;
                fit[miss] = mtab[miss];
                margin[miss] = mtab[miss];
                end;
        finish;
         
start divide1( levels, margin, boxes, f, whichway );
   *-- Divide each old box in proportion to the values in row i
       of margins;
   oldbox = boxes;           /* save previous box locations        */
   ngp = nrow(margin);       /* number of previous marginal totals */
   nit = ncol(margin);       /* number of divisions of each such   */
   free boxes;               /* set up to append new divisions     */
 
   do i= 1 to ngp;                * box we are currently dividing;
      *-- get coordinates of box to divide;
      cx = oldbox[i,1];
      cy = oldbox[i,2];
      lx = oldbox[i,3];
      ly = oldbox[i,4];
      marg = margin[i,];
      p = cusum(marg);            /* cumulative proportions */
      p = 0//shape(p,nit-1,1);
      if whichway=1               /* dividing horizontally  */
         then do;
           thisbox = repeat(cx,nit,1)             ||
                     repeat(cy,nit,1) + ( ly # p) ||
                     repeat(lx,nit,1)             ||
                     repeat(ly,nit,1) # marg` ;
         end;
         else do;                 /* dividing vertically    */
           thisbox = repeat(cx,nit,1) + ( lx # p) ||
                     repeat(cy,nit,1)             ||
                     repeat(lx,nit,1) # marg`     ||
                     repeat(ly,nit,1)     ;
         end;
      boxes = boxes // thisbox ;
   end;
   finish;
 
start space( levels, boxes, f, dir )
      global( verbose, space );
   factors= nrow(levels);
   which  = dir[f];
   *-- determine space available to each variable ;
   ndir   = sum( dir=0 ) || sum ( dir=1 );       * # splits each way;
   vspace = space / ndir;
 
   loc = 1 + which;    /* 1:xspace  2:yspace  */
   units = 1;
   do i = 1 to f;
      if which=dir[i] then units = units * levels[i];
      end;
   units = units-1;
   rows = nrow(boxes) / levels[f];
   scale= J(rows,1) * (0:(levels[f]-1));
   unit =  vspace[loc] / units;
   coord= shape(boxes[,loc], rows, levels[f]);
   coord= coord + scale # unit ;
   boxes[,loc] = shape(coord,nrow(boxes));
   space[loc]  = space[loc] - vspace[loc];
   if any(verbose = 'BOX') then print space vspace units  unit;
   finish;
 
start gboxes( boxes, labels, labelx, dev, fac, title, freq )
      global( shade, verbose, filltype, htext, font, colors, 
                        legend, cellfill, fuzz, name );
   *-- Draw and label the mosaic display --;
   call gopen(trim(name)+char(fac,1,0),0,title);
   *-- locate the 4 corners of each box;
   ll = boxes[,{1 2}];
   lr = boxes[,{1 3}][,+] || boxes[,2] ;
   ul = boxes[,1] || boxes[,{2 4}][,+] ;
   ur = boxes[,{1 3}][,+] || boxes[,{2 4}][,+];
 
   xy = ll || ul || ur || lr;
   max = max(ur[,1]) || max(ur[,2]);
        
   sf  = {100 100} / max;              * scale factor: expand to 100;
   if any(verbose = 'BOX') then
      print max[c={X Y}] sf[c={X Y}];
   window = {-16 -16, 108 108};
   if legend ^= 'NONE' then window[2,]={116 116};
   call gwindow(window);
 
   *-- Set parameters for filling boxes in various fill types;
   lines = {1 3};
   run fillpat(fcolors,fpattern);

        centers = ((ll + ur) / 2) * diag(sf);
        fillmin = max(shade);
        if cellfill ^= 'NONE' then do;
                cfill = cellfill;
                cellfill = scan(cellfill, 1);
                fillmin = scan(cfill, 2,' ');
                if fillmin =' '
                        then fillmin = shade[1];
                        else fillmin = num(fillmin);
                dec = scan(cfill, 3,' ');
                if dec = ' ' then do;
                        if cellfill = 'DEV' 
                                then dec = 1;
                                else dec = 0;
                        end;
                        else dec=num(dec);
                end;
        free cxy txy hxy;
         
   do i=1 to nrow(boxes);
                bwidth = boxes[i,3];
                bheight= boxes[i,4];
      box = shape(xy[i,], 4) * diag(sf);
      *-- Make color and direction of shading reflect deviation
          from model;
      index = 1 + (round(dev[i],.001)<0);
      color = colors[index];
      line  = lines [index];
                if (abs(dev[i])<fuzz) then do;              * fuzz 
residuals;
                        color='BLACK';
                        line = 1;
                        end;   
      fcol= color;
                den   = sum( abs(dev[i]) >= shade );        * shading 
density;
                den   = min(max( 0, den ), 5);              * 0 <= den <= 5  
;
      if filltype ='DRAW' then pat='EMPTY';
      else do;
         if den= 0 then pat = 'EMPTY';               * fill pattern   ;
         else do;
            pat = fpattern[index,den];
            fcol = fcolors[index,den];
         end;
      end;

                *-- Writing something in the cell?  Store locations and 
text;
*               if cellfill ^= 'NONE' & (den>0 | nrow(boxes)<12) then do; 
                if cellfill ^= 'NONE' & (abs(dev[i])>=fillmin | 
nrow(boxes)<12) then do; 
         cxy = cxy // centers[i,];
                        bxy = bxy // (bwidth || bheight);
                        hxy = hxy // den;
                        if cellfill = 'SIGN'
                                then do;
                                        txy = txy // substr(({'+++++','-----
'}[index]),1,max(1,den));
                                        end;
                        else if cellfill = 'SIZE'
                                then do;
                                        txy = txy // ({'+','-'}[index]);  
                                        end;
                        else if cellfill = 'DEV' then do; 
                                txy = txy // compress(char(dev[i], 6, dec));
                                end;
                        else do; /* cellfill = 'FREQ' */
                                txy = txy // compress(char(freq[i], 6, dec));
                                end;
      end;
                *-- Draw and fill the box;              
      call gpoly( box[,1], box[,2], line, color, pat, fcol);
   end; * <---- loop over boxes;
        
   if filltype='DRAW' then run fillbox(boxes,dev,sf);

        *-- Adding cellfill annotations? ;
        if cellfill ^= 'NONE' & nrow(txy)>0 then do;
                cht = j(nrow(txy), 1, htext);
                if cellfill = 'SIGN' then cht = 1.2 # cht;
                if cellfill = 'SIZE' then cht = cht # hxy;
                do i=1 to nrow(txy);
                        call gstrlen(wid, txy[i], cht[i]);  
                        call gstrlen(dep, '+', cht[i]);
                        w = w // (wid || dep);
                        /*
                        if (wid < bxy[i,1]) & (dep < bxy[i,2]) then do;
                        call gpie(cxy[i,1], cxy[i,2], wid/2, 0, 360, 
'WHITE',, 'SOLID');
                        end;
                        */
                        cellclr = {'BLACK' 'WHITE'}[1+(hxy[i]>=2)];
                        sx = cxy[i,1]-wid/2;
                        sy = cxy[i,2]-dep/2;
                        call gscript(sx, sy, txy[i], 0, 0, cht[i], , 
cellclr);
                end;
      if any(verbose='BOX') then do;
                        print "cellfill labels", cxy[f=7.2] txy w[f=7.2] cht 
bxy[f=6.1];  
      end;
        end;
         
   do f=1 to nrow(labels);
      angle  = labelx[f,3];
      height = labelx[f,4];
      if angle = 0 then scale = diag(sf[,1] || {1});
                   else scale = diag({1} || sf[,2]);
      lxya = labelx[f,1:2] * scale;
      labl = labels[f ];
      call gscript( lxya[,1], lxya[,2], labl, angle, 0, height, font);
   end;

        *-- Draw the title;
   if length(title)>1 then do;
      height = max(htext+.1, 1.6);
      call gstrlen(len, title, height, font);
      if len > 110 then do;
         height = height # .1 # int(10 # 110 / len);
         call gstrlen(len, title, height, font);
         end;
      tx = (100 - len)/2;
      call gscript(tx, 104, title, 0, 0, height, font);
   end;

   run glegend(fcolors, fpattern);
   call gshow;
   finish;
 
start fillpat(fcolors, fpattern)
      global(filltype, shade, colors);
   *-- Set colors and patterns for shading  mosaic tiles;
   ns = ncol(shade);
   if ns<5
      then dens=char({1 3 4 5}[1:ns]`,1,0);
      else dens={'1' '2' '3' '4' '5'};
   fpattern = J(2,ns,'EMPTY   ');
   if ncol(colors)=1 then colors=shape(colors,1,2);
   colors = substr(colors[,1:2] + repeat('       ',1,2),1,8);
   fcolors = repeat(colors[1],1,ns) // repeat(colors[2],1,ns);
 
   if ncol(filltype)=1 then      /* if only one filltype */
      do;
         if filltype='M0'  then filltype={M0 M90};
         else if filltype='M45' then filltype={M135 M45};
                        else if filltype='LR' then 
filltype=cshape(filltype,2,1,1);
         else filltype=shape(filltype,1,2);
      end;
 
   do dir=1 to 2;                 * for + and - ;
      ftype = filltype[dir];      * fill type for this direction;
      if substr(ftype,1,1)='M' then
         do;
            angle = substr(ftype,2);
            if angle=' ' then angle='0';
            if ns=2 then fpattern[dir,] = {M1N M1X}+angle;
               else fpattern[dir,] = {M1N M3N M3X M4X 
M5X}[1:ns]`+angle;
         end;
      else if ftype='L' | ftype='R' then
            fpattern[dir,] = trim(ftype)+dens;
      else if substr(ftype,1,4)='GRAY' then
         do;
            if length(ftype)=4
               then step=16;
               else step = num(substr(ftype,5));
            dark = 256 - step#(1:ns)`;
            *-- convert darkness values to hex;
            fcolors = substr(fcolors,1,max(6,length(fcolors)));
            fcolors[dir,] = compress('GRAY'+hex(dark)`);
            fpattern[dir,] = repeat('SOLID',1,ns);
         end;
      else if substr(ftype,1,3)='HLS' then
         do;
         /* lightness steps for varying numbers of scale values */
         step={ 0 100 0 0 0,
               0 40 100 0 0,
               0 25 60 100 0,
               0 20 45 70 100}[max(1,ns-1),];
         /* make step=100 map to '80'x, =0 map to 75% of the way to 
'FF' */
         step =  25 + 0.75#step;
         step = step[1:ns]#255/100;
                        dark = 255-ceil(step/2);
         lval= hex(dark)`;

         clr = upcase(colors[dir]);
         hue = {RED GREEN BLUE MAGENTA CYAN YELLOW,
               '070' '100' '001' '03C' '12C' '0B4'  };
         col = loc(hue[1,]=clr);
         if ncol(col)=0 then col=dir;
         hval= hue[2,col];
          fcolors[dir,] =  compress('H'+hval+lval+'FF');
         fpattern[dir,] = repeat('SOLID',1,ns);
         end;
      end;
   finish;
 
start hex(num);
   *-- Convert 0<num<256 to hex;
   chars = cshape('0123456789ABCDEF',16,1,1);
   if num>255 then num = mod(num,256);
   h = rowcat(chars[1+floor(num/16),] || chars[1+mod(num,16),]);
   return(h);
   finish;
   
start glegend( fcolors, fpattern )
      global( shade, htext, colors, legend, font );
   *-- Draw legend indicating color/shading for standardized residuals;
   if legend='NONE' then return;
   ns = ncol(shade);
   values = ( -shade[ns:1]` ) ||{-.01} ||.01 || shade;
   cval = trim(char(values,3,0));
   nv = ncol(values);
   w = 8;            * width of legend box;
   s = 3;            * spacing between boxes;
   if ns>3 then do; w=7; s=2; end;
   label = {'Standardized' 'residuals:'};
   call gset('font',font);
   call gset('height',htext);
   call gstrlen(len,label);
   if legend = 'H' then
      do;
         y = {-11 -16};
         x = (100+s-(s+w)#nv) + w #(0:nv);
         call gscript(min(x)-len-3, y[1]-{2 5},label,0);
      end;
   else do;
         x = {107 112};
         y = (100+s-(s+w)#nv) + w #(0:nv);
         call gscript(x[1]+{2 5}, min(y)-len-3,label,90);
      end;
 
   do i = 1 to nv;
      sign = 1 + (values[i]<0);
      line={1 3}[sign];
      color = colors[sign];
      den = sum( abs(values[i]) >= shade );
      if den= 0 then pat = 'EMPTY';               * fill pattern   ;
      else do;
         pat = fpattern[sign,den];
         fcol = fcolors[sign,den];
         end;
      if legend='H' then
         do;
            xx = x[i]+ ({0 0}|| w || w) + s#(i-1);
            yy = y[{1 2 2 1}];
         end;
      else do;
            yy = y[i]+ ({0 0}|| w || w) + s#(i-1);
            xx = x[{1 2 2 1}];
         end;
      call gpoly( xx, yy, line, color, pat, fcol);
      if i=1 then             label = '<'+cval[i];
         else if i=nv then    label = '>'+cval[i];
         else if i<=ns+1 then label=cval[i-1]+':'+cval[i];
         else                 label=cval[i]+':'+cval[i+1];
      label = compress(label);
      call gstrlen(len,label);
      if legend='H' then
         do;
            xx = xx[1]+max(0,((xx[3]-xx[1])-len)/2);
            yy = yy[1]+1;
            angle = 0;
         end;
      else do;
            yy = yy[1]+max(0,((yy[3]-yy[1])-len)/2);
            xx = xx[3]+3;
            angle = 90;
         end;
      call gscript(xx,yy,label,angle);
   *print xx yy label len;
   end;
   finish;
 
start labels( levels, vnames, names, f, dir, boxes, labelx, labels )
      global( htext, font, verbose, vlabels );
   *-- generate positions of labels for this factor;
   k = levels[f];
   which = dir[f];
   factors = nrow(levels);
   loc = which + 1;                  /* 2=x, 1=y */
   box = boxes[(1:k),];
 
   str = shape(names[f,],k,1);
   line= sum(which=dir[1:f]);
   call gset('font', font);
   ht = htext - .1 * (line-1);
   call gstrlen( len, str, ht );
 
   *-- Position of labels along the baseline, centering if possible;
   wid = box[,loc+2];
   pos = box[,loc]          /*   center     rt justify */
          + choose( (len<=wid), (wid-len)/2, (wid-len) );
 * end = pos + len;
   if any(verbose = 'BOX') & 
           any(len/(wid+.01) > 1.5) then
      print 'Overfull label', str len wid pos;
   if pos[1]+len[1] > pos[2] then
      pos[1] = min(pos[1],box[1,loc])-1;
   do i=2 to nrow(len);     /* check for overlap of labels */
      if (pos[i] < pos[i-1]+len[i-1])
 /*     |(pos[i] < box[i,loc]) */ then do;
         if i<nrow(len) then do;
            if box[i,loc]+len[i] < pos[i+1] then pos[i] = box[i,loc];
            end;
         else pos[i] = max(box[i,loc],pos[i-1]+len[i-1]+1);
         end;
      end;
 
   *-- Baseline of labels for this factor;
   lines= sum(which = dir);
   sep = choose( (factors<=6), 1, .5);
   base =   3.5*line - 4*(lines+sep) + 2*(loc-1.5);
 
   lx = j(k,5,0);                      /*  1 | 0 : which */
   lx[,2-which] = base;                /*  x | y  */
   lx[,1+which] = pos;                 /*  y | x  */
   lx[,3] = {0 90}[loc];               /*  angle  */
   lx[,4] = ht;                        /*  height */
   lx[,5] = len;                       /*  length */

   if any(factors = vlabels) then do;      *-- variable names;
      ht = ht + .2;
      call gstrlen(len,vnames[f],ht);
      pos = ( max(box[,loc]+wid) - len)/2;
      base= base - 4;
      lx = lx // ( (pos || base || {0} || ht || len) //
                   (base|| pos  || {90}|| ht || len) )[loc,];
      str= str// vnames[f];
      end;
 
   labelx = labelx // round(lx,.01);
   labels = labels // str ;
   finish;
 
start fillbox(boxes, dev, sf)
      global(shade, verbose, colors);
   if ncol(shade)=1 then shade=shade`;
   *-- fill density proportional to number of shading levels
       equalled or exceeded by deviation;
   d = shape(dev,nrow(boxes),1);
   fill = ( (abs(d)*J(1,ncol(shade)))  >=
            repeat(shade,nrow(d),1) )[,+];
 
   *-- fill each box proportional to abs(fill);
   w = boxes[,3] # sf[1];
   h = boxes[,4] # sf[2];
   s =  choose( fill=0, 1000, 4 / abs(fill) );
 
   nxl = int((w-.5)/s);
   nyl = int((h-.5)/s);
   if any(verbose = 'BOX') then
      print 'Width, Height, Square', w h s fill nxl nyl;
   lines = {1 2};
   do i = 1 to nrow(boxes);
      index = 1 + (round(dev[i],.001)<0);
      color = colors[index];
      line  = lines [index];
      if nxl[i]>0 then do;
         from  =(((boxes[i,1] + (1:nxl[i]) # s[i]))`||
                 shape(boxes[i,2], nxl[i], 1))         * diag(sf);
         to    =
                (((boxes[i,1] + (1:nxl[i]) # s[i]))`||
                 shape(sum(boxes[i,{2 4}]), nxl[i], 1))* diag(sf);
         call gdrawl( from, to, line, color );
      end;
      if nyl[i]>0 then do;
         from  =(shape(boxes[i,1], nyl[i], 1) ||
                 ((boxes[i,2] + (1:nyl[i]) # s[i]))`)  * diag(sf);
         to    =(shape(sum(boxes[i,{1 3}]), nyl[i], 1) ||
                 ((boxes[i,2] + (1:nyl[i]) # s[i]))`)  * diag(sf);
         call gdrawl( from, to, line, color );
      end;
   end;
   finish;
 
start reduce(config, f);
   * find loglin config including only factors 1:f ;
   con = config;
   do i=1 to ncol(con);
      term = con[,i];
      if any(term > f) then term = remove(term, loc(term > f));
                if ncol(term) = 0
                        then con[,i] = j(nrow(config), 1, 0);
                        *-- next line would fail if term is now empty;
        else con[,i] = shape(term, nrow(config), 1, 0);
      end;
 
   *-- delete any all-zero rows and cols;
   r = con[+,];
   con = con[,loc(r>0)];
   r = con[,+];
   con = con[loc(r>0),];
 
   *-- rearrrange in order of increasing complexity;
   if ncol(con) > 1 then do;
      orders = (con ^=0)[+,];
      r = rank(orders);
      noc = con;
      con[,r] = noc;
 
      *-- remove terms which are marginal to those later;
      keep = j(1,ncol(con));
      do i = 1 to ncol(con)-1;
         ci = con[,i];
         if any(ci=0) then ci = remove(ci,loc(ci=0));
 
         do j = i+1 to ncol(con);
            cj = con[,j];
            if any(cj=0) then cj = remove(cj,loc(cj=0));
            if ncol(unique(ci,cj)) = ncol(unique(cj)) then keep[i]=0;
            end;
         end;
      con = con[,loc(keep)];
      end;  /* ncol(con)>1 */
   *-- delete any all-zero rows;
   r = con[,+];
   con = con[loc(r>0),];
   return(con);
   finish;
 
start chisq(obs, fit);
   *-- Find Pearson and likelihood ratio chisquares;
   gf = sum ( (obs - fit)##2 / ( fit + (fit=0) ) );
   lr = 2 # sum ( obs # log ( (obs+(obs=0)) / (fit + (fit=0)) ) );
   return (gf // lr);
   finish;
 
start terms(dim, config);
   *-- transform ipf config into list of terms in loglinear model;
   *   returns a matrix with ncol(dim) columns.  Each row is the
       indices of one term in the model;
   nv = nrow(dim);    * number of variables;
   nm = ncol(config); * number of margins in model;
   max= 2##nv - 1;
   do i = 1 to max;
      t = vars_in(i,nv);
      do j = 1 to nm;
         c = config[,j]`;
         *-- are all elements of t contained in this margin? ;
         if ncol(unique(t,c))  = ncol(unique(c))
            then do;
            terms = terms // shape(t,1,nv,0);
            goto next;
            end;
         end;
 next:
      end;
      return (terms);
 
   finish;
 
start vars_in(num,nv);
   *-- determine variables represented by a number from 1...2##nv-1,
       considered as a binary number;
   n = num;
   do i=1 to nv;
      if mod(n,2)=1 then r = r || i;
      n = int(n/2);
      end;
   return ( r );
   finish;
 
start df(dim,config);
        if all(config=1) then return(dim[1]-1);
   terms = terms(dim,config);
   nc = dim[#];         /* number of cells */
   nt = nrow(terms);    /* number of marginal terms */
   np = 0;
   *-- find number of parameters fitted in each term;
   do i = 1 to nt;
      t = terms[i,];
      t = t[,loc(t>0)];
      np= np + (dim[t]-1)[#];
      end;
   df = nc - np - 1;
   return(df);
   finish;
 
start modname(config,names,model) global(abbrev);
   *-- Expand IPF config into symbol for loglinear model;
   free model;
   brackets = {'{' '}'};
   vars = 0;
   do i = 1 to ncol(config);
      vars = unique(vars,config[,i]);
      end;
   if ncol(vars) > nrow(names) then brackets = {'[' ']'};
   do i = 1 to ncol(config);
      effect = config[,i];
      effect = effect[loc(effect>0)];
      term = '';
      do j = 1 to nrow(effect);
         if term ^= '' then
         term=trim(term)+ ',';
                        if abbrev=0
                then term = term + names[ effect[j,] ];
                                else term = term + substr(names[ effect[j,] 
],1,abbrev);
         end;
      term = brackets[1] + trim(term) + brackets[2];
      model= model || term;
   end;
   model = rowcatc(model);
   model = substr(model, 1, length(model));
   finish;
 
*-- Module for correspondence analysis for reordering table categories.
         At present, this analysis merely suggests an ordering, but does 
not
    actually reorder the table or the mosaic display;
         
start corresp( margin, dev, rl, cl)
                global(order);
                
        r = margin[,+];
        c = margin[+,];
        n = sum(margin);
        if (any(order='DEV')) then do;
                *-- use residuals from current model;
                d = shape(dev, nrow(margin), ncol(margin));
                dpd = t(d)*d / n;
                end;
        else do;
                *-- fit joint independence model for current col variable;
                e = r * c / n;
                d = (margin - e) / sqrt(e);
                dpd = t(d)*d / n;
                end;

   call eigen(values, vectors, dpd);
   k = min(nrow(margin), ncol(margin))-1; * number of non-zero 
eigenvalues;
   values  = values[1:k];
   cancorr = sqrt(values);        * singular values = Can R;
   chisq = n * values ;           * contribution to chi-square;
   percent = 100* values / trace(dpd);
        cum = cusum(percent);
        
   print 'Singular values, and Chi-Square Decomposition',,
         cancorr [colname={'Singular Values'} format=9.4]
         chisq   [colname={'Chi-Squares'}     format=9.3]
         percent [colname={'Percent'}         format=8.2]
         cum     [colname={' Cum % '}         format=8.2];

        *-- Find Dim1 scores for row/col categories;
   L = values[1];
   U = vectors[,1];
   Y = diag(1/sqrt(C/N)) * U * diag(sqrt(L));
   X = diag(N/R) * (margin / N) * Y * diag(sqrt(1/L));

        d = dev;
        row = rl;
        col = cl[,1:ncol(margin)];
        *-- sort rows and cols of dev by corresp dimensions1;
        if (any(order='ROW')) then do;
                rx = rank(X);
                t = d;
                d[ rx, ] = t;
                t = row;
                row[ rx, ] = t;
                t = X;
                X[ rx ] = t;
                end;
                
        if (any(order='COL')) then do;
                ry = rank(Y);
                t = d;
                d[, ry] = t;
                t = col;
                col[ ,ry ] = t;
                t = Y;
                Y[ ry ] = t;
                Y = t(Y);
                perm = ry`;
                perm[ , ry] = 1:ncol(Y);
                if ncol(Y)>2 & cum[1]>.5 then
                        print 'Suggested permutation of levels of this 
variable is',
                                perm[c=col];
                end;
                
        print 'Residuals, bordered by Row and Column Scores on CA 
Dimension 1',
                        'Reordered by' order;
        print d [r=row c=col format=7.2] X[c={'RowDim1'} f=7.2], 
                        Y[r={'ColDim1'} f=7.2]; 
        finish;

*-- Modules for data input;
  /* ------------------------------------------------------------------
--
   Routine to read frequency and index/label variables from a SAS 
dataset
   and construct the appropriate levels, and lnames variables

   Input:  dataset - name of SAS dataset (e.g., 'mydata' or 
'lib.nydata')
           variable - name of variable containing frequencies
             vnames - character vector of names of index variables
   Output: dim (numeric levels vector)
           lnames (K x max(dim)) 
  --------------------------------------------------------------------
*/
 
start readtab(dataset, variable, vnames, table, dim, lnames);
        if type(vnames)^='C'
                then do;
                        print 'VNAMES argument must be a character vector';
                        show vnames;
                        return;
                end;            
   if nrow(vnames)=1 then vnames=vnames`;

        call execute('use ', dataset, ';');
        read all var variable into table;
        run readlab(dim, lnames, vnames);
        call execute('close ', dataset, ';');
        reset noname;
        print 'Variable' variable 'read from dataset' dataset,
                'Factors:        ' (vnames`),
                'Levels ordered: ' vnames lnames;
        reset name;
        finish;

/* Read variable index labels from an open dataset, construct a dim 
   vector and lnames matrix so that variables are ordered correctly
   for mosaics and ipf (first varying most rapidly).

        The data set is assumed to be sorted by all index variables.  If
        the observations were sorted by A B C, the output will place
        C first, then B, then A.
   Input:    vnames (character K-vector)
 */

start readlab(  dim, lnames, vnames);
        free span lnames dim;
        nv = nrow(vnames);

        spc = '        ';
        do i=1 to nv;
                vi = vnames[i,];
                read all var vi into cli;
                if type(cli) = 'N'
                        then do;
                                tmp = trim(left(char(cli,8)));
                                tmp = substr(tmp,1,max(length(tmp)));
                                cli = tmp;
                                end;  
                cli = trim(cli); 
                span = span || loc(0=(cli[1,] = cli))[1];
                d=design( cli );
                dim = dim || ncol(d);
                free row1;
                *-- find position of each first distinct value;
                do j=1 to ncol(d);
                        row1 = row1 || loc(d[,j]=1)[1];
                        end;
        *               print vi cli d;
        *               print row1;
                *-- sort elements in row1 so that var labels are in data 
order;
                order = rank(row1);
                tmp = row1;
                row1[,order]=tmp;       

                li = t(cli[row1]);
                if i=1 then lnames = li;
                        else do;
                                if ncol(lnames) < ncol(row1) 
                                        then lnames=lnames || repeat(spc, i-1, 
ncol(row1)-ncol(lnames));
                                if ncol(lnames) > ncol(row1)
                                        then li = li || repeat(spc, 1, 
ncol(lnames)-ncol(li));
                                lnames = lnames // li;
                        end;
                end;
        *       print span;
        *-- sort index variables by span so that last varies most slowly;
        order = rank(span);
        tmp = span;
        span[,order] = tmp;
        tmp = dim;
        dim[,order] = tmp;
        tmp = lnames;
        lnames[order,] = tmp;
        tmp = vnames;
        vnames[order,] = tmp;
        *       print dim lnames vnames;
        finish;

*-- backward compatibility [reorder -> transpos] (until the next 
release);
start reorder(dim, table, vnames, lnames, order);
        run transpos(dim, table, vnames, lnames, order);
        finish;

start transpos(dim, table, vnames, lnames, order);
   *-- Reorder the dimensions of an n-way table.  Order is a 
permutation
            of the integers 1:ncol(dim), such that order[k]=i means that
                 dimension k of the array table becomes dimension i of the 
result.
                 Alternatively, order can be a character vector of the 
names
                 of variables (vnames) in the new order.
   *   Note: to restore a reordered table to its original form, use
                 the anti-rank of the original order;
   *-- Use to rearrange the table prior to calling mosaics;
                 
   if nrow(order) =1 then order=order`;
        if type(order)='C' then do k=1 to nrow(order);
                ord = ord // loc(upcase(order[k,]) = upcase(vnames));
                end;
        else ord = order;

        *-- Dont bother if order = 1 2 3 ... ;
        if all( row(ord)=1:ncol(row(ord)) ) then return;

   if nrow(dim  ) =1 then dim  =dim`;
   if nrow(vnames)=1 then vnames=vnames`;
   call marg(loc,newtab,dim,table,ord);
   table = row(newtab);
   dim = dim[ord,];
   vnames = vnames[ord,];
   lnames = lnames[ord,];
        finish;

start row (x);
   *-- function to convert a matrix into a row vector;
 
   if (nrow(x) = 1) then return (x);
   if (ncol(x) = 1) then return (x`);
   n = nrow(x) * ncol(x);
   return (shape(x,1,n));
        finish;

/*--------------------------------------------------------------*
 |  IML module to handle multiple output EPS files (or other    |
 |  device-dependent multiple plot circumstances).              |
 |  - This implementation requires a macro variable, DEVTYP to  |
 | be set.  Initialize the FIG variable to 1.  
 *--------------------------------------------------------------*/
*global fig gsasfile devtyp;
start gskip;
        call execute('_dev_ = upcase("&DEVTYP");');
        call execute('_disp_ = "&DISPLAY";');
        if upcase(trim(_disp_)) ^= 'OFF' then do;
        if (_dev_ = 'EPS') | (_dev_ = 'GIF') then do;
                _dev_ = lowcase(_dev_);
                call execute('%let fig = %eval(&fig + 1);');
                call execute('%let gsas = %scan(&gsasfile,1,.)&fig..', 
_dev_, ';');
                call execute('%put NOTE: gsasfile now: &gsas;');
                call execute('filename gsas&fig  "&gsas";');
                call execute('goptions gsfmode=replace gsfname=gsas&fig;');
                free _dev_ _disp_;
        end;
        end;
        finish;

/*  Dummy makemap module (replaced by weblet) */
start makemap( boxes, labelx, lnames, levels, dev, margin, fit, f)
        global(legend, verbose);
        dummy=1;
        finish;

/*  Translate a character matrix MAT to a numeric matrix of
    the indices of each element in a vector of NAMES.
         Returns a numeric matrix of the same shape as MAT  */
start name2num(mat, names);
        new = j(nrow(mat), ncol(mat), 0);
        do i=1 to nrow(mat);
                do j=1 to ncol(mat);
                        l = loc(trim(upcase(mat[i,j])) = upcase(names));
                        if type(l)^='U' then new[i,j] = l;
                        end;
                end;
        return(new);
        finish;

start adjres( dim, config, m, res, adj);
        *-- Calculate adjusted residuals for a loglin model;
        
        run cdesign(dim, config,  x );        /* get design matrix */
        D = diag(m);
        S = inv( t(X) * D * X);               /* cov(beta) */
   V = (sqrt(D) * x)`;

         *-- compute leverage, faster than vecdiag(V` * S * V);
        C = t(V)*S  ;                         /* catcher matrix */
        lev = (C#t(V))[,+];                   /* leverage  */

   se  = sqrt(1 - lev);                  /* std errors of resids */
   adj = res / se;                       /* adjusted residuals */
*       print m[f=6.3] lev[f=6.3] res[f=6.3] adj[f=6.3] se[f=6.3];
        finish;

start cdesign(dim, config, x);
   *-- Find (full rank) design matrix, X, from IPF configuration;
   if nrow(dim)=1 then dim = dim`;
   terms = terms(dim,config);
   nc = dim[#];               /* number of cells */
   nt = nrow(terms);          /* number of marginal terms */

   free x;
   *-- construct cols of X matrix for each term;
   do i = 1 to nt;
      t = terms[i,];
      t = t[,loc(t>0)];                /* find vars in this term */
      xi= 1;
      do j=1 to nrow(dim);
         if any(j=t) then do;          /* is variable j in the term?*/
            xj = designf( (1:dim[j])` ) ;
            end;
         else xj = j(dim[j],1) ;
         xi = xj @ xi;
         end;
      x = x || xi;
      end;
   x = j( nrow(x), 1) || x;
   finish;

start outstat( vnames, lnames, levels, dev, margin, fit, f, mod)
        global(outstat);
        
        if outstat='' then return;
        
        do i=1 to f;
      cl = lnames[i,]`;
      if i=1 then do;
                                rl=cl;
                                ml=cl[1:(levels[i])];
                                end;
         else do;        /* construct row labels for prior factors */
            ol = repeat(rl, 1, levels[i]);
                                ml = repeat(ml, (levels[i]), 1);
            ol = shape(ol,  levels[i]#nrow(rl), 1);
            nl = repeat( (cl[1:(levels[i])]), nrow(rl));
            rl = trim(rowcatc(ol || shape(':',nrow(ol),1) ||nl));
                                ml = ml || nl;
         end;
                end;
        labels = rl+ '                                          ';
        residual  = shape(dev, nrow(dev)#ncol(dev), 1);
        fitted = shape(fit, nrow(fit)#ncol(fit), 1);
        freq = shape(margin, nrow(margin)#ncol(margin), 1);
        factors = shape(f, nrow(fit)#ncol(fit), 1);
        model = shape(mod, nrow(fit)#ncol(fit), 1)+'                   ';
        
        vn = rowcat(row(vnames)+' '); 
        vn = vn + ' factors labels residual fitted freq model';
        
        *-- create all the factor variables;
        do i=1 to ncol(row(vnames));
                if i<= ncol(ml)
                        then call execute( vnames[i], '= ml[,i];');
                        else call execute( vnames[i], '= "        ";');
                end;


        *print 'Outstat', factors labels freq fitted residual;
        if f=1 then do;
                call execute('%let outstat=', outstat, ';');
/*
                call execute("create &outstat var{factors labels freq 
fitted residual model};");
*/
                call execute("create &outstat var{", vn,  "};");
                end;
        append;
        finish;


/*--------------------------------------------------------------------*
 | MOSAICM SAS                                          Version 3.4   |
 |  IML modules for general n-way mosaic display of contingency table.|
 |  This program creates and stores the modules in the IML storage    |
 |  library, MOSAIC.MOSAIC                                            |
 *--------------------------------------------------------------------
*/
title 'Install mosaic modules';
*-- Change the path in the following filename statement to point to
    the installed location of mosaics.sas;
*filename mosaics  'c:\sasuser\mosaics';
filename mosaics  '~/sasuser/mosaics/';
*--- Change the path in the libname to point to where the compiled
    modules will be stored, ordinarily the same directory;
*libname  mosaic   'c:\sasuser\mosaics';
libname  mosaic   '~/sasuser/mosaics/';

proc iml ;
   *-- Install mosaics.sas as compiled modules;
   reset storage=mosaic.mosaic;
   %include mosaics(mosaics) ;
   store module=_all_;
   show storage;
quit;

/*--------------------------------------------------------------------*
 | MOSAICD SAS                                                 3.4    |
 |  IML modules for general n-way mosaic display of contingency table.|
 |  This version uses externally calculated cell residuals (dev)      |
 *--------------------------------------------------------------------*
 | Usage:                                                             |
 |  proc iml;                                                         |
 |   %include mosaicd;                                                |
 |   shade = {   }; space = {   }; verbose = ;  see global inputs     |
 |   run mosaicd(levels, table, vnames, lnames, dev, title);          |
 | where:                                                             |
 |   levels    Vector of number of levels of each factor              |
 |   table     Table of cell frequencies, as in IPF: first factor     |
 |             varies most rapidly along cols, last factor varies     |
 |             most slowly down the rows.  If elements of table are   |
 |             a single variable in a SAS data set (e.g., output from |
 |             PROC FREQ), with factors={A B C}, the rows (obs.)      |
 |             should be sorted BY C B A to obtain IPF ordering.      |
 |                                                                    |
 |   vnames    Vector of factor names, order corresponding to levels  |
 |   lnames    Matrix of level names: rows=factors, cols=max(levels)  |
 |             lnames[i,1:levels[i]] gives level names for factor i.  |
 |                                                                    |
 |   dev       Deviations from model                                  |
 |   title     Character string title for plots                       |
 |                                                                    |
 | Global input variables (optional)                                  |
 |   filltype  Type of fill pattern to use for shading.               |
 |               'LR'  --> patterns Ld, Rd, where d=density value     |
 |               'M0'  --> patterns MdN0, MdN90                       |
 |               'M45' --> patterns MdN135, Md45 [default]            |
 |   htext     Height of text labels [default: 1.3]                   |
 |   shade     Vector of up to 5 values of abs(dev[i]) for boundaries |
 |             between shading levels. If shade={2 4}; (the default), |
 |             then shading density is:                               |
 |                   0 <= |dev[i]| < 2  ->  0 (empty)                 |
 |                   2 <= |dev[i]| < 4  ->  1                         |
 |                   4 <= |dev[i]|      ->  2                         |
 |             Use shade= a big number to suppress all shading.       |
 |   space     2-vector of x,y: amount of plotting area reserved for  |
 |             spacing. Typically {20 20}.                            |
 |   verbose   Controls verbose or detailed output                    |
 *--------------------------------------------------------------------*
 *  Author:  Michael Friendly            <friendly@yorku.ca>          *
 * Created:  11 Aug 1990 10:27:11          (c) 1990, 1991             *
 * Revised:  13 May 1998 09:00:11                                     *
 * Version:  3.4                                                      *
 *--------------------------------------------------------------------
*/
*title 'Mosaic displays for externally-fitted models';
*version(6.06);  *-- Requires SAS Version 6.06 or later;
 
*proc iml;
start mosaicd(levels, table, vnames, lnames, dev, title)
      global(config, devtype, fittype, filltype, shade, space, split,
             legend, colors, htext, font, verbose, cellfill, order, 
zeros);
 
   if nrow(vnames)=1 then vnames=vnames`;
   if nrow(levels)=1 then levels=levels`;
   print / '+-------------------------------------------+',
           '|  Specialized Mosaic Display, Version 3.4  |',
           '+-------------------------------------------+',,
           title, vnames levels '  ' lnames;
 
   *-- Check conformability of arguments --;
   f = nrow(vnames)||nrow(lnames);
   if ^all(f = nrow(levels))
      | levels[#] ^= nrow(table)#ncol(table)
      then do;
         print 'Arguments not conformable'; show levels table;
         goto done;
      end;
 
        run globals;    
        reset name;
   print 'Global options', filltype split
                           shade[f=3.0] colors ,htext
                                                                        font legend 
cellfill verbose;
   f = nrow(levels);
   whichway = num(translate(split,'10','HV'));
   dir = shape(whichway,f,1);
   if type(space) ^= 'N'
      then space = 10# (sum(dir=0) || sum(dir=1));
   savspace = space;
        savedev = dev;
         
   call gstart;
   *-- divide the plot into boxes  --;
   run divided(levels, table, vnames, lnames, dev, dir, title);
   call gstop;
   space = savspace;
        dev = savedev;
   done:
   finish;
 
start divided(levels, table, vnames, names, dev, dir, title)
      global(shade, space, verbose);
   *-- start with origin in lower left corner --;
   length= {100 100};                /* x,y length of box area */
   boxes = {0 0}                     /* lowerleft  x,y  */
         ||( length - space );       /* length     x,y  */
   factors = nrow( levels );
   rows = 1;                 /* number of rows in margins */
   do f = 1 to factors ;
      whichway =dir[f];
      cols = levels[f];      /* number of cols in margins */
      reset noname;
      print "Factor:" f (vnames[f]);
      reset name;
      mconfig = (f:1)`;       * n sub f, (f-1), ..., 1 ;
      call marg(loc,margin,levels,table,mconfig);
 
      *-- Construct row and column labels for marginal table;
      cl = names[f,];
      if f=1 then rl='';
         else do;        /* construct row labels for prior factors */
            ol = repeat(rl, 1, levels[f-1]);
            ol = shape(ol,  levels[f-1]#nrow(rl), 1);
            nl = repeat((names[f-1,1:(levels[f-1])])`, nrow(rl));
            rl = concat(ol,nl);
         end;
      margin = (shape(margin, rows)) ;
      print 'Marginal totals', margin[r=rl c=cl ];
 
      *-- Calculate proportions for each row over row totals, allowing
          for 0 margins;
      mar = row( margin );
      margin = margin / ( ( margin[,+] + (0=margin[,+]) )
                            * J(1,levels[f]) );
      *-- Divide boxes for this factor;
      run divide1(levels, margin, boxes, f, whichway);
 
      if any(verbose = 'BOX') then do;
         bl = shape( (rl || J(rows,cols-1,' ')),rows#cols,1);
         print boxes[r=bl c={'BotX' 'BotY' 'LenX' 'LenY'} format=8.2];
      end;
 
      run  space( levels, boxes, f, dir );
      run labels( levels, vnames, names, f, dir, boxes, labelx, labels 
);
 
      if any(verbose = 'BOX') then;
      print labelx[r=labels c={x y 'Angle' 'Height'}] ;
 
      *-- display the mosaic for current margins ;
      if f=factors then do;
         *-- at this point the marginal table has been turned inside-
out;
         *   so, dev must be reordered to match. However, MARG chokes 
on
             values < 0, so subtract minimum value, then add it back;
         mdev = min(dev);
         run marg(loc,newdev,levels, dev-mdev, mconfig);
         dev = newdev + mdev;
*?????   dev = shape( dev,1,(nrow(dev)#ncol(dev)) );
         call gboxes( boxes, labels, labelx, dev, f, title, mar );
         run gskip;
         end;
 
      *-- prepare for next factor;
      rows = rows * levels[f];
   end;
   finish;
 
*reset storage=mosaic.mosaic;
*store module=_all_;
*show storage;

/*  Name:  mosdata.sas
   Title:  Assorted contingency table data sets for mosaic displays
*/

/*
  Running this program creates a SAS/IML storage catalog named
  MOSAIC.MOSDATA.  It is assumed that the libref MOSAIC has been
  defined, e.g., as follows:
libname mosaic   'c:\sasuser\mosaics';         *-- Windows ;
*/
libname mosaic '~/sasuser/mosaics';            *-- Unix    ;

/*
  To use one with mosaics,

  proc iml;
     reset storage=mosaic.mosaic;
     load module=_all_;
     reset storage=mosaic.mosdata;
     load module=bartlett;
     run bartlett;
     ...
*/
proc iml;

start bartlett;
   data='bartlett';
   dim = {2 2 2};
   title="Bartlett data";
        source='Bartlett, 1935, JRSS';
   table = {156 84 84 156,
            107 133 31 209};
   vnames= {'Alive?' 'Time' 'Length' ,
            'A' 'T' 'L'};
   vnames = vnames[1,];
   lnames= {'Alive' 'Dead',
            'Now' 'Spring',
            'Long' 'Short'};
   finish;

start berkeley;
   data='berkeley';
   title='Berkeley Admissions Data';
        source='Bickel-etal:75';
   dim = {2 2 6};
   vnames = {"Admit" "Gender" "Dept"};
   lnames = {"Admitted" "Rejected" " "  " "  " "  " ",
             "Male"  "Female" " "  " "  " "  " ",
             "A" "B" "C" "D" "E" "F"};
          /* Admit Not */
   table = { 512  313,
              89   19,
             353  207,
              17    8,
             120  205,
             202  391,
             138  279,
             131  244,
              53  138,
              94  299,
              22  351,
              24  317};
   *  print table;
   finish;

start cancer;
   data='cancer';
/*
 Three year survival of 474 breast cancer patients according to nuclear
 grade and diagnostic centre.
        Data from Morrison etal
   Whittaker, J. (1990) Graphical Models, p. 220.
   Lindsey, J.K. (1995) Modelling Frequency and Count Data, p38
*/
        source='Morrison etal, Lindsey:95 (p38)';
   dim= {2 2 2};
   table = {
   /*
      Malignant   benign
      Died Surv Died Surv
   */
      35   59   47   112,   /* Boston    */
      42   77   26    76};  /* Glamorgan */

   vnames = {'Survival' 'Grade' 'Center'};
   lnames = {'Died' 'Surv',
            'Malignant' 'Benign',
            'Boston' 'Glamorgan'};
   title = 'Breast Cancer Patients';
   finish;

start cesarean;
   data='cesarean';
   source='Fahrmeir & Tutz (1994)';
        title = 'Risk factors for infection in cesarean births';
   dim = {3 2 2 2};
   vnames = {'Infection' 'Risk?' 'Antibiotics' 'Planned'};
   lnames = {'Type 1' 'Type 2' 'None',
            'Yes' 'No' '',
            'Yes' 'No' '',
            'Yes' 'No' ''
            };
   table = {
      0   1  17,
      0   1   1,
      11  17  30,
      4   4  32,
      4   7  87,
      0   0   0,
      10  13   3,
      0   0   9
            };
   finish;

start detergen;
   data='detergen';
   source='Fienberg:80 (p. 71), RiesSmith:63';
        title = 'Detergent preference data';
   dim = {2 2 2 3};
   vnames = {'Temperature' 'M-User?' 'Preference' 'Water softness'};
   lnames = {'High' 'Low' '',
            'Yes'  'No'  '',
            'Brand X' 'Brand M' '',
            'Soft' 'Medium' 'Hard'};

   table = {
      19  57  29  63,  /* Soft     Brand X */
      29  49  27  53,  /* Soft     Brand M */
      23  47  33  66,  /* Medium   Brand X */
      47  55  23  50,  /* Medium   Brand M */
      24  37  42  68,  /* Hard     Brand X */
      43  52  30  42   /* Hard     Brand M */
   };
   finish;

start dyke;
        data='dyke';
        source='DykePatterson:52, Fienberg:77 (p.73)';
/*
!Sources of knowledge of cancer Dyke & Patterson (1952)
!(Fienberg, 1977, p.73)
!lindsey/cat3ex/ch2ex8.dat
!Stokes, Davis, Koch, sec 14.4
!         Y                     N            Radio
!    Y         N           Y         N      Reading
!Good Poor Good Poor   Good Poor Good Poor
!                                           Newspaper   Lectures
*/
   table = {
  23    8    8    4     27   18    7    6,  /*   Y          Y */
 102   67   35   59    201  177   75  156,  /*   Y          N */
   1    3    4    3      3    8    2   10,  /*   N          Y */
  16   16   13   50     67   83   84  393,  /*   N          N */
   };
   dim = {2 2 2 2 2};
   vnames = {'Knowledge' 'Reading' 'Radio' 'Lectures' 'Newspaper'};
   lnames = {'Good' 'Poor',
            'Yes' 'No',
            'Yes' 'No',
            'Yes' 'No',
            'Yes' 'No' };

        *-- Reversing order of Yes/No vars makes nicer displays;
        table = table[{4 3 2 1},];
        table = table[,{7 8 5 6 3 4 1 2}];
   table = shape(table, 16, 2);
        lnames[2,] = lnames[2,{2 1}];
        lnames[3,] = lnames[3,{2 1}];
        lnames[4,] = lnames[4,{2 1}];
        lnames[5,] = lnames[5,{2 1}];

   title='Sources of knowledge of cancer';
   finish;

start employ;
/*
    Employment status on Jan1 1975, by cause of layoff and length of
    previous employment at time of layoff for employees who lost their
    job in Fall 1974 in Denmark.

        "In 1974 the Danish National Inst for Social Science Research
        investigated 1314 employees who left their jobs during the 2nd
        half of the year.
        Classified by:
                Employment status, 1/1/75:  New job vs. still 
unemployed
                Cause of layoff:            Closure, etc. vs. 
Replacement
                Length of employment at time of layoff
*/

        data='employ';
   title  = 'Employment Status Data';
   source = 'Andersen (1991), Ex 5.3 (p.167); from Kjer (1978) Table 
4.8';
   dim = {2 2 6};
   vnames = {'EmployStatus' 'Layoff' 'LengthEmploy'};
   lnames = {'NewJob' 'Unemployed'    '' '' '' '',
             'Closure'  'Replaced' '' '' '' '',
             '<1 Mo' '1-3 Mo' '3-12 Mo' '1-2 Yr' '2-5 Yr' '>5 Yr'};

       /* B: -Closure-   -Replaced-                 */
       /* A: Job  Unem   Job  Unem                  */
   table = {   8   10,     40   24,        /* < 1 Mo */
              35   42,     85   42,        /* 1-3 Mo */
              70   86,    181   41,        /* 3-12Mo */
              62   80,     85   16,        /* 1-2 Yr */
              56   67,    118   27,        /* 2-5 Yr */
              38   35,     56   10 };      /* > 5 Yr */
        finish;

start gilby;
   title = 'Clothing and intelligence rating of children';
   source = 'Gilby & Pearson 1911, from Anscombe 1981, p 302';
   /*
   Gilby, W. H. and Pearson, K.
   On the significance of the teacher's appreciation of
   general intelligence.  Biometrika, 8, 93-108 (esp p 94)
   Quoted by Kendall (1943,...1953) Table 13.1, p 320

   Schoolboys were classified according to their clothing
   and to their teachers rating of 'dullness'
   Note: the last two categories of clothing were pooled
   as were the first two categories of dullness
   */
   data='gilby';
   dim = {6 4};
   vnames = {'Dullness' 'Clothing'};
   lnames = {
      'Mentally defective or Slow dull' 'Slow' 'Slow intelligent' 
'Fairly intelligent' 'Distinctly capable' 'Very able',
      'Very well clad' 'Well clad' 'Poor but passable' 'Insufficient or 
worse' '' ''
      };
        *-- Shorter labels for the categories;
   lnames = {
      'Ment. defective' 'Slow' 'Slow Intell' 'Fairly Intell' 'Capable' 
'V.Able',
      'V.Well clad' 'Well clad' 'Passable' 'Insufficient' '' ''
      };
   table = {
         33  48 113 209 194  39,
         41 100 202 255 138  15,
         39  58  70  61  33   4,
         17  13  22  10  10   1 };
   finish;

start haireye;
   *-- Hair color, eye color data;
        data='haireye';
        source = 'Snee (1974)';
        table = {
 /* ----brown---   -----blue-----   ----hazel---   ---green--- */
   32  38  10  3   11  50  10  30   10  25  7  5   3  15  7  8,   /* M 
*/
   36  81  16  4    9  34   7  64    5  29  7  5   2  14  7  8 }; /* F 
*/
** changed table[,2] from {38,81} **;
 table = {
 /* ----brown---   -----blue-----   ----hazel---   ---green--- */
   32  53  10  3   11  50  10  30   10  25  7  5   3  15  7  8,   /* M 
*/
   36  66  16  4    9  34   7  64    5  29  7  5   2  14  7  8 }; /* F 
*/


   dim    = { 4 4 2 };
   vnames = {'Hair' 'Eye' 'Sex' };          /* Variable names */
   lnames = {                               /* Category names */
         'Black' 'Brown' 'Red' 'Blond',    /* hair color */
         'Brown' 'Blue' 'Hazel' 'Green',   /* eye color  */
         'Male' 'Female' ' '  ' ' };       /* sex        */
   title  = 'Hair color - Eye color data';
   finish;

start heart;
   data='heart';
   source='Karger, 1980';
        title = 'Sex, occupation and heart disease';

   dim = {2 2 3};
   vnames = {'Disease' 'Gender' 'Occup'};
   lnames = {'Disease' 'None'    '',
             'Male'   'Female'   '',
             'Unempl' 'WhiteCol' 'BlueCol'};
        table = {
        254     759,   /*  Male    Unempl  */
        431     10283, /*  Female  Unempl  */
        158     3155,  /*  Male    WhiteCol*/        
        52      3082,  /*  Female  WhiteCol*/        
        87      2829,  /*  Male    BlueCol */
        16      416};  /*  Female  BlueCol */
        finish;

start heckman;
   data='heckman';
        source='HeckmanWillis:77,Lindsey:93 (p. 185)';
/*
From Lindsey, cat3dat/ch9e5.dat
see also Lindsey 93, Table 6.2, p185
!Labour force participation of married women 1967-1971
!Heckman, J.J. & Willis, R.J. (1977) "A beta-logistic model for the 
analysis
!of sequential labor force participation by married women." Jr. Pol. 
Econ.
!85: 27-58
!1971    1970 1969 1968 1967
!Yes No
*/
   table = {
      426  38 , /*Yes  Yes  Yes  Yes */
         16  47 , /*No   Yes  Yes  Yes */
       11   2 , /*Yes  No   Yes  Yes */
       12  28 , /*No   No   Yes  Yes */
       21   7 , /*Yes  Yes  No   Yes */
        0   9 , /*No   Yes  No   Yes */
        8   3 , /*Yes  No   No   Yes */
        5  43 , /*No   No   No   Yes */
       73  11 , /*Yes  Yes  Yes  No  */
        7  17 , /*No   Yes  Yes  No  */
        9   3 , /*Yes  No   Yes  No  */
        5  24 , /*No   No   Yes  No  */
       54  16 , /*Yes  Yes  No   No  */
        6  28 , /*No   Yes  No   No  */
       36  24 , /*Yes  No   No   No  */
       35 559}; /*No   No   No   No  */

        title='Labour force participation of married women 1967-1971';
   dim = {2 2 2 2 2};
   vnames = '19' + ('71':'67');
   lnames = repeat({'Working   ' 'Not Working'}, 5, 1);
   lnames = repeat({'Yes  ' 'No   '}, 5, 1);
   lnames[,1] = ('71':'67')` + lnames[,1];
   finish;

start hoyt;
        source='Hoyt, Krishnaiah, Torrence (1959); Fienberg pp.91-92';
        data='hoyt';
        title = 'Minnesota High School Graduates';
   dim   = { 4 3 7 2};
   vnames= {'Status' 'Rank' 'Occupation' 'Sex'};
   lnames= {'College' 'School' 'Job' 'Other' '' '' '',
            'Low' 'Middle' 'High' '' '' '' '',
            '1' '2' '3' '4' '5' '6' '7',
            'Male' 'Female' '' '' '' '' ''};
   table = {
     87   3  17 105  216   4  14 118  256   2  10  53 ,
     72   6  18 209  159  14  28 227  176   8  22  95 ,
     52  17  14 541  119  13  44 578  119  10  33 257 ,
     88   9  14 328  158  15  36 304  144  12  20 115 ,
     32   1  12 124   43   5   7 119   42   2   7  56 ,
     14   2   5 148   24   6  15 131   24   2   4  61 ,
     20   3   4 109   41   5  13  88   32   2   4  41 ,
     53   7  13  76  163  30  28 118  309  17  38  89 ,
     36  16  11 111  116  41  53 214  225  49  68 210 ,
     52  28  49 521  162  64 129 708  243  79 184 448 ,
     48  18  29 191  130  47  62 305  237  57  63 219 ,
     12   5  10 101   35  11  37 152   72  20  21  95 ,
      9   1  15 130   19  13  22 174   42  10  19 105 ,
      3   1   6  88   25   9  15 158   36  14  19  93 };
   finish;

start marital;
        source='Agresti, Table 7.3';
        data='marital';
  *-- define the data variables;
                    /* Gender Pre  Extra */
  table={ 17   4 ,  /* Women  Yes  Yes  */
          54  25 ,  /* Women  Yes  No   */
          36   4 ,  /* Women  No   Yes  */
         214 322 ,  /* Women  No   No   */
          28  11 ,  /* Men    Yes  Yes  */
          60  42 ,  /* Men    Yes  No   */
          17   4 ,  /* Men    No   Yes  */
          68 130 }; /* Men    No   No   */
   dim = { 2 2 2 2 };
   vnames = {'Marital' 'Extra' 'Pre' 'Gender'};
   lnames = {'Divorced'   'Married',
             'Extra Sex: Yes' 'No',
             'Pre Sex: Yes'   'No',
             'Women    '      'Men' };
   title  = 'Pre/Extramarital Sex and Marital Status';
   finish;

start mental;
        data = 'mental';
        title='Mental impariment and parents SES';
        source='Haberman, 1979 [p.375], from Srole etal,(1978) p.289';
        * also, Agresti:90, Lindsey p.99;       
        table = {
                64   94   58   46
                57   94   54   40
                57  105   65   60
                72  141   77   94
                36   97   54   78
                21   71   54   71
                };
   dim = { 4 6 };
   vnames= {"Mental Impairment" "Parents SES"};
   lnames= {"Well" "Mild" "Moderate" "Impaired" " " " ",
            "High" "2" "3" "4" 5 "Low"};
        finish;

start mobility;
   data='mobility';
        source='FeathermanHauser:78';
   dim = { 5 5 };
   /* Social Mobility data, Featherman & Hauser, 78 ,
            analyzed by Falguerolles & Mathieu, COMPSTAT 88 */
   /*    Sons occupation
         UNM   LNM    UM    LM  Farm */
   table = {
        1414   521   302   643    40,
         724   524   254   703    48,
         798   648   856  1676   108,
         756   914   771  3325   237,
         409   357   441  1611  1832 };

   title  = {'Social Mobility data'};
   vnames = {"Son's Occupation" "Father's Occupation"};
   lnames = { 'UpNonMan' 'LoNonMan' 'UpManual' 'LoManual' 'Farm',
            'UpNonMan' 'LoNonMan' 'UpManual' 'LoManual'  'Farm'};
   finish;

start abortion;
   data='abortion';
   title='Abortion opinion data';
   source='Christiensen (\S 3.5.2) ';  *-- page 92;

   dim = {2 2 2};
/* SES:      Low      NotLow  */
/* Opinion: Y   N     Y   N   */
   table ={171  79   138 112,      /* Female */
           152 148   167 133};     /* Male   */

   run marg(loc,newtab,dim,table,(3:1)`);
   table = shape(newtab,2,4);
   vnames={'Sex' 'Status' 'Support Abortion'};
   lnames={'Female' 'Male',
           'Lo'  'Hi',
           'Yes' 'No'};
   finish;

start suicide;
   data = 'suicide';
        source = 'Heuer:79,Friendly:94a';
   table = {
     512    25   852    64   875    52   477    29   229     3,  /* Gun 
*/
     335    40   883   113   625    91   201    45    45    29,  /* Gas 
*/
    1524   212  2751   575  3936  1481  3581  2014  2948  1355,  /* 
Hang */
    1160   921  2823  1672  2465  2224  1531  2283   938  1548,  /* 
Poison */
     189   131   366   276   244   327   273   388   268   383,  /* 
Jump */
      67    30   213   139   247   354   207   679   212   501   /* 
Drown */
            };
   dim = {2 5 6};

   meth= {'Gun' 'Gas' 'Hang' 'Poison' 'Jump' 'Drown'};
   age =  char(do(10,55,15),2,0)+shape({'-'},1,4)+
          char(do(20,65,15),2,0)
           || {'>65'  ' '   };
   sex = {'Male' 'Female' ' '  ' '  ' '  ' '  };

   vnames = {'Sex' 'Age' 'Method'};
   lnames = (sex // age // meth) ;
   title  = 'Suicide data';
   free meth age sex;
   finish;

start titanic;
   data='titanic';
        source='Dawson:95';
   title='Survival on the Titanic';
   dim = {4 2 2 2};
   vnames = {'Class' 'Sex' 'Age' 'Survived'};
   lnames = {'1st' '2nd' '3rd' 'Crew',
            'Male'  'Female' '' '',
            'Child' 'Adult' '' '',
            'Died'  'Survived'   '' ''};
   table = {
  /*
 CLASS1    CLASS2    CLASS3    CLASS4    SURVIVE     AGE      SEX
  */
    0         0        35         0,   /* No       Child    Male   */
    0         0        17         0,   /* No       Child    Female */
  118       154       387       670,   /* No       Adult    Male   */
    4        13        89         3,   /* No       Adult    Female */
    5        11        13         0,   /* Yes      Child    Male   */
    1        13        14         0,   /* Yes      Child    Female */
   57        14        75       192,   /* Yes      Adult    Male   */
  140        80        76        20    /* Yes      Adult    Female */
        };
   table = table[{3 4 1 2  7 8 5 6},];
   lnames[3,] = {'Adult' 'Child' '' ''};
   finish;

start victims;
        data='victims';
        source = 'Reiss:80,Fienberg:80 (Table 2-8)';
   crime = {'Rape' 'Assault' 'Robbery' 'PickPock' 'Pers.Larceny'
            'Burglary' 'Hous.Larceny' 'Auto Theft'};
   dim = {8 8};
   vnames = {'First Victimization' 'Second Victimization'};
   lnames = crime // crime ;
   title  = 'Repeat Victimization Data';
   table = {  26   50  11   6    82   39   48   11,
              65 2997 238  85  2553 1083 1349  216,
              12  279 197  36   459  197  221   47,
               3  102  40  61   243  115  101   38,
              75 2628 413 329 12137 2658 3689  687,
              52 1117 191 102  2649 3210 1973  301,
              42 1251 206 117  3757 1962 4646  391,
               3  221  51  24   678  301  367  269}`;
   free crime;
   finish;

/*  transform a vector of character strings into a matrix, whose
    number of rows is ncol(dim) and number of columns is max(dim).
         Useful for creating lnames.
        e.g.,
                ln = {a1 a2 a3 b1 b2 b3 b4 c1 c2};
                lname = vec2mat({3 4 2}, ln);
*/
start vec2mat(dim, vec);
   r = ncol(dim);
   c = max(dim);
   len = max(length(vec));
   blank='                    ';
   mat = j(r, c, substr(blank, 1, len));
   start = 1;
   do i = 1 to r;
      l = dim[i];
      mat[i, 1:l] = shape(vec[start:start+l-1],1);
      start = start+l;
      end;
   return(mat);
   finish;

/*
*-- Stack two matrices, filling out columns of the smaller with 0s or ' 
's;
start ontop(mat1, mat2);
   if type(mat1) = 'N'
      then fill=0;
      else fill=' ';
   nc1 = ncol(mat1);
   nc2 = ncol(mat2);
   if ncol(mat1) = ncol(mat2)
      then return (mat1 // mat2);
   else if nc1 < nc2 then do;
      result = (mat1 || j( nrow(mat1), nc2-nc1, fill)) // mat2;
      end;
   else do;
      result = mat1 // (mat2 || j( nrow(mat2), nc1-nc2, fill));
      end;
   return(result);
   finish;
*/

start datalist(datasets) global(data, title, source, dim, vnames, 
lnames);
   file print;

   put 'dataname' @12 'title / dim / vnames' /;
   do i = 1 to ncol(datasets);
                title = '????';
      call execute('run ', datasets[i], ';');
                namelist = namelist // data;
                titles   = titles // title;
      d = rowcat(char(dim,1+length(vnames[1])));
      v = rowcat(vnames+' ');
      put data @12 title / @12 'dim: ' d / @12 'vn:    ' v;
      put;
      end;
                
   finish;


   datasets = {
    abortion bartlett  berkeley  cancer  cesarean  detergen  dyke
         employ  gilby haireye  heart heckman hoyt  marital mental 
mobility
         suicide titanic victims};

   reset storage=mosaic.mosdata;
   store module=_all_ datasets;
   show storage;

        *-- Make sure all are mentioned in datasets;
   stored = storage();
   util = {vec2mat, datalist, datasets};
   dif = t(setdif(stored, datasets));
   dif = setdif(dif, util);
   if type(dif)^='U' then print stored dif;

   run datalist(datasets);
quit;         ;

/*
   Name:   mospart.sas
   Title:  Mosaics plots for partial association


        The mospart module produces a series of mosaics plots for partial
        association, that is separate plots for each level of one or more
        by variables.

        Input parameters are the same as for mosaic, except that:
        
        byvar - specifies the variables (names or numbers) which are used 
to
                stratify the data.  One mosaic is produced for each 
combination
                of the levels of the byvars.  These may be composed into a 
single
                graphic using the %panels macro after the SAS/IML step.

        plots - is a global variable rather than an input parameter here.
                If not specified, plots = the number of variables not given 
as
                byvariables.

*/
start mospart(dim, table, vnames, lnames, title, byvar)
      global(config, devtype, fittype, filltype, shade, space, split, 
plots,
             htext, verbose, font, cellfill, vlabels, fuzz, sep);
   factors = max(nrow(dim), ncol(dim));

        if type(byvar) = 'C'
                then byvar = name2num(row(byvar), row(vnames));
        if all(byvar=0) then do;
                print 'Error: BYVAR out of bounds in MOSPART';
                show vnames byvar; print byvar;
                return;
                end;
        others = remove( (1:factors), (byvar) );

        bydim = (shape(dim[byvar],1))`;
        byvn  = (shape(vnames[byvar],1))`;
        nby   = nrow(bydim);        *-- number of by variables;
        modim = dim[others];        *-- dimensions for each mosaic;
        rows  = modim[#];
        cols  = bydim[#];           *-- number of mosaic displays;
        * print byvar byvn bydim modim rows;
        cl = lnames[byvar,];
                
        if type(plots) ^= 'N'   then plots = ncol(others);
        if type(htext) ^= 'N'   then htext = cols;
        if type(vlabels) ^= 'N' then vlabels=0;
        if type(sep) ^= 'C' then sep=', ';
                
        * transpos table to put byvars first;
        order = shape(byvar,1)||others;
        dm = dim;
        vn = vnames;
        ln = lnames;
        tab = table;
        run transpos(dm, tab, vn, ln, order);
        * print dm vn ln tab;
        tab = shape(tab,rows);
        
        *-- construct labels for byvars;
        do i=1 to nby;
      cur = ln[i,]`;
      if i=1 then cl=cur;
         else do;        /* construct row labels for prior factors */
                                sp = sep[,min(i-1, ncol(sep))];
            ol = repeat(cl, 1, dm[i]);
            ol = shape(ol,  dm[i]#nrow(cl), 1);
            nl = repeat( (cur[1:(dm[i])]), nrow(cl));
            cl = trim(rowcatc(ol || shape(sp,nrow(ol),1) ||nl));
         end;
                end;
        * print 'transposed, reshaped tab' tab[c=cl];
        print 'Mosaic plots for levels of' byvn, cl;
        
        pn = vn[1:nby];
        dm = dm[(nby+1):factors];
        vn = vn[(nby+1):factors];
        ln = ln[(nby+1):factors,];

   call gstart;
        do ip = 1 to cols;
                bylev = cl[ip];
                if nby=1
                        then titl = trim(pn[1]) + ': ' + bylev;
                        else titl = trim(cl[ip]);
                ptab = tab[,ip];
*               print 'Slab' ip byvn bylev vn ln; * ptab;
      run mosaic(dm, ptab, vn, ln, plots, titl);
        end;
        finish;


*-- Scatterplot matrix of Mosaic displays for pairwise association;
/*
   Name:   mosmat.sas
   Title:  Scatterplot matrix of mosaic displays

        Input parameters are the same as for mosaic, except that vnames
        may contain two rows -- 
        vnames[1,] -- long names (split on '/'), used to label diagonal
                panels
        vnames[2,] -- short variable names used in pairwise mosaics

   Set 
        fittype='PARTIAL'; plots=3;     for partial/conditional association
        fittype=anthing;   plots=2              for marginal association
*/
        
start mosmat(dim, table, vnames, lnames, plots, title)
      global(config, devtype, fittype, filltype, shade, space, split,
             htext, verbose, font);
   factors = max(nrow(dim), ncol(dim));
        if ncol(vnames) = 1 then vnames = vnames`;
   if nrow(vnames) = 2 & ncol(vnames)=factors then
      do;
         vnlong = vnames[1,];    *-- long names (diag panels);
         vnames = vnames[2,];    *-- short names (mosaics);             
      end;
      else do;
         vnlong = vnames;
         vnames = vnames;
      end;
        if type(htext) ^= 'N' then htext=factors;
        if type(split) ^= 'C' then split={V H};
        if type(font)  ^= 'C' then font='hwpsl009';
        
   call gstart;
   ip = 0;  ig=0;      *-- panel and graph numbers;
   replay = '';        *-- greplay list;
   do row = 1 to factors;
      do col = 1 to factors;
         ip = ip+1;
         ig = ig+1;
         if row = col then do;
            run vpanel(vnlong[col]);
            end;
         else do;
            others = remove( (1:factors), (row||col) );
            * transpose table to conform to model;
            order = col||row||others;
            dm = dim;
            vn = vnames;
            ln = lnames;
            tab = table;
                                if type(config)='N' then do;
                                        run modname(config,vn,model);
                                        print row col model config;
                                end;
            run transpos(dm, tab, vn, ln, order);
            ord = rowcatc(vn`);
            titl = title;
            run mosaic(dm, tab, vn, ln, plots, titl);
            end;
         *-- Construct replay list: graph ig in panel ip;
         replay = replay + trim(char(ip,2))+':'+trim(left(char(ig)))+' 
';
         end; /* do row */
      end; /* do col */
 
      nvar = '%let nvar =' + char(factors,2) + ';' ;
      replay = '%let replay =' + replay + ';' ;
      call execute(replay);
      call execute(nvar);
 
finish;

*-- Draw a panel for a variable name; 
start vpanel(name)
        global(font, htext);
   *-- Draw a panel with the variable name;
   call gstart;
   call gopen;
   window = {0 0 100 100};
   call gwindow(window);
   call ggrid( {5  95}, {5  95});
   run split(name, '/', lines);
        nlines = nrow(lines);

   ht=3#htext ;
        *-- Find length of all lines, reduce ht if necessary;
   call gstrlen(len,lines, ht, font);
*       print  lines len;
        if max(len) > 90 
                then ht = round(ht # 90/max(len),.1);
        
   do l = 1 to nlines;
      line = lines[l,];
      call gstrlen(len,line, ht, font);
      x = 50 - len/2;
      y = 50 - ht/2 + 1.5#(ht+1)#(nlines-l);
      call gscript(x,y, line,,,ht,font);
        *       print 'vpanel:' l x y line len ht;
      end;
finish;

start split(in, char, out);
   *-- split a string into separate strings at each occurrence of 
'char';
   free out;
   i=1;
   sub = scan(in,i,char);
   do while(sub ^=' ');
      out = out // sub;
      i = i+1;
      sub = scan(in,i,char);
   end;
finish;


/*
   Name:  mosademo.sas
  Title:  Demonstration program for MOSAICS
   Requires that MOSAICM.SAS be run first to install modules
    in libname 'mosaic'.  See the User's Guide.
*/
goptions vsize=7in hsize=7in;         /* make the plot square */

 
proc iml;
start haireye;
   *-- Hair color, eye color data;
 table = {
 /* ----brown---   -----blue-----   ----hazel---   ---green--- */
   32  38  10  3   11  50  10  30   10  25  7  5   3  15  7  8,   /* M 
*/
   36  81  16  4    9  34   7  64    5  29  7  5   2  14  7  8 }; /* F 
*/
 
levels= { 4 4 2 };
vnames = {'Hair' 'Eye' 'Sex' };    /* Variable names */
lnames = {                         /* Category names */
         black brown red blond,        /* hair color */
         brown blue hazel green,       /* eye color  */
         male female ' '  ' ' };       /* sex        */
title  = 'Hair color - Eye color data';
finish;
 
run haireye;

   reset storage=mosaic.mosaic;
   load module=_all_;

   *-- Fit models of joint independence (fittype='JOINT');
   plots = 2:3;
   split={V H};
   title = ' ';
   htext=1.42;
   run mosaic(levels, table, vnames, lnames, plots, title);
 
   *-- reorder eye colors (brown, hazel, green, blue);
   table  = table[,((1:4) || (9:16) || (5:8))];
   lnames[2,] = lnames[2,{1 3 4 2}];
   plots=2:3;
   run mosaic(levels, table, vnames, lnames, plots, title);
 
   plots=3;
   fittype='MUTUAL';
   run mosaic(levels, table, vnames, lnames, plots, title);
quit;

*include goptions;         *-- set goptions device=  etc;
goptions hsize=7 in vsize=7 in;
title 'Mosaics for specialized models fit externally to mosaics.sas';
 
proc iml;
  dim = { 4 4 };
  /* Unaided distant vision data Bishop etal p. 284*/
     /*    Left eye grade */
  group  = {' (women)'};
  f = {1520   266   124    66,
        234  1512   432    78,
        117   362  1772   205,
         36    82   179   492 };
  title  = {'Unaided distant vision: Independence'} + group;
  vnames = {'Right Eye','Left Eye'};
  lnames = { 'High' '2' '3' 'Low',
             'High' '2' '3' 'Low'};
   reset storage=mosaic.mosaic;
   load module=_all_;
        %include '~/sasuser/mosaics/mosaicd.sas';

   htext=1.5;
   plots={2};
        font='hwpsl009';
        colors={BLUE RED};
        filltype={HLS HLS};
        *-- Independence model;
   run mosaic(dim, f, vnames, lnames, plots, title);
 
        *-- test quasi independence (ignore diagonal);
   initab = j(4,4) - i(4);
   qf = f - diag(f);
   call ipf(fit, stat, dim,qf, {1 2}, initab);
   fit = fit + diag(f);
   dev = (f - fit)/sqrt(fit);
   chisq=chisq(f,fit);
   df = 9-4;
   title={'Quasi Independence Model'} + group;
   print / title , chisq[r={GF LR}] df, fit[f=8.3], dev[f=8.3];

   run mosaicd(dim, f, vnames, lnames, dev, title);
 
   *-- Symmetry model;
   title={'Symmetry Model'} + group;
   fit = (f + f`)/2;
   dev = (f - fit)/sqrt(fit);
   print title, dev;
   chisq=chisq(f,fit);
   df = .5#nrow(f)#(nrow(f)-1);
   print / title , chisq[r={GF LR}] df, fit[f=8.3], dev[f=8.3];
   run mosaicd(dim, f, vnames, lnames, dev, title);
 
        *-- Quasi-symmetry;
   call ipf(fit, stat, dim,qf, config);
   dev = (qf - fit)/sqrt(fit);
   dev = dev[1:4,];
   dev = dev - diag(dev);   *-- Rounding error on diagonal?? ;
   df = .5#(nrow(f)-1)#(nrow(f)-2);
   chisq=ssq(dev) ;
   prob = 1 - probchi(chisq,df);
   print / 'Quasi-Symmetry' config, fit[f=8.3], dev[f=8.3],
                            chisq[r={GF LR} f=8.4] df prob;
   run mosaicd(dim, f, vnames, lnames, dev, title);
 
quit;
%*gfinish;

%include goptions;

goptions vsize=7 in hsize=7 in;
proc iml;
 
start victims;
   crime = {'Rape' 'Assault' 'Robbery' 'PickPock' 'Pers.Larceny'
            'Burglary' 'Hous.Larceny' 'Auto'};
   levels = {8 8};
   vnames = {'First Victimization' 'Second Victimization'};
   lnames = crime // crime ;
   title  = 'Repeat Victimization Data';
   table = t({ 26   50  11   6    82   39   48   11,
              65 2997 238  85  2553 1083 1349  216,
              12  279 197  36   459  197  221   47,
               3  102  40  61   243  115  101   38,
              75 2628 413 329 12137 2658 3689  687,
              52 1117 191 102  2649 3210 1973  301,
              42 1251 206 117  3757 1962 4646  391,
               3  221  51  24   678  301  367  269});
        finish;
        run victims;

        *-- load mosaic modules;
   reset storage=mosaic.mosaic;
   load module=_all_;
 
        *-- select subset of rows/cols;
   keep = {1 2 3 6 8};
   table = table[keep,keep];
   lnames = lnames[,keep];
   levels = {5 5};

        *-- set mosaic global options;
   htext = 1.4;
        
   shade = {2 4 8};

   plots = {2};
   run mosaic(levels, table, vnames, lnames, plots, title);
 
        *-- rearrange rows/cols by CA dim1;
        keep = {2 3 1 5 4};
        table = table[keep,keep];
        lnames = lnames[,keep];
 
        *-- standardize table to equal margins;
   avg = table[,+] / levels[1];
   newtab = repeat(avg,1,5);
   config = {1 2};
   call ipf(adjusted, status, levels, newtab, config, table);
   title  = 'Repeat Victimization Data, Adjusted to Equal Margins';
   lab = crime[keep];
   print title, adjusted[r=lab c=lab f=8.2];
   plots = 2;
   run mosaic(levels, adjusted, vnames, lnames, plots, title);

        *-- fit quasi-independence (ignore diagonal cells);
   title  = 'Repeat Victimization Data, Quasi Independence';
        zeros = J(5,5) - I(5);
   run mosaic(levels, adjusted, vnames, lnames, plots, title);
quit;

%include goptions;
goptions vsize=7 hsize=7;
* Sex, Occupation and heart disease [Karger, 1980];
 
data heart;
   input gender $ occup $  @;
   heart='Disease';  input freq @;  output;
   heart='No Dis';   input freq @;  output;
cards;
Male   WhiteCol   158   3155
Female WhiteCol    52   3082
Male   BlueCol     87   2829
Female BlueCol     16    416
Male   Unempl     254    759
Female Unempl     431  10283
;
proc sort data=heart;
   by descending heart gender;

proc iml;
/*
   use heart;
   read all var{freq} into table;
   levels = { 2 3 2 };
   vnames = {'Gender' 'Occup' 'Heart' };
   lnames = {'Female'  'Male ' ' ',
             'BlueCol' 'Unempl'  'WhiteCol',
             'Disease' 'NoDisease'  ' ' };
*/
   title  = 'Sex, Occupation, and Heart Disease';
 
   reset storage=mosaic.mosaic;
   load module=_all_;

   vnames = {'Gender' 'Occup' 'Heart' };
   run readtab('heart', 'freq', vnames, table, levels, lnames);
 
   plots = 2:ncol(levels);
   run mosaic(levels, table, vnames, lnames, plots, title);
quit;

title 'Alcohol, Cigarette, and Marijuana Use by High School Seniors';
* Source: Agresti, 1996, p. 152;
data druguse;
        input alcohol $ cigaret $ @;
        marijuan = 'Mar:+';  input freq @;  output;
        marijuan = 'Mar:- '; input freq @;  output;
        
cards;
Alc:+  Cig:+   911   538
Alc:+  Cig:-    44   456
Alc:-  Cig:+     3    43
Alc:-  Cig:-     2   279
;
%include goptions;
goptions hsize=7in vsize=7in;
%mosaic(var=alcohol cigaret marijuan, count=freq, plots=2:3,
        fittype=condit,
        title=%str(Alcohol, Cigarette, and Marijuana Use));     

%mosaic(var=alcohol cigaret marijuan, count=freq, plots=2:3,
        fittype=user, config=alcohol cigaret/alcohol marijuan/cigaret 
marijuan,
        title=%str(&MODEL));    

/*
  Name:  ishi.sas
 Title:  Ethnicity, religiosity, and gender ideology
Source: Ishi-Kuntz, M. (1994) ``Ordinal log-linear models'', Sage, p.37
*/
options ls=80 ps=60 nocenter;
%include goptions;
goptions hsize=7in vsize=7in;
proc format ;
   value efmt 1='Hispanic'    2='White'    3='Black';
   value rfmt 1='Religious'   2='Moderate' 3='Non-rel.';
   value gfmt 1='Traditional' 2='Moderate' 3='Liberal';

data ishi ;
   input e  r  g  count @@;
   format e efmt. r rfmt. g gfmt.;

   label ethnicty = 'Ethnicity'
         relig    = 'Religiosity'
         gendidol = 'Gender ideology';
   ethnicty = put(e, efmt.);
   relig    = put(r, rfmt.);
   gendiol  = put(g, gfmt.);
cards;
3 1 1  58    3 1 2  45    3 1 3  49
3 2 1  11    3 2 2  17    3 2 3  21
3 3 1   3    3 3 2   4    3 3 3   7
1 1 1  83    1 1 2  24    1 1 3   8
1 2 1  16    1 2 2  17    1 2 3  13
1 3 1   7    1 3 2   6    1 3 3   2
2 1 1 317    2 1 2 242    2 1 3 145
2 2 1 105    2 2 2 157    2 2 3 148
2 3 1  41    2 3 2 109    2 3 3 150
;
/*
proc freq order=formatted ;
     weight count ;
     tables gendiol*(relig ethnicty) / noprint chisq measures cmh;
     tables relig*ethnicty / noprint chisq measures cmh;
     tables ethnicty*gendiol*relig / chisq measures cmh;
run ;
*/

%mosaic(data=ishi, var=Ethnicty Relig Gendiol,
        sort=g r descending ethnicty, plots=2 3, htext=1.7,
        title=%str(Gender Ideology, Religiosity and Ethnicity
INDEX   
druguse.sas     Alcohol, Cigarette, and Marijuana Use by High School 
Seniors
ishi.sas        Ethnicity, religiosity, and gender ideology
karger.sas      Sex, Occupation, and Heart Disease
mosademo.sas    Demonstration program for MOSAICS
mosaicd.sas     Mosaic displays for externally-fitted models
mosaicm.sas     Install mosaic modules
mosaics.sas     IML modules for general n-way mosaic display
mosdata.sas     Assorted contingency table data sets for mosaic displays
moseye.sas      Mosaics for specialized models fit externally to 
mosaics.sas
mosmat.sas      Scatterplot matrix of mosaic displays
mospart.sas     Mosaics plots for partial association
victims.sas     Repeat Victimization Data

Installing MOSAICS

mosaics.sas consists of a collection of SAS/IML modules which are
designed to be called from another program in a PROC IML step (or
via the MOSAIC or MOSMAT macros).  Because the program is large,
the modules are most conveniently stored in compiled form in a
SAS/IML storage catalog, called MOSAIC.MOSAIC.  To install the
program in this way,

1.  Copy the files mosaics.sas and mosaicm.sas to a directory, 
(~/sasuser/mosaics/, or c:\sasuser\mosaics\, say),

2.  Edit the LIBNAME and FILENAME statements to
correspond to this directory,

  *-- Change the path in the following filename statement to point to
      the installed location of mosaics.sas;
  filename mosaics  '~/sasuser/mosaics/';

  *--- Change the path in the libname to point to where the compiled
     modules will be stored, ordinarily the same directory;
  libname  mosaic   '~/sasuser/mosaics/';

3. You may wish to change some of the program default values, (in
the module globals in mosaics.sas) particularly the font= value.
As of V3.5, this is set to font='SWISS', unless the current graphics
device (&SYSDEVIC) is one of the Postscript drivers (e.g., PSCOLOR,
PSMONO, PSLEPS), in which case the program uses the hardware
Helvetica font (font='hwpsl009') because the resulting output
graphic files are much smaller and can be potentially edited.

4. To store the modules in compiled form, run the mosaicm.sas
program, with the command,

sas mosaicm

5 Optionally, install the sample data sets by running mosdata.sas



Further details are givem in Sect 2.2 of doc/mosaics.pdf.

/*
 Name: TESTMOS.SAS
 Title: Test stream for VCD mosaics programs
 Generator: ls -1 *.sas | tcgrep -v 'mosaic|mosmat|mospart' | perl -pe 
's/(\w+)\.sas\@?/%include mosaics($1);/' > TESTMOS.SAS
*/

/* The following FILENAME and LIBNAME should point to the directory
    where the mosaic programs were installed;
filename mosaics  '~/sasuser/mosaics/';
libname  mosaic   '~/sasuser/mosaics/';
*/

%include mosaics(druguse);
%include mosaics(ishi);
%include mosaics(karger);
%include mosaics(mosademo);
%include mosaics(mosdata);
%include mosaics(moseye);
%include mosaics(victims);