Introduction to SAS/IML

Michael Friendly


This is an outline for a 'short course' on SAS/IML, developed by Walter Davis for the Institute for Research in Social Science at the University of North Carolina at Chapel Hill. The outline itself is based in part on one put together by Tim Dorcey while he was at the Purdue University Computation Center. I have modified it further. Please direct all comments, suggestions and other correspondence on this version to Michael Friendly, <friendly@yorku.ca>.

Contents

  1. Introduction
  2. Using SAS/IML
  3. Defining and indexing matrices
  4. Reading and Creating SAS datasets in IML
  5. Introduction to IML programming
  6. Storage of IML modules and matrices

Introduction

Using SAS/IML

SAS/IML is a SAS procedure, so you start with a proc iml statement and end with a quit; statement. You can use SAS/IML either interactively, where statements are executed immediately, or noninteractively. For interactive use, it is convenient to issue the reset log print; statement. The log option causes IML to display results in the log, together with your input statements, and the print option causes IML to display the result of each assignment statement. In this document, interactive input is shown after the prompt character >.
    proc iml;
      IML ready
 >  reset log print;
 >  x = 12.3;
         X             1 row       1 col     (numeric)
                                12.3

 >  quit;
    Exiting IML

Defining and indexing matrices

IML functions and operators which create matrices
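A minimal sketch of some common ways to create matrices in IML: a matrix literal in braces (commas separate rows), the J() function for a constant matrix, I() for an identity matrix, the index operator (:) and the DO() function for regularly spaced row vectors. The matrix names are arbitrary.

```sas
proc iml;
   a  = {1 2 3,
         4 5 6};           /* 2x3 numeric matrix literal */
   c  = {"alpha" "beta"};  /* 1x2 character matrix */
   z  = j(2, 3, 0);        /* 2x3 matrix filled with 0 */
   id = i(3);              /* 3x3 identity matrix */
   s  = 1:5;               /* row vector 1 2 3 4 5 */
   d  = do(0, 1, 0.25);    /* row vector 0 0.25 0.5 0.75 1 */
   print a, c, z, id, s, d;
quit;
```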

Simple matrix operations
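As a short sketch of the basic matrix operations, the following applies transpose (the backquote operator; the T() function is equivalent), matrix multiplication (*), elementwise multiplication (#), the inverse (INV) and the solution of a linear system (SOLVE) to a small, arbitrarily chosen matrix.

```sas
proc iml;
   a = {2 1,
        1 3};
   b = {1, 2};             /* 2x1 column vector */
   at   = a`;              /* transpose */
   ab   = a * b;           /* matrix product, 2x1 */
   aa   = a # a;           /* elementwise product */
   ainv = inv(a);          /* matrix inverse */
   x    = solve(a, b);     /* solves a*x = b without forming inv(a) */
   print at ab aa ainv x;
quit;
```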

Indexing matrices

Two sets of symbols are used to refer to the indices (or subscripts) of a matrix: [ ] or (| |). Unfortunately, some terminal emulations (most notably 3270) do not support the [ ] characters, so (| |) must be used there. This document uses the [ ] convention. To select an element of a matrix by row and column, use two subscripts (e.g. X[1,3]); a vector needs only one subscript (e.g. V[2]). A single subscript can also be applied to a matrix, in which case the elements are counted in row-major order.
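The indexing rules can be sketched with a small example matrix. Leaving a subscript empty selects all rows or all columns, and a vector of indices selects a submatrix.

```sas
proc iml;
   x = {1 2 3,
        4 5 6};
   a = x[1,3];             /* row 1, column 3: 3 */
   r = x[2, ];             /* all of row 2: 4 5 6 */
   c = x[ ,2];             /* all of column 2: 2 and 5 */
   s = x[{1 2}, {1 3}];    /* rows 1-2, columns 1 and 3 */
   k = x[5];               /* single subscript, row-major order: 5 */
   v = {10 20 30};
   b = v[2];               /* a vector needs only one subscript: 20 */
   print a, r, c, s, k, b;
quit;
```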

Missing values

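IML accepts missing values in numeric matrices, but its support for them is limited: elementwise arithmetic simply propagates missing values, while many matrix functions will not accept arguments that contain them. In practice, you usually need to remove or otherwise handle missing values explicitly.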
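One common approach, shown here as a minimal sketch on a made-up matrix, is to keep only the complete rows: compare the matrix to missing (.), count the missing values in each row with a row-sum reduction, and use LOC() to find the rows with none.

```sas
proc iml;
   x = {1 2,
        . 4,
        5 6};             /* row 2 contains a missing value */
   y = x + 1;             /* elementwise: the missing value stays missing */
   m = (x = .);           /* 1 where x is missing, 0 elsewhere */
   nmiss = m[ ,+];        /* number of missing values in each row */
   ok = loc(nmiss = 0);   /* indices of the complete rows */
   xc = x[ok, ];          /* complete rows only */
   print y, xc;
quit;
```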

Working with matrices and vectors

Popular IML Operators

- see handout from IML Quick Reference.
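A few of the operators used most often are concatenation (|| horizontally, // vertically) and the subscript reduction operators, such as + for sums and : for means. The sketch below applies them to an arbitrary 2x2 matrix.

```sas
proc iml;
   x = {1 2,
        3 4};
   h = x || {5, 6};        /* horizontal concatenation: 2x3 */
   v = x // {7 8};         /* vertical concatenation: 3x2 */
   colsum  = x[+, ];       /* column sums: 4 6 */
   rowmean = x[ , :];      /* row means: 1.5 and 3.5 */
   total   = x[+];         /* sum of all elements: 10 */
   print h, v, colsum, rowmean, total;
quit;
```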

Reading and Creating SAS datasets in IML

In some situations, you need to input data from a SAS dataset to SAS/IML and/or create a SAS dataset from an IML matrix. For input, use the use and read statements, as in
  use psy303.fitness;
  read all into mat[rowname=name];
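The rowname= option reads the character variable NAME from the dataset into a character vector (also called name) that can be used as row labels. Because no var clause is given, all the numeric variables are read into the matrix mat.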
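To read only some variables, add a var clause. This sketch assumes the SASHELP.CLASS sample dataset that ships with SAS (variables NAME, HEIGHT, WEIGHT); the matrix and vector names are arbitrary. The colname= option saves the variable names so they can label the columns when printing.

```sas
proc iml;
   use sashelp.class;
   read all var {height weight} into x[rowname=name colname=vars];
   close sashelp.class;
   print x[rowname=name colname=vars];
quit;
```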

For output to a SAS dataset, use the create and append statements, as in

  *-- Output results to data set out ;
  xys   = yhat || res || weight;
  cname = {"_YHAT_" "_RESID_" "_WEIGHT_"};
  create out from xys [colname=cname];
  append from xys;
  close out;     /* close the data set when finished */
This creates the SAS dataset, WORK.OUT, containing three variables, whose names are specified by the vector cname.
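Because the fragment above assumes that yhat, res and weight were computed earlier, here is a self-contained sketch of the same create/append pattern. The numbers are made up for illustration only; PROC PRINT then shows the resulting WORK.OUT dataset.

```sas
proc iml;
   yhat   = {1.2, 2.5, 3.1};   /* hypothetical fitted values */
   res    = {0.3, -0.5, 0.2};  /* hypothetical residuals */
   weight = {1, 1, 2};         /* hypothetical weights */
   xys   = yhat || res || weight;
   cname = {"_YHAT_" "_RESID_" "_WEIGHT_"};
   create out from xys [colname=cname];
   append from xys;
   close out;
quit;

proc print data=out;
run;
```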

IML input statement syntax summary

Introduction to IML programming

IML has programming features like those of most other procedural languages. The main programming features are DO loops, IF-THEN/ELSE statements, program modules and function (assignment) modules.

This section focusses on IML programming features, namely iterative and conditional processing. IML programming can take place in 'open' code or within modules (compiled programs). If the program is only used once then open code is generally preferable. If the program is used often, whether in one session or across sessions, the module format is probably preferred. Modules may be stored permanently in compiled form.

IF-THEN/ELSE statements : conditional processing

These take the same form in IML as they do in regular SAS:
     IF expression THEN statement1;
     ELSE IF expression THEN statement2;
     ELSE statement3;

Note: IML uses the symbol | for OR, & for AND, and ^ for NOT. Unlike the DATA step, it does not accept the words OR and AND as alternatives for these logical operators.

Example

  x=3;
   if x=3 then print 'x=' x;
   else if x=4 then print 'x is 4';
   else print 'x is bad';
                           x=         3

  x=4;
   if x=3 then print 'x=' x;
   else if x=4 then print 'x is 4';
   else print 'x is bad';
                              x is 4

  x=5;
   if x=3 then print 'x=' x;
   else if x=4 then print 'x is 4';
   else print 'x is bad';
                                 x is bad
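In IML the expression in an IF statement can be a matrix. In that case the condition is true only if every element is nonzero and nonmissing. The sketch below shows this behavior and how the ANY() and ALL() functions make the intent explicit; the vector x is arbitrary.

```sas
proc iml;
   x = {1 2 3};
   /* x > 1 is {0 1 1}: not every element is true, so the ELSE branch runs */
   if x > 1 then print 'all elements exceed 1';
   else print 'not all elements exceed 1';
   /* any() is true here (x has a 3), all() is false */
   if any(x > 2) & ^all(x > 2) then print 'some, but not all, exceed 2';
quit;
```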

DO loops : iterative processing

These also work much as they do in regular SAS. A DO group may be the statement executed when an IF condition is met (e.g. IF x=3 THEN DO; ... END;), and DO loops may be nested. The valid forms for DO loops are:
     DO variable = start TO stop <BY increment>;
e.g.,       do i=1 to 100 by 10; ... end;
            do j=1 to 10;        ... end;

     DO WHILE (expression);
e.g.,       count=1;
            do while (count<5);  ... end;

     DO UNTIL (expression);
e.g.,       do until (count<5);  ... end;

Note: the DO WHILE condition is evaluated at the top of the loop, so if count were 10 in this example, the loop would not execute at all. The DO UNTIL condition is evaluated at the bottom of the loop, so the loop always executes at least once. In the example above, if count equals 1 at the start, the DO UNTIL loop still executes once, even though count is already less than 5.

Example

   reset name;
   x=1;
   do while (x<2);
      print x;
      x=x+1;
   end;
                                    X
                                    1
   x=3;

   do while (x<2);  /* note this loop does not execute */
      print x;
      x=x+1;
   end;
   do until (x<4);
      print '** do until loop executes although X is less than 4', x;
      x=x-1;
   end;
        ** do until loop executes although X is less than 4

                                    X
                                    3
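Iterative DO loops with an index variable work the same way. The sketch below accumulates the sum of squares of 1 to 10 in a loop and, for comparison, computes the same quantity without a loop from the vector 1:10 and the SSQ() function; in IML such vector operations are usually simpler and faster than explicit loops.

```sas
proc iml;
   /* iterative version: accumulate i**2 for i = 1 to 10 */
   ss = 0;
   do i = 1 to 10;
      ss = ss + i##2;
   end;
   /* vectorized version: sum of squares of the vector 1..10 */
   v   = 1:10;
   ss2 = ssq(v);
   print ss ss2;           /* both are 385 */
quit;
```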

Modules

A module is a set of IML statements compiled as a single program. Program-type modules are activated using a RUN name statement. Function (or assignment) type modules return a value which is assigned to an IML matrix. Both types of modules accept arguments.
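A minimal sketch of both kinds of module, with hypothetical module names: showsize is a program-type module called with RUN, and center is a function module that returns a value with the RETURN statement, so it can be used on the right side of an assignment.

```sas
proc iml;
   /* program-type module: invoked with RUN */
   start showsize(m);
      nr = nrow(m);
      nc = ncol(m);
      print nr nc;
   finish;

   /* function module: RETURN gives the value of the call */
   start center(m);
      return( m - repeat(m[:,], nrow(m), 1) );   /* subtract column means */
   finish;

   x = {1 2,
        3 4,
        5 6};
   run showsize(x);
   xc = center(x);
   print xc;
quit;
```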

Global vs. Local Symbol Tables

IML statements outside a module have access to the global symbol table -- all matrices defined previously. When a module is defined without arguments, it also uses the global symbol table. Any matrices created or changed in the module will be created or changed in the global environment.
  a=10; b=20; c=30;  /* A,B,C are all global */
  start mod1;        /* module uses global table */
    p=a+b;           /* p is global */
    c=40;            /* c already global */
  finish;
  run mod1;
  print a b c p;   /* note c changed to 40 */
                  A         B         C         P
                 10        20        40        30
When a module is defined with arguments, a local symbol table is created. This symbol table is temporary and is unique to the module. These modules will have access only to specified arguments from the global symbol table. If modules are nested, the local symbol table of the 'parent' module acts as the global symbol table for the called module. If matrix C exists in the local table and the global table, the global value of C will not be affected by operations on the local value of C (unless global C was specified as the argument corresponding to local C).
   start mod2(a,b);    /* module with args creates local table */
     p=2*(a+b);        /* p is local */
     b=50;             /* b is local */
   finish;
   run mod2(a,c);
           /* Global B is unchanged.  Global C was passed as the
              argument b, and b is changed in the module, so global C
              is changed too (arguments are passed by reference).
              Global P is also unchanged, because P inside the
              module is local. */
   print a b c p;
                   A         B         C         P
                  10        20        50        30
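A module with arguments can still share selected matrices with the global symbol table through the GLOBAL clause of the START statement. In this sketch the module name addto and the matrix total are hypothetical; tmp is created only in the module's local table and does not exist after the module returns.

```sas
proc iml;
   a = 10;
   total = 0;
   start addto(x) global(total);   /* TOTAL is shared with the global table */
      total = total + x;
      tmp = 99;                    /* TMP is local to the module */
   finish;
   run addto(a);
   run addto(5);
   print total;                    /* 10 + 5 = 15 */
quit;
```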

Storage of IML modules and matrices

SAS/IML can store matrices and compiled modules in specially defined SAS catalogs. These are not SAS datasets, but, as with SAS datasets, a temporary storage catalog is referred to with a one-level name and a permanent one with a two-level name. If you do not specify a storage catalog, SAS/IML uses a default temporary one, WORK.IMLSTOR.

IML storage catalogs are useful for saving large intermediate results for later use when memory is a concern. A permanent storage catalog is also the way to keep IML matrices and modules available after an IML session has ended.
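A minimal sketch of saving a module and a matrix in one session and restoring them in another. The library name mylib, its path and the catalog name imlstor are hypothetical placeholders; the path must be replaced with an existing directory. RESET STORAGE= selects the catalog, STORE writes to it and LOAD reads from it.

```sas
libname mylib 'path-to-library';      /* hypothetical: use a real directory */

proc iml;
   reset storage=mylib.imlstor;        /* permanent, two-level catalog name */
   start center(m);
      return( m - repeat(m[:,], nrow(m), 1) );
   finish;
   x = {1 2,
        3 4};
   store module=center;                /* save the compiled module */
   store x;                            /* save the matrix X */
quit;

proc iml;                              /* a later, separate IML session */
   reset storage=mylib.imlstor;
   load module=center;
   load x;
   xc = center(x);
   print xc;
quit;
```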