# Introduction to SAS/IML

Michael Friendly

This outline was developed for a 'short course' on SAS/IML by Walter Davis for the Institute for Research in Social Science at the University of North Carolina - Chapel Hill. It is based in part on an outline put together by Tim Dorcey while he was at the Purdue University Computation Center. I have modified it further. Please direct all comments, suggestions and other correspondence on this version to Michael Friendly, <friendly@yorku.ca>.

# Introduction

• IML is a matrix language - similar to Gauss, APL, and MATLAB
• built-in operators and functions for most standard matrix operations
• you can define your own modules (subroutines) and functions.
• But IML has only 2-dimensional matrices, not multi-dimensional ones.
• expectations of users
• you are assumed to have basic SAS knowledge
• matrix algebra will not be covered, so familiarity with it is assumed
• when to use IML
• for programming statistical procedures that SAS does not have (including iterative procedures)
• for doing matrix operations
• for doing operations on rows and columns of a data table.
• IML allows you to construct (novel) graphics which could not be created with SAS/GRAPH.
• when not to use IML
• in general, if 'regular' SAS can do it, don't use IML.
• IML can be used for data management and graphics, but regular SAS data step and SAS/GRAPH are often more high-level, and therefore easier to use for many graphs.
• SAS macro facility can provide some of the programming capabilities of IML.

# Using SAS/IML

SAS/IML is a SAS procedure, so you start with a proc iml; statement and end with a quit; statement. You can use SAS/IML either interactively, where statements are executed immediately, or noninteractively. For interactive use, it is convenient to use the reset log print; statement: the log option causes IML to display results in the log file, together with your input statements, and the print option causes IML to display the result of each assignment statement. In this document, interactive input is shown with the prompt character, >.
```
   proc iml;
>  reset log print;
>    x = 12.3;
X             1 row       1 col     (numeric)
12.3

>   quit;
Exiting IML
```

# Defining and indexing matrices

• IML supports both character and numeric matrices, vectors and scalars. Numeric and character information cannot be combined in one matrix; however, character vectors can be used for row and column labels of a matrix.
• scalars
• Numeric values can be used as is, or surrounded by braces or squiggly brackets, { }. Character values must be within single or double quotes. The following are valid:
``` >    x = 12.3;
X             1 row       1 col     (numeric)
12.3

>    y = {57};
Y             1 row       1 col     (numeric)
57
>    name = 'Bob';
NAME          1 row       1 col     (character, size 3)
Bob
```
• if you use name=Bob; rather than name='Bob'; IML will look for a matrix called bob.
• matrices (including vectors). To define a matrix, squiggly brackets {} must be used. (You can also use (| and |) as matrix delimiters.) Commas are used to separate rows. Some examples:
``` >    a = { 2 4,
3 1};
A             2 rows      2 cols    (numeric)

2         4
3         1

>    b = { 4 5, 0 1};
B             2 rows      2 cols    (numeric)

4         5
0         1

aa={'a' 'b' 'c', 'd' 'e' 'f'};    /* a 2 x 3 char matrix */

b = { 1 2 3 4 5 };                /* row vector */
c = { 1, 2, 3, 4, 5};             /* column vector */
```

## IML functions and operators which create matrices

• index operator (:) - creates a row vector of consecutive integers or character strings. Use the transpose operation ( t() or the backquote character, `) to get a column vector.
``` >    index = 1:5;
INDEX         1 row       5 cols    (numeric)

1         2         3         4         5

>    col = (4:6)`;
COL           3 rows      1 col     (numeric)
4
5
6

>    rindex= 5:1;
RINDEX        1 row       5 cols    (numeric)

5         4         3         2         1

>    vars  = 'XX1': 'XX7';
VARS          1 row       7 cols    (character, size 3)

XX1 XX2 XX3 XX4 XX5 XX6 XX7
```
• The do() function creates an arithmetic series with any increment.
``` >   series= do(12,72,12);
SERIES        1 row       6 cols    (numeric)

12        24        36        48        60        72
```
• identity matrix: I(size)
```     a=I(6);      /*  a 6x6 identity matrix */
```
• constant matrix: J(nrow, ncol <, value>). If value is omitted, the matrix is filled with 1's.
```
     a=j(5,5,0);       /*  a 5x5 matrix of zeroes */
     b=j(6,1);         /* a 6x1 column vector of 1's */
```
• diagonal matrices: DIAG(vector) or DIAG(matrix) and VECDIAG(matrix)
• DIAG(vector) - creates a square matrix with the elements of the vector along the main diagonal and zeroes elsewhere.
``` >    d     = diag( {1 2 4} );
D             3 rows      3 cols    (numeric)

1         0         0
0         2         0
0         0         4
```
• DIAG(matrix) - creates a square matrix which retains the main diagonal of the argument matrix, but places zeroes elsewhere.
``` >    d     = diag( {1 2, 3 4} );
D             2 rows      2 cols    (numeric)

1         0
0         4
```
• VECDIAG (matrix) - returns a column vector which equals the main diagonal of the argument matrix.
• Transpose: t(mat) or mat` returns a matrix with rows and columns interchanged.
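
The last two items have no example above, so here is a minimal sketch of both. The matrix names m, v and mt are chosen only for illustration.

```sas
proc iml;
   m  = {1 2,
         3 4};
   v  = vecdiag(m);   /* column vector holding the main diagonal: {1, 4} */
   mt = t(m);         /* rows and columns interchanged: {1 3, 2 4}       */
   print v mt;
quit;
```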

## Simple matrix operations

• Elementwise sum (+) and difference (-):
```  a = { 2 4,
3 1};
b = { 4 5,
0 1};
sum   = a + b;
SUM           2 rows      2 cols    (numeric)

6         9
3         2

diff  = a - b;
DIFF          2 rows      2 cols    (numeric)

-2        -1
3         0
```
• Elementwise product ( #) and matrix product ( *)
```  times = a # b;
TIMES         2 rows      2 cols    (numeric)

8        20
0         1

prod  = a * b;
PROD          2 rows      2 cols    (numeric)

8        14
12        16
```

## Indexing matrices

Two sets of symbols are used to refer to the indices (or subscripts) of a matrix: [ ] or (| |). Unfortunately, some terminal emulations (most notably 3270) do not support the [ ] characters, so (| |) must be used there. This document will use the [ ] convention. Matrices require double subscripts (e.g. X[1,3]), while vectors need only one subscript (e.g. V[3]).
• Both scalars and vectors may be used as arguments for indexing. Some examples:
```  a[1,2]=0;      /* changes element [1,2] to zero              */
a[1,]=0;       /* changes first row values to 0              */
a[1,1:3]=0;    /* 1:3 creates a vector of values from 1 to 3.
So this changes values of the first row,
in columns 1-3 to zero                     */
ind={2 3};
b=a[ind,ind];  /* set b equal to rows 2,3 and cols 2,3 of a  */
```
• element operations may also be used as the index of a matrix to perform reduction operations:
```  b=a[+,];       /* b is the column sums of a                  */
b=a[##,];      /* b is the column sums of squares of a       */
b=a[,:];       /* when used alone as an index, the : operator
gives the mean.  So b will be a vector
containing the mean of each row            */
```

Note: IML functions will often perform the same task more quickly and efficiently than using index operators as above. For example:

```  b=a[+,+];
c=sum(a);      /* b and c will have the same value, but
the sum function will be more efficient.*/
```
• the LOC function is often very useful for subsetting vectors and matrices. This function is used for locating elements which meet a given condition. The positions of the elements are returned in row-major order. For vectors, this is simply the position of the element. For matrices, some manipulation is often required in order to use the result of the LOC function as an index. The syntax of the function is:
```  matrix2=LOC(matrix1=value);
```
For example:
```
  a={1 . 1 1, 2 2 2 2, 3 3 3 3, 4 4 4 4};
  notmiss=loc( a[,2] ^= .);
  /* notmiss will equal the locations (rows) in which
     the second column of a is not missing */
  newa=a[notmiss,];
  /* newa contains the rows of a with no missing
     elements in the second column.
     So, newa = {2 2 2 2, 3 3 3 3, 4 4 4 4};  */
```
Even more efficient (but more confusing to follow) is to bypass the intermediate step of creating the vector notmiss:
```     newa = a[ loc(a[,2]^=.), ];
```
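
The "manipulation" needed to use a LOC result on a whole matrix usually means converting row-major positions into row and column numbers. The following is one possible sketch, not the only approach; the names pos, row and col are illustrative.

```sas
proc iml;
   a = {1 . 1 1, 2 2 2 2, 3 . 3 ., 4 4 4 4};
   pos = loc(a = .);                  /* row-major positions: 2 10 12     */
   row = ceil(pos / ncol(a));         /* row of each position: 1 3 3      */
   col = pos - (row - 1) # ncol(a);   /* column of each position: 2 2 4   */
   print pos, row, col;
quit;
```

Because positions are counted across each row in turn, dividing by the number of columns and rounding up gives the row number, and the remainder gives the column.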

## Missing values

IML accepts missing values in numeric matrices, but doesn't handle them particularly well.
• element operations - for operations which work on elements (e.g. addition, element multiplication, minimum/maximum, etc.), a missing value will be assigned to each element for which a missing value is used. For example:
```a={1 . 1 1, 2 2 2 2, 3 3 3 3, 4 4 4 4};
b=a+a;
/* b equals  {2 . 2 2,
4 4 4 4,
6 6 6 6,
8 8 8 8};  */
c=a#a;
/* c equals  {1 . 1 1,
4 4 4 4,
9 9 9 9,
16 16 16 16} */
```
• matrix algebra - for operations which are uniquely matrix operations (e.g. multiplication, inversion, etc.), IML will not accept matrices with missing values.
• IML functions - some IML functions will ignore missing values (e.g. SUM), while others will treat them as missing (e.g. INV). Unfortunately, there is little in the manual on this.
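
As a small illustration of the point about SUM, the sketch below totals a matrix containing one missing value and also counts the missing elements. It relies on the behaviour described above (SUM skipping missing values); the names total and nmiss are illustrative.

```sas
proc iml;
   a = {1 . 1 1, 2 2 2 2, 3 3 3 3, 4 4 4 4};
   total = sum(a);       /* missing element skipped: 3 + 8 + 12 + 16 = 39 */
   nmiss = sum(a = .);   /* (a = .) is 1 where a is missing, so nmiss = 1 */
   print total nmiss;
quit;
```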

## Working with matrices and vectors

• Define a matrix representing number of cups of coffee drunk by three staff members on each day of the week. Define character vectors for row and column labels.
```     coffee = { 4 2 2 3 2,
3 3 1 2 2,
2 1 0 4 5 };
COFFEE        3 rows      5 cols    (numeric)

4         2         2         3         2
3         3         1         2         2
2         1         0         4         5

days   = { Mon Tue Wed Thu Fri };
DAYS          1 row       5 cols    (character, size 3)

MON TUE WED THU FRI

names  = { 'Lenny', 'Linda', 'Sue'};
NAMES         3 rows      1 col     (character, size 5)

Lenny
Linda
Sue
```
• Print the matrix using row and column labels. Note that rowname and colname can be abbreviated.
```     print coffee[r=names c=days];

COFFEE       MON       TUE       WED       THU       FRI

Lenny          4         2         2         3         2
Linda          3         3         1         2         2
Sue            2         1         0         4         5
```
• Calculate daily and weekly cost at \$.50/cup
```        daycost = .50 # coffee;
DAYCOST       3 rows      5 cols    (numeric)

2         1         1       1.5         1
1.5       1.5       0.5         1         1
1       0.5         0         2       2.5

ones = j(5,1);
weektot = daycost * ones;
WEEKTOT       3 rows      1 col     (numeric)

6.5
5.5
6

weektot = daycost[,+];
WEEKTOT       3 rows      1 col     (numeric)

6.5
5.5
6

daytot  = daycost[+,];
DAYTOT        1 row       5 cols    (numeric)

4.5         3       1.5       4.5       4.5

total   = daycost[+,+];
TOTAL         1 row       1 col     (numeric)

18
```
• Print a formatted table. Specify a format by [format= ] following the name of the matrix.
```
        print coffee[r=names c=days] weektot[format=dollar7.2] ,
              daytot[c=days f=dollar8.2] '  ' total[f=dollar7.2];

COFFEE       MON       TUE       WED       THU       FRI WEEKTOT
Lenny          4         2         2         3         2   $6.50
Linda          3         3         1         2         2   $5.50
Sue            2         1         0         4         5   $6.00

DAYTOT      MON      TUE      WED      THU      FRI      TOTAL
          $4.50    $3.00    $1.50    $4.50    $4.50     $18.00
```

# Popular IML Operators

- see handout from IML Quick Reference.

# Reading and Creating SAS datasets in IML

In some situations, you need to input data from a SAS dataset to SAS/IML and/or create a SAS dataset from an IML matrix. For input, use the use and read statements, as in
```
  use psy303.fitness;
  read all into mat[rowname=name];
```
The rowname option reads the variable name from the dataset, creating a character vector to be used as row labels.

For output to a SAS dataset, use the create and append statements, as in

```*-- Output results to data set out ;
xys =    yhat || res || weight;
cname = {"_YHAT_" "_RESID_" "_WEIGHT_" };
create out from xys [ colname=cname ];
append from xys;
```
This creates the SAS dataset, WORK.OUT, containing three variables, whose names are specified by the vector cname.

## IML input statement syntax summary

```     USE libref.dataset (dataset options);
```
```     EDIT libref.dataset (dataset options);
```

Note: IML can have only one input and one output dataset at a time. The EDIT statement will assign one dataset as the current input and current output dataset. The USE statement will assign the dataset as the current input dataset without changing the current output dataset.

```
     READ <range> <VAR operand> <WHERE (expression)>
          <INTO name <[rowname=variable colname=matrix]>>;
```
• range : specifies the range of observations to be read from the dataset. Valid values for range are:
  • ALL : all observations. This is the usual case when you want to read all observations into a matrix.
  • CURRENT : current observation (default)
  • NEXT n : next observation or next n observations
  • AFTER : all observations after the current one
  • POINT operand : observations specified by number, where operand may be:
    • point 5 : a number
    • point {2 5 10} : a list of observation numbers
    • point p : a matrix containing observation numbers
    • point (p+1) : an expression yielding observation numbers
• VAR operand : specifies variables to be read in. Default is all numeric variables. The operand may take the following values:
  • _ALL_ : all variables (must be same type)
  • _NUM_ : all numeric variables (default)
  • _CHAR_ : all character variables
  • {var1 var2..} : a list of variable names
  • matrix : a character matrix containing the names of the desired variables
• WHERE(expression) : conditional selection of observations. Expression is a logical condition which is evaluated as either true or false. For example:
```     WHERE (var1^=.)
```

Note: the WHERE clause does not 'override' the range specification. If range is not specified (default is current), WHERE clause will only evaluate current observation.

• INTO matrix [rowname=varname colname=matrix] : names the target matrix for the data read in. Only one target matrix can be specified per READ statement. If the INTO clause is not used, the default is to create a single vector for every variable read in and give that vector the name of the variable. The rowname option allows you to permanently assign a CHARACTER variable in the incoming dataset as a vector of rownames to be used whenever the target matrix is printed. The colname option creates a character vector containing the names of the variables read in and also permanently assigns that colname to the target matrix.

• Read all numeric variables for all observations from the dataset psy303.fitness into a matrix named mat. Create a character vector of observation labels from the dataset variable name.
```
  use psy303.fitness;
  read all into mat[rowname=name];
```
• As above, but read only the observations for which sex='M'.
```
  use psy303.fitness;
  read all into mat[rowname=name] where(sex='M');
```
• Read the variables x1, x2 and x3 for all observations in the dataset work.data1 into the matrix X.
``` use data1;
read all var {x1 x2 x3} into x;
```
• Read every 10th observation up to 100 into a matrix X. A character vector id will be created from the variable id and will be permanently associated as the rowname for x. A character vector containing the names of all variables read into x will be called coln and will be permanently assigned as the colname for x.
``` keepobs=(1:10)#10;
READ point keepobs into x[rowname=id  colname=coln];
```
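
The READ clauses can be combined in one statement. The sketch below assumes the SASHELP.CLASS sample dataset (with variables name, sex, height and weight) is available in your installation; the matrix name x and the vector name vnames are illustrative.

```sas
proc iml;
   use sashelp.class;
   /* all observations with sex='M'; two numeric variables go into x */
   read all var {height weight} where(sex='M')
        into x[rowname=name colname=vnames];
   close sashelp.class;
   /* labels are given explicitly here so the printout does not depend
      on the rowname/colname association made by READ */
   print x[rowname=name colname=vnames];
quit;
```

Note that ALL is given explicitly: as described above, without a range the WHERE clause would be applied to the current observation only.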

## IML output statements

• CREATE statement - creating new datasets from IML:
```CREATE libref.dataset <VAR operand>;
CREATE libref.dataset <FROM matrix <[r=vector1 c=vector2]>>;
```
• VAR operand : structures a dataset from the listed IML vectors (and only vectors may be used). The operand may take the same form as the operands used on the VAR clause in the READ statement. The dataset variables will be given the same name as the IML vectors.
• FROM matrix <[r=vector1 c=vector2]> : structures a dataset from the named IML matrix. Note that, since an IML matrix can only be all numeric or all character, the dataset created will contain variables of only one type. The r(owname) option will create a single character variable from a character vector, and the variable will have the same name as the vector. The c(olname) option will assign names to the variables in the new dataset (other than the one created by the r option). If c= is not specified, the variables will be called COL1, COL2, etc.

Note: The CREATE statement does not put data into the dataset but only defines the structure of the dataset. The VAR clause and the FROM clause are mutually exclusive.

• the APPEND statement - place data from IML vectors or matrices into the current output dataset:
```     APPEND <VAR operand>;    OR
APPEND <FROM matrix <[r=vector1]>>;
```
- the VAR clause and the FROM clause operate as they do in the CREATE statement. Note that the FROM clause does not have a c= option, since no data need actually be read from the colname vector. When the VAR clause has been used on the CREATE statement, it need not be specified on the APPEND statement.

Note: The VAR clause and the FROM clause are mutually exclusive. The APPEND statement will always output to the current output dataset, whether that dataset has been specified via the CREATE or the EDIT statement.
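
To make the CREATE/APPEND pairing concrete, here is a minimal sketch of both forms. The dataset names work.xy and work.xy2 and the matrix names are hypothetical; the CLOSE statements are added so each dataset is released before the next one is created.

```sas
proc iml;
   x = {1, 2, 3};
   y = {2, 4, 6};

   /* VAR form: one dataset variable per IML vector */
   create work.xy var {x y};   /* defines the structure only        */
   append;                     /* writes the values of x and y      */
   close work.xy;

   /* FROM form: one dataset variable per matrix column */
   m  = x || y;
   cn = {"A" "B"};
   create work.xy2 from m[colname=cn];
   append from m;
   close work.xy2;
quit;
```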

Note: It is possible to use external files (i.e. not SAS datasets) in IML. In terms of syntax, this is very similar to reading and writing these files in the SAS data step, but it is usually easier to do this in a SAS data step.

# Introduction to IML programming

IML has programming features like those of most other procedural languages. The main programming features are DO loops, IF-THEN/ELSE statements, program modules and function (assignment) modules.

This section focusses on IML programming features, namely iterative and conditional processing. IML programming can take place in 'open' code or within modules (compiled programs). If the program is only used once then open code is generally preferable. If the program is used often, whether in one session or across sessions, the module format is probably preferred. Modules may be stored permanently in compiled form.

## IF-THEN/ELSE statements : conditional processing

These take the same form in IML as they do in regular SAS:
```     IF expression THEN statement1;
ELSE IF expression THEN statement2;
```

Note: IML uses the symbol | for OR and the symbol & for AND. It will not accept the words as alternatives for logical operators as in the data step.

### Example

```  x=3;
if x=3 then print 'x=' x;
else if x=4 then print 'x is 4';
x=         3

x=4;
if x=3 then print 'x=' x;
else if x=4 then print 'x is 4';
x is 4

x=5;
if x=3 then print 'x=' x;
else if x=4 then print 'x is 4';
```
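
The logical operators from the note above can be combined with comparisons in a single IF condition. A short sketch, with illustrative values for x and y:

```sas
proc iml;
   x = 3;  y = 7;
   /* & is AND: both comparisons must be true */
   if x = 3 & y > 5 then print 'both conditions hold';
   else print 'at least one condition fails';
   /* | is OR: either comparison may be true */
   if x = 4 | y = 7 then print 'at least one condition holds';
quit;
```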

## DO loops : iterative processing

These also work much as they do in regular SAS. A DO group may be specified as the statement executed when an IF condition is met (e.g. IF x=3 then DO; ... END;). Additionally, DO loops may be nested. The following are valid forms for DO loops:
```
     DO variable = start TO stop <BY increment>;
        e.g.,       do i=1 to 100 by 10; ... end;
                    do j=1 to 10;        ... end;

     DO WHILE (expression);
        e.g.,       count=1;
                    do while (count<5);  ... end;

     DO UNTIL (expression);
        e.g.,       do until (count<5);  ... end;
```

Note: the DO WHILE loop is evaluated at the top, meaning that if count was 10 in this example, the loop would not execute. The DO UNTIL loop is evaluated at the bottom, meaning that it will always execute at least once. In the above example, if count equals 1 to start, the DO loop will still execute once even though count is less than 5 to start with.

### Example

```   reset name;
x=1;
do while (x<2);
print x;
x=x+1;
end;
X
1
x=3;

do while (x<2);  /* note this loop does not execute */
print x;
x=x+1;
end;
do until (x<4);
print '** do until loop executes although X is less than 4', x;
x=x-1;
end;
** do until loop executes although X is less than 4

X
3
```
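
Since iterative procedures are one of the main reasons to use IML, the following sketch shows a DO UNTIL loop driving a simple iteration (Newton's method for a square root). It is an illustration only; the tolerance, the iteration cap and the names target, x and iter are arbitrary choices.

```sas
proc iml;
   target = 2;
   x    = 1;                     /* starting guess                    */
   iter = 0;
   /* stop when x##2 is close to target, or after 20 iterations */
   do until (abs(x##2 - target) < 1e-8 | iter >= 20);
      x    = (x + target/x) / 2; /* Newton update for sqrt(target)    */
      iter = iter + 1;
   end;
   print x iter;
quit;
```

Because the condition is tested at the bottom of the loop, the update is always applied at least once; the iteration cap guards against a loop that never converges.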

## Modules

A module is a set of IML statements compiled as a single program. Program-type modules are activated using a RUN name statement. Function (or assignment) type modules return a value which is assigned to an IML matrix. Both types of modules accept arguments.
• Program modules The following syntax DEFINES a program module:
```     START module-name <(argument1, argument2,...)>;
IML statements;
FINISH;
```
To run a program module:
```          RUN module-name <(argument1, argument2, ...)>;
```

### Example

MODULE RMISS: delete rows of a matrix which contain missing values. Arguments are mat1 (original matrix), mat2 (target matrix) and miss (missing value indicator; . is the default). Miss must be specified in the argument list even if the default is to be used.
```
start rmiss(mat1, mat2, miss);
   if nrow(miss)=0 then miss={.};

   /* positions of missing values in row-major order */
   misspos=loc(mat1=miss);
   print misspos;

   /* badrow will be rows with at least one msg value */
   badrow=ceil(misspos/ncol(mat1));
   print badrow;

   /* 1:nrow(mat1) creates vector of values from 1 to
      the number of rows of mat1.  Then badrow numbers
      are removed from this vector */
   keeprow=setdif(1:nrow(mat1), badrow);
   print keeprow;

   mat2=mat1[keeprow,];
   print mat2;
   /* mat2 is subset of mat1 containing only rows with
      no msg values */
finish;
```
The RMISS module is used as follows:
```
   x={1 . 1 1, 2 2 2 2, 3 . 3 ., 4 4 4 4};
   run rmiss(x,y,miss);
MISSPOS
2        10        12
BADROW
1         3         3
KEEPROW
2         4
MAT2
2         2         2         2
4         4         4         4
```
• Function (assignment) modules The purpose of such a module is to return a value (scalar, vector or matrix) and assign that value to the specified matrix. In general, it is best to specify arguments for this type of module. The syntax is very similar, except note that the RETURN statement is required for this type of module:
```     START module-name <(argument1, argument2, ...)>;
IML statements;
RETURN (matrix);
FINISH;
```
To use an assignment module:
```     mat1=module-name <(argument1, argument2, ...)>;
```

### Example

LEN : A function module to return the vector length of each column of a matrix, X.
```   *-- Define a length function (LEN);
start len(X);
ssq = X[##,];
return (sqrt( ssq ));
finish;
```
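
A usage sketch for the LEN module follows, repeating the definition so the block stands alone. The test matrix is chosen so that the first column has length 5 (a 3-4-5 triangle); the names x and l are illustrative.

```sas
proc iml;
   start len(X);
      ssq = X[##,];            /* column sums of squares              */
      return (sqrt(ssq));
   finish;

   x = {3 1,
        4 1};
   l = len(x);                 /* row vector: 5 and sqrt(2)           */
   print l;
quit;
```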

Note: It is not possible to directly assign default values for module arguments. It seems to be completely impossible for function modules. For an example of how this can be done in a program module, see the RMISS module example.

• Differences between Program and Function modules There is really only one main difference. The purpose of a function module is to return one and only one value. Consequently, it contains a RETURN statement and cannot accept arguments which do not exist. Program modules can accept arguments which do not (yet) exist (see RMISS example) and create them.

### Global vs. Local Symbol Tables

IML statements outside a module have access to the global symbol table -- all matrices defined previously. When a module is defined without arguments, it also uses the global symbol table. Any matrices created or changed in the module will be created or changed in the global environment.
```  a=10; b=20; c=30;  /* A,B,C are all global */
start mod1;        /* module uses global table */
p=a+b;           /* p is global */
c=40;            /* c already global */
finish;
run mod1;
print a b c p;   /* note c changed to 40 */
A         B         C         P
10        20        40        30
```
When a module is defined with arguments, a local symbol table is created. This symbol table is temporary and is unique to the module. These modules will have access only to specified arguments from the global symbol table. If modules are nested, the local symbol table of the 'parent' module acts as the global symbol table for the called module. If matrix C exists in the local table and the global table, the global value of C will not be affected by operations on the local value of C (unless global C was specified as the argument corresponding to local C).
```   start mod2(a,b);    /* module with args creates local table */
p=2*(a+b);        /* p is local */
b=50;             /* b is local */
finish;
run mod2(a,c);
/* note that b (global) remains the same.  Since C (global)
is defined as b (local) and b is changed in the module,
C (global) is changed.  Note that p also remains the
same. */
print a b c p;
A         B         C         P
10        20        50        30
```

# Storage of IML modules and matrices

SAS/IML has the ability to store matrices and compiled modules in specially defined SAS catalogs. These are not SAS datasets. But similar to SAS datasets, a temporary storage catalog is referred to with a one-level name and a permanent one is referred to with a two-level name. SAS will provide a default temporary IML storage catalog.

IML storage catalogs are useful for saving large intermediate results for later use when memory is a concern. Also, these catalogs are necessary for having access to IML matrices and modules after an IML session is completed.

• RESET STORAGE=libref.catalog; This specifies the name of the currently open catalog (only one catalog may be open at a time). You can both store items to and load items from this catalog. SAS will provide a temporary catalog by default, but you would need to use this statement, for example, to change the open catalog to a permanent one.
• SHOW STORAGE; This will list all items stored in the current catalog.
• STORE <matrices> <MODULE=name>; This stores the named matrices and/or modules in the open catalog. Modules are stored in compiled form. The stored matrix or module still remains in the active workspace after storing. If you are using STORE to conserve memory, first STORE the matrix then FREE it. The following are examples of STORE statements:
```     STORE a b c;        /* stores matrices A,B and C*/
STORE module=mod1;  /* stores module MOD1 */
STORE module=(mod1 mod2);
/* stores modules MOD1 and
MOD2 */
STORE;              /* stores EVERYTHING */
```
• LOAD <matrices> <MODULE=name>; The opposite of STORE, this loads matrices and modules into the active workspace. The syntax for the arguments is the same as for STORE. The matrix or module will still be present in the catalog after it has been loaded.
• REMOVE <matrices> <MODULE=name>; Erases matrices and modules from the storage catalog. Arguments take the same form as the STORE statement.
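
The statements above can be combined into a short store-and-reload workflow. This is a sketch using the default temporary storage catalog; the matrix a and the module len are illustrative.

```sas
proc iml;
   start len(X);
      return (sqrt(X[##,]));
   finish;
   a = {1 2, 3 4};

   store a;              /* copy matrix A to the open catalog          */
   store module=len;     /* copy module LEN, in compiled form          */
   show storage;         /* list what the catalog now contains         */

   free a;               /* release A from the active workspace        */
   load a;               /* bring A back from the catalog              */
   load module=len;
   print a;
quit;
```

To keep the stored items after the SAS session ends, a RESET STORAGE= statement naming a permanent (two-level) catalog would be issued before the STORE statements.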