lags |
Macro for lag sequential analysis |
lags |
Visualizing Categorical Data: lags
$Version: 1.2 (2 Feb 2001)
Michael Friendly
York University
Macro for lag sequential analysis
Given a variable containing event codes (char or numeric), the LAGS macro
creates:
-
a dataset containing n+1 lagged variables, _lag0 - _lagN (_lag0 is just a
copy of the input event variable)
-
optionally, an (n+1)-way contingency table containing frequencies of all
combinations of events at lag0 -- lagN
Either or both of these datasets may be used for subsequent analysis of
sequential dependencies. One or more BY= variables may be specified, in which case separate lags and frequencies are
produced for each value of the BY variables.
A WEIGHT= variable may also be specified,
giving frequencies weighted by that variable. For example, using the
duration of an event as a weight gives 'frequencies' which represent
the total number of time units for state sequential data.
One event variable must be specified with the VAR= option. All other options have default values. If one or more BY= variables are specified, lags and frequencies are calculated separately for
each combination of values of the BY= variable(s).
The arguments may be listed within parentheses in any order, separated by
commas. For example:
%lags(data=codes, var=event, nlag=2)
- DATA=
-
The name of the SAS dataset to be lagged. If DATA= is not specified, the most recently created data set is used.
- VAR=
-
The name of the event variable to be lagged. The variable may be either
character or numeric.
- BY=
-
The name of one or more BY variables. Lags will be restarted for each level
of the BY variable(s). The BY variables may be character or
numeric.
- WEIGHT=
-
Specifies a numeric variable whose value represents the
frequency or weight of an observation. The weight values
must be non-negative, but need not be integers.
- VARFMT=
-
An optional format for the event VAR= variable. If the codes are numeric, and a format specifying what each
number means is used (e.g., 1='Active' 2='Passive'), the output lag
variables will be given the character values.
- NLAG=
-
Number of lags to compute. Default = 1.
- OUTLAG=
-
Name of the output dataset containing the lagged variables. This dataset
contains the original variables plus the lagged variables, named according
to the PREFIX= option.
- PREFIX=
-
Prefix for the name of the created lag variables. The default is
PREFIX=_LAG
, so the variables created are named _LAG1, _LAG2, ..., up to
_LAG&nlag. For convenience, a copy of the event variable is created as
_LAG0.
- FREQOPT=
-
Options for the TABLES statement used in PROC FREQ for the frequencies of
each of lag1-lagN vs lag0 (the event variable). The default is
FREQOPT= NOROW
NOCOL NOPERCENT CHISQ.
Arguments pertaining to the n-way frequency table:
- OUTFREQ=
-
Name of the output dataset containing the n-way frequency table. The table
is not produced if this argument is not specified.
- COMPLETE=
-
NO, or ALL specifies whether the n-way frequency table is to be made
'complete', by filling in 0 frequencies for lag combinations which do not
occur in the data.
Assume a series of 16 events have been coded with the 3 codes, a, b, c, for
2 subjects as follows:
Sub1: c a a b a c a c b b a b a a b c
Sub2: c c b b a c a c c a c b c b c c
and these have been entered as the 2 variables SEQ (subject) and CODE in
the dataset CODES:
SEQ CODE
1 c
1 a
1 a
1 b
....
2 c
2 c
2 b
2 b
....
Then the macro call:
%lags(data=codes, var=code, by=seq, outfreq=freq);
produces the lags dataset _lags_ for NLAG=1
that looks like this:
SEQ CODE _LAG0 _LAG1
1 c c
a a c
a a a
b b a
a a b
....
2 c c
c c c
b b c
b b b
a a b
....
The output 2-way frequency table (outfreq=freq) looks liks this:
SEQ _LAG0 _LAG1 COUNT
1 a a 2
b a 3
c a 2
a b 3
b b 1
c b 1
a c 2
b c 1
c c 0
2 a a 0
b a 0
c a 3
a b 1
b b 1
c b 2
a c 2
b c 3
c c 3
See also
meanplot
panels
scatmat
stat2dat
Previous: label Next: logodds Up: Appendix A Top: index.html