14 July 1993
APL2STAT USER INFORMATION
John Fox and Michael Friendly
Introduction
APL2STAT is an integrated set of APL2 programs (functions and
operators) for statistical analysis, with an emphasis on
statistical graphics. The programs use a simple object system and
employ a common procedure for accessing data. Because APL2STAT is
not a statistical package, but rather is built upon a general,
extensible, interactive programming language (i.e., APL2), it is
simple to modify and supplement.
In its current form, APL2STAT includes functions and operators
for (among other things) jackknifing and bootstrapping any statistic; for
linear models estimated by least-squares with extensive
diagnostics; for linear regression analysis with autocorrelated
errors; for robust regression analysis; and for dichotomous and
polytomous logit models. Graphical capabilities include a variety
of scatterplots, boxplots, scatterplot matrices, partial regression
and residual plots, regression influence plots, Box-Cox and Box-
Tidwell transformation constructed-variable plots, lowess
scatterplot smoothing, and multivariate graphical displays. Points
on plots may be interactively identified, deleted, moved, and
highlighted using a pointing device (such as a mouse) or the cursor
keys.
APL2STAT is programmed for IBM's APL2 implementation for the
IBM/PC and compatibles, and is furnished in two forms: for the
freeware TRYAPL2 interpreter and for IBM's commercial APL2
interpreter. The latter environment (called simply "the APL2
interpreter" below) permits much larger workspaces on 386 and 486-
class machines. This document assumes some knowledge of the APL2
programming language (see, e.g., Gilman and Rose, 1984, or Brown,
Pakin, and Polivka, 1988), and of the specific implementation of
APL2 employed. As well, because information about all APL2STAT
user functions is available through the HOW function, this document
is brief. Note that APL symbols are rendered as the symbol name
preceded by a '.', as in '.delta' for the APL delta character and '.gets' for the assignment arrow.
Installing and Customizing APL2STAT
APL2STAT is designed to be used with either TryAPL2 or
APL2/PC. If your computer has a hard disk, then it will be most
convenient to copy the APL2STAT files to the hard disk. The
distribution files assume that you will be installing these files
in the \TRYAPL2 or \APL2 directory on drive C: (for use with
TryAPL2 or APL2/PC, respectively). You should adjust the path
specifications below if you are installing the files in another
directory.
Installation for use with TryAPL2:
(1) Create the \TRYAPL2 directory if it does not already
exist:
C:\> md tryapl2
(2) Make the \TRYAPL2 directory the current directory:
C:\> cd tryapl2
(3) Insert the APL2STAT diskette in a drive (say, drive
A:), and copy the files to your hard disk:
C:\TRYAPL2> copy a:*.try
C:\TRYAPL2> copy a:*.exe
C:\TRYAPL2> copy a:apl2stat.200
C:\TRYAPL2> copy a:ps.drv
Installation for use with APL2/PC:
These instructions assume that you have already installed
APL2/PC in a directory named \APL2 on your hard disk,
drive C:. To install APL2STAT on your hard disk:
(1) Make the \APL2 directory the current directory:
C:\> cd apl2
(2) Insert the APL2STAT diskette in a drive (say, drive
A:), and copy the files to your hard disk:
C:\APL2> copy a:*.*
Alternatively, you may wish to keep the APL2STAT files in a
separate subdirectory, say \APL2\STAT. In this case, you
should modify your APL2 profile so that the APL2STAT
subdirectory may be referenced by a library number.
Customizing APL2STAT
APL2STAT searches for functions, operators, and data in
one or more component files whose DOS path names are
contained in the variable .deltaPATH. Saving APL2STAT
graphics in Postscript files depends on the Postscript
driver, which resides in the file PS.DRV, the location of
which is recorded in the variable .deltaPS_DRIVER_FILE. The
default path and driver file specifications are as
follows:
.deltaPATH .gets 'C:\TRYAPL2\APL2STAT.200' 'C:\TRYAPL2\DATA.TRY'
.deltaPS_DRIVER_FILE .gets 'C:\TRYAPL2\PS.DRV'
or
.deltaPATH .gets 'C:\APL2\APL2STAT.200' 'C:\APL2\DATA.TRY'
.deltaPS_DRIVER_FILE .gets 'C:\APL2\PS.DRV'
To change the default path and Postscript driver-file
location under the TryAPL2 version of APL2STAT:
Starting with a CLEAR workspace, )COPY the SHELL
workspace into TryAPL2. Change the values of .deltaPATH
and/or .deltaPS_DRIVER_FILE. (Note that the .deltaPATH must be a
nested vector or scalar. If there is just one file on
the path, then the character string specifying its path
should be enclosed as a nested scalar.) Then re-)SAVE
the SHELL workspace. When you next )LOAD SHELL, it
should automatically execute the SHELL function.
To change the default path and Postscript driver-file
location under the APL2 version of APL2STAT:
Using the )IN SHELL command, input the SHELL.ATF file.
Change the values of .deltaPATH and .deltaPS_DRIVER_FILE as
appropriate. Then )SAVE SHELL. When SHELL is
subsequently )LOADed, the SHELL function will
automatically be executed.
Several other aspects of APL2STAT may also be customized,
by specifying the values of variables beginning with the
character .delta in the SHELL workspace. These include colour
specifications for graphs (.deltaLINE_COLOUR, .deltaSYMBOL_COLOUR,
etc.), plotting symbols (.deltaSYMBOL, .deltaSYMBOLS), the memory
threshold that triggers the memory-management mechanism
(.deltaMEMORY_THRESHOLD) and some other characteristics. Do
not change the values of variables with names starting
with .delta-underscore.
Objects
APL2STAT employs a simple object system for organizing
datasets and the output of certain functions and operators (e.g.,
the results of fitting a regression model or of building a table of
frequencies or other statistics). All objects inherit properties
from other objects, with the exception of the root object,
OBJECT_PROTO.
New objects are created from prototypes using the function MAKE.
A newly created object has the same slots as its parent. With the
exception of the slots PARENT and TYPE, the slots of a newly
created object are empty. The function PUT serves to enter
information into the slots of an object, perhaps replacing previous
contents. If the slot named to PUT does not exist, then it is
created. The function GET serves to retrieve information stored in
a slot of an object. The function SLOTS lists an object's slots.
Some object-oriented functions (such as PRINT, which prints an
object in a suitable format on the screen) employ the METHOD
function to find a method appropriate to the object's type. If no
appropriate method exists, then METHOD examines the object's parent
and ancestors until a method is located or until the root object is
reached.
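The slot-and-inheritance scheme can be made concrete with a small Python analogue of MAKE, PUT, GET, SLOTS, and METHOD. This is a sketch under stated assumptions, not APL2STAT's implementation: objects are dictionaries held in a hypothetical registry, the DATASET_PROTO slots and the methods table are illustrative, and a new object is assumed to inherit its parent's TYPE.

```python
# A hypothetical Python analogue of APL2STAT's object system, not its real code.
# Each object is a dict of slots stored in a registry under its name.
REGISTRY = {
    "OBJECT_PROTO": {"PARENT": None, "TYPE": "OBJECT"},
    # Illustrative prototype; its slot names follow the ANGELL example.
    "DATASET_PROTO": {"PARENT": "OBJECT_PROTO", "TYPE": "DATASET",
                      "DATA": None, "VARIABLES": None},
}

# Methods keyed by (generic function, object type); contents are illustrative.
METHODS = {("PRINT", "OBJECT"): lambda name: f"<object {name}>"}


def make(parent, name):
    """Create NAME with PARENT's slots, all empty except PARENT and TYPE."""
    proto = REGISTRY[parent]
    obj = dict.fromkeys(proto)            # same slots, every value None
    obj["PARENT"] = parent
    obj["TYPE"] = proto["TYPE"]           # assumption: TYPE is inherited
    REGISTRY[name] = obj
    return name                           # like MAKE, return the name


def put(name, slot, value):
    """Store VALUE in SLOT, creating the slot if it does not exist."""
    REGISTRY[name][slot] = value


def get(name, slot):
    return REGISTRY[name][slot]


def slots(name):
    return list(REGISTRY[name])


def method(generic, name):
    """Look for a method for the object's type, then for each ancestor's type."""
    current = name
    while current is not None:
        found = METHODS.get((generic, REGISTRY[current]["TYPE"]))
        if found is not None:
            return found
        current = REGISTRY[current]["PARENT"]
    raise LookupError(f"no {generic} method for {name}")
```

After make("DATASET_PROTO", "ANGELL"), a call such as method("PRINT", "ANGELL") finds no DATASET method and falls back to the one defined for OBJECT_PROTO, mirroring the search described above.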
Data
Data values may be numeric scalars or character scalars (or
character strings enclosed as nested scalars). Many APL2STAT
functions handle only numeric data. (The CATEGORY function
converts character data to integer numeric codes.) Missing data
are indicated by the character '.'.
Data employed by APL2STAT functions may exist simply as
vectors and matrices in the active workspace or may be stored as
dataset objects. In the latter event, the user defines the dataset to
be analyzed, or retrieves it into the active workspace, and then
employs the USE function. Under the APL2STAT SHELL
(described below), data may be automatically retrieved from files
along a search path.
USE creates a set of vectors in the active workspace, each
named according to the corresponding variable name in the dataset.
In addition, USE creates a selection vector (.deltaSELECT), an
observation-names vector (.deltaOBS_NAMES), and a missing-data vector
(.deltaMISSING). The selection and missing-data vectors, which each
have one entry for each observation in the dataset, are initialized
with ones.
APL2STAT functions apply the selection vector to their data
arguments to determine which observations should be used in a
computation. By directly manipulating the selection vector, the
user can perform data analysis on subsets of the data; observations
with missing data are screened out of a computation by setting
corresponding entries of the missing-data vector to zero. When
APL2STAT functions return data results, they apply the selection
and missing-data vectors in reverse to fill excluded observations
with missing data.
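The forward and reverse use of the two vectors can be sketched in Python. This is only an illustration (APL2STAT itself is APL2 code): the function names are hypothetical, the vectors are plain lists holding 1 for "use" and 0 for "exclude", and the example statistic, deviations from the mean, was chosen arbitrarily.

```python
# Hypothetical sketch of applying selection and missing-data vectors.
MISSING = "."  # APL2STAT's missing-data code


def included(select, missing):
    """Indices of observations that are selected and have valid data."""
    return [i for i, (s, m) in enumerate(zip(select, missing)) if s == 1 and m == 1]


def apply_forward(data, select, missing):
    """Keep only the observations that a computation should use."""
    return [data[i] for i in included(select, missing)]


def apply_reverse(results, select, missing):
    """Spread results back over all observations, filling excluded ones with '.'."""
    out = [MISSING] * len(select)
    for i, value in zip(included(select, missing), results):
        out[i] = value
    return out


def deviations_from_mean(data, select, missing):
    """Example data result: deviations from the mean of the observations used."""
    used = apply_forward(data, select, missing)
    mean = sum(used) / len(used)
    return apply_reverse([x - mean for x in used], select, missing)
```

With data [10, 20, '.', 30, 40], the last observation deselected, and the third flagged as missing, only 10, 20, and 30 enter the mean, and the result carries '.' in the two excluded positions.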
APL2STAT currently recognizes three data types: numeric data,
which may be treated as quantitative or categorical depending upon
context; character data, which will be treated as categorical where
appropriate; and character data beginning with the character '(',
'[', or a numeral, which are treated as ordered categories. For
example, in the context of a linear model, an independent variable
containing character data will be treated
as a factor, and will generate an appropriate set of contrasts or
dummy regressors.
Numeric data may be converted to character data using
appropriate APL primitives or the RECODE function. Character data
may be converted to numeric using the CATEGORY function, which
typically is invoked automatically by functions that understand how
to handle character data. It is therefore usually convenient to
represent a categorical variable as character data. For both
character and numeric variables, the character '.' represents
missing data.
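A rough Python sketch of the kind of conversion CATEGORY performs follows. The coding rule used here (distinct non-missing values sorted and numbered from 1) is an assumption, since this document does not say how CATEGORY orders or numbers categories. The ordered-category test also assumes that every non-missing value must begin with '(', '[', or a numeral. Both function names are hypothetical.

```python
# Hypothetical sketch of character-to-integer coding; not CATEGORY's actual rule.
MISSING = "."


def category(values):
    """Map character values to codes 1..k in sorted order; '.' stays missing."""
    levels = sorted({v for v in values if v != MISSING})
    code = {level: i + 1 for i, level in enumerate(levels)}
    return [MISSING if v == MISSING else code[v] for v in values], levels


def is_ordered(values):
    """Values starting with '(', '[' or a numeral are treated as ordered categories."""
    return all(v == MISSING or v[0] in "([" or v[0].isdigit() for v in values)
```

Prefixing labels with numerals (for example, 1_LOW and 2_HIGH) makes the sorted order match the intended category order, which is one reason such labels are convenient.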
As an alternative to employing data stored in dataset-objects,
the user may call the OBSERVATIONS function to define the
selection, missing-data, and observation-names vectors.
OBSERVATIONS 50, for example, defines selection and missing-data
vectors consisting of 50 ones and an observation-names vector with
the names '1', '2',..., '50'. Data vectors with 50 entries or
matrices with 50 rows could then be defined and employed as data
arguments to APL2STAT functions.
Creating Dataset Objects
Dataset objects may be created in three ways: (1) by
interactive, prompted input, using the ENTER_DATA function; (2)
using direct APL2 statements to prepare the components of the
dataset object, and the MAKE and PUT functions to construct the
dataset object from these components; or (3) by reading data from
a DOS file.
The ENTER_DATA function is useful for small datasets
that you want to enter from the keyboard. The function
prompts for the name of the dataset object, the number of
variables to be entered, the variable names, and the
observations.
Using MAKE and PUT: Consider the following
illustration, assuming that data already reside in the
matrix ANGELL_DATA; that observation names are in the
(nested) vector CITIES; and that a codebook for the
dataset is in the character matrix ANGELL_CB.
MAKE 'DATASET_PROTO' 'ANGELL'
PUT 'ANGELL' 'DATA' ANGELL_DATA
PUT 'ANGELL' 'VARIABLES' ('MORAL_INT' 'HETERO' 'MOBILITY' 'REGION')
PUT 'ANGELL' 'OBSERVATIONS' CITIES
PUT 'ANGELL' 'CODEBOOK' ANGELL_CB
USE 'ANGELL'
VARIABLES CREATED: MORAL_INT HETERO MOBILITY REGION
USING
Current dataset: ANGELL
Contains 43 observations
43 observations selected
43 observations with valid data (in last operation)
4 variables: MORAL_INT HETERO MOBILITY REGION
Reading data from a DOS file: The data file is assumed to contain records
delimited by carriage-return/linefeed characters. All records are assumed to
contain the same number of fields, separated by one or more blanks. (This
implies that data fields cannot contain blanks; use an underscore or dash instead.)
The first field in each record may be a numeric or character-string observation
name. The remaining fields may be numeric or character. Missing data in either
a character or numeric variable is represented by a single '.' character. Here is
a sample DOS file, CLASS.DAT, used to create a dataset object, CLASS:
ALFRED M 14 69 112.5
ALICE F 13 56.5 84
CAROL F 14 62.8 102.5
HENRY M 14 63.5 .
JANET F 15 62.5 112.5
JOHN M 12 59 99.5
JOYCE F 11 51.3 50.5
LOUISE F 12 56.3 77
MARY F 15 66.5 112
ROBERT M 12 64.8 128
RONALD M 15 67 133
THOMAS M 11 57.5 85
An optional codebook can be created as a separate file with the same filename
and extension CBK. The codebook for the CLASS dataset, CLASS.CBK, might
consist of these lines:
[1] Sex (M/F)
[2] Age
[3] Height (inches)
[4] Weight (pounds)
Source: fictitious data
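The record format described above can be parsed as in the following Python sketch, which reads the CLASS.DAT sample from a string. The function read_data and the dictionary it returns are hypothetical stand-ins for READ_DATA and a dataset object. Numeric fields become floats, '.' is kept as the missing-data code, and any other field is kept as a character string. The default observation names '1', '2', ... are borrowed from the OBSERVATIONS convention as an assumption.

```python
# Hypothetical sketch of the DOS-file parsing rules; not READ_DATA's own code.
MISSING = "."

CLASS_DAT = """\
ALFRED M 14 69 112.5
ALICE F 13 56.5 84
CAROL F 14 62.8 102.5
HENRY M 14 63.5 .
JANET F 15 62.5 112.5
JOHN M 12 59 99.5
JOYCE F 11 51.3 50.5
LOUISE F 12 56.3 77
MARY F 15 66.5 112
ROBERT M 12 64.8 128
RONALD M 15 67 133
THOMAS M 11 57.5 85
"""


def convert(field):
    """Numeric fields become floats; '.' and character fields stay strings."""
    if field == MISSING:
        return MISSING
    try:
        return float(field)
    except ValueError:
        return field


def read_data(text, variables, has_obs_names=True):
    # Records are lines; fields are separated by one or more blanks.
    records = [line.split() for line in text.splitlines() if line.strip()]
    if any(len(r) != len(records[0]) for r in records):
        raise ValueError("all records must contain the same number of fields")
    if has_obs_names:
        names = [r[0] for r in records]
        records = [r[1:] for r in records]
    else:
        names = [str(i + 1) for i in range(len(records))]   # '1', '2', ...
    if len(records[0]) != len(variables):
        raise ValueError("one variable name is needed per data field")
    data = {v: [convert(r[j]) for r in records] for j, v in enumerate(variables)}
    return {"OBSERVATIONS": names, "VARIABLES": list(variables), "DATA": data}
```

Calling read_data(CLASS_DAT, ["SEX", "AGE", "HEIGHT", "WEIGHT"]) yields 12 observations, with HENRY's weight held as '.'.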
The READ_DATA function prompts for the name of the input file (whose
extension, if omitted, is assumed to be .DAT), the names of the variables, and
the name of the dataset object to be created. Since DOS files cannot be directly
accessed from TryAPL2, READ_DATA is also supplied as a stand-alone program
(READDATA.EXE), which was used to generate the following example:
+--------------------------------------------------------+
| ReadData Module for TryAPL2 Version 2.00 of APL2STAT |
| July, 1993 John Fox & Michael Friendly |
+--------------------------------------------------------+
File to read (omitted extension is assumed to be '.DAT') : A:\CLASS
12 lines read from A:\CLASS.DAT
First 2 lines:
ALFRED M 14 69 112.5
ALICE F 13 56.5 84
Are there missing (.) or character values in the data? (y/N/q) : Y
This may take some time, patience!
Enter dataset name : CLASS
Are observation names in the data? (y/N/q) : Y
Enter 4 variable names, or press for VAR1,...VAR4:
1 : SEX
2 : AGE
3 : HEIGHT
4 : WEIGHT
Read codebook entries from A:\CLASS.CBK? (Y/n/q) : Y
Dataset object CLASS written to file A:\CLASS.TRY
Exporting APL2STAT data: The SASOUT function in APL2STAT takes a dataset
object and writes the data to a DOS file together with the SAS data step for
reading the data into SAS. (This text file may simply be edited to input the data
into a program other than SAS.) Again, since DOS files cannot be accessed
directly in TryAPL2, a standalone version of SASOUT is supplied as
SASOUT.EXE. In this case, the dataset object must previously have been saved
in a TryAPL2 workspace or, equivalently, an APL2STAT component file.
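The following Python sketch shows the general shape of such an export by turning columns into SAS source text: a DATA step, an INPUT statement, and inline data lines. The layout, the OBSNAME variable, and the function names are assumptions, not a description of SASOUT's actual output. A variable that contains any non-missing character value is declared with $, and '.' is written through unchanged.

```python
# Hypothetical sketch of a SASOUT-style export; SASOUT's real layout may differ.
MISSING = "."


def fmt(x):
    """Write 14.0 as '14' and 112.5 as '112.5'; leave other values as text."""
    return format(x, "g") if isinstance(x, float) else str(x)


def sas_data_step(name, variables, columns, obs_names):
    """Return SAS source: a DATA step followed by the data lines."""
    def is_char(v):
        return any(isinstance(x, str) and x != MISSING for x in columns[v])

    inputs = ["OBSNAME $"] + [f"{v} $" if is_char(v) else v for v in variables]
    lines = [f"DATA {name};", "  INPUT " + " ".join(inputs) + ";", "CARDS;"]
    for i, obs in enumerate(obs_names):
        lines.append(" ".join([obs] + [fmt(columns[v][i]) for v in variables]))
    lines += [";", "RUN;"]
    return "\n".join(lines)
```

Because the data lines are plain blank-separated text, the same file can be edited by hand for input to a program other than SAS, as noted above.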
Models
There are currently three functions in APL2STAT -- LINEAR_MODEL, LOGIT_MODEL,
and PLOGIT_MODEL -- that permit a statistical model to be specified symbolically as a
character string (enclosed in quotes) according to the following conventions:
The dependent variable is specified to the left of an assignment arrow. The
specification is evaluated as an APL2 expression, which must produce an
appropriate vector of dependent-variable values. For example, 'INCOME*.5 .gets
...' specifies the square root of the values in INCOME as the dependent variable,
say in a linear model; INCOME is presumably a vector of scores of length n.
Note that if INCOME contains missing data, then 'INCOME*MD .5 .gets ...' would
be appropriate.
The columns of the model (or `design') matrix are specified to the right of the
assignment arrow, either implicitly or explicitly. Different terms are separated
by +'s.
A numerical independent variable (either a vector of length n or an n-row
matrix) is entered directly into the model matrix.
A character independent variable is treated as a factor and produces one or
more columns of the model matrix. The factor is converted into contrasts (or
dummy variables) according to the function named in .deltaFACTOR_ACTION. If
.deltaFACTOR_ACTION does not exist, then its value defaults to 'DEVIATION', the
name of a function that creates deviation contrasts, using the last category of the
factor as the `reference' category.
A term containing one or more times-signs is treated as specifying an
interaction, and is passed (from right to left) to the INTERACT function to create
interaction regressors. Character variables are first expanded into their contrasts;
numeric vectors and matrices are passed directly to INTERACT.
Terms enclosed in parentheses are `protected' in the sense that they are
evaluated directly without interpretation. Thus, for example, '... + (DUMMY
EDUCATION) + ...' will construct dummy variables for EDUCATION, even if
EDUCATION is numeric or if .deltaFACTOR_ACTION has its default value, 'DEVIATION'.
Likewise, ' ... + (INCOME AND INCOME*2) + ...' specifies linear and
quadratic terms in INCOME. The only requirements are that the expression in
parentheses must evaluate to a numeric matrix with n rows and may not contain
the symbol '+'.
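Deviation contrasts and interaction regressors can be illustrated with a short Python sketch. The functions deviation and interact are hypothetical stand-ins for DEVIATION and INTERACT, not their actual code. Levels are assumed to be taken in sorted order, missing data are ignored, and each term is a list of rows (a numeric vector would be a list of one-element rows).

```python
# Hypothetical sketch of deviation contrasts and interaction regressors.
def deviation(values):
    """Deviation contrasts: k-1 columns; the last level is coded -1 in every column."""
    levels = sorted(set(values))          # assumption: levels in sorted order
    k = len(levels)
    rows = []
    for v in values:
        j = levels.index(v)
        if j == k - 1:                    # reference (last) category
            rows.append([-1] * (k - 1))
        else:
            rows.append([1 if c == j else 0 for c in range(k - 1)])
    return rows


def interact(a, b):
    """Products of every column of A with every column of B, row by row."""
    return [[x * y for x in row_a for y in row_b] for row_a, row_b in zip(a, b)]
```

For a two-level GENDER factor and a three-level REGION factor, interact produces 1 x 2 = 2 interaction columns, one for each pair of contrast columns.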
An example, using the LINEAR_MODEL function (note that the spaces are not
required):
LINEAR_MODEL 'INCOME*.5 .gets (EDUCATION AND EDUCATION*2)
+ REGION + GENDER + GENDER.timesREGION'
A new model parser that supports a much wider range of design formulas is
in the works.
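The conventions above amount to a small parsing problem. The following Python sketch splits a specification written with the actual APL assignment arrow (rendered .gets in this document) and the APL times sign into a response expression and a list of classified terms. It only classifies terms: evaluating them, expanding factors into contrasts, and calling INTERACT are left out, and parse_model is a hypothetical name rather than part of LINEAR_MODEL. Because parenthesized terms may not contain '+', splitting on every '+' is sufficient.

```python
# Hypothetical sketch of splitting a model specification into its parts.
ARROW, TIMES = "←", "×"   # the APL assignment arrow and times sign


def parse_model(spec):
    """Return the response expression and a list of (kind, content) terms."""
    if ARROW not in spec:
        raise ValueError("the dependent variable must precede an assignment arrow")
    response, rhs = (part.strip() for part in spec.split(ARROW, 1))
    terms = []
    # Parenthesized terms may not contain '+', so a plain split is safe.
    for term in (t.strip() for t in rhs.split("+")):
        if not term:
            continue
        if term.startswith("(") and term.endswith(")"):
            terms.append(("protected", term[1:-1].strip()))
        elif TIMES in term:
            terms.append(("interaction", [f.strip() for f in term.split(TIMES)]))
        else:
            terms.append(("variable", term))
    return response, terms
```

Applied to the LINEAR_MODEL example, this yields the response INCOME*.5, one protected term, two plain variables, and one interaction of GENDER with REGION.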
User Functions, Auxiliary Functions, and Global Variables
APL2STAT functions and operators meant to be called directly by the user are named with
upper-case letters. Auxiliary functions are named with lower-case letters. APL2STAT functions
use a few global variables to pass information and set options. As explained above, data used
by functions are normally stored in APL2 vectors. Information and options that may be set by
the user are stored in global variables whose names begin with a delta -- e.g., the selection and
observation-names vectors. Similarly, colours for displaying various parts of graphs
(background, lines, symbols, highlighting, etc.) are stored in global variables such as
.deltaBACK_COLOUR, .deltaLINE_COLOUR, .deltaSYMBOL_COLOUR, and .deltaHIGH_COLOUR. Global
variables whose names begin with underscored deltas (e.g., .deltaCURRENT_DATASET) should not
be altered directly by the user.
Programming and Usage Conventions
Most APL2STAT functions do not require left arguments, but optional information is
sometimes supplied as a left argument. For example, functions that create objects take an
object-name as the left argument. If this argument is not supplied, then an object whose name
begins with "LAST_" is created. For example, the REGRESS function creates the regression
object LAST_REGRESSION unless another object name is given explicitly as a left argument.
Functions that create objects return the name of the object created.
The objects so created are not printed automatically, and they generally contain some
information that should not be printed. A regression object, for example, contains not only
regression coefficients and other quantities that typically appear on a regression report, but also
residuals, fitted values, and so on. Use the PRINT function to print objects on the screen in an
appropriate format. Use GET to retrieve slot contents for further computation.
Some functions require that a data argument be passed to the function as a character
string that evaluates to a data matrix. For example,
REGRESS 'MORAL_INT ON HETERO AND MOBILITY'
(The functions ON and AND are simply column concatenators, so the expression in quotes
evaluates to a three-column matrix.) Even functions that do not require data arguments in this
form -- i.e., that would take a data matrix directly as an argument -- accept a character-string
argument that evaluates to a data matrix.
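In APL2 the quoted expression is simply executed. The Python sketch below imitates that for the narrow case of names joined by ON and AND, evaluating right to left as APL2 does. The functions are hypothetical, the toy namespace stands in for the vectors that USE creates, and only the column-concatenation behaviour stated above is assumed for ON and AND.

```python
# Hypothetical Python analogue of ON/AND and of character-string data arguments.
def columns(x):
    """Treat a vector as a one-column matrix; leave a matrix (list of rows) alone."""
    return [list(row) if isinstance(row, list) else [row] for row in x]


def concat(left, right):
    """Column concatenation, as ON and AND are described as doing."""
    left, right = columns(left), columns(right)
    if len(left) != len(right):
        raise ValueError("arguments must have the same number of rows")
    return [a + b for a, b in zip(left, right)]


def evaluate(expr, namespace):
    """Evaluate 'NAME ON NAME AND NAME ...' right to left, as APL2 would."""
    tokens = expr.split()
    result = namespace[tokens[-1]]
    for i in range(len(tokens) - 2, 0, -2):
        if tokens[i] not in ("ON", "AND"):
            raise ValueError(f"unsupported function {tokens[i]}")
        result = concat(namespace[tokens[i - 1]], result)
    return columns(result)


def data_arg(arg, namespace):
    """Accept either a data vector/matrix or a string that evaluates to one."""
    return evaluate(arg, namespace) if isinstance(arg, str) else columns(arg)
```

With toy vectors MORAL_INT = [1, 2], HETERO = [3, 4], and MOBILITY = [5, 6], the string from the REGRESS example evaluates to the three-column matrix [[1, 3, 5], [2, 4, 6]].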
Variables local to APL2STAT functions and operators -- including arguments, result
variables, and statement labels -- are lower case. Consequently, users who restrict themselves
to upper-case variable names will never have these names shadowed by local variables.
Getting Information
To determine the purpose and use of any APL2STAT function or operator, type HOW
'function_name', e.g., HOW 'REGRESS'. A new user of APL2STAT should begin by listing the
APL2STAT functions (or examining the listing included with this document) and using HOW to
explore functions and operators whose purpose is not obvious from their names. The STAT_FNS
function reports all functions and operators with names starting with upper-case letters in the
active workspace or saved in any files on the search path. The function APROPOS looks for
all functions in the active workspace or in files on the path whose names contain a specified
character string. For example, APROPOS 'REG' will find the functions REGRESS,
ROBUST_REGRESS, TS_REGRESS, and some others. Likewise, APROPOS 'is_' will print out
a list of `predicates', such as is_numeric and is_matrix, which may be used to test a variety of
conditions.
Using Graphics Functions
Many APL2STAT functions create high-resolution graphical displays. Graphics mode is
entered by typing GRAPHICS 'ON', which divides the screen into a graphics window (at the top)
and an APL session-manager window at the bottom. The user can continue to enter APL
expressions in the session-manager window. Many graphics functions modify the plot currently
displayed in the graphics window. The graphics window may be erased by entering GRAPHICS
'CLEAR', and the full-screen session-manager can be restored by entering GRAPHICS 'OFF'.
High-level graphics functions automatically set graphics on and clear the graphics window.
The global variables .deltaSESSION_WINDOW and .deltaGRAPHICS_WINDOW define (roughly)
the proportions of the screen devoted to the two windows. (To prevent overlap, the two
variables should sum to a bit less than 1.0.) Settings for the two window-control variables
depend upon the graphics adaptor and the desired number of lines in the session-manager
window. Start with .deltaSESSION_WINDOW .gets .25 and .deltaGRAPHICS_WINDOW .gets .675 (which
should produce about three session-manager lines at the bottom of the screen), adjusting these
values if they do not produce a satisfactory result.
Obtaining Printed Copies of APL2STAT Graphs
APL2STAT permits graphics to be `captured' to APL2 variables. A captured graph may
then be redisplayed, modified, and saved (as Postscript output) to a file. In order to capture,
save, and print a graph:
(1) During your APL2 session, call the APL2STAT CAPTURE function prior to constructing the
graph. For example,
CAPTURE 'FIGURE1'
will save subsequent graphical output in the APL2 variable FIGURE1 (as well as displaying the
output on the screen). If FIGURE1 already exists, then it is automatically redisplayed on the
screen, and subsequent graphical output will modify it. Note that if you capture two plots to the
same variable then they will overplot each other when re-displayed or printed.
(2) Proceed to construct the graph in the usual manner (e.g., with calls to PLOT, IDENTIFY,
etc.).
(3) When you are finished with the current graph, type
CAPTURE 'OFF'
(4) You may use the SHOW function to re-display a graph that is captured in an APL2 variable;
for example,
SHOW 'FIGURE1'
If CAPTURE is currently directed to this variable, then any modifications to the graph will be
saved. There is no prespecified limit on the number of captured graphs that may be present in
your active workspace, but each occupies some space in memory.
(5) Use the SAVE_GRAPH function to save a captured graph to a DOS file. For example,
'FIGURE1' SAVE_GRAPH 'C:\PLOTS\FIGURE1.PS'
If you fail to specify the left argument to SAVE_GRAPH, then it defaults to the name of the last
graph that you captured.
(6) After you exit from the APL2 system, you can send your graphics file to a Postscript printer.
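The capture steps can be modelled as recording drawing commands under a name, as in this Python sketch. GraphCapture and the command strings are hypothetical: real captured graphs hold APL2 graphics data, and writing PostScript (SAVE_GRAPH) is not shown. The sketch does reproduce two documented behaviours. Capturing to an existing name redisplays it, and later output to the same name accumulates, so two plots captured to one variable overplot.

```python
# Hypothetical model of CAPTURE/SHOW as recording and replaying drawing commands.
class GraphCapture:
    def __init__(self):
        self.variables = {}   # captured graphs, keyed by variable name
        self.target = None    # name currently being captured to, or None
        self.last = None      # last graph captured (SAVE_GRAPH's default)
        self.screen = []      # commands currently displayed

    def capture(self, name):
        if name == "OFF":
            self.target = None
            return
        if name in self.variables:
            self.screen = list(self.variables[name])   # redisplay existing graph
        else:
            self.variables[name] = []
        self.target = self.last = name

    def draw(self, command):
        """Stand-in for any graphics call (PLOT, IDENTIFY, ...)."""
        self.screen.append(command)
        if self.target is not None:
            self.variables[self.target].append(command)

    def show(self, name):
        self.screen = list(self.variables[name])
```

Output drawn after capture("OFF") still appears on the screen but is not added to any captured graph.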
The APL2STAT Shell
The APL2STAT SHELL automatically copies functions, operators, and variables into the
active workspace as they are encountered in APL statements typed by the user. The SHELL
also manages memory, copying the contents of the active workspace to a disk file when
available memory falls below a predefined threshold. Finally, the SHELL has facilities for
capturing and re-executing APL expressions (see the SCRIPT and DO functions).
The SHELL function, which is invoked when the SHELL workspace is loaded, searches
in disk files for objects that are not already present in the active workspace. The use of the
APL2 session manager under the shell is essentially normal. (By default, the SHELL function
provides APL expression numbers as a prompt, primarily to remind the user that the shell is
active. This prompt may be changed by assigning another expression, which should evaluate
to a prompt, to the global variable .deltaPROMPT. For example, .deltaPROMPT .gets '''      ''' evaluates
to the usual six-space session-manager prompt.)
The SHELL function searches the component files listed in .deltaPATH in the given order,
from left to right; therefore, if any object exists in more than one workspace in the path, it is
copied from the first location in which it is encountered. The initial contents of .deltaPATH are set
to the locations of the supplied DATA and APL2STAT files. You may add other files to the
search path. The ATTACH function conveniently appends workspaces to the search path.
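The left-to-right, first-match rule can be expressed in a few lines of Python. Files are modelled as dictionaries mapping names to objects. find_object and attach are hypothetical names, and attach simply appends to the end of the path, as ATTACH is described as doing.

```python
# Hypothetical sketch of the SHELL's search along the path.
def find_object(name, workspace, path, files):
    """Return NAME from the workspace, else copy it from the first file holding it."""
    if name in workspace:
        return workspace[name]
    for file_name in path:                      # searched left to right
        contents = files.get(file_name, {})
        if name in contents:
            workspace[name] = contents[name]    # copy into the active workspace
            return workspace[name]
    raise KeyError(f"{name} not found on the search path")


def attach(path, file_name):
    """Return a new path with FILE_NAME appended at the end."""
    return path + [file_name]
```

Because the search stops at the first match, a function in an attached file cannot override one of the same name in a file earlier on the path.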
The SHELL function has some small limitations:
It is not possible to execute APL2 system commands [ )COPY, )FNS, etc.]
from inside the shell. You may, however, exit from the shell, execute a
command, and then re-enter the shell (see below). Alternatively, many system commands
may be simulated within the SHELL, including )FNS, )VARS, )OPS, )NMS, and,
significantly, )SAVE, )COPY, and )PCOPY, which allow you to save and copy
from component files in any disk or directory. These "commands" are
implemented through functions with the same names, e.g. )SAVE calls the
function SAVE. The functions may be called directly, but in this instance, it's
necessary to enclose arguments in primes ' ', which are not needed when the
command form is typed. Thus, )SAVE A:\LIB\MYWS is equivalent to SAVE
'A:\LIB\MYWS'. See, e.g., HOW 'SAVE' for further information.
Under the TRYAPL2 interpreter, it is not possible to enter function definition
mode from inside the shell. Again, you may exit from the shell, define or edit
a function, and re-enter the shell. (This limitation does not apply to the APL2
interpreter.)
When a line is transferred from the session log and re-executed, the first few
characters (equal to the length of the prompt) are stripped off. If, therefore, the
line begins too close to the left margin, insert a sufficient number of blanks
before re-executing the line.
To minimize unnecessary disk access, you can modify the SHELL function so
that the automatic search for functions, operators, and variables does not take
place within character strings enclosed in quotes. (Instructions for making this
modification are included as comment lines in the SHELL function.) An
unfortunate byproduct of this alternative is that, for example,
REGRESS 'MORAL_INT ON HETERO AND MOBILITY'
will not cause a search for the functions ON and AND. You can copy these
functions explicitly into the workspace, however, prior to their use. Some
functions, such as USE and HOW, search directly for objects enclosed in quotes.
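The effect of the quote-skipping modification can be illustrated with a Python sketch that extracts candidate names from a line of input. This is not the SHELL's own code: the name pattern is a simplification of APL2's rules for identifiers, and names_to_search is a hypothetical helper. With quote skipping turned on, ON and AND in the REGRESS example are no longer found.

```python
import re

# Hypothetical sketch: which names in a typed line would be looked up?
NAME = re.compile(r"[A-Za-z_∆][A-Za-z0-9_∆]*")   # simplified identifier pattern


def strip_quoted(line):
    """Blank out APL quoted strings; a doubled quote ('') yields two adjacent matches."""
    return re.sub(r"'[^']*'", " ", line)


def names_to_search(line, search_quotes=True):
    """Candidate names, in order of first appearance."""
    text = line if search_quotes else strip_quoted(line)
    found = []
    for name in NAME.findall(text):
        if name not in found:
            found.append(name)
    return found
```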
To exit the shell, type the `pseudo-command' )QUIT. To re-enter the shell from the
session manager, type SHELL.
Managing Memory
Because APL2STAT functions and operators can access data only in the active workspace,
the size of datasets is limited by available memory. This is not a serious limitation for the APL2
interpreter running on a well-endowed 386 or better system, but it is a substantial limitation
when running APL2STAT under the TRYAPL2 system.
When using TRYAPL2 or a sub-386 system, it is therefore important to free as much
memory as possible within the standard DOS limitation of 640K. On a computer using a 286
or higher processor, and running under DOS 5 or 6, it is useful to load DOS into high memory
(along with necessary device drivers and memory-resident programs, if possible). It should be
possible to free at least 600K of DOS memory.
Free memory may be checked with the APL2STAT MEMORY function. It is our
experience that the system may fail to function properly during the analysis of moderately small
datasets (say 100 observations and 10 variables) when free memory falls below about 100K.
After executing each expression, the SHELL function checks the amount of free memory.
If free memory falls below the amount specified in .deltaMEMORY_THRESHOLD (which defaults
to 0, representing 0K bytes -- i.e., disabling memory management), then SHELL takes the
following actions:
The current contents of the workspace are saved to the file specified in
.deltaTEMP_FILE (using the extension .1 for the first such save, .2 for the second,
etc.).
This file is added to the front of the search path, .deltaPATH, as its first entry.
All objects, except those necessary to the operation of SHELL, are deleted from
the active workspace. Note that any deleted object is available in the saved file,
which is automatically searched first by SHELL.
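The three actions can be summarized in a Python sketch. The workspace and saved files are dictionaries, free memory is passed in (in K) rather than measured, and manage_memory, temp_saves, and essential are hypothetical names. The file-naming and path-ordering rules follow the description above.

```python
# Hypothetical sketch of the SHELL's memory management after each expression.
def manage_memory(workspace, path, temp_saves, free_k, threshold_k,
                  temp_file, essential):
    """Return the search path, updated if a save was triggered.

    temp_saves maps the names of earlier temporary saves to their contents.
    """
    if threshold_k <= 0 or free_k >= threshold_k:
        return path                                   # a threshold of 0 disables this
    file_name = f"{temp_file}.{len(temp_saves) + 1}"  # .1, .2, ...
    temp_saves[file_name] = dict(workspace)           # save the current contents
    for name in list(workspace):
        if name not in essential:                     # keep what SHELL needs
            del workspace[name]
    return [file_name] + path                         # searched first from now on
```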
If necessary, some memory may be freed by erasing extraneous APL variables (such as
a dataset object already accessed by USE or a returned regression object that is no longer
needed). Likewise, APL2STAT functions that are not required may be erased from the active
workspace. The function NO_HOW (a bad pun!) erases all documentation lines from the
APL2STAT functions, freeing a few K. This is an undesirable measure, however, since the user
cannot subsequently employ HOW to determine function usage without recopying the function
from the APL2STAT file.
APL2STAT functions are written to copy auxiliary functions and variables automatically
from a component file if they are not already present in the active workspace. The same
component file serves both the TRYAPL2 and APL2 implementations. The global variable
.deltaAPL2STAT_FILE should be set to the complete path to the component file. This component
file has the same structure as a TRYAPL2 workspace. Consequently, if it is renamed to
APL2STAT.TRY and placed in the TRYAPL2 directory, the file may be accessed via the )COPY
command; however, it is too large to be loaded as the active workspace under TRYAPL2.
Using APL2STAT as a `Statistical Package'
Although it differs from a statistical `package' in several respects, for most routine data
analysis within its capabilities, APL2STAT can be used much like a command-driven statistical
package. The tips in this section are meant to facilitate such use:
You can employ the following sequence of steps to carry out a routine analysis:
(1) After starting the APL2 or TRYAPL2 interpreter, )LOAD the SHELL workspace.
Alternatively, if you're using the APL2 interpreter on a 386 or better system with
sufficient memory, a viable option is to prepare a workspace containing all of the
APL2STAT functions and to load this workspace at startup.
(2) Get the data on which you plan to work into the active workspace, either
automatically by using a dataset saved in a component file on the search path, by defining
an APL2STAT dataset object explicitly in the current session, or by reading data from an
ASCII file into a dataset object. (ASCII files cannot be accessed under the TRYAPL2
interpreter; you can use the separately supplied READDATA.EXE program to create a dataset
object from an ASCII data file and to place this object in a component file readable by
TRYAPL2.)
(3) Alternatively, if you are not working with a dataset object -- e.g., when you want to
generate and analyze random data -- use the OBSERVATIONS function to tell APL2STAT
functions the number of observations with which you intend to work.
(4) Issue the COPY ON ID filename session-manager command to place a log of your
session in a journal file.
(5) Proceed to employ APL2STAT functions to perform the data analysis --
LINEAR_MODEL to fit a model by least-squares, PLOT to make a scatterplot or a line
plot, and so on.
(6) After you finish your data analysis and exit from the interpreter, print the journal file (and
any graphics files that you have saved).
Handling missing data:
(1) Some datasets have missing data, encoded by the period character, '.'. Although
APL2STAT functions are written to handle missing data automatically, it is often easiest
simply to eliminate observations that contain any missing information. This operation
can be accomplished by invoking the SQUEEZE function after making a dataset current
with USE.
(2) If you decide to retain missing data in the variables that you are analyzing, you
need to be careful when using functions not in APL2STAT -- including the APL2 primitive
functions -- that do not know how to process the '.' missing-data code. In many cases,
you can handle missing data properly by employing the MD operator. For example,
INCOME *MD .5 will properly find the square roots of the non-missing values of the
income variable, and return a missing value when income is missing; INCOME*.5 will
not work if the income variable contains some missing (i.e., '.') values. Likewise,
HUS_INCOME +MD WIF_INCOME will produce a correct result, but, if either variable
contains missing data, HUS_INCOME+WIF_INCOME will not.
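For readers who want to see the logic of SQUEEZE and MD outside APL2, here is a minimal Python sketch. It is an illustrative analogue, not APL2STAT code. The names md and squeeze, the use of Python lists for variables, and the dict-of-lists dataset are assumptions made for the sketch. It follows only the rules stated above: '.' marks a missing value, a function applied through MD returns '.' wherever either operand is missing, and SQUEEZE removes every observation that has any missing value.

```python
import operator

MISSING = "."  # APL2STAT's missing-data code

def md(fn):
    """Hypothetical analogue of the MD operator for a dyadic scalar function.

    Wraps fn so that, element by element, any '.' operand yields '.'
    instead of raising an error. A scalar operand is extended to the
    length of the other operand, as in APL2 scalar extension.
    """
    def wrapped(left, right):
        n = max(len(x) if isinstance(x, list) else 1 for x in (left, right))
        lefts = left if isinstance(left, list) else [left] * n
        rights = right if isinstance(right, list) else [right] * n
        return [MISSING if MISSING in (a, b) else fn(a, b)
                for a, b in zip(lefts, rights)]
    return wrapped

def squeeze(dataset):
    """Hypothetical analogue of SQUEEZE: drop every observation that has a
    missing value in any variable. dataset maps names to equal-length lists."""
    n = len(next(iter(dataset.values())))
    keep = [i for i in range(n)
            if all(col[i] != MISSING for col in dataset.values())]
    return {name: [col[i] for i in keep] for name, col in dataset.items()}

# Analogues of INCOME *MD .5 and HUS_INCOME +MD WIF_INCOME
income = [400.0, MISSING, 900.0]
print(md(pow)(income, 0.5))                                      # [20.0, '.', 30.0]
print(md(operator.add)([100, MISSING, 300], [50, 60, MISSING]))  # [150, '.', '.']
```

The wrapper never passes '.' to the underlying function. That is why the plain APL2 expressions INCOME*.5 and HUS_INCOME+WIF_INCOME fail on missing data and their MD counterparts do not.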
Conventions employed by APL2STAT functions:
(1) Most functions take a data vector or matrix as a right argument, or an expression that
evaluates to a data matrix. Some functions, such as REGRESS, require an expression
that evaluates to a data matrix enclosed in primes (i.e., written as a character vector).
All APL2STAT functions that expect data as a right argument will accept an appropriate
expression in primes.
(2) Most APL2STAT functions that permit a left argument use that argument to select
options. If the left argument is omitted, then a default value for the option is assumed.
Consult the function documentation (accessible through HOW) to obtain specific
information about the arguments of any function.
(3) When a function fits a model in which there is a dependent variable and one or more
independent variables, or produces a plot with a vertical and a horizontal variable, then
the dependent or vertical variable is in the first column of the data matrix supplied as a
right argument to the function.
(4) Most APL2STAT functions that produce high-resolution plots split the screen, placing
the plot at the top and leaving a three-line session-manager window at the bottom. You
may modify the current plot by entering appropriate function calls into the session
manager. Erase the plot and restore the full-screen session manager by entering
GRAPHICS 'OFF'.
(5) APL2STAT functions that fit statistical models, such as LINEAR_MODEL, generally
do not print their results. Instead, these functions return an object that contains the
results of the fit. Use the PRINT function to print a summary.
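Convention (5) relies on PRINT choosing a method suited to the type of the object it receives. APL2STAT names methods by the method_TYPE scheme (e.g., print_LINEAR_MODEL) and falls back on inherited methods when none exists. A minimal Python sketch of that kind of dispatch follows. The parent table, the type names MY_MODEL and MY_SUMMARY, and the rule that a type with no recorded parent descends from OBJECT are hypothetical. The sketch assumes only that print_OBJECT is the default print method, as the function list states.

```python
def method(generic, obj_type, methods, parents):
    """Return the name of the method for generic (e.g. 'print') that applies
    to obj_type, walking up a hypothetical prototype chain."""
    t = obj_type
    while True:
        name = f"{generic}_{t}"  # the method_TYPE naming scheme
        if name in methods:
            return name
        if t == "OBJECT":
            raise LookupError(f"no {generic} method for {obj_type}")
        # Assumption: types with no recorded parent descend from OBJECT.
        t = parents.get(t, "OBJECT")

def PRINT(obj, methods, parents):
    """Dispatch to the print method appropriate to the object's TYPE slot."""
    return methods[method("print", obj["TYPE"], methods, parents)](obj)

# Hypothetical methods and prototype chain, for illustration only.
methods = {
    "print_REGRESSION": lambda obj: "regression summary",
    "print_OBJECT": lambda obj: "generic object listing",
}
parents = {"MY_MODEL": "REGRESSION", "REGRESSION": "OBJECT"}

print(method("print", "MY_MODEL", methods, parents))  # print_REGRESSION
print(PRINT({"TYPE": "MY_SUMMARY"}, methods, parents))  # generic object listing
```

With this arrangement, a new object type needs its own print method only when the inherited one is inappropriate.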
Writing New Statistical Functions
Because APL2STAT is composed simply of user-defined APL2 functions and operators,
and because the basic unit of data is the variable-vector, it is simple to write new APL2STAT
programs. In fact, in the absence of missing data, any APL2 statistical function or operator
will work properly.
It is, however, more efficient to take advantage of some of the high-level and auxiliary
functions provided by APL2STAT in writing new functions. In particular:
New object prototypes may be defined from the existing prototypes, and
functions and operators may be written to access these objects. Methods for new
object types may be written for functions such as PRINT if the inherited methods
are inappropriate. Method functions are named according to the scheme
method_TYPE, e.g., print_LINEAR_MODEL.
The auxiliary function select applies the selection vector to its right
argument and may therefore be called at the start of all high-level statistical
functions; select also screens out missing data and sets the missing-data vector.
Likewise, deselect may be applied to data to be returned by functions (filling in
deselected observations with missing data).
The auxiliary function get_data examines its right argument. If the argument
is a data matrix, then it is simply returned. If, however, the argument is a
character vector whose length is different from that of the selection vector (such
a vector is assumed to be an APL2 expression that evaluates to a data matrix),
then the argument is executed.
The auxiliary function get_terms takes a character vector as a right argument
and returns a nested vector containing the "words" in the argument (parsing out
blanks, punctuation, and the reserved words AND, ON, VS, and BY).
The auxiliary function default takes the name of a (global or left-argument)
variable (in quotes) as its left argument and a value as its right argument. If the
named variable does not exist, then it is created and assigned the specified value.
Many of the user-callable plotting functions (e.g., PLOT, HLINE, VLINE,
POINTS, LINES, LOWESS) are suitable for use as auxiliary functions.
The auxiliary function uses may be employed to copy sub-functions, sub-
operators, and variables from the component file pointed to by
.deltaAPL2STAT_FILE.
Observation names for the current dataset are in the global variable
.deltaOBS_NAMES. If you access this vector to label observations, then be sure to
apply the current selection and missing-data vectors to pick up the correct names,
as in
nms .gets (.deltaMISSING^.deltaSELECT)/.deltaOBS_NAMES
If your function calls an upper-case APL2STAT function, such as BIGGEST, that
accesses the .deltaOBS_NAMES vector after selection has taken place, then you'll
need to assign the selected observation names to the observation-names vector.
Be sure to restore the original contents of .deltaOBS_NAMES before your function
returns.
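The following minimal Python sketch shows the selection logic behind select, deselect, and the observation-name compression above. It is an analogue, not the APL2STAT implementation. It assumes, as the compression (.deltaMISSING^.deltaSELECT)/.deltaOBS_NAMES implies, that the missing-data vector holds 1 for complete observations. Rows are Python tuples, Boolean lists stand in for APL2 Boolean vectors, and compress is a hypothetical helper standing in for APL2's compression (/).

```python
MISSING = "."  # APL2STAT's missing-data code

def select(rows, selection):
    """Sketch of select: keep rows that are selected and contain no missing value.

    Returns the kept rows and the combined Boolean mask, which plays the role
    of (.deltaMISSING^.deltaSELECT) in the APL2 expression above.
    """
    complete = [all(v != MISSING for v in row) for row in rows]
    mask = [s and c for s, c in zip(selection, complete)]
    return [row for row, m in zip(rows, mask) if m], mask

def deselect(values, mask):
    """Sketch of deselect: expand results back to full length, filling
    deselected or incomplete observations with the missing-data code."""
    it = iter(values)
    return [next(it) if m else MISSING for m in mask]

def compress(mask, items):
    """Hypothetical stand-in for APL2 compression, mask/items."""
    return [x for x, m in zip(items, mask) if m]

rows = [(1, 2), (3, MISSING), (5, 6), (7, 8)]
selection = [True, True, False, True]   # observation C has been omitted
obs_names = ["A", "B", "C", "D"]

kept, mask = select(rows, selection)
print(kept)                       # [(1, 2), (7, 8)]
print(compress(mask, obs_names))  # ['A', 'D']
print(deselect([10, 20], mask))   # [10, '.', '.', 20]
```

The same mask serves all three steps. That is why the observation names must be compressed with both the missing-data and selection vectors to line up with the selected data.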
A sketched illustration:
.deltaobject_name MYFUNCTION data;something
[1] uses.delta 'select' 'get_data' 'MAKE' 'PUT' 'deselect'
[2] 'object_name' default 'LAST_MY'
[3] data .gets select get_data data
[4] MAKE 'MY_PROTO' object_name
[5] something ...
.
.
.
[10] PUT object_name 'SOMETHING SLOT' (deselect something)
[11] .delta
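The illustration calls default, and model-fitting functions rely on get_terms to read specifications such as Y ON X1 AND X2. Both auxiliary functions are simple enough to sketch in Python, following only their descriptions above. The names and behaviour here are assumptions for illustration. A "word" is taken to be a run of letters, digits, and underscores (Python's \w), so this sketch would split an APL2 name containing .delta. The workspace is modelled as a dict.

```python
import re

RESERVED = {"AND", "ON", "VS", "BY"}

def get_terms(spec):
    """Sketch of get_terms: split a model specification into words, dropping
    blanks, punctuation, and the reserved words AND, ON, VS, and BY."""
    return [w for w in re.findall(r"\w+", spec) if w not in RESERVED]

def default(namespace, name, value):
    """Sketch of default: create name with value only if it does not exist."""
    namespace.setdefault(name, value)
    return namespace[name]

print(get_terms("INCOME ON EDUCATION AND PRESTIGE"))  # ['INCOME', 'EDUCATION', 'PRESTIGE']
print(get_terms("Y VS X, Z BY GROUP"))                # ['Y', 'X', 'Z', 'GROUP']

ns = {}
default(ns, "object_name", "LAST_MY")  # not yet defined: assigned
ns["other"] = "KEEP"
default(ns, "other", "LAST_MY")        # already defined: left alone
print(ns)  # {'object_name': 'LAST_MY', 'other': 'KEEP'}
```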
References
Brown, J.A., S. Pakin and R.P. Polivka (1988). APL2 at a Glance. Englewood Cliffs: Prentice-Hall.
Gilman, L. and A.J. Rose (1984). APL: An Interactive Approach, Third Edition. New York: Wiley.
List of APL2STAT Functions and Operators
ABLINE Draw a straight line on the current plot given intercept and slope.
ADD_POINTS Add points to the current plot using the cursor or mouse.
ADD_VARIABLES Add one or more variables to a dataset object.
AKERNEL Adaptive kernel density estimator.
AND Catenate columns
APROPOS Display names of functions and operators containing a string.
ATTACH Attach a file containing a TRYAPL2 workspace to the search path.
AUTOCORRELATE Calculate autocorrelation and partial autocorrelation functions
BIGGEST Report the n largest values in x
BIPLOT Produce a 2-dimensional biplot of a data set, showing both
o BOOTSTRAP Operator for computing bootstrapped sample estimates.
o BOOTSTRAP_FIXED Operator to bootstrap a regression with a fixed
BOXPLOT Draw a boxplot or parallel boxplots.
BOX_COX Box-Cox transformation of y in regression.
BOX_COX_PLOT Constructed-variable plot for power transformation of y.
BOX_TIDWELL Find ML estimates of power transformations of the x's in
BOX_TIDWELL_PLOT Constructed-variable plot for power transformation of an x
BY Catenate columns
CAPTURE Start or stop capture of graphics
CATEGORY Convert a character vector into numeric codes.
CHDIR Ap103 function to query or change the current directory.
CHISQUARE_DENSITY Chi-square density function
CHISQUARE_DIST Cumulative chi-square distribution function
CHISQUARE_QUANT Chi-square quantiles
CHISQUARE_RAND Return n random chi-square variables with df degrees-of-freedom
CIRCLES Draw circles on the current plot
COL Reshape a vector or scalar as 1-column matrix
COPY Copy objects from a saved TRYAPL2 workspace.
CORRELATE Calculate correlations, covariances, means, and standard deviations.
COUNTS Creates a table object containing a frequency table
COV Compute the variance-covariance matrix of a data matrix
DELETE_POINTS Delete points from the current plot
DENSITY_PLOT Plot of kernel density estimate, with observations plotted as points.
DESCRIBE Mean, standard deviation, median, etc., for vector x.
DEVIATION Create deviation-contrast regressors from a vector of group
DIAG Form a diagonal matrix from a vector or extract diagonal from matrix.
DIFFERENCE Differences y[t]-y[t-n] of a series
DO Execute lines previously captured to a shell script
DROP_ROWS Drop specified rows from a matrix
DROP_VARIABLES Delete one or more variables from a dataset object.
DUMMY Creates dummy variable regressors from a vector of group
DURBIN_WATSON Compute Durbin-Watson test for serially correlated errors
ELLIPSE Draw data ellipse(s) with specified confidence coverage.
ENTER_DATA Interactive data entry with options for row and column labels.
ERASE Erase objects from the active workspace
FNS Display names of functions in the active workspace
FOLD Fold text vector or matrix lines to given width and indent.
FORMAT Format a numeric matrix or vector.
FOURIER_PLOT Andrews Fourier function plot of a p-dimensional data matrix.
FUZZ Round near integers to integers
F_DENSITY F density function
F_DIST Cumulative F distribution function
F_DIST1 Cumulative F distribution, by integrating F_DENSITY
F_QUANT F distribution quantiles
F_RAND Return n random F variables with df[1 2] degrees of freedom
GAMMA_DENSITY Gamma density function
GAMMA_DIST Gamma distribution function
GAMMA_QUANT Gamma distribution quantiles
GET Return the value of a specified slot from an object or null
GRAPHICS Enter and exit graphics mode or clear current plot
GRID Construct a grid of values for one or more variables.
o GRID_MIN Grid-search operator to minimize a function of several
HISTOGRAM Draw a histogram (frequency bar graph).
HLINE Draw a horizontal line on the current plot
HOW Produce a brief description of a function or operator.
HYPOTHESIS Test a general linear hypothesis given by for a model.
IDENTIFY Identify points on current graph using the cursor.
INFLUENCE_PLOT Bubble-plot of studentized residuals vs. hat-values
INSERT_ROW Insert a row in a matrix.
INTEGRATE Integrate a function by adaptive integration.
INTERACT Form matrix of interaction regressors from main effects or
o JACKKNIFE Operator for computing jackknifed sample estimates.
JOIN Catenate arrays by columns, extending the shorter if rows unequal.
KERNEL Kernel density estimator.
LAG Lag a series
LEAD Lead a series
LINEAR_MODEL Fit a general linear model by least-squares.
LINES Draw connected lines on the current plot in the current line_colour
LOGIT Logistic regression analysis.
LOGIT_MODEL Fits a general linear logit model to data by maximum likelihood.
LOWESS Locally weighted scatterplot smoother.
MAKE Create an object from a prototype.
MAKE_DATASET Make a dataset object. Used by ENTER_DATA.
o MD Missing-data operator for scalar functions.
MEAN Returns the mean of x
MEDIAN Returns the median of x
MEMORY Report free workspace memory in kilobytes
MERGE Merge two dataset objects by observation names.
METHOD Return the name of an appropriate method for an object.
o MINIMIZE Hooke-Jeeves minimization of f, a function of several variables.
MOVE_POINTS Move points on the current plot using cursor or mouse.
NAMES List objects in the active workspace.
NEXT_TO Display captured graphs side by side
NMS List objects with names beginning with characters
o NONLINEAR Operator to fit nonlinear models by the method of least squares
NORMAL_DENSITY Normal-density function
NORMAL_DIST Cumulative unit-normal distribution function
NORMAL_QUANT Unit-normal quantile function (inverse normal distribution)
NORMAL_RAND Returns n normal random variables
NO_HOW Remove documentation comments from APL2STAT functions.
NQPLOT Normal-quantile comparison plot
OBSERVATIONS Prepare the selection and observation-names vectors for n observations
OMIT Toggle the selection status of an observation in the selection vector.
ON Catenate columns
ONTOP Catenate two arrays vertically by rows (max. rank 2)
OPS List operators with names beginning with characters
OUTLIER_PLOT Robust multivariate outlier detection plot. Calculates robust
PAIRS Scatterplot matrix with interactive point identification.
PARTIAL_PLOT Partial regression and partial residual plot.
PARTIAL_PLOTS Partial regression and partial residual plots for all x's.
PARTITION Partition an array into sub-arrays.
PCOPY Protected copy of objects from a saved TRYAPL2 workspace.
PLOGIT Fit unordered polytomous logit models.
PLOGIT_MODEL Fit a general polytomous linear logit model to data by maximum likelihood.
PLOT Create an xy plot with points, joined lines, or regression line.
PLOT_ARROW Place text and an 'arrow' on the current graph using the cursor.
PLOT_TEXT Place text on the current graph using the cursor or mouse.
POINTS Add points to the current plot.
PRINT Print an APL2STAT object in an appropriate format.
PUT Put a value into a specified slot of an object.
QR Compute the QR factorization of a matrix.
o QUANTILE_PLOT Operator to construct a quantile-comparison plot
QUICK_LOWESS Locally weighted scatterplot smoother, fast version.
QUIT Exit to session manager from the shell
RANKS Transform a data vector or matrix to ranks, allowing for ties.
READ_DATA Read data from a DOS file into an APL2STAT dataset.
RECODE Recode a numeric or character data vector.
REGRESS Fit a least-squares linear regression.
ROBUST_REGRESS Robust regression by iteratively reweighted least squares.
ROUND Round an array to a specified number of decimal places.
ROW Reshape a vector or scalar to a 1-row matrix.
SASOUT Writes a dataset object to a .sas file with a SAS data step
SAVE Write the active workspace to a TRYAPL2 component file.
SAVE_GRAPH Save a captured graph to a PostScript file or device.
SCALE Scale a data matrix or vector to specified min/max.
SCRIPT Start or stop storing of shell commands to a script.
SEQUENCE Generate a sequence of numbers spanning a range.
SHELL The APL2STAT shell; provides automatic loading of objects.
SHOW Redisplay a captured graph.
SLOTS Return the slot names of an APL2STAT object.
SMALLEST Report the n smallest values in x.
SORT Sort numeric or character data into ascending order.
SQUEEZE Eliminate all observations with missing data from the
SSE Find the sum of squares of errors for a linear model.
STANDARDIZE Standardize a data matrix or vector to specified mean/sd.
STAT_FNS List high-level functions and operators in workspace or on the path
STEMLEAF Stem-and-leaf display with automatic scaling & outlier identification.
SYMBOL_PLOT Scatterplot with identifying symbols.
o TABLE Operator to calculate multi-way tables of statistics.
TAKE_ROWS Take specified rows of a matrix
TEST Test a hypothesis by contrasting two nested models.
o TIMER Operator to time the execution of a function.
TIMES Matrix multiplication of simple or partitioned matrices.
TS_REGRESS Linear regression with autocorrelated AR(1) errors,
T_DENSITY t density function with df degrees of freedom.
T_DIST t cumulative distribution function.
T_QUANT Quantiles of the t-distribution.
T_RAND Returns n t variables with df degrees-of-freedom
UNIFORM_RAND Returns n [0,1] uniform random variables
UNIQUE Find the unique values in a vector
USE Define the 'current' dataset.
USING Report information about the current dataset
VARIANCE Return the variance(s) of a vector or matrix
VARS List variables in active w.s. with names starting
VIF Calculate and report variance-inflation factors.
VLINE Draw a vertical line on the current plot
VS Catenate columns
WHEREIS Return the position of an observation from its name
akernel Subfunction of AKERNEL
andrews Andrews Fourier transform of a data matrix .
ap207 Send a graphics command to ap207
ap_list Returns a list of ap's currently active in this session.
apl_type Returns the type of an APL object, 1=char, 2=num, 3=mixed
bar Draw one bar for a histogram
beta Beta distribution with parameters p,q
biplot Calculate biplot coordinates for display of a data matrix .
bisquare Bisquare weight function
box Calculate statistics for boxplot
boxcox Box-Cox power transformation
boxcox_reg Calculate log-likelihood of transformed y on x for boxcox
boxplot Draw a boxplot
bstd Standardize data matrix to deviation scores or z-scores for biplot
capture Capture a graphic primitive
circle Draw a circle
cleanup Erase non-protected objects from the active workspace
clear_plot Clear the graphic area
convert Convert characters to numerics
default Assign default value if variable does not exist
deselect Restore a vector or matrix to full length, inserting '.'s
diff Differences among all pairs of vectors
display_names Display names in 4 columns
divide Tell session manager to use split-screen graphics mode
dsq Squared distance from mean vector for rows of matrix .
ellipse Calculate coordinates of data ellipse with given confidence coverage
enclose_graph Reshape captured graph to nested vector of pairs
filter_minus Replace APL high-minus with ASCII minus in a graphic object
frame Draw a plot frame in current axis color
fullscreen Setup fullscreen graphics mode
g Subfunction for various probability functions
o gauss Fit a nonlinear model using Gauss-Newton minimization
get_data Utility to handle numeric or character data, or APL expression
get_numeric Return numeric columns of a matrix
get_terms Return a vector of terms in a symbolic model specification
get_variables Auxiliary function for USE
get_vname Get variable names for the columns of a data matrix.
gradeup Return indices to sort character/numeric in ascending order
hatvalue Auxiliary function for regress
haxis Draw a horizontal axis
highlight Subfunction of pairs
histogram Draw histogram bars
o hj_explore Hooke-Jeeves: explore f about f=x f(x)
house Compute orthogonal Householder reflection matrix for QR
hyp_MLE Hypothesis method for MLE objects
hyp_REGRESSION Hypothesis method for regression objects
interpolate Linear interpolation
invdepth Find depths from stemleaf display
inverse_scale Find data coordinates from plot coordinates
is_factor Is x an unordered categorical (character) variable?
is_integer Is x an integer array?
is_logical Is x a logical array?
is_matrix Is x a matrix?
is_missing Test whether any observation is missing
is_non_negative Is x a non-negative array?
is_numeric Test if (non-missing) values are numeric
is_object Is x an APL2STAT object?
is_object_name Is x the name of an existing APL2STAT object?
is_ordered_factor Is x an ordered categorical (character) variable?
is_positive Is x a positive array?
is_probability Is x in the open interval between 0 and 1: (0 < x < 1) ?
is_scalar Is x a scalar?
is_simple Is argument x a simple scalar or array?
is_string Is x a character string?
is_sym Test if an array is a symmetric matrix
is_var_name Is x a valid variable name (not necessarily currently defined)?
is_vector Is x a vector?
jacobi Eigenvalues and eigenvectors by Jacobi method; stable but slow.
kernel Subfunction of KERNEL
lean Remove comment lines from a function
length Length of a vector or of cols of matrix (in the metric of )
line Draw connected lines
logit_diagnose Compute diagnostics for logit
logit_info Evaluate information matrix for logit
lower_case Convert a character string to lower case
lowess_fit Return lowess fitted value at x = xi
lowess_residuals Return lowess residual values
make_model Construct model x matrix from symbolic model specification
o marquardt Fit a nonlinear model using Marquardt minimization
o md Missing data suboperator
model_matrix Find the model matrix column(s) for a term
msg Compose an error message from a list of argument codes.
names Find the names of objects in an input string
nice Find nice-number scale values for a plot axis
noquotes Removes quoted substrings from a character vector.
null Returns nested scalar containing a null vector
object_type Determine the type of an object
objects Find names of objects along the path.
objects_in Find names of objects in a .try file
ols Compute ordinary least squares quantities
outside Find number of observations outside and inside the fences
pairs_coord Calculate coordinates in a plot cell for pairs
pairs_hline Draw a horizontal line for pairs
pairs_vline Draw a vertical line for pairs
parse Parse a symbolic model specification
parse_recode Parse a recode specification
partial_plot Draw a partial regression or partial residual plot
plogit_diff Calculate update to plogit parameter vector
plogit_info Calculate the information matrix for plogit
plogit_start Calculate start values for plogit
plot Low-level plot function
print_AUTOCORR Print method for autocorr objects
print_CORR_MATRIX Print method for corr_matrix objects
print_DATASET Print method for dataset objects
print_FUNCTION Print method for APL variable, function, or operator.
print_LM Print method for lm objects
print_LOGIT Print method for logit objects
print_LOGIT_MODEL Print method for logit_model objects
print_MATRIX Print a matrix with row and column labels.
print_NONLIN Print method for nonlin objects
print_OBJECT Print method for objects of type object (default print method)
print_PLOGIT Print method for plogit objects
print_REGRESSION Print method for regression objects
print_ROBUST Print method for robust regression objects
print_TABLE Print method for table objects
print_TS_REGRESS Print method for timeseries regression objects
prompt Prompted input, returning a character string
read Try to read an object from a TRYAPL2 file
robust_cov Calculate asymptotic var-cov matrix for robust
sas_datastep Used by SASOUT to compose a SAS data step
scale Calculate screen coordinates from data coordinates
scale_text Change size of text in a captured graphics object
scatter_symbol Draw a plot symbol at a scaled x,y location
scatterplot Draw one scatterplot for pairs
screen Return size of screen in characters and pels
script Capture a shell command to the current script
sdepth Find depth values for stemleaf
search Search for an object along the .deltaPATH
select Screen out omitted and missing observations
setdown Restore text screen and close the graphics window
setup Setup split-screen and open a graphics window
shell_names Find the names of objects in an input string
simpson Integrate a function by simpson's rule.
sl Stemleaf display for a batch of numbers.
sl_scale Find unit and lines per stem to scale stem-leaf display
sort Low-level sort function
stack Insert character string(s) in the input line command stack.
startup Autostart function for APL2STAT (without the shell)
o steep Fits nonlinear models by the method of least squares
svd Singular value decomposition of matrix .
symbol Draw a plot symbol at a scaled x,y location
symbol_plot Plot points (and line) for one group in symbol_plot
test_MLE Test method for mle objects and descendents
test_REGRESSION Test method for regression objects and descendents
text Draw a text string at a scaled location in a plot
text2 Draw a text string at a graphics location in a plot
tricube Tricube weight function
trim Trim characters in list from beginning and end of string
truncate Truncate numeric values to integers
undivide Tell session manager to use full screen
unique Find unique values in a vector or rows in a matrix.
upper_case Convert a string to upper case
uses Find and load an object of specified type from the path
vaxis Draw a vertical axis
wls Weighted least squares regression
word Split a string into its first word and the rest
words Split a string to a nested vector of (blank-delimited)
words
.deltaFV Emulation of APL2/370 FV built-in function using AP 210