Visualizing Categorical Data: sort
Version: 1.1 (19 Nov 1998)
Michael Friendly
York University
Generalized dataset sorting by format or statistic
The SORT macro generalizes the idea of sorting the observations in a
dataset to include:
- sorting according to the values of a user-specified format. With
  appropriate user-defined formats, this may be used to arrange the
  observations in a dataset in any desired order.
- reordering according to the values of a summary statistic computed
  on the values in each of several groups, for example, the mean or
  median of an analysis variable. Any statistic computed by
  PROC UNIVARIATE may be used.
Method
Usage
You must specify one or more BY= variables. To sort by the value of a statistic, name the statistic
with the BYSTAT= parameter and specify the analysis variable with VAR=. To sort by formatted values,
specify the variable names and associated formats with BYFMT=.
If neither BYSTAT= nor BYFMT= is specified, an ordinary sort is performed.
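A BYSTAT= sort can also be carried out by hand in three steps, and seeing those steps may help clarify what BY=, VAR=, BYSTAT= and FREQ= each contribute. The following is a minimal sketch and is not necessarily how the SORT macro works internally. The dataset SALARY and its values are hypothetical toy data shaped like the frequency table in the Example (FACULTY, INCOME, COUNT). The work dataset names (_SORTED_, _STATS_, _MERGED_, SALARY_SORTED) and the variable _STAT_ are also illustrative choices:

```sas
/* Hypothetical toy data: grouped counts of INCOME (in $1000s) by FACULTY. */
data salary;
   input faculty $ income count;
   datalines;
Arts     40 5
Arts     60 2
Law      60 3
Law      90 4
Science  50 6
Science  70 3
;
run;

/* PROC UNIVARIATE with a BY statement needs data sorted by the group. */
proc sort data=salary out=_sorted_;
   by faculty;
run;

/* Step 1: compute the statistic (here MEAN) of INCOME within each      */
/* FACULTY, weighting each row by COUNT (the role of FREQ=).           */
/* Any other UNIVARIATE keyword, e.g. MEDIAN=, could be used instead.  */
proc univariate data=_sorted_ noprint;
   by faculty;
   var income;
   freq count;
   output out=_stats_ mean=_stat_;
run;

/* Step 2: attach the group statistic to every observation. */
data _merged_;
   merge _sorted_ _stats_;
   by faculty;
run;

/* Step 3: order the groups by the statistic. FACULTY breaks ties so     */
/* groups stay together; BY DESCENDING _STAT_ would mimic ORDER=DESCENDING. */
proc sort data=_merged_ out=salary_sorted(drop=_stat_);
   by _stat_ faculty;
run;
```

With these toy counts the weighted means are about 45.7 (Arts), 56.7 (Science) and 77.1 (Law), so the groups come out in that order.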
The sort macro is called with keyword parameters. The arguments may be
listed within parentheses in any order, separated by commas. For example:
%sort(by=age sex, bystat=mean, var=income);
or
proc format;
   value age 0='Child' 1='Adult';
run;
%sort(by=age descending sex, byfmt=age:age.);
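Sorting by formatted values can be understood as sorting on the text that the format produces. The sketch below reproduces the BYFMT= example above by hand. It assumes that the sort key is the formatted (character) value, which is one reasonable reading of the description. The toy dataset PEOPLE and the key variable _AGEKEY_ are hypothetical:

```sas
proc format;
   value age 0='Child' 1='Adult';
run;

/* Hypothetical toy data. */
data people;
   input age sex $;
   datalines;
0 M
1 F
0 F
1 M
;
run;

/* Build a character sort key from the formatted value of AGE. */
data _keyed_;
   set people;
   length _agekey_ $ 8;
   _agekey_ = put(age, age.);   /* 'Child' or 'Adult' */
run;

/* Sort on the formatted text, then by SEX in descending order. */
proc sort data=_keyed_ out=people_sorted(drop=_agekey_);
   by _agekey_ descending sex;
run;
```

Because 'Adult' precedes 'Child' alphabetically, adults come first. Within each age group, DESCENDING SEX places M before F. This presumably explains why the choice of format labels matters: labels whose text sorts in the intended sequence impose that sequence on the observations.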
- DATA=
  Name of the input dataset to be sorted. The default is the most recently
  created dataset.
- VAR=
  Specifies the name of the analysis variable used for BYSTAT sorting.
- OUT=
  Name of the output dataset. If not specified, the output dataset replaces
  the input dataset.
- BY=
  Names of one or more classification (factor, grouping) variables to be used
  in sorting. The BY= argument may contain the keyword DESCENDING before a variable name for
  ordinary or formatted-value sorting. For BYSTAT sorting, use ORDER=DESCENDING.
  The BY= variables may be character or numeric.
- BYFMT=
  A list of one or more terms of the form VAR:FMT or VAR=FMT, where VAR is one
  of the BY= variables and FMT is a SAS format (for example, age:age.).
  Do not specify BYSTAT= when sorting by formatted values.
- BYSTAT=
  Name of the statistic calculated for the VAR= variable for each level of the BY= variables.
  BYSTAT= may be the name of any statistic computed by PROC UNIVARIATE, for example, MEAN or MEDIAN.
- FREQ=
  For BYSTAT sorting, specify the name of a frequency variable if the input
  data consist of grouped frequency counts.
- ORDER=
  Specify ORDER=DESCENDING to sort in descending order when sorting by a BYSTAT.
  In this case the ORDER= parameter applies to all BY= variables.
Example
Given a frequency table of Faculty by Income, sort the faculties so they
are arranged by mean income:
%include macros(sort); *-- or include in an autocall library;
%sort(data=salary, by=Faculty, bystat=mean, var=income, freq=count);
See also
- table: Construct a grouped frequency table, with recoding