Visualizing Categorical Data: goodfit
Version: 1.4 (7 May 2002)
Michael Friendly
York University
Goodness of fit tests for discrete distributions
The GOODFIT macro carries out chi-square goodness-of-fit tests for discrete
distributions. These include the uniform, binomial, Poisson, negative
binomial, geometric, and logarithmic series distributions, as well as any
discrete (multinomial) distribution whose probabilities you can specify.
Both the Pearson chi-square and likelihood-ratio chi-square are computed.
The data may consist either of individual observations on a single
variable or of a grouped frequency distribution. The parameter(s) of the
distribution may be specified as constants or may be estimated from the data.
The GOODFIT macro is called with keyword parameters. The arguments may be
listed within parentheses in any order, separated by commas. For example:
%goodfit(var=k, freq=freq, dist=binomial);
You must specify a VAR= analysis variable and the keyword for the distribution to be fit with the DIST= parameter. All other parameters are optional.
- DATA=
-
Specifies the name of the input data set to be analyzed.
[Default: DATA=_LAST_]
- VAR=
-
Specifies the name of the variable to be analyzed, the basic count
variable.
- FREQ=
-
Specifies the name of a frequency variable for a grouped data set. If no
FREQ= variable is specified, the program assumes the data set is ungrouped
and calculates frequencies using PROC FREQ. In this case you can specify a
SAS format with the FORMAT= parameter to control the way the observations
are grouped.
- DIST=
-
Specifies the name of the discrete distribution to be fit. The allowable
values are: UNIFORM, DISCRETE, BINOMIAL, POISSON, NEGBIN, GEOMETRIC,
LOGSERIES.
- PARM=
-
Specifies the value of the parameter(s) for the distribution being fit. If
PARM= is not specified, the parameter(s) are estimated using maximum
likelihood or method-of-moments estimators.
- SUMAT=
-
For a distribution where frequencies for values of the VAR= variable >= k
have been lumped into a single category, specifying SUMAT=k causes the
macro to sum the probabilities and fitted frequencies for all values >= k.
[Default: SUMAT=10000]
- FORMAT=
-
Specifies the name of a SAS format used to group the observations when no
FREQ= variable has been specified.
- OUT=
-
Name of the output data set containing the grouped frequency distribution,
estimated fitted frequencies (EXP) and the values of the Pearson (CHI) and
deviance (DEV) residuals. [Default: OUT=FIT]
- OUTSTAT=
-
Name of the output data set containing goodness-of-fit statistics.
[Default: OUTSTAT=STATS]
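As a rough sketch of how the optional parameters can be combined, the call below fits a Poisson distribution with a fixed parameter, treats the last category as an open-ended tail, and writes the results to named data sets. Everything specific here is an assumption made for illustration: the data set COUNTS and its values are made up, the output names FIT1 and STATS1 are arbitrary, and it is assumed that PARM= accepts the Poisson mean as a single number. The sketch also assumes the GOODFIT macro has already been included or is available in an autocall library.

```sas
*-- Made-up grouped data, for illustration only;
*-- the k=4 row is taken to mean "4 or more";
data counts;
   input k freq;
datalines;
0 30
1 36
2 22
3 9
4 3
;
run;

%goodfit(data=counts, var=k, freq=freq, dist=poisson,
         parm=1.2,          /* assumed: fixed Poisson mean, not estimated */
         sumat=4,           /* sum probabilities for all values >= 4      */
         out=fit1,          /* frequencies, EXP, CHI and DEV              */
         outstat=stats1);   /* goodness-of-fit statistics                 */

*-- Inspect the two output data sets;
proc print data=fit1;
run;
proc print data=stats1;
run;
```

Omitting PARM= from this call would instead have the parameter estimated from the data, as described under PARM= above.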
Example
%include vcd(goodfit); *-- or include in an autocall library;
%goodfit(var=k, freq=freq, dist=binomial);   *-- VAR= and DIST= are required;
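For a more complete run, the following sketch creates a grouped data set and fits a Poisson distribution with the parameter estimated from the data. The data are the classic von Bortkiewicz counts of deaths by horse kick in 200 Prussian army corps-years; the choice of this data set and the names HORSKICK, DEATHS and CORPSYRS are illustrative assumptions, not necessarily the example the macro was originally documented with. It assumes the macro has already been included as shown above.

```sas
*-- Grouped data: number of deaths (0-4) and the number of corps-years
    in which that many deaths occurred;
data horskick;
   input deaths corpsyrs;
   label deaths   = 'Number of deaths'
         corpsyrs = 'Number of corps-years';
datalines;
0 109
1  65
2  22
3   3
4   1
;
run;

*-- PARM= is omitted, so the Poisson parameter is estimated from the data;
%goodfit(data=horskick, var=deaths, freq=corpsyrs, dist=poisson);

*-- OUT= and OUTSTAT= were not given, so results go to the defaults;
proc print data=fit;      *-- fitted frequencies and residuals;
run;
proc print data=stats;    *-- goodness-of-fit statistics;
run;
```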
See also
distplot Plots for discrete distributions
ordplot Diagnose form of discrete frequency distribution
poisplot Poissonness plot for discrete distributions
rootgram Hanging rootograms for discrete distributions