Graphical Methods for Categorical Data
Michael Friendly
Methods of analysis of categorical data fall into two categories
(after Koch & Stokes, 1991):
- Non-parametric, randomization-based methods
  - Make minimal assumptions
  - Useful for hypothesis testing
  - SAS: PROC FREQ
    - Pearson chi-square
    - Fisher's exact test (for small expected frequencies)
    - Mantel-Haenszel tests (ordered categories: test for linear
      association)
- Model-based methods
  - Must assume random sample (possibly stratified)
  - Useful for estimation purposes
  - Greater flexibility; fitting specialized models (e.g., symmetry)
  - More suitable for multi-way tables
  - SAS: PROC LOGISTIC, PROC CATMOD, PROC GENMOD, PROC INSIGHT (Fit YX)
    - Estimate standard errors, covariances for model parameters
    - Confidence intervals for parameters, predicted Pr{response}
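To make the first category concrete, here is a minimal sketch of the Pearson chi-square computation for a two-way table. It is written in Python using only the standard library, not SAS, and it is not PROC FREQ's implementation; the function names and the 2 x 2 counts are hypothetical, chosen only for illustration.

```python
# Pearson chi-square test of independence for a two-way table.
# Standard library only; the counts below are made up for illustration.

def expected_frequencies(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table."""
    expected = expected_frequencies(table)
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

table = [[10, 20], [30, 40]]  # hypothetical counts
stat, df = pearson_chi_square(table)
print(f"chi-square = {stat:.3f} on {df} df")
```

The expected frequencies are worth inspecting on their own: when some of them are small, the chi-square approximation is doubtful and Fisher's exact test is the usual alternative, as noted in the list above.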
Getting information from a table is like extracting sunlight
from a cucumber.
Farquhar & Farquhar (1891).
You can see a lot, just by looking.
Yogi Berra.
Graphical methods for quantitative data are well-developed. From the
basic display of data in a scatterplot, to diagnostic methods for
assessing assumptions and finding transformations, to the final
presentation of results, graphical techniques are commonplace
adjuncts to most methods of statistical analysis. Graphical methods
for categorical data are still in their infancy. There are not many
methods, and they are not widely used. Asking why this is so
prompts several thoughts:
- Exploratory methods
  Many of the graphical methods described here make minimal
  assumptions about the data, like the non-parametric statistical
  methods. Their goal is to help the viewer see the data, detect
  patterns, and suggest hypotheses.
- Graphic metaphor?
  The basic metaphor for displaying quantitative data is
  magnitude ~ position along an axis. Categorical data consist of
  counts of observations in discrete categories. Some of the methods
  described here (e.g., sieve diagram, mosaic display) suggest the
  metaphor count ~ area.
- Generalizations?
  The scatterplot is a basic tool for viewing raw (quantitative)
  data. It generalizes readily to three or more variables in the form
  of the scatterplot matrix, a matrix of pairwise scatterplots. The
  mosaic display is a simple graphic method for looking at
  cross-classified data which generalizes to more than two-way
  tables. Are there others?
- Analogies?
  Model-based methods for analyzing categorical data, such as
  logistic regression and log-linear models, are discrete analogs of
  methods of regression and analysis of variance for quantitative
  data. We can adapt some of the familiar graphical methods to
  categorical data.
- Presentation plots for model-based methods
  Results of model-based analysis are almost invariably presented in
  tables of estimated frequencies, parameter estimates, log-linear
  model effects, and so forth. Effect displays of estimated
  probabilities of response or log odds provide a useful alternative.
- Practical power = Statistical power x Probability of Use
  Statistical and graphical methods are of practical value to the
  extent that they are available and easy to use. Statistical methods
  for categorical data analysis have (nearly) reached that point.
  Graphical methods still have a long way to go. One aim of this
  workshop is to show what can now be done, with some examples of how
  to do it.
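The count ~ area metaphor can be illustrated with a small sketch. The Python code below (standard library only) computes tile rectangles for a two-way table in the spirit of a mosaic display: widths follow the row marginal proportions and heights follow the conditional proportions within each row, so each tile's area is proportional to its cell count. The function name, the unit-square layout without gaps between tiles, and the counts are all assumptions made for illustration, not the workshop's own software.

```python
# Mosaic-style tiles for a two-way table: tile area ~ cell count.
# Assumed layout: the unit square, rows side by side, no gaps between tiles.

def mosaic_tiles(table):
    """Return {(i, j): (x, y, width, height)} for each cell of the table."""
    n = sum(sum(row) for row in table)
    tiles = {}
    x = 0.0
    for i, row in enumerate(table):
        row_total = sum(row)
        width = row_total / n  # width ~ marginal proportion of row category i
        y = 0.0
        for j, count in enumerate(row):
            # height ~ proportion of column category j within row category i
            height = count / row_total if row_total else 0.0
            tiles[(i, j)] = (x, y, width, height)
            y += height
        x += width
    return tiles

table = [[10, 20], [30, 40]]  # hypothetical counts
for (i, j), (x, y, w, h) in mosaic_tiles(table).items():
    print(f"cell ({i},{j}): x={x:.2f} y={y:.2f} w={w:.2f} h={h:.2f} area={w * h:.2f}")
```

Because area = width x height = (row total / n) x (count / row total) = count / n, the tile areas sum to 1, and larger counts appear as visibly larger tiles.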
© 1995
Michael Friendly
Email: <friendly@yorku.ca>