Matrix Algebra Problems with Applications in Psychology and
Multivariate Analysis
Multivariate Data Analysis
Michael Friendly
York University
Psychology 6140
Students in Multivariate Analysis need to develop skills in working
with matrices and reading matrix expressions. Some of the skills
which seem to be important are:
 Performing simple matrix calculations.
 Manipulating algebraic expressions in matrices: substituting identities, simplifying expressions, etc.
 Formulating substantive and statistical problems in matrix terms, and recognizing such formulations in journal articles.
 Understanding certain key results (theorems) of matrix algebra.
 Recognizing these results when they are used in your text.
The following problems are directed toward developing these skills.
Along the way, some matrix applications in psychology and multivariate
statistics are introduced.
I. Elementary matrix expressions

Factoring a matrix into the product of two matrices is the
basis of many methods in multivariate analysis. Cholesky's
method involves factoring a matrix into the product of two
triangular matrices, one of which has 0s above the
diagonal, the other below the diagonal.
Fill in the ? entries in the following matrix equations.
Hint: Let each ? be an unknown letter; solve the resulting equations.


[2 × 2 matrix equation with ? entries: values lost in extraction]

[3 × 3 matrix equation with ? entries: values lost in extraction]
Premultiply the matrix M by each of the following
matrices. Describe in words in each case what effect the
premultiplier has on the rows of M.
A, B, C = [3 × 3 matrices: entries lost in extraction]
Let D be an m × n matrix and let e_{i} = ( 0, 0, ..., 1, 0, ..., 0 )′
denote the n-dimensional vector consisting of zeros except
for the ith element, which is 1.
 What is D e_{i} (in words)?
 Express the matrices A and B from
Problem 12
above in terms of e vectors.

Experimenters in color vision have found that a subject can
match any spot of colored light by an additive mixture of
three colored lights of fixed spectral composition (Judd,
1951). Any three colors can be used as the primary lights
provided that none is a combination of the other two.
Say that an experimenter has a colorimeter that has three
primary lights with values r, g, and b. She
wants to specify her results in terms of a standard set of
tristimulus values, R, G, and B. By
experiment she finds that the amount of each of the standard
primaries needed to match each of her primaries is

r = a_{11} R + a_{12} G + a_{13} B
g = a_{21} R + a_{22} G + a_{23} B
b = a_{31} R + a_{32} G + a_{33} B

Define the appropriate vectors and matrices to express the
above system of equations in matrix terms.

Let x = ( x_{1}, ..., x_{n} )′ be a
vector containing the number of units purchased of each of a
variety of grocery items. Let y = ( y_{1}, ..., y_{n} )′
be a vector of unit prices, such that
y_{i} = the price/unit of item i. For example,
x = ( 4, 3, 2 )′ and y = ( .95, .25, 6.50 )′ might represent 4 dozen eggs at
$0.95 per dozen, 3 lbs. of apples at $0.25/lb, and 2 cans of
pâté de foie gras at $6.50 per can (cheap, if it's entier).
 Formulate a matrix expression for the total (net) cost
of the commodities in x.
 Suppose each commodity is subject to a particular rate
of tax, these rates being given by a vector,
t = ( t_{1}, ..., t_{n} )′, so that if
commodity i is taxed at 5%, t_{i} = 0.05.
Formulate an expression in terms of matrices and
vectors for the total cost of x including
taxes. [Remember, cost after tax = net cost × ( 1 + t ).]
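As a check on part (a), the net cost is the inner product x′ y, and for part (b) the after-tax cost scales each item's cost by ( 1 + t_{i} ); a minimal sketch in Python, using the grocery numbers above and a hypothetical tax vector t (the problem leaves t general):

```python
# Net cost of the basket: the inner product x'y = sum of x_i * y_i.
x = [4, 3, 2]            # units purchased
y = [0.95, 0.25, 6.50]   # price per unit

net_cost = sum(xi * yi for xi, yi in zip(x, y))
print(net_cost)  # 17.55

# Hypothetical tax rates (illustrative only).
t = [0.00, 0.05, 0.08]

# After-tax cost: each item's cost is scaled by (1 + t_i),
# i.e., x' diag(1 + t) y in matrix terms.
total_cost = sum(xi * (1 + ti) * yi for xi, yi, ti in zip(x, y, t))
print(round(total_cost, 4))  # 18.6275
```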

Cliff (1959) showed that the affective value of an
adverb-adjective combination (e.g., moderately nice)
could be predicted quite accurately by multiplying an
affective value for the adjective (nice) by an intensity
value for the adverb (moderately).
Suppose one took a set of m personality-trait adjectives
like industrious, happy, carefree, shy, etc., and formed
all possible combinations of these with n adverbs
(seldom, somewhat, moderately, extremely): m × n pairs.
If a group of people rated each combination on an evaluative scale, the
average ratings could be assembled in an m × n
matrix, C = { c_{ij} }, where c_{ij} = the average rating of
adjective i combined with adverb j. Let d = ( d_{1}, ..., d_{m} )′
be a vector of average ratings for the adjectives by themselves
and v = ( v_{1}, ..., v_{n} )′ be a vector of
intensity values for the adverbs. Cliff's model was:

c_{ij} = d_{i} × v_{j} + k

where k is a constant. [Cliff's model fit the data
rather well, in fact, and created considerable interest in the
possibility of developing mathematical models of aspects of
language.]
 Express C in this model as a product of two
matrices.
Hint: The number of summed terms is the number of columns in
a matrix product. 
 If this model holds, what is the rank of C?
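If the model holds, C is the sum of an outer product of the two value vectors and the constant matrix k J, so rank( C ) is at most 2; a small numerical check in Python, with hypothetical values (not from Cliff's data):

```python
# Build C = (outer product of d and v) + k J for hypothetical values.
d = [2.0, -1.0, 0.5, 3.0]   # hypothetical adjective values
v = [0.2, 0.6, 1.0]         # hypothetical adverb intensities
k = 0.3

C = [[di * vj + k for vj in v] for di in d]

def rank(M, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    r = 0
    for col in range(len(M[0])):
        piv = max(range(r, len(M)), key=lambda i: abs(M[i][col]), default=None)
        if piv is None or abs(M[piv][col]) < tol:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

print(rank(C))  # 2: one rank-1 term from the outer product, one from k J
```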

A quadratic function of a scalar variable, x, can be
written

f ( x ) = b_{0} + b_{1} x + b_{2} x^{2}

 Express f ( x ) as an algebraic expression of the
vectors b = ( b_{0}, b_{1}, b_{2} )′
and x = ( 1, x, x^{2} )′.
 In terms of this expression, evaluate f ( 2 ) for
b = ( b_{0}, b_{1}, b_{2} )′.
 Generalize to find an expression for an nth degree
polynomial of x,

f ( x ) = b_{0} + b_{1} x + b_{2} x^{2} + ... + b_{n} x^{n}

 If person i has a value of x_{i}, let x_{i}
be the vector ( 1, x_{i}, x_{i}^{2} )′.
Write a matrix expression which will give the values of
f ( x_{i} ) as a vector for all persons in a group.
[Completing this question, you have just (re)discovered the
General Linear Model!]
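The last part amounts to stacking the row vectors x_{i}′ into a matrix X and computing X b; a minimal sketch in Python with hypothetical scores x_{i} and coefficients b:

```python
# f(x) = b0 + b1*x + b2*x^2 written as X b, where row i of X is (1, x_i, x_i^2).
b = [1.0, 2.0, 3.0]          # hypothetical coefficients b0, b1, b2
xs = [0.0, 1.0, 2.0]         # hypothetical scores for three persons

X = [[1.0, x, x * x] for x in xs]                          # design matrix
f = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]  # f = X b
print(f)  # [1.0, 6.0, 17.0]
```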

Let j_{n} be a column vector of n unities
(ones). Using the following vectors (the entries of a, b, and c were lost in extraction), together with

x = ( x_{1}, x_{2}, x_{3} )′

find each of the following inner products with j_{3}.
State in words the effect of taking an inner product with
j_{n}.
 a′ j_{3}
 b′ j_{3}
 j′_{3} c
 x′ j_{3}
[Matrix notation was invented to simplify the expression of linear
equations. The operation of multiplying vectors by the unit
vector has a particularly simple interpretation.]

Let X be an N × p matrix consisting of the
scores of N people on p tests, where x_{ij} =
the score of person i on test j. Define the
matrix Z ( N × p ) of standard scores,
z_{ij} = ( x_{ij} − x̄_{j} ) / s_{j}. From
the fact that the correlation between two standardized
variables is r_{jk} = ( 1/N ) Σ_{i=1}^{N} z_{ij} z_{ik}, show that the p × p matrix of all
intercorrelations can be expressed as

R = ( 1/N ) Z′ Z

[In standard scores, the correlation is simply the average cross-product.
Isn't that nice?]
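A quick numerical illustration in Python, standardizing a small hypothetical data matrix and checking that the average cross-products of standard scores behave like correlations (unit diagonal, symmetric):

```python
# Standardize columns of X and form R = (1/N) Z'Z.
X = [[1.0, 2.0],
     [2.0, 1.0],
     [3.0, 4.0],
     [4.0, 5.0]]          # hypothetical scores: N = 4 people, p = 2 tests
N, p = len(X), len(X[0])

means = [sum(row[j] for row in X) / N for j in range(p)]
# Population standard deviations (divide by N, matching r = (1/N) sum z z).
sds = [(sum((row[j] - means[j]) ** 2 for row in X) / N) ** 0.5 for j in range(p)]

Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

# R = (1/N) Z'Z: average cross-products of standard scores.
R = [[sum(Z[i][j] * Z[i][k] for i in range(N)) / N for k in range(p)]
     for j in range(p)]
print(R[0][0], R[1][1])    # both 1.0 (up to rounding)
print(R[0][1] == R[1][0])  # True: R is symmetric
```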
II. Applications: Choice, preference and graph theory
The problems in this section deal with applications of matrix algebra
to representing relationships among objects by matrices and directed
or undirected graphs. Two basic types of relations between objects
can be represented in this way: an incidence matrix
contains entries of 0 or 1 in cell i, j to indicate that the
object in row i is related in a symmetric way to the object in
column j; a dominance or preference
matrix also uses binary entries, but the relation is not necessarily
symmetric, as in ``person i likes person j'', or ``player i beats
player j'' in a tournament.

In a sociometric experiment, members of a group are asked which
other members they like. Suppose the data are collected in a
choice diagram as given in Figure 1,
where an arrow going from i to j means that ``i likes j''.
Figure 1:
Directed graph for a sociometric
experiment. A directed arrow points from a person to the person
he/she likes.
 Convert the diagram to a matrix, C, where c_{ij} = 1 if i likes j, and 0 otherwise. The
diagonal elements c_{ii} = 0.
 Let u be an n × 1 vector with unit
elements. One might say that u′ C gives
scores for ``popularity''. Explain why.
 Explain how C u can be interpreted as
scores for ``generosity''.

Suppose B is a similar matrix of choices, where
b_{ij} = 1 if i likes j and b_{ij} = −1 if i dislikes j.
Imagine a similar matrix, P, where p_{ij} = 1 if
j believes that i likes him and p_{ij} = −1 if j believes that i dislikes him. The
diagonal elements of B and P are equal to 0.
 Let N = P′ B, so that
n_{ij} = Σ_{k} p_{ki} b_{kj}. Explain why n_{ij}
can be interpreted as a measure of ``identification''
between person i and person j.
 Determine whether the diagonal elements of N can
be interpreted as a measure of ``realism'' (k is
realistic if his beliefs about who likes him agree with
reality).
 Determine whether the diagonal elements of B P′ can be interpreted as a measure of
``overtness'' (k is overt if his beliefs about X
conform to X's likes and dislikes).
 Compute the matrix C^{2} = C × C
and interpret the elements of this matrix.
 Try also to find an interpretation for the elements of
C^{3}.
 Is there any interpretation you can come up with for the
elements of C C′ or C′ C?
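For the interpretation of C², note that its ( i, j ) element counts two-step chains i → k → j; a small Python check with a hypothetical choice matrix (the figure's own matrix was not preserved):

```python
# Hypothetical 0/1 choice matrix: c[i][j] = 1 means "i likes j".
C = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]
n = len(C)

# C^2: entry (i, j) = sum_k c_ik * c_kj = number of persons k
# with "i likes k" and "k likes j" (two-link chains from i to j).
C2 = [[sum(C[i][k] * C[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
print(C2[0][2])  # 1: the single chain 0 -> 1 -> 2
```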

The expression u′ C, where u is
a unit vector, tells how often a person is liked, so it is a
measure of popularity. Suppose two persons are chosen equally
often, but the first is liked by popular people and the second
is liked by less popular people; then it may be reasonable to
say that the first person is more popular, since he/she has
more indirect choices.
Suppose now we want to combine direct and indirect choices in
a measure of popularity, defined as follows: Define a
``transmission'' parameter, 0 < p < 1, and assume that
each direct choice contributes an amount p to a person's
popularity; an indirect choice through one intermediate person
contributes an amount p^{2}; an indirect choice through two
intermediates contributes an amount p^{3}, etc.
 Show that the measure that takes into account all
indirect choices in this way can be found from the
vector u′ T, where

T = p C + p^{2} C^{2} + p^{3} C^{3} + ... + p^{n−1} C^{n−1}     (1)

 Show that the following identity holds for T
defined above, neglecting the term of order p^{n}:

T = p C + p T C     (2)

This equation provides a way to solve for T
without evaluating the entire matrix sum in equation
(1).
Hint: Form a second equation by multiplying (1) by C, and subtract.
 Show that equation (2) above is equivalent to

t′ = p s′ + p t′ C

where t′ = u′ T and s′ = u′ C.
[Source: Van de Geer (1971)]

Consider a hypothetical sociometric choice matrix for three
people, as in the previous problem (the matrix entries were
lost in extraction), and assume that the quantity p, the relative
effectiveness of transmission through a sociometric link, has the
value 1/2. Solve for the respective elements of t,
the vector of status indices.
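One way to solve for the status indices is to use the relation t′ ( I − p C ) = p s′, which follows from the series definition of T when terms of order p^{n} are neglected; a sketch in Python with p = 1/2 and a hypothetical 3 × 3 choice matrix (the problem's own matrix was not preserved):

```python
# Solve t'(I - pC) = p s' for the status indices t, with s' = u'C.
p = 0.5
C = [[0, 1, 1],
     [1, 0, 0],
     [0, 1, 0]]   # hypothetical choices: c[i][j] = 1 means "i likes j"
n = len(C)

s = [sum(C[i][j] for i in range(n)) for j in range(n)]   # s' = u'C (column sums)
# t'(I - pC) = p s'  is equivalent to  (I - pC)' t = p s.
A = [[(1.0 if i == j else 0.0) - p * C[j][i] for j in range(n)] for i in range(n)]
b = [p * sj for sj in s]

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    A = [row[:] + [bi] for row, bi in zip(A, b)]
    m = len(A)
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, m):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (A[r][m] - sum(A[r][c] * x[c] for c in range(r + 1, m))) / A[r][r]
    return x

t = solve(A, b)
print([round(ti, 3) for ti in t])  # [1.8, 2.6, 1.4]
```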

In a paired comparison experiment, a subject is presented with
all pairs of stimuli that can be formed out of a set (e.g.,
political candidates), and is asked to indicate for each pair
which stimulus he prefers. For n stimuli, the data can
be collected in a square matrix, X, of order n × n, where x_{ij} = 1 if j is preferred to
i and is 0 otherwise. The diagonal elements are zero.
 Verify that if x_{ij} = 1 then x_{ji} = 0, where i ≠ j.
 Figure 2 gives a diagram of choices, where an arrow
going from i to j indicates that i is
preferred to j. Convert this diagram to a choice
matrix X.
Figure 2:
Choice graph for a paired comparison
experiment.
 Calculate u′ X, where u is
a unit vector, and interpret the elements of the
resulting vector.
 Rearrange the rows and columns of X in such a way
that all 1 elements are above the diagonal, so the
result is a triangular matrix. Interpret the order of
the rows and columns in this new matrix.
 Inspection of the diagram reveals that all choices are
transitive; that is, for any three stimuli, if
i is preferred to j and j to k,
then i is always preferred to k. If
k were preferred to i, we would have a
circular triad. Draw a diagram in which circular
triads do occur, and convert it to the corresponding
choice matrix. Calculate u′ X for
this matrix, and compare the result with that for
a transitive matrix.
 Let v = u′ X. Compare the
values of v′ v as calculated
from the figure with those calculated from the matrix
X you constructed in part (e). What effect does
the presence of circular triads have on the values of
v = u′ X?
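The effect can be seen numerically: with all choices transitive, the scores v are a permutation of 0, 1, ..., n−1, which makes v′ v as large as possible; circular triads pull the scores toward equality and reduce v′ v. A Python sketch with two hypothetical 3-stimulus matrices:

```python
def scores(X):
    """v = u'X: column sums = number of times each stimulus is preferred."""
    n = len(X)
    return [sum(X[i][j] for i in range(n)) for j in range(n)]

# Transitive preferences among 3 stimuli (x_ij = 1 means j preferred to i).
X_trans = [[0, 1, 1],
           [0, 0, 1],
           [0, 0, 0]]
# Circular triad: 1 over 0, 2 over 1, 0 over 2.
X_circ = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]

v1 = scores(X_trans)   # [0, 1, 2]
v2 = scores(X_circ)    # [1, 1, 1]
print(sum(x * x for x in v1))  # 5
print(sum(x * x for x in v2))  # 3
```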

The problem in archeology of placing sites and artifacts in
proper chronological order is called sequence dating or
seriation. The same problem, of ``putting in order''
arises in developmental psychology and test theory, but the
context in archeology is simplest.
An assumption made in archeology is that graves which are
``close together'' in temporal order will be more likely to
have similar contents than graves further apart in time.
Consider the situation in which various types of pottery,
P_{1}, P_{2}, ..., P_{n}, are contained in graves G_{1}, G_{2}, ..., G_{m}. Let A be the m × n
matrix with elements

a_{ij} = 1 if G_{i} contains P_{j}; a_{ij} = 0 if G_{i} doesn't contain P_{j}.
 Show that the element g_{ij} of the matrix G = A A′ is equal to the number of common
varieties of pottery found in both graves G_{i}
and G_{j}.
 Show that the diagonal element g_{ii} of G
gives the number of varieties of pottery in grave
G_{i}.
 Show that the archeological assumption stated above is
equivalent to the statement that the larger the
g_{ij}, the closer the graves G_{i} and G_{j} are
in time.
 If five types of pottery are contained in four graves
according to the matrix A below (entries lost in extraction), find G,
and arrange the four graves in time order.
[Source: Williams (198?), Kendall (1969)]
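A sketch of the computation in Python, using a hypothetical 4 × 5 incidence matrix A (the problem's own matrix was not preserved):

```python
# Hypothetical incidence matrix: a[i][j] = 1 if grave i contains pottery type j.
A = [[1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 1, 1, 1]]
m, n = len(A), len(A[0])

# G = A A': g_ij = number of pottery types common to graves i and j.
G = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
     for i in range(m)]

print(G[0][1])  # 2: types shared by graves 0 and 1
print(G[1][1])  # 3: diagonal = number of types in grave 1
```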

In a Markov model for short term serial recall, Murdock (1976)
suggests that an item (e.g., nonsense syllables, like DAXUR)
may be in any one of three states for a learner:
 state IO
 where both item and order information have
been retained (i.e., the learner remembers the
elements of the item and the order in which they
occurred).
 state I
 where only item information is retained (i.e.,
s/he remembers it contained D, A, X, U, R but not
what order they were in).
 state N
 where neither type of information is retained.
A Markov model is specified by identifying the states
and by specifying a ``transition matrix'', which contains the
probabilities that the item changes from one state to another.
In Murdock's model, the transition probabilities are shown
symbolically in the matrix T below (rows and columns ordered
IO, I, N; the printed matrix was lost in extraction, but it
follows from the parameter definitions):

T = [ 1 − a   a ( 1 − b )   a b ]
    [ 0       1 − c         c   ]
    [ 0       0             1   ]

The parameters are defined as follows:
 a gives the probability that an item will leave
the state IO.
 If it leaves that state, b is the probability that
item information as well as order information will be
forgotten.
 Given that the item is in state I (order information has
not been retained), c is the probability that
item information will be forgotten as well.
A characteristic of the Markov model is that these
probabilities apply to each interval of time, Δt,
during the retention interval. Murdock further assumes that
at the start of the retention interval, an item starts in
state IO with probability p, and with probability 1 − p
it starts in state I. The psychological rationale
here might be that encoding an item at all establishes item
information, and something more is required for the learner to
encode serial order too. An attentive subject would never
miss an item completely (Murdock assumes), so there is no
probability that an item starts in state N. These starting
state probabilities can be put in a vector, s′ = ( p, 1 − p, 0 ).
 Show that the probabilities that an item is in any of
the states, IO, I, or N after 1 time interval, Δt, are given by s′ T.
 Show that after 2 transition intervals, the
probabilities are s′ T × T = s′ T^{2}, and after 3 time intervals,
they are s′ T^{3}. [You may reason
from the result of part (a), together with the diagram
above, rather than multiplying the symbolic matrix
entries.]
 Assume parameter values of p = 1.0, a = b = 0.1, and c = 0.5. Find the probabilities that
an item is in state IO, I, and N after 1, 2, and 3
transitions.
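Part (c) can be checked numerically; a Python sketch using the transition structure implied by the parameter definitions (rows ordered IO, I, N) with p = 1.0, a = b = 0.1, c = 0.5:

```python
# Transition matrix implied by the parameter definitions (rows/cols: IO, I, N).
a, b, c, p = 0.1, 0.1, 0.5, 1.0
T = [[1 - a, a * (1 - b), a * b],
     [0.0,   1 - c,       c],
     [0.0,   0.0,         1.0]]
s = [p, 1 - p, 0.0]   # starting-state probabilities

def vec_mat(v, M):
    """Row vector times matrix: (v M)_j = sum_i v_i m_ij."""
    return [sum(vi * row[j] for vi, row in zip(v, M)) for j in range(len(M[0]))]

state = s
for step in range(1, 4):
    state = vec_mat(state, T)          # s' T^step
    print(step, [round(x, 4) for x in state])
```

Each printed row is a probability distribution over the three states, so its entries sum to 1.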

The matrix M below (entries lost in extraction) represents the performance of five
people on three binary items, where m_{ij} = 1 indicates
that item i is answered correctly by person j.
 Calculate the product M M′. What do
the diagonal elements represent? What do the
off-diagonal elements represent?
 Calculate the product M′ M. What do
the diagonal elements represent? What do the
off-diagonal elements represent?

In studies of perceptual identification, it is common to
present stimuli (say, letters) in a background of noise and
require the subject to identify the stimulus seen or heard.
For n stimuli, the results are collected in an n × n matrix, X, where

x_{ij} = Prob ( subject says j | stimulus i was presented )

In studying the relationships between stimuli in such a
matrix, it is sometimes assumed that confusions are symmetric,
x_{ij} = x_{ji}, but for some types of stimuli, the results
are not symmetric.
 Express the average proportion of correct responses as a
function of the elements of X.
 Show that an arbitrary square matrix, X, can be
expressed as the sum of two matrices,

X = S + A

where S = ( X + X′ ) / 2 is
symmetric, and A is a skew-symmetric
matrix, i.e., a_{ij} = − a_{ji}.
 Why might it be reasonable to regard A as
containing ``pure asymmetry'' information and S
as containing ``pure symmetry'' information?
[Psychological models for perceptual identification sometimes
assume symmetry for errors, and sometimes do not. The decomposition
of X allows both to be studied.]
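The decomposition is easy to verify numerically; a Python sketch with a hypothetical asymmetric confusion matrix, computing A as ( X − X′ ) / 2, which is one way to complete part (b):

```python
# Split X into symmetric and skew-symmetric parts: X = S + A.
X = [[0.80, 0.15, 0.05],
     [0.05, 0.90, 0.05],
     [0.10, 0.20, 0.70]]   # hypothetical confusion probabilities
n = len(X)

S = [[(X[i][j] + X[j][i]) / 2 for j in range(n)] for i in range(n)]
A = [[(X[i][j] - X[j][i]) / 2 for j in range(n)] for i in range(n)]

# Checks: S symmetric, A skew-symmetric, and S + A reproduces X.
assert all(S[i][j] == S[j][i] for i in range(n) for j in range(n))
assert all(A[i][j] == -A[j][i] for i in range(n) for j in range(n))
assert all(abs(S[i][j] + A[i][j] - X[i][j]) < 1e-12
           for i in range(n) for j in range(n))
print("X = S + A verified")
```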
III. Bases, orthogonality and vector spaces

On a sheet of graph paper,
 Draw directed lines from the origin to represent the
vectors a and b (entries lost in extraction).
 Draw on the graph directed lines which represent the
vectors p, q, r, s, and t (entries lost in extraction).
 The vectors a and b form a basis
for the two-dimensional vector space of the graph
paper. Show that each of p, q,
r, s, and t can be
expressed as a linear combination of a and
b of the form n a + m b,
where n and m are real numbers.
 The vectors p and q also form a
basis for this vector space. Show that a
and b can be expressed as linear
combinations of p and q of the
form u p + v q, where
u and v are real numbers.

Show that the vector c = ( c_{1}, c_{2}, ..., c_{n} )′ is orthogonal to the vector
( 1, 1, ..., 1 )′ ÷ √n if and only if Σ_{i=1}^{n} c_{i} = 0.

Consider the set of ( n − 1 ) n-dimensional vectors:

( 1, −1, 0, 0, ..., 0 )′ ÷ √2
( 1, 1, −2, 0, ..., 0 )′ ÷ √6
( 1, 1, 1, −3, 0, ..., 0 )′ ÷ √12
...
( 1, 1, 1, 1, ..., 1, −( n − 1 ) )′ ÷ √( n^{2} − n )
These vectors are referred to as Helmert contrasts.
They are useful for comparing ANOVA factors whose levels
are ordered, but not quantitative. Show that for n = 5
the vectors c_{1}, c_{2}, ..., c_{n−1}:
 are all orthonormal (i.e., pairwise orthogonal and of
unit length).
 are all orthogonal to the unit vector, j_{n}.
 any other vector d which is orthogonal to
the unit vector j_{n} can be expressed as a
linear combination of c_{1}, c_{2},
..., c_{n−1}.
These three facts together imply that any other set of
contrasts among n levels can be expressed in terms of c_{1},
c_{2}, ..., c_{n−1}. Isn't that nice?
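Parts (a) and (b) can be verified numerically for n = 5; a Python sketch building the Helmert vectors (row k: k ones, then −k, then zeros, normalized to unit length):

```python
# Helmert contrasts for n = 5.
n = 5
C = []
for k in range(1, n):
    v = [1.0] * k + [-float(k)] + [0.0] * (n - k - 1)
    norm = sum(x * x for x in v) ** 0.5          # sqrt(k + k^2)
    C.append([x / norm for x in v])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# (a) pairwise orthogonal, unit length; (b) orthogonal to the unit vector j.
for i in range(n - 1):
    assert abs(dot(C[i], C[i]) - 1.0) < 1e-12
    assert abs(sum(C[i])) < 1e-12                 # c_i' j = 0
    for j in range(i + 1, n - 1):
        assert abs(dot(C[i], C[j])) < 1e-12
print("Helmert contrasts for n = 5 are orthonormal and orthogonal to j")
```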

Consider the n × n matrix C = I − ( 1/n ) J, where J is the n × n matrix of ones.
 Show that rank( C ) = n − 1.
 Show that C j = 0. That is,
j is orthogonal to every row of C, so
C is the orthogonal complement of j.
 Let x_{·} = ( x_{·1}, x_{·2}, ..., x_{·n} )′ be the means of n treatment groups in a
single-factor experiment. Interpret
x_{·}′ j.
 Evaluate C x_{·}. Using the
results of (a), (b), and (c) above, show that C x_{·} is a vector of n
contrasts among the group means, of which only n − 1
are independent. What do these contrasts
represent?

If c_{1} , c_{2} , ..., c_{n}
are vectors, all orthogonal to a vector x, show
that any vector in the span of c_{1} , c_{2} , ..., c_{n} is orthogonal to x.

Show that any linear combination of a set of independent
contrasts is also a contrast.

Given the three 4-dimensional vectors below (entries lost in extraction), suppose we wish to
find a unit-length vector, x, which is orthogonal
to each.
 Show that x must satisfy c_{1}′ x = c_{2}′ x = c_{3}′ x = 0.
 What additional equation must x satisfy?
 Find x.

Find two unit vectors, a and b, which
are each orthogonal to the vector m = ( 1, 1, 1 )′ ÷ √3.

The general linear model, written in matrix notation, is given by

y = X b + e     (3)

where
y is an n × 1 observation vector,
X is an n × p design matrix,
b is a p × 1 vector of unknown parameters, and
e is an n × 1 random vector of residuals.
For regression models, X is typically a unit column vector
followed by columns of predictor variables.
For ANOVA models, X is typically a unit vector followed
by columns of indicator (0/1) variables.
 Consider a two-way 2 × 3 ANOVA design with n = 1 observation
per cell (Elswick et al., 1991).
If the model (without interaction) is given by

y_{ij} = m + a_{i} + b_{j} + e_{ij},   i = 1:2; j = 1:3     (4)

and the vector of parameters is
b′ = ( m, a_{1}, a_{2}, b_{1}, b_{2}, b_{3} ),
find the design matrix X so that the elements in y
(a 6 by 1 vector) can be expressed in the form of Eqn. (3).
 By considering linear dependencies among the columns of X,
determine the rank of X. Hazard a guess about the relation
between rank and degrees of freedom.
 Find the row echelon form X^{*} of X
(Try the ECHELON function in APL or SAS
IML).
Does this confirm your observation of the rank of X?
 The product X^{*} b
shows the linear combinations of the elements of b
which can be estimated from the data.
Find X^{*} b and interpret the result.
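For part (b), the a columns sum to the unit column, as do the b columns, giving two linear dependencies and hence rank 6 − 2 = 4; a Python sketch building X for the 2 × 3 design and computing its rank by elimination:

```python
# Design matrix for the 2x3 two-way ANOVA without interaction:
# columns are (m, a1, a2, b1, b2, b3); rows are cells (i, j) in order.
X = []
for i in range(2):
    for j in range(3):
        row = [1.0]                                          # m
        row += [1.0 if i == k else 0.0 for k in range(2)]    # a1, a2
        row += [1.0 if j == k else 0.0 for k in range(3)]    # b1, b2, b3
        X.append(row)

def rank(M, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = max(range(r, len(M)), key=lambda i: abs(M[i][c]), default=None)
        if piv is None or abs(M[piv][c]) < tol:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

print(rank(X))  # 4: columns a1 + a2 and b1 + b2 + b3 each equal the m column
```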
IV. Transformations, projections & quadratic forms

Let x = ( x_{1}, x_{2} )′ be a 2-dimensional
vector and consider a rotation in which the vector is rotated
counterclockwise about the origin through an angle of
q degrees, to a position x^{*} = ( x_{1}^{*}, x_{2}^{*} )′, as shown in Figure 3. In scalar
terms, the rotation can be expressed as

x_{1}^{*} = x_{1} cos q − x_{2} sin q
x_{2}^{*} = x_{1} sin q + x_{2} cos q

Figure 3: Rotation of a vector in 2-dimensional space.
 Find the matrix T in the matrix expression of the
above equations, x^{*} = T x.
 Show that this matrix T is orthonormal.

Let L be a matrix of factor loadings of a set of
tests with respect to one set of coordinate axes. If the
coordinates are rotated, the matrix equation for expressing
the matrix V of loadings with respect to the rotated
axes is V = L T. Suppose there are
four tests and two factors, and L is given by a 4 × 2
matrix (entries lost in extraction).
 Find a matrix T representing a transformation
through a positive angle of 30°.
 Evaluate the product L T, rounding
to two decimals. Note that some elements in V = L T are (approximately) zero. The
principle of simple structure in factor rotation
seeks a rotation for which many elements of the rotated
matrix become zero.

The rotation procedure of factor analysis can be expressed as a
set of linear transformations, i.e., as a matrix product. Let
V = L T represent such a linear
transformation. The matrix L is a factor matrix
satisfying R = L L′, where
R is the correlation matrix, V is the transformed
factor matrix, and T is a matrix of coefficients which
specifies the rotation.
 Find a condition on the matrix T such that the
transformed factor matrix, V, will also
satisfy the factor equation.
 Show that the particular matrix T of problem
1 satisfies this condition, within rounding error.

Consider a simple two-dimensional example in which three tests
have these loadings on two underlying factors:

        Factor 1  Factor 2
Test 1    0.50      0.30
Test 2    0.30      0.20
Test 3    0.40      0.70

Each test can be represented as a point in the plane of
Factor 1 (x-axis) and Factor 2 (y-axis). Plot these test
points, and find their transformed coordinates when the
coordinate axes are rotated through a positive angle of
37°. Use sin 37° = 0.60, cos 37° = 0.80.
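A sketch in Python, using the convention that when the axes rotate counterclockwise through angle q, a point's new coordinates are x* = x cos q + y sin q and y* = −x sin q + y cos q (the transpose of the vector-rotation matrix; verify against your plot):

```python
# Coordinates of the three tests in the original factor space.
tests = [(0.50, 0.30), (0.30, 0.20), (0.40, 0.70)]
s, c = 0.60, 0.80    # sin 37 and cos 37 as given in the problem

# Rotating the AXES counterclockwise re-expresses each point as
# (x cos + y sin, -x sin + y cos); lengths are unchanged.
rotated = [(c * x + s * y, -s * x + c * y) for x, y in tests]
for (x, y), (xs, ys) in zip(tests, rotated):
    # Squared length is preserved by the rotation.
    assert abs((x * x + y * y) - (xs * xs + ys * ys)) < 1e-12
print([(round(xs, 2), round(ys, 2)) for xs, ys in rotated])
# [(0.58, -0.06), (0.36, -0.02), (0.74, 0.32)]
```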

Let x = ( x_{1}, x_{2} )′, and let A and B be the 2 × 2 matrices shown (entries lost in extraction).

Expand the following quadratic forms:
 x′ A x
 x′ B x
 Describe in words the difference between the expansions
in (a) and (b).

Describe (or plot) the set of points with coordinates ( x, y )
that satisfy the matrix equations below (the two 2 × 2 matrices
were lost in extraction; each equation is a quadratic form set
equal to 60):

( x, y ) [ 2 × 2 matrix ] ( x, y )′ = 60

( x, y ) [ 2 × 2 matrix ] ( x, y )′ = 60

Hint: Multiply out, then repeat: substitute a
trial value for x and solve for y.

Evaluate the quadratic forms u′ A u, where u = ( x, y, 1 )′ and
where the matrix A is each of the four 3 × 3 matrices (a)-(d)
(entries lost in extraction).

What functions would be obtained by setting each of these
quadratic forms equal to 0?

The next few problems demonstrate how some of the common
expressions for sums of squares can be represented by
quadratic forms. Suppose n observations are represented
by a vector x = ( x_{1}, x_{2}, ..., x_{n} )′
and the sample variance is based on the sum of squared
deviations,

SS = Σ_{i=1}^{n} ( x_{i} − x̄ )^{2} = Σ_{i=1}^{n} x_{i}^{2} − n x̄^{2}
Show that
 x̄ = ( 1/n ) x′ j.
 Σ x_{i}^{2} = x′ x.
 n x̄^{2} = ( 1/n ) x′ J x = ( 1/n ) x′ j j′ x, where J is the n × n matrix with all elements equal to 1.
 Hence, show that SS can be expressed by the
quadratic form,

SS = x′ ( I − J / n ) x

 Write out explicitly the matrix ( I − J / n ) for n = 3. [Source: Searle, 1966]
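A numerical check of the quadratic-form identity in Python, with a small hypothetical data vector:

```python
# SS = sum (x_i - xbar)^2 should equal x' (I - J/n) x.
x = [2.0, 3.0, 7.0]
n = len(x)
xbar = sum(x) / n

ss_direct = sum((xi - xbar) ** 2 for xi in x)

# Centering matrix I - J/n, then the quadratic form x' M x.
M = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
ss_quad = sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

print(ss_direct, round(ss_quad, 12))  # both equal 14.0
```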

ANOVA and regression tests are based on breaking up sums of
squares into independent, additive portions attributable to
various sources of variance. This problem demonstrates some
of the properties of the quadratic forms associated with these
sums of squares.
Consider the well-known identity for a set of test scores,
y_{i}, i = 1, ..., n:

Σ_{i} y_{i}^{2} = n ȳ^{2} + Σ_{i} ( y_{i} − ȳ )^{2}     (5)

In the previous problem, it was shown that the sum of squared
deviations term could be written as a quadratic form,

y′ ( I − J / n ) y = y′ A_{2} y (say)

 Show that the above identity (5) can be written as

y′ A_{0} y = y′ A_{1} y + y′ A_{2} y

where A_{0}, A_{1}, A_{2} are n × n symmetric. Find A_{0} and A_{1}.
 Show that the following properties hold: A_{1}^{2} = A_{1} and A_{2}^{2} = A_{2}.
 Show that: A_{1} A_{2} = 0.
Matrices such as A_{1} and A_{2} with these
properties are called idempotent projection
matrices. Every sum of squares formula such as (5) in
ANOVA and regression can be represented as a sum of
quadratic forms with idempotent projection matrices.
 Evaluate the expression,

( y′ A_{1} y / 1 ) ÷ ( y′ A_{2} y / ( n − 1 ) )

for y = ( 2, 3, 3, 3, 5, 8 )′.
Interpret this expression and state what
statistical test it is used in.
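Assuming A_{1} = J/n (so that y′ A_{1} y = n ȳ^{2}, the matrix that completes the identity Σ y_{i}^{2} = n ȳ^{2} + Σ ( y_{i} − ȳ )^{2}), the ratio can be computed directly; a Python check for the given y:

```python
# Numerator: n * ybar^2 (1 df); denominator: sum of squared deviations / (n - 1).
y = [2.0, 3.0, 3.0, 3.0, 5.0, 8.0]
n = len(y)
ybar = sum(y) / n                        # 4.0

num = n * ybar ** 2                      # y' A1 y, assuming A1 = J/n
den = sum((yi - ybar) ** 2 for yi in y)  # y' A2 y with A2 = I - J/n

ratio = num * (n - 1) / den              # (num / 1) / (den / (n - 1))
print(ratio)  # 20.0
```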

In psychological experiments, response classes are often
defined arbitrarily. As a result, an experimenter may wish to
combine response classes when he analyzes data. In formal
models it is useful to have a representation of this procedure
of combining response classes.
Bush, Mosteller & Thompson (1954) use a projection
matrix to combine responses. An example is the matrix P
(entries lost in extraction).
This matrix is used as follows: Let there be three response
classes initially, with probabilities r_{1}, r_{2},
and r_{3}, in a column vector, r.
 Postmultiply P by r, and describe
verbally the result.
 Show that the matrix P satisfies P^{2} = P.
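Since the printed P was not preserved, here is a hypothetical combining matrix of the kind described, which pools the first two response classes; a Python sketch verifying the two properties:

```python
# Hypothetical combining matrix: pools response classes 1 and 2 into class 1.
P = [[1, 1, 0],
     [0, 0, 0],
     [0, 0, 1]]
r = [0.2, 0.3, 0.5]    # response probabilities

# P r: class probabilities after combining: (r1 + r2, 0, r3).
Pr = [sum(P[i][j] * r[j] for j in range(3)) for i in range(3)]
print(Pr)

# P is idempotent: applying the combination twice changes nothing.
P2 = [[sum(P[i][k] * P[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
assert P2 == P
```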

Guttman (1944) discusses a method of factoring a correlation
matrix, based on the equation,

f = R x ( x′ R x )^{−1/2}     (6)

where f is one column of a factor matrix, F;
R is the correlation matrix; and x is a
column vector containing arbitrarily selected coefficients.
Once a single factor has been found using this method, the
same formula can be applied to the residual correlation
matrix, R_{1} = R − f f′, to
find a second factor, and so forth. A factor matrix F
which is built up in this way will always satisfy the basic
equation of factor analysis, F F′ = R.
Suppose that the matrix of intercorrelations on four tests is
R (entries lost in extraction).
The elements on the main diagonal of R are not 1.0 as
would ordinarily be expected, but are values called
communalities, representing the variance of each test which
is shared with the other tests.
 Apply Guttman's formula (6) to the matrix R to
find a single column f of the factor matrix.
Assume x to be the vector x = ( 1, 1, 1, 1 )′.
 Show that in this case, f f′ = R, or equivalently, R_{1} = R − f f′ = 0. In other words, only
one factor is needed to account for all the correlations in
R.
[Congratulations! You've just discovered the basis for factor
analysis.]
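The flavor of the computation can be reproduced with a hypothetical rank-one correlation matrix R built from loadings g (so the diagonal holds the communalities g_{i}^{2}); a Python sketch applying formula (6) with x = ( 1, 1, 1, 1 )′:

```python
# Hypothetical loadings generating a rank-one R (diagonal = communalities).
g = [0.8, 0.6, 0.4, 0.2]
n = len(g)
R = [[gi * gj for gj in g] for gi in g]
x = [1.0] * n

# Guttman's formula: f = R x (x' R x)^(-1/2).
Rx = [sum(R[i][j] * x[j] for j in range(n)) for i in range(n)]
xRx = sum(x[i] * Rx[i] for i in range(n))
f = [v / xRx ** 0.5 for v in Rx]

# One factor accounts for everything: f f' reproduces R.
assert all(abs(f[i] * f[j] - R[i][j]) < 1e-12 for i in range(n) for j in range(n))
print([round(v, 3) for v in f])  # recovers g (up to sign)
```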

Resolve the column vector x = ( x_{1}, x_{2}, x_{3} )′ into two components, x_{a} and x_{b}, such that
 x = x_{a} + x_{b},
 x_{a} is parallel to the vector ( 1, 1, 1 )′ / √3,
 x_{b} is perpendicular (orthogonal) to ( 1, 1, 1 )′ / √3.
[And now, the Gram-Schmidt method for transforming correlated
variables into orthogonal ones!]

Arbuckle and Friendly (1977) consider the problem of
transforming a vector of factor loadings, x, to a
vector y so that y is as smooth as
possible in the sense that the sum of squares of successive
differences, Σ_{i=1}^{n−1} ( y_{i} − y_{i+1} )^{2}, is as small as possible.
 Find an ( n − 1 ) × n matrix P such that

P y = ( y_{1} − y_{2}, y_{2} − y_{3}, ..., y_{n−1} − y_{n} )′ = d (say)
 Show that the sum of squares of successive differences
referred to above is given by d′ d.
 If y is obtained as a linear combination of
x, by y = T x,
find an expression not using parentheses to express the
sum of squared successive differences in terms of
T, x, and P.
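A Python sketch of one matrix P that produces the successive differences (each row has a 1 and a −1), verifying that d′ d equals the sum of squared differences:

```python
# Differencing matrix P: row i has 1 in column i and -1 in column i+1.
y = [3.0, 1.0, 4.0, 1.0, 5.0]   # hypothetical loadings
n = len(y)
P = [[1.0 if j == i else (-1.0 if j == i + 1 else 0.0) for j in range(n)]
     for i in range(n - 1)]

d = [sum(P[i][j] * y[j] for j in range(n)) for i in range(n - 1)]
dd = sum(di * di for di in d)                         # d'd
direct = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
print(dd == direct, dd)  # True 38.0
```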
V. Determinants, inverse, and rank

Find the inverse of the matrix shown (entries lost in extraction).
 Verify that the inverse of the matrix shown is as given (entries lost in extraction).
 What is the inverse of the matrix shown (entries lost in extraction)? Can you determine
what the inverse of any diagonal matrix is?

If A = { a_{ij} } is a 2 × 2 matrix,
show that | A | = a_{11} a_{22} − a_{21} a_{12} = a_{11} ( a_{22} − a_{21} a_{11}^{−1} a_{12} ).
This last expression has an important analog for partitioned
matrices.

A frequent question in the testing field is whether to add
another test to a battery of n tests to attempt to increase the
prediction of an external criterion. Horst (1951) proposes a
time-saving method of answering this question. He derives a
formula for the validity of an additional test, k, that
would be required to increase the validity of the battery by an
amount a.
Let R be the ( n + 1 ) × ( n + 1 ) matrix of
intercorrelations, partitioned with test k as the last row
and column,

R = [ R^{*}    r_{k} ]
    [ r_{k}′   1     ]

where R^{*} is the n × n matrix of
correlations of all tests except test k, and r_{k} is the column vector of correlations of test k with
each other test.
To develop his formula, Horst needed to find the inverse of
R, that is, a matrix S such that R S = I, the identity matrix. Thus, S = R^{−1}.
 Partition the matrix S conformably with R,
as

S = [ S^{*}   u ]
    [ u′      d ]

and write out the equation R S = I in
terms of partitioned matrices.
 Write out the four submatrix equations implied by the
equation in part (a).
 Find S = R^{−1} by solving for S^{*}, u, and d.

If a matrix is upper (lower) triangular, then its inverse is
also upper (lower) triangular. Show that this is true for the
matrix S below (entries lost in extraction) by finding its inverse.
Hint: Let S^{−1} = { t_{ij} } and write out the 9 scalar equations implied
by S^{−1} S = I.

In a ratio scaling experiment four stimuli are presented in all
possible pairs to the subject. For each pair, the subject is
required to give a number representing the relative
``strength'' of the first stimulus compared to the second.
For example, he/she may be asked to judge the relative
brightness of two lights, or which of two tones is higher in
pitch.
One hypothesis for this type of judgment is that the subject
has internal subjective intensity values for each stimulus,
and to make a judgment, ``computes'' and reports the ratio of
the two values for a pair.
Let the matrix R = { r_{ij} } contain the judgments of
each pair, for stimulus i relative to stimulus j, and
let s = ( s_{1} , ..., s_{n} ) ¢ be a vector
of the subject's internal scale values for these stimuli.
 Express the hypothesis r_{ij} = s_{i} ÷ s_{j} in
matrix terms.
 What does this hypothesis imply about the rank of
R if the model holds?
 What does this imply about the latent roots of R′ R?
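Under the ratio hypothesis, every row of R is a multiple of every other row (row i is s_{i} times the vector of reciprocals 1/s_{j}), so R has rank 1; a Python check with hypothetical scale values:

```python
# Ratio-judgment matrix r_ij = s_i / s_j from hypothetical scale values.
s = [1.0, 2.0, 4.0, 8.0]
n = len(s)
R = [[si / sj for sj in s] for si in s]

# Rank 1: every row is a multiple of the first row.
for i in range(n):
    ratio = R[i][0] / R[0][0]
    assert all(abs(R[i][j] - ratio * R[0][j]) < 1e-12 for j in range(n))
print("all rows proportional: rank(R) = 1")
```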

In the randomized block design, each of n subjects is given
each of a series of p experimental treatments, resulting
in a score, y_{ij} = score of subject i under treatment
j. With one observation per cell, the standard analysis
of variance model (without interaction) is,
y_{ij} = a_{i} + b_{j} ( + e_{ij} ) 

 Write Y ( n × p ) = { y_{ij} } and express
Y in terms of the a_{i} and b_{j} as
a product of two matrices.
 Tukey's ``one degree of freedom for nonadditivity''
adds an additional term to this model, giving:
y_{ij} = a_{i} + b_{j} + g a_{i} b_{j} ( + e_{ij} ) 

where g is one additional parameter to be estimated.
Write a matrix product expression for
Y in this model. [You may not simply multiply Y by
the identity matrix. Use no + signs in the elements in
the factors in your product.]
 What is the rank of Y if the model of part (a)
holds?
 What is the rank of Y if the model of part (b)
holds?

Suppose an animal running a maze is scored under 3 experimental
conditions for (a) number of wrong turns made, (b) number of
pauses, and (c) time to reach the goal. Some data are given
below, with scores expressed as deviations from the column
means.

         Errors  Pauses  Time
Cond. 1     2       0      4
Cond. 2     1       6     10
Cond. 3     3       6     14
 Determine the rank of this matrix.
 If time and errors have been measured, does the pause
measure contribute any additional information?
 If e is the (column vector) number of errors,
p is the number of pauses, and t
the time taken, can you find a constant a to make the
following equation true for these data?
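The rank of the data matrix, and the linear dependence among the three measures that drives parts (a) and (b), can be verified directly (a sketch assuming NumPy):

```python
import numpy as np

X = np.array([[2., 0., 4.],
              [1., 6., 10.],
              [3., 6., 14.]])   # columns: errors, pauses, time

print(np.linalg.matrix_rank(X))          # rank of the data matrix
e, p, t = X[:, 0], X[:, 1], X[:, 2]
print(np.allclose(t, 2*e + (4/3)*p))     # time is an exact linear
                                         # combination of errors and pauses
```

Because the columns are linearly dependent, any one measure can be recovered from the other two for these data.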
VI. Simultaneous equations

On a sheet of graph paper draw lines representing each of the
three linear equations,
 Show that this system of equations has no solution and
interpret this fact with reference to your geometrical
interpretation.
 Suppose we adopt the following procedure for ``solving''
the system. Take the equations and solve them in
pairs (1) and (2), (1) and (3), (2) and (3). This
will give values, say ( x_{1} , y_{1} ), ( x_{2} , y_{2} ), and ( x_{3} , y_{3} ) which may not
agree. ``Solve'' the equations in this way and locate
the solution points on your graph.
 Now, suppose we adopt as the final solution to this
system the average value,
( x̄ , ȳ ) = ( ( x_{1} + x_{2} + x_{3} ) / 3 , ( y_{1} + y_{2} + y_{3} ) / 3 )


Locate this point on your graph and interpret
geometrically.
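The pairwise-solution procedure can be carried out mechanically and compared with the least-squares solution of the full system. The three equations below are hypothetical stand-ins (the originals are not reproduced here), chosen only to be inconsistent; NumPy is assumed:

```python
import numpy as np
from itertools import combinations

# hypothetical inconsistent system of three equations in (x, y)
A = np.array([[1.,  1.],
              [1., -1.],
              [2.,  1.]])
b = np.array([2., 0., 5.])

pts = [np.linalg.solve(A[list(pair)], b[list(pair)])
       for pair in combinations(range(3), 2)]   # the three pairwise solutions
avg = np.mean(pts, axis=0)                      # averaged "solution"
lsq = np.linalg.lstsq(A, b, rcond=None)[0]      # least-squares solution

print(avg)
print(lsq)
```

In general the average of the pairwise intersection points is not the least-squares point, which is part of what the geometric interpretation should reveal.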

Show that the system of equations,
is underdetermined. That is, there is no unique pair ( x , y ) which satisfies both equations.
 Draw these two equations as lines on a graph and explain
why there is no unique solution.
 If the second equation is changed to 2 x + 2 y = 3, explain why these two equations become
inconsistent.
VII. Latent roots and vectors
Multiplication of a vector v by a square matrix A
can be regarded as a mapping or transformation of v into
some other vector, v^{*} = A v, in the vector space.
A latent vector, or eigenvector, of the
matrix A is a special vector whose direction is unchanged by
that transformation. That is, if v is a nonzero vector
such that
A v = k v ,
then v is an eigenvector or latent vector of A, and
k, the constant of proportionality, is the eigenvalue or latent
root corresponding to v.

In Householder's method for determining eigenvalues of a real
symmetric matrix, A, the matrix is subjected to a series
of orthogonal transformations of the form
P = I − 2 w w′ where w′ w = 1

 Show that P is orthonormal and symmetric.
 If the latent roots of A are λ_{1} , λ_{2} , ..., λ_{n}, what are the latent
roots of P A P ?
 If the latent vectors of A are v_{1} , v_{2} , ..., v_{n}, what are the
latent vectors of P A P ?
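All three parts can be checked numerically for a random unit vector w and a random symmetric A (a sketch assuming NumPy; the particular vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=4)
w /= np.linalg.norm(w)                 # enforce w'w = 1
P = np.eye(4) - 2 * np.outer(w, w)

print(np.allclose(P, P.T))             # P is symmetric
print(np.allclose(P @ P.T, np.eye(4))) # P is orthonormal

A = rng.normal(size=(4, 4))
A = A + A.T                            # a random symmetric A
print(np.allclose(np.linalg.eigvalsh(P @ A @ P),
                  np.linalg.eigvalsh(A)))  # P A P has the same latent roots
```

Since P is its own inverse, P A P is a similarity transformation of A, which is why the latent roots survive while each latent vector v_{i} becomes P v_{i}.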

The eigenvalues and eigenvectors of any square, symmetric
matrix A ( n × n ) can (in principle) be found
by (1) finding the n values of λ which satisfy
| A − λ I | = 0; (2) solving A v = λ v for v using each
value of λ determined by step (1). Using this
method, find the eigenvalues and eigenvectors of the following
matrices:
(a) [matrix entries lost in translation]
(b) [matrix entries lost in translation]

Show that the eigenvalues of the matrix
are ±1.
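The two-step recipe above can also be carried out numerically. The matrices in parts (a) and (b) did not survive the translation from TeX, so this sketch uses a hypothetical 2 × 2 symmetric matrix; for a 2 × 2 matrix the characteristic equation is λ² − tr(A) λ + |A| = 0:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])   # hypothetical 2 x 2 symmetric matrix

# step (1): roots of |A - lambda I| = 0, i.e. lambda^2 - tr(A) lambda + |A| = 0
lams = np.roots([1.0, -np.trace(A), np.linalg.det(A)])
print(np.sort(lams))

# step (2): for each root, solve (A - lambda I) v = 0 via the null space
for lam in lams:
    v = np.linalg.svd(A - lam * np.eye(2))[2][-1]   # null-space vector
    print(lam, v, np.allclose(A @ v, lam * v))      # verify A v = lambda v
```

The same two steps apply to the matrices in the problem once their entries are in hand.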

In the Markov model for serial learning outlined in Problem
27, verify that the eigenvalues of the transition
matrix T are
Hint: Solve the characteristic equation | T − λ I | = 0.

A magic square is a matrix in which the elements of each
row, each column, and the two main diagonals add up to the same
number, called the magic number. For
the magic square A below, show that one latent root of
A is its magic number. (This is true for all
magic squares.)
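The magic square A in the problem did not survive translation, but the key fact is easy to verify with any magic square: every row sums to the magic number m, so A 1 = m 1 and the vector of ones is a latent vector with root m. A check using the classic 3 × 3 Lo Shu square as a stand-in (not the matrix from the original problem):

```python
import numpy as np

A = np.array([[4., 9., 2.],
              [3., 5., 7.],
              [8., 1., 6.]])   # rows, columns, and both diagonals sum to 15

ones = np.ones(3)
print(np.allclose(A @ ones, 15 * ones))              # A 1 = 15 * 1
print(np.any(np.isclose(np.linalg.eigvals(A), 15)))  # 15 appears among the latent roots
```

The same argument works for a magic square of any order, which is why the result holds in general.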