# Course Outline

Multivariate Data Analysis
Psychology 6140

## Course Description

Psychology 6140 is designed to provide an integrated, in-depth, but applied approach to multivariate data analysis and linear statistical models in behavioural science research. There is a strong emphasis throughout the course on graphical methods for visualizing data and the results of statistical models. The statistical topics covered will include:

- Regression analysis
- Univariate and multivariate ANOVA and ANCOVA
- Discriminant analysis
- Canonical correlation analysis
- Principal components and factor analysis
- Cluster analysis, multidimensional scaling, and/or logistic regression (as time permits)

Most of these methods are actually special cases of the General Linear Model. By developing these techniques within this framework, the student is led (hopefully) to appreciate the conceptual unity underlying all forms of regression and all analysis of variance designs, both univariate and multivariate.
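
As a rough sketch of what this unity looks like (the notation below is a common textbook convention and an assumption here, not taken from the course materials): with $n$ observations, $p$ response variables, and $q$ columns in the design matrix, the multivariate linear model and its least-squares estimate can be written as

$$
\mathbf{Y}_{n \times p} = \mathbf{X}_{n \times q}\,\mathbf{B}_{q \times p} + \mathbf{E}_{n \times p},
\qquad
\hat{\mathbf{B}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y},
$$

assuming $\mathbf{X}$ has full column rank. Regression corresponds to quantitative columns in $\mathbf{X}$, ANOVA to dummy-coded (0/1) columns for group membership, and ANCOVA to a mixture of both; $p = 1$ gives the univariate versions and $p > 1$ the multivariate ones.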

This unification of seemingly different forms of analysis is achieved through the use of matrix algebra to formulate the various models. Therefore, the first part of the course (about 5-6 weeks) is devoted to the necessary mathematical skills. If you wish, you can get an early start on this part by looking at my description of matrix algebra preparation for the course.

Although all of the matrix algebra required for the course will be covered in the readings and lectures, time constraints dictate that this treatment will be somewhat brisk, and either a modicum of initial familiarity or a willingness to work hard will be assumed. In order to facilitate exercises and homework problems which involve matrix operations, students will be given instruction in using a computer package for matrix algebra.

Software Notes: In the lectures and lab sessions, I will use both SAS and R for examples and tutorials. Most of the practical assignments and graded work can be done with any software you are comfortable with; however, exercises using matrix algebra will probably be most convenient in SAS/IML or R (or JMP or Matlab).

Both R and SAS/IML provide students with the equivalent of a "matrix desk calculator", which makes exploration and learning quite efficient; SAS also provides the power and data management facilities needed for larger projects.

## Texts and reference materials for the course

There are two principal texts for the course, and one text on matrix algebra (Green et al.). For most topics in the course, parallel readings are assigned in Johnson & Wichern and Tabachnick & Fidell.
1. Green, P.E., Carroll, J.D., & Chaturvedi, A. Mathematical tools for applied multivariate analysis (Revised Edition), Academic Press, 1997. [ISBN 0-12-160955-3] (There are copies of the old edition in the Psychology Resource Center and online copies of some chapters in a password-protected area.)
2. Johnson, R.A. & Wichern, D.W. Applied multivariate statistical analysis, 6th Ed. revised, Pearson Education, 2013. [ISBN-13: 9781292024943; copies of 6th Ed. in Scott/Steacie, QA 278 J63 2007]
3. Tabachnick, B.G. & Fidell, L.S. Using Multivariate Statistics, 6th Ed., Allyn & Bacon, 2013. [ISBN-13: 9780205849574; copies of 6th Ed. in Scott/Steacie, QA 278 T3 2013] Also of interest: the companion website for the book, with data files for the examples.

In addition, you may want to use one or more of the following for reference or supplementary reading. The first two provide alternative readings for some sections of the course, and are available in the Psychology Resource Center. The others relate to computing resources.

4. Morrison, D. F. Multivariate Statistical Methods (3rd ed.), 1990. New York: McGraw-Hill.
5. Stevens, J. Applied Multivariate Statistics for the Social Sciences, 4th ed., L. Erlbaum Associates 2002. [ISBN 0-8058-3777-9]
6. Friendly, M. SAS System for Statistical Graphics, First Edition, SAS Institute, 1991. [ISBN 1-55544-441-5; everything you wanted to know about statistical graphics.]
7. Friendly, M. Visualizing Categorical Data, SAS Institute, 2000. [ISBN 1-58025-660-0]

Grades in the course will be based on three units, each worth 33.3%: one take-home exam (on matrix algebra), one mid-year data analysis project, and one end-of-year data analysis project. See Projects for details on all but the take-home exam.

Each of the two data analysis projects will be a research report based on the analysis of either existing data or your own. The first will focus largely on regression techniques. The final project should be based on methods from the second half of the course. My intention is that you learn to execute, interpret, and write up the results of multivariate analysis.

There will also be frequent assignments and problems throughout the course, which I will review in class. Assignments will not be graded, but are an essential part of your learning. You should plan to devote 3-4 hrs/week to assignments. At first there will be a lot to learn about the mechanics of using the computer system itself, but it will get easier as you progress.

## Topic sequence

1. Part I: Statistical and mathematical background
   1. Overview of multivariate methods [Lecture slides]
   2. Graphical techniques for multivariate data [Lecture slides] [tutorial: Intro to SAS for Windows]
   3. Data screening [Lecture slides] [tutorial: Data Exploration and Graphics with SAS]
   4. Matrix algebra
   5. Multivariate distribution theory
2. Part II: General Linear Model
   1. Regression analysis
   2. Hotelling's T²
   3. Multivariate analysis of variance
   4. Analysis of covariance
   5. Discriminant analysis
   6. Loglinear models for categorical data (if time permits)
3. Part III: Dependence among variables
   1. Canonical correlation
   2. Principal components analysis
   3. Factor analysis
   4. Cluster analysis
   5. Multidimensional scaling (if time permits)

This is just a rough sketch of the initial readings on matrix algebra. See the individual assignments or the lecture/reading schedule for details (those are kept up to date).

Some students like to have an alternative source for reading about certain concepts. Where possible, I've included parallel readings from Morrison and/or Stevens, but if you've read the G&C material without trouble, that should be sufficient.

Note: TBD=to be distributed; PRC=Psychology Resource Centre; G&C=Green & Carroll; J&W=Johnson & Wichern; T&F=Tabachnick & Fidell.

1. Graphical techniques: Friendly, Statistical graphics for multivariate data, SAS SUGI Conf, 1991 (TBD); Wainer, H. Graphical data analysis, Ann. Rev. Psychol., 1981 (PRC); Chambers et al. Graphical methods for data analysis, Chapter 5 (PRC)
2. Overview: G&C, Chapter 1; J&W, 1.1-1.4; T&F, Chapters 1-2
3. Data screening: T&F, Chapter 4.
4. Basic vector and matrix operations: G&C, Chapter 2: 2.1-2.6; J&W, Supplement 2A; Stevens, Chapter 2: 2.1-2.3; (Morrison, Chapter 2: 2.1-2.3)
5. Determinant & Inverse: G&C, Chapter 2: 2.7-2.9; Stevens, Chapter 2: 2.4-2.5; (Morrison, Chap. 2: 2.4-2.5)
6. Vector geometry: G&C, Chapter 3; J&W, 2.1-2.2
7. Linear transformations & rank: G&C, Chapter 4; (Morrison, Chap. 2: 2.6-2.7)

## Computer Accounts, Software, and Class Files

Computer work for the course can be done in the Hebb lab, Rm. 158/159 BSB. You will need an individual account on the Health Psychology Lab, which you can create yourself using York's Manage My Services application.