Information visualization is the pictorial representation of data.

- Successful visualizations capitalize on our capacity to recognize and understand patterns presented in information displays.
- Conversely, they require that writers of scientific papers, software designers and other providers of visual displays understand what works and what does not work to convey their message.

This course will examine a variety of issues in data visualization from a largely psychological perspective, but will also touch upon other communities of research and practice concerned with this topic:

- history of data visualization,
- computer science and statistical software,
- visual design,
- human factors.

We will consider visualization methods for a wide range of types of data from the points of view of both the viewer and designer/producer of graphic displays.

**Lecture notes**: 1up PDF; 4up PDF

**Assignment**:

- **Blogs**: Explore one or two of the blogs or web resources listed in the lecture notes or in Resources. Find a few examples of kinds of graphs you find interesting or worth exploring more.
- **Good/bad graphs**: Explore the literature in your area, say several issues of one journal. Find one example of a data display (graph or table) that communicates particularly well, and one example of a display that communicates badly.

- Books, readings, blogs & web resources
- Goals of visualization; visualization as communication
- Roles of graphics in data analysis & presentation
- Effective data display
- Graphs: Good/bad, Excellent/evil

**Lecture notes**: 1up PDF; 4up PDF

**Assignment**:

- From the readings that you have done so far, find one example of a data graph that attempts to tell an interesting story about a useful topic. How well does it succeed? How could it be improved?

- Data graphs: 1D – 3D
- Thematic maps
- Network and tree visualization
- Animation & interactive graphics

- Data Visualization Catalog. A handy compendium of most known graphical methods. There is also a Blog section with extended discussions of variations of a given chart type, e.g., this one on Boxplots.
- 12 Data visualizations that illustrate poverty’s biggest challenges

- TED talks: Manuel Lima, A Visual History of Human Knowledge
- TED talks: Hans Rosling, The Best Stats …
- TED talks: Nicholas Christakis, How Social Networks Predict Epidemics

- Overview: The Milestones Project
- The first statistical graph
- The Big bang: William Playfair
- Moral statistics: the birth of social science
- Graphs in the public interest: Nightingale, Farr and Snow
- The Golden Age
- Case study: Re-Visions of Minard

- Friendly, M. A Brief History of Data Visualization
- Friendly et al. The First (Known) Statistical Graph: Michael Florent van Langren and the “Secret” of Longitude
- Friendly, M. The Golden Age of Statistical Graphics. *Statistical Science*, 2008, 23, 502-535.
- Friendly & Denis. The early origins and development of the scatterplot
- Phan et al. Flow Map Layout, paper; see also: Web site
- Jeff Heer, A Brief History of Data Visualization. A lecture giving his take on this history, interpreting and extending my work from a computer science perspective.

- Perceptual aspects
- Illusions
- Gestalt factors
- Accuracy of decoding

- Cognitive aspects
- Memory
- Color

- Cleveland & McGill (1984), Graphical Perception… *JASA*
- Christopher Healey, Perception in Visualization. A web page on this topic, including interactive demos, animations and lots of examples
- Gordon & Finch (2015), “Statistician Heal Thyself: Have We Lost the Plot?”, *JCGS*, 1210-1229
- Zeileis et al. (2009), “Escaping RGBland: Selecting Colors for Statistical Graphics,” *Computational Statistics & Data Analysis*, 53, 3259–3270
- Ware (2013), *Information Visualization: Perception for Design*, Chapter 4 (Color)
- Kennedy Elliot, 39 studies about human perception in 30 minutes
- Why Should Engineers and Scientists Be Worried About Color?
- Stephen Few, Practical Rules for Using Color in Charts
- Thomas Lin Pedersen - Scico and the Colour Conundrum

- Human factors in graphic & information design
- Empirical study of graphs
- Experimental methods
- Graphical inference

- Heer & Bostock (2010), Crowdsourcing Graphical Perception…
- Skau & Kosara (2016), Arcs, Angles, or Areas: Individual Data Encodings in Pie and Donut Charts
- Haroz, Kosara, & Franconeri (2015), ISOTYPE Visualization - Working Memory, Performance, and Engagement with Pictographs
- Buja et al. (2009) Statistical inference for exploratory data analysis and model diagnostics

**Lecture notes**: 1up PDF; 4up PDF

- Deep questions of Data Visualization; 4up

- Early attempts at standardization of graphs
- Bertin: Semiology of Graphics
- Graphics programming languages
- Wilkinson: The Grammar of Graphics
- Wickham: ggplot2

**Guest lecture**: Jim Rankin, *Toronto Star*. Jim’s lecture slides: Race, Crime & Policing (A big file)

**Other lecture notes**: 1up PDF; 4up PDF

- Jim Rankin, “The Quest for Electronic Data: Where Alice meets Monty Python meets Colonel Jessep”
- 8 fantastic examples of data journalism
- The Data Journalism Handbook. An edited online collection of short articles on various aspects of data journalism, now into a second volume. Highly recommended.
- Journalism in the Age of Data A slick video report on data visualization as a story-telling medium, produced by Geoff McGhee at Stanford University. Eight video chapters, with associated resources, tutorials and online tools.
- Twitter: [@WSJGraphics](https://twitter.com/WSJGraphics), [@nytgraphics](https://twitter.com/nytgraphics). Explore some of the topics/examples they’ve posted.
- Twitter: [@ddjournalism](https://twitter.com/ddjournalism) posts some interesting examples of Data Driven Journalism.

The next two sessions, devoted to developing graphs with `ggplot2` and related methods, will take place in the Hebb lab, Rm 059 BSB.

- Installing R & R Studio. You need to install both R & R Studio to benefit from these sessions.
- Working with R Studio; 4up. A mini lecture to illustrate some aspects of R Studio.
- Introduction to ggplot2; 4up
- **tutorial**: ggplot2 tutorial: gapminder data; R script for this

- Getting started with ggplot. This web page describes installing `ggplot2` and the `tidyverse` of related packages. It also contains some useful links for learning to use `ggplot`.
- The online chapter, Data Visualization, of the book R for Data Science is an excellent brief introduction to `ggplot2`. Another chapter in this book, Graphics for Communication, takes up some more advanced topics.
- A free online book, An Introduction to Statistical and Data Sciences via R. The focus is on the `tidyverse` of R packages for data manipulation and `ggplot2` for graphics. Also covers data modeling (regression), hypothesis testing, etc.
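As a rough illustration of the layered style of plotting these resources introduce, here is a minimal sketch using the gapminder data. It assumes the `ggplot2` and `gapminder` packages are installed. The choice of year (2007), the variables mapped, and the object names `gap2007` and `p` are illustrative assumptions, not taken from the course scripts.

```r
library(ggplot2)
library(gapminder)  # assumption: the gapminder data package is installed

# Keep a single year so each country appears once
gap2007 <- subset(gapminder, year == 2007)

# Build the plot in layers: data + aesthetic mappings, then geoms and scales
p <- ggplot(gap2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent, size = pop), alpha = 0.6) +
  # overall linear trend, fitted on the log-transformed x scale
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)",
       y = "Life expectancy (years)",
       color = "Continent",
       size = "Population")

print(p)
```

Each `+` adds one layer or scale, so a variant (for example, dropping the smooth or faceting by continent) can be tried by changing a single line.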

**Lecture notes**: 1up PDF; 4up PDF

**tutorial**: Minard meets ggplot; R script for this

- Data wrangling: getting your data into shape
- Visualizing models: broom
- ggplot2 extensions
- tables in R

- Hadley Wickham. Tidy data. The Journal of Statistical Software, vol. 59, 2014. See also the main vignette for the `tidyr` package.
- David Robinson. broom: An R Package for Converting Statistical Analysis Objects Into Tidy Data Frames. See also this broom presentation.
- Software Carpentry. Dataframe Manipulation with dplyr. A very nice interactive tutorial on manipulating data frames using `dplyr` and other tidy tools. Contains some Challenge questions and nice diagrams showing the effects of `select`, `group_by` and other tidy verbs. This is part of a larger series, R for Reproducible Scientific Analysis.
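To make the tidy verbs mentioned above concrete, here is a small sketch that chains `filter`, `select`, `group_by` and `summarise` on the gapminder data. It assumes the `dplyr` and `gapminder` packages are installed. The year chosen and the summary column names (`n_countries`, `mean_lifeExp`, `median_lifeExp`) are illustrative assumptions.

```r
library(dplyr)
library(gapminder)  # assumption: the gapminder data package is installed

continent_summary <- gapminder %>%
  filter(year == 2007) %>%                  # keep rows for one year
  select(continent, country, lifeExp) %>%   # keep only the columns needed
  group_by(continent) %>%                   # one group per continent
  summarise(
    n_countries    = n(),                   # number of countries in the group
    mean_lifeExp   = mean(lifeExp),
    median_lifeExp = median(lifeExp)
  ) %>%
  arrange(desc(mean_lifeExp))               # highest mean life expectancy first

print(continent_summary)
```

The result is itself a tidy data frame with one row per continent, so it can be passed directly to `ggplot2` or to further `dplyr` steps.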

A collection of other R examples is available as R scripts, with some markup so that you can run them with Compile Report (Ctrl+Shift+K).

- Data tidying with dplyr and tidyr. A simple example used in the lecture of a survey of income by religion from Pew Research.
- gapminder data: Summaries and boxplots by continent
- ggplot tutorial: gapminder data. A collection of examples showing various ways of plotting the gapminder data with `ggplot2`.
- gapminder data: Using `broom` for tidy model visualization. Shows the tools used to fit a collection of models for `lifeExp` and visualize various model summaries.
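The last item can be sketched roughly as follows: fit one linear model of `lifeExp` per continent and collect the results as tidy data frames with `broom`. This is not the course script. It assumes the `dplyr`, `broom` and `gapminder` packages are installed, and the model formula, the centered variable `year0` and the names `coef_table` and `fit_table` are illustrative assumptions.

```r
library(dplyr)
library(broom)
library(gapminder)  # assumption: the gapminder data package is installed

# Center year at 1952 so each intercept is the fitted life expectancy in 1952
gap <- gapminder %>% mutate(year0 = year - 1952)

# One row per model term (intercept and slope) for each continent
coef_table <- gap %>%
  group_by(continent) %>%
  group_modify(~ tidy(lm(lifeExp ~ year0, data = .x))) %>%
  ungroup()

# One row of fit statistics (R-squared, AIC, ...) for each continent
fit_table <- gap %>%
  group_by(continent) %>%
  group_modify(~ glance(lm(lifeExp ~ year0, data = .x))) %>%
  ungroup()

print(coef_table)
print(fit_table)
```

Because both results are ordinary data frames, the slopes or R-squared values can be plotted by continent with `ggplot2` like any other data.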

Jamie Waese, Senior Manager of the Data Visualization Lab for the TD Bank Group will give a guest presentation, with the tentative title, *My travels from children’s TV to visualizing plant biology to directing data visualization efforts for a major bank.*

- ePlant, a data visualization system allowing plant biologists to visualize the natural connections between DNA sequences, natural variation (polymorphisms), molecular structures, protein-protein interactions, and gene expression patterns at multiple levels.

These will take place Mar. 28 & Apr. 4. Details will be posted to the Students page.

~~These will take place March 22, 29 & April 5 in the classroom. Due to the strike, these are now being done outside the classroom by web-based videos. See the 6135 Presentations spreadsheet to sign up for a topic.~~

Copyright © 2018 Michael Friendly. All rights reserved.

*friendly AT yorku DOT ca*