Visual Encoding Design

CSE 442 - Data Visualization Visual Encoding Design Jeffrey Heer University of Washington

A Design Space of Visual Encodings

Mapping Data to Visual Variables
Assign data fields (e.g., with N, O, Q types) to visual channels (x, y, color, shape, size, ...) for a chosen graphical mark type (point, bar, line, ...).
Additional concerns include choosing appropriate encoding parameters (log scale, sorting, ...) and data transformations (bin, group, aggregate, ...).
These options define a large combinatorial space, containing both useful and questionable charts!
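To give a feel for how quickly this design space grows, here is a minimal sketch that enumerates every way to assign three data fields to distinct visual channels. The field names are hypothetical and chosen only for illustration; mark types, encoding parameters, and data transformations would multiply the count further.

```python
from itertools import permutations

# Hypothetical data fields and their types
# (N = nominal, O = ordinal, Q = quantitative).
FIELDS = {"product": "N", "rating": "O", "sales": "Q"}

# Visual channels named on the slide.
CHANNELS = ["x", "y", "color", "shape", "size"]


def enumerate_encodings(fields, channels):
    """Yield every assignment of each field to a distinct channel."""
    names = list(fields)
    for chosen in permutations(channels, len(names)):
        yield dict(zip(names, chosen))


encodings = list(enumerate_encodings(FIELDS, CHANNELS))
print(len(encodings))  # 60, i.e. 5 * 4 * 3 candidate specifications
print(encodings[0])    # {'product': 'x', 'rating': 'y', 'sales': 'color'}
```

Many of the 60 candidates are questionable (for example, a quantitative field on shape), which is why the effectiveness rankings later in the deck matter.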

1D: Nominal Raw Aggregate (Count)

Expressive? Raw Aggregate (Count)

1D: Quantitative Raw Aggregate (Count)

Expressive? Raw Aggregate (Count)

Raw (with Layout Algorithm): Treemap, Bubble Chart
Aggregate (Distributions): Box Plot (low, middle 50% (inter-quartile range), median, high), Violin Plot
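The box plot annotations above (low, middle 50%, median, high) correspond to a handful of summary statistics. The sketch below computes them with Python's standard library. It assumes the whiskers extend to the minimum and maximum, which is only one of several common conventions, and the function name is illustrative.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the values a basic box plot draws.

    Assumption: "low" and "high" are the data minimum and maximum.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "low": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "high": max(values),
        "iqr": q3 - q1,  # the "middle 50%" of the data
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary["median"], summary["iqr"])  # 5.0 4.0
```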

2D: Nominal x Nominal Raw Aggregate (Count)

2D: Quantitative x Quantitative Raw Aggregate (Count)

2D: Nominal x Quantitative Raw Aggregate (Mean)

Raw (with Layout Algorithm): Treemap, Bubble Chart, Beeswarm Plot

3D and Higher
Two variables [x,y]: Can map to 2D points. Scatterplots, maps, ...
Third variable [z]: Often use one of size, color, opacity, shape, etc. Or, one can further partition space.
What about 3D rendering? [Bertin]

Other Visual Encoding Channels?

Encoding Effectiveness

Effectiveness Rankings [Mackinlay 86]
QUANTITATIVE       ORDINAL            NOMINAL
Position           Position           Position
Length             Density (Value)    Color Hue
Angle              Color Sat          Texture
Slope              Color Hue          Connection
Area (Size)        Texture            Containment
Volume             Connection         Density (Value)
Density (Value)    Containment        Color Sat
Color Sat          Length             Shape
Color Hue          Angle              Length
Texture            Slope              Angle
Connection         Area (Size)        Slope
Containment        Volume             Area
Shape              Shape              Volume
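One way to read the rankings is as a lookup: given a data type and the channels still free, pick the highest-ranked one. The sketch below encodes the three columns as lists, most effective first. The function name and the lowercase channel labels are illustrative choices, not part of the original slides.

```python
# Channel rankings per data type, most effective first [Mackinlay 86].
RANKINGS = {
    "Q": ["position", "length", "angle", "slope", "area", "volume",
          "density", "color saturation", "color hue", "texture",
          "connection", "containment", "shape"],
    "O": ["position", "density", "color saturation", "color hue",
          "texture", "connection", "containment", "length", "angle",
          "slope", "area", "volume", "shape"],
    "N": ["position", "color hue", "texture", "connection",
          "containment", "density", "color saturation", "shape",
          "length", "angle", "slope", "area", "volume"],
}


def best_channel(data_type, available):
    """Return the highest-ranked channel in `available` for a data type."""
    for channel in RANKINGS[data_type]:
        if channel in available:
            return channel
    raise ValueError("no ranked channel is available")


free = {"color hue", "area", "shape"}
print(best_channel("Q", free))  # area
print(best_channel("N", free))  # color hue
```

The same set of free channels gives different answers for quantitative and nominal fields, which is the point of keeping a separate ranking per data type.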

Color Encoding

Area Encoding

Gene Expression Time-Series [Meyer et al 11] Color Encoding Position Encoding

Artery Visualization [Borkin et al 11]
        Rainbow Palette    Diverging Palette
2D      62%                92%
3D      39%                71%

Scales & Axes

Include Zero in Axis Scale? Government payrolls in 1937 [How To Lie With Statistics. Huff]

Include Zero in Axis Scale? Yearly CO2 concentrations [Cleveland 85]

Include Zero in Axis Scale?
Compare Proportions (Q-Ratio): Violates Expressiveness Principle!
Compare Relative Position (Q-Interval)

Axis Tick Mark Selection
What are some properties of good tick marks?
Simplicity - numbers are multiples of 10, 5, 2
Coverage - ticks near the ends of the data
Density - not too many, nor too few
Legibility - whitespace, horizontal text, size
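The simplicity, coverage, and density criteria can be approximated with a common "nice numbers" heuristic: choose a step of 1, 2, or 5 times a power of ten, then extend the ticks to bracket the data. This is a minimal sketch of that heuristic, not the algorithm of any particular charting library. The function names and the default target of about five intervals are assumptions.

```python
import math


def nice_step(span, target_count=5):
    """Pick a step of 1, 2, or 5 times a power of ten (simplicity)."""
    raw = span / target_count
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 5):
        if raw <= multiple * magnitude:
            return multiple * magnitude
    return 10 * magnitude


def nice_ticks(lo, hi, target_count=5):
    """Return ticks that bracket [lo, hi] (coverage) at a nice step (density)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    step = nice_step(hi - lo, target_count)
    start = math.floor(lo / step) * step
    stop = math.ceil(hi / step) * step
    count = round((stop - start) / step)
    # Rounding hides floating-point noise in fractional steps.
    return [round(start + i * step, 10) for i in range(count + 1)]


print(nice_ticks(3, 97))  # [0, 20, 40, 60, 80, 100]
```

Legibility (whitespace, text orientation, font size) depends on rendering and is outside the scope of this sketch.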

How to Scale the Axis?

One Option: Clip Outliers

Clearly Mark Scale Breaks Violates Expressiveness Principle! Poor scale break [Cleveland 85] Well-marked scale break [Cleveland 85]

Scale Break vs. Log Scale [Cleveland 85]
Both increase visual resolution
Scale break: difficult to compare (cognitive not perceptual work)
Log scale: direct comparison of all data

Linear Scale vs. Log Scale (MSFT)
Linear Scale: Absolute change
Log Scale: Small fluctuations. Percent change: d(10,20) = d(30,60)

When To Apply a Log Scale?
Address data skew (e.g., long tails, outliers). Enables comparison within and across multiple orders of magnitude.
Focus on multiplicative factors (not additive). Recall that the logarithm transforms × to +! Percentage change, not absolute value.
Constraint: positive, non-zero values
Constraint: audience familiarity?
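A short numeric check makes the log-scale properties concrete: equal ratios map to equal distances, and products become sums. The helper names below are illustrative, and base 10 is an arbitrary choice (any base gives the same equalities).

```python
import math


def linear_distance(a, b):
    """Distance between two values on a linear axis (absolute change)."""
    return abs(b - a)


def log_distance(a, b):
    """Distance between two values on a log10 axis (ratio of the values)."""
    if a <= 0 or b <= 0:
        raise ValueError("a log scale requires positive, non-zero values")
    return abs(math.log10(b) - math.log10(a))


# Doubling looks different on a linear axis...
print(linear_distance(10, 20), linear_distance(30, 60))  # 10 30
# ...but identical on a log axis.
print(math.isclose(log_distance(10, 20), log_distance(30, 60)))  # True
# The logarithm turns multiplication into addition.
print(math.isclose(math.log10(4 * 25), math.log10(4) + math.log10(25)))  # True
```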

Regression Lines

[The Elements of Graphing Data. Cleveland 94]

Transforming Data
How well does the curve fit the data? [Cleveland 85]

Plot the Residuals
Plot vertical distance from best fit curve
Residual graph shows accuracy of fit [Cleveland 85]

Multiple Plotting Options
Plot model in data space
Plot data in model space [Cleveland 85]
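As a sketch of what a residual plot contains, the code below fits an ordinary least-squares straight line and returns the vertical distance of each point from it. A straight line is used here only as the simplest example of a "best fit curve". The data is made up and deliberately quadratic, so the residuals show a systematic pattern.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Vertical distance of each point from the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]  # made-up quadratic data
slope, intercept = fit_line(xs, ys)
print(slope, intercept)                      # 6.0 -7.0
print(residuals(xs, ys, slope, intercept))   # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The U-shaped residuals (positive, negative, positive) suggest that a line is the wrong model for this data, which is much harder to see in the raw scatterplot.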

Administrivia

A2: Exploratory Data Analysis
Use visualization software to form & answer questions
First steps:
  Step 1: Pick domain & data
  Step 2: Pose questions
  Step 3: Profile the data
  Iterate as needed
Create visualizations:
  Interact with data
  Refine your questions
Author a report:
  Screenshots of most insightful views (10+)
  Include titles and captions for each view
Due by 11:59pm Tuesday, Oct 16

Multidimensional Data

Visual Encoding Variables
Position (X), Position (Y), Size, Value, Texture, Color, Orientation, Shape
~8 dimensions?

Example: Coffee Sales
Sales figures for a fictional coffee chain
Sales           Q-Ratio
Profit          Q-Ratio
Marketing       Q-Ratio
Product Type    N {Coffee, Espresso, Herbal Tea, Tea}
Market          N {Central, East, South, West}

Encode Sales (Q) and Profit (Q) using Position

Encode Product Type (N) using Hue

Encode Market (N) using Shape

Encode Marketing (Q) using Size

Trellis Plots A trellis plot subdivides space to enable comparison across multiple plots. Typically nominal or ordinal variables are used as dimensions for subdivision.

Small Multiples [MacEachren 95, Figure 2.11, p. 38]

Scatterplot Matrix (SPLOM) Scatter plots for pairwise comparison of each data dimension.

Multiple Coordinated Views
how long in majors
select high salaries
avg assists vs avg putouts (fielding ability)
avg career HRs vs avg career hits (batting ability)
distribution of positions played

Parallel Coordinates [Inselberg]
Visualize up to ~two dozen dimensions at once
1. Draw parallel axes for each variable
2. For each tuple, connect points on each axis
Between adjacent axes: line crossings imply neg. correlation, shared slopes imply pos. correlation.
Full plot can be cluttered. Interactive selection can be used to assess multivariate relationships.
Highly sensitive to axis scale and ordering. Expertise required to use effectively!
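The two construction steps and the crossing heuristic can be sketched without any plotting library. The code below rescales each variable to [0, 1] so that all axes share the same height (a common choice, but an assumption here), builds one polyline per tuple, and counts line crossings between two axes. All function names and the sample data are illustrative.

```python
from itertools import combinations


def normalize_columns(rows):
    """Rescale each variable to [0, 1] so all axes share one height."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.5
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def polylines(rows):
    """One polyline per tuple: a list of (axis index, height) vertices."""
    return [list(enumerate(row)) for row in normalize_columns(rows)]


def count_crossings(rows, i, j):
    """Count pairs of lines that cross between axes i and j."""
    return sum(
        1 for a, b in combinations(rows, 2)
        if (a[i] - b[i]) * (a[j] - b[j]) < 0
    )


negative = [(1, 30), (2, 20), (3, 10)]  # made-up, negatively correlated
positive = [(1, 10), (2, 20), (3, 30)]  # made-up, positively correlated
print(polylines(negative)[0])            # [(0, 0.0), (1, 1.0)]
print(count_crossings(negative, 0, 1))   # 3
print(count_crossings(positive, 0, 1))   # 0
```

Because crossings are only visible between adjacent axes, reordering the axes changes which correlations a reader can see, which is one reason the technique is sensitive to axis ordering.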

Radar Plot / Star Graph
Parallel dimensions in polar coordinate space
Best if same units apply to each axis

Dimensionality Reduction http://www.ggobi.org/

Principal Components Analysis
1. Mean-center the data.
2. Find basis vectors that maximize the data variance.
3. Plot the data using the top vectors.
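The three steps above can be sketched in pure Python. This version finds only the first principal component, using power iteration on the covariance matrix. That is one of several ways to do step 2 and is chosen here because it needs no libraries; in practice an eigendecomposition or SVD routine would normally be used. Function names and the sample data are illustrative.

```python
import math


def mean_center(rows):
    """Step 1: subtract each variable's mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=200):
    """Step 2 (first vector only): power iteration toward the direction
    of maximum variance. Assumes the start vector is not orthogonal to
    that direction and that the data is not constant."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Step 3: coordinates of each row along the component."""
    return [sum(a * b for a, b in zip(row, v)) for row in centered]


data = [(1, 2), (2, 4), (3, 6), (4, 8)]  # made-up points on the line y = 2x
centered = mean_center(data)
component = top_component(covariance(centered))
print(math.isclose(component[1] / component[0], 2.0))  # True
print(project(centered, component))  # 1D coordinates along that direction
```

For this data all the variance lies along the direction (1, 2), so a single component is enough to reproduce the layout of the points.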

PCA of Genomes [Demiralp et al. 13]

Many Reduction Techniques!
General Strategies:
  Matrix Factorization
  Nearest Neighbor (Topological) Methods
Popular Techniques:
  Principal Components Analysis (PCA)
  t-dist. Stochastic Neighbor Embedding (t-SNE)
  Uniform Manifold Approx. & Projection (UMAP)

distill.pub

Visualizing t-SNE [Wattenberg et al. 16]

Time Curves [Bach et al. 16] Wikipedia Chocolate Article U.S. Precipitation over 1 Year

Visual Encoding Design
Use expressive and effective encodings
Avoid over-encoding
Reduce the problem space
Use space and small multiples intelligently
Use interaction to generate relevant views
Rarely does a single visualization answer all questions. Instead, the ability to generate appropriate visualizations quickly is critical!