Evaluation of Serial Periodic, Multi-Variable Data Visualizations

Alexander Mosolov
13705 Valley Oak Circle
Rockville, MD 20850
(301) 340-0613
AVMosolov@aol.com

Benjamin B. Bederson [i]
Computer Science Department
Human-Computer Interaction Lab
3171 A.V. Williams Building
University of Maryland
College Park, MD 20742
bederson@cs.umd.edu

ABSTRACT
In this paper, I present the results of an evaluation of the effectiveness of a new technique for the visualization and exploration of serial periodic data. At this time, the only other visualization to support this task is the Spiral by Carlis and Konstan [1], which has an issue with space usage that I attempt to address: the data points on the fringes of the spiral are sparse, and the data points towards the middle are crowded. My solution is to use a grid-like structure, where space is used uniformly throughout and no space is wasted. I have conducted a study to compare the effectiveness of several variations of the grid approach for looking at multiple variables simultaneously, and the findings of this study are discussed.

Keywords
Information visualization, DataGrid, Grid, Serial Periodic Data, Multi-Variable Data, Data Exploration, Evaluation, User Study.

INTRODUCTION
Serial periodic data is data that has both serial and periodic properties. The most obvious example is time-based data, where time continually moves forward (the serial aspect), and there are cycles of days, weeks, months, etc. (the periodic aspect). The DataGrid is an attempt to enable users to find periodicity in their data, as well as see other pertinent information once the period has been found. In the DataGrid visualization, the data is displayed in rows and columns, similar to the way the days are arranged on a calendar. The data is explored by interactively varying the number of data points displayed in each row, thus varying the period.
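The row-and-column layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DataGrid implementation: the function names `wrap_into_grid` and `peak_columns` and the synthetic series (one spike every 24 points) are made up for the example.

```python
def wrap_into_grid(series, period):
    """Split a serial sequence into rows of `period` points each.

    A trailing partial row is dropped to keep the grid rectangular.
    """
    n_rows = len(series) // period
    return [series[r * period:(r + 1) * period] for r in range(n_rows)]


def peak_columns(grid):
    """Return, for each row, the column index of that row's largest value."""
    return [row.index(max(row)) for row in grid]


# Hypothetical test signal: a single spike every 24 points, six cycles long.
series = [1.0 if i % 24 == 12 else 0.0 for i in range(24 * 6)]

matching = peak_columns(wrap_into_grid(series, 24))   # display period = true period
too_short = peak_columns(wrap_into_grid(series, 23))  # display period one point short
```

With the matching period the spike sits in the same column of every row, which corresponds to a vertical pattern; with a period one point too short the spike moves one column to the right on each row, which corresponds to a diagonal pattern.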
When the period displayed gets close to a period present in the data, we see a telltale diagonal pattern (see Figures 1-3). When the period that we're displaying the data with matches a periodicity inherent in the data we're exploring, we see a vertical pattern emerge. See Figure 4 for an example of what the results look like when a month's worth of daily light data, taken at 15-minute intervals, is displayed with a period of 24 hours. Note that the periodicity of the data is not the only thing revealed: it's also clear that the light intensity is going up each day (the red is brighter towards the bottom), and that the day is getting longer (the red column is getting slightly wider towards the bottom). These additional observations make sense given that the data displayed is for January, in the Northern hemisphere, when this is exactly what's supposed to be happening.

EXPLORING MULTIPLE VARIABLES
When only one variable is displayed, each of the small rectangles seen in Figures 1-4 corresponds to one reading of a single variable for a point in time, with the intensity of the color reflecting the value of the variable. DataGrid also allows the user to look at up to 3 variables simultaneously. Each additional variable is displayed using a different color (red and green for 2 variables; red, green, and blue for 3 variables). There are 5 different ways of combining the variables on the screen:

Diagonal: each rectangle is split diagonally, and the portion allocated to each variable is colored accordingly.
Horizontal: each rectangle is split horizontally, and each section is colored accordingly.
Vertical: same as Horizontal, but the rectangle is split vertically.
Color Blend: each rectangle's color is the blend of the red, green, and (for 3 variables) blue color components of each variable.
Multiple Views (MV): displays, one below another, 2 to 3 single-variable views, all of which are controlled simultaneously.
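As a rough illustration of the Color Blend method listed above, the sketch below maps up to three readings onto the red, green, and blue channels of one rectangle. The paper does not say how DataGrid scales values to color intensity, so the linear 0.0-1.0 to 0-255 mapping and the function name `blend_cell` are assumptions.

```python
def blend_cell(values):
    """Blend 1 to 3 normalized readings (0.0-1.0) into one (R, G, B) color.

    The first variable drives red, the second green, the third blue.
    Unused channels stay at 0.
    """
    if not 1 <= len(values) <= 3:
        raise ValueError("Color Blend has one channel per variable, so 1 to 3 variables")
    channels = [0, 0, 0]
    for i, value in enumerate(values):
        clamped = min(max(value, 0.0), 1.0)  # guard against out-of-range input
        channels[i] = round(clamped * 255)
    return tuple(channels)


red_and_green = blend_cell([1.0, 1.0])        # both variables at maximum
green_and_blue = blend_cell([0.0, 1.0, 1.0])  # first variable at minimum
```

Full red plus full green comes out as yellow and full green plus full blue as cyan, the same combinations mentioned in the user feedback. Because there are only three channels, this scheme has no room for a fourth variable.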
The effectiveness of these different methods is evaluated by the study outlined below. See Figures 5-9 for examples of what each method looks like.

STUDY DESCRIPTION
The study conducted was relatively small (10 subjects total). Each subject was asked to perform a variety of tasks on each of 10 different datasets. Five datasets were 2-variable, and 5 were 3-variable. For each dataset, each subject used only one of the 5 possible visualization methods. The methods were staggered across datasets and users so that the ease or difficulty of performing the tasks on a particular dataset did not affect the outcome of the study. The Multiple Views method was used as a baseline to compare the other methods against, as it does not attempt to combine multiple variables in the same space.

Tasks: Each subject had two tasks to perform for each dataset:
correctly identify the period, if any, of each variable being displayed;
identify the relationship, if any, between the variables being displayed.

Datasets: Pseudo-randomly generated data, suited for the study. The following 2-variable datasets were generated:
1) same period for both variables, variables directly related
2) same period for both variables, variables inversely related
3) same period for both variables, variables not related
4) different periods for each variable, variables not related
5) one variable is periodic, one isn't
The 3-variable datasets followed the same patterns as the 2-variable sets (randomly generated again), with an extra, unrelated variable added in. The goal was to measure the effect of the extra clutter introduced by adding another variable to the display.

Measured Variables:
The time the subjects took to complete each task.
The correctness of the answer (binary, either correct or incorrect).

STUDY RESULTS
The Mann-Whitney Test was used to analyze the gathered data for statistical significance. See Tables 1-7 for detailed results of the test. The following comparisons were made, with the following results:

Test I
The time taken to find the period of the first variable was compared, for every method, against the time taken by the Multiple Views method. For 2-variable datasets, the MV method performed significantly better than all the other methods, except for Horizontal, where the difference was not significant.
For 3-variable datasets, the MV method performed worse than all the other methods except Horizontal (so Horizontal did relatively worse than it had for 2 variables), but these differences were not statistically significant.

Test II
The total time taken to perform all the tasks was compared, for every method, against the time taken by the MV method. For both 2- and 3-variable datasets, the MV method outperformed its counterparts; however, the difference was only significant in one case, when it was compared against the Diagonal method on 3-variable datasets.

Test III
The time taken to identify any relationship between the displayed variables was compared, for every method, against the time taken by the MV method. In all cases except one, the other methods outperformed MV, but the difference was not statistically significant. The exception was the Vertical method for 3-variable datasets, where MV outperformed it, but also not significantly.

Test IV
The time taken to find the period of the first variable in a 2-variable dataset was compared, for every method, against the time taken by the same method to do the same task for a 3-variable dataset. None of the differences were statistically significant; notably, however, the 2 largest ones were for the Horizontal and MV methods.

The correctness of the subjects' answers was not analyzed for significance, as the fraction of incorrect answers turned out to be extremely small.

ANALYSIS OF STUDY RESULTS
Test I shows that while MV clearly outperforms the other methods for 2 variables, for 3 variables the other methods, apart from Horizontal, slightly outperform it. This is probably because, as the number of variables goes up, the space allotted to each variable in the MV method goes down. This indicates that sharing the given space between multiple variables becomes more efficient than simply splitting the space up as the number of variables displayed goes from 2 to 3.
It seems likely that this trend would continue, and become more pronounced, as the number of variables is increased. Test I also indicates that the Horizontal method is adversely affected by the increase in the number of variables, compared to the other methods except MV.

Test II shows that MV outperforms, though mostly insignificantly, all the other methods on the total time taken to complete all the tasks. Looking at the data, I think this is because the first variable was always periodic, and the other 2 weren't always so. Subjects generally took a lot of time to identify that something wasn't periodic, and this was much clearer in the MV view. With the other methods, the extra time was usually spent making sure that there really wasn't a pattern there, whereas for MV it was very clear. However, since the subjects generally felt that there wasn't a pattern, and were just trying to make sure that was the case, it's reasonable to suppose that with more experience using the other methods, they would be more comfortable identifying something as non-periodic.

Test III shows MV slightly outperformed by all methods except Vertical at the task of identifying relationships between variables. I think this is because, in the methods where the space was shared, the users were able to glean extra information about the variable relationships while trying to identify individual variable periods. In fact, many times the subjects identified the variable relationships immediately. With MV, the users gained no extra information from identifying the periods, and looking for relationships was a whole new task to them. The Vertical method tended to introduce a lot of confusion, because the vertical splitting of the rectangles inadvertently introduced a lot of vertical patterns that made the vertical patterns due to periodicity harder to find. It also made variables appear to be inversely related, as the different colored vertical lines appeared side by side (see Figure 10).

Test IV, though it did not produce statistically significant results, seems to indicate that the Horizontal and MV methods suffered most from the clutter introduced by adding a third variable. For MV, this is consistent with the results from Test I: MV is more affected by the reduction of available space for each variable than the other methods are by being forced to fill the shared space with more variables. For Horizontal, I think this is because adding more horizontal lines per rectangle increases the vertical separation between values of the same variable, making vertical patterns harder to spot. This is also consistent with Test I's results.

USER FEEDBACK
The subjects seemed to be excited about using the visualization tool, and largely enjoyed the process of completing the tasks, in particular when they were able to quickly spot patterns in the data they were working with. The Vertical method seemed to cause a lot of confusion, and the study results bear that out somewhat. Many subjects commented that the Color Blend method was hard to use, as they weren't quite sure which colors combine to create which. Despite that, the Color Blend method did quite well. I think that's because, even if someone isn't quite sure which color combinations form which, they have an intuitive sense for it. For example, someone looking for red would be more likely to look at yellow (red + green) than at cyan (blue + green). In fact, one user commented, "This is pretty cool. I just think red, and I see the pattern. I didn't even notice the other colors."

CONCLUSION
So, which of the methods is better? What are any of these methods good for? Obviously, the data under consideration needs either to be known to be periodic, or to be evaluated for periodicity. If only 2 variables need to be displayed, then Multiple Views is probably the best choice. The Horizontal method is a close second. For 3 variables, Color Blend and Diagonal seem to be the best choices: they maintain a sense of vertical continuity, like Vertical, but, unlike Vertical, don't introduce false patterns. For 4 or more variables, Color Blend isn't an option, which leaves Diagonal. Its effectiveness for that many variables would need to be explored more, but it shows some promise.

ACKNOWLEDGMENTS
I'd like to thank the Mote Marine Laboratory (www.mote.org) for providing the weather data.

REFERENCES
1) Carlis and Konstan. Interactive Visualization of Serial Periodic Data. ACM Symposium on User Interface Software and Technology (1998), 29-38.

APPENDIX A: FIGURES

Figure 1: Light information for January 2000, displayed with a 21-hour period. We see a very slanted diagonal pattern.

Figure 2: Light information for January 2000, displayed with a 22-hour period. The diagonal pattern is a little less slanted.

Figure 3: Light information for January 2000, displayed with a 23-hour period. The diagonal pattern becomes less and less slanted as we get closer to the period of the variable.

Figure 4: Light information for January 2000, displayed with a 24-hour period. The vertical pattern we see indicates that we have found the period of the variable (which, in this case, was obviously 24 to begin with).

Figure 5: Diagonal method, 2 variables. A vertical pattern is about to emerge for both red and green, both of which have the same period.

Figure 6: Horizontal method, 2 variables. A vertical pattern has emerged for both red and green, which are inversely related. Note that there is some vertical discontinuity for both colors. This becomes worse in the 3-variable case.

Figure 7: Vertical method, 3 variables. A vertical pattern is about to emerge for both red and green, which share the same period. Note that although blue is non-periodic, we can see definite vertical strips of it.

Figure 8: Color Blend method, 3 variables. We see red and green have the same period (which is currently displayed), and are inverses. Blue, which is non-periodic, doesn't appear to introduce much clutter.

Figure 9: Multiple Views method, 3 variables. We are at the correct period for red. Green and blue are non-periodic; note the telltale absence of diagonal lines in either.

Figure 10: Vertical method, 3 variables. We are not at the correct period for any variable, but we see strong vertical patterns. Also, the variables (falsely) appear to alternate, creating the impression that there is some inverse relationship.

APPENDIX B: TABLES

The Z value is the standardized test statistic of the Mann-Whitney test. |Z| >= 1.96 means that a finding is significant at a confidence level of 95%.

Tables for Test I

Method        Z value   Significance level
Diagonal      -2.00     0.05
Horizontal    -0.72     Not significant
Vertical      -2.31     0.05
Color Blend   -2.00     0.05
Table 1: MV vs. all other methods, time to identify period of 1st variable, for 2-variable datasets. Negative Z values indicate MV performing better.

Method        Z value   Significance level
Diagonal       0.11     Not significant
Horizontal    -0.42     Not significant
Vertical       0.04     Not significant
Color Blend    1.25     Not significant
Table 2: MV vs. all other methods, time to identify period of 1st variable, for 3-variable datasets. Negative Z values indicate MV performing better.

Tables for Test II

Method        Z value   Significance level
Diagonal      -1.93     Not significant
Horizontal    -0.94     Not significant
Vertical      -1.40     Not significant
Color Blend   -1.47     Not significant
Table 3: MV vs. all other methods, total time to perform all tasks for 2-variable datasets. Negative Z values indicate MV performing better.

Method        Z value   Significance level
Diagonal      -2.08     0.05
Horizontal    -1.40     Not significant
Vertical      -1.25     Not significant
Color Blend   -1.25     Not significant
Table 4: MV vs. all other methods, total time to perform all tasks for 3-variable datasets. Negative Z values indicate MV performing better.

Tables for Test III

Method        Z value   Significance level
Diagonal       0.86     Not significant
Horizontal     0.79     Not significant
Vertical       0.56     Not significant
Color Blend    0.23     Not significant
Table 5: MV vs. all other methods, time to identify relationships between variables in 2-variable datasets. Positive Z values indicate MV being outperformed.

Method        Z value   Significance level
Diagonal       0.56     Not significant
Horizontal     1.09     Not significant
Vertical      -0.34     Not significant
Color Blend    0.18     Not significant
Table 6: MV vs. all other methods, time to identify relationships between variables in 3-variable datasets. Positive Z values indicate MV being outperformed.

Tables for Test IV

2 vs 3 var    Z value   Significance level
Diagonal       0.94     Not significant
Horizontal    -1.47     Not significant
Vertical       0.26     Not significant
Color Blend    0.49     Not significant
Mult. Views   -1.47     Not significant
Table 7: 2-variable vs. 3-variable times to identify period of 1st variable, for each method vs. itself. Negative Z values indicate that the method performed better on 2-variable datasets.
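For readers who want to see how a Z value of this kind arises, here is a minimal Python sketch of the Mann-Whitney test using the large-sample normal approximation. It is an illustration, not the analysis code used in the study: the paper does not say whether a tie or continuity correction was applied, so this sketch applies neither, and its results need not match every table entry exactly. The function names are made up for the example; the two samples are the Multiple Views and Diagonal rows of the 2-variable period-identification times in Appendix C.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def mann_whitney_z(baseline, other):
    """Z for `baseline` vs `other`; negative means baseline times rank lower (faster)."""
    n1, n2 = len(baseline), len(other)
    ranks = average_ranks(list(baseline) + list(other))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2       # U statistic for the baseline sample
    mean_u = n1 * n2 / 2                          # expected U under the null hypothesis
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    return (u - mean_u) / sd_u


mv = [34, 21, 5, 2, 5, 8, 5, 5, 10, 39]
diagonal = [34, 120, 7, 224, 82, 100, 4, 23, 18, 12]
z = mann_whitney_z(mv, diagonal)
```

Under these assumptions `z` comes out at roughly -2.00, in line with the Diagonal row of Table 1, and its magnitude exceeds the 1.96 threshold. Other rows may differ by a few hundredths depending on the correction the original analysis used.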

APPENDIX C: RAW TIME DATA

Below is the raw data gathered during the study. Data on the correctness of the subjects' answers is not included, as the vast majority of the answers were correct, and there didn't appear to be any correlation between the time it took to answer a question and the answer's correctness.

Times to identify period of 1st variable, for 2-variable datasets:
Diagonal        34  120    7  224   82  100    4   23   18   12
Horizontal      24   90    8    9   79   11    8    8    3    2
Vertical        35   60    7    9   34   21   18   73   12   43
Color Blend     80  123   16   14   26   10   12   12   12   22
Mult. Views     34   21    5    2    5    8    5    5   10   39

Times to identify period of 1st variable, for 3-variable datasets:
Diagonal        10   75   10   50   10   29   13   71   17   10
Horizontal      14   50    8   30   27   20   31   13   53   10
Vertical        10   10    6   45    8   21  180  105   12   27
Color Blend     12   13   15   41   13   13   14   50   12    5
Mult. Views     30   27   17   67   16   11   21   19    4   15

Times to finish analyzing 2-variable datasets:
Diagonal        57  235   15  345  207  120    9   34   20   65
Horizontal      36  123   10   16  118   34   17   15    5   57
Vertical        51  120    9   20   84   44   21   75   33   70
Color Blend    125  190   22   24   40   21   25   18   14   51
Mult. Views    454   54    7    9    9   22    7    7   12  109

Times to finish analyzing 3-variable datasets:
Diagonal        11  240   11   87   73  105  185  131  141   84
Horizontal      15  175   31  140   49   71   92   16  147   93
Vertical        12  130   23  107   22  113  317  146  101   34
Color Blend     22  370   32  116   60   88  133   86   14   46
Mult. Views     35   80   34   89   34   51   28   37   10   44

Times to find relationship between variables, 2-variable datasets:
Diagonal        11    1    1    1    1   44   20    1    3    1
Horizontal      11    1    1    1    1   10    1    1    1    5
Vertical        17    1    1    5    1   26    1    1    1    4
Color Blend      6    1    1   10    1   23    1    9    1    2
Mult. Views      9    1    1   34    1   68    1    5    1    2

Times to find relationship between variables, 3-variable datasets:
Diagonal        35    1    5    1   49   15    3    2    1    4
Horizontal      13    5    1    1    5    9    1   11    2    3
Vertical        11  105    6   13    7    5    1   14    5    1
Color Blend     49   69    5    5    1   30    1    6    3    2
Mult. Views     31    4   15    4    1    5    2   10   10    7

[i] This work was originally started as a class project for an Information Visualization course taught by Prof. Bederson at the University of Maryland.