Flow Cytometry Histograms: Transformations, Resolution, and Display

Review Article

David Novo,1* James Wood2

1 De Novo Software, 3250 Wilshire Blvd. Suite 803, Los Angeles, California
2 Department of Cancer Biology, Wake Forest University School of Medicine, Winston Salem, North Carolina 27157

Received 25 March 2008; Accepted 6 May 2008

*Correspondence to: David Novo, 3250 Wilshire Blvd. Suite 803, Los Angeles, CA 90010, USA. Email: david.novo@denovosoftware.com

Published online 8 July 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cyto.a.20592

© 2008 International Society for Advancement of Cytometry

Abstract

Flow cytometry data analysis routinely includes the use of one- or two-parameter histograms to visualize the data. These histograms have traditionally been plotted with either a linear or logarithmic scale. However, the recent trend of performing the logarithmic conversion in software has made apparent some limitations of the traditional visual presentation of logarithmic data. This review discusses the mathematics of presenting data on a histogram and emphasizes the difference between scaling and binning. The review introduces the concept of an effective resolution to describe how the bin width changes in a variable bin-width histogram. The change in effective resolution is used to explain the commonly observed valley and picket fencing artifacts. These result from the effective resolution of the display histogram being too high for the data being presented. Recently, several different binning transformations have been described that are becoming more popular because they allow one to view a large dynamic range of data on a single plot, while allowing the display of negative data values. While each of the transforms is based upon different equations, they all exhibit very similar properties. All of the transforms bin the data logarithmically at high channel values and linearly at low channel values.
The linear scaling of the lower channels serves to limit the effective resolution of the histogram, thus minimizing the valley and picket fencing artifacts. The newer transformations are not without their own limitations, and recommendations for the appropriate manner of presenting flow cytometry data using these newer transformations are discussed.

Key terms: histogram; data display; transformation; binning; scaling

Cytometry Part A 73A: 685-692, 2008

Flow cytometry data consist of multiparameter optical (e.g., light scatter, fluorescence intensities) or electronic (e.g., Coulter volume) measurements from many thousands (or millions) of cells. These data are almost always graphed on one- or two-parameter histograms. The goal of visualizing data of a measured parameter of a sample population is to determine the actual probability function (APF) of the property being investigated. In various statistical literature, the APF is also referred to as the actual probability distribution, actual frequency distribution, or actual probability density distribution. A histogram is a direct tabulation of the frequencies of the measured values obtained by measuring a specific parameter of a sample population (1). Flow cytometry data histograms have been universally interpreted as being an accurate visual representation of the APF of the property being measured. They are traditionally displayed on either linear or logarithmically scaled axes. The logarithmic transformation has been found to be useful because it allows for the visualization of at least four decades of dynamic range on a single graph, and it makes log-normal distributions, common to biological systems, appear more symmetrical. Some instruments perform the logarithmic transformation in hardware, using logarithmic amplifiers, while other instruments only save linearly scaled data, and the logarithmic transformation occurs in software (2). More recently, several different authors have proposed alternatives to the logarithmic function for the display of flow cytometry data (3-7) to overcome some of the logarithmic transformation's perceived shortcomings.

This review will discuss the following: (i) the difference between binning and scaling of flow cytometry data; (ii) the display of data in uniform and variable bin-width histograms; (iii) visual artifacts associated with variable bin-width histograms; (iv) how these new transformations attempt to display the flow cytometry data distributions so as to accurately reflect the underlying distribution of the cell populations in the sample; and (v) caveats of the new transformations.

SCALING VS. BINNING

Flow cytometry histograms are a direct tabulation of the frequencies of measured values in a fixed number of channels or bins. They are often described as being displayed in a particular scale without reference to the underlying process of binning. However, scaling and binning are separate procedures that impact how data are visualized in a histogram. Scaling refers to the scale associated with an axis and does not describe the width of the bins in the histogram.

The difference between scaling and binning can be clearly appreciated in Figure 1. Figures 1B and 1C show data from the same APF as the data displayed in Figure 1A, on an apparent logarithmic display. The logarithmic display was accomplished either by simply changing the scale in Microsoft Excel (Fig. 1B) or by simulating the digitization of a logarithmically amplified signal as it would be performed in the flow cytometer (Fig. 1C). The plot in Figure 1B preserves the peak heights and the overall shape of the curve, whereas the plot in Figure 1C does not. Figures 1D-1F show data from 100,000 cells simulated from a uniform APF, i.e., all possible values were equally likely to be measured. Figure 1D shows the data on a linear scale, where the APF is clearly understood.
The data is displayed on a logarithmic axis either by changing the scale in Microsoft Excel (Fig. 1E) or by simulating a logarithmically amplified signal (Fig. 1F). The population in Figure 1F appears skewed, with many more events having high data values than lower ones. Even these simplistic data sets show that there is a clear difference between the different mechanisms of displaying data on a logarithmic scale.

Scaling of an axis refers to changing the relative position of coordinates along the axis. Simply changing the scale preserves the number of peaks and the peak heights, but can make peaks appear wider, narrower, or skewed, or can change their relative positions, depending on the scaling used. Visually comparing peaks in different regions of a nonlinear scale can be nonintuitive, since the peaks may look dramatically different (e.g., larger or smaller in apparent area) solely due to the nonlinear axis scaling. For example, the logarithmic scale will make distributions in the lower portion of the scale look disproportionately larger in area than an identical distribution displayed in the upper portion of the scale.

Binning is a mathematical process that also affects how an APF appears in a histogram. All data displayed on flow cytometry histograms are binned data. Binning is the process of assigning data to discrete adjoining categories, generally referred to as channel values or bins. The appearance of the histogram is affected by the width and number of bins into which the APF is distributed, and by whether the bins all have the same width or have variable widths. Binning is initially performed by the flow cytometer electronics itself, when an analog-to-digital converter (ADC) samples an analog signal and assigns it to a discrete channel. The ADC resolution determines the number of possible channels to which a given analog signal can be assigned.
Most of the older flow cytometers in use today have 10 bit ADCs (1,024 channels), while newer cytometers have up to 24 bit ADCs (16,777,216 channels). It may not be commonly appreciated that the data shown on flow cytometry plots is generally further binned by the flow cytometry data analysis software at a lower resolution than that at which it was digitized. The decrease in resolution serves to emphasize the underlying shape of the distribution without being distracted by noise that is often evident in the higher resolution data. The calculation of the optimal number of bins needed to represent an APF has been a common subject in statistics literature (8). For instance, if 5000 events are acquired on an 18 bit resolution cytometer (262,144 possible channels), it is likely that there will be no more than a few events in any particular channel (Fig. 2A). Reducing the resolution to 10 bits (Fig. 2B) allows for a more accurate representation of the underlying APF.

The simplest binning algorithm is linear binning, shown in Eq. (1), where O(y) is the underlying APF, R is the range of the original data, and b is the number of bins.

$$B(x) = \int_{rx}^{r(x+1)} O(y)\,dy, \quad \text{where } r = \frac{R}{b}. \tag{1}$$

Linear binning has the characteristic that each individual channel in the binned dataset always represents the same range of the original dataset. Linear binning to a lower resolution will always increase the number of counts in the binned channels as compared to the original and can result in a decrease in the number of peaks if the resolution is lowered sufficiently.

Another way to bin the data is logarithmically, according to the formula shown in Eq. (2), where a is the base and $c = \log_a(R)/b$.

$$B(x) = \int_{a^{cx}}^{a^{c(x+1)}} O(y)\,dy. \tag{2}$$

Logarithmic binning can result from (i) the analog signal passing through a logarithmic amplifier and an analog-to-digital converter, or (ii) software transforming linearly binned data that was saved by the flow cytometer.
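As a rough illustration of Eqs. (1) and (2), the sketch below assigns simulated events to channels using either linear or logarithmic bin edges. It is a minimal example under stated assumptions, not code from the original article: the function names, the 10,000-unit range with 1,024 bins, and the decision to place values below 1 in channel 0 are all choices made here for illustration.

```python
import math
import random


def linear_channel(y, data_range, n_bins):
    """Linear binning: channel x covers [r*x, r*(x+1)) with r = R / b."""
    r = data_range / n_bins
    return min(int(y // r), n_bins - 1)


def log_channel(y, data_range, n_bins, base=10.0):
    """Logarithmic binning: channel x covers [a**(c*x), a**(c*(x+1))),
    with c = log_a(R) / b. Values below 1 are placed in channel 0
    (an assumption of this sketch, since the log scale starts at 1)."""
    if y < 1.0:
        return 0
    c = math.log(data_range, base) / n_bins
    return min(int(math.log(y, base) / c), n_bins - 1)


def histogram(values, channel_fn, data_range, n_bins):
    """Tabulate event counts per channel using the given channel function."""
    counts = [0] * n_bins
    for v in values:
        counts[channel_fn(v, data_range, n_bins)] += 1
    return counts


random.seed(0)  # fixed seed so the simulated events are reproducible
R, B = 10_000.0, 1024  # hypothetical data range and number of bins
events = [random.uniform(0.0, R) for _ in range(100_000)]

lin_counts = histogram(events, linear_channel, R, B)
log_counts = histogram(events, log_channel, R, B)
```

With uniformly distributed input, `lin_counts` stays roughly flat across channels, whereas `log_counts` climbs with channel number because the upper channels cover far more of the original range. This is the skew described for Figure 1F.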
In contrast to linear binning, no two logarithmically binned channels represent the same absolute range in the original function. For example, channel 1023 of an ideal 4 decade logarithmically amplified histogram samples from 10,000 times as much of the original voltage signal range as channel 1. In fact, the range of the original signal represented by each channel in the logarithmic histogram increases exponentially with increasing channel number. Each channel width increases by the same relative fraction as the previous one. Thus, in contrast to a linearly binned histogram, in which the relative width of the histogram channels decreases and the absolute width stays constant, the relative width of the logarithmic histogram channels remains constant and the absolute width increases exponentially. Logarithmic histograms therefore have the advantage that the widths of two populations that have the same coefficient of variation (CV) will be nearly identical, irrespective of their position on the histogram. This allows for easy visual comparison of the number of cells in different peaks anywhere along the axis range, provided they have the same CVs.

While the end result of logarithmic binning may be to fit multiple decades of dynamic range onto an axis, it is not the same as applying a logarithmic scale to the axis. While displaying data on a logarithmic axis can be useful for biological data, care must be taken to appreciate that for a typical flow cytometry plot the logarithmic axis is a byproduct of the nonuniform bin widths and can introduce distortions along the axis that can lead to misinterpretation of the APF. In contrast, because true logarithmic scaling only affects the horizontal axis, the shape of the original APF is usually more recognizable (Fig. 1).

Figure 1. Comparison of logarithmic displays obtained by binning and scaling. Populations were created from 5,000 simulated voltage measurements ranging from 0.001 to 10 V. Linear and logarithmically amplified histograms were constructed by assigning the simulated measurement to a channel based on the formulas: Linear Channel = (x/10) × 1024, where x is a simulated measurement; Log Channel = [log(x) + 3] × 256, where x is a simulated measurement. (A-C) 5,000 cells were simulated to have a mean and standard deviation of 1 and 0.8 V, respectively. (A) Data were graphed as if the signal was acquired through a 10 bit ADC. Channel 0 contains 573 events. (B) Data from Figure 1A were graphed on a log scale using Microsoft Excel. (C) Data were graphed as if the signal was acquired using an ideal logarithmic amplifier. Channel 0 contains 540 events. Median value is 991. (D-F) 100,000 cells were simulated from a uniform distribution. (D) Data were graphed as if the signal was acquired through a 10 bit ADC. (E) Data from Figure 1A were graphed on a log scale using Microsoft Excel. (F) Data were graphed as if the signal was acquired using an ideal logarithmic amplifier. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Figure 2. Effect of lowering the resolution on the ability to visualize the APF. Lowering the resolution of the histogram is often critical for visualizing the structure of the APF. Five thousand cells were simulated and channel values assigned using a simulated 18 bit ADC as described in Figure 1. Data were plotted on histograms with 18 bit (A) or 10 bit (B) resolution.

EFFECTIVE RESOLUTION

As discussed previously, when data is binned logarithmically, no two binned channels represent data from the same range of the original data. Thus, it is convenient to define the effective resolution (ER) of a particular channel in the logarithmically binned histogram as the number of channels there would be if the original data were binned linearly with the same bin width as that particular log channel. The topmost channel in a 1,023 resolution log scaled histogram samples from 0.9% of full scale.
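In the notation of Eq. (2), this definition can be written compactly. The expression below is a sketch rather than a formula from the original text: the symbol ER(x) and the assumption that the data span 1 to R (so that $a^{cb} = R$) are introduced here for illustration.

$$\mathrm{ER}(x) = \frac{R}{a^{c(x+1)} - a^{cx}} = \frac{R}{a^{cx}\,(a^{c} - 1)}, \qquad \mathrm{ER}_{\text{bits}}(x) = \log_2 \mathrm{ER}(x).$$

For the topmost channel, $x = b - 1$, the channel width as a fraction of full scale is $1 - a^{-c}$. With $a = 10$ and $c = 4/1024$ this is approximately 0.009, which agrees with the 0.9% figure quoted above.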
Dividing the histogram into channels with widths equal to 0.9% of full scale results in 111 channels, or 6.79 bit ER. Conversely, the ER of channel 0 on a logarithmically binned histogram is greater than 20 bits. Figure 4 shows how the ER changes across the entire range of 10 bit histograms spanning 4 and 5.41 decades, respectively. In both cases, the maximum effective resolution (MER) is far greater than the nominal resolution of the 10 bit ADC itself, which was the original intent of the logarithmic conversion.

The change in effective resolution along the histogram gives rise to two visual artifacts that have been previously noted in the literature. The first will be termed the valley artifact, which has been described previously in Refs. 3, 6 and 9. Briefly, as the gain on a particular channel was decreased, the population moved towards the origin, as expected. However, this only happened up to a point. Once the gain was decreased further, the visible peak appeared to decrease in size and large numbers of cells began appearing on the axis. Figure 3 shows how, as the mean of a distribution with a given standard deviation (SD) is decreased, the distribution mode approaches the value of the SD as the mean approaches, and then falls below, the value of the SD. As the mean decreases below the SD, the mode stays fixed and the peak height at the mode decreases. The accumulation of events on the axis is dependent on several factors, including the acquisition electronics. For the remainder of the paper, the term valley artifact will refer to the decrease in the count values close to channel zero. Cells were never visible in what appeared to be a valley between the peak and the axis (3).

Figure 3. As the mean of a Gaussian distribution with a specific SD is decreased, the value of the mode of the distribution tracks the value of the mean until the value of the mean approaches the value of the SD. As the value of the mean becomes less than the value of the SD, the mode remains fixed at a value close to the value of the SD. This is commonly observed when applying spectral compensation to flow cytometry data. As the compensation is increased, the mean and mode will initially decrease together, with the mode finally becoming stationary. At this point the valley artifact becomes evident as the area between the mode and the histogram origin.

The second visual artifact is the picket fencing artifact, shown in Figure 5. Figure 5A shows the population displayed on a linear axis, where it is clear that there is only a single population. However, when this data is graphed on a logarithmic axis (Fig. 5B), a picketed pattern appears near the origin, making it appear as if there are many discrete populations. Pickets can also be observed in two-dimensional plots.

The change in ER from the low end to the high end of the histogram explains both the valley and picket fencing artifacts. These artifacts result from the interplay between the resolution of the data and the ER of the histogram on which the data is being plotted. The bulk of the remainder of this paper will explore this relationship, which is critical for understanding both the source of the artifacts and why they are not seen with alternative binning functions.

EXPLANATION OF VALLEY ARTIFACT

The valley artifact is most evident when the standard deviation of the population approaches the value of the population mean, so that a significant portion of the APF lies below the first histogram channel. A simple explanation for the valley artifact is that channels at the low end of the logarithmically binned histogram sample from such a small range of the original signal that the probability of an event resulting in a signal that falls into that range becomes very small. This is analogous to the phenomenon described in Figure 2A; however, in that case the bins were too small along the entire range of the data. In the case of the logarithmic histogram, the bins only become too small towards the origin. This almost always includes at least the first decade of the logarithmic histogram. From a mathematical point of view, the ER of the histogram increases exponentially with decreasing channel number. If the dynamic range of the log converter were increased (i.e.
the low end of the scale were to be extended), it would only serve to increase the apparent width of the valley. The valley artifact will be evident for any peak where the SD of the peak approaches the mean, i.e., where a significant proportion of the events in the peak lie on the axis (Fig. 3). In these cases, the frequency distribution displayed by the logarithmically binned histogram does not provide an easily recognizable representation of the APF.

It is possible to correct for the variable bin width by dividing the frequency distribution histogram values by the channel width, so that the frequency per unit channel is plotted. This is the definition of a binned probability density histogram, and histograms constructed in this manner appear similar to the histograms displayed in Figures 1B and 1E. In fact, given a sufficient number of events per channel, it is possible to recover the shape of the APF from the logarithmic histogram by normalizing the data to the histogram bin width to reflect the relative ER of each bin (10,11).

Figure 4. Effective resolution of a histogram changes along the axis. Note that the Y-axis is a base 2 logarithmic scale. Dashed line: effective resolution of a 10 bit, 4 decade histogram. Solid line: effective resolution of a 10 bit, 5.41 decade histogram.

EXPLANATION OF PICKET FENCING

The picket fencing artifact results from improper graphing of the variable sized channels at the low end of the logarithmic histogram. Picket fencing occurs when data are improperly binned into a histogram whose MER is greater than that of the data itself. Figure 4A shows that a 10 bit, 4 decade log histogram has a MER of 20 bits. Logarithmic amplifiers sample data directly from the analog signal, the resolution of which will always exceed the MER of whatever histogram it is eventually binned into.
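The effective resolution figures quoted in this review for a 10 bit logarithmic histogram can be checked numerically. The sketch below is a minimal illustration, not code from the original article: the function names are hypothetical, and it assumes base 10 bin edges running from 1 up to 10 raised to the number of decades, with ER taken as full scale divided by channel width.

```python
import math


def log_bin_edges(n_bins, decades, base=10.0):
    """Edges a**(c*x) for x = 0..n_bins, where c = decades / n_bins.
    The data range is assumed to run from 1 to base**decades."""
    c = decades / n_bins
    return [base ** (c * x) for x in range(n_bins + 1)]


def effective_resolution_bits(edges):
    """ER of each channel in bits: log2(full scale / channel width)."""
    full_scale = edges[-1]
    return [math.log2(full_scale / (hi - lo))
            for lo, hi in zip(edges, edges[1:])]


# 10 bit histograms spanning 4 and 5.41 decades, as in Figure 4
er_4 = effective_resolution_bits(log_bin_edges(1024, 4.0))
er_541 = effective_resolution_bits(log_bin_edges(1024, 5.41))

mer_4 = max(er_4)      # maximum effective resolution, found at channel 0
mer_541 = max(er_541)
```

Under these assumptions, `er_4[-1]` comes out at about 6.8 bits, close to the 6.79 bit figure obtained from the rounded 0.9% channel width, and `mer_4` is slightly above 20 bits, in line with the 20 bit MER stated for the 4 decade histogram. Widening the scale to 5.41 decades raises the MER further.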
The lack of picket fencing in instruments with logarithmic amplifiers is not caused by so-called nonlogarithmic behavior, amplifier noise, or random digitization errors associated with the conversion of small signals. It is simply that the original analog signal has sufficient resolution to be graphed on a histogram with a 20 bit MER. However, in newer digital instruments, the logarithmic conversion is performed in software on data that has already been binned to a finite resolution by an ADC. Incorrectly graphing data in the region of a histogram where the ER exceeds the resolution of the ADC will result in the picket fencing artifact. This can be appreciated visually in Figure 6A, which is simply a magnified view of the histogram in Figure 5B. Notice that the width of the individual peaks (or pickets) representing a channel is constant, whereas the width of the actual channel changes along the X-axis. To fill up the entire first channel on the logarithmic histogram with peaks as narrow as shown, the original digitized data would need many fractional values between 1 and 2, which do not exist. Placing all the events from channel 1 of the original data at the leftmost border of channel 1 on the logarithmic histogram gives the impression that the resolution of the original data is higher than it is, i.e., it appears that all of the original data is really at 1.00000. A more visually accurate graphing of the data would spread the data from the original channel over the entire range it represents on the logarithmic scale. This is why smoothing or dithering works to reduce picket fencing. Just as many linear channels are binned into a single log channel at the high end of the log scale, at the low end of the logarithmic histogram a single linear channel should be spread over many logarithmic channels. Figures 6B and 6C show the data properly graphed on a log scale. The individual peaks in Figure 6B are blockier than in Figure 5B, which accurately reflects the varying widths of the channels. Notice that the counts in Figure 6C are less than the corresponding counts in Figure 6A because the data is spread over multiple logarithmic channels. The blockiness is also apparent when data are properly graphed on 2D plots (Fig. 7A); however, if the data are graphed on a density plot, with the background color matching the color used for low density events, then much of the undesirable blockiness blends into the background (Fig. 7B).

Figure 5. Picket fencing example. (A) Five thousand cells from a simulated normally distributed population with a mean in channel 30 and a 33% CV on an 18 bit linear scale (only the first 100 channels are shown). (B) The linear data was transformed to log in software and graphed on a log scale.

Figure 6. Correct graphing of data where the MER of the histogram exceeds the resolution of the data. (A) Magnified view of the same histogram as displayed in Figure 5B. (B) Same data as in Figure 5B, with the histogram generated using the interpolation algorithm described in the text. (C) First decade of data from Figure 6B.

Figure 7. 2D plots graphed with corrections to eliminate picket fencing. (A) Standard dot plot. Large blocks near the axis result from the large size of the channels near the origin. (B) Density plots eliminate much of the undesired visual effects associated with the large channels near the axis.

CHARACTERISTICS OF NEW HISTOGRAM SCALE TRANSFORMATIONS

Recently, several different binning transformations have been described (3-7) that are becoming more popular because they allow one to view a large dynamic range of data on a single plot, while allowing the display of negative data values and minimizing valley and picket fencing artifacts. While each of the transforms is based upon different equations, they all exhibit very similar properties. All of the transforms bin the data logarithmically at high channel values and linearly at low channel values. All of the transforms allow for the adjustment of a transition point between the logarithmic and linear binning; the transition may be first derivative matched (4,6) or more gradual, as in the biexponential (5,7) and hyperlog (3) transformations. We use the generic term transition point to refer to the w parameter of the biexponential transformation (5) and the b parameter of the hyperlog transformation (3). The primary effect of adjusting the transition point is to limit the MER of the transformed plot. This ensures that the width of the binned channels never gets so small that the chance of an event being assigned to that channel becomes negligible. Figures 8A and 8B show the effective resolution of all the transformations, with the parameters adjusted so that the resolution does not exceed 18 and 14 bits, respectively. It can be seen that the main difference between the transformations is the way that they transition from logarithmic to linear binning.

Figure 8. Effective resolution of different display scale transformations. Transition parameters were adjusted such that the MER is either 18 (A) or 14 (B) bit. Solid black, logarithmic (4 decade, 10 bit); solid gray, derivative matched; dashed gray, hyperlog; dashed black, biexponential.

Adjusting the parameters of the transformation such that the MER of the histogram does not exceed the resolution of the raw data will ensure that picket fencing artifacts do not appear, since the ER of the plot never exceeds the resolution of the original data. However, it will not necessarily eliminate the valley artifact. Depending on the CVs of the populations and the number of events in the low end of the data, it may be necessary to further decrease the MER so that the lower channels of the histogram have enough events to see some structure in the data. In fact, transition points suggested by the authors (3,5) result in histograms with MERs that are far lower than the resolution of the data of most modern flow cytometers.

These new transformations introduce a new caveat when displaying histogram data. The placement of the transition can have a profound impact on the display of the data. Firstly, if the transition point occurs within a population distribution, it has not been demonstrated that the distribution would not be distorted by the transition from logarithmic to linear behavior. Secondly, when visually comparing histograms between data sets, and sometimes within data sets, it is important that the transition point is the same, since the scaling of the data is strongly dependent on the choice of the transition point. Because the transition point is not usually displayed with the histogram, the investigator needs to verify that any two histograms are truly comparable. Figure 9 shows an example of two similar datasets displayed with differing transition points that give distinctly different visual impressions. The data in Figures 9A and 9B are essentially identical, except that the data shown in Figure 9B contains a few cells with higher negative X-axis (egfp) values than in Figure 9A. This caused the software to calculate a different transition value for the biexponential transformation in Figures 9A and 9B. Since the X-axis in Figure 9B displays a much wider range of negative values, the data on the plots appear visually different, which can be misleading unless care is taken to inspect the axis labels. Figure 9C shows the data in 9B plotted on the same X-axis scale (T = 262,144, w = 50) as 9A.

CONCLUSIONS

To summarize, there is a distinct difference between simply scaling data by changing the relative positions of the axis coordinates and scaling the data by changing the histogram bin widths. The valley and picket fencing artifacts associated with logarithmic binning both result from the effective resolution of the logarithmic histogram increasing as the chan-

channel is too low to see any structure. The picket fencing results from improper graphing of the data in the region where the histogram resolution exceeds the resolution of the instrument s ADC s resolution. New transformations that transition between linear and logarithmic binning help to minimize these artifacts and provide a way to display negative data values. The primary motivation of these transformations is to make the data appear more visually intuitive. The transition parameters of these transformations should be held constant whenever data is compared to avoid plots that are visually misleading solely as a result of differing axis scales. MATHEMATICAL MODELING AND GRAPHICS METHODS Simulations were simulated using Microsoft Excel or custom software written in Delphi (Code Gear, Scotts Valley, CA). Unless otherwise specified the simulation results and flow cytometry data were plotted with FCS Express Version 3 (De Novo Software, Los Angeles, CA). ACKNOWLEDGMENTS The authors thank Howard Shapiro, John Nolan, Ralph Rossi, and Alessio Palini for their insightful comments when reviewing the manuscript. They thank Ger van den Engh for his suggestions that resulted in Figure 7B. They also thank Nicole Beauchamp of Wake Forest University for the data used in Figure 9. Figure 9. The effect of changing the transition parameter on data display. Data in panels A and B are almost identical, however, the BD FACSDiva TM software automatically calculated a transition value for the biexponential function that binned the data differently, resulting in radically different appearance of the data. Panel C shows the same data as Panel B, with the biexponential parameters adjusted in FCS Express TM such that the binning was similar to that shown in Panel A. The differing scales on the X-axis make it difficult to interpret the data. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.] nel number decreases. 
The valley artifact results from the resolution increasing so dramatically that the widths of logarithmic channels become so small that the number of events per LITERATURE CITED 1. Boas ML. Mathematical Methods in the Physical Sciences. New York: Wiley; 1966. pp 695 718. 2. Shapiro HM. Practical Flow Cytometry, 4th ed. New Jersey: Wiley; 2003. pp 35 36. 3. Bagwell CB. Hyperlog-a flexible log-like transform for negative, zero, and positive valued data. Cytometry A 2005;64A:34 42. 4. Battye FL. A mathematical simple alternative to the logarithmic transform for flow cytometric fluorescence data displays. In: 2005 ISAC Samuel A. Latt Conference, Queensland, Australia, November, 2005. 5. Parks DR, Roederer M, Moore WA. A new Logicle display method avoids deceptive effects of logarithmic scaling for low signals and compensated data. Cytometry A 2006;69A:541 551. 6. Wood JCS. Techniques to compress the scale of flow cytometry data: Benefits, artifacts and solutions. Cytometry A 2004;59A:88. 7. Herzenberg LA, Tung J, Moore WA, Herzenberg LA, Parks DR. Interpreting flow cytometry data: A guide for the perplexed. Nat Immunol 2006;7:681 685. 8. Scott DW. On optimal and data-based histograms. Biometrika 1979;66:605 610. 9. Roederer M. Spectral compensation for flow cytometry: Visualization artifacts, limitations, and caveats. Cytometry 2001;45:194 205. 10. Wood JCS. Fundamental flow cytometer properties governing sensitivity and resolution. Cytometry 1998;33:260 266. 11. Wood JCS, Quintana J. Using basic physical parameters to predict instrument performance at low fluorescence light levels. Cytometry 1998;34:295. 692 Flow Cytometry Histograms
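The article's point that a single linear channel should be spread over many logarithmic channels can be illustrated with a short Python sketch. This is not the authors' interpolation algorithm. It is a minimal illustration under one stated assumption: each integer linear channel c represents events spread uniformly over the interval [c, c+1). All function names here are hypothetical, and the 18-bit, 4-decade, 1024-bin (10-bit) layout is chosen only to match the scales mentioned in the figure captions.

```python
import bisect


def log_bin_edges(n_bins, decades, top):
    """Edges of n_bins logarithmically spaced bins spanning `decades` decades up to `top`."""
    low = top / 10 ** decades
    return [low * 10 ** (decades * i / n_bins) for i in range(n_bins + 1)]


def naive_log_histogram(channel_counts, edges):
    """Put each linear channel's whole count into the one log bin containing its value.

    Where log bins are narrower than one linear channel, this leaves empty
    bins between occupied ones (the picket fencing artifact).
    """
    hist = [0.0] * (len(edges) - 1)
    for channel, count in channel_counts.items():
        i = bisect.bisect_right(edges, channel) - 1
        if 0 <= i < len(hist):
            hist[i] += count
    return hist


def spread_log_histogram(channel_counts, edges):
    """Share each linear channel's count among the log bins it overlaps.

    Assumption (not from the article): channel c covers [c, c + 1) uniformly,
    so each log bin receives count * (overlap length / 1).
    """
    hist = [0.0] * (len(edges) - 1)
    for channel, count in channel_counts.items():
        left, right = channel, channel + 1
        for i in range(len(hist)):
            overlap = min(right, edges[i + 1]) - max(left, edges[i])
            if overlap > 0:
                hist[i] += count * overlap
    return hist


def count_gaps(hist):
    """Number of empty bins lying between the first and last occupied bins."""
    occupied = [i for i, value in enumerate(hist) if value > 0]
    if not occupied:
        return 0
    return sum(1 for value in hist[occupied[0]:occupied[-1] + 1] if value == 0)


if __name__ == "__main__":
    # 18-bit linear data shown on a 4-decade, 1024-bin log histogram.
    edges = log_bin_edges(1024, 4, 262144)
    # Hypothetical counts: 100 events in each of linear channels 30..39.
    counts = {c: 100 for c in range(30, 40)}
    print("gaps, naive binning: ", count_gaps(naive_log_histogram(counts, edges)))
    print("gaps, spread binning:", count_gaps(spread_log_histogram(counts, edges)))
```

Under this assumption the total number of events is preserved, but each occupied log bin holds fewer counts than in the naive histogram, which is consistent with the lower counts noted for Figure 6C relative to Figure 6A.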