The Meaning of a Significance Level
Author(s): G. A. Barnard
Source: Biometrika, Vol. 34, No. 1/2 (Jan., 1947), pp. 179-182
Published by: Oxford University Press on behalf of Biometrika Trust
Stable URL: http://www.jstor.org/stable/2332521

THE MEANING OF A SIGNIFICANCE LEVEL

BY G. A. BARNARD

A level of significance is a probability. To say that a given result is significant on the 5 % level means that some class of events has probability 0.05. Now whatever theory we may hold as to the nature of probability, in order to give a statement of probability a precise meaning we must refer to some reference class, or set of data, on which the probability is calculated. What is the reference class involved in a level of significance?

To many people the answer to this question seems simple enough. The reference class involved is the set of indefinite (possibly imaginary) repetitions of the experiment which gave the result in question. Otherwise put, the data on which the probability is calculated are the external conditions of the experiment.

The following example indicates, however, that the meaning of this reference class is not always clear. The example is a modified form of one given by Prof. R. A. Fisher in a letter to the author.

Suppose we have a bag of chrysanthemum seeds, known to give plants having white flowers or plants having purple flowers, no other colours being possible. We suspect that the proportions of white and purple seeds are equal, and to test this hypothesis we select at random ten seeds from the bag, and plant them. Nine of the plants grow to maturity, and all of them have white flowers. On what level of significance can we reject the hypothesis of equality of proportions? We may assume that white and purple plants are equally viable.

It would be natural to argue that, if white and purple flowers were equally likely, the probability of our result would be $1/2^9$. If there is no reason to suspect an excess of white rather than an excess of purple flowers, we must add to this the probability of getting nine purple flowers, which is also $1/2^9$, giving a total probability of $1/2^8$.
The hypothesis of equality of proportions would then be rejected on the 1/256, or the 0.3906 %, level of significance. But if we did this our reference class would not be the set of indefinite repetitions of the experiment, in its ordinary meaning. A repetition of the experiment, in its ordinary meaning, would consist of another selection of ten seeds from the bag, and their planting and growth. On such another occasion all ten plants might grow to maturity, or all or some might die. These possibilities have not been taken into account in our calculation of probability, so far.

To allow for the possible variation in the number of plants which grow, we might lay out the set of all possible results of the experiment as in Fig. 1, where n denotes the number of plants that grow, and r denotes the excess of white over purple. Thus any point in the figure can be referred to uniquely by its co-ordinates (n, r). If we now introduce a parameter p, to denote the probability (if it exists) that a plant will grow to maturity, given that it has been selected, the probability associated with the point (n, r) on the hypothesis of equality of proportions of white and purple will be

$$W(n, r; p) = \frac{10!}{(10-n)!\,\left(\frac{n+r}{2}\right)!\,\left(\frac{n-r}{2}\right)!}\; p^{n}\,(1-p)^{10-n}\,2^{-n},$$

and since this is a function of the unknown p, we have a special problem of arranging the points (n, r) in order of significance before we can establish a test. The situation in this respect is similar to that dealt with in the paper on 2 x 2 tables, printed earlier in this issue (Barnard, 1946, pp. 123-38 above).
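The point probabilities can be checked numerically. What follows is a minimal sketch, assuming W(n, r; p) is read as the binomial probability that n of the ten seeds grow, multiplied by the probability that the n survivors split into (n + r)/2 white and (n - r)/2 purple. The function names and the value p = 9/10 are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from math import comb

TOTAL_SEEDS = 10  # seeds planted in the example


def point_probability(n, r, p):
    """Probability of the point (n, r) when white and purple are equally likely.

    n: number of plants that grow; r: excess of white over purple;
    p: probability that a selected seed grows to maturity.
    """
    if (n + r) % 2 or abs(r) > n:
        return 0  # (n, r) is not a point of the lattice
    whites = (n + r) // 2
    grow = comb(TOTAL_SEEDS, n) * p**n * (1 - p) ** (TOTAL_SEEDS - n)
    split = comb(n, whites) * Fraction(1, 2) ** n
    return grow * split


def conditional_two_sided(n):
    """Chance that all n grown plants share one colour, given that n grew."""
    return 2 * Fraction(1, 2) ** n


p = Fraction(9, 10)  # arbitrary illustrative value of p (an assumption)
total = sum(point_probability(n, r, p)
            for n in range(TOTAL_SEEDS + 1)
            for r in range(-n, n + 1))
print(total)                     # the point probabilities sum to 1
print(conditional_two_sided(9))  # 1/256
```

Exact fractions are used so that the two checks hold without rounding error; any other value of p between 0 and 1 would serve equally well for the first check.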

Proceeding as in the earlier paper, we notice first that the same level of significance must apply to (n, r) as to (n, -r), so that we can confine our further considerations to the upper half of the diagram. Now in this half, the transition from (n, r) to (n + 1, r + 1) means we discover that one of the plants which failed to grow in our case was in fact a white-flowered plant. In this case our conviction that there is an excess of white-flowered plants would be strengthened, so that (n + 1, r + 1) would be reckoned more significant than (n, r). Similarly, going from (n, r) to (n + 1, r - 1) would mean that a missing plant was found to be purple, and this would weaken our belief in an excess of white-flowered plants; consequently, (n, r) would be reckoned more significant than (n + 1, r - 1). Finally, going from (n, r) to (n + 2, r) would mean growing two more plants, one purple and one white, and this would increase our tendency to believe in the equality of proportions. Consequently, (n, r) would be reckoned more significant than (n + 2, r).

[Fig. 1. The set of possible results (n, r): n, the number of plants that grow, runs from 0 to 10; r, the excess of white over purple, runs from -10 to 10.]

These principles taken together imply that points lying north-east, or west, of a given point (n, r), or between these two directions, would be reckoned more significant than (n, r); while, conversely, points lying east to south-west (inclusive) from (n, r) would be reckoned less significant than (n, r). The relative significance of points lying inside the half-quadrants north-east to east and south-west to west would remain undetermined.

We could now proceed as in the earlier paper (Barnard, 1946), building up a test, consistent with the above partial ordering, in such a way as to make the significance or otherwise of our result depend as little as possible on any knowledge we may have about the value of p.
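The partial ordering can be sketched in code. The following is an illustration only: it treats a point as less significant than a reference point when it lies in the cone running from east to south-west of it, and it folds the lower half of the diagram onto the upper half by taking |r|, which is one reading (an assumption) of the symmetry between (n, r) and (n, -r). The function names are hypothetical.

```python
TOTAL_SEEDS = 10


def lattice(total=TOTAL_SEEDS):
    """All possible results (n, r): n plants grow, r = whites minus purples."""
    return [(n, r) for n in range(total + 1) for r in range(-n, n + 1, 2)]


def is_less_significant(point, reference):
    """True if `point` lies east to south-west (inclusive) of `reference`.

    Both points are first folded onto the upper half of the diagram.
    """
    dn = point[0] - reference[0]
    dr = abs(point[1]) - abs(reference[1])
    if (dn, dr) == (0, 0):
        return False  # the same point (or its mirror image)
    # East boundary: dr == 0 with dn > 0; south-west boundary: dn == dr < 0.
    return dr <= 0 and dn >= dr


observed = (9, 9)
not_less = [pt for pt in lattice() if not is_less_significant(pt, observed)]
print(not_less)
```

For the observed result (9, 9) the list contains only the result itself, its mirror image, and the two corner points with n = 10, because every other point of the lattice falls inside the east to south-west cone.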
But we need not carry this through for the result we have quoted, since our conditions by themselves require that the only points in the diagram which should be reckoned not less significant than our result are the points (9, 9), (9, -9), (10, 10) and (10, -10). The probability associated with these four points is

$$P(9, 9; p) = 2\left\{10\,p^{9}(1-p)\,2^{-9} + p^{10}\,2^{-10}\right\} = (p/2)^{9}\,(20 - 19p),$$

the maximum value of which occurs when p = 18/19, and is $P_m(9, 9) = 0.002413$. Thus on this basis we should conclude that our result was significant on the 0.2413 % level.
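The location of the maximum can be confirmed with a short search. This is a sketch under stated assumptions: the grid size is an arbitrary choice, made so that 18/19 falls exactly on the grid, and the function name is illustrative.

```python
from fractions import Fraction


def tail_probability(p):
    """Total probability of (9, 9), (9, -9), (10, 10) and (10, -10)."""
    return (p / 2) ** 9 * (20 - 19 * p)


# Coarse grid search over p in [0, 1]; 1900 steps is an arbitrary choice
# that happens to include p = 18/19 as a grid point.
GRID = 1900
best_p = max((Fraction(k, GRID) for k in range(GRID + 1)), key=tail_probability)

print(best_p)                           # 18/19
print(float(tail_probability(best_p)))  # roughly 0.0024
```

Setting the derivative of $p^{9}(20 - 19p)$ to zero gives $180p^{8} - 190p^{9} = 0$, which agrees with the grid result; the maximized probability comes out at roughly 0.24 %.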

The difference between the first result and the second is negligible. Somewhat larger differences will be found in other similar cases, however, and it seems worth while to try to clarify the cause of the discrepancy.

Consider three possible causes for the failure of the tenth plant to grow to maturity:

(1) The bag from which the seed was taken is known to contain a proportion of dead seeds, which are physically indistinguishable from the live ones, and the tenth seed planted happened to be one of these. The conditions of growth were such that any live seed planted would have grown.

(2) The tenth plant happened to be attacked by a soil pest, which destroyed it.

(3) The statistician trod on the tenth plant while running for a bus; otherwise, it would have grown.

If we now consider what would happen in these three cases if the experiment were repeated, in case (1) we should be just as uncertain as before how many plants would grow, out of those selected. In case (2), we might or might not happen to strike a good year for the pest in question, so that we might or might not have a similar accident recurring. In case (3) we should obviously give the statistician firm instructions not to be careless, and then we could be reasonably certain that all the plants selected would grow.*

In the first case, we can suppose that the proportions of white, purple, and dead seeds in the bag are, respectively, $p_1$, $p_2$, and $1 - (p_1 + p_2)$; and the purpose of our experiment is to test the hypothesis $p_1 = p_2$. In this case, putting $p_1 + p_2 = p$, we can clearly apply the analysis of Fig. 1, and the appropriate level of significance is 0.2413 %.

In the third case, the situation actually realized is just what it would have been if we had warned the statistician beforehand, and then thrown one of the ten seeds back into the bag. Thus our effective sample size here is 9, and the appropriate level of significance is 0.3906 %.
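Case (1) lends itself to a simulation of repeated draws from the bag. The sketch below is illustrative only: it assumes each of the ten seeds is independently white, purple, or dead with probabilities p/2, p/2 and 1 - p, and it takes p = 18/19 as the least favourable value. The seed, the number of trials, and the function name are arbitrary choices.

```python
import random


def simulate_case_one(p, trials=200_000, seed=0):
    """Estimate the chance of the four extreme points under case (1).

    Each of ten seeds is white or purple with probability p/2 each,
    and dead with probability 1 - p (the hypothesis p1 = p2).
    """
    rng = random.Random(seed)
    weights = [p / 2, p / 2, 1 - p]
    hits = 0
    for _ in range(trials):
        seeds = rng.choices("WPD", weights=weights, k=10)
        white, purple = seeds.count("W"), seeds.count("P")
        n, r = white + purple, white - purple
        # (9, 9), (9, -9), (10, 10), (10, -10): at least nine grow, one colour
        if n >= 9 and abs(r) == n:
            hits += 1
    return hits / trials


p = 18 / 19
estimate = simulate_case_one(p)
exact = (p / 2) ** 9 * (20 - 19 * p)
print(estimate, exact)
```

The simulated frequency should lie close to the closed-form probability, up to sampling error; with a rare event of this size, a large number of trials is needed before the two agree to more than one or two significant figures.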
In the second case, the answer depends on our attitude to the set of accidents of which the pest is a specimen. If this set of accidents is regarded as a stable set of chance causes we may be justified in representing its effect on the growth of our plants by the probability p. If, on the other hand, the incidence of such pests undergoes, say, regular cyclical fluctuations from year to year, so that its incidence is to some extent predictable, if not wholly controllable, then we should not be justified in assuming the existence of a real probability corresponding to our parameter p. We should, to be on the safe side, in this case allow for the possibility that experimental technique might improve in the future, to such an extent as to eliminate the possibility of such accidents. Thus, adopting this conservative attitude to our results, we should here treat the effective sample size as 9. The repetitions of the experiment which we have in mind would then be imaginary repetitions, in which experimental technique was supposed to be better than it is now, and we have as much control over pests as we have over statisticians.

* It is not suggested that the three cases exhaust the multiplicity of types which might arise in practice. As Prof. Pearson has pointed out, if it were not the statistician, but his three-year-old son who was the vandal in case (3), we should have here a situation intermediate between our second and third instances.

The general situation illustrated by this example can be described in terms of the notion of 'isolate' introduced by Prof. H. Levy (1931). In making an experiment, we try to construct an isolate: a system, or part of the world, which we suppose has relatively little interaction with the rest of the world, and which, for practical purposes, may be considered on its own. This isolate may contain within itself all the systems of chance causes which are regarded as affecting, to any practical extent, the results of the experiment. Such is the case in (1), where all the chance causes involved in the experiment are supposed given in the bag which is the subject of the experiment. Here, then, we are dealing with a 'good isolate', whose interaction with the rest of the world is really negligible, and chance causes operate within the isolate.

In case (3), on the other hand, we are dealing with an imperfect isolate. The outside world, in the shape of the statistician, interacts with our isolate to an extent not negligible in practice. Fortunately, in this case we are able to construct a smaller isolate, consisting of the nine surviving plants, in which the interactions with the outside world are negligible. In case (2), there may be some doubt as to what isolate we are discussing. If we regard soil pests and such things as included in the isolate, and represent them as a stable set of chance causes, then we are entitled to analyse as in case (1); but if the pests are not included in the isolate, we should analyse as in case (3).

Statistical tests are applicable to at least two types of experiment. First, to experiments in which the isolate studied contains within itself a system of chance causes which may influence the results. And second, to experiments in which the isolate studied is not a 'good' isolate, and the residual interactions with the rest of the world may affect the results. There may also be mixed cases.

The distinction between the two types may also be brought out in relation to the necessity or otherwise of an 'artificial' randomization procedure, using random digits or the like. In the first type, such an artificial randomization procedure is not strictly necessary; for example, with our bag of seeds, the bag itself, and its physically indistinguishable contents, forms a perfectly adequate randomizer.
We have in this case, as it were, an impermeable shield around the system, which prevents any external shocks from affecting the system. In the second type of experiment, we need to ensure that the interactions with the outside world will not mask the results we are interested in; and if we cannot ensure a practically complete separation from the outside world, then the effect of external interactions must be randomized, by a special procedure. The randomization here acts like a shock absorber, specially placed around the experiment to distribute external shocks evenly through the system.

In the first type of experiment, the reference class to which the significance level applies is in fact the set of indefinite repetitions of the experiment in question. In the second type of experiment, the reference class is an ideal set, in which the accidental influences of the outside world repeat themselves exactly, while the effect of these accidents on the system varies as a result of the special randomization.

REFERENCES

BARNARD, G. A. (1946). Significance tests for 2 x 2 tables. Biometrika, 34, 123.
LEVY, H. (1931). The Universe of Science. London: Watts and Co.