RELATIONAL INDEXING

J. FARRADANE*

The reading, over the years, of many books and articles about subject indexing has not left me with any impression of well-defined principles, or even of clear guidance for the indexing of complex material. I am concerned here with the detailed subject indexing of difficult and varied writings such as are found in a good scientific periodical or, more particularly, in an abstracts journal in some scientific field. Such indexes need to be very specific and accurate, and as complete in reference to detail as one can achieve, finances permitting. You have only to consider the great value to science of such indexes as that provided for Chemical Abstracts to appreciate the importance of this matter. Nevertheless, I have never seen an exposition of any scheme providing a complete logical approach to such indexing; advice on choice of heading and avoidance of synonyms, suggestions on grouping headings and subheadings, and so on, are available in plenty, but they provide no reliable order of work.

For some time, therefore, I have been driven to the conclusion that a very different approach is needed. This paper represents the results of such efforts as I have been able to make towards a theory of indexing. I have used these methods in practice for some time in the indexing of an abstracts journal, and I think I may claim that, after some experience, they are rapid and reliable; the results are used not only in the form of the usual printed, alphabetically ordered subject index, but also in the form of a coded card index, as I shall explain.

Let us consider more exactly the nature and structure of an index. A list of single-word headings presents no complexities beyond those of synonyms and of accurately representing the matter on the different pages of the volume. Any subheading, however, represents a decision of emphasis and a selection of one concept in subordination to another.
If this emphasis is only the result of the whim of the indexer, there is little likelihood of consistency. I suggest that the selection of subheadings should be the same as the selection of subordinate terms in any form of classification, and that the selection of sub-subheadings represents a further stage of breakdown in that classification. A subject index is, however, traditionally an alphabetically arranged list of terms, subheadings, etc. This may be regarded as an arrested stage of a classification, with alphabetical re-arrangement. You are no doubt aware that in the Universal Decimal Classification, for instance, where a complex subject may be represented by two or more decimal numbers connected by colons, the permutations of these complexes may also be used; this is similar to the normal cross-referencing entries in an alphabetical index. The principles of more modern types of classification provide an even better discipline for detailed subject indexing. Mr. Langridge, in a recent lecture to your Society¹, described a system of indexing based on what was virtually facet classification; I have followed very similar principles in classification, though reached by different methods, and I find that the methods are also applicable to subject indexing. To describe the indexing, I must first deal briefly with this approach to classification.

The older types of hierarchical classification are inadequate because the methods of subdivision from assumed main classes take little cognisance of the fact that the principles of subdivision may vary: one may divide by the generic relation of genus into species, by properties or parts, by theoretical aspects, etc., and the mixed logic of such procedures has not been recognized. Furthermore, such classifications make little provision for complex mixed subjects, and hence for the detailed "depth" classification required today; the methods are also very inflexible. Ranganathan, in his Colon Classification, provided a new approach, whereby the different aspects and possible divisions of classes are separated into different subordinate schedules of terms, called facets, which are then cited in a preferred order. This in turn introduces some inflexibilities and difficulties, and the Classification Research Group has investigated, with considerable success in special classifications, principles of much freer selection of facets to suit a given special subject. Difficulties nevertheless arise in establishing a satisfactory preferred order of facets for a given subject, especially when that "subject" is a somewhat complex area of knowledge. We are in fact finding that these methods will probably not work for general classification, and the Group is now discussing, from scratch, the principles required for an improved general classification. It has for a long time seemed to me that the core of the difficulties in various classifications lay in the lack of definition of the relations implied between concepts when those concepts are placed in any given order.

* Based on a paper read to the Society in February, 1961.
¹ The Indexer, 1961, 2, iii, pp. 95-98.
I therefore attempted to analyse these relations. Language is far too imperfect a tool to attempt to define the relations on a basis of linguistic analysis; semantics offers little or no guide. I was therefore driven to a study of the psychology of thinking, and to such experimental facts as are available concerning concept formation and interlinking, and the processes of learning in general. The results are described in detail in an earlier paper². For the present purposes I will go straight to the results of this approach, in the form of the following table:

                            Mental time or memory
                      Non-time          Temporary          Fixed
Conceptual clarity
  Concurrent          Concurrence /θ    Self-activity /*   Association /;
  Not-distinct        Equivalence /=    Dimensional /+     Appurtenance /(
  Distinct            Distinctness /)   Reaction /-        Causation /:

² The Psychology of Classification. J. Documentation, 1955, 11, 187.
This table represents the interactions of the two principal mental operations involved in the linking of concepts in the mind, which is the process of learning and understanding. A concept is the mental pattern resulting from a stimulus, or group of stimuli, when the pattern has sufficient clarity and memorizability to be given a verbal association, or name; for classification purposes we can really deal only with clearly distinguishable, or uniquely definable, concepts, which I shall call isolates. The growing infant and child become increasingly aware of repeated, and hence memorizable, interrelations of certain concepts. At first (non-time) there is a mere concurrence of concepts; later, temporary associations are recognised, often by the surprise occasioned when the interrelation is not manifested. Finally, fixed associations are formed. The other progressively developing mental operation is that of recognition of similarities in parts of the patterns of different concepts, or, alternatively, the recognition of distinctness between concepts between which a relation is nevertheless clearly envisaged. The interactions of the stages of these two modes of learning yield nine categories of relations between concepts, which are mentally perceived though not explicitly definable in words. I have allocated easily typed symbols and trivial names to these categories of relations, merely for ease of reference and notation. The possible meanings of these relations have been obtained by analysis of complex subjects in terms of the two types of mental process. I will try to explain these meanings. Consider any two concepts "A" and "B". The oblique stroke implies some sort of relation between them, i.e. A/B, where B is subordinate to A in that A is the focus of greater interest in the field of knowledge under consideration or classification. The relational symbol after the oblique stroke defines the actual relation.
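Because the nine operators arise from crossing two three-stage scales, the table can be generated rather than typed out. The Python sketch below is only an illustration of that structure: `build_grid` and `operator_for` are hypothetical names, the operator names and symbols follow the descriptions in the next paragraphs, and the name Self-activity for the asterisk operator and the symbols /θ and /) are assumptions rather than anything stated in this section.

```python
from itertools import product

# The two scales of the table, each listed in developmental order.
MENTAL_TIME = ("Non-time", "Temporary", "Fixed")                 # columns
CONCEPTUAL_CLARITY = ("Concurrent", "Not-distinct", "Distinct")  # rows

# Operators read row by row, left to right. The asterisk operator's name
# and the symbols /θ and /) are assumptions made for this sketch.
OPERATORS = (
    ("Concurrence", "/θ"), ("Self-activity", "/*"), ("Association", "/;"),
    ("Equivalence", "/="), ("Dimensional", "/+"), ("Appurtenance", "/("),
    ("Distinctness", "/)"), ("Reaction", "/-"), ("Causation", "/:"),
)


def build_grid():
    """Pair every (clarity, time) cell with its operator, row by row."""
    cells = product(CONCEPTUAL_CLARITY, MENTAL_TIME)  # the time stage varies fastest
    return dict(zip(cells, OPERATORS))


GRID = build_grid()


def operator_for(time_stage, clarity_stage):
    """Return (name, symbol) for one stage on each scale."""
    return GRID[(clarity_stage, time_stage)]


print(operator_for("Fixed", "Distinct"))  # ('Causation', '/:')
```

The top-left cell, `operator_for("Non-time", "Concurrent")`, gives Concurrence and the bottom-right cell gives Causation, matching the developmental reading of the table from left to right and from top to bottom.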
I would note at this point that the learning process involves an evolution or development of thinking from left to right and from top to bottom of the table. The top left relation is a mere awareness of the concurrence of two concepts without further definition of their interaction; the bottom right represents the most developed stage of learning. These categories of relations, which I have called operators, provide a sort of psycho-logical mathematics by which complex subjects can be analysed and manipulated in a variety of ways. The possible meanings of these operators can best be explained by examples. The Concurrence operator represents a mere juxtaposition of concepts, such as one thing accompanying another; it also describes the relation of bibliographical form, e.g. an encyclopaedia of chemistry, to be written as Chemistry/θEncyclopaedia. The relation written with an asterisk describes the process of comparison, but also fits the conditions for self-activity of something, e.g. man walking, bird migration, etc., written as Bird/*Migration. Association may well be understood in the Pavlovian sense of the induced fixed association, and covers associations where the exact linkage is not properly clarified; it expresses the relation of, for example, a process and the tool used for it; indeed the words "B for A" will very often be representable as A/;B. It is also the operator for abstract properties, or any attributes which are not intrinsic but which are imposed by man's thought, e.g. the efficiency of a machine, written as Machine/;Efficiency. The Equivalence
operator represents the recognition of some degree of identity of ideas; it will represent the use of things in a different capacity, such as turnips as fodder, written Turnips/=Fodder, or platinum as a catalyst. The Dimensional operator covers position in space or time, but also expresses the relation of the temporary state of a thing, e.g. its temperature, electrical charge, etc. The Appurtenance operator is used for physical or intrinsic properties, parts or organs of things, and is in fact also the well-known generic relation. Distinctness is a less obvious operator, but it comes into use in expressing the relation between substitutes or imitations and the original thing; there may possibly be other meanings to be discovered for this operator as more complex subjects throw up cases that are difficult to analyse. It is to be noted, however, that I have not yet found any complexity of relation which on analysis has not dropped into place in one category or another. The Reaction operator is straightforward, and expresses the action of any thing "B" on another thing "A", when written as A/-B. It is the normal relation for the action of a process on an object, or of one object on another. Finally, the operator I have called Causation would perhaps be better described as Functional Dependence, and expresses the idea of B arising out of A, or B caused by A. It is the relation between the product and the raw material, e.g. Wheat/:Bread. It also expresses, incidentally, the relation of an author to his book. These are difficult conceptions, and I hardly dare risk trying to express them by any regular use of words. You have only to consider the very different meanings of the word "for" in such phrases as "Votes for women", "an eye for an eye" and "pressed for time", or the implications of the word "of" in such phrases as "pack of cards", "odour of musk" and "sport of kings", to see that language is often quite unreliable.
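As a rough aid to the descriptions of the nine operators, the Python sketch below looks up the operator in a two-term analet written as A/xB and returns a short gloss. It is illustrative only: the `describe` function and the `GLOSSES` table are hypothetical, the glosses merely paraphrase the meanings given in this section, and the name Self-activity and the symbols θ and ) are assumptions.

```python
# Meanings paraphrased from the descriptions of the operators; the table
# itself, the name "Self-activity" and the symbols θ and ) are assumptions.
GLOSSES = {
    "θ": ("Concurrence", "B merely accompanies A; also bibliographical form"),
    "*": ("Self-activity", "comparison, or self-activity of A"),
    ";": ("Association", "B for A: a tool for a process, or an abstract property"),
    "=": ("Equivalence", "A used in a different capacity, as B"),
    "+": ("Dimensional", "position in space or time, or a temporary state"),
    "(": ("Appurtenance", "an intrinsic property, part or species of A"),
    ")": ("Distinctness", "a substitute or imitation and the original"),
    "-": ("Reaction", "B acts on A"),
    ":": ("Causation", "B arises out of, or is caused by, A"),
}


def describe(analet):
    """Explain a two-term analet written as 'A/xB', where x is one symbol."""
    a, slash, rest = analet.partition("/")
    if not slash or not rest or rest[0] not in GLOSSES or "/" in rest:
        raise ValueError(f"not a two-term analet: {analet!r}")
    name, gloss = GLOSSES[rest[0]]
    return f"{a.strip()} -> {rest[1:].strip()}: {name} ({gloss})"


for example in ("Bird/*Migration", "Machine/;Efficiency",
                "Turnips/=Fodder", "Wheat/:Bread"):
    print(describe(example))
```

Such a lookup is no substitute for the analysis itself: as the examples of "for" and "of" show, the surface wording of a subject does not determine the operator.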
The most I can suggest is that practice in the analysis of subjects comprising more than one concept very soon familiarizes one with the operators which are applicable to different linguistic situations. The process-tool relation, the abstract property relation, the physical property relation, the generic relation, the whole-part relation, the relation of a thing to its position or date of existence, the substance-process relation, etc.: these are all easily recognized common situations in the descriptions of situations and events, especially in scientific work, where facts are more clearly stated.

It is now possible to write in linear form a sort of shorthand description of any subject, simple or complex. The result may look like this: A/-B/;C/(D. An actual example might be: Glass/-Cutting/;Knife/(Hardness. I call such a symbolized statement an analet (i.e. a small analysis). The meaning can almost be seen by reading the words as they are written, disregarding the symbols. A better check is obtained by reading them backwards, for example "hardness of a knife for cutting glass"; note how one automatically "reads in" suitable prepositions. There are more complex cases where one thing is related to more than one other thing at the same time, and this is overcome by a sort of mathematical symbolization using square brackets, e.g. Sugar[/-Hydrolysis/;Acid]/:Glucose. Where there is an operator between the isolate and the bracket, the bracket can be disregarded in considering how the relation applies; where there
is no operator between an isolate and a bracket, the bracket forms a barrier, and the operator and the isolate beyond the bracket are related to the isolate outside the bracket at the other end. Thus, glucose comes from sugar, which is hydrolysed by acid; there is no direct relation of the glucose to the acid. The linear expression of complex subjects in this manner enables one to see very quickly those concepts which are connected by the same relation or operator to one given concept, and one can group such subordinate concepts together. In this way one can analyse a whole field of knowledge and construct a reliable system of classification. The method can in fact be used to make a classical type of hierarchic classification, or a modern faceted classification, or other types.

We are, however, concerned here with indexing, to which the relational method is easily applicable. There is one new feature to be introduced, that of permutation. It will be obvious that the order of the concepts can be permuted if the operator symbols are suitably manipulated, e.g. reversed as required. Thus A/-B is clearly the same as B-/A. The number of permutations of any given analet is, however, limited by the necessity of maintaining the interrelations exactly. Thus with the four-term analet A/-B/;C/(D, you cannot have the permutation A/C/B/D, even with the use of square brackets, since it will be found that the brackets overlap, e.g. A/-[C[;/]B]/(D, leaving an operator by itself between brackets and producing other impossible, or logically barred, situations. Another useful rule is that if a permutation of the original analet in classification form (where all the operator symbols appear on the right of the oblique stroke) starts with a later isolate, it is best to return to the earliest isolate of the classified form as soon as possible, e.g. C[;/B-/A]/(D and not C[/(D];/B-/A. The latter is not impossible, but is far less suitable for use in indexing.
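The linear notation and its bracket rules lend themselves to mechanical checking. The following Python sketch is one possible reading of those rules, not the author's procedure: it turns an analet into (A, x, B) triples, each meaning A/xB, treats a square bracket as disregarded where an operator stands between it and an isolate and as a barrier otherwise, and accepts reversed operators such as -/. Treating )/ as the reverse of /( follows the example B)/A/(C given later in the paper and is an inference; the round-bracket "sideways" notation is not handled, isolate names are assumed to contain no operator characters, θ is assumed as the Concurrence symbol, and all function names are hypothetical.

```python
import re

# Tokens: square brackets, forward operators "/x", reversed operators "x/",
# and isolate names. Assumes isolate names never contain "/", brackets or
# the operator characters θ * ; = + ( ) - : (θ for Concurrence is assumed).
TOKEN = re.compile(r"\[|\]|/[θ*;=+()\-:]|[θ*;=+()\-:]/|[^\[\]/θ*;=+()\-:]+")
MIRROR = {"(": ")", ")": "("}  # reversing /( gives )/, as in B)/A


def tokenize(analet):
    tokens, pos = [], 0
    while pos < len(analet):
        m = TOKEN.match(analet, pos)
        if m is None:  # e.g. the unsupported round-bracket "sideways" notation
            raise ValueError(f"cannot parse {analet[pos:]!r}")
        if m.group().strip():
            tokens.append(m.group().strip())
        pos = m.end()
    return tokens


def normalize(left, op, right):
    """Express one link as (A, symbol, B), meaning A/symbolB."""
    if op[0] == "/":
        return (left, op[1], right)
    return (right, MIRROR.get(op[0], op[0]), left)  # reversed form "B x/ A"


def relations(analet):
    """Return (A, symbol, B) triples for a linear analet with square brackets."""
    triples, stack, deferred = [], [], []
    current = pending = prev = None
    for tok in tokenize(analet):
        if tok == "[":
            if pending:                    # "W/op[": barrier on the left side
                stack.append(("deferred", pending))
                pending = current = None
            else:                          # "X[/op": bracket can be disregarded
                stack.append(("anchor", current))
        elif tok == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            kind, value = stack.pop()
            if pending:                    # "/op]": bracket can be disregarded
                if kind == "deferred":
                    deferred.append(value)
            elif kind == "anchor":         # "Y]/op": barrier, back to the anchor
                current = value
            else:
                raise ValueError("unsupported bracket arrangement")
        elif "/" in tok:                   # an operator
            if current is None or pending:
                raise ValueError(f"misplaced operator {tok!r}")
            pending = (current, tok)
        else:                              # an isolate
            links = ([pending] if pending else []) + deferred
            if not links and prev not in (None, "["):
                raise ValueError(f"isolate {tok!r} has no operator")
            triples.extend(normalize(left, op, tok) for left, op in links)
            pending, deferred, current = None, [], tok
        prev = tok
    if pending or deferred or stack:
        raise ValueError("analet ends with an unresolved operator or bracket")
    return triples


for triple in relations("Sugar[/-Hydrolysis/;Acid]/:Glucose"):
    print(triple)
```

With these rules the sugar example yields Sugar/-Hydrolysis, Hydrolysis/;Acid and Sugar/:Glucose, with no link between glucose and acid. The permutation C[;/B-/A]/(D yields the same three relations as A/-B/;C/(D, while the overlapping-bracket form A/-[C[;/]B]/(D is rejected with a ValueError.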
Hence with four terms the number of useful permutations will not be 24, the number of possible orderings of four items, since not all of these are logically possible, but in most cases only four, or even fewer, since the later isolates, e.g. "D", will probably not be useful indexing terms, at least in many cases.

Now let us consider the practical task of making a subject index for a volume of scientific papers, or for an abstracts journal. Let us assume that each paper or abstract is reasonably homogeneous, i.e. it concerns only one, though complex, subject. If the degree of indexing required, though detailed, is not too elaborate, it will be found, perhaps surprisingly, that in almost all cases the subject can be represented by an analet of three or four terms, and almost never more than six. Thus a paper on the effect of cyanide on biosynthesis of ascorbic acid in rat liver in vitro can be expressed as:

    Ascorbic Acid/-Biosynthesis/;[Rat/(]Liver[/+Glass Apparatus]/-Potassium Cyanide

You will note how the somewhat vague or loose language of the title of the paper is re-organized by the logical analysis, and that it is the liver which is in the glass apparatus (in vitro) and which is acted upon by the cyanide. With practice, such analets can be written at an average rate of two minutes per analet. The permutations can then be written out very quickly; in fact, I usually write out permutations only in the form of numbers (the isolates being represented in order as 1, 2, 3, 4, etc.), e.g. 2[-/1]/;4()/3)[/+5]/-6, and my secretary writes out the full forms. In this
example, it will be noted that isolate 4 (Liver) is connected by different relations to four other isolates, and this makes the linear writing of some permutations a little difficult; the difficulty is overcome by the further use of round brackets to indicate a sort of "sideways" relation to one of the isolates, this notation preferably being used for the less important isolate in question.

When the original and the required permutations have been typed on cards, these can be sorted into alphabetical order of the first term, with sub-placings by the second term, and so on. The normal sort of alphabetical subject index can then be written directly from the cards. The terms are in each case used exactly as given and in the order provided by the analet or its permutation on the card; suitable prepositions are introduced to express the meaning, and a complete sentence is formed in each case, though the point at which the sentence begins will be dictated by the analet. A barrier square bracket usually indicates a hiatus in the logical expression and may be the place for a full stop. Thus the above analet and its permutation appear as the following index entries:

    ASCORBIC ACID biosynthesis by rat liver in vitro. Potassium cyanide effect on.
    BIOSYNTHESIS of ascorbic acid by liver of rat in vitro. Potassium cyanide effect on.

It will be noted that the normal looseness of language reappears quite easily without affecting the logic of the subject or the accuracy of representation. This method has now been in use for several years for the preparation of subject indexes to an abstracts journal, and has been found to be quick and entirely reliable. The cards are also kept in drawers, in a progressive amalgamation of all the years, and provide a consolidated permuted classified index for the whole period. Retrieving information from this index has been found to be simple and quick.
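The clerical steps described here, expanding numbered permutations into their full forms and filing the cards by first term and then by later terms, can be sketched in a few lines of Python. This is an illustrative assumption about how the work might be automated, not a description of the author's card system: the `ISOLATES` list simply numbers the terms of the ascorbic-acid analet from 1 to 6, `expand`, `terms` and `sort_cards` are hypothetical names, and ignoring case when filing is also an assumption.

```python
import re

# Isolates of the ascorbic-acid analet, numbered 1..6 in the order written.
ISOLATES = ["Ascorbic Acid", "Biosynthesis", "Rat", "Liver",
            "Glass Apparatus", "Potassium Cyanide"]

# Any run of brackets, slashes or operator symbols separates two terms.
SEPARATORS = re.compile(r"[\[\]()/θ*;=+\-:]+")


def expand(code, isolates):
    """Write out a numbered permutation with the isolates' full names."""
    return re.sub(r"\d+", lambda m: isolates[int(m.group()) - 1], code)


def terms(card):
    """List the terms on a card in the order written, ignoring the symbols."""
    return [part.strip() for part in SEPARATORS.split(card) if part.strip()]


def sort_cards(cards):
    """File cards by first term, then second term, and so on (case ignored)."""
    return sorted(cards, key=lambda card: [t.lower() for t in terms(card)])


numbered = ["2[-/1]/;4()/3)[/+5]/-6", "1/-2/;[3/(]4[/+5]/-6"]
for card in sort_cards([expand(code, ISOLATES) for code in numbered]):
    print(card)
```

Writing the index entries themselves, with prepositions read in and full stops at barrier brackets, is left to the indexer; the sketch only reproduces the mechanical expansion and filing.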
I suggest that this method of relational indexing could equally be used, in a simplified form, for the task of indexing books of the more usual sort. The degree of detail could then perhaps be much reduced. The task would be greatly aided if the various terms to be used as headings in the index could be standardized, at least to some extent. This would best be achieved by making a preliminary set of schedules in facet form, as was outlined by Langridge (loc. cit.). The facet form represents an implied relational order, and can be prepared in the first place from consideration of a relatively small number of trial analets prepared for the various subjects dealt with in the book. The detailed indexing will then be carried out by analysing the subjects of the different pages, as far as possible, only in the terms available in the schedules. If extraneous terms appear, they can be used with caution, where they are essential.

In this rather short exposition I fear I have omitted certain occasional complexities which may arise, and the methods of dealing with them. These have been dealt with in my previous papers (loc. cit.). I should have emphasized that a concept cannot be connected to two other concepts by the same relation, e.g. B)/A/(C. This should be written as A/ Q j. Amongst simpler problems, it may be worth mentioning that where a single word is not available to express a concept, it is best to start both words with a capital letter, so that the second
word is not accidentally used as a subheading; likewise, proper names can be cited in inverted commas. Vague words such as "determination" should be avoided, and more exact terms, such as analysis, measurement, etc., used instead. Vagueness of language and terminology occasionally causes difficulties in analysis, but these are usually overcome by commonsense modifications.

Finally, I would wish to emphasise that relational indexing provides a reproducible logical system which is to a great extent self-regulatory. Logical mistakes become particularly evident when preparing permutations: it will then be found that the expected permutations somehow cannot be made without breaking the rules for maintaining correct relations between concepts, and re-examination of the analet will then soon show where the fault in analysis lies. The fault may be the omission of some isolate in the series, especially perhaps some term that was implied, but not actually written, in the subject matter; alternatively, the fault may lie in the use of an incorrect operator, or in an inaccurate or illogical interconnection of concepts. Relational indexing is in fact a formalized exact method of representing the inner logic of our thinking, or a scientific symbolic language. I have endeavoured to describe here how it can be used for classification and indexing. It is my hope that it will have wider applications in other fields.

OBITUARY

LESLIE ERNEST CHARLES HUGHES (1904-1961)

Following so soon after the death of our Hon. Membership Secretary, Mr. A. T. H. Talbot, the sudden death of Dr. L. E. C. Hughes, Ph.D., M.I.E.E., our Hon. Assistant Secretary, on Friday, June 9th, 1961, came as a severe shock to the Society. He had served as Hon. General Secretary from 1959 to 1960, since when he had assisted Mr. Norman Knight, and he regularly attended our meetings, taking an active part in promoting the interests of the Society. L. E. C.
Hughes was born in London and educated at Quintin School, City and Guilds College, and Imperial College, where he later became a lecturer. He was keenly interested in research and technology, his main interest being engineering, and his services as a technical editor were in great demand. He edited Chambers's Technical Dictionary and Heywood's Electronic Engineer's Reference Book, and compiled numerous indexes to technical publications. A member of several societies, including Aslib and P.E.N., he was a past president of the British Sound Recording Association, the Institute of Electronics, and the H. G. Wells Society, and was a Fellow of the Royal Society of Arts, for which he was also a Cantor Lecturer. Dr. Hughes was a keen Mason, and belonged to several Lodges.

The activities of Dr. Hughes were many and varied, and he was particularly interested in editing and indexing. He was about to take up an appointment as editor of a new technical journal when he died, shortly after attending a meeting on behalf of the Society. We extend our sincere condolences to his widow. Dr. Hughes was cremated at Golders Green Crematorium on June 14th, four officers of the Society attending the service.

J. L. T.