Evaluating left and right hand conducting gestures
A tool for conducting students

Tjin-Kam-Jet Kien-Tsoi
k.t.e.tjin-kam-jet@student.utwente.nl

ABSTRACT
What distinguishes a correct conducting gesture from an incorrect one? The left and right hand conducting motions of both professional and student conductors are recorded using motion capture. Only the wrist data of both arms is analyzed. The main extracted features are turn and tick points. These features are tested to see whether they conform to the desired beat pattern. The extracted patterns contain enough information for the assessment of the basic conducted gesture.

Keywords
Music conductor, conducting gestures, assessment, feedback, recognition, motion capture.

1. INTRODUCTION
Conducting is a long-established profession, and a whole gesture language has evolved over the years. [Kol04] gives a classification of conducting gestures, grouping them by their intended effect on the musical performance of the ensemble. He mentions that many of the expressive gestures are performed by the conductor with the left hand, while time-beating gestures are almost exclusively done with the right hand. This contrasts with [Bro66], who says that a properly trained right hand gives both the tempo and the character of the music.

In this paper we are interested in the basic conducting techniques taught in a freshman's course on conducting. The right hand should give both the tempo and the character of the music. The left hand supports the right hand by giving (entry / cutoff) cues, or by mirroring the right hand.

There are many advanced practicing tools for students of all sorts of professions, but we lack a practicing tool for the student conductor. Valuable teaching hours are spent teaching students to master the basic gestures and techniques, which reduces the time available for more advanced material, such as guidance in developing a more personal technique and practicing other forms of communication.
The development of systems that can follow a human conductor, such that they can replace an entire orchestra [BS07], raises discussions of scenarios in which complete ensembles are replaced by much cheaper computer systems. [Mar00] cites a music student commenting on the virtual orchestra, which gives some insight into why this is unappealing: the orchestra sounds so real that it becomes a cost-effective substitute for the unseen musician, and this brings about a sense of insecurity in the average traditional musician.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission.
7th Twente Student Conference on IT, Enschede, June 25th, 2007
Copyright 2007, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science

There is no clear-cut definition of the right conducting technique; many styles are taught. In this research, an application is built to analyze the motion-captured data and to automatically extract beat patterns, allowing us to manually perform some basic comparisons on these beat patterns.

1.1 Research questions
The main question is: how can we assess basic conducting movements with a minimum amount of information? By basic conducting movements we mean the 2-, 3- and 4-beat patterns and the indication of entrance and cutoff cues.

1.1.1 Sub questions
1. When is a conducted motion correct or incorrect and why?
2. What are the common conducting errors made by beginners?
3. How can we model correct and incorrect gestures in terms of location and time of both wrists, to allow comparison?
A static approach is taken in this research; the conductor does not receive any live feedback from the system. The gestures of the conductor are recorded during practice sessions. Features in a particular gesture, like turning points in an N-beat pattern, are compared with features in other similar patterns made during the same practice session.

The remainder of this paper is organized as follows: section 2 gives a summary of related work. Section 3 gives a basic introduction to conducting gestures. The proposed method is given in section 4. Section 5 presents the results, section 6 presents the conclusion and section 7 discusses the results.

2. RELATED WORK
Early systems in the area of computer-based conducting gesture recognition, such as the Mechanical Baton, Radio Baton, Personal Orchestra [BLS+04] and the system from Morita [MHO91], concentrated on the extraction of the right hand or baton-in-right-hand gestures. The main tracked parameters, the position of the (baton in the) right hand in 2D space, contain beat and amplitude information.

In Conga [LGK+06], a system is built which can follow the gestures of amateurs. That system works as long as the necessary features, such as turning points and speed, can be extracted. In Conga, a gesture is interpreted as a sequence of features. It is able to recognize several beat patterns and has a recognition rate of over 90%. However, it does not give the conductor any practical feedback that would help him improve his skills.

If there were to be a tool to help a student conductor master the basic conducting gestures, then that tool should assess the movement of both hands. The left and right hand movements can be completely different from each other, and that makes conducting a big challenge.
There are no known systems that rate or give feedback on the gestures of a (student) conductor at the time of this writing. Existing research on conducting gestures focuses mainly on the right hand and does not address why a gesture is wrong. Some systems try to extract the beat pattern on the fly; others focus on the interaction between the conductor and the ensemble, also adjusting many things on the fly.

3. CONDUCTING
This section gives some introductory background on the conducting techniques taught in a freshman's year. It is not intended to cover all aspects of a freshman's course on music conducting.

3.1 Basics
The basic gesture repertoire of a freshman's conducting course consists of five beat patterns: the 1-, 2-, 3-, 4- and 6/8-beat patterns. In general, the conductor gives one extra (preparatory) beat in tempo before the music begins. Usually, the right hand gives the beat and the left hand either mirrors the right hand (when the right hand goes to the right, the left hand goes to the left; when the right hand goes up, the left hand goes up; etcetera) or indicates entrance cues, cutoff cues, fermatas and dynamics.

Fermatas indicate that a note, or rest, should be sustained longer than the current tempo suggests. A common way of indicating a fermata is simply to halt all movement for a short while. Dynamics can be indicated in a number of ways: one, by increasing or decreasing the size of the beat pattern, which can be done with both hands; two, by moving the left hand at a tilt upwards or downwards from the body; or three, by just letting the palm (of the left hand) face up or down.

The conductor is assumed to be conducting right-handed; the right hand performs the beat patterns and the left hand supports the right hand. Figures 1 and 2 show two well-known four-beat patterns. Notice how each beat has its own place; these are examples of patterns with scattered beat points.
In a pattern with a centered beat point, all beats occur at that single point. In this paper, we use beat patterns with a centered beat point.

Figure 1: a 4-beat pattern. The numbers indicate where beats are marked and should sound in the gesture.
Figure 2: a 4-beat pattern. All beats are on the same conducting line.

3.2 Common errors
We briefly discuss the three most common errors freshman conductors make, according to our conversations with other conductors.

First, the hand 'trails' the wrist. It looks as if they are painting, and this makes the beat difficult to distinguish. What is the beat? The wrist that stops its swing, or the hand that stops moving after the wrist has stopped its swing?

Second, the elbows hang near the waist while conducting. They should not hang near the waist but should point sideways, away from the body. People tend to relax and drop their elbows.

Third, the ticks of the beat pattern are not on a clear conducting line. A conducting line is an imaginary surface that should be around waist height. Note that whereas some schools/books state that all beats should be on this conducting line, other schools/books may not do so. For instance, figure 1 does not have a conducting line, while figure 2 does.

There are of course numerous other errors, like not halting all movement before starting, not having and sustaining a clear tempo, introducing superfluous movements, etcetera.

4. METHOD
4.1 Getting the data
Using Vicon as our motion capture system, we recorded our participants by placing infrared markers on the wrist, elbow and shoulder joints. Each data frame contains the x, y and z coordinates per marker, and frames are recorded at a fixed rate of 120 Hz. The shoulders are aligned along the x-axis. The y-axis is the vertical axis. Figure 3 shows the position of the three axes.
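To make the recorded data of section 4.1 concrete, the following is a minimal sketch of how one frame could be represented. The class names, the marker name "right_wrist" and the helper functions are hypothetical and only serve as illustration; the frame rate of 120 Hz and the use of the x and y coordinates come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SAMPLE_RATE_HZ = 120  # frame rate stated in section 4.1


@dataclass(frozen=True)
class Marker:
    """Position of one marker in one frame (units as exported by the capture system)."""
    x: float  # along the shoulder line
    y: float  # vertical axis
    z: float  # third axis


@dataclass(frozen=True)
class Frame:
    index: int                  # 0-based frame number (assumed convention)
    markers: Dict[str, Marker]  # hypothetical marker names, e.g. "right_wrist"


def frame_time(index: int, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time in seconds of a frame, given the fixed sampling rate."""
    return index / rate_hz


def wrist_track(frames: List[Frame], marker: str) -> List[Tuple[float, float]]:
    """Project one marker onto the x-y plane, which is where ticks and turns are sought."""
    return [(f.markers[marker].x, f.markers[marker].y) for f in frames]
```

With this layout, a wrist trajectory is simply a list of (x, y) pairs, one per frame, and frame 120 lies one second after frame 0.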
There is one very important characteristic that applies to all beat patterns: the first beat of the bar, the one, must always be clearly distinguishable from the others. This is done by performing a straight downward motion. The starting point of this motion is also the highest (turn) point in the whole pattern.

Figure 3: position of the x, y and z axes

All our participants participated voluntarily. Among them are a professional conductor, Valentijn Smit; a student conductor of his, A. Smelt-van Dijk; and someone who attended a short class on conducting basics, Kien Tjin-Kam-Jet. The student conductor is left-handed; she uses her left hand to give the beats and her right hand to give the cues. We compensate for this by assigning her left hand as the beat hand and her right hand as the cue hand, and then mirroring the data on the x-axis. From now on, we refer to the hand which gives the beats as the beat hand, and similarly, we refer to the hand which gives the cues as the cue hand.

We recorded several 2-, 3- and 4-beat patterns, both staccato and legato. The expert and student were also asked to conduct exercise 0304 of Het slagtechniekboek by Eduard Nieland [Nie01]. In this exercise the conductor should indicate the following: which notes are staccato and which are legato, where the entry/cutoff cues and fermatas are, and what the desired dynamics are. We started recording each session once the participant was standing still. This means that most of a session's data actually belongs to a real beat pattern. Figures 4 and 6 show the beat hand in several recorded sessions to give an impression of the sampled data.

Figure 4: exercise 0304 by Valentijn Smit

4.2 Analyzing data
We extract the beat patterns by first extracting what we call tick points and turn points from the beat hand. A tick point is the location in the beat pattern where the beat sounds. A tick point is typically characterized by a bouncing movement on the y-axis. As long as there are more beats, the hand continuously prepares for the next beat. A turn point is the location of a visible change in both direction and speed on the x- and y-axis in preparation for the next beat.

Once these ticks and turns have been extracted, a series of rather straightforward tests is done, depending on the beat pattern. Here follows a brief description of a three-beat pattern. There is an upward motion followed by a straight downward motion. This defines the first turn and tick point respectively: see the left solid arrow in figure 5. Then, the hand slants up to the left in preparation for the second tick, turns and slants down to the right. After the second tick, the hand slants up to the right, turns, and slants down to the left again, thereby finishing the 3-beat pattern.

Figure 5: a schematic view of a 3-beat pattern. The start of the arrow is the turn point, the head of the arrow is the tick point.

In our tests for the 3-beat pattern, the first test checks the angle (a) between a vertical line and the line connecting the first turn and first tick point. This angle (a) must not be greater than 30 degrees. A big margin is used because in practice the one is almost never a straight line. We also do not want to impose requirements so stringent that a complete conducted pattern is not detected by the system.

The second test checks whether the 2nd turn is to the left of and above the 1st tick, and whether the angle (b) between a vertical line and the line connecting the first tick and second turn is bigger than (a). The third test checks whether the 3rd turn is to the right of and above the 1st tick, and whether the angle (c) between a vertical line and the line connecting the 1st tick and 3rd turn is bigger than (a). This forces the pattern to have a clear and distinguishable one.

There are two more (trivial) constraints. First, a second and third tick point must exist. Second, the turn and tick points must be ordered in time such that the first turn occurs before the first tick, the first tick occurs before the second turn, the second turn occurs before the second tick, and so on. This allows the system to detect a wide range of 3-beat conducting motions, even patterns with (very) scattered tick points.

Figure 6: a session of 3-beat pattern by Kien

The tests are mainly based on the beat patterns found in [Bro66] and on conversations with conductors and thoughtful observation.

4.2.1 Detecting ticks and turns
There are several ways to identify a tick point and a turn point. In the next subsections we discuss two approaches to find tick and turn points.

4.2.1.1 Speed and direction
Obviously, when the direction of motion changes sharply, say by an angle of at least 90 degrees, there is a turn or tick point. When the speed is at a local minimum, there might be a turn or tick point.
When the acceleration reaches a local maximum, there is a turn point. We did some pilot tests with this method, and the results were not promising: almost none of the detected turn points was near an actual turn. We therefore developed the chain principle described in section 4.2.1.2.
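For reference, the direction and speed criteria of this section can be sketched as follows. This is an illustration under assumptions, not the implementation used in the pilot tests: the function names are hypothetical, velocities are estimated by simple finite differences between consecutive (x, y) samples, and the acceleration criterion is left out.

```python
import math

SAMPLE_RATE_HZ = 120  # frame rate stated in section 4.1


def velocities(points, rate_hz=SAMPLE_RATE_HZ):
    """Finite-difference velocity between consecutive (x, y) samples."""
    return [((x2 - x1) * rate_hz, (y2 - y1) * rate_hz)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def direction_changes(points, min_angle_deg=90.0):
    """Indices of samples where the direction of motion turns by at least min_angle_deg."""
    vel = velocities(points)
    hits = []
    for i in range(1, len(vel)):
        (ax, ay), (bx, by) = vel[i - 1], vel[i]
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        if norm == 0.0:
            continue  # direction is undefined while the hand is not moving
        cos_angle = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
        if math.degrees(math.acos(cos_angle)) >= min_angle_deg:
            hits.append(i)  # sample i joins the two compared segments
    return hits


def speed_minima(points):
    """Indices of segments whose speed is lower than that of both neighbouring segments."""
    speed = [math.hypot(vx, vy) for vx, vy in velocities(points)]
    return [i for i in range(1, len(speed) - 1)
            if speed[i] < speed[i - 1] and speed[i] < speed[i + 1]]
```

Both functions compare only neighbouring samples, so every small fluctuation in the recorded trajectory can produce a candidate point.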
4.2.1.2 The Chain-Principle
One way to rule out the problems that arise when using speed and acceleration is to identify a chain of points with the following characteristic: the x and y distance of every point in the chain to its (following or previous) neighbor is smaller than some tuple D ∈ ℝ². This chain must also have a minimal length of L. Once such a chain has been found, we take the center of the chain as our turn point. The philosophy is that the deceleration towards the real turn point and the acceleration away from it will be almost symmetrical; hence we take the center of the chain. This is based on [Bro66], who says that the motion is like bouncing a golf ball on pavement, and that a beat should never stick at the bottom or it cannot clearly indicate a point in time.

Of course, we must still determine the values for D and L. For D, we choose the average neighbor-difference (along the x-axis and y-axis) of the inspected chain-piece. Setting L to a value of 5 works well in practice.

Using this method, the first tick, the one, should be detectable. But the next tick points in a 2-, 3- or 4-beat pattern may not be detected. This is because the tick occurs in a rapid sideways movement, which makes the neighbor-distance too big. One solution is to look for chains with the inverse criteria and to select the lowest point of such a chain. Note that it is not completely the inverse: we only look for chains where the x-distance is greater than the x component of D. The y-distances in these chains will probably be smaller than the average y-difference, the y component of D.

4.2.2 Detecting the beat pattern
Using the chain principle sounds simple and promising. Now we need to find a collection of ticks and turns which satisfies the criteria for an N-beat schema. All detected patterns must satisfy the following criteria:
1. There is always one, and only one, turn before a tick.
2. The downward line of the one has a maximum angle of 30 degrees compared to a vertical line.
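The chain principle of section 4.2.1.2 and the 30-degree criterion above can be sketched as follows. This is one possible reading of the description, not the original implementation: the function names are hypothetical, D is taken as the average neighbor-difference over all samples passed in, L defaults to 5, and the comparison uses "smaller than or equal" so that an axis without any motion does not rule out every chain.

```python
import math


def chain_centers(points, min_length=5):
    """Return the indices of the centre samples of slow-moving chains."""
    diffs = [(abs(x2 - x1), abs(y2 - y1))
             for (x1, y1), (x2, y2) in zip(points, points[1:])]
    if not diffs:
        return []
    # D: average neighbor-difference along x and y over the inspected samples.
    d_x = sum(dx for dx, _ in diffs) / len(diffs)
    d_y = sum(dy for _, dy in diffs) / len(diffs)

    centers = []
    start = None  # index of the first sample of the chain being grown
    # A sentinel step that can never be "slow" closes a chain at the end of the data.
    for i, (dx, dy) in enumerate(diffs + [(math.inf, math.inf)]):
        slow = dx <= d_x and dy <= d_y
        if slow and start is None:
            start = i
        elif not slow and start is not None:
            end = i  # samples start..end are linked by slow steps
            if end - start + 1 >= min_length:
                centers.append((start + end) // 2)
            start = None
    return centers


def one_is_vertical_enough(turn, tick, max_angle_deg=30.0):
    """Criterion 2: the line from turn to tick deviates at most max_angle_deg from vertical."""
    dx, dy = tick[0] - turn[0], tick[1] - turn[1]
    if dy >= 0:
        return False  # the one must move downwards
    return math.degrees(math.atan2(abs(dx), -dy)) <= max_angle_deg
```

For a hand that drops quickly, slows down around the lowest point and rises quickly again, `chain_centers` returns the index of the sample in the middle of the slow stretch, which matches the idea of symmetric deceleration and acceleration around the real point.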
But using the chain principle to detect ticks and turns might pose a problem when a very large session is analyzed. For instance, the user might be standing still for a long period, or the overall shape of the beat pattern might be small, possibly indicating that the ensemble should be quiet. Whatever the reason, this will affect the average neighbor-difference. So, ideally, we should take just enough data samples to determine the actual neighbor-difference in the beat pattern in order to find the right ticks and turns. If we know in advance the real tempo, the beat pattern and the sample rate, we can calculate how long it should take to complete one full beat pattern and thus the number of data samples to analyze. Using a metronome to give a full beat at the desired tempo should further help in determining the start position of the real beat pattern data. Since our recorded sessions were relatively short, averaging 10 beats at a tempo of around 60 beats per minute, we did not use a metronome.

4.2.2.1 Storage of the beat pattern
In our view, a beat pattern can be stored by storing the coordinates of its turn points and its tick points. With this information, it is possible to give feedback on many aspects, such as tempo, dynamics, and accuracy of the intended shape. It is also possible to verify the clearness of the conducting line and the consistency of the tick points: is the second tick always on the right/left side of the first tick? And so on. Given only the ticks and turns, it is difficult to distinguish musical expressions such as staccato or legato. Still, the length of the line from turn to tick and the angle of that line with the horizon might convey some information.

Examples of extracted beat patterns are shown in appendix A; a 4-beat pattern is analyzed and 8 out of 12 full patterns are correctly extracted.

5. RESULTS
When analyzing the beat hand, using the chain principle to find ticks and turns, we obtained the following results. Of the 190 2-beat patterns performed by the participants, 147 were correctly detected (about 77%). All missed 2-beat patterns were false negatives and there were no false positives. 15 out of the 51 3-beat patterns were detected (about 29%). One particular person had a one that looked more like a two; therefore most of that person's patterns were not considered to resemble a 3-beat pattern and hence were not recognized. There were no false positives. 14 out of 49 4-beat patterns were correctly detected (about 29%). There was one false positive: a 4-beat pattern was detected in the middle of two actual 4-beat patterns.

Analyzing the trajectory of the cue hand produced mixed results. Even if the cue hand is stationary and only gives entrance and cutoff cues at some times, many turn points are detected. If the cue hand mirrors the beat hand, analysis is also difficult, as some beat patterns may contain shapes of entrance or cutoff cues.

6. CONCLUSION
We assess basic conducting movements with a minimum amount of information by looking only at wrist data and finding a set of turn points and tick points which satisfy certain constraints. These constraints differ per pattern; for instance, in a 3-beat pattern the 2nd turn must be to the left of the 1st tick, while in a 2-beat pattern it must be to the right. Once these constraints are applied for some n-beat pattern, the resulting n-beat pattern(s), if any, can be evaluated. At the moment, the system shows pictures of the extracted patterns, leaving real evaluation to experts.

7. DISCUSSION
The biggest problem occurs when too many false ticks and turns are detected. In future work, these could be filtered.

Our experiments show that staccato beat patterns have high turn points and sharp ticks. A sharp tick causes the hand to strongly recoil, which results in a high turn point.
The strong recoil causes a high deceleration and acceleration near the tick point. Legato patterns are more curved than staccato patterns. Because of this curvedness, the acceleration at a tick point in a legato pattern won't be as high as at a tick point in a staccato pattern. So, in order to accurately determine the expression the pattern dictates, the above model could be extended with two pieces of information: one piece describing the deceleration and acceleration at the tick point, and another piece indicating the curvedness of the line between the tick and the turn point.

The proposed method for finding the patterns has difficulty with 3-beat and 4-beat patterns. Of course, one can design a system that recognizes as many patterns as possible, but that is not really the objective in this case. Literature only shows static pictures of different patterns, showing what is right and what is wrong. In practice many conductors alter or deviate completely from these patterns. By performing simple tests we are one step closer to defining what is actually a correct conducting movement, and what is not.

Conversations with conductors also yielded mixed opinions about a practicing tool which only records wrist data. The conducting gestures constitute at most 15% of the total communication flow between conductor and ensemble. Traditionally, practicing conducting motions with some sort of feedback can be done by using a (big) mirror and drawing the so-called conducting line on the mirror. Real-life examples can be obtained from video recordings of expert conductors. But a practicing tool could randomly select appropriate exercises and give feedback on the conducting motions.

The whole point of the conducting movements is to somehow manipulate the ensemble to react as the conductor wishes. The bottom line is that the gestures used in the communication between conductor and ensemble must be understandable and anticipatable. Pictures such as those in figure 1 are deceiving, since they imply that the 2nd, 3rd and 4th ticks occur at fixed heights and lengths compared to the 1st tick. Practice shows that what is perceived as the actual tick is really the lowest point in a given curve of the beat pattern, and our proposed method has a high tick detection rate.

8. ACKNOWLEDGMENTS
I would like to thank my instructors Wim Fikkert and Herwin van Welbergen and also Dennis Reidsma for their patience and valuable input for my research, and my peer reviewers Maurits Diephuis and Mike Lansink. Special thanks to Valentijn Smit, A. Smelt-van Dijk, and Daphne Wassink for their participation.

REFERENCES
[Mar00] Marrin Nakra, T. Inside the "Conductor's Jacket": Analysis, Interpretation and Musical Synthesis of Expressive Gesture. PhD Thesis. Massachusetts Institute of Technology, Cambridge, 2000.
[MHO91] Morita, H., Hashimoto, S., and Ohteru, S. A Computer Music System that Follows a Human Conductor. Computer, Volume 24, Issue 7, pages 44-53. DOI= http://dx.doi.org/10.1109/2.84835. 1991.
[BLS+04] Borchers, J., Lee, E., Samminger, W., and Mühlhäuser, M. Personal orchestra: a real-time audio/video system for interactive conducting. Multimedia Systems, Volume 9, pages 458-465. DOI=10.1007/s00530-003-0119-y. 2004.
[BS07] Bianchi and Smith. Virtual Orchestra, http://www.virtualorchestra.com/, (01/03/2007).
[Kol04] Paul Kolesnik. Conducting Gesture Recognition, Analysis and Performance System. McGill University, Montreal, 2004.
[LGK+06] Lee, E., Grüll, I., Kiel, H., and Borchers, J. Conga: a framework for adaptive conducting gesture analysis. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression, 260-265. New Interfaces For Musical Expression, France, 2006.
[Bro66] Brock McElheran. Conducting Technique: For Beginners and Professionals. ISBN 0-19-501825-7. Oxford University Press, New York, 1966.
[Nie01] Eduard Nieland. Het slagtechniekboek. ISBN 90-802545-9-2. Stichting UNISONO, Netherlands, 2001.
APPENDIX A: PRELIMINARY TEST RESULTS
Below we show a picture of a recorded session of twelve 4-beat patterns. Using the chain method, we identified the turns (squares) and the ticks (circles). The system correctly recognized the first eight 4-beat patterns, which are also shown below from left to right.

Figure 7: a session of twelve 4-beat patterns
Figure 8: pattern 1
Figure 9: pattern 2
Figure 10: pattern 3
Figure 11: pattern 4
Figure 12: pattern 5
Figure 13: pattern 6
Figure 14: pattern 7
Figure 15: pattern 8