SONG STRUCTURE IDENTIFICATION OF JAVANESE GAMELAN MUSIC BASED ON ANALYSIS OF PERIODICITY DISTRIBUTION

D. P. WULANDARI, Y. K. SUPRAPTO, M. H. PURNOMO
Institut Teknologi Sepuluh Nopember, Department of Electrical Engineering, Surabaya 60111, Indonesia
E-mail: diah@te.its.ac.id, yoyonsuprapto@ee.its.ac.id, hery@ee.its.ac.id

ABSTRACT

In a song played by multiple instruments, there is a distribution of periodicities that comes from the different playing patterns of the instrument groups. We propose a visualization of this distribution for analyzing the song structure of Javanese gamelan music. A predefined number of periodicities, along with their confidence levels, are obtained using a comb filter resonator. The filter is applied to the autocorrelation function of overlapping analysis frames of the musical track. We cluster the distribution based on the proximity of two parameters, periodicity and confidence level, and assume that each cluster center represents the periodicity of a group of instruments. We observe four features of the visualization: the width and the average height of the periodicity distribution, the pattern of dominant periodicities, and the fluctuation of the most dominant periodicity. These features implicitly carry information about the strength applied to the notes, the estimated number of instruments, and the accent of the song, from which we infer the structure. We run the experiment on a database of thirty Javanese gamelan songs and compare the analysis of lancaran, ladrang, and ketawang song structures. The results show that lancaran achieved the highest performance with an F-measure of 0.94, followed by ketawang and ladrang with F-measures of 0.90 and 0.75, respectively.

Keywords: Comb Filter Resonator, Confidence Level, Periodicity Distribution, Song Structure Analysis, Javanese Gamelan Music

1. INTRODUCTION

Javanese gamelan music is widely used to accompany cultural events such as wedding ceremonies, art shows, and many religious events [].
Each event has its own story plot and therefore requires an accompaniment with a different sequence of song structures. For example, in a shadow puppet show, the orchestra starts by playing the simple yet dynamic lancaran structure. Before the story moves to a new scene, the orchestra slows the tempo, softens the playing, and prepares to move into more elaborate songs, such as those of ladrang or ketawang, giving the audience an elegant impression.

There are many kinds of song structure in Javanese gamelan music, which are categorized as gendhing alit, gendhing madya, and gendhing ageng []. This research focuses on three types of song structure in the gendhing alit category, namely lancaran, ladrang, and ketawang, since these are among the most frequently used.

Javanese gamelan music divides a song into several parts, called part A, part B, part C (if any), and so on. In each part, every instrument has a different playing pattern. We use the term pattern for the combination of the notation and the periodicities that arise between notes. There are non-strict rules for determining the structure of a Javanese gamelan song based on the number of lines in each part and the presence of the sounds of particular instruments (such as kenong, kethuk, and kempul). Moreover, the players may repeat each part as many times as they like, depending on the situation (the flow of the story). Despite this, Javanese gamelan experts are able to differentiate the song structures by recognizing the pattern as well as by feeling the rhythm.

A Javanese gamelan ensemble contains several instrument groups. For simplicity, we mention three groups as an example: saron, peking, and gong. The saron group, for example, consists of several saron instruments. Saron is usually played according to the notation, whereas peking is usually played by striking each note in the notation twice, and gong is struck only at the end of each line. These three groups may therefore generate different periodicities.

We visualize the periodicity distribution of a song and analyze the structure implied by the pattern that appears in the distribution. The song is first divided into short overlapping frames. Each frame is considered the basic unit of periodicity analysis. The periodicities of a frame are enhanced by employing a comb filter resonator [4]. We cluster the periodicities based on two parameters, the periodicity itself and the corresponding confidence level. Each cluster consists of neighboring periodicities that have relatively close confidence levels. In this way, we take the cluster centers as representations of the periodicity of an instrument group. Clustering is carried out to overcome irrelevant variations that may be present in a song.

This paper is organized as follows. Section I describes the background and outlines the contribution of the research, Section II explains previous works related to the topic, Section III proposes a new method to analyze the structure of Javanese gamelan songs, Section IV shows the experimental evaluation of the proposed method, Section V presents the analysis of the experimental results, and Section VI concludes the analysis.

2. RELATED WORKS

This research continues previous work on Javanese gamelan music transcription. A number of methods have been implemented to transcribe the notation of Javanese gamelan music, such as the use of filters to extract instrument sounds [5-7] and the use of onset detection to transcribe saron notation [8-0]. Other research related to Javanese gamelan music has addressed instrument sound segmentation [, ], instrument timbre analysis [3, 4], and beat tracking [5]. Our proposed method of song structure analysis supports information retrieval and recognition of Javanese gamelan music, which is an application of music transcription [6].
Many applications of musical signal processing are based on periodicity, such as pitch tracking, beat tracking, tempo estimation, and, furthermore, understanding rhythm. The algorithms developed for periodicity detection are mainly built upon time-domain periodicity and frequency-domain periodicity [6]. The majority of the algorithms follow the first approach, such as [7] and [8]. Periodicity based on spectral autocorrelation was studied in [9], while periodicity based on the autocorrelation of the log spectrum was proposed in [0]; both were applied to speech signals.

In this research, we adopt the beat period induction method using a comb filter resonator [4]. However, instead of selecting the most confident periodicity among the hypotheses and considering it the beat period of an analysis frame, we visualize the confidence levels of all periodicities in a track and analyze the song structure.

In general, beat tracking algorithms consist of two stages: the generation of a driving function by direct processing of the audio signal, and the detection of periodicities in this driving function to find tempo estimates []. For generating driving functions, we compared several reduction functions based on spectral features and concluded that the spectral flux function is the best fit for our database [8].

Javanese gamelan instruments are mainly percussive, so they yield more discriminative driving functions than wind and bowed instruments such as flute and violin. On the other hand, defining the structure of Javanese gamelan songs is quite tricky theoretically. The song structure can be distinguished by the number of lines (where a line consists of four bars) in each song part, while the song parts can be differentiated from each other by observing the playing pattern of the instruments. Since each instrument group has its own pattern, it delivers periodicities different from those of the other groups. This research attempts to address this problem by visualizing the periodicity distribution along a musical piece for rhythmic structure analysis.

3. METHOD

There are three main stages in this research, as depicted in Fig. 1. We take a collection of Javanese gamelan music as audio input and perform a preprocessing stage. This stage aims to enhance relevant features while attenuating irrelevant ones for the next stage.

The second stage is beat period induction. We pass the autocorrelation matrix of the audio signal to a bank of comb filter resonators. These resonators serve as a bank of weighted periodicity templates, where the delays of the delta functions represent periodicities that may be contained in a musical track. The output is a distribution of periodicities based on their confidence levels.

Figure 1: System Overview

The third stage is tempo clustering and visualization. We propose to exploit the periodicity distribution to analyze the rhythmic pattern of a song. Each group of periodicities that lie relatively close together in the distribution is assumed to belong to a certain instrument group. The clustering method is used to find the center of the group in order to cope with variations that exist within an instrument group.

3.1. Preprocessing

Details of the preprocessing stage are shown in Fig. 2. The audio signals were recorded at a 44100 Hz sampling frequency and are represented in the time-frequency domain using the Short-time Fourier Transform (STFT). We maintain time-frequency resolution by applying a window length of 8192 samples for the Fourier transform and a hop length of 441 samples, providing 5.4 Hz frequency resolution and 10 ms time resolution.

Figure 2: Preprocessing

For feature extraction, we use an onset reduction function that has been shown to be stable with respect to frequency resolution for Javanese gamelan music [8], namely spectral flux (SF). Equations 1 and 2 give the formulation of the SF function.

$$SF(n) = \sum_{\omega} H\big(X(\omega, n) - X(\omega, n-1)\big) \tag{1}$$

$$H(x) = \frac{x + |x|}{2} \tag{2}$$

X is the magnitude spectrum of the signal obtained from the STFT, ω is the frequency bin, and n is the time sample. H(x) is the half-wave rectifier. The SF function measures the change of magnitude over time for each frequency bin. Through the half-wave rectifier, the detection function keeps only the positive changes, summed across frequencies.

We take tempo fluctuation in Javanese gamelan music into account. The musical track is therefore divided into short analysis frames, and since the analysis must accommodate tempo changes that might occur within a frame, each frame must overlap the previous one. The analysis frame must be long enough to represent the longest beat period in the track, while the hop must be short enough to track tempo changes. Following [4], we set the increment step (L_h) to 25% of the frame length (L_f), providing 75% overlap, as shown in Eq. 3.

$$F_i(n) = \begin{cases} SF(n), & n = 1 + (i-1)L_h, \ldots, L_f + (i-1)L_h \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

F_i(n) is the i-th analysis frame. We use L_f = 1024 DS and L_h = 256 DS to adapt to Javanese gamelan music. DS stands for detection sample, the unit sample of the onset detection function.

As mentioned in the previous section, each instrument in a Javanese gamelan ensemble has its own tempo pattern. The instrument with the longest duration between two consecutive notes is the gong, whose notes appear at the end of a line. Since the objective of this research is to visualize the tempo cluster distribution that comes from all instruments playing in a song, we set the length of the analysis frame so that it can represent the longest beat period of any instrument. A line of notation in Javanese gamelan music consists of sixteen beats and, based on our observation, the duration between two beats is approximately 0.5 s. The duration between two gong notes is thus at least 8 s. Since the resolution of a DS in the analysis frame is 10 ms according to the STFT settings, the required length of the analysis frame is at least 800 DS. We choose 1024 DS for convenience.
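The preprocessing steps above can be illustrated with a minimal Python sketch. It assumes the magnitude spectrogram has already been computed elsewhere and is passed in as a list of per-hop magnitude lists. The function names (`spectral_flux`, `analysis_frames`, `min_frame_len`) are hypothetical and are not taken from the paper.

```python
def half_wave(x):
    """Half-wave rectifier H(x) = (x + |x|) / 2 (Eq. 2)."""
    return (x + abs(x)) / 2.0


def spectral_flux(mag):
    """Spectral flux of a magnitude spectrogram (Eq. 1).

    mag[n][w] is the magnitude of frequency bin w at STFT hop n.
    The first value is 0.0 because hop 0 has no predecessor.
    """
    sf = [0.0]
    for n in range(1, len(mag)):
        sf.append(sum(half_wave(cur - prev)
                      for cur, prev in zip(mag[n], mag[n - 1])))
    return sf


def analysis_frames(sf, frame_len=1024, hop=256):
    """Cut the detection function into overlapping frames (Eq. 3).

    With hop = frame_len / 4, consecutive frames overlap by 75%.
    Only complete frames are returned (an assumption of this sketch).
    """
    return [sf[start:start + frame_len]
            for start in range(0, len(sf) - frame_len + 1, hop)]


def min_frame_len(beats_per_line=16, beat_seconds=0.5, ds_seconds=0.010):
    """Detection samples needed to span one line (one gong period)."""
    return round(beats_per_line * beat_seconds / ds_seconds)
```

With the values stated in the text (sixteen beats of about 0.5 s and a 10 ms detection sample), `min_frame_len()` gives the 800 DS lower bound that motivates the 1024 DS frame.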

From this we also set the hop length to 256 DS to reflect the period of a bar, which is one fourth of the period of a line, since a line consists of four bars.

To remove noisy peaks that usually appear in the onset detection signal, we apply a moving mean threshold and a half-wave rectifier to the signal, as shown in Eq. 4 and 5.

$$\bar{F}_i(n) = \frac{1}{Q} \sum_{q = n - Q/2}^{n + Q/2} F_i(q) \tag{4}$$

$$\tilde{F}_i(n) = H\big(F_i(n) - \bar{F}_i(n)\big) \tag{5}$$

Q is the length of the window, which is set to 16 DS [4]. H(x) is the half-wave rectifier function of Eq. 2. Periodicities in the audio signal are then enhanced using the autocorrelation function described in Eq. 6.

$$A_i(l) = \frac{\sum_{n=1}^{L_f} \tilde{F}_i(n)\, \tilde{F}_i(n - l)}{|l - L_f|}, \quad l = 1, \ldots, L_f \tag{6}$$

3.2. Periodicity Induction

The autocorrelation matrix resulting from the previous stage is taken as the driving function on which the periodicity analysis is performed. Figure 3 shows the beat period induction based on [4].

Figure 3: Periodicity Induction

The first step is to create a comb template. This template reflects periodicity at several metrical levels and is represented by the sum of weighted delta functions at integer multiples of a periodicity, as shown in Eq. 7. The p-th comb element has a width of 2p - 1 and a height normalized by that width. We set the longest periodicity equal to the hop length of the analysis frames, in order to derive at least one beat from each analysis frame.

$$\lambda_\tau(l) = \sum_{p=1}^{4} \sum_{v=1-p}^{p-1} \frac{\delta(l - \tau p + v)}{2p - 1} \tag{7}$$

A Rayleigh distribution function is used as a weighting curve to approximate the prior distribution of beat period hypotheses. The function rises steeply for short lags and decays slowly after its peak. It was selected because it favors shorter lags over longer ones as beat periods.

$$R(\tau) = \frac{\tau}{\beta^2}\, e^{-\frac{\tau^2}{2\beta^2}}, \quad \tau = 1, \ldots, L_h \tag{8}$$

β is a parameter that sets the location of the peak. Davies et al. used β = 43 to represent the common tempo of 120 bpm [4]. The product of the comb template and the Rayleigh function results in a shift-invariant comb filter bank, as shown in Eq. 9.

$$C(l, \tau) = R(\tau)\, \lambda_\tau(l) \tag{9}$$

Finally, this comb filter bank is used to generate the beat period distribution by multiplying it with the autocorrelation functions of all analysis frames. That is, we take the dot product of the autocorrelation function of each analysis frame with each beat period hypothesis, as described in Eq. 10. The output matrix represents the beat period distribution of all analysis frames.

Figure 4: Periodicity Distribution of a Single Frame
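The periodicity induction stage described above (autocorrelation, comb templates, Rayleigh weighting, and the final dot product) can be sketched in plain Python as follows. This is an illustrative reading of the equations rather than the authors' implementation. The function names are hypothetical, and skipping the lag equal to the frame length (where the normalizer vanishes) is a choice made for this sketch.

```python
import math


def autocorrelation(frame):
    """Normalized autocorrelation of one analysis frame (Eq. 6).

    acf[l] holds lag l; index 0 is unused. The lag equal to the frame
    length is left at 0.0 because its normalizer |l - L_f| is zero.
    """
    lf = len(frame)
    acf = [0.0] * (lf + 1)
    for lag in range(1, lf):
        total = sum(frame[n] * frame[n - lag] for n in range(lag, lf))
        acf[lag] = total / (lf - lag)
    return acf


def comb_template(tau, lf):
    """Comb template for period tau (Eq. 7): four elements at tau * p,
    each 2p - 1 samples wide with height 1 / (2p - 1)."""
    lam = [0.0] * (lf + 1)
    for p in range(1, 5):
        for v in range(1 - p, p):
            lag = tau * p - v  # delta(l - tau*p + v) is 1 at this lag
            if 1 <= lag <= lf:
                lam[lag] += 1.0 / (2 * p - 1)
    return lam


def rayleigh(tau, beta=43.0):
    """Rayleigh weighting of beat period hypotheses (Eq. 8)."""
    return (tau / beta ** 2) * math.exp(-tau ** 2 / (2 * beta ** 2))


def periodicity_distribution(frame, max_period, beta=43.0):
    """Confidence level y[tau] for tau = 1 .. max_period.

    Each value is the dot product of the autocorrelation with the
    Rayleigh-weighted comb template of that period.
    """
    lf = len(frame)
    acf = autocorrelation(frame)
    y = [0.0] * (max_period + 1)
    for tau in range(1, max_period + 1):
        lam = comb_template(tau, lf)
        y[tau] = rayleigh(tau, beta) * sum(a * c for a, c in zip(acf, lam))
    return y
```

Keeping the whole list `y` instead of only its maximum corresponds to the choice made in this paper to use the full distribution rather than a single beat period per frame.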

$$Y_i(\tau) = \sum_{l=1}^{L_f} A_i(l)\, C(l, \tau) \tag{10}$$

Figure 4 depicts an example of the beat period distribution of the first analysis frame of a Javanese gamelan song in the database. The main difference between this paper and that of Davies et al. lies in the view taken of this beat period distribution. In their paper, they chose the lag that corresponds to the highest confidence level as the beat period of that particular analysis frame, which is around 40 DS in the example. They consider this chosen lag to be the beat period that is strongly contained in the song or, in other words, contained in all instrument notations. In this research, by contrast, we use the whole beat period distribution instead of choosing the most confident period. We consider each beat period with a non-zero confidence level to be contained in the song frame and to belong to one of the instrument notations. The following subsection describes the use of the beat period distribution for visualization.

3.3. Clustering and Visualization

When playing musical instruments or singing a song, humans are naturally unable to follow the exact tempo repeatedly. Similarly, if we record a person speaking the same utterance several times and compare the recordings, we end up with many variations in the signals. There will always be tempo bias when humans play music as well. Some neighboring beat periods may represent variations around a reference value. We therefore propose to use a clustering algorithm to overcome this problem. We use a fuzzy clustering algorithm, which is capable of defining a membership function for all data to each cluster based on the c-means objective function []. The flow of this stage is depicted in Fig. 5.

Figure 5: Tempo Clustering and Visualization

The Fuzzy C-Means (FCM) clustering algorithm is based on the minimization of the fuzzy c-means functional given in Eq. 11.

$$J(Z; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m \, \lVert z_k - v_i \rVert_B^2 \tag{11}$$

Z is the data, B_i is a fuzzy subset of Z, and U is the fuzzy partition matrix, whose i-th row contains the values of the membership function of B_i. c is the number of clusters and N is the number of data.
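V is the vector of cluster centers that is to be determined,

$$V = [v_1, v_2, \ldots, v_c], \quad v_i \in \mathbb{R}^n \tag{12}$$

Following the fuzzy partition rules, we obtain the conditions in Eq. 13-14.

$$\bigcup_{i=1}^{c} B_i = Z \tag{13}$$

$$\mu_{ik} \in [0, 1], \quad \sum_{i=1}^{c} \mu_{ik} = 1, \quad 0 < \sum_{k=1}^{N} \mu_{ik} < N \tag{14}$$

where 1 ≤ i ≤ c and 1 ≤ k ≤ N. The minimization is based on the squared inner-product distance norm shown in Eq. 15.

$$D_{ikB}^2 = \lVert z_k - v_i \rVert_B^2 \tag{15}$$

The minimum of the fuzzy objective function can be obtained by taking the first derivative of Eq. 16 and setting the derivatives with respect to U, V, and λ to zero.

$$\bar{J}(Z; U, V, \lambda) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_{ikB}^2 + \sum_{k=1}^{N} \lambda_k \left( \sum_{i=1}^{c} \mu_{ik} - 1 \right) \tag{16}$$

This holds when the following conditions are met.

$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( D_{ikB} / D_{jkB} \right)^{2/(m-1)}} \tag{17}$$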

$$v_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^m z_k}{\sum_{k=1}^{N} (\mu_{ik})^m} \tag{18}$$

The cluster center v_i is determined by taking a weighted mean of the data that belong to cluster i, using the membership degrees of the data to the cluster as the weights. The membership degree may vary from zero to one, providing a soft clustering in which each datum may belong to more than one cluster to some degree. The algorithm was used to cluster the confidence levels of the beat period hypotheses. The clustering is performed in a two-dimensional space, where the first axis is the beat period and the second axis is the confidence level. We assume that there are no outliers since the data are time series.

4. EXPERIMENTS

A dataset of thirty Javanese gamelan songs is used for the experiments. It consists of three song structures categorized as gendhing alit in Javanese gamelan music: lancaran, ladrang, and ketawang. The songs contain multiple instrument sounds, including singers' voices. Three experiments were carried out in this research. The first experiment compares two settings of analysis frame length and hop length; Figure 6 shows the results. The second experiment investigates a suitable number of periodicity clusters to represent instrument groups, and is depicted in Fig. 7. Figures 8-10 show the results of the third experiment, which is the analysis of the song structure of the Javanese gamelan music in the dataset.

We conducted the first experiment with two different settings of the analysis frame length and hop length. First, we adopted the setting of [4], using a 512 DS analysis frame and a 128 DS hop length. We compare the results with those of our own setting, which is based on the characteristics of Javanese gamelan music. As mentioned in the previous section, in order to adjust the frame length to the beat period of the gong, we need to set it to the length of a line in Javanese gamelan music, that is, approximately 1024 DS. Since a line in Javanese gamelan music consists of four bars, we also set the beat period hypotheses to range up to 256 DS.

Figure 6: Comparison of 512/128 Setting and 1024/256 Setting. (a) Side view of periodicity distribution. (b) Top view of periodicity distribution using the 512/128 setting, showing the distribution clipped over the longer periodicities. (c) Top view of periodicity distribution using the 1024/256 setting, showing an unclipped distribution.

Figure 7: Comparison of the Number of Clusters. (a) Visualization with 10 Periodicity Clusters. (b) Visualization with 30 Periodicity Clusters.

In order to derive at least one beat per frame, the number of beat period hypotheses must equal the hop length. The upper graph in Fig. 6 represents the periodicity distribution using the 512/128 setting, while the lower one represents the periodicity distribution using the 1024/256 setting. With the first setting, the long periodicities (the upper part of the distribution) appear to be clipped. With the second setting, the distribution shows all periodicities contained in each frame, as indicated by the zero confidence levels in the upper part of the graph.

Figure 6(a) shows the periodicity distribution along the track observed from the side. Figures 6(b) and 6(c) show the periodicity distribution of the song from the top in order to compare the general setting (512/128) with the customized setting (1024/256). The lower side of each graph represents faster periodicities while the upper side represents longer periodicities. The first setting produces a larger number of analysis frames, so for clarity we present only the first half of the frames in the graph.

From both graphs we conclude that the general setting is not suitable for Javanese gamelan music, since it clips the distribution and does not allow periodicities larger than 128 DS to appear, although they are actually present in Javanese gamelan music. We therefore used the customized setting to visualize periodicity distributions in the following experiments. The clipping no longer appears when the frame length and hop length are enlarged according to our previous calculation.

The clustering process is conducted to address the many periodicity variations caused by humans playing the instruments. On the other hand, the fewer clusters we use, the more information we lose. The second reason for clustering is to represent the periodicity of an instrument group through the center of each cluster.
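As an illustration of the clustering step, the following Python sketch alternates the fuzzy c-means membership update (Eq. 17) and center update (Eq. 18) on (beat period, confidence level) pairs. Several details are assumptions of this sketch and are not specified in the paper: the Euclidean norm, the fuzziness exponent m = 2, the evenly spaced initial centers, and the stopping tolerance. The function name `fuzzy_c_means` is hypothetical.

```python
import math


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Cluster points (tuples) into c fuzzy clusters.

    Returns (centers, u), where u[k][i] is the membership of point k
    in cluster i, computed against the centers of the last iteration
    before the final center update.
    """
    n = len(points)
    dims = len(points[0])
    # Assumed initialization: c data points spread evenly over the input.
    centers = [points[(i * (n - 1)) // max(c - 1, 1)] for i in range(c)]
    u = []
    for _ in range(iters):
        # Membership update (Eq. 17).
        u = []
        for z in points:
            d = [math.dist(z, v) for v in centers]
            if 0.0 in d:
                # The point sits on a center: share membership among them.
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                exp = 2.0 / (m - 1.0)
                row = [1.0 / sum((di / dj) ** exp for dj in d) for di in d]
            u.append(row)
        # Center update (Eq. 18): membership-weighted mean of the data.
        new_centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * z[dim] for wk, z in zip(w, points)) / total
                for dim in range(dims)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

In this paper's setting, `points` would hold one (period, confidence) pair per beat period hypothesis of a frame, and `c` would be the chosen number of periodicity clusters.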
In deciding the optimum number of periodicity clusters, we consider the number of instrument groups that may be present in an orchestra. In total, there are about ten instrument groups in a complete gamelan set [3]. Each group can be divided into two or three smaller groups, so a complete set may have almost thirty instrument groups. Figure 7 depicts a comparison between the visualization with 10 clusters and that with 30 clusters. Figure 7(a) represents the periodicity distribution with more information loss than Fig. 7(b). This can be seen from the pattern of dominant periodicities, which are marked in orange, grey, and yellow. Figure 7(b) shows these colored regions in more detail, which may help in analyzing the pattern of periodicity in a song. We therefore chose to set the number of clusters to thirty, so that each cluster may consist of 8-9 periodicity variations out of the total of 256 periodicity hypotheses.

The third experiment was carried out on a dataset of thirty Javanese gamelan songs that consists of three types of song structure: lancaran, ladrang, and ketawang. Figures 8-10 show representatives of each song structure in the dataset. There are four features of interest in the visualization: the width of the distribution, the average height of the distribution, the pattern of dominant periodicities, and the value of the tempo (the periodicity with the highest confidence level) together with its fluctuation along the track. The following section explains each feature and how it can be used to analyze the structure of a Javanese gamelan song, and discusses the experimental results.

5. DISCUSSION

Each Javanese gamelan song structure has typical characteristics that can be seen from the number of kethuk, kenong, and kempul notes and their positions in the notation [, 3]. Unfortunately, detecting the sounds of these instruments is very difficult because their signal amplitude is low compared to that of the other instruments. Nevertheless, gamelan experts and practitioners are able to recognize a song structure and distinguish it from the others by listening to the song, without having the notation. We therefore propose to present a visualization of the periodicity distribution and to exploit some of its features as a means of analyzing Javanese gamelan song structures.

For analysis purposes, we tested the normality of the periodicity distributions of all songs in our database. According to the central limit theorem, as the number of samples drawn from a population with finite variance grows, the distribution of the sample mean approaches a normal distribution [4]. We used the Lilliefors test for normality, which is suitable when the parameters of the hypothesized distribution are not completely known [5]. The null hypothesis is that the periodicity distribution of all frames is a normal distribution. The result is a logical value h, which is 0 if the null hypothesis is accepted and 1 if it is rejected at the 5% significance level. The results show that all songs have non-normal distributions, as indicated by the test result h = 1, the p-value p = 0, and a test statistic that is greater than the critical value. There is high nonlinearity in the signals, which makes the analysis much more complex, and we therefore propose a visualization approach for song structure analysis of Javanese gamelan music.

Analyzing the song structure of Javanese gamelan music based on the periodicity distribution brings us back to the above-mentioned features of the visualization. The first feature is the width of the distribution. The term width refers to the periodicity clusters that have significant values. We take the average confidence level of each cluster along the track. The significance of the value of each cluster is determined by a threshold.
The clusters whose values are below the threshold are considered insignificant. The width of a distribution is the number of significant clusters. The threshold is calculated according to Eq. 19.

Figure 8: Representation of Lancaran Song Structure. (a) Periodicity Distribution of Kebo Giro. (b) Fluctuation of Tempo. (c) Fluctuation of Tempo Loudness.

\[ \phi = \frac{\alpha}{TS} \sum_{t=1}^{T} \sum_{s=1}^{S} Y(t,s) \qquad (9) \]

Here Y(t,s) is the periodicity distribution, t is the periodicity cluster, s is the analysis frame (with T clusters and S frames in total), and 0 < α < 1.

A wider distribution means that the song contains more varied periodicity content. Since different instruments generate different periodicities, a wider distribution indicates a larger number of instruments than a narrower one. Songs of the lancaran structure have a vibrant and flowing rhythm. This dynamic impression is built by the involvement of many instruments and by the relatively fast and flat tempo. That is why songs of this structure are usually placed at the opening of events or ceremonies. By determining the width of the periodicity distribution, we can estimate which structure a song is most closely related to.

The second feature is the average height of the periodicity distribution. This feature depends on the magnitude of the time-frequency representation of the audio signal. It is affected by the number of instruments and the playing style, both of which contribute to the loudness of the sound. Louder sound may imply higher passion, and therefore songs with louder sound may relate to the lancaran structure. The height of the confidence level in the visualization is shown in the legend of each graph.

Dominant periodicities are marked by different colors in the visualization, namely orange, grey and yellow. We categorize the pattern of dominant periodicities as sparse or dense, and as short or long. Please note that this is the only feature that is not numerically quantized but visually perceived.

The last feature is based on the value of the tempo and its fluctuation along the track. We present two graphs for this feature: one shows the fluctuation of the tempo value, and one shows the fluctuation of tempo loudness, which is represented by the confidence level. From these two graphs, we analyze the change of the most confident periodicity cluster, which may indicate a song part transition and, furthermore, may imply the rhythmic pattern of the song.

In Figs. 8-10 we present three periodicity distributions of Javanese gamelan songs, representing the lancaran, ladrang, and ketawang song structures respectively. Each periodicity distribution is supported by two charts showing the fluctuation of the most dominant periodicity (also known as tempo) and the fluctuation of tempo loudness along the track. In each chart we compare the fluctuation with a regression line to show the trend, and we also give the equation of the line. For clarity, from all analysis frames in a song we present only one hundred consecutive analysis frames whose periodicity distribution represents the rhythmic structure of the song.

Figure 9. Representation of the ladrang song structure: (a) periodicity distribution of Kutut Manggung, (b) fluctuation of tempo, (c) fluctuation of tempo loudness.

Figure 8 depicts an example of the lancaran song structure, Kebo Giro. Based on the calculation using Eq. […], we obtained a distribution width of 3 clusters, while the other lancaran songs in the dataset may have a distribution width in the range of […]-4 clusters. The average height of the distribution is […]4, while the maximum height is almost […]000 (Fig. 8(c)). Dominant periodicities are dense in the first twenty analysis frames but become sparser afterwards, as shown in Fig. 8(a). From Fig. 8(b) we can see that the tempo fluctuates from 4 to […] but maintains a nearly constant trend (increasing with a small gradient of 0.004), while the loudness of the tempo decreases (Fig. 8(c)). We conclude that the playing style was strong at the beginning and became softer towards the end, producing a sparser distribution and a lower confidence level of the tempo. However, this condition does not indicate a song part transition, which is supported by the fact that Kebo Giro is of the lancaran nibani type. Lancaran nibani, in this case, maintains the rhythmic pattern until the end of the song, since it usually consists of one song part (part A) that is played repetitively.

The ladrang song structure has a relatively narrower periodicity distribution in each analysis frame, as shown in Fig. 9, with a distribution width of […] clusters. Figure 9(a) is a visualization of the periodicity distribution of Kutut Manggung, whose average height of distribution is 69.6. Both the width and the average height of the ladrang example are smaller than those of the lancaran example. We may conclude from these facts that fewer instruments are played in the ladrang example than in the lancaran example. This supports the softer impression that generally arises from ladrang songs [3].
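The two numerical features described above, distribution width (via the threshold of Eq. 9) and average height, can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the value of a cluster is assumed to be its mean over all analysis frames, the average height is assumed to be the mean of all entries of Y, and the toy matrix and the choice α = 0.5 are made up for the example.

```python
def significance_threshold(Y, alpha):
    """Eq. 9: alpha times the mean of Y(t, s) over all clusters t and frames s."""
    n_clusters = len(Y)
    n_frames = len(Y[0])
    total = sum(sum(row) for row in Y)
    return alpha * total / (n_clusters * n_frames)


def distribution_width(Y, alpha):
    """Number of significant clusters.

    Assumption: a cluster's value is its mean over all frames, and a cluster
    is significant when that value is not below the threshold.
    """
    phi = significance_threshold(Y, alpha)
    cluster_values = [sum(row) / len(row) for row in Y]
    return sum(1 for value in cluster_values if value >= phi)


def average_height(Y):
    """Assumption: average height is the mean of all entries of Y."""
    return sum(sum(row) for row in Y) / (len(Y) * len(Y[0]))


# Toy periodicity distribution: 4 clusters (rows) by 3 analysis frames (columns).
Y = [
    [9.0, 8.0, 10.0],
    [0.5, 0.2, 0.3],
    [6.0, 5.0, 7.0],
    [0.1, 0.0, 0.2],
]

print("threshold:", significance_threshold(Y, alpha=0.5))
print("width:", distribution_width(Y, alpha=0.5))
print("average height:", average_height(Y))
```

With this toy input only the first and third clusters lie above the threshold, so the width is 2. A per-frame definition of width would be an equally plausible reading of the text.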
In the ladrang example, dominant periodicities are dense but not continuous along the song. The discontinuity of the pattern indicates song part transitions, while the density of dominant periodicities reflects the density of the notation pattern, in which a note appears at almost every beat of the song […].

Figure 10. Representation of the ketawang song structure: (a) periodicity distribution of Ibu Pertiwi, (b) fluctuation of tempo, (c) fluctuation of tempo loudness.
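The trend lines drawn over the tempo and tempo-loudness charts can be reproduced with an ordinary least-squares fit over the frame index. The sketch below is a generic illustration rather than the authors' code. The function name and the sample values are hypothetical, and the fit assumes at least two frames.

```python
def fit_trend_line(values):
    """Least-squares line y = slope * x + intercept over frame indices 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two frames to fit a trend line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical dominant-periodicity values for four consecutive analysis frames.
tempo_periodicity = [10.0, 9.0, 8.0, 7.0]
slope, intercept = fit_trend_line(tempo_periodicity)
print("slope:", slope, "intercept:", intercept)
```

A negative slope of the periodicity series corresponds to a shortening period, which the paper reads as a tempo that becomes faster towards the end of the song.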

For the ladrang example, song transitions are also depicted by the drops of tempo loudness in Fig. 9(c). The trend of the tempo fluctuation decreases, as shown by the negative gradient of the linear equation in Fig. 9(b). This means that the tempo gets faster towards the end (a shorter periodicity implies a faster tempo).

An example of the ketawang song structure, Ibu Pertiwi, is represented in Fig. 10. The periodicity distribution has a width of 3 clusters and an average height of 94.7. Compared to the abovementioned structures, ketawang has a variation of periodicity content similar to that of lancaran, as implied by the same distribution width. However, ketawang has a softer playing style than lancaran, as shown by the smaller average height of the distribution. Visually, we can observe from Fig. 10(a) that dominant periodicities (marked in orange) are dense during the first 50 frames but become sparse during the remaining frames. This change of pattern indicates song part transitions, although they do not appear as clearly as in ladrang. The trend line of the tempo fluctuation is decreasing, as depicted in Fig. 10(b), showing that the tempo gets faster towards the end of the song, while the loudness of the tempo shows a constant trend (Fig. 10(c)), with two large peaks indicating song part transitions.

Performance is measured for each type of song structure. The evaluation is based on precision (P) and recall (R), which are combined into the F-measure, as given by Eqs. 10-12.

\[ P = \frac{TP}{TP + FP} \qquad (10) \]

\[ R = \frac{TP}{TP + FN} \qquad (11) \]

\[ F\text{-measure} = \frac{2PR}{P + R} \qquad (12) \]

TP, or true positive, represents a correctly identified song structure; FP, or false positive, represents an incorrectly identified song structure; and FN, or false negative, represents an incorrectly rejected song structure. Thus, precision indicates how reliable the identification is, and recall indicates how well the features of the periodicity distribution capture each structure.

Table: Performance Measure

Song structure   TP   FP   FN   P      R      F-measure
Lancaran         8    0    1    1      0.89   0.94
Ladrang          6    1    3    0.86   0.67   0.75
Ketawang         9    2    0    0.82   1      0.90

The table shows the overall results. The best performance is obtained by lancaran, with an F-measure of 0.94.
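As a quick check of Eqs. 10-12, the precision, recall and F-measure of each structure can be recomputed from its TP, FP and FN counts. This is a minimal sketch. The function name is hypothetical, and the counts are the ones we read from the performance table, so they should be treated as assumptions wherever the table is unclear.

```python
def precision_recall_f(tp, fp, fn):
    """Return (P, R, F-measure) following Eqs. 10-12; empty denominators give 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# (TP, FP, FN) per song structure, as read from the performance table (assumed).
counts = {
    "Lancaran": (8, 0, 1),
    "Ladrang": (6, 1, 3),
    "Ketawang": (9, 2, 0),
}

for structure, (tp, fp, fn) in counts.items():
    p, r, f = precision_recall_f(tp, fp, fn)
    print(f"{structure:9s} P={p:.2f} R={r:.2f} F={f:.2f}")
```

Under these counts the F-measures come out at about 0.94, 0.75 and 0.90, which agrees with the values quoted in the text.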
Lancaran is followed by ketawang and ladrang, with F-measures of 0.90 and 0.75 respectively. The highest precision is obtained by lancaran, while the highest recall is obtained by ketawang. From these results we may conclude that the method is most precise in identifying the lancaran song structure and most sensitive in identifying the ketawang song structure. These results are supported by the fact that the features of the periodicity distribution belonging to lancaran are more discriminative than those of the other song structures.

6. CONCLUSIONS

We have presented a method of analyzing the song structure of Javanese gamelan music based on visualization of the periodicity distribution. The existence of non-linearity in the audio signals has been shown using the Lilliefors test of normality. Therefore we propose to utilize some visual features of the tempo distribution to make inferences regarding the song structure of Javanese gamelan music. We conclude that there are features of the periodicity distribution that can be used to distinguish the structure of a Javanese gamelan song. Although the analysis in this research is mainly qualitative and visually perceived, these results are useful for our future research. We may implement a classification of Javanese gamelan song structure based on these features, using a machine learning technique.

We also note that, in order to obtain convergent results, we need to improve our data selection, for example with respect to the quality of the recordings. We found that some of the data are quite noisy, and the analysis of the periodicity distributions of these data is misleading. The other important rule is to make sure that the songs are of the classical type. This will guarantee that we obtain clear and discriminative rhythmic patterns.

REFERENCES:

[1] Wardono, Soewondo, Teori Karawitan Jawa. Madiun, Indonesia: Warga (1984).
[2] Palgunadi, B. Karawitan Jawi, Bandung, Indonesia: Penerbit ITB (2002).
[3] Supanggah, R. Bothekan Karawitan II: Garap, Solo, Indonesia: ISI Press Surakarta (2009).
[4] Davies, M. E. P., Plumbley, M. D., Context-dependent beat tracking of musical audio, IEEE Transactions on Audio, Speech, and Language Processing, 15, 1009-1020 (2007).

[5] Suprapto, Y. K., Wulandari, D. P., and Tjahyanto, A., Saron music transcription using LPF-cross correlation, Journal of Theoretical and Applied Information Technology, 32, 71-79 (2011).
[6] Suprapto, Y. K., Hariadi, M., and Purnomo, M. H., Traditional Music Sound Extraction Based on Spectral Density Model using Adaptive Cross-correlation for Automatic Transcription, IAENG International Journal of Computer Science, 38, 1-8 (2011).
[7] Suprapto, Y. K. Spectral Density Based on Phase Shifting for Music Notation. Jurnal Ilmiah Kursor, 6, 39-46 (0).
[8] Wulandari, D. P., Tjahyanto, A., and Suprapto, Y. K., Gamelan Music Onset Detection based on Spectral Features. Telkomnika, 11, 107-118 (2013).
[9] Wulandari, D. P., Suprapto, Y. K., and Purnomo, M. H., Gamelan music onset detection using Elman Network, IEEE International Conference on Computational Intelligence for Measurement Systems and Applications (CIMSA), 996 (0).
[10] Wulandari, D. P., Suprapto, Y. K., and Tjahyanto, A., Saron transcription based on time-frequency analysis of onset detection using Short-time Fourier Transform, Proceedings of International Conference and Workshop on Basic and Applied Sciences (ICOWOBAS), (0).
[11] Suprapto, Y. K., Purnomo, M. H., and Hariadi, M., Segmentation of Identical and Simultaneously Played Traditional Music Instruments using Adaptive, IPTEK The Journal of Technology and Science, 20, 889 (2009).
[12] Wintarti, A., Suprapto, Y. K., and Wirawan, Independence test of gamelan instrument signal in time domain and frequency domain. Jurnal Ilmiah Kursor, 7, 47-54 (2013).
[13] Tjahyanto, A., Suprapto, Y. K., and Wulandari, D. P., Spectral-based Features Ranking for Gamelan Instruments Identification using Filter Techniques. Telkomnika, 11, 95-106 (2013).
[14] Tjahyanto, A., Suprapto, Y. K., Purnomo, M. H., and Wulandari, D. P., FFT-based features selection for Javanese music note and instrument identification using support vector machines, IEEE International Conference on Computer Science and Automation Engineering (CSAE), 439-443 (0).
[15] Wulandari, D. P., Tjahyanto, A., Suprapto, Y. K., Sudarma, M., Beat-tracking of Javanese Gamelan Musical Audio based on Comb Filter Resonator, Seminar on Intelligent Technology and Its Applications (SITIA), (2014).
[16] Klapuri, A., Davy, M., Signal Processing Methods for Music Transcription, Springer, (2006).
[17] Talkin, D., A robust algorithm for pitch tracking, Speech Coding and Synthesis, Amsterdam: Elsevier Academic Press, 495-518 (1995).
[18] de Cheveigné, A., Kawahara, H., YIN, a fundamental frequency estimator for speech and music, Journal of the Acoustical Society of America, 111, 1917-1930 (2002).
[19] Lahat, M., Niederjohn, R., and Krubsack, D., A spectral autocorrelation method for measurement of the fundamental frequency of noise-corrupted speech. IEEE Transactions on Acoustics, Speech and Signal Processing, 35, 741-750 (1987).
[20] Kunieda, N., Shimamura, T., and Suzuki, J., Robust method of measurement of fundamental frequency by ACLOS: autocorrelation of log spectrum, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, 232-235 (1996).
[21] McKinney, M. F., Moelants, D., Davies, M. E. P., Klapuri, A., Evaluation of audio beat tracking and music tempo extraction algorithms. Journal of New Music Research, 36, 1-16 (2007).
[22] Bezdek, J. C., Ehrlich, R., Full, W., FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10, 191-203 (1984).
[23] Sumarsam. Gamelan: cultural interaction and musical development in central Java, Chicago, United States: University of Chicago Press, (1995).

[24] DeGroot, M. H., Schervish, M. J., Probability and statistics, Boston: Pearson Education Inc., 893 (0).
[25] Razali, N. M., and Wah, Y. B., Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests, Journal of Statistical Modeling and Analytics, 2, 21-33 (2011).