UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

HARMONIX MUSIC SYSTEMS, INC. and KONAMI DIGITAL ENTERTAINMENT INC., Petitioners

v.

PRINCETON DIGITAL IMAGE CORPORATION, Patent Owner

Case No. tbd
Patent No. 5,513,129

PETITION FOR INTER PARTES REVIEW OF U.S. PATENT NO. 5,513,129

Table of Contents

I. INTRODUCTION
II. REQUIREMENTS FOR INTER PARTES REVIEW UNDER 37 C.F.R. 42.104
  A. Grounds for Standing Under 37 C.F.R. 42.104(a)
  B. Identification of Challenge Under 37 C.F.R. 42.104(b) and Relief Requested
    1. The Grounds for Challenge
    2. Level of a Person Having Ordinary Skill in the Art
    3. Claim Construction Under 37 C.F.R. 42.104(b)(3)
III. OVERVIEW OF THE 129 PATENT
  A. Description of the Alleged Invention of the 129 Patent
  B. Summary of the Prosecution History of the 129 Patent
IV. THERE IS A REASONABLE LIKELIHOOD THAT CLAIMS 1-23 ARE UNPATENTABLE
  A. Tsumura Anticipates Claims 10 and 11 Under 35 U.S.C. 102(a)
  B. Lytle Anticipates Claims 5-7, 9-12, 16-20, and 22-23 Under 35 U.S.C. 102(b)
  C. Adachi Anticipates Claims 1, 12, 13, 15, and 21 Under 35 U.S.C. 102(b)
  D. Lytle in View of Adachi Renders Claims 1, 8, 12, 13, 15, and 21 Obvious Under 35 U.S.C. 103(a)
  E. Thalmann in View of Williams Renders Claims 1-4, 12, 13, 15, and 21 Obvious Under 35 U.S.C. 103(a)
  F. Adachi in View of Tsumura Renders Claims 5-7, 14, and 16-20 Obvious Under 35 U.S.C. 103
V. MANDATORY NOTICES UNDER 42.8
  A. Real Parties-in-Interest and Related Matters
  B. Lead and Back-Up Counsel Under 37 C.F.R. 42.8(b)(3) & (4)
  C. Payment of Fees
VI. CONCLUSION

I. INTRODUCTION

Petitioners Harmonix Music Systems, Inc. and Konami Digital Entertainment Inc. (collectively "Petitioners") request an Inter Partes Review ("IPR") of claims 1-23 (collectively, the "Challenged Claims") of U.S. Patent No. 5,513,129 ("the 129 Patent"), issued on April 30, 1996 to Mark Bolas, et al. ("Applicants"). Exhibit 1001, 129 Patent.

II. REQUIREMENTS FOR INTER PARTES REVIEW UNDER 37 C.F.R. 42.104

A. Grounds For Standing Under 37 C.F.R. 42.104(a)

Petitioners certify that the 129 Patent is available for IPR and that Petitioners are not barred or estopped from requesting IPR challenging the claims of the 129 Patent. The timing for this Petition also is proper under 37 C.F.R. 42.122(b): (1) on October 17, 2014, the Board instituted an inter partes review trial on the 129 Patent [Ex. 2013] on a timely first petition filed by Ubisoft (Case No. IPR2014-00635), and (2) Harmonix and Konami accompany this second petition with a motion for joinder under 37 C.F.R. 42.22.

B. Identification Of Challenge Under 37 C.F.R. 42.104(b) And Relief Requested

In view of the prior art, evidence, and claim charts, claims 1-23 of the 129 Patent are unpatentable and should be cancelled. 37 C.F.R. 42.104(b)(1).

1. The Grounds For Challenge

Based on the prior art references identified below, IPR of the Challenged Claims should be granted. 37 C.F.R. 42.104(b)(2). The Board has already granted review of the Challenged Claims on the following grounds in IPR2014-00635:

Claims 10 and 11 are anticipated under 35 U.S.C. 102(b) by U.S. Patent No. 5,208,413 to Tsumura, et al. ("Tsumura") [Ex. 1002].

Claims 5-7, 9-12, 16-18, and 22-23 are anticipated under 35 U.S.C. 102(b) by Driving Computer Graphics Animation From a Musical Score by Lytle ("Lytle") [Ex. 1003].

Claims 1, 12, 13, 15, and 21 are anticipated under 35 U.S.C. 102(b) by U.S. Patent No. 5,048,390 to Adachi, et al. ("Adachi") [Ex. 1004].

Claims 1, 8, 12, 13, 15, and 21 are obvious under 35 U.S.C. 103(a) over Lytle [Ex. 1003] in view of Adachi [Ex. 1004].

Claims 1-4, 12, 13, 15, and 21 are obvious under 35 U.S.C. 103(a) over Using Virtual Reality Techniques in the Animation Process by Thalmann ("Thalmann") [Ex. 1006] in view of Williams [Ex. 1005].

This Petition seeks review of the Challenged Claims on the grounds listed above as well as the additional grounds listed below:

Claims 5-7, 14, and 16-20 are obvious under 35 U.S.C. 103(a) over Adachi [Ex. 1004] in view of Tsumura [Ex. 1002].

Claims 19-20 are anticipated under 35 U.S.C. 102(b) by Driving Computer Graphics Animation From a Musical Score by Lytle ("Lytle") [Ex. 1003].

Section IV identifies where each element of the Challenged Claims is found in the prior art patents. 37 C.F.R. 42.104(b)(4). The exhibit numbers of the supporting evidence

relied upon to support the challenges are provided above, and the relevance of the evidence to the challenges raised is provided in Section IV. 37 C.F.R. 42.104(b)(5). Exhibits 1001-1014 are also submitted herewith.

2. Level of a Person Having Ordinary Skill in the Art

A person of ordinary skill in the field of audio-controlled virtual objects in 1993 would have a B.S. in electrical engineering, computer engineering, computer science, or a related engineering discipline and at least two years of experience in practical or postgraduate work in the area of computer-generated animations and/or graphics, or equivalent experience or education. The person would also have some knowledge of media processing and digital audio programming. Ex. 1007, Pope Decl., at 19-20.

3. Claim Construction Under 37 C.F.R. 42.104(b)(3)

The 129 Patent expired on July 14, 2013, and its claims are therefore not subject to amendment. For purposes of this Petition, the claims are construed pursuant to Phillips v. AWH Corp., 415 F.3d 1303, 1327 (Fed. Cir. 2005) (words of a claim are generally given their ordinary and customary meaning as understood by a person of ordinary skill in the art in question at the time of the invention). In IPR2014-00635, the Board provided constructions for the terms set forth below. For the purposes of this IPR only, Petitioners

use the terms below in accordance with the Board s constructions. 1 (a) Non-Means-Plus Function Terms i) Virtual Environment (Claims 1-9 and 12-21) For the purposes of this Petition, the term virtual environment means a computer-simulated environment (intended to be immersive) which includes a graphic display (from a user s first person perspective, in a form intended to be immersive to the user), and optionally also sounds which simulate environmental sounds. Ex. 1001, 129 Patent at 1:22-28; Ex. 1012, pp. 8-9. ii) Virtual Reality Computer System (Claims 1-9 and 12-21) For purposes of this Petition, the term virtual reality computer system means a computer system programmed with software, and including peripheral devices, for producing a virtual environment. Ex. 1001, 129 Patent at 1:22-33; Ex. 1012, p.9. (b) Means-Plus-Function Terms Claims 12-20 and 22-23 include recitations in means-plus-function form. Petitioners propose the following constructions in accordance with 35 U.S.C. 112, 6 (now 35 U.S.C. 112(f)): i) means for supplying a first signal selected from a group consisting of (Claim 12) 1 This should not be viewed as a concession by Petitioners as to the proper scope of any claim term in any litigation. Petitioners do not waive any argument in any litigation that claim terms in the 129 Patent are indefinite or otherwise invalid. 4

This limitation is written as a Markush group. As such, the limitation is in alternative form and the stated function is supplying a first signal where the first signal is: 1) a control signal having music and/or control information generated in response to a music signal ; 2) a prerecorded control track having music and/or control information corresponding to the music signal ; or 3) a control signal having music and/or control information generated in response to the prerecorded control track. Petitioners note that the second and third elements of the Markush group include recitations that lack antecedent basis (i.e., the music signal and the prerecorded control track ). Because challenges under 35 U.S.C. 112 are not available in IPR petitions, Petitioners contend that the only structure arguably suggested to perform the function is Acoustic Etch unit 3, which includes a general purpose processor (also referred to as an analyzer) and/or a music source. Ex. 1001 at Figs. 1, 2, & 4, 8:38-56, 9:56-10:2. The functions of the Acoustic Etch unit may be embodied in a physically distinct object (id. at Fig. 1, 7:65-8:2, Fig. 4, 9:56-58, 7:11-20), embodied in the VR processor (id. at 8:44-51), or incorporated as part of another device, e.g., the player of the input music signal or the prerecorded control tracks or the VR system (id. at 11:62-65). The music source may be, for example, a tape player (id. at 10:56-57, Fig. 6 (item 200)), CD player (id. at 8:65, 11:65-12:2), microphone (id. at 17:11), or any form of an audio source (id. at 20:26-28). Petitioners note that the specification does not make any meaningful difference between the alternatively claimed elements of the three Markush Group members, but apply the following constructions for purposes of this Petition. 5

Where the first signal is a control signal having [] control information generated in response to a music signal, the Acoustic Etch processor is programmed with software for implementing algorithms such as those related to simple filtering and analysis or of the same type used by well known graphic equalizers to process a music signal to produce control information and pass it on to a VR computer. Id. at 5:1-7, 11:28-39, 4:29-33, Fig. 4. Where the first signal is a prerecorded control track having [] control information corresponding to the music signal, the processor is programmed with software to implement the extraction of the control information from a control track. Id. at 11:1-4; also 8:55-57, Figs. 1, 2, and 4. For example, the processor may convert digital data into a serial data stream such as RS232 or MIDI data stream. Id. at 16:62-65, Fig. 6 (item 240), 13:65-14:9, 13:21-23. ii) means for receiving the first signal and influencing action within a virtual environment (Claim 12) The stated function is receiving the first signal and influencing action within a virtual environment. The disclosed structure for performing the function is a general purpose computer (i.e., VR Processor 7 and VR System 250). Ex. 1001 at Figs. 1, 2, 6, Fig. 16, 7:67-8:7, 8:18-21, 13:60-14:25, 17:13-26, 17:37-49. Where the first signal is a control signal having [] control information generated in response to a music signal, the 129 Patent provides minimal disclosure regarding the specific algorithm for influencing action within a virtual environment. At a minimum, the processor is programmed with software to populat[e] the virtual environment with 6

animated virtual objects which move in response to the music. Id. at 4:29-33; see also id. at 11:36-43 ( [T]he beat of the music is passed on to the VR system which can then perform operations such as displaying virtual hands clapping in time to the beat of the music.... As the music rises and falls in overall level, the VR processor could create and destroy virtual objects. ), 12:17-24, 7:67-8:7, 6:12-23. Where the first signal is a prerecorded control track having [] control information corresponding to the music signal, the processor is programmed to 1) read control track data; 2) read any digitized music information which corresponds to the control track data and/or the output of any input devices that are connected to the VR system such as instrument gloves, six-degree-of-freedom trackers, custom human input devices, mice, and the like; and 3) create, destroy, move or modify the virtual environment or virtual objects therein as described at 18:3-14. Id. at 17:13-18:56, Fig. 16. Where the first signal is a control signal having music alone or a prerecorded control track having music alone, there is no structure disclosed in the 129 Patent to perform the function of influencing action within a virtual environment. Since challenges under 35 U.S.C. 112 are not available in IPR petitions, Petitioners demonstrate that the claimed function is unpatentable below. iii) means for receiving said music signal in digital or analog form, and processing said music signal to produce control information for modification of objects in the virtual environment (Claim 13) The stated function is receiving said music signal in digital or analog form, and processing said music signal to produce control information for modification of objects in 7

the virtual environment. As written, it is unclear to which antecedent music signal the claim 13 said music signal refers. In particular, because claim 12, from which claim 13 directly depends, is written in Markush form, if the first signal is a control signal having music and/or control information generated in response to the prerecorded control track, the limitation of claim 15 lacks antecedent basis. Since challenges under 35 U.S.C. 112 are not available in IPR petitions, Petitioners identify the disclosed structure for performing the function as an A-to-D converter (if necessary) and a processor of the Acoustic Etch unit depicted in Figure 4, where the processor is programmed with software for implementing algorithms such as those related to simple filtering and analysis or of the same type used by well known graphic equalizers to process a music signal to produce control information. Id. at 5:1-7, 11:28-39, Fig. 4. iv) music playing means for supplying said music signal (Claim 15) The stated function is supplying said music signal. As discussed above, it is unclear to which antecedent music signal the claim 15 said music signal refers. Since challenges under 35 U.S.C. 112 are not available in IPR petitions, Petitioners identify the disclosed structure for performing the function as a music source. Id. at Figs. 1 and 2 (item 1). Music source may be a tape player (id. at 10:56-57, Fig. 6), CD player (id. at 8:65, 11:65-12:2), microphone (id. at 17:11), or any form of an audio source (id. at 20:26-28). v) means for prerecording a control track having music and/or control information corresponding to a music signal (Claim 16) means for prerecording a control track having audio and/or control information corresponding to an audio signal (Claim 22) 8

The stated function is prerecording a control track having music/audio and/or control information corresponding to a music/audio signal. The disclosed structure for performing the function is illustrated in Figure 5 and described at 13:11-59 and 15:17-16:41. In the preferred embodiment, a control track is prerecorded onto an audio magnetic tape. Id. at 13:45-49. However, the control track can be recorded on a video game cartridge (id. at 8:62-65), CD (id. at 11:65-12:2, 20:10-13), Digital Audio Tape (id. at 20:10-13), or other format (id. at 20:13-25). The 129 Patent discloses prerecording control tracks in either (or both) of two ways : 1) by automatically deriving control signals from an original recording, and 2) by allowing a human operator to create control signals via input switches and/or a computer data storage device. Id. at 15:17-24, 16:8-29. vi) means for producing the virtual environment in response to said prerecorded control track (Claim16) The stated function is producing the virtual environment in response to said prerecorded control track. The disclosed structure for performing the function is a general purpose computer (i.e., VR Processor 7 and VR System 250) programmed to 1) read control track data; and 2) create, destroy, move or modify the virtual environment or virtual objects therein based upon control track information, such as described at 18:15-56. See also id. at Figs. 1, 2, & 6, 8:44-51, 13:60-14:25, 17:13-49. vii) means for producing a graphic display of the virtual environment on the display device (claim 17) The stated function is producing a graphic display of the virtual environment on the display device. The disclosed structure is a general purpose computer programmed to 9

create, destroy, move or modify the virtual environment or virtual objects therein based upon control track information, such as described at 18:15-56. viii) means for supplying the music signal to the means for producing the virtual environment (claim 18) means for supplying the audio signal to the processor (claim 23) The stated function is supplying the audio/music signal to the processor/means for producing the virtual environment. The disclosed structure is a four-track tape 180T and four track tape playing unit 200 (or a digital recording medium and corresponding playing unit, or live microphone) and multichannel audio digitizer 245 (if receiving analog signals). Id. at Fig. 6, 16:56-58, 17:7-12, 20:10-25. ix) means for producing said virtual environment in response to both said music signal and said prerecorded control track (claim 18) The stated function is producing said virtual environment in response to both said music signal and said prerecorded control track. The disclosed structure for performing the function is a general purpose computer programmed to control, create, and/or manipulate the virtual environment or virtual objects therein in a manner choreographed with the original music signal. Id. at 17:42-49, 18:10-15. In one example, the VR program reads control track information and loads corresponding objects and displays the objects at fixed X and Y locations in synchronization with the music signal (i.e., lyrics and other song dependent data displayed at the same time a singer vocalizes the words in the music). Id. at 18:38-53; see also id. at 18:57-19:11. 10
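For orientation only, the short sketch below (which is not part of the 129 Patent, the cited references, or the record in this proceeding; all function and variable names are hypothetical) illustrates the kind of "graphic equalizer" style analysis referenced in the constructions above: the level of a single frequency band of a music signal is measured frame by frame, and a crude beat flag is emitted as control information of the sort a virtual reality processor could use to create, move, or modify virtual objects.

```python
# Illustrative sketch only; not drawn from the 129 Patent, the cited
# references, or the record. All names here are hypothetical. It measures
# the energy of one frequency band of a music signal per frame and emits a
# crude beat flag as a stream of control values.

import numpy as np


def band_level(frame: np.ndarray, sample_rate: int,
               low_hz: float = 60.0, high_hz: float = 120.0) -> float:
    """Energy of one frequency band (e.g., the bass band) of an audio frame."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return float(np.sum(spectrum[mask] ** 2))


def control_values(signal: np.ndarray, sample_rate: int,
                   frame_size: int = 1024, jump: float = 2.0):
    """Yield one control value per frame: 1 when the band energy jumps
    sharply relative to the previous frame (a crude beat), otherwise 0."""
    previous = None
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        level = band_level(signal[start:start + frame_size], sample_rate)
        yield 1 if previous is not None and level > jump * max(previous, 1e-9) else 0
        previous = level


if __name__ == "__main__":
    # Synthetic input: a steady 440 Hz tone with a short 80 Hz "thump" twice a second.
    rate = 8000
    t = np.arange(2 * rate) / rate
    music = 0.2 * np.sin(2 * np.pi * 440.0 * t)
    for k in range(0, len(music), rate // 2):
        music[k:k + 400] += np.sin(2 * np.pi * 80.0 * np.arange(400) / rate)
    # A rendering loop could, e.g., make virtual hands clap whenever the value is 1.
    print(list(control_values(music, rate)))
```

A flag stream of this kind parallels the first Markush alternative (control information generated in response to a music signal); under the prerecorded control track alternative, comparable values would instead be read back from storage (for example, as a MIDI or RS-232 data stream) rather than computed from the live signal.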

III. OVERVIEW OF THE 129 PATENT A. Description of the Alleged Invention of the 129 Patent The 129 Patent describes controlling a computer system in response to music signals or in response to prerecorded control tracks corresponding audio signals. Ex. 1001, 129 Patent at 1:8-11. The 129 Patent discloses deriving control signals from music by, for example, employing simple algorithms (i.e., spectral analysis) to extract a rhythm or beat signal indicative from the level of a particular frequency band. Id. at 11:28-39. The 129 Patent discloses prerecording control tracks in either (or both) of two ways : 1) by automatically deriving control signals from an original recording, and 2) by allowing a human operator to create control signals via input switches and/or a computer data storage device. Id. at 15:17-24, 16:8-29. Examples of the virtual environments created by the disclosed virtual reality computer system include: displaying virtual hands clapping in time to the beat of the music (id. at 11:36-41); show[ing] images of the performers texture mapped onto dancing characters which dance in time to the music (id. at 11:56-62); display[ing] (virtual) stick figure danc[ing] in time (id. at 12:18-24); and words [are] displayed at the same time a singer (represented by a control track corresponding to the music signal) vocalizes the words in the music (id. at 18:45-53). B. Summary of the Prosecution History of the 129 Patent The 129 Patent was filed as U.S. Ser. No. 08/091,650 ( the 650 Application ) on July 14, 1993 with 23 initial claims. Ex. 1009, 129 File History at As-Filed Application. On June 29, 1994 the Examiner rejected all claims under 35 U.S.C. 103 as being unpatentable 11

over Applicants admission of the invention and the prior art. Id. at June 29, 1994 Office Action, p. 2. The Examiner identified the difference from the prior art to be that the prior art is object driven and the claimed invention is music driven. Id., p. 3. The Examiner rejected the claims on the basis that the reversal of a known process is not patentable. Id., pp. 3-4. In response, Applicants argued that the prerecorded control track of claims 5-11 and 16-20 was not taught in the prior art. Id. at Oct. 31, 1994 Office Action Response, pp. 1-2. With respect to the remaining claims, Applicant argued the reversal of the prior art methods would not result in the claimed invention. Id., pp. 3-4. On February 24, 1995, the examiner rejected all claims under 35 U.S.C. 103, as being unpatentable over Applicants admission in view of U.S. Patent No. 3,609,019 to Tuber. Following an interview between the Examiner and inventor Mark Bolas on November 15, 1995, Applicants amended each of the independent claims to require the claimed control signal or control track to have music and/or control information or audio and/or control information. Id. at November 16, 1995 Office Action Response, pp. 1-3. A Notice of Allowance issued on November 28, 1995. The patent issued on April 30, 1996. The 129 Patent expired on July 14, 2013. IV. THERE IS A REASONABLE LIKELIHOOD THAT CLAIMS 1-23 ARE UNPATENTABLE Musically-controlled virtual objects were prevalent before July 14, 1993. The following prior art references disclose each limitation of the Challenged Claims. As such, 12

the Challenged Claims are unpatentable. Included in the claim charts below are exemplary citations to the prior art references. A. Tsumura Anticipates Claims 10 and 11 Under 35 U.S.C. 102(a) Claims 10 and 11 recite methods for controlling a computer system having the very basic steps of prerecording a control track and operating the computer system in response to said prerecorded control track. As one example of a computer system operating from a prerecorded control track, the 129 Patent describes a karaoke-type VR program where the virtual environment includes the display of song lyric word data (i.e., virtual objects) at the same time a singer vocalizes the words in the music. Ex. 1001 at 18:41-56, 16:20-21. Tsumura was not cited or considered during original prosecution and discloses a karaoke device for displaying lyrics and vocal features during the production of music for vocal accompaniment. Ex. 1002, Tsumura at 1:6-8, 1:27-47. Tsumura discloses processing a user s vocal performance of a song to detect the user s vocal pitch using frequency analysis and to generate a signal indicative of the basic frequency (i.e., control signals). Id. at 12:35-49, 8:30-61. An image generator means synchronizes and compares stored pitch data with the detected frequency to generate control messages displayed to the user (e.g., "lower your pitched", "as you are" or "raise your pitch"). Id. at 12:7-13:35. Claim 10 Anticipated By Tsumura (Ex. 1002) 10. A method Tsumura discloses production of a computer-simulated karaoke for controlling environment by a computer system programmed with software, and a computer including peripheral devices such as a microphone (i.e., virtual reality system, including the steps of: computer system). Namely, Tsumura discloses a computer system that extracts and displays lyric data at the same time a singer vocalizes the words. Tsumura further discloses an interactive karaoke system where the 13

user s vocal pitch or strength is compared to stored vocal data and the results of the comparison are displayed to the user (e.g., "lower your pitched," "as you are" or "raise your pitch"), indicating that the system is interactive and intended to be immersive consistent with the descriptions in the 129 Patent. The invention also enables the detection of the strength and basic frequency of an actual vocal presentation which can then be compared with the vocal data and the results of the comparison displayed on the visual display medium. The user is in this way able to gauge the perfection of his own vocal rendition in terms of, for example, its strength and pitch. Appropriate indications are also output in accordance with the results of the comparison made between the vocal data and the strength and basic frequency of the actual rendition. Ex. 1002, Tsumura at 1:48-58; see also id. at Abstract. In said comparator 641, the pitch data and the basic frequency at the current lyric position are synchronized in accordance with the current lyric position indicator as described above and then compared. It is then determined whether or not the basic frequency is either "over pitched", in which case the basic frequency stands at a higher pitch than that prescribed by the pitch data, or is at the "correct pitch", in which case the basic frequency lies within the tolerance limits prescribed by the pitch data or is "under pitched", in which case the basic frequency stands at a lower pitch than that prescribed by the pitch data.... The message selector 642 selects an appropriate message in accordance with whether the basic frequency is found to be either "over pitched", at the "correct pitch" or "under pitched" and the display device 643 then outputs an appropriate display signal in accordance with the message received. On receipt of the display signal, the visual display medium 650 displays the appropriate message on screen. The message which corresponds to "over pitched" is "lower your pitch", the message which corresponds to a "correct pitch" is "as you are" and the message which corresponds to "under pitched" is "raise your pitch". Id. at 12:53-13:10; see also id. at 10:52-11:23. 14

(a) prerecording a control track having audio and/or control information corresponding to an audio signal; and Tsumura discloses storing (i.e., prerecording) music together with and corresponding to control information such as information relating to the vocal features of the music, screen display indicators, and lyric display position indicators. In FIG. 2 110 is a memory means in which music data for a large number of different pieces of music is stored. Each item of music data also contains vocal data relating to the vocal features of the music. As shown in FIG. 3, the data is divided in conceptual terms into a number of blocks 1, 2, 3--in the ratio of one block to one bar and the blocks are arranged in order in accordance with the forward development of the tune. The vocal data blocks are each almost exactly one block in advance of their corresponding music data blocks. Said vocal data also incorporates strength data which is used to indicate the appropriate strength of the vocal presentation. A screen display indicator is inserted at the end of each block as shown by the long arrows in FIG. 3 to indicate that the screen display should be updated at these points. Current lyric display position indicators are similarly inserted as required at the points marked by the short arrows in FIG. 3 to show that these are the appropriate points at which to indicate the lyric display position. In practice, of course, each screen display indicator is, in fact, set at a specific time interval t in advance of the boundary of each block of music data. As a result each current lyric position indicator is also set at the same specific time interval t in advance of its real position. Id. at 2:40-65. 15

(b) operating the computer system in response to said prerecorded control track. Tsumura discloses a vocal data reading means and lyric position reader (i.e., the disclosed structure) programmed to implement extraction of the vocal data and current lyric display position from memory. Id. at Figs. 14 and 15 (items 620 and 630), 12:7-19, 13:11-18. Tsumura also discloses an image generating means that synchronizes and compares stored pitch data with a detected frequency of the user's vocal performance to generate a computer-simulated environment and control messages displayed to the user (e.g., "lower your pitch", "as you are" or "raise your pitch").

Claim 11 Anticipated By Tsumura (Ex. 1002)

11. The method of claim 10, also including the steps of: (c) supplying the audio signal to the computer system; and Tsumura also discloses that step (b) includes the step of supplying the music to a music reproduction means of the disclosed system. Id. at 12:3-6, Fig. 15 (item 660).

(d) operating the computer system in response to both the audio signal and the prerecorded control track. Tsumura discloses an image generating means that synchronizes and compares prerecorded pitch data with a detected frequency of the user's vocal performance to generate a computer-simulated environment and control messages displayed to the user (e.g., "lower your pitch", "as you are" or "raise your pitch"). See Tsumura as applied to element (b) of Claim 1. Tsumura also discloses reproducing the music in synchronization with the displayed lyrics and vocal instructions. Tsumura specifically discloses use of a delay circuit to compensate for system lag. Id. at 8:24-50, 3:22-24, 3:61-64.

B. Lytle Anticipates Claims 5-7, 9-12, 16-20, and 22-23 Under 35 U.S.C. 102(b)

Lytle was not cited or considered during original prosecution and describes the production of a computer graphics animation in a three-dimensional computer-simulated environment called More Bells and Whistles using a method for algorithmically controlling

computer graphics animation from a musical score. Ex. 1003, Lytle at 644, 649, Fig. 200. The system used to produce the animation, Computer Graphics/Electronic Music System ( CGEMS ), provides music-to-graphics mapping tools to correlate the computer graphics animation exactly to a synthesized musical soundtrack. Id. at 645, 649, 666. Lytle discloses that an original music composition and corresponding MIDI input file was used to map musical parameters to instrument motions, such that MIDI input parameters (i.e., note-on, note-off, velocity) could manipulate graphical object parameters (i.e., scale, translation, rotation, reflectance, lighting). Id. at 649, 651-652. Claim 5 Anticipated By Lytle (Ex. 1003) 5. A method for controlling production of a virtual environment by a virtual reality computer system, including the steps of: Lytle discloses a workstation computer and supercomputer programmed with software for creating and controlling computer graphics animation in a 3D computer-simulated environment. Lytle also discloses that, in an ideal production environment, the music, mapping, and graphics applications would run concurrently on a single machine. A method for algorithmically controlling computer graphics animation from a musical score is presented, and its implementation is described. This method has been used in the creation of video work consisting entirely of visually simulated musical instruments synchronized to their synthesized soundtrack counterparts.... [This technique] can be seen as either a dynamic visual modeling of existing music, a solution to the problem of precisely controlling graphical objects from musical data.... The production of a computer graphics animation utilizing this technique is described. In this example, the music is an original composition by the author produced on a bank of synthesizers driven by a personal computer running music sequencing software, communicating through Musical Instrument Digital Interface (MIDI). The MIDI file, which describes the musical score, was used to directly drive the synthesizers and to control the computer graphics instruments. Ex. 1003, Lytle at 644; see also 665. Geometric modeling of instrument components, motion testing, environment design, and adjustments concerning lighting and camera 17

angles take place in an interactive graphics environment on a workstation during the setup phase.... The mapping of musical parameters to instrument motions is the vital component, correlating the sound being heard with the visually perceived instrument.... All data from this preparatory phase, along with the MIDI data file, is then transferred to the supercomputer. The rendering phase consists of applying the specific mapping algorithms to the instrument models at appropriate times as derived from the musical score, building complete instrument objects, and generating individual frames.... In the editing phase, the musical piece from the first videotape is transferred to the audio portion of the master videotape, and the corresponding parts of graphics animation are layered on top of the video portion completing the animation. Id. at 649-650; also id. at 656. An ideal music-graphics production environment (if hardware limitations were not an issue) would provide real-time graphics animation previewing synchronized with the music. The music, mapping, and graphics applications would all run concurrently on a single machine, allowing simultaneous editing of musical, graphical, and mapping parameters. In practice, current technology graphics hardware is a bottleneck; most workstations do not have sufficient speed to provide this capability except for very simple instrumental models. Therefore, instrument element motion is first designed and previewed at the speed allowed by the available graphics hardware, then animation segments are rendered on high-performance compute engines and are transferred to videotape where the corresponding segment of music is overlaid. Id. at 655-656. 18

Id. at Fig. 199; also id. at 645, 647 (discussing modeling graphical objects in their three-dimensional environment), 648, 655-656. Lytle illustrates that the computer-simulated environment is rendered from a perspective similar to what a person s avatar would see through their own eyes, if they were playing the animated instrument objects (i.e., first person perspective). (a) prerecording a control track having audio and/or control information corresponding to an audio signal; and Id. at Cover; see also id. at Figs. 215, 216. Lytle also discloses that scientific researchers found the animation produced using the disclosed system, More Bells and Whistles, to be inspirational and that [c]hildren are mesmerized by it and request repeated showings indicating that the computer-simulated environment is immersive. Id. at 667. Lytle discloses that a computer running music sequencing software produces a MIDI file (i.e., prerecording a control track) that corresponds to a recorded music signal. In this example, the music is an original composition by the author, produced on a bank of synthesizers driven by a personal computer running music sequencing software, communicating through Musical Instrument Digital Interface (MIDI). The MIDI file, which describes the musical score, was used to directly drive the synthesizers and to control the computer graphics instruments. Id. at 644. Sequencers store encoded musical scores as MIDI data files, which contain timing information relating to each musical aspect. Id. at 646; also id. at 650 ( The music sequencer used for all testing and production with this system was Texture 3.0 from Magnetic Music, Inc. ). All data from this preparatory phase, along with the MIDI data file, is 19

(b) operating the virtual reality computer system in response to said prerecorded control track to generate said virtual environment. then transferred to the supercomputer. The rendering phase consists of applying the specific mapping algorithms to the instrument models at appropriate times as derived from the musical score, building complete instrument objects, and generating individual frames. Id. at 649. In this context, the musical application is defined as a producer of musical data. Typically, this will be a music sequencer representing musical data in the MIDI format, but it could also be any program which encodes music at the level of individual notes and performance nuances. Id. at 648; see generally 651-652. Lytle discloses a workstation computer and supercomputer programmed with software for creating and controlling computer graphics animation in a computer-simulated environment by mapping musical MIDI data from the MIDI input file to 3D instrument objects. The setup phase before rendering involves several aspects for a given piece of music. For each musical part, a corresponding graphics instrument must be modeled and mappings designed and implemented. The output of each mapper is then set to be routed to the variables in the model that are dependent on the music. Once instrument objects have been placed in the environment and other details have been attended to, preparations for the rendering phase is complete. CGEMS was implemented to perform the music-to-graphics mapping operations, receiving a MIDI file as input, and producing as output a series of parameter files which are passed to the graphics application. It runs at the workstation level for the setup phase (mapping design and correlation with graphics objects) and at the supercomputer level for the rendering phase (composite object generation before imaging).... Applying these output parameters [(period, amplitude, an damping)] to the X-rotations of a set of bells achieves a swaying motion an appropriate effect of being hit by a mallet (the mallet motion would be handled by a separate mapping module). The note number (NOTE) and MIDI velocity (VEL) database parameters are mapped into the desired ranges for input into the mapping function by applying the interpolate operator (shown in Figure 207 on page 677 with some sample ranges). Period is inversely proportional to note number and, therefore, is proportional to bell size, while amplitude is proportional to MIDI velocity. The note number also contributes to the amplitude, because higher, smaller bells will swing farther than a larger bell receiving the same impact. This module was applied to drive the emotion of the three bells shown in Figure 198 on page 670. Id. at 656-20

657; also id. at 644, 649-650 (reproduced at the preamble of Claim 5). For a given frame, CGEMS reads the MIDI file, performs mapping operations, and produces graphics parameter files. These files are then directed to object generator programs that construct completed instrument objects by instancing template objects representing instrument elements according to a set of instructions describing parameter application. Finally, these objects, along with the remaining static objects, reflectance information, lighting, and other environment description data are passed to the image generator that produces the given frame. For all testing and final production, this phase took place on a pair of IBM ES/3090-600Js and an IBM RISC System/6000 Model 530 workstation. The two 3090s were running MP AIX/370 and were combined to form a single logical system using IBM AIX/TCF. Id. at 664; also id. at 666-667. Id. at Fig. 202; also id. at Cover and Fig. 199, Figs. 215, 216, 665-666 (discussing implementation of percussion element animation). Claim 6 Anticipated By Lytle (Ex. 1003) 6. The method of claim 5, wherein step (b) includes the Lytle discloses that step (b) includes producing a graphic display of the virtual environment on a display device. Namely, Lytle discloses a supercomputer that displays graphic images and animations and therefore include a monitor. Lytle also discloses step (c) of supplying a recording of 21

step of producing a graphic display of the virtual environment on a display device, and also including the steps of: (c) supplying the audio signal to the virtual reality computer system; and (d) operating the virtual reality computer system in response to both said audio signal and said the music signal to a video control console for editing. The goal of the system is a videotape consisting of a computer graphics animation correlating exactly to a synthesized musical soundtrack. The audio and visual components are produced separately and integrated together in the editing phase..., Music is composed and arranged using a sequencer running on a personal computer.... When notes and levels are all satisfactory, a realization of the musical composition is recorded onto the audio portion of a videotape.... Geometric modeling of instrument components, motion testing, environment design, and adjustments concerning lighting and camera angles take place in an interactive graphics environment on a workstation during the setup phase.... The rendering phase consists of applying the specified mapping algorithms to the instrument models at appropriate times as derived from the musical score, building complete instrument objects, and generating individual frames.... Each frame is individually transferred to the frame buffer, where it is converted to NTSC video and stored on a digital disk recorder. From there, sequences of frames are played back at full animation rates, and the video portion is recorded onto a second video tape. In the editing phase, the musical piece from the first videotape is transferred to the audio portion of the master videotape, and the corresponding parts of graphics animation are layered on top of the video portion completing the animation. Id. at 649-650. Lytle discloses that, in an ideal production environment, the music, mapping, and graphics applications would run concurrently on a single machine to provide real-time graphics animation previewing synchronized with the music. See id. at 655-656; also id. at 644-645, 647, 664, Fig. 199, 200. Lytle discloses that step (b) also includes providing the MIDI file to a supercomputer programmed with software to automatically animate 3D graphical instrument objects based on the MIDI data. See Lytle as applied to element (b) of Claim 5. Lytle also discloses that during the editing phase, the video portion of the animation and the corresponding music is layered to complete the computer-simulated environment. In the editing phase, the musical piece from the first videotape is transferred to the audio portion of the master videotape, and the 22

prerecorded control track to generate said virtual environment. corresponding parts of graphics animation are layered on top of the video portion completing the animation. Id. at 650; also Fig. 199. Lytle discloses that, in an ideal production environment, the music, mapping, and graphics applications would run concurrently on a single machine to provide real-time graphics animation previewing synchronized with the music. See id. at 655-656. Claim 7 Anticipated By Lytle (Ex. 1003) 7. The method of claim 6, wherein step (c) includes the step of supplying the audio signal to the virtual reality computer system with a first delay relative to the prerecorded control track, wherein the first delay is selected to enable generation of sounds in response to the audio signal in a manner so that the sounds have a desired time relationship to the graphic display. Lytle discloses that step (c) includes supplying the recording of the music signal to a video control console with a delay relative to the MIDI file being provided to the supercomputer for rendering. Lytle also discloses that the audio and video portions of the animation are recorded separately and then transferred to a single videotape (via the video control console). As such, the delay enables generation of sounds in response to the music signal such that the sounds are synchronized with the animation. Id. at 649-650, 655-656 (reproduced above), Fig. 199. Claim 9 Anticipated By Lytle (Ex. 1003) 9. The method of claim 5, wherein step (a) includes the step of manually operating an input device to generate the control track. Lytle discloses that the music is an original composition produced on a bank of synthesizers driven by a computer running music sequencing software. During the composition phase, music is manually edited using the sequencing software to finalize the composition and generate the final musical score and associated MIDI data. In this example, the music is an original composition by the author, produced on a bank of synthesizers driven by a personal computer running music sequencing software, communicating through Musical Instrument Digital Interface (MIDI). The MIDI file, which describes the musical score, was used to directly drive the synthesizers and to control the computer graphics instruments. Id. at 644; also id. at 649. Programs for personal computers called sequencers receive and transmit MIDI and allow editing of musical data. A sequencer is analogous to a word processor, pertaining to notes instead of characters. Functionality is provided to insert, delete, or cut-and-paste notes and phrases. Many more features, such as transposition and harmonization, are provided to allow a composer to sculpt a piece of music in fine detail. Id. at 646. Claim 10 Anticipated By Lytle (Ex. 1003) 10. A method for controlling a computer Lytle as applied to the preamble of Claim 23

system, including the steps of: 5. (a) prerecording a control track having audio Lytle as applied to element (a) of Claim 5. and/or control information corresponding to an audio signal; and (b) operating the computer system in response Lytle as applied to element (b) of Claim 5. to said prerecorded control track. Claim 11 Anticipated By Lytle (Ex. 1003) 11. The method of claim 10, also including the Lytle as applied to element (c) of Claim 6. steps of: (c) supplying the audio signal to the computer system; and (d) operating the computer system in response Lytle as applied to element (d) of Claim to both the audio signal and the prerecorded 6. control track. Claim 12 Anticipated By Lytle (Ex. 1003) 12. A virtual reality computer system, Lytle as applied to the preamble of Claim 5. including: [12(a)] means for supplying a first signal selected from a group consisting of a control signal having music and/or control information generated in response to a music signal, a prerecorded control track having music and/or control information corresponding to the music signal, and a control signal having music and/or control information generated in response to the prerecorded control track; and [12(b)] means for receiving the first signal and influencing action within a virtual environment in response to said first signal. As applied to element (a) of Claim 5, Lytle discloses the function of supplying a first signal, that is a prerecorded control track having music and/or control information corresponding to the music signal. Namely, Lytle discloses supplying a MIDI file to a supercomputer. The disclosed structure is a personal computer programmed with music sequencing software. Id. at 644, 646, 648, 649, 650 (reproduced for element (a) of Claim 5), also Fig. 199. Lytle discloses is a supercomputer programmed to read the MIDI file and create, move, and modify the 3D virtual instrument objects in the virtual environment. See Lytle as applied to element (b) of Claim 5). Claim 16 Anticipated By Lytle (Ex. 1003) 16. A virtual reality computer system for producing a virtual environment, including: [16(a)] means for prerecording a control track having music and/or control Lytle as applied to the preamble of Claim 5. Lytle discloses that a computer running music sequencing software produces a MIDI file that is transferable to a supercomputer. The disclosed structure is a personal computer programmed with music sequencing software and associated 24

information corresponding to a music signal; and [16(b)] means for producing the virtual environment in response to said prerecorded control track. synthesizers. Id. at 644, 646, 648, 649, 650 (reproduced above for element (a) of Claim 5), also Fig. 199. As applied to element (b) of Claim 5, Lytle discloses the claimed function. The disclosed structure is a workstation computer and supercomputer programmed to read the MIDI file and create, move, and modify the virtual instrument objects in the virtual environment. See Lytle as applied to element (b) of Claim 5. Claim 17 Anticipated By Lytle (Ex. 1003) 17. The system of claim 16, wherein the means for producing the virtual environment includes: [17(a)] a display device; and [17(b)] a means for producing a graphic display of the virtual environment on the display device. Lytle discloses that the supercomputer displays graphic images and animations and therefore includes a monitor. Lytle also discloses that, in an ideal production environment, the music, mapping, and graphics applications would run concurrently on a single machine to provide real-time graphics animation previewing synchronized with the music. Id. at 649-650, 655-656 (reproduced above for Claim 6), Fig. 199. Lytle discloses the function of producing a graphic display of the virtual environment on the display device and the structure is a supercomputer. Id. at 649-650, 655-656 (reproduced above with respect to Claim 6), Fig. 199. Claim 18 Anticipated By Lytle (Ex. 1003) 18. The system of claim 16, also including: [(a)] means for supplying the music signal to the means for producing the virtual environment, and [(b)] wherein the means for producing the virtual environment includes means for producing said virtual environment in response to both said music signal and said prerecorded control track. Lytle discloses supplying a recording of the music signal to a video control console for editing. The disclosed structure is a personal computer programmed with sequencing software and recording unit for recording composed music onto the audio portion of a videotape. Id. at 649-650, 655-656 (reproduced above with respect to Claim 6), Fig. 199. Lytle discloses producing the virtual environment in response to both the music signal and the MIDI file. Lytle discloses that during the editing phase, the video portion of the animation and the corresponding music are layered to complete the computersimulated environment. The disclosed structure is a supercomputer programmed to control, create, and manipulate 3D virtual instrument objects in a manner choreographed with the original music signal and a video editing console. Id. at 649-650, 655-656 (reproduced above for Claim 6), Fig. 199. Claim 19 Anticipated By Lytle (Ex. 1003) Lytle discloses a control track containing additional information to that which can be extracted from the music signal. For example, Lytle Apparatus as in claim 16, wherein 25

Lytle discloses a control track containing additional information to that which can be extracted from the music signal. For example, Lytle discloses that MIDI includes special purpose events and instrument control mechanisms, which would not be extracted from the music signal. In addition, each event in a MIDI file carries a channel number specifying the instrument to which it is directed and has an associated time stamp representing the elapsed time since the beginning of the piece, which would also not be extracted from the music signal. See id. at 651.

Claim 20 Anticipated By Lytle (Ex. 1003)
The system of claim 16, wherein said control track is time shifted relative to the music signal to compensate for delays in said virtual reality computer system.
Lytle discloses that the control track is time shifted relative to the music signal to compensate for delays in said virtual reality computer system. For example, Lytle discloses supplying the recording of the music signal to a video control console at a different point in time relative to the MIDI file being provided to the supercomputer for rendering. Lytle also discloses that the audio and video portions of the animation are recorded separately and then transferred to a single videotape (via the video control console). As such, the time shifting of the MIDI relative to the audio signal enables generation of sounds in response to the music signal such that the sounds are synchronized with the animation, and compensates for the delays in the virtual reality computer system, as "[t]he complete process of generating each frame [can take] several minutes." Id. at 649-650, 655-656 (reproduced above for Claim 7), Fig. 199.
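To make the time-shifting concept concrete, the following minimal Python sketch simply offsets every event in a control track by a fixed amount so that the rendered animation can line up with the music signal despite processing delays. The event format and the sign convention of the shift are illustrative assumptions and are not taken from Lytle or the '129 Patent.

def shift_control_track(events, shift_seconds):
    """Return a copy of the control track with every event time shifted,
    e.g. advanced so the rendered animation lines up with the music signal."""
    return [{**ev, "time": ev["time"] + shift_seconds} for ev in events]

# Example: advance the control track 0.25 s ahead of the audio.
track = [{"time": 1.00, "note": 60}, {"time": 1.50, "note": 62}]
print(shift_control_track(track, -0.25))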

Claim 22 Anticipated By Lytle (Ex. 1003)
22. A computer system, including:
Lytle as applied to the preamble of Claim 5.
[(a)] means for prerecording a control track having audio and/or control information corresponding to an audio signal; and
As applied to element (a) of Claim 5, Lytle discloses the claimed function. The disclosed structure is a personal computer programmed with music sequencing software and associated synthesizers. Id. at 644, 646, 648, 649, 650 (reproduced above with respect to element (a) of Claim 5), also Fig. 199.
[(b)] a processor which receives the control track and which is programmed with software for operating the computer system in response to said control track.
Lytle discloses that a supercomputer receives the MIDI input file and is programmed with software to control computer graphics animation in a computer-simulated environment using mappings of MIDI data to instrument objects. To determine the current state of each graphical instrument in the environment, the MIDI file is read, and mapping algorithms are applied to transform musical information into corresponding graphical parameters.

The function of the technique described in this paper can be summarized as a mapping of musical data to graphical data.

Id. at 648.

Geometric modeling of instrument components, motion testing, environment design, and adjustments concerning lighting and camera angles take place in an interactive graphics environment on a workstation during the setup phase.... The mapping of musical parameters to instrument motions is the vital component, correlating the sound being heard with the visually perceived instrument. All data from this preparatory phase, along with the MIDI data file, is then transferred to the supercomputer. The rendering phase consists of applying the specific mapping algorithms to the instrument models at appropriate times as derived from the musical score, building complete instrument objects, and generating individual frames. The complete process of generating each frame takes several minutes on one processor of an IBM 3090™.

Id. at 649; see also Lytle as applied to element (c) of Claim 5.

Claim 23 Anticipated By Lytle (Ex. 1003)
23. The system of claim 22, also including:
[(a)] means for supplying the audio signal to the processor, and
Lytle discloses supplying a recording of the music signal to a video control console for editing. The disclosed structure is a personal computer programmed with sequencing software and a recording unit for recording composed music onto the audio portion of a videotape. Id. at 649-650, 655-656 (reproduced above for Claim 6), Fig. 199.
[(b)] wherein the processor is programmed with software for operating the computer system in response to both the audio signal and the control track.
Lytle discloses that the MIDI file is provided to a supercomputer programmed with software to automatically animate 3D graphical instrument objects. See element (b) of Claim 5. Lytle discloses that during the editing phase, the video portion of the animation and the corresponding music are layered to complete the computer-simulated environment. See element (b) of Claim 6.
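For illustration only, the note-to-graphics mapping Lytle describes in the passages quoted above (reading MIDI data that records which notes were played on which instrument, when, how loudly, and for how long, and transforming it into graphical parameters) might be sketched in Python as follows. The event fields, the particular mapping rules, and the function names are illustrative assumptions, not details disclosed by Lytle.

from dataclasses import dataclass

# Hypothetical note-event structure: channel, pitch, velocity, onset time,
# and duration are the kinds of data Lytle says the mapper receives.
@dataclass
class NoteEvent:
    channel: int      # instrument the event is directed to
    pitch: int        # MIDI note number
    velocity: int     # how loudly the note was played (0-127)
    onset: float      # seconds from the start of the piece
    duration: float   # how long the sound lasts, in seconds

def instrument_parameters(events, channel, frame_time):
    """Map the musical data for one instrument to graphical parameters
    for a single animation frame (an assumed, simplified mapping)."""
    params = {"strike": 0.0, "height": 0.0}
    for ev in (e for e in events if e.channel == channel):
        if ev.onset <= frame_time <= ev.onset + ev.duration:
            progress = (frame_time - ev.onset) / ev.duration
            # Louder notes produce larger motions; the motion decays over the note.
            params["strike"] = max(params["strike"],
                                   (ev.velocity / 127.0) * (1.0 - progress))
            # Higher pitches raise the animated component.
            params["height"] = max(params["height"], ev.pitch / 127.0)
    return params

# Example: render parameters for instrument channel 10 at t = 1.0 s.
score = [NoteEvent(10, 38, 100, 0.9, 0.3), NoteEvent(10, 42, 60, 1.05, 0.2)]
print(instrument_parameters(score, channel=10, frame_time=1.0))

In a system of the kind Lytle describes, such a mapping would be evaluated for each instrument at every frame time derived from the musical score before the frame is rendered.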

C. Adachi Anticipates Claims 1, 12, 13, 15, and 21 Under 35 U.S.C. § 102(b)

Adachi was not cited or considered during original prosecution and discloses a tone visualizing apparatus that controls a displayed image in response to a tone. Ex. 1004, Adachi at 1:6-10. Adachi discloses processing a musical tone signal to detect characteristics of the signal, such as amplitude/level or spectrum signal components, and controlling the perceived distance between an object and its background on a three-dimensional image display unit such as a stereoscopic television. Id. at 5:2-6:2.

Claim 1 Anticipated By Adachi (Ex. 1004)
1. A method for controlling production of a virtual environment by a virtual reality computer system, including the steps of:
Adachi discloses a tone visualizing apparatus that includes a central processing unit programmed with software for controlling the perceived distance between an object and its background on a three-dimensional image display unit such as a stereoscopic television based on the level or amplitude of an inputted musical tone signal.

In the apparatus shown in FIG. 1, when the audio signal representative of the musical tone is inputted into the envelope detecting circuit 5, this audio signal is effected by AM detection and integration so that an envelope signal corresponding to scale (i.e., level or amplitude) of this inputted audio signal is generated. This envelope signal is converted into digital signal by the A/D converter 7, and the digital signal is supplied to the CPU 1 via the bus line 13. The CPU 1 transmits such digitized envelope signal to the display control circuit 9 via the bus line 13 at each timing of the predetermined time constant. The display control circuit 9 reads image information from the image memory 3, and then the display control circuit 9 displays an image including a predetermined object and its background on the display screen of the display unit 11 based on the read image information. As the predetermined object, it is possible to display an image of bicycle, train, drum and fife band or the like, for example. In the case where the display unit 11 is a three-dimensional image display unit such as a stereoscopic television, the image of object is displayed within the background image three-dimensionally. Then, as described before, the display unit 9 controls the display unit 11 to magnify or reduce the scale of displayed background image in response to the amplitude of envelope indicated by the envelope information which is supplied thereto via the bus line 13, so that mutual relation between the object and background will be changed and therefore sense of distance of the object will be controlled. In the case where the display unit 11 is the three-dimensional image display unit, the sense of distance between the object and/or the background image is controlled or the mutual relation between the object and the background is changed based on three-dimensional image control.

Further, it is possible to adopt another method for controlling the sense of distance of object by varying position relation of the object within the background. For example, the display unit 11 displays an image in which a singer sings a song in front of the band at first. Then, it is possible to express the sense of distance of the singer from audience side by moving position of the singer to the center position or backward position.

Ex. 1004, Adachi at 5:22-64; also id. at Abstract.

4. A tone visualizing apparatus according to claim 1 wherein said image display means is a three-dimensional image display unit capable of adjusting the sense of distance of said object three-dimensionally.

Id. at 16:33-36; also id. at 1:60-2:11, Fig. 1.

Adachi specifically discloses displaying objects on a three-dimensional image display unit such as a stereoscopic television, indicating that the displayed computer-simulated environment is intended to be immersive. Adachi also discloses an exemplary embodiment where a user observes a singer singing in front of a band and the singer changes position with respect to the audience based on detected musical tone parameters. As the user is part of the audience, the user observes the performance from a first person perspective.

(a) processing music signals to generate control signals having music and/or control information; and
Adachi discloses processing a musical tone signal to detect characteristics of the signal such as amplitude/level or spectrum signal components and create an envelope signal (i.e., control signal).

The apparatus shown in FIG. 1 comprises a central processing unit (CPU) 1 constituted by a microprocessor or the like; an image memory 3; an envelope detecting circuit 5 which inputs the musical tone signal and other audio signals; an analog-to-digital (A/D) converter 7; a display control circuit 9; a display unit 11 such as the CRT display unit; and a bus line 13.... In addition, the envelope detecting circuit 5 can be constituted by use of an AM (Amplitude Modulation) detector and an integration circuit which integrates output of this AM detector by a predetermined time constant, for example. In the apparatus shown in FIG. 1, when the audio signal representative of the musical tone is inputted into the envelope detecting circuit 5, this audio signal is effected by AM detection and integration so that an envelope signal corresponding to scale (i.e., level or amplitude) of this inputted audio signal is generated. This envelope signal is converted into digital signal by the A/D converter 7, and the digital signal is supplied to the CPU 1 via the bus line 13.

Id. at 5:6-30.

Incidentally, the first embodiment detects the envelope and then performs image control in response to variation of the detected envelope. However, instead of the envelope, it is possible to detect other musical tone parameters such as tone color, tone volume and frequency in the first embodiment.

Id. at 5:65-6:2.

The FFT circuit 205 receives and then executes the fast Fourier transform on the inputted audio signal such as the musical tone signal so that the audio signal will be divided into spectrum signals. The CPU 201 extracts a signal of fundamental wave component from the spectrum signals outputted from the FFT circuit 205, and then the frequency of this extracted signal is denoted as fB, which is stored in the CPU 201 (in a step S11).

Id. at 8:65-9:5; also id. at Abstract, 9:5-20, 9:47-6:13, 15:24-32, Figs. 5, 6A, 6B.

(b) operating the virtual reality computer system in response to the control signals to generate said virtual environment.
Adachi discloses operating the tone visualizing apparatus (i.e., virtual reality computer system) in response to the detected signal parameters (i.e., control signals) to generate the three-dimensional computer-simulated environment and control the perceived distance between an object and its background image. Id. at 5:2-64, 16:15-36.
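As an illustration of the signal path described in the passages quoted above (AM detection and integration to form an envelope, followed by display control that magnifies or reduces the background in response to the envelope amplitude), a rough software analogue is sketched below in Python. The smoothing constant, scale range, and function names are illustrative assumptions, not details taken from Adachi.

def envelope(samples, alpha=0.05):
    """Crude AM-detection-plus-integration: rectify each sample and smooth
    with a one-pole filter (the integration 'time constant' in Adachi's terms)."""
    level, out = 0.0, []
    for s in samples:
        level += alpha * (abs(s) - level)
        out.append(level)
    return out

def background_scale(env_value, min_scale=0.5, max_scale=2.0):
    """Map envelope amplitude (assumed normalized to 0..1) to a background
    magnification factor; a larger background makes the object seem nearer."""
    env_value = max(0.0, min(1.0, env_value))
    return min_scale + (max_scale - min_scale) * env_value

# Example: a short burst of audio grows the envelope and hence the background.
audio = [0.0, 0.2, 0.8, -0.9, 0.7, -0.3, 0.1, 0.0]
for e in envelope(audio):
    print(round(background_scale(e), 3))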

Claim 12 Anticipated By Adachi (Ex. 1004)
12. A virtual reality computer system, including:
Adachi as applied to the preamble of Claim 1.
[12(a)] means for supplying a first signal selected from a group consisting of a control signal having music and/or control information generated in response to a music signal, a prerecorded control track having music and/or control information corresponding to the music signal, and a control signal having music and/or control information generated in response to the prerecorded control track; and
Adachi discloses the function of supplying a first signal that is a control signal having music and/or control information generated in response to a music signal, and the disclosed structure for performing the claimed function is an audio input and an envelope detecting circuit or Fast Fourier Transform circuit. Namely, Adachi discloses an envelope detecting circuit or a Fast Fourier Transform circuit for processing a musical tone signal to detect musical parameters such as level/amplitude, tone color, tone volume, and/or frequency (by detecting spectrum signal components) (i.e., control information) and pass the musical parameters on to a CPU. Id. at 5:6-30, 5:65-6:2, 8:63-9:5; also id. at Figs. 1 and 5.
[12(b)] means for receiving the first signal and influencing action within a virtual environment in response to said first signal.
Adachi discloses a CPU and display control unit programmed to use detected musical parameters (i.e., a control signal) to select objects for display on a three-dimensional image display unit such as a stereoscopic television and to control the perceived distance between an object and its background image in a three-dimensional computer-simulated environment. Id. at Fig. 1 (items 1, 9), 5:2-6:64, 16:15-36.

Claim 13 Anticipated By Adachi (Ex. 1004)
13. The apparatus of claim 12, wherein the means for supplying the first signal includes an analysis apparatus having means for receiving said music signal in digital or analog form, and processing said music signal to produce control information for modification of objects in the virtual environment.
Adachi discloses that the envelope detecting circuit and Fast Fourier Transform circuit constitute an analysis apparatus that receives a music signal in analog form and processes the signal to detect musical parameters for controlling the perceived distance between an object and its background image in a three-dimensional computer-simulated environment. See Claim 12.

Claim 15 Anticipated By Adachi (Ex. 1004)
15. The apparatus of claim 12, wherein the means for supplying the first signal includes a music playing means for supplying said music signal.
As discussed with respect to element (a) of Claim 12, Adachi discloses an audio input that can be a live microphone that receives tones of nonelectronic musical instruments. Id. at 12:23-26.

Claim 21 Anticipated By Adachi (Ex. 1004)
21. A virtual reality computer system, including:
Adachi as applied to the preamble of Claim 1.
[(a)] a source of a music signal; and

Adachi discloses that the source of a music signal is an audio input that produces a musical tone analog signal. Id. at Fig. 1; see generally id. at 5:5-12, 5:22-30. Adachi discloses the source of a musical signal as, for example, an electric musical instrument (e.g., keyboard) or a nonelectric musical instrument (e.g., violin, guitar, piano, etc.). Id. at 6:26-32, 10:21-22, 12:25-26, Figs. 2, 3, 10.
[(b)] an apparatus for extracting information from the music signal for modification of objects in a virtual environment.
Adachi discloses a tone visualizing apparatus that includes an envelope detecting circuit or a Fast Fourier Transform circuit for extracting information from a music signal. For example, Adachi discloses that the envelope detecting circuit is an Amplitude Modulation detector. Adachi discloses that other musical tone parameters such as tone color, tone volume, and frequency (by detecting spectrum signal components) can be detected. Id. at Abstract, 5:6-30, 5:65-6:2, 8:63-9:5. Adachi discloses using extracted music information to select objects for display on a three-dimensional image display unit such as a stereoscopic television and to control the perceived distance between an object and its background image, thereby generating a three-dimensional computer-simulated environment. Id. at 5:2-6:2, 16:15-36.

D. Lytle in view of Adachi Renders Claims 1, 8, 12, 13, 15, and 21 Obvious Under 35 U.S.C. § 103(a)

Lytle, Adachi, and the '129 Patent each relate to controlling a computer system in response to music signals. See, e.g., Ex. 1001 at 1:9-11, Ex. 1004, Adachi at 1:6-10, Ex. 1003, Lytle at 646. Lytle discloses composing an original score, where the author had access to and utilized the underlying MIDI data to animate virtual objects. Ex. 1003, Lytle at 646, 649-650. Lytle, however, recognized that, as of 1990, "[m]ethods exist[ed] to translate pitch to MIDI, and musical encoding schemes other than MIDI are integrable into the system." Id. at 667; see also id. at 651 ("[S]ince the technique is not integrally tied to the format of input data, it can easily be implemented to accommodate other encoding signals."). Adachi discloses detecting musical parameters (e.g., tone color, volume, frequency, level/amplitude) using an envelope detecting circuit or Fast Fourier Transform circuit.

Ex. 1004, Adachi at 5:6-30, 5:65-6:2, 8:63-9:5, Figs. 1 and 5. Adachi further teaches processing music signals to generate a digital signal that is supplied to a computer. Id. at 5:28-30, 8:65-9:5.

Therefore, upon reading the disclosure of Adachi, a skilled artisan would have recognized that modifying Lytle to process a music signal to generate a control signal that is transmitted to a computer system, as described by Adachi, would be beneficial. Ex. 1007, Pope Decl., at 43, 44. Indeed, Lytle contemplates that other schemes for encoding a music signal can be used with its system for creating animations based on a music source. Ex. 1003, Lytle at 667. A skilled artisan, therefore, would know that any music encoding scheme could be used with the system of Lytle without affecting its operation. Ex. 1007, Pope Decl., at 44. A skilled artisan would have also appreciated that this improvement to Lytle could be achieved by simply using a different method for processing a music signal, such as that disclosed by Adachi. Id. at 45, 46. Thus, it would have been natural and an application of nothing more than ordinary skill and common sense for a skilled artisan to combine Lytle with the specific type of processing of music signals disclosed in Adachi. Ex. 1007, Pope Decl., at 44-46. Even further, such a modification would have yielded predictable results without undue experimentation.

As is evident from the descriptions above, Lytle and Adachi are in the same field of endeavor as the '129 Patent (controlling a computer system with music signals to generate a display based on the music signals) and are each, therefore, analogous to the '129 Patent. See, e.g., Ex. 1001, '129 Patent at 11:21-43.
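To illustrate the kind of front end the combination contemplates (processing a music signal, as in Adachi, to produce control data that can drive a Lytle-style animation system), the following Python sketch extracts a level and a rough frequency estimate from successive frames of an audio signal. The frame size, threshold, and zero-crossing frequency estimate are illustrative assumptions; Adachi uses an envelope detector or an FFT rather than the simplified estimate shown here.

import math

def frames_to_control_events(samples, rate, frame_len=1024, threshold=0.1):
    """Analyze successive frames of an audio signal and emit control events
    (time, level, estimated frequency) of the kind the combined system would
    pass to the animation mapper. Zero-crossing pitch estimation is a crude
    stand-in for the spectral analysis Adachi performs with an FFT."""
    events = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        level = math.sqrt(sum(s * s for s in frame) / frame_len)  # RMS level
        if level < threshold:
            continue
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a < 0.0 <= b)
        freq = crossings * rate / frame_len  # rough fundamental estimate
        events.append({"time": start / rate, "level": level, "freq": freq})
    return events

# Example: a 440 Hz tone sampled at 8 kHz yields one event per frame.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(4096)]
print(frames_to_control_events(tone, rate)[:2])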

Claim 1 Obvious over Lytle (Ex. 1003) in view of Adachi (Ex. 1004)
1. A method for controlling production of a virtual environment by a virtual reality computer system, including the steps of:
Lytle as applied to the preamble of Claim 1 in Section IV.B.
(a) processing music signals to generate control signals having music and/or control information; and
Lytle discloses using a computer running music sequencing software to produce a MIDI file. Lytle also discloses that, although not part of the disclosed implementation, musical sound could first be analyzed to extract data describing individual notes played by each instrument.

In this context, the musical application is defined as a producer of musical data. Typically, this will be a music sequencer representing musical data in the MIDI format, but it could also be any program which encodes music at the level of individual notes and performance nuances. It must be stressed that the data input to the mapper includes only information about which notes were played on which instrument, including when, how loudly, and how long the sound lasted. Musical sound encoded as digital wave samples is at too low a level to be used by this technique and must first be analyzed to extract data describing individual notes played by each instrument, which is not a feature of the current implementation.

Ex. 1003, Lytle at 648.

[A]lthough the implementation was for music encoded in MIDI, there is nothing inherent in this method precluding its application to non-MIDI instruments. Methods exist to translate pitch to MIDI, and musical encoding schemes other than MIDI are integrable into the system.

Id. at 667; see also id. at 651.

Adachi discloses processing a musical tone signal to detect characteristics of the signal such as amplitude/level or spectrum signal components and create an envelope signal. See Adachi as applied to element (a) of Claim 1 in Section IV.C.
(b) operating the virtual reality computer system in response to the control signals to generate said virtual environment.
Lytle as applied to element (b) of Claim 5 in Section IV.B.

Claim 8 Obvious over Lytle (Ex. 1003) in view of Adachi (Ex. 1004)
8. The method of claim 5, wherein step (a) includes the step of automatically generating the control track by processing the audio signal.
Lytle discloses using a computer running music sequencing software to produce a MIDI file. Lytle also discloses that, although not part of the disclosed implementation, musical sound could first be analyzed to extract data describing individual notes played by each instrument. Ex. 1003, Lytle at 648, 667 (reproduced above for element (a) of Claim 1).

Adachi discloses automatically processing a musical tone signal to detect characteristics of the signal such as amplitude/level or spectrum signal components. See Adachi as applied to element (a) of Claim 1 in Section IV.C above.

Claim 12 Obvious over Lytle (Ex. 1003) in view of Adachi (Ex. 1004)
12. A virtual reality computer system, including:
Lytle as applied to the preamble of Claim 5 in Section IV.B.
[12(a)] means for supplying a first signal selected from a group consisting of a control signal having music and/or control information generated in response to a music signal, a prerecorded control track having music and/or control information corresponding to the music signal, and a control signal having music and/or control information generated in response to the prerecorded control track; and
As applied to element (a) of Claim 12 in Section IV.B, Lytle discloses supplying a MIDI file to a supercomputer. The disclosed structure for performing these functions is a personal computer programmed with music sequencing software. See Lytle as applied to element (a) of Claim 12 in Section IV.B. Lytle also discloses that, although not part of the disclosed implementation, musical sound could first be analyzed to extract data describing individual notes played by each instrument. Ex. 1003, Lytle at 648, 651 (reproduced for element (a) of Claim 1). Adachi discloses the function of supplying a first signal where the first signal is a control signal having music and/or control information generated in response to a music signal, and the disclosed structure for performing the claimed function is an audio input and an envelope detecting circuit or Fast Fourier Transform circuit. Namely, Adachi discloses an envelope detecting circuit or a Fast Fourier Transform circuit for processing a musical tone signal to detect musical parameters such as level/amplitude, tone color, tone volume, and/or frequency (by detecting spectrum signal components) and pass the musical parameters on to a CPU. See Adachi as applied to element (a) of Claim 12 in Section IV.C.
[12(b)] means for receiving the first signal and influencing action within a virtual environment in response to said first signal.
Lytle as applied to element (b) of Claim 12 in Section IV.B.

Claim 13 Obvious over Lytle (Ex. 1003) in view of Adachi (Ex. 1004)
13. The apparatus of claim 12, wherein the means for supplying the first signal includes an analysis apparatus having means for receiving said music signal in digital or analog form, and processing said music signal to produce control information for modification of objects in the virtual environment.
Lytle in view of Adachi as applied to Claim 12 in Section IV.B.

Claim 15 Obvious over Lytle (Ex. 1003) in view of Adachi (Ex. 1004)
15. The apparatus of claim 12, wherein the means for supplying the first signal includes a music playing means for supplying said music signal.
Adachi as applied to Claim 15 in Section IV.C.

Claim 21 Obvious over Lytle (Ex. 1003) in view of Adachi (Ex. 1004)
21. A virtual reality computer system, including:
Lytle as applied to the preamble of Claim 12 in Section IV.B.
[(a)] a source of a music signal; and
Adachi as applied to element (a) of Claim 21 in Section IV.C.
[(b)] an apparatus for extracting information from the music signal for modification of objects in a virtual environment.
Lytle discloses that a computer running music sequencing software produces a MIDI file that is used to generate a computer-simulated environment where 3D instrument objects are mapped to musical data. Ex. 1003, Lytle at 644, 648, 649, 656-657, 664, 666-667, Fig. 202 (reproduced for Claim 1 in Section IV.A). Lytle discloses manipulating (i.e., modifying) graphical objects based on the MIDI input data. Id. at 651-652, 655-656. Lytle also discloses that, although not part of the disclosed implementation, musical sound could first be analyzed to extract data describing individual notes played by each instrument. Id. at 648, 651. Adachi discloses a tone visualizing apparatus that includes an envelope detecting circuit or a Fast Fourier Transform circuit for extracting information from a music signal. For example, Adachi discloses that the envelope detecting circuit is an Amplitude Modulation detector. Adachi discloses that other musical tone parameters such as tone color, tone volume, and frequency (by detecting spectrum signal components) can be detected. Ex. 1004, Adachi at 5:6-30, 5:65-6:2, 8:63-9:5.

E. Thalmann in view of Williams Renders Claims 1-4, 12, 13, 15, and 21 Obvious Under 35 U.S.C. § 103(a)

Thalmann, Williams, and the '129 Patent each relate to controlling a computer system in response to audio or music signals. See, e.g., Ex. 1001 at 1:9-11, Ex. 1005, Williams at 1:9-11 and 4:37-48, Ex. 1006, Thalmann at 4, 5. The '129 Patent describes using the claimed VR system for generating content data (i.e., animated image data and audio data) to fill or populate a virtual environment. Ex. 1001 at 2:20-21. Similarly, Thalmann

describes generating computer-simulated three-dimensional animation scenes in response to VR device inputs such as DataGloves, 3D mice, SpaceBalls, MIDI keyboards, and audio input devices. Ex. 1006, Thalmann at 1-2. Thalmann specifically discloses using audio input (i.e., sound and/or speech) for facial animation. Id. at 4, 5. Williams also discloses analyzing features (i.e., frequency, intensity, and percussive sounds) of a sound recording (which may be music, speech, or any other type of sound recording) to automatically associate predetermined actions with time positions in the sound recording. Ex. 1005, Williams at 4:36-63. Williams specifically discloses using the animation so that a character's mouth, face, and body movements are synchronized with a sound recording. Id. at 6:62-65.

Therefore, upon reading the disclosure of Williams, one of ordinary skill in the art would have recognized that modifying Thalmann to process a music signal to generate a control signal that operates a virtual reality computer system, as taught by Williams, would achieve the result described by Thalmann. Ex. 1007, Pope Decl., at 33. Namely, such a modification would utilize an audio input to produce signals that control the facial animation of a character in the virtual world of Thalmann. As noted above, Thalmann expressly contemplates using an audio input to interactively control an animation. A skilled artisan, therefore, would have appreciated that this improvement to Thalmann could be achieved merely by using the method of processing audio signals as taught by Williams. Id. at 33, 34. Thus, it would have been natural and nothing more than an application of ordinary skill to combine Thalmann with the processing of Williams. Id. at 35. Indeed, such a modification would have yielded predictable results (a virtual reality system that is controlled by audio signals) without undue experimentation. Id. at 34.
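For illustration, the Williams-style association of predetermined actions with time positions in a sound recording, based on detected features such as intensity and percussive sounds, might be sketched as follows in Python. The feature names, thresholds, and action labels are illustrative assumptions and are not taken from Williams or Thalmann.

def build_action_track(features):
    """Associate predetermined animation actions with time positions in a
    sound recording, based on per-frame features (time, intensity, percussive)."""
    actions = []
    for time, intensity, percussive in features:
        # Mouth opening follows intensity (speech or singing loudness).
        actions.append((time, "mouth_open", round(min(1.0, intensity), 2)))
        # A detected percussive onset triggers a discrete body movement.
        if percussive:
            actions.append((time, "head_nod", 1.0))
    return sorted(actions)

# Example feature stream: (seconds, intensity 0..1, percussive onset?)
features = [(0.00, 0.10, False), (0.25, 0.60, True), (0.50, 0.35, False)]
for time, name, value in build_action_track(features):
    print(f"{time:5.2f}s  {name:10s} {value}")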

As is evident from the descriptions above, Thalmann and Williams are in the same field of endeavor as the '129 Patent, controlling a computer system (virtual reality or otherwise) with music signals to generate a display based on the music signals, and are each, therefore, analogous to the '129 Patent. See, e.g., Ex. 1001, '129 Patent at 11:21-43.

Claim 1 Obvious over Thalmann (Ex. 1006) in view of Williams (Ex. 1005)
1. A method for controlling production of a virtual environment by a virtual reality computer system, including the steps of:
Thalmann discloses an immersive computer-simulated environment that is produced by a computer system programmed with animation software and including peripheral VR devices such as DataGloves, 3D mice, SpaceBalls, MIDI keyboards, and audio input devices. Specifically, Thalmann describes use of virtual reality devices to create three-dimensional animation scenes.

For a long time, we could observe virtual worlds only through the window of the workstation's screen with a very limited interaction possibility. Today, new technologies may immerse us in these computer-generated worlds or at least communicate with them using specific devices. In particular, with the existence of graphics workstations able to display complex scenes containing several thousands of polygons at interactive speed, and with the advent of such new interactive devices as the SpaceBall, EyePhone, and DataGlove, it is possible to create applications based on a full 3-D interaction metaphor in which the specifications of deformations or motion are given in real-time. This new concepts drastically change the way of designing animation sequences. In this paper, we call VR-based animation techniques all techniques based on this new way of specifying animation. We also call VR devices all interactive devices allowing to communicate with virtual worlds. They include classic devices like head-mounted display systems, DataGloves as well as all 3D mice or SpaceBalls. We also consider as VR devices MIDI keyboards, force-feedback devices and multimedia capabilities like realtime video input devices and even audio input devices. In the next Section, we present a summary of these various VR devices. More details may be found in (Balaguer and Mangili 1991; Brooks 1986; Fisher et al. 1986).

Ex. 1006, Thalmann at 1-2; see generally id. at 2-4, 5

(discussing VR devices).

Our Lab consists in a network of Silicon Graphics IRIS, a NeXT computer and VR devices including EyePhone, DataGloves, SpaceBalls, StereoView, a Polhemus digitizer, video input/output and a MIDI and audio equipment.

Id. at 15.

(a) processing music signals to generate control signals having music and/or control information; and
Thalmann discloses use of real-time audio input such as sounds and speech for interactive facial animation.

2.6 Real-time audio input
Audio input may be also considered as a way of interactively controlling animation. However, it generally implies a real-time speech recognition and natural language processing.

Id. at 4.

During the creating process, the animator should enter a lot of data into the computer. The input data may be of various nature:
- geometric: 3D positions, 3D orientations, trajectories, shapes, deformations
- kinematics: velocities, accelerations, gestures
- dynamics: forces and torques in physics-based animation
- lights and colors
- sounds
- commands
The following table shows VR-devices with corresponding input data:

Ex. 1006, Thalmann at 5.

We call a real-time recognition-based metaphor a method consisting of recording input data from a VR device in real-time. The input data are analyzed. Based on the meaning of the input data, a corresponding directive is executed. For example, when the animator opens the fingers 3 centimeters, the synthetic actor's face on the screen opens his mouth 3 centimeters. The system has recognized the gesture and interpreted the