MPEG Decoder Case
K.A. Vissers, UC Berkeley, Chameleon Systems Inc.
and Pieter van der Wolf, Philips Research, Eindhoven, The Netherlands
Outline
- Introduction
- Consumer Electronics
- Kahn Process Networks Revisited
- MPEG application
- Mapping
- Design space exploration
- Conclusions
Consumer Electronics
Consumer Electronics
- Cost (< $100 for electronics of TV)
- Low power consumption (no fan / mobile)
- Large series (> 0.5 million pieces)
- Robustness (no hazards, no reboots)
- Real-time constraints (no loss of data, guaranteed response)
- Both control and signal processing
- Increasing requirements for computation & communication (more functions, higher resolutions)
- Wide range of information appliances
- Large portfolio of semiconductor products
Consumer Electronics
Trend towards multi-functional, multi-standard products:
- web browsing
- e-mail
- high-speed modem (xDSL)
- teletext
- EPG
- graphics
- MPEG-2
- picture rate conversion (50-100 Hz)
- video scaling
- picture enhancement
- etc.
Video Processing
Requirements for computation & communication
- Standard resolution TV:
  - PAL: 720 x 576 @ 25 frames/sec
  - NTSC: 720 x 480 @ 30 frames/sec
  - 10.4 million samples/sec
  - luminance (Y) and chrominance (U and V, subsampled)
  - typically 8-10 bits for luminance, 8 bits for chrominance
- Say 100-1000 operations / pixel in a TV system:
  - 1-10 billion operations / sec
  - 1-10 Gbyte / sec internal bandwidth
- High definition TV (1920 x 1080 @ 30): factor 6 more
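The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the resolutions, frame rates, and operations-per-pixel range quoted above; the function and variable names are illustrative.

```python
# Sample-rate and compute estimates for the video formats on the slide.

def pixel_rate(width, height, frames_per_sec):
    """Luminance samples (pixels) per second for one video format."""
    return width * height * frames_per_sec

pal = pixel_rate(720, 576, 25)
ntsc = pixel_rate(720, 480, 30)
hdtv = pixel_rate(1920, 1080, 30)

# The slide assumes 100 to 1000 operations per pixel in a TV system.
ops_low = pal * 100
ops_high = pal * 1000

print(f"PAL : {pal / 1e6:.1f} Msamples/s")
print(f"NTSC: {ntsc / 1e6:.1f} Msamples/s")
print(f"HDTV: {hdtv / 1e6:.1f} Msamples/s, factor {hdtv / pal:.1f} over SD")
print(f"Compute: {ops_low / 1e9:.1f} to {ops_high / 1e9:.1f} Gops/s")
```

PAL and NTSC come out at the same rate because 576 x 25 equals 480 x 30, which is why the slide quotes a single number for both.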
Philips Nexperia™ DVP System Silicon
Flexible architecture for digital video applications
- MIPS™: general purpose RISC processor, 50 to 300+ MHz, 32-bit or 64-bit
- TriMedia™: VLIW media processor, 100 to 300+ MHz, 32-bit or 64-bit
- Library of device blocks: image coprocessors, DSPs, UART, 1394, USB
- Nexperia system busses: PI bus; memory bus, 32-128 bit and more
[Block diagram labels: MIPS CPU (PRxxxx) with I$ and D$; TriMedia CPU (TM-xxxx) with I$ and D$; device IP blocks on two PI buses; SDRAM; MMI; DVP memory bus]
Nexperia™ Scalability
- MIPS CPU + device blocks + software
- MIPS CPU + TriMedia CPU replacing some device blocks
- TriMedia CPU + device blocks when control functions are minimal
- Single architecture, multiple product configurations
- Processor core options: TM32, TM64, MIPS32, MIPS64...
- Device block options
- Highly programmable to weakly programmable
[Diagram labels: three configurations (RISC; RISC + VLIW; VLIW), each with SDRAM and MMI]
Design Problem
[Block diagram labels: SDRAM, VLIW CPU with I$ and D$, MPEG, video-in, video-out, audio-in, audio-out, timers, serial I/O, PCI bridge]
MPEG2 decoding
- Audio and video; audio is done completely in software on the VLIW
- Video: described in Kahn Process Networks
- Test for compliance with the standards, ranging from H263 to HDTV
- Map onto software + dedicated hardware
MPEG2 decoding status
- IP content in Philips Semiconductors
- NX8500 family: single chip HDTV decoder (audio + video):
  - dedicated MPEG2 video decoder
  - specialized Video Output
  - many other features
Methodology
In Research:
- Describe the full functionality of the MPEG2 video decoding
- Mapping and performance analysis
- Compare with the actual Semiconductors results in ICs
Steps
- Download the Berkeley MPEG decoder
- Remove errors
- Remove global variables
- Split in separate processes
  - natural partitioning
  - insert read, write and execute statements
- Profile and run
Application Modeling: YAPI
- YAPI: a C++ API for Kahn Process Networks
- Expose parallelism and communication
[Figure labels: processes A, B, and C connected by FIFOs; each process issues Read, Write, and Execute]
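To make the Kahn Process Network idea concrete, here is a minimal sketch in Python rather than in YAPI itself. It assumes a simple A -> B -> C pipeline; the process bodies, the END marker, and all function names are illustrative and are not the YAPI interface. Each process only reads from and writes to FIFOs, and reads block until a token is available.

```python
import queue
import threading

END = None  # end-of-stream marker; an assumption of this sketch, not part of YAPI


def process_a(out_fifo, count):
    """Source: write `count` tokens, then signal end of stream."""
    for value in range(count):
        out_fifo.put(value)          # write
    out_fifo.put(END)


def process_b(in_fifo, out_fifo):
    """Filter: read a token, execute a computation, write the result."""
    while True:
        token = in_fifo.get()        # blocking read
        if token is END:
            out_fifo.put(END)
            return
        out_fifo.put(token * 2)      # execute, then write


def process_c(in_fifo, sink):
    """Sink: read tokens until the end of the stream."""
    while True:
        token = in_fifo.get()        # blocking read
        if token is END:
            return
        sink.append(token)


def run_network(count):
    """Run the three processes concurrently and return what C received."""
    a_to_b, b_to_c = queue.Queue(), queue.Queue()  # unbounded FIFOs
    sink = []
    threads = [
        threading.Thread(target=process_a, args=(a_to_b, count)),
        threading.Thread(target=process_b, args=(a_to_b, b_to_c)),
        threading.Thread(target=process_c, args=(b_to_c, sink)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sink


print(run_network(5))
```

Because the processes communicate only through FIFOs with blocking reads, the result does not depend on how the threads are scheduled, which is the property that makes this model attractive for exposing parallelism.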
Workload analysis
Computation and communication workload
Process  Operation  Count
A        A-up       56952
A        A-down      1834
B        B-filter   58786
C        C-join     34815
C        C-pass      9834
C        C-skip      7274
[Figure: processes A, B, and C connected by FIFOs; token counts shown on the channels: 34815, 58786, 58786]
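Counts like these can be gathered by instrumenting the execute and write statements of each process. The sketch below is one hypothetical way to do that; the WorkloadProbe class and the toy input are assumptions for illustration and do not reproduce the slide's numbers.

```python
from collections import Counter


class WorkloadProbe:
    """Tallies executed operations and written tokens (hypothetical helper)."""

    def __init__(self):
        self.op_counts = Counter()     # (process, operation) -> count
        self.token_counts = Counter()  # FIFO name -> tokens written

    def execute(self, process, operation):
        self.op_counts[(process, operation)] += 1

    def write(self, fifo, tokens=1):
        self.token_counts[fifo] += tokens


probe = WorkloadProbe()

# Toy run: process A handles six inputs; the data decides which operation runs.
for value in [3, -1, 4, 1, -5, 9]:
    probe.execute("A", "A-up" if value > 0 else "A-down")
    probe.write("A_to_B")

for (process, operation), count in sorted(probe.op_counts.items()):
    print(process, operation, count)
print("tokens on A_to_B:", probe.token_counts["A_to_B"])
```

In this toy run A writes one token per operation, so the two A counts add up to the token count. The slide's figures show the same relation for process A: 56952 + 1834 = 58786.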
MPEG decoding
- Take a stream of encoded bits
- Separate the headers and parse for parameters
- Variable length decoding
- On blocks of 16 x 16 pixels do:
  - Inverse scan, inverse quantization
  - Inverse Discrete Cosine Transform
  - Add to the motion compensated block
- Reorder images: IPBB -> IBBP
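The final reordering step can be sketched as follows. This is a simplified illustration, assuming each picture is labelled with its type letter and that B pictures are shown as soon as they are decoded, while an I or P picture is held back until the next I or P picture arrives. The function name and picture labels are hypothetical.

```python
def display_order(decoded):
    """Convert decode order (e.g. I P B B) into display order (I B B P)."""
    shown = []
    held = None  # most recent I or P picture, not yet displayed
    for picture in decoded:
        if picture.startswith("B"):
            shown.append(picture)      # B pictures are displayed immediately
        else:
            if held is not None:
                shown.append(held)     # release the previous reference picture
            held = picture
    if held is not None:
        shown.append(held)             # flush the last reference picture
    return shown


print(display_order(["I0", "P3", "B1", "B2"]))
```

The digits in the labels are the display positions, so a correct reordering yields them in ascending order.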
Motion-compensation
[Figure: pictures n-1, n-1/2, and n along a picture-number axis, with displacements D/2 and -D/2]
MPEG2 video decoder
[Figure: model of the MPEG2 video decoder in YAPI]
Design Problem (revisited)
[Same block diagram as before: SDRAM, VLIW CPU with I$ and D$, MPEG, video-in, video-out, audio-in, audio-out, timers, serial I/O, PCI bridge]
MPEG2 mapping
- Dedicated MPEG coprocessor: pipe with FIFO buffers and blocking semantics
[Diagram labels: Software on the VLIW, Memory (x3), Video In, Video Out, Dedicated MPEG coprocessor]
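A pipe with FIFO buffers and blocking semantics can be illustrated with a bounded queue: the writer stalls when the buffer is full and the reader stalls when it is empty. The sketch below is a software analogy only; the capacity, the stage names, and the end-of-stream marker are assumptions and say nothing about the actual coprocessor.

```python
import queue
import threading


def run_pipe(tokens, capacity):
    """Push tokens through one bounded FIFO; return (received, peak occupancy)."""
    fifo = queue.Queue(maxsize=capacity)
    occupancy = []
    received = []

    def stage_in():
        for token in tokens:
            fifo.put(token)                 # blocks while the FIFO is full
            occupancy.append(fifo.qsize())  # sampled right after each write
        fifo.put(None)                      # end-of-stream marker (assumption)

    def stage_out():
        while True:
            token = fifo.get()              # blocks while the FIFO is empty
            if token is None:
                return
            received.append(token)

    threads = [threading.Thread(target=stage_in),
               threading.Thread(target=stage_out)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received, max(occupancy, default=0)


received, peak = run_pipe(list(range(20)), capacity=2)
print(received == list(range(20)), peak <= 2)
```

The token order is preserved and the buffer never holds more than its capacity, so a small FIFO is enough as long as both stages can tolerate stalling.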
MPEG2 mapping for SD!
- Dedicated MPEG coprocessor: pipe with FIFO buffers and blocking semantics
[Diagram labels: Software on the VLIW (x2), Memory (x3), Video In, Video Out]
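Mapping and performance analysis
[Figure: an application model (YAPI) with processes A, B, and C connected by FIFOs is mapped onto an architecture model with abstract processor models X and Y, an abstract bus model, and a memory (Mem). Processes issue Read, Write, and Execute events; the event streams shown are "W(2) E(B-filter) R(1)" and "W(3) E(C-join) R(2) R(1)".]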
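The mapping idea can be sketched as a trace-driven estimate: each process emits Read, Write, and Execute events, an abstract processor model charges cycles for Execute events, and an abstract bus model charges cycles for Read and Write events. All cycle costs below are hypothetical placeholders, and the event order (reads first, then execute, then write) is an assumption about how to read the figure.

```python
# Hypothetical cost tables; none of these numbers come from the slides.
EXEC_CYCLES = {"B-filter": 40, "C-join": 25}  # abstract processor model
BUS_CYCLES_PER_TOKEN = 4                      # abstract bus model


def estimate(trace):
    """Return (compute_cycles, bus_cycles) for a list of (kind, arg) events.

    kind is "R" (read), "W" (write), or "E" (execute); arg is a FIFO
    number for R/W and an operation name for E.
    """
    compute = 0
    bus = 0
    for kind, arg in trace:
        if kind == "E":
            compute += EXEC_CYCLES[arg]
        elif kind in ("R", "W"):
            bus += BUS_CYCLES_PER_TOKEN
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return compute, bus


trace_b = [("R", 1), ("E", "B-filter"), ("W", 2)]
trace_c = [("R", 1), ("R", 2), ("E", "C-join"), ("W", 3)]

print("B:", estimate(trace_b))
print("C:", estimate(trace_c))
```

Keeping the application trace separate from the cost tables means the same trace can be replayed against different architecture models, which is what makes exploration cheap.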
MPEG2 mapping
- Data dependent (varies with the video stream):
  - Computation on the VLIW and coprocessors
  - Communication on the bus and internally in the MPEG2 decoder
- Exploration questions: bus load, burst size of the communication, latency of the memory interface
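The exploration questions can be made concrete with a toy cost model that sweeps burst size and memory latency. The formula and every parameter value below are assumptions chosen for illustration; they are not the model used in the case study.

```python
import math


def transfer_cycles(total_bytes, burst_bytes, latency_cycles, bytes_per_cycle=4):
    """Cycles to move total_bytes in fixed-size bursts (toy model).

    Each burst pays the memory latency once, then streams its data.
    A partial final burst is charged as a full burst.
    """
    bursts = math.ceil(total_bytes / burst_bytes)
    cycles_per_burst = latency_cycles + math.ceil(burst_bytes / bytes_per_cycle)
    return bursts * cycles_per_burst


def sweep(total_bytes, burst_sizes, latencies):
    """Evaluate every (burst size, latency) point in the design space."""
    return {
        (burst, latency): transfer_cycles(total_bytes, burst, latency)
        for burst in burst_sizes
        for latency in latencies
    }


results = sweep(1024, burst_sizes=[16, 64], latencies=[10, 30])
for (burst, latency), cycles in sorted(results.items()):
    print(f"burst={burst:3d} B  latency={latency:2d}  cycles={cycles}")
```

In this model larger bursts amortize the latency over more data, so the benefit of a larger burst grows with the memory latency. A real exploration would also account for contention between bus masters, which this sketch ignores.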
Exploration results: bus waits
[Plot: Storage Unit (SU) stalls for bus, data file "int_su_o2.dat"; horizontal axis: cycle, 0 to 70; vertical axis: 0 to 20000]
Exploration results: bus waits
[Plot: Video In (VInput) stalls for bus, data file "int_vin_o1.dat"; horizontal axis: cycle, 0 to 70; vertical axis: 0 to 2500]
Design Space Exploration
Lessons learned
- Models of the concurrency matter
- Quantifying design choices is key
- Match of Model of Computation and Model of Architecture
- Tools required: mapping, simulation and design space exploration support
- Raise the level of abstraction for exploration
References
- Pieter van der Wolf, Paul Lieverse, Mudit Goel, David La Hei, and Kees Vissers, "An MPEG-2 Decoder Case Study as a Driver for a System Level Design Methodology", in: Proc. 7th Int. Workshop on Hardware/Software Codesign (CODES'99), Rome, Italy, May 3-5, 1999.
- E.A. de Kock, G. Essink, W.J.M. Smits, P. van der Wolf, J.-Y. Brunel, W.M. Kruijtzer, P. Lieverse, and K.A. Vissers, "YAPI: Application Modeling for Signal Processing Systems", in: Proc. 37th DAC, Los Angeles, June 2000.
- http://ptolemy.eecs.berkeley.edu/~vissers