# Development of Low Power ISDB-T One-Segment Decoder by Mobile Multi-Media Engine SoC (S1G)

K. Mori, M. Suzuki<sup>\*</sup>, Y. Ohara, S. Matsuo and A. Asano<sup>\*</sup>

Toshiba Corporation Semiconductor Company, 580-1 Horikawa-Cho, Saiwai-ku, Kawasaki 212-8520, Japan

\* Toshiba Corporation Digital Media Network Company, 2-9 Suehiro-Cho, Ome, Tokyo 198-8710, Japan

Abstract - TOSHIBA has developed a mobile multi-media engine SoC, called S1G, which performs ISDB-T one-segment decoding at a low power of 42mW, and which was developed in the short period of eight months. Since MPEG2 TS (transport stream) de-multiplexing, AAC decoding and H.264 decoding must be processed simultaneously in ISDB-T one-segment decoding, this LSI uses two TOSHIBA MeP (Media embedded Processor) processors, one DSP and hardware blocks effectively with pipeline operation. Although it is generally considered that dedicated hardware accelerators are needed to realize low power ISDB-T one-segment decoding, TOSHIBA succeeded in developing a low power ISDB-T one-segment decoder that makes maximum use of software resources.

#### I. Introduction

Recently, the mobile market has been demanding ISDB-T one-segment receiving devices. Such devices are expected to support not only the ISDB-T one-segment receive function but also long viewing time through low power consumption. This LSI (S1G) can decode video (H.264) and audio (AAC+SBR) for ISDB-T one-segment reception, and can also perform H.264 encoding, MPEG4 codec and JPEG codec processing, and 2D/3D graphics operations. S1G includes an LCD controller that connects to two external LCD panels and a camera interface that connects to two external cameras. In other words, S1G is a one-chip multi-media engine SoC designed for the mobile market.

In particular, S1G achieves a low power consumption of 42mW for the ISDB-T one-segment application on an actual evaluation system. By using its processors effectively, S1G combines flexibility with low power consumption. This flexibility satisfies the many kinds of application requirements of the mobile market, which was difficult to achieve with conventional hardware solutions.

#### II. System Level Low Power Approach

Generally, dedicated hardware accelerators have an advantage in power consumption. However, as various multi-media applications are needed in the mobile market and new, more complicated applications keep appearing, it is difficult for existing dedicated hardware accelerators to meet these requirements in time for market demand. In addition, if many hardware logic blocks are implemented in an LSI to support many kinds of applications, the power consumption increases. S1G provides a solution as follows. The S1G hardware block diagram is shown in Fig.1. Peripherals that interface with external devices are shown in the upper part, the major functional blocks including the processors in the middle part, and the 20Mbit embedded DRAM in the lower part. S1G operates at 162MHz to achieve its multi-media applications.



As shown in the middle part of Fig.1, two TOSHIBA MeP RISC processors and one DSP are implemented. These processors are used not only for the ISDB-T one-segment application but also for almost all other applications of S1G. MPG-MeP and the TOSHIBA generic DSP, which we call @DSP, are the main processors for video and audio applications. GMM-MeP is mainly used for 2D/3D graphics applications and also controls the functions of the S1G system.



Fig.2 Power comparison between Toshiba SoCs

Fig.2 compares power consumption under an MPEG4 decode application. Products A and B are TOSHIBA multi-media engines on the market. Product A is designed in 130nm technology with a 1.5V core power supply voltage. Product B is designed in 90nm technology with a 1.2V core power supply voltage, the same as S1G. While S1G performs MPEG4 decoding using two MeP processors and @DSP running at 162MHz, Products A and B perform the same decoding using multiple dedicated hardware accelerators running at less than 162MHz. In other words, S1G needs a higher frequency than Products A and B, yet its power consumption is lower. This means that a software solution can, in some cases, consume less power than a hardware solution. In S1G, low power H.264 and audio decoding for the ISDB-T one-segment application is likewise performed by the MeP processors and @DSP without dedicated hardware logic, as in the MPEG4 decode application.

From Product A to Product B, the improvement in power consumption comes from advances in process technology. For the further improvement from Product B to S1G, the system architecture was reviewed and a software solution was adopted.

In addition, S1G was developed in only eight months, from specification definition to the first engineering sample. Because ISDB-T decoding was accomplished by a software solution, there was no need to design new H.264 decoder logic, except for the transport stream interface. This left time to optimize the power consumption of the other blocks.

#### III. H/W Architecture

Fig.3 shows the floor plan of S1G and Table 1 shows the chip specification.



Fig.3 S1G floor plan figure

Table 1 S1G chip specification

| Item                | Specification                                                                   |
|---------------------|---------------------------------------------------------------------------------|
| Technology          | 90nm CMOS, 5-metal layers<br>Multi-Vth, embedded DRAM<br>289pin, 11mm x 11mm, TFBGA |
| Logic Gate          | 2.7MGate                                                                        |
| Memory              | SRAM 1.3Mbit, eDRAM 20Mbit                                                      |
| Operation Frequency | 162MHz                                                                          |

A total of 20Mbit of embedded DRAM makes high speed operation and low power consumption possible, since program code and data can be stored on chip without external memory. Each functional block has an individual switch that lets software enable or disable its clock. Therefore, the clock of a functional block is enabled only when the application needs it, which minimizes power consumption. To make parallel processing and pipeline processing effective, two MeP processors and one @DSP are implemented in S1G. Communication registers are implemented between the two MeP processors for dedicated inter-process communication, which also makes parallel processing more effective. When a Sleep or Halt instruction is issued, the core clock of the MeP is stopped automatically, which further reduces power consumption during idle periods.
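To make the per-block clock switch concrete, here is a minimal Python sketch that models it as a clock-enable register with one bit per functional block. The block names, the bit positions and the idea of a single register are illustrative assumptions; the paper only states that each block has an individual software-controlled clock switch.

```python
# Hypothetical model of per-block software clock switches.
# Block names and bit positions are illustrative assumptions, not the S1G register map.
BLOCK_BITS = {
    "GMM_MEP": 0,
    "MPG_MEP": 1,
    "DSP": 2,
    "SCALER": 3,
    "LCDC": 4,
    "CAMERA_IF": 5,
    "TS_IF": 6,
}


def clock_enable_mask(needed_blocks):
    """Return a clock-enable register value with only the needed blocks clocked."""
    mask = 0
    for name in needed_blocks:
        mask |= 1 << BLOCK_BITS[name]
    return mask


def enabled_blocks(mask):
    """Decode a clock-enable register value back into block names."""
    return sorted(name for name, bit in BLOCK_BITS.items() if mask & (1 << bit))


# One-segment playback: the camera interface is not needed, so its clock stays off.
oneseg_mask = clock_enable_mask(["TS_IF", "GMM_MEP", "MPG_MEP", "DSP", "SCALER", "LCDC"])
print(hex(oneseg_mask))  # 0x5f
print(enabled_blocks(oneseg_mask))
```

Keeping the enable set as data rather than hard-coding it per application mirrors the idea in the text: each application turns on only the blocks it actually uses.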



Fig.4 Relation between operation MIPS and Power

Fig.4 shows the relation between operation MIPS and core power. Core power increases linearly with the MPG application MIPS in the S1G system. The core power at the 0 MIPS point is 16mW, which is the power in the MeP sleep state. Fig.4 shows that an application requiring 130MIPS (Application X+Y) consumes 46mW. However, even in the sleep state S1G consumes 16mW of offset power, so the actual operation power for the 130MIPS application is only 30mW. Put another way, the 16mW of sleep power corresponds to about 35% of the total core power of the 130MIPS application. This shows that the power consumed by clock lines cannot be ignored in recent process technology.
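The arithmetic above can be reproduced with a simple linear model, assuming core power rises in a straight line from the 16mW sleep point to the 46mW Application X+Y point in Fig.4. This is a sketch fitted to the two quoted points, not the measured curve.

```python
# Two points read from Fig.4; a straight line through them is assumed.
SLEEP_POWER_MW = 16.0                  # core power at 0 MIPS (MeP sleep state)
APP_MIPS, APP_POWER_MW = 130.0, 46.0   # Application X+Y
SLOPE_MW_PER_MIPS = (APP_POWER_MW - SLEEP_POWER_MW) / APP_MIPS


def core_power_mw(mips):
    """Estimated S1G core power for a given MPG application load."""
    return SLEEP_POWER_MW + SLOPE_MW_PER_MIPS * mips


active_mw = core_power_mw(APP_MIPS) - SLEEP_POWER_MW
sleep_share = SLEEP_POWER_MW / core_power_mw(APP_MIPS)
print(f"slope: {SLOPE_MW_PER_MIPS:.3f} mW/MIPS")        # 0.231
print(f"active power at 130 MIPS: {active_mw:.1f} mW")  # 30.0
print(f"sleep share at 130 MIPS: {sleep_share:.0%}")    # 35%
```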

Furthermore, the IDLE-state power of Product A is larger than the power of the 130MIPS application on S1G. This shows that the offset power of the hardware accelerators is too large to take advantage of the merits of hardware.



Fig.5 shows the relation between the IDLE-state power consumption of each function module and the module's gate count. The power is measured after the module is released from reset while its clock is still supplied, so each dot corresponds mainly to the combined power of the clock lines and the non-clock-gated F/Fs in that module. Fig.5 also shows that power consumption increases as module gate count increases. GMM-MeP and MPG-MeP have an additional clock gating mechanism at the entrance of their clock paths that operates in the sleep or halt state. Therefore, the IDLE-state power of MPG and GMM is lower than that of other functional modules of comparable gate count. Fig.5 shows that a 100K-gate module consumes about 5mW in the IDLE state.



Fig.6 Image of LP cell replacement procedure

This means that if the target application needs many hardware blocks, the IDLE-state power increases with the total module gate count. This large IDLE power reduces the advantage of the low operation power of hardware logic.
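A rough way to see this effect is to scale IDLE power by gate count using the approximately 5mW per 100K gates read from Fig.5. The sketch below assumes a purely linear relation and uses a made-up set of accelerator gate counts; it applies only to modules without the extra clock gating that the MeP modules have.

```python
IDLE_MW_PER_100K_GATES = 5.0  # approximate value read from Fig.5


def idle_power_mw(gate_counts):
    """Estimate IDLE power of modules that stay clocked without extra clock gating."""
    return sum(g / 100_000 * IDLE_MW_PER_100K_GATES for g in gate_counts)


# Hypothetical accelerator set; the gate counts are illustrative, not S1G data.
accelerators = {"h264_dec": 400_000, "aac_dec": 150_000, "ts_demux": 50_000}
print(f"{idle_power_mw(accelerators.values()):.1f} mW")  # 30.0 mW
```

Because the estimate is a plain sum, every additional always-clocked accelerator adds to the IDLE floor, even when the application never uses it.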

Since the S1G MeP modules have an additional clock gating mechanism in the IDLE state and keep IDLE-state power consumption low, a software solution achieves better power performance than a hardware solution in some applications.

In addition to the low power functions for dynamic power described above, S1G is also designed to suppress static power consumption. For this, the traditional multi-Vth approach is applied. S1G uses LP (low power) and HS (high speed) cells. HS cells have a lower threshold voltage (Vth) than LP cells, and in S1G they are used only in timing-critical paths to save static power.

Using the procedure described in Fig.6, HS cells in non-critical paths were replaced with LP cells. As a result, LP cells account for 75% of the cells in the S1G design.
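Fig.6 is not reproduced here, so the following Python sketch only illustrates the general idea of slack-driven Vth swapping: an HS cell is swapped to LP when its worst path still meets timing with the LP cell's extra delay. The `Cell` fields, the per-cell delay penalty and the margin are hypothetical, and the sketch ignores that many cells share a path, which a real flow handles by re-running static timing analysis.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    vt: str               # "HS" (low Vth, fast, leaky) or "LP" (high Vth)
    slack_ps: float       # slack of the worst timing path through this cell
    lp_penalty_ps: float  # extra delay if this cell is swapped to its LP variant


def replace_hs_with_lp(cells, margin_ps=0.0):
    """Swap HS cells to LP wherever the worst path keeps at least margin_ps of slack.

    Simplification: a swap is charged only to the swapped cell's own slack.
    Returns the resulting LP-cell ratio.
    """
    for c in cells:
        if c.vt == "HS" and c.slack_ps - c.lp_penalty_ps >= margin_ps:
            c.vt = "LP"
            c.slack_ps -= c.lp_penalty_ps
    return sum(c.vt == "LP" for c in cells) / len(cells)


cells = [
    Cell("u1", "HS", 120.0, 30.0),  # plenty of slack: becomes LP
    Cell("u2", "HS", 10.0, 30.0),   # timing critical: stays HS
    Cell("u3", "LP", 50.0, 0.0),    # already LP
    Cell("u4", "HS", 40.0, 30.0),   # 10 ps left after the swap: becomes LP
]
print(replace_hs_with_lp(cells, margin_ps=5.0))  # 0.75
```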

Table 2 Logic Static Leak Power

| LP Replacement | LP ratio [%] | Leak Power at 25°C [mW] | ΔLeak at 25°C [mW] | Leak Power at 85°C [mW] | ΔLeak at 85°C [mW] |
|----------------|--------------|-------------------------|--------------------|-------------------------|--------------------|
| Before         | 51.0         | 1.99                    | 0.59               | 13.44                   | 4.79               |
| After          | 75.3         | 1.39                    | 0.09               | 8.65                    | 4.73               |

Table 2 shows that the improvement in static leak power at room temperature is 0.59mW. This is a very small fraction of the ISDB-T one-segment application power (42mW), but it amounts to about 4.5% of the AAC audio decode application power (13mW). Furthermore, this replacement yields a 4.8mW improvement in static power consumption at 85°C, which becomes significant relative to the total power consumption at high temperature.
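The percentages quoted above follow directly from Tables 2 and 3; the short Python check below recomputes them. Subtracting the Table 2 entries gives 0.60mW at 25°C rather than 0.59mW, presumably because the table entries are rounded, so the shares are computed from the 0.59mW value quoted in the text.

```python
# Logic leakage from Table 2, (before, after) LP replacement, in mW.
leak_mw = {"25C": (1.99, 1.39), "85C": (13.44, 8.65)}
AAC_DECODE_MW = 13.0  # Table 3
ONESEG_MW = 42.0      # Table 3

savings = {t: before - after for t, (before, after) in leak_mw.items()}
for t, s in savings.items():
    print(f"{t}: {s:.2f} mW saved")  # 25C: 0.60, 85C: 4.79

REPORTED_25C_SAVING_MW = 0.59  # value quoted in the text
print(f"vs AAC decode: {REPORTED_25C_SAVING_MW / AAC_DECODE_MW:.1%}")  # 4.5%
print(f"vs one-segment: {REPORTED_25C_SAVING_MW / ONESEG_MW:.1%}")     # 1.4%
```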

#### IV. S/W Architecture and operation flow



The ISDB-T one-segment decoder is implemented in software using the MePs and DSP on S1G. Fig.7 shows the function block diagram of the one-segment decoder. It consists of a de-multiplexer (MPEG-2 Transport Stream), an Audio (AAC) decoder, a Video (H.264) decoder, a Compositor, a Host IF, etc. Each block is implemented as a thread, and the threads run in parallel by time sharing. The de-multiplexer extracts the audio and video streams from the TS stream and also performs PID filtering, section filtering and system time clock control tasks. The audio thread decodes the audio (AAC) stream from the de-multiplexer and produces audio data. The video thread decodes the video (H.264) stream from the de-multiplexer and produces decoded pictures. Display images are generated in the Compositor by magnification, reduction and rotation using the Scaler hardware block.
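The PID filtering step of the de-multiplexer can be sketched as follows. The 188-byte packet, the 0x47 sync byte, the 13-bit PID and the adaptation_field_control bits follow ISO/IEC 13818-1; the helper names, the example PIDs and the 0xFF payload padding are illustrative assumptions, and section filtering, PES reassembly and STC control are not shown.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def pid_of(packet: bytes) -> int:
    """13-bit PID from bytes 1-2 of the TS header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def payload_of(packet: bytes) -> bytes:
    """Packet payload, skipping the adaptation field if one is present."""
    afc = (packet[3] >> 4) & 0x3  # adaptation_field_control
    if not afc & 0x1:             # '10' or '00': no payload
        return b""
    start = 4
    if afc & 0x2:                 # '11': adaptation field precedes the payload
        start += 1 + packet[4]
    return packet[start:]


def pid_filter(stream: bytes, wanted: dict) -> dict:
    """Collect the payloads of the wanted PIDs; wanted maps PID -> label."""
    out = {label: bytearray() for label in wanted.values()}
    for off in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = stream[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            continue  # a real demux would resynchronize here
        label = wanted.get(pid_of(pkt))
        if label is not None:
            out[label] += payload_of(pkt)
    return out


def make_packet(pid: int, data: bytes) -> bytes:
    """Test helper: payload-only packet, padded with 0xFF for brevity."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + data.ljust(TS_PACKET_SIZE - 4, b"\xff")


# Illustrative PIDs; real ones come from the PAT/PMT. 0x1FFF is the null packet PID.
stream = make_packet(0x0111, b"VID") + make_packet(0x0112, b"AUD") + make_packet(0x1FFF, b"")
es = pid_filter(stream, {0x0111: "video", 0x0112: "audio"})
print(bytes(es["video"][:3]), bytes(es["audio"][:3]))  # b'VID' b'AUD'
```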

The TS Demux and Host IF thread functions are performed in GMM-MeP, while the H.264 decoder, AAC decoder and composition functions are performed in MPG-MeP with @DSP. In the video and audio decoders, bitstream processing, error resilience processing and decoder control are performed in MPG-MeP. Signal processing such as motion compensation (MC), inverse transform (DCT) and inverse quantization (IQ) is performed in @DSP. The Scaler hardware block is used to magnify and reduce images in the Compositor thread. Functions are thus assigned to the processors best suited to their characteristics. As a result, the ISDB-T one-segment decoder runs at a low operating frequency using multiple processors.



Fig.8 shows the decoding process of ISDB-T one-segment. De-multiplexing is performed in one MeP while, simultaneously, audio and video decoding are performed in the other MeP and the DSP. The MePs and DSP are thus used efficiently without increasing hardware resources.
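The division of work between the de-multiplexer and the decoders resembles a producer/consumer arrangement. The Python sketch below models it with one demux thread feeding bounded queues that two decoder threads drain. It only illustrates the data flow: the thread and queue names are hypothetical, the "decoders" merely tag their input, and Python threads time-share one interpreter rather than running on separate MeP and DSP cores.

```python
import queue
import threading

STOP = object()  # end-of-stream sentinel


def demux(packets, audio_q, video_q):
    """Stand-in for the TS demux thread: route each (kind, payload) to its decoder."""
    for kind, payload in packets:
        (audio_q if kind == "audio" else video_q).put(payload)
    audio_q.put(STOP)
    video_q.put(STOP)


def decoder(name, in_q, out):
    """Stand-in for a decoder thread: 'decoding' just tags the payload."""
    while (item := in_q.get()) is not STOP:
        out.append(f"{name}:{item}")


def run_pipeline(packets):
    audio_q, video_q = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    audio_out, video_out = [], []
    threads = [
        threading.Thread(target=demux, args=(packets, audio_q, video_q)),
        threading.Thread(target=decoder, args=("aac", audio_q, audio_out)),
        threading.Thread(target=decoder, args=("h264", video_q, video_out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return audio_out, video_out


print(run_pipeline([("video", "v0"), ("audio", "a0"), ("video", "v1")]))
# (['aac:a0'], ['h264:v0', 'h264:v1'])
```

The bounded queues make the demux block when a decoder falls behind, a simple stand-in for the finite buffering between the processors.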

Fig.9 shows the structure of the H.264 decoder. Signal processing such as inverse quantization, inverse transform and the deblocking filter is performed in the DSP, and bitstream processing such as CAVLC (Context-based Adaptive Variable Length Coding) is performed in the MeP. A shared local memory can be accessed by both MPG-MeP and @DSP, so signal processing and bitstream processing can be pipelined in macroblock units.



Furthermore, the H.264 decoder is sped up using UCI (MeP User Custom Instructions).

According to the performance analysis of the H.264/AVC decoder implemented on MeP and DSP, the average cycle count in MeP is less than that in DSP, so average performance depends on DSP performance. However, the bitstream processing load in MeP is proportional to the number of coded bits. When many coded bits are input, especially for I-Pictures, the bitstream processing cycles are several times the average. Therefore, the maximum cycle count of the decoding process depends on MeP bitstream processing performance rather than on DSP performance.
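The effect of the macroblock pipeline on average and worst-case cycles can be illustrated with a two-stage model: the MeP parses a macroblock, then the DSP processes it, and the two overlap across macroblocks. The per-macroblock cycle counts below are invented for illustration, and the model assumes the shared local memory buffers enough macroblocks that the MeP never stalls.

```python
def sequential_cycles(mep, dsp):
    """MeP and DSP work on each macroblock one after the other (no overlap)."""
    return sum(mep) + sum(dsp)


def pipelined_cycles(mep, dsp):
    """Two-stage macroblock pipeline: MeP parses MB i while the DSP processes MB i-1."""
    mep_done = 0
    dsp_done = 0
    for m, d in zip(mep, dsp):
        mep_done += m                           # MeP finishes parsing this MB
        dsp_done = max(mep_done, dsp_done) + d  # DSP needs the parsed MB and must be free
    return dsp_done


# Illustrative per-MB cycle counts (not measured S1G data).
p_mep, p_dsp = [100] * 4, [300] * 4  # typical picture: DSP-bound
i_mep, i_dsp = [900] * 4, [300] * 4  # bit-heavy I-picture: MeP-bound
print(pipelined_cycles(p_mep, p_dsp), sequential_cycles(p_mep, p_dsp))  # 1300 1600
print(pipelined_cycles(i_mep, i_dsp), sequential_cycles(i_mep, i_dsp))  # 3900 4800
```

With these numbers the pipelined time is roughly the slower stage's total plus one macroblock of the faster stage: the DSP sets the pace for the typical picture and the MeP for the bit-heavy I-picture, which is why speeding up bitstream parsing reduces the worst case.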



Fig.10 User Custom Instructions

To reduce the cycles of bitstream processing, User Custom Instructions are implemented on S1G. A UCI is an instruction that users can freely define, and it completes in one cycle as shown in Fig.10.

Fig.11 shows the analysis of the syntax parsing process: CAVLC takes 60% of the parsing cycles in an IDR-Picture. Furthermore, as shown in Fig.12, coeff\_level and run\_before each take 30% of CAVLC. Ten user custom instructions that accelerate these calculations were therefore implemented. In video decoder processing, UCI reduces the decode processing MIPS by about 10%.
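The paper does not list the ten custom instructions. One common way to accelerate the variable-length prefixes in CAVLC is a count-leading-zeros operation: in H.264, the level_prefix syntax element used in coefficient level decoding is the number of leading zero bits before a 1 bit. The Python sketch below assumes such an instruction, models it with `int.bit_length()`, and compares it against a bit-by-bit loop; the class and function names are hypothetical, and the rest of level decoding (level_suffix, suffixLength adaptation) is omitted.

```python
class BitReader:
    """MSB-first bit reader over a byte string."""

    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.nbits = len(data) * 8
        self.pos = 0

    def peek32(self) -> int:
        """Next 32 bits, zero-padded past the end of the buffer."""
        remaining = self.nbits - self.pos
        if remaining >= 32:
            return (self.value >> (remaining - 32)) & 0xFFFFFFFF
        return (self.value << (32 - remaining)) & 0xFFFFFFFF

    def skip(self, n: int) -> None:
        self.pos += n


def clz32(x: int) -> int:
    """Leading zeros of a 32-bit word, as a hypothetical 1-cycle UCI would return."""
    return 32 - x.bit_length()


def level_prefix_bitwise(br: BitReader) -> int:
    """Reference parse: test one bit per iteration until the terminating 1."""
    zeros = 0
    while (br.peek32() >> 31) == 0:
        if br.pos >= br.nbits:
            raise ValueError("no terminating 1 bit")
        zeros += 1
        br.skip(1)
    br.skip(1)  # consume the terminating 1
    return zeros


def level_prefix_clz(br: BitReader) -> int:
    """Same parse with one count-leading-zeros step instead of a loop."""
    zeros = clz32(br.peek32())
    if zeros == 32:
        raise ValueError("prefix longer than the 32-bit window")
    br.skip(zeros + 1)
    return zeros


data = bytes([0b00010110, 0b10000000])
a, b = BitReader(data), BitReader(data)
print([level_prefix_bitwise(a) for _ in range(3)], [level_prefix_clz(b) for _ in range(3)])
# [3, 1, 0] [3, 1, 0]
```

The loop costs one iteration per zero bit, so long prefixes in bit-heavy I-pictures are exactly where a single-cycle leading-zero count pays off.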



Fig.11 Cycles in syntax parsing process



#### V. Summary and Conclusions

| Application  | Video Mode | Video Size [pxl*pxl] | Video Frame Rate [fps] | Video Bit Rate [Kbps] | Audio Mode        | Audio Sampling Frequency [kHz] | Audio Bit Rate [Kbps] | Core Power [mW] |
|--------------|------------|----------------------|------------------------|-----------------------|-------------------|--------------------------------|-----------------------|-----------------|
| ISDB-T       | H.264      | 320x180              | 15                     | 214                   | AAC+SBR<br>Stereo | 48                             | 32                    | 42              |
| H.264 Decode | H.264      | 320x240              | 15                     | 384                   | AAC<br>Stereo     | 44.1                           | 32                    | 49              |
| MPEG4 Encode | MPEG4      | 320x240              | 15                     | 384                   | AMR-NB            | 8                              | 12.2                  | 63              |
| MPEG4 Decode | MPEG4      | 320x240              | 15                     | 384                   | -                 | -                              | -                     | 45              |
| AAC Decode   | -          | -                    | -                      | -                     | AAC<br>Stereo     | 48                             | 128                   | 13              |

Table 3 Power Consumption in Each Application

Table 3 shows the power consumption of typical applications on S1G, measured in the evaluation environment. In conclusion, by building an appropriate architecture for both hardware and software together with existing low power techniques, S1G achieves low power consumption not only for ISDB-T one-segment but also for other multi-media applications such as camera coder, audio player, etc.

## Acknowledgements

The authors would like to thank Michito Nakanishi, Shinichi Mizuguchi, Shoko Kato, Yukihiko Shibata, Masahiro Okada and Hisae Kita for their continuing support.
