Chip-set for video display of multimedia information

DOI:
10.1109/30.793577

Document status and date:
Published: 01/01/1999

Document Version:
Publisher's PDF, also known as Version of Record (includes final page, issue and volume numbers)

CHIP-SET FOR VIDEO DISPLAY OF MULTIMEDIA INFORMATION

Egbert G. T. Jaspers¹ and Peter H. N. de With²
¹Philips Research Labs., Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands
²University of Mannheim, Fac. Computer Engineering, 68131 Mannheim, Germany

Abstract — In this paper, we present a chip-set for digital video processing in a consumer television receiver or set-top box. Key aspects of the chip-set are the high flexibility and programmability of its multi-window features, which support multiple Teletext (TXT) pages, Internet pages, and video processing of up to three live windows. The first chip is a microcontroller with peripherals featuring, among others, pixel-based graphics (GFX) and telecommunication interfaces. The second chip is a video processor containing a number of flexible coprocessors for horizontal and vertical scaling, sharpness enhancement, adaptive temporal noise reduction, blending of graphics, mixing of multiple video streams, and 100 Hz up-conversion.

Keywords — Video processing architecture, multi-window TV, HW/SW co-design, programmable hardware, dynamic data-flow.

I. INTRODUCTION

Developments in high-quality television (TV) applications clearly point towards the simultaneous consumption of multiple information channels, as opposed to watching a conventional single-channel broadcast. The increased demand for parallel processing and the large range of display formats are additionally fuelled by the strongly growing use of information systems such as Internet access, Teletext (TXT), and electronic help manuals. Similar to TXT on a conventional TV set and help wizards on a computer screen, the user wants to watch such information simultaneously with real-time video channels. The presented ICs support these requirements with flexible control of the picture quality. Furthermore, the ICs allow the application designer to extend the functionality in hardware, by using additional coprocessors, and/or in software, by installing new functions.

Another trend that is relevant for TV processing architectures is the gradual shift from dedicated hardware applications towards software-based (or software-controlled) functions. This is enabled, among other factors, by the increased computing power of programmable general-purpose hardware. It can also be noticed that new digital video communication standards, such as MPEG-based coding, exploit software control for improving the flexibility of hardware-based processing.

A third aspect is the continuous growth of applications in the TV domain. Upcoming features are more channels, the introduction of digital TV broadcasting and reception, electronic program guides, pixel-based graphics (GFX), signal processing for improved quality, and so on. A new chip-set should accommodate these new application domains. With such a high number of functions, it becomes attractive to pursue re-use of hardware functions and the corresponding memory wherever this reduces system costs.

The aforementioned discussion can be translated into a list of system requirements, which are summarized below.

• Reprogrammability of existing hardware for different functionality.
• Sharing of (overall) memory capacity.
• Increased use of SW-based control or implementations.
• Open for accommodating new functions.
• Low system costs for TV market.

This paper is organized as follows. Section II briefly outlines the architecture of the new chip-set. In the following section, an overview of the possible communication and integrated video functions enabled by the chip-set is presented. Section IV deals with an example application, discussing the large flexibility of the chip-set and the variety of functions which can be implemented. Section V gives an overview of the application domains (communication, video, etc.) and the individual functional possibilities in those domains. In Section VI the conclusions are presented.

TABLE I
CHARACTERISTICS OF THE CHIP SET.

<table>
<thead>
<tr>
<th></th>
<th>Microcontroller</th>
<th>Coprocessor Array</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>CMOS 0.35 μm</td>
<td>CMOS 0.35 μm</td>
</tr>
<tr>
<td>Die size</td>
<td>80 mm²</td>
<td>165 mm²</td>
</tr>
<tr>
<td>Data clock</td>
<td>48 MHz</td>
<td>64 MHz (96 to memory)</td>
</tr>
<tr>
<td>Package</td>
<td>SBGA352</td>
<td>SBGA352</td>
</tr>
<tr>
<td>Dissipation</td>
<td>1.5 W (96)</td>
<td>5 W</td>
</tr>
<tr>
<td>Transistor count</td>
<td>2 × 10⁶</td>
<td>6 × 10⁶</td>
</tr>
<tr>
<td>Interfaces</td>
<td>JTAG, 2 × UART, 2 × I²C, SDRAM</td>
<td>JTAG, 2 × I²C, SDRAM</td>
</tr>
<tr>
<td></td>
<td>Serial Interconnect Bus (SIB)</td>
<td>3 × Video IN: YUV 4:2:2 (up to 60 MHz)</td>
</tr>
<tr>
<td></td>
<td>remote control</td>
<td>2 × Video OUT: YUV 4:2:2/4:4:4 or RGB</td>
</tr>
<tr>
<td></td>
<td>software AD converter</td>
<td></td>
</tr>
<tr>
<td></td>
<td>general I/O</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Graphics OUT: RGBA/FB, H+V sync</td>
<td></td>
</tr>
</tbody>
</table>

II. ARCHITECTURE

In [3], the necessity of a modular parallel processing architecture for TV applications is motivated and a preliminary application is discussed. In-depth details of the architecture are presented in a corresponding paper [8]. In this paper, we will focus on the functional possibilities.

Figure 1 shows the hardware architecture of the programmable video processor chip-set. The system consists of two chips: a microcontroller, which has been presented in [7], and a coprocessor array. The former handles Teletext (TXT), graphics generation, system and chip control, and various communication features. The latter executes the necessary video tasks in software and in weakly programmable hardware. Both ICs are autonomous units which communicate with each other via a central bus. Internally, this bus operates according to the Peripheral Interconnect (PI) bus protocol [6], whereas the external part of the bus uses a synchronous DRAM (SDRAM) memory protocol. This enables stand-alone operation of both ICs, whereas when both ICs are interconnected, they share one single unified memory.

The novelty of the system for TV applications is the high flexibility in signal processing. The signal flow-graph, i.e. the order of signal processing functions through all coprocessors, is programmable by means of a flexible communication network [5]. The processing of data by several coprocessors is achieved without the need to access the bandwidth-limited memory for communication purposes. The coprocessors, which are interconnected via a communication network, are synchronized upon data-packet reception and scheduled by means of data-driven processing of the pixels [4]. This implies that each task in a data flow-graph (resembling the order of functions) is executed when data is present at the input, and if the subsequent receiving coprocessor(s) can absorb the output data. This self-scheduling mechanism provides autonomous processing without interference of a microcontroller or a hardware scheduler. More details can be found in [8].
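The data-driven firing rule described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Task` class, the queue capacity of two packets, and the NR-HS-VS chain are hypothetical and do not describe the actual hardware of the communication network. A task fires only when a packet is waiting at its input and every consumer can absorb the result, so no central scheduler is involved.

```python
from collections import deque


class Task:
    """Hypothetical model of one coprocessor task in the flow-graph."""

    def __init__(self, name, func, capacity=2):
        self.name = name
        self.func = func            # processing applied to each packet
        self.inbox = deque()        # input buffer of this task
        self.capacity = capacity    # assumed buffer size in packets
        self.consumers = []         # tasks that receive the output

    def can_fire(self):
        # Fire only if data is present and all consumers can absorb it.
        has_input = len(self.inbox) > 0
        has_room = all(len(c.inbox) < c.capacity for c in self.consumers)
        return has_input and has_room

    def fire(self):
        packet = self.func(self.inbox.popleft())
        for consumer in self.consumers:
            consumer.inbox.append(packet)
        return packet


def run(source_packets, tasks):
    """Feed packets to the first task; return what the last task produces."""
    pending = deque(source_packets)
    head, sink = tasks[0], tasks[-1]
    results = []
    progress = True
    while progress:
        progress = False
        if pending and len(head.inbox) < head.capacity:
            head.inbox.append(pending.popleft())
            progress = True
        for task in tasks:
            if task.can_fire():
                out = task.fire()
                if task is sink:
                    results.append(out)
                progress = True
    return results


def build_chain():
    # Each task appends its name so the processing order stays visible.
    nr = Task("NR", lambda p: p + ["NR"])
    hs = Task("HS", lambda p: p + ["HS"])
    vs = Task("VS", lambda p: p + ["VS"])
    nr.consumers = [hs]
    hs.consumers = [vs]
    return [nr, hs, vs]


packets = [[f"pkt{i}"] for i in range(4)]
outputs = run(packets, build_chain())
```

Reprogramming the flow-graph corresponds to changing the `consumers` lists; the firing rule itself stays untouched.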

Let us now discuss how the aforementioned architectural properties are exploited for flexibility and new features. Two examples are illustrated briefly. An overview is provided in the next section.

- The two chips share the background memory. This makes it possible to assign the available memory space to applications dynamically. For example, if extra GFX data are generated for a user control interface, the quality of one of the video applications, such as vertical scaling, can be scaled down temporarily to free resources.
- The order in which video functions are carried out (the flow-graph) is programmable. Furthermore, most coprocessors can be accessed more than once. This means that if functions are addressed in another order, the quality of the resulting image can be optimized differently, depending on the application. For example, for two video windows, the noise reduction can be carried out for the window where it is needed most.

III. OVERVIEW OF HARDWARE FUNCTIONS

In this section we briefly describe the hardware blocks integrated in the microcontroller and the video processor chip.

The microcontroller contains a 32-bit R3000 reduced instruction set computer (RISC) core for control of the video coprocessor array and for TV-set control. Moreover, blocks are integrated for graphics generation, Teletext decoding, and modem functionality for Internet connection. The peripherals in the microcontroller, as depicted in Figure 1, are an interrupt controller and a number of standard communication modules (UARTs, infrared support for remote control, JTAG for testing, etc.). The controller chip also contains an I²C block for generic control of neighbouring chips. An important block is the graphics output processor, which enables pixel-based graphics with a resolution exceeding that of conventional TXT images. Finally, an SDRAM memory controller supports the connection of a large external SDRAM memory for executing all software tasks and exchanging data with the coprocessor array. Table II gives an overview of the various hardware functions.

The video coprocessor array (see Figure 2) performs all video signal-processing functions of the chip set (see Table III). It contains a set of coprocessors for TV functions which are typical for a high-end TV set or set-top box. For video image resizing, it contains a horizontal scaler and a vertical scaler with de-interlacing capability. The latter prevents aliasing artefacts and maintains optimal resolution when using interlaced video signals. The input signal quality may be improved using the integrated adaptive temporal noise reduction, which analyzes both the video signal noise and the motion. The sharpness may be augmented with the contrast-adaptive local sharpness enhancement [2]. There is also support for mixing several video streams. Firstly, a graphics blender features "alpha" blending of graphics and full-motion video. Secondly, a mixer can combine up to three moving video streams. Finally, a color-space converter maps YUV to RGB signals. Most coprocessors are implemented only once, but have capabilities for parallel processing of multiple tasks. In this flexible approach, each task is independently programmable, e.g. the scalers can compress or expand an image to any arbitrary size with arbitrary filter coefficients.

IV. APPLICATIONS

Because the functionalities of TV and PC are increasingly merging, it is likely that the user interfaces of these devices are going to show similarities. For the TV, this implies a multi-window environment with multitasking capabilities. The flexibility of the described system enables this feature and provides a broad range of applications which are new in a consumer TV environment. From the previous sections, it can be derived that programming of an application is straightforward and is limited to programming of the signal flow-graph and the individual settings of the tasks. In the following subsection, the functionality and the settings of the individual coprocessors are described. This description, together with a few application examples, gives an impression of the application possibilities.

TABLE II

<table>
<thead>
<tr>
<th>Category</th>
<th>Hardware support</th>
</tr>
</thead>
<tbody>
<tr>
<td>Software execution</td>
<td>R3000 RISC with I- and D-cache, interrupt control, timers, watchdog timer.</td>
</tr>
<tr>
<td>Teletext</td>
<td>Input for TXT front-end IC SAA5284.</td>
</tr>
<tr>
<td>Control communication</td>
<td>2 × high-speed UART (230 kbit/s)</td>
</tr>
<tr>
<td>Data communication</td>
<td>2 × I²C, IR connection, Serial Interconnect Bus (SIB) for UCB100H modem IC.</td>
</tr>
<tr>
<td>Memory connection</td>
<td>SDRAM controller for, among others, 16 Mbit and 64 Mbit memory devices, general-purpose ROM interface.</td>
</tr>
<tr>
<td>Testing, miscellaneous</td>
<td>enhanced JTAG interface, general I/O pins, software ADC pins.</td>
</tr>
</tbody>
</table>

TABLE III
HARDWARE FUNCTIONS IN THE COPROCESSOR-ARRAY CHIP.

<table>
<thead>
<tr>
<th>Category</th>
</tr>
</thead>
<tbody>
<tr>
<td>Signal quality</td>
</tr>
<tr>
<td>Scaling</td>
</tr>
<tr>
<td>Graphics, mixing</td>
</tr>
<tr>
<td>Video Interfaces</td>
</tr>
<tr>
<td>Memory connection</td>
</tr>
<tr>
<td>Testing, miscellaneous</td>
</tr>
</tbody>
</table>

Fig. 2. Block diagram of the coprocessor array.

A. Functionality of the coprocessors

The coprocessor array (CPA), which performs the computationally expensive regular video processing, contains coprocessors with carefully selected functionalities. Some coprocessors contain a cluster of video functions, since these functions are always used together. Flexibility in the processing order of those functions would not make sense. In the following, the functionality of the individual coprocessors is described.

Vertical sampling-rate converter (VS) for expansion and compression of images in the vertical direction to any arbitrary size. It is based on a 6-tap 32-phase polyphase filter and has an optional median filter to perform de-interlacing. The number of applied filter coefficients is programmable, as is the magnitude of the coefficients. Also programmable are the scaling factor (-4 .. 64) for compression and expansion and the resolution of the input image. A memory cache for eight full-colour video lines is available and can be divided among the vertical-scaler tasks that are executed simultaneously.

Horizontal sampling-rate converter (HS) for expansion and compression of images in the horizontal direction to any arbitrary size. This scaler is based on a 6-tap 64-phase polyphase filter and can be switched to a transposed mode. In this mode, only a single Nyquist filter has to be selected to compress the video pictures to any arbitrary size with high picture quality. The scaling factor (-64 .. 64) for horizontal expansion can vary along the line according to a programmable parabolic curve as a function of the pixel position. This enables the commercially available "super zoom" option for aspect-ratio adaptation. A more detailed description of the sampling-rate conversion technique used can be found in [1].
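To make the notion of a 6-tap 64-phase polyphase filter concrete, the following sketch resamples one video line in software. It is an illustration under assumptions, not the scaler of the chip: the Hann-windowed sinc prototype, the edge clamping and the function names are chosen here for the example, whereas the real coefficients are programmable. The sketch covers expansion and unity scaling only; compression would additionally require band limiting, which the text attributes to the transposed mode.

```python
import math

TAPS = 6      # filter taps, as stated for the scalers
PHASES = 64   # sub-pixel phases of the horizontal scaler


def make_polyphase_bank(taps=TAPS, phases=PHASES):
    """Build one coefficient set per phase from an assumed windowed sinc."""
    bank = []
    for p in range(phases):
        frac = p / phases                      # sub-pixel offset of this phase
        coeffs = []
        for k in range(taps):
            # Signed distance from tap k to the interpolation point.
            x = (k - (taps // 2 - 1)) - frac
            sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
            window = 0.5 * (1.0 + math.cos(math.pi * x / (taps / 2)))
            coeffs.append(sinc * window)
        total = sum(coeffs)
        bank.append([c / total for c in coeffs])   # unity DC gain per phase
    return bank


def resample_line(line, out_len, bank):
    """Resample a list of samples to out_len samples (out_len >= len(line))."""
    n = len(line)
    taps, phases = len(bank[0]), len(bank)
    step = n / out_len                         # input pixels per output pixel
    out = []
    for i in range(out_len):
        pos = i * step
        base = int(math.floor(pos))
        phase = int((pos - base) * phases) % phases
        acc = 0.0
        for k, c in enumerate(bank[phase]):
            idx = base + k - (taps // 2 - 1)
            idx = min(max(idx, 0), n - 1)      # repeat edge pixels
            acc += c * line[idx]
        out.append(acc)
    return out


bank = make_polyphase_bank()
flat = resample_line([100.0] * 20, 50, bank)
ramp = [float(i * i % 17) for i in range(20)]
same = resample_line(ramp, 20, bank)
```

A position-dependent `step`, for example following a parabola over `i`, would give the non-uniform "super zoom" behaviour mentioned above.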

Advanced dynamic noise reduction (NR) to perform temporal noise reduction, adaptive to the amount of noise and the amount of motion in the picture. A special low-pass filter ensures that noise reduction is performed in the frequency range where the human visual system is most sensitive to the noise, while high-frequency detail is preserved. The strength of the noise reduction and the motion adaptivity are programmable by means of a look-up table.
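A common way to realize motion-adaptive temporal noise reduction is a first-order recursive filter whose gain is read from a look-up table indexed by the local temporal difference. The sketch below shows that general technique; the LUT shape, the threshold of 32 grey levels and the function names are hypothetical, and the spatial low-pass filtering mentioned above is omitted.

```python
def make_lut(strength=0.75, motion_threshold=32, size=256):
    """Hypothetical LUT mapping |temporal difference| to filter gain k.

    Small differences are treated as noise (small k, strong filtering);
    differences at or above the threshold are treated as motion (k = 1).
    """
    lut = []
    for d in range(size):
        if d >= motion_threshold:
            lut.append(1.0)
        else:
            lut.append((1.0 - strength) + strength * d / motion_threshold)
    return lut


def temporal_nr(current, previous_out, lut):
    """Recursive filter: out = prev + k * (cur - prev), with k from the LUT."""
    out = []
    for cur, prev in zip(current, previous_out):
        d = min(int(abs(cur - prev)), len(lut) - 1)
        k = lut[d]
        out.append(prev + k * (cur - prev))
    return out


lut = make_lut()
# First pixel: small change (noise). Second pixel: large change (motion).
filtered = temporal_nr([104, 200], [100, 100], lut)
```

The `previous_out` argument plays the role of the field delay that the noise-reduction coprocessor obtains from the background memory.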

Adaptive sharpness enhancement (SE) for subjective sharpness improvement of the luminance as well as the chrominance. For the luminance signal, a 2D high-pass filter is used to create an enhancement signal which is added to the original (peaking). Extensive programmable control logic suppresses the enhancement at locations where edges are absent and noise is visible. Suppression also takes place at image locations where enhancement would introduce aliasing artifacts or where edges are already sharp, e.g. synthetically generated graphics in the video. More detailed information about the adaptive peaking function can be found in [2]. For the chrominance signal, a non-linear algorithm is implemented that increases the steepness of the edges by means of pixel translations.

Graphics (GFX) with video blending for graphics formats up to 16-bit resolution. The graphics formats are converted to a 24-bit RGB format, and the video is converted via a programmable colour-space matrix from a YUV 4:2:2 or 4:4:4 format to an RGB or YUV 4:4:4 format. For higher video resolutions, it is possible to up-convert the GFX by pixel repetition with a factor of 1 to 8 to limit the CPU load for the GFX generation. This preserves a high graphics bandwidth and gives subjectively the best picture quality. The fully digital blending of the GFX into the video is performed with an alpha factor (α), describing the fraction of the GFX in the output signal. For mapping GFX pixels onto the signal components RGBA, a color look-up table (CLUT) is included. This allows the following GFX formats: CLUT8 (8 bits/pixel), RGBA 4:4:4:4 (α through LUT), RGBA 5:5:5:1 (α through LUT) and RGBA 5:6:5:0. Furthermore, the blender enables color keying for the video and/or the graphics.
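The blending rule, in which α is the fraction of graphics in the output, can be sketched as follows together with the expansion of two of the 16-bit formats to 24-bit RGB. The bit ordering (red in the most significant bits), the bit-replication expansion and the linear `ALPHA_LUT` are assumptions made for this illustration; the text only states that α is obtained through a LUT.

```python
# Hypothetical alpha LUT: 4-bit index mapped linearly to 0.0 .. 1.0.
ALPHA_LUT = [i / 15 for i in range(16)]


def unpack_rgba4444(word):
    """Expand RGBA 4:4:4:4 to 8-bit R, G, B plus the 4-bit alpha index."""
    r = (word >> 12) & 0xF
    g = (word >> 8) & 0xF
    b = (word >> 4) & 0xF
    a = word & 0xF
    # Multiplying by 17 replicates the nibble (0xF -> 0xFF).
    return r * 17, g * 17, b * 17, a


def unpack_rgb565(word):
    """Expand RGBA 5:6:5:0 to 8-bit R, G, B by bit replication."""
    r = (word >> 11) & 0x1F
    g = (word >> 5) & 0x3F
    b = word & 0x1F
    return (r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2)


def blend(gfx_rgb, video_rgb, alpha):
    """out = alpha * gfx + (1 - alpha) * video, per colour component."""
    return tuple(
        round(alpha * g + (1.0 - alpha) * v)
        for g, v in zip(gfx_rgb, video_rgb)
    )


r, g, b, a_index = unpack_rgba4444(0xF00F)          # opaque red
opaque = blend((r, g, b), (0, 0, 255), ALPHA_LUT[a_index])
transparent = blend((r, g, b), (0, 0, 255), ALPHA_LUT[0])
```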

Input processor for retrieval of three real-time video sources at resolutions up to HDTV and VGA. The processor supports several signal protocols for synchronization, e.g. with H- and V-pulse, with active video identifiers or according to the CCIR-656 recommendations. The digital input signal may have an 8-bit or 16-bit bus, depending on a programmable time-multiplex mode. In addition, the input processor may select a capture window in the video to select the pixels to be processed by the rest of the system. This could be useful to reduce the bandwidth when only a part of the video is required for further processing, e.g. in the case that a part is zoomed in by the scalers.

Output processor to output one or two video signals with similar features as the input processor. Additionally, the output processor also contains a multiplex mode to display a 24-bit YUV or RGB signal.

A memory input port and output port to store, buffer or shuffle video data at multiple positions in the video-signal flow-graph. It can access a programmable cyclic memory block in a sequential order and has the additional option to skip video lines in order to access a progressive frame in an interlaced manner.

TABLE IV
TASK RESOURCES OF THE COPROCESSORS.

<table>
<thead>
<tr>
<th>Task</th>
<th>Maximum rates</th>
</tr>
</thead>
<tbody>
<tr>
<td>Horizontal scaler</td>
<td>1 × &lt; 64 Mpixels/s or 2 × &lt; 32 Mpixels/s or 2 × &lt; 16 + 1 × &lt; 32 Mpixels/s or 1 × &lt; 16 + 1 × &lt; 48 Mpixels/s</td>
</tr>
<tr>
<td>Vertical scaler</td>
<td>2 × &lt; 32 Mpixels/s or 1 × &lt; 16 + 1 × &lt; 48 Mpixels/s</td>
</tr>
<tr>
<td>Sharpness enhancement</td>
<td>1 × &lt; 32 Mpixels/s</td>
</tr>
<tr>
<td>Noise reduction</td>
<td>1 × &lt; 16 Mpixels/s HQ or 2 × &lt; 16 Mpixels/s MQ</td>
</tr>
<tr>
<td>Color conversion, GFX expansion, GFX blending</td>
<td>1 × &lt; 64 Mpixels/s</td>
</tr>
<tr>
<td>Video inputs</td>
<td>2 × &lt; 64 Mpixels/s</td>
</tr>
<tr>
<td>Video outputs</td>
<td>2 × &lt; 64 Mpixels/s</td>
</tr>
<tr>
<td>Memory inputs and memory outputs</td>
<td>8 inputs + 12 outputs, total &lt; 192 Mpixels/s</td>
</tr>
</tbody>
</table>

An advanced address generator (juggler) to write video data into the memory at any position and with an arbitrary shape, e.g. to create a circular or otherwise shaped Picture-in-Picture (PiP).
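The effect of writing a video stream into memory with an arbitrary shape can be mimicked in software with a per-pixel inclusion test. The sketch below composes a circular PiP into a background picture; the function name, the list-of-rows picture model and the choice of the inscribed circle are assumptions for illustration and say nothing about how the address generator is built.

```python
def juggle_circular_pip(background, pip, top, left):
    """Write PiP pixels into a copy of the background, inside a circle only.

    background and pip are lists of rows; (top, left) is the position of the
    PiP bounding box. The circle is inscribed in that bounding box.
    """
    h, w = len(pip), len(pip[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0      # centre of the PiP window
    r2 = (min(h, w) / 2.0) ** 2                # squared radius
    out = [row[:] for row in background]
    for y in range(h):
        for x in range(w):
            if (y - cy) ** 2 + (x - cx) ** 2 <= r2:
                out[top + y][left + x] = pip[y][x]
    return out


background = [[0] * 8 for _ in range(8)]
pip = [[1] * 4 for _ in range(4)]
composed = juggle_circular_pip(background, pip, top=2, left=2)
written = sum(sum(row) for row in composed)
```

Replacing the circle test by any other predicate gives an alternatively shaped window.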

B. Resource requirements

To create an application with several tasks, the functions mentioned above have to be executed in parallel. Because some tasks in the application may even have the same functionality, each function should be able to execute several tasks simultaneously. As a consequence, the data rate in the coprocessors and the data bandwidth to the memory increase for more complex applications and are limited by the physical clock rate of the coprocessors (64 MHz) and the memory. Summarizing, the complexity of a complete application is limited by the available task resources of the individual coprocessors and by the memory capacity and bandwidth. The task resources of the available coprocessors are shown in Table IV (gross signal rates). Current high-end TV sets show features like Picture-in-Picture (PiP), dual-screen and Internet TV. However, these features are just a subset of the application space which is available with the introduced chip-set.

Let us start with a straightforward example and consider a PiP application. Figure 3 shows the signal flow-graph of this application. For wide-screen standard-definition (SD) signals in the PiP application, the task resources mentioned above are more than sufficient. To calculate the memory bandwidth, it is assumed that the data rate is \( f_t = 16 \) MHz. The system memory device runs at a clock rate of 96 MHz and has a bus width of 32 bits. This yields a total memory bandwidth of 96 MHz × 4 Byte = 384 MByte/s, i.e. 192 Mpixels/s for 16-bit pixels. Since all communication between the CPU and the coprocessors is performed via memory, a part of the memory bandwidth is reserved and thus cannot be used for video processing. Assuming 30 MByte/s of memory bandwidth for control and communication between the CPU and the coprocessors, a bandwidth of 354 MByte/s remains for video processing. For the simple PiP application, only memory access for the mixing is necessary, so only a small part of the available memory bandwidth is used. This mixing or juggling of the video streams is designed such that it requires a minimum amount of memory bandwidth. In the background memory, two field blocks (for interlaced video) are allocated to construct the mixed images. These memory blocks are filled with the odd and even fields of the picture, respectively, except for the pixel positions where the PiP window is located. This unfilled area in the memory is used by the second video path to write the PiP window. Therefore, the total amount of data stored is equal to the data of one complete picture and, similarly, the total required bandwidth equals the bandwidth for writing one complete video stream. With 2 Byte/pixel, the amount of memory becomes 0.98 MByte and the bandwidth 64 MByte/s (reading and writing).
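The figures in this paragraph can be reproduced with a few lines of arithmetic. The sketch below uses the stated clock rate, bus width, pixel rate and pixel size; the line length of 854 pixels is an assumption made here, chosen because it agrees with the 1708 bytes per video line used later in the text and reproduces the quoted 0.98 MByte.

```python
MEM_CLOCK_HZ = 96e6        # memory clock, from the text
BUS_BYTES = 4              # 32-bit memory bus
CONTROL_MBYTE_S = 30       # reserved for CPU/coprocessor communication
PIXEL_RATE = 16e6          # pixels per second of one SD video stream
BYTES_PER_PIXEL = 2
LINES_PER_FIELD = 288
PIXELS_PER_LINE = 854      # assumption: 1708 bytes per line / 2 bytes per pixel

# Gross and net memory bandwidth in MByte/s.
total_bw = MEM_CLOCK_HZ * BUS_BYTES / 1e6
video_bw = total_bw - CONTROL_MBYTE_S

# PiP mixing: one complete stream written and one read.
pip_bw = PIXEL_RATE * BYTES_PER_PIXEL * 2 / 1e6

# Two field blocks, together holding one complete picture.
frame_bytes = 2 * LINES_PER_FIELD * PIXELS_PER_LINE * BYTES_PER_PIXEL
frame_mbyte = frame_bytes / 1e6

print(f"total {total_bw:.0f} MByte/s, video {video_bw:.0f} MByte/s")
print(f"PiP: {pip_bw:.0f} MByte/s, {frame_mbyte:.2f} MByte")
```

With these inputs, the PiP mixing uses 64 of the 354 MByte/s available for video, which is why the text calls it a small part of the budget.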

Fig. 3. The signal flow-graph of a PiP application.

Fig. 4. A multi-window application with video and Internet.

Since the two input video sources are generally not synchronous, the output image should be synchronized to one of the input video sources by means of a parameter setting. For the PiP application, the output video signal is usually synchronized with the input video signal of the main background picture. As a result, the input video signal of the PiP is written asynchronously into the field memories.

C. Application example I

A more advanced example of an application is a PiP with a video background which contains a zoomed-in part of the PiP. The part of the PiP to be zoomed in can be selected by the consumer by means of a small graphics square that can be moved and resized. In addition, an Internet browser, executed on the CPU, is shown on the display (see Figure 4). For generation of the graphics in the picture, no task resources other than the CPU and a memory output port are necessary. Therefore, the following calculations consider the video signal processing only and assume that the graphics are available in the background memory.

The signal flow-graph of the application is shown in Figure 5 and contains noise reduction (NR), scaling (HS, VS), mixing (juggle), sharpness enhancement (SE) and graphics blending prior to picture display. The lower part of the figure shows that two video streams are processed: one for the zoomed-in background, including noise reduction, and one to create the PiP. After the separate video signals have been combined in the memory, sharpness enhancement is applied. At the output stage, the video is blended with the graphics that are generated by the CPU.

Table V presents an overview of all memory accesses, including the required memory capacity and the number of inputs and outputs to and from memory. First, the noise-reduction (NR) coprocessor accesses the memory to use a field delay for advanced adaptive temporal Infinite-Impulse-Response (IIR) filtering. For an SD image of 288 lines × 854 pixels with 2 Byte/pixel, the required amount of memory equals 0.49 MByte. For a pixel rate of 16 Mpixels/s, the total memory bandwidth for writing and reading is 64 MByte/s.

TABLE V
MEMORY REQUIREMENTS FOR THE MULTI-WINDOW APPLICATION.

<table>
<thead>
<tr>
<th>Access</th>
<th>Connect. to/from Mem</th>
<th>Memory (MByte)</th>
<th>Memory bandwidth (MByte/s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>NR (expansion)</td>
<td>1/1</td>
<td>0.49</td>
<td>64</td>
</tr>
<tr>
<td>VS (expansion)</td>
<td>1/1</td>
<td>&lt;0.98</td>
<td>&lt;36</td>
</tr>
<tr>
<td>Juggling (write/read, worst case)</td>
<td>2/1</td>
<td>0.98</td>
<td>64</td>
</tr>
<tr>
<td>Graphics</td>
<td>0/1</td>
<td>0.49</td>
<td>64</td>
</tr>
<tr>
<td>Total (worst case)</td>
<td>4/4</td>
<td>2.94</td>
<td>256 (218)</td>
</tr>
</tbody>
</table>

The memory requirements for the access prior to vertical scaling are different. The image is written at 16 Mpixels/s, but is read at 2 × 16/Z Mpixels/s for interfield processing, with Z being the expansion factor. Because Z > 1, the required memory bandwidth is smaller than 96 MByte/s. If intrafield processing is used for vertical scaling, the data rate is even less than 16/Z Mpixels/s. The computation of the amount of buffering is less straightforward. If interfield processing is used, a complete field of \(L_f\) lines has to be written in the memory and cannot be overwritten, because it has to be read out two times. Therefore, the required amount of memory for progressive video that is expanded equals:

$$Buf_{inter} = 2 \times \frac{L_f}{Z} B_1,$$

where \(B_1\) denotes the number of bytes per video line. For intrafield scaling, buffering is only necessary to compensate for the rate difference between the writing to and reading from the memory. In this case, the time for writing one field is equal to the time for reading one field. Writing of the video lines to be expanded is done at a higher rate than reading. The maximum distance in the memory between the read and write pointer is equal to the memory space that has to be buffered for the rate difference. This maximum distance is reached when the write pointer has just finished the field. The part of the field that has been read at that time is 1/Z. Therefore, the part that has not been read yet equals 1 - 1/Z. As a result, the buffering that is required to deal with the rate difference equals:

$$Buf_{intra} = \frac{L_f}{Z} \left(1 - \frac{1}{Z}\right) B_1.$$

Since it is desired to have a static memory allocation, the maximum buffering can be found as follows:

$$\frac{d}{dZ} (Buf_{intra}) = -\frac{L_f}{Z^2} \left(1 - \frac{2}{Z}\right) B_1 = 0 \Rightarrow Z = 2,$$

$$Buf_{intra} = \frac{L_f}{Z} \left(1 - \frac{1}{Z}\right) B_1 = \frac{L_f}{4} B_1.$$

For \(L_f = 288\) and \(B_1 = 1708\), the amount of required buffering is \(Buf_{intra} = 0.12\) MByte.
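The two buffer formulas and the worst-case value can be checked numerically. The sketch below evaluates them for the stated values of 288 lines and 1708 bytes per line; the function names and the scanned range of expansion factors (1 to 8 in steps of 0.01) are choices made for this illustration.

```python
def buf_inter(lf, z, b1):
    """Interfield scaling: the source lines of two fields must be kept."""
    return 2 * lf / z * b1


def buf_intra(lf, z, b1):
    """Intrafield scaling: only the read/write rate difference is buffered."""
    return lf / z * (1 - 1 / z) * b1


LF = 288      # lines per field
B1 = 1708     # bytes per video line

# Scan expansion factors to locate the worst case for a static allocation.
zs = [1 + i / 100 for i in range(0, 701)]
worst_z = max(zs, key=lambda z: buf_intra(LF, z, B1))
worst_bytes = buf_intra(LF, worst_z, B1)

print(f"worst case at Z = {worst_z}: {worst_bytes / 1e6:.2f} MByte")
print(f"interfield bound (Z -> 1): {buf_inter(LF, 1, B1) / 1e6:.2f} MByte")
```

The scan agrees with the analytical result: the intrafield buffer peaks at Z = 2, and the interfield requirement approaches 0.98 MByte as Z approaches 1, matching the upper bound listed for vertical scaling in Table V.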

Finally, the mixing or juggling of the video streams for image composition is performed. As explained in the previous subsection, the amount of data stored is equal to one frame and the required bandwidth equals the bandwidth for writing one complete video stream.

For generation of the graphics in the background memory, a field or frame memory could be used depending on the desired quality. When a field memory is used and the content is read for both odd and even fields, the amount of memory is reduced at the cost of some vertical resolution. Since synthetically generated graphics may contain high spatial frequencies, the use of a frame memory may result in annoying line flicker when the content of the memory is displayed in interlaced mode. Therefore, a field memory is most attractive for 50-60 Hz interlaced video, whereas for field rates higher than 70 Hz and high-resolution GFX, a frame memory could be used.

Summarizing, the total amount of applied video memory is less than 3 MByte and the maximum memory bandwidth is 256 MByte/s. This required bandwidth is only used during the transfer of the active pixels. At the blanking times, no data is transferred, thereby decreasing the average bandwidth significantly. In order to decrease the required peak bandwidth, the data transfer rate can be equalized over time. To do this, the read and write tasks of the CPA have the ability to spread the transfer of an active video line over the time of a complete video line including the horizontal blanking. Typical video signals contain 15% horizontal line blanking time, so that the total amount of bandwidth can be reduced by 15%. For this application, this leads to a net total memory bandwidth of 218 MByte/s.
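The effect of spreading each active line over the full line period can be verified with the numbers given in the text. The sketch below sums the memory column of Table V and applies the 15% blanking reduction to the peak bandwidth; the variable names are chosen here for illustration.

```python
# Memory per access in MByte, taken from Table V (worst case).
memory_mbyte = {
    "NR": 0.49,
    "VS": 0.98,
    "Juggling": 0.98,
    "Graphics": 0.49,
}

PEAK_BW_MBYTE_S = 256          # peak bandwidth during active video
HORIZONTAL_BLANKING = 0.15     # fraction of the line period without pixels

total_memory = sum(memory_mbyte.values())

# Equalizing the transfer over the whole line period scales the required
# bandwidth by the active fraction of the line.
net_bw = PEAK_BW_MBYTE_S * (1 - HORIZONTAL_BLANKING)

print(f"memory: {total_memory:.2f} MByte, net bandwidth: {net_bw:.0f} MByte/s")
```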

D. Application example II

Because the system is mainly limited by the throughput bandwidth of the coprocessors and the memory, a large range of resolutions and frame rates can be generated. It may even provide 50-100 Hz conversion, making advanced use of the memory. Figure 6 shows a simple dual-screen application which also provides 50-to-100 Hz conversion. The total signal flow-graph can be divided into several independent subgraphs, separated by memory accesses. Because temporal scaling requires the use of field and/or frame memories, this can only be provided by the intermediate memory accesses. Therefore, it is not possible to perform temporal scaling within a subgraph. Only spatial scaling with relatively small local buffering can be applied. The subgraphs that contain the input processors (IN) should operate at the field rate of the (50-Hz) input signals, due to the data-driven concept. The bottom part of Figure 6 illustrates the position of the vertical video lines as a function of time. After temporal noise reduction (NR) and horizontal scaling (HS), the interlaced fields are written into the memory. In the succeeding subgraphs, the video data are read from memory again and scaled down in the vertical dimension to obtain the correct aspect ratio of the input images. As was mentioned in the previous subsection, the vertical scaler may read the data from the memory in a progressive-scan format to enable high-quality scaling. The vertical scaling is then applied to progressive video and is interlaced again at the output stage of the scaler. For this mode of operation, the interlacing at the output is not used and the video is written into the memory again in a progressive-scan format. In the bottom part of Figure 6, it is shown that all missing lines of the interlaced fields are filled with video lines from the median filter. If further vertical processing of progressive video would be desirable (e.g. 2D sharpness enhancement), it would be obvious to perform it in this subgraph. 
The right-hand side of the figure contains the subgraph that reads the 50-Hz progressive frames from memory with an interlaced scan to perform the field-rate up-conversion and create the 100-Hz output signal. The figure shows a pair of successive fields that contain either original video lines or video lines from the median filter. This type of 50-to-100 Hz conversion is commercially available in some TV sets and is known as "digital scan".

Let us finally consider the memory requirements for this 100-Hz dual-screen application. The necessary bandwidth equals 300 MByte/s and the amount of memory used is 2.94 MByte. These numbers are computed using assumptions similar to those in the previous subsection, and include a 100-Hz graphics signal (stored in a field memory) in the final picture.
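
The de-interlacing and field-rate up-conversion described above can be illustrated with a minimal Python sketch. It is not the coprocessor implementation. The text only states that missing lines come from the median filter, so the three-tap median used here (line above, line below, and the co-sited line of the previous field) is an assumption based on a commonly used filter. The function names and the representation of fields as dictionaries of lines are hypothetical.

```python
from statistics import median

def deinterlace(field, prev_field, parity, height):
    """Build a progressive frame from one interlaced field.

    field, prev_field: dict mapping frame line number -> list of samples.
    parity: 0 if `field` carries the even frame lines, 1 for the odd lines.
    Missing lines are filled with an ASSUMED 3-tap median of the line above,
    the line below and the co-sited line of the previous field.
    """
    frame = []
    for y in range(height):
        if y % 2 == parity:
            frame.append(list(field[y]))  # original video line
        else:
            # At the top or bottom border, reuse the single available neighbour.
            above = field.get(y - 1, field.get(y + 1))
            below = field.get(y + 1, field.get(y - 1))
            prev = prev_field[y]
            frame.append([median(taps) for taps in zip(above, prev, below)])
    return frame

def upconvert_100hz(frames, first_parity=0):
    """Read every 50-Hz progressive frame twice with an interlaced scan,
    producing two output fields per frame (100 fields per second)."""
    fields = []
    for frame in frames:
        for parity in (first_parity, 1 - first_parity):
            fields.append((parity, frame[parity::2]))
    return fields

# Tiny example: 4 lines of 2 samples each.
prev_field = {1: [10, 10], 3: [30, 30]}   # odd lines of the previous field
field = {0: [0, 0], 2: [20, 20]}          # even lines of the current field
frame = deinterlace(field, prev_field, parity=0, height=4)
out_fields = upconvert_100hz([frame])
```

In this sketch, one output field of each pair consists of original lines and the other of median-filtered lines, which corresponds to the pair of fields shown in the figure.
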
V. OVERVIEW OF TV APPLICATIONS

The flexibility of the described system enables a broad range of features which are new in a consumer TV environment. Examples are given in Figure 7, which provides an overview of the application possibilities. Some of the features will be discussed briefly here.

The top half of Figure 7 shows the application area of the telecommunication and control processor (TCP). An infrared (IR) device accepts user control commands from the remote control. Furthermore, acquisition and decoding of Teletext information is carried out and data ports such as UARTs are available. New features are the integrated modem interface, which enables glueless connection to an existing modem IC, and the generation of pixel-based graphics (GFX). The latter also supports Internet applications, as indicated. The processor also provides control of the special video hardware, shown in the bottom half of Figure 7, and control of additional external hardware. This flexibility is possible because the video processing does not require a large part of the CPU cycle budget. Control of the video processing is discussed later in this section. Let us first focus on the aforementioned features.

In the field of Teletext, On-Screen-Display (OSD) and GFX processing, the TCP has to decode Teletext information and generate the GFX in the memory, without severe real-time constraints. However, for GFX refresh rates higher than 10 Hz, the load on the CPU cycle budget becomes significant. The following features could be provided and are supported by the hardware:
- high-quality pixel-based 2D graphics;
- multiple simultaneous Teletext pages and programmable fonts;
- all-page storage for Teletext;
- integration of GFX with photos or images;
- user-defined GFX environment;
- electronic TV program guide application;
- automatic TV controller with user-dependent setting and support via messages and menus.

For the modem functionality of the TCP, the real-time constraints are much more demanding, since the majority of the modem data conversion is executed as a software program on the CPU. This may reduce the graphics performance, depending on the application and the software architecture. The modem extension in the TV system offers a new range of telecommunication features, such as:
- fax-message and data-file reception, storage and display;
- Internet connection for, among other things, program data retrieval;
- interactive TV communication;
- downloading of photos and images;
- execution of various WinCE applications.

In the bottom half of Figure 7, the application domain of the video coprocessor array (CPA) is shown. The design of the architecture is such that video processing can take place without continuous set control. Control is performed on a periodic basis only (field rate), although control on an interrupt basis is also possible. Video signals are usually noise-reduced at the input stage in the coprocessor array (bottom left). Furthermore, fully programmable video scalers can be used for compressing or zooming of full-motion video signals. This enables virtually any type of scaling function with a large dynamic range, which results in a very flexible multi-window TV. The setting and control may also be defined by the individual user. The signal quality can be optimized over the several windows. The multi-signal processing capability is very important for composing pictures of various sizes and contents. In all of these modes, the TCP generates the high-level commands for programming and setting of the CPA coprocessor hardware, thereby enabling for example:

- aspect-ratio conversions (panorama, side-panel, wide-screen);
- PiP, dual-screen, multi-window (arbitrary sizes);
- PiP record and PiP playback;
- mosaic screen for visual channel selection;
- flexible matching to various input/output resolutions;
- high-quality sharpness improvement;
- dynamic moving of video, menus and graphics.
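
As an illustration of the kind of high-level setting that could be computed for the programmable scalers when composing the windows listed above, consider the following minimal sketch. The function `scaler_settings`, its parameters and the 720x576 source resolution are hypothetical and serve only as an example; the actual command set of the chips is not described here.

```python
from fractions import Fraction

def scaler_settings(src_w, src_h, win_w, win_h, keep_aspect=True):
    """Return (horizontal, vertical) scale factors as exact fractions.

    Factors below 1 compress the picture, factors above 1 zoom it.
    With keep_aspect=True both directions use the smaller factor, so the
    picture fits inside the window without geometric distortion.
    """
    h = Fraction(win_w, src_w)
    v = Fraction(win_h, src_h)
    if keep_aspect:
        h = v = min(h, v)
    return h, v

# Dual-screen: an assumed 720x576 source in one half of a 720x576 display.
squeezed = scaler_settings(720, 576, 360, 576, keep_aspect=False)
correct_aspect = scaler_settings(720, 576, 360, 576)
# PiP: the same source in a quarter-size window.
pip = scaler_settings(720, 576, 180, 144)
```

With the aspect ratio preserved, the dual-screen case requires vertical down-scaling by the same factor as the horizontal compression, which is the vertical scaling step used in the dual-screen application example.
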

Finally, as indicated in Figure 7, graphics and video are blended in full digital form. For this purpose, some of the graphics can be up-converted to a higher resolution, if required. For more details about applications and the quality of the individual coprocessors, the reader is referred to [3].
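
Digital blending of graphics and video can be sketched per sample as a weighted mix. The text does not specify the blending rule of the chip-set, so the linear alpha blend, the 8-bit sample range and the function name `blend_pixel` below are assumptions for illustration only.

```python
def blend_pixel(video, gfx, alpha):
    """Mix one 8-bit video sample with one 8-bit graphics sample.

    alpha ranges from 0 to 255: 0 shows only video, 255 shows only graphics.
    ASSUMED rule: linear blend with rounding to the nearest integer.
    """
    return (alpha * gfx + (255 - alpha) * video + 127) // 255

only_video = blend_pixel(100, 200, 0)
only_gfx = blend_pixel(100, 200, 255)
half_mix = blend_pixel(0, 255, 128)
```
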

The most flexible and interesting features are enabled by the configuration in which both chips are connected to each other with sufficient SDRAM and the modem function is activated. While the consumer is watching a TV program, an image can be retrieved from the Internet and the TV may signal to the consumer that the retrieval has been completed. If extra memory is needed temporarily for Internet communication, some memory may be taken from the video processing (e.g. by reducing the quality of the 100-Hz conversion), in order to boost the microcontroller performance. With existing chip sets for TVs, such flexibility is unknown to the manufacturer and the consumer.

It is evident that the chip-set can also be used in consumer products other than TV sets, such as the display signal part of a set-top box. Generally, this device features MPEG decoding, an electronic program guide, and interactivity via telephone for retrieval of a descrambling key. Furthermore, the chip-set could be used as a unified display driver which converts standard-definition signals or VGA-resolution video to any format required for various display types, e.g. Cathode-Ray-Tubes (CRTs), computer monitors, plasma/Plasma-Addressed-Liquid-Crystal (PALC) displays or LCDs. It can be concluded that the programmable concept behind the applications and the cost-efficient and modular architecture of these ICs give a high degree of applicability for various systems in the consumer market.

VI. CONCLUSIONS

We have presented a chip-set for a high-end TV or set-top box, consisting of a microcontroller with a plurality of extensions and a video coprocessor array. The microcontroller consists of a RISC core with a number of peripherals to provide telecommunication features and high-quality graphics. The video coprocessor array offers a number of high-quality TV functions which are programmable and can be used in a variety of ways. The individual coprocessors are able to process several video signals simultaneously and contain a self-controlling mechanism which enables programming on a high system level. Additionally, the order of processing by the coprocessors is programmable, leading to a large variety of video display processing.

The chips may be used in stand-alone operation in combination with existing off-the-shelf external SDRAM. Furthermore, the combination of the two chips results in a highly versatile package of TV applications in which video processing quality and external interactive data communication can be interchanged with great flexibility. User-defined interfaces regarding the use of several video windows combined with the display of side information can be optimized for quality in all circumstances.

The proposed chip-set enhances both the quality and the programmability of existing TV sets and is a key component for approaching the interactive Internet-based multi-window TV or set-top box of the near future.

REFERENCES


Egbert Jaspers was born in Nijmegen, The Netherlands, in 1969. He graduated in electrical engineering from the Venlo Polytechnic College in 1992 and subsequently, he joined Philips Research Laboratories in Eindhoven. For one year, he worked on video compression for digital HDTV recording. In 1993, he continued his education at the Eindhoven University of Technology, from which he graduated in electrical engineering in 1996. In the same year, he joined Philips Research Laboratories Eindhoven, where he became a member of the TV Systems Department. He is currently involved in the research of programmable architectures and their implementation for TV and computer systems.

Peter H.N. de With graduated in electrical engineering from the University of Technology in Eindhoven. In 1992, he received the Ph.D. degree from the University of Technology Delft, The Netherlands, for his work on video bit-rate reduction for recording applications. He joined Philips Research Laboratories Eindhoven in 1984, where he became a member of the Magnetic Recording Systems Department. From 1985 to 1993 he was involved in several European projects on SDTV and HDTV recording. In the early nineties he contributed as a video coding expert to the DV standardization committee. In 1994 he became a member of the TV Systems group, where he worked on advanced programmable video processing architectures. In 1996 he became senior TV systems architect and in October 1997 he was appointed full professor at the University of Mannheim, Germany. He regularly teaches at the Philips Training Centre and in other post-academic courses. In 1995 he co-authored the paper that received the IEEE CBS Transactions Paper award. In 1996, he received a company Invention Award. In 1997, Philips received the ITVA award for its contributions to the DV standard. Mr. de With is a senior member of the IEEE, member of the program committee of the IEEE CES and board member of the Benelux working group for Information and Communication theory.