## WORLD INTELLECTUAL PROPERTY ORGANIZATION

International Bureau

#### INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

| Field | Value |
|-------|-------|
| (51) International Patent Classification <sup>6</sup> | H04N 7/00 |
| (11) International Publication Number | WO 95/28802 A1 |
| (43) International Publication Date | 26 October 1995 (26.10.95) |
| (21) International Application Number | PCT/US95/01998 |
| (22) International Filing Date | 16 February 1995 (16.02.95) |
| (30) Priority Data | 08/227,438, 14 April 1994 (14.04.94), US |
| (71) Applicant | MOTOROLA INC. [US/US]; 1303 East Algonquin Road, Schaumburg, IL 60196 (US) |
| (72) Inventors | ZHU, Qin-Fan; Apartment 6, 33 Bennett Drive, Stoughton, MA 02072 (US). YONG, Mei; 3 Ledgewood Drive, Canton, MA 02021 (US) |
| (74) Agents | STOCKLEY, Darleen, J. et al.; Motorola Inc., Intellectual Property Dept., 1303 East Algonquin Road, Schaumburg, IL 60196 (US) |
| (81) Designated States | JP, European patent (AT, BE, CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE) |

#### **Published**

With international search report.

(54) Title: MINIMUM-DELAY JITTER SMOOTHING DEVICE AND METHOD FOR PACKET VIDEO COMMUNICATIONS

#### (57) Abstract

The method (1100) and device (200, 900, 1400) of the present invention provide a mechanism to apply smoothing delay to video packets transmitted through a packet-switched network. Since the method and device using this invention perform a smoothing delay at the video frame level, instead of the packet level, smoothing delay can be minimized while video decoder overflow is still prevented. The present invention reduces system complexity by eliminating the need to recover the time stamp for each packet. Furthermore, the present invention does not require prior knowledge of the worst network delay $\Delta$.
Thus, the present invention provides for a minimized jitter smoothing delay with concurrent minimized implementation complexity.

#### FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.

| Code | State | Code | State | Code | State |
|----|--------------------------|----|------------------------------|----|--------------------------|
| AT | Austria | GB | United Kingdom | MR | Mauritania |
| AU | Australia | GE | Georgia | MW | Malawi |
| BB | Barbados | GN | Guinea | NE | Niger |
| BE | Belgium | GR | Greece | NL | Netherlands |
| BF | Burkina Faso | HU | Hungary | NO | Norway |
| BG | Bulgaria | IE | Ireland | NZ | New Zealand |
| BJ | Benin | IT | Italy | PL | Poland |
| BR | Brazil | JP | Japan | PT | Portugal |
| BY | Belarus | KE | Kenya | RO | Romania |
| CA | Canada | KG | Kyrgystan | RU | Russian Federation |
| CF | Central African Republic | KP | Democratic People's Republic of Korea | SD | Sudan |
| CG | Congo | | | SE | Sweden |
| CH | Switzerland | KR | Republic of Korea | SI | Slovenia |
| CI | Côte d'Ivoire | KZ | Kazakhstan | SK | Slovakia |
| CM | Cameroon | LI | Liechtenstein | SN | Senegal |
| CN | China | LK | Sri Lanka | TD | Chad |
| CS | Czechoslovakia | LU | Luxembourg | TG | Togo |
| CZ | Czech Republic | LV | Latvia | TJ | Tajikistan |
| DE | Germany | MC | Monaco | TT | Trinidad and Tobago |
| DK | Denmark | MD | Republic of Moldova | UA | Ukraine |
| ES | Spain | MG | Madagascar | US | United States of America |
| FI | Finland | ML | Mali | UZ | Uzbekistan |
| FR | France | MN | Mongolia | VN | Viet Nam |
| GA | Gabon | | | | |

# MINIMUM-DELAY JITTER SMOOTHING DEVICE AND METHOD FOR PACKET VIDEO COMMUNICATIONS

#### Field of the Invention

This invention relates generally to packet video communications, and more particularly to delay jitter smoothing in packet video communications.
#### Background

An important problem in packet video transmission is the variable delay, or delay jitter. Delay jitter can be caused by variable packetization delay and variable queuing delay in the network. Variable packetization delay results from variable bit-rate (VBR) coding and is usually small compared to the variable queuing delay. On the other hand, variable queuing delay depends on the level of network congestion and can vary over a wide range. Such delay jitter needs to be compensated in order to ensure a constant playout of a video bitstream and/or to ensure that the decoder buffer at the receiving video terminal will not overflow.

A state-of-the-art solution for the delay jitter problem is to utilize the so-called time stamp, which is generated at the encoder and is modified by each network node along its route from source to destination. Provided that the maximum end-to-end delay from the source node to the destination node is known and the time stamp can be recovered for each packet, an extra delay, commonly referred to as smoothing delay, is added to each packet such that the overall delay (queuing delay plus smoothing delay) equals a constant for all packets.

FIG. 1, numeral 100, is a general block diagram schematic of a delay jitter smoothing system as is known in the art. The input of the system is video packets with variable delays. These packets are first stored in a packet buffer (102). The time stamp of each packet is recovered by a time recovery device (108) and a smoothing delay unit (104) adds a smoothing delay to the packet. The time recovery device (108) is coupled to the packet buffer (102) and provides its output to the smoothing delay unit (104), which is also coupled to the packet buffer (102). The smoothing delay is equal to the difference between a predetermined maximum delay and the actual delay suffered by each packet.
Then the smoothed packets are depacketized into a bitstream by a depacketizer unit (106) in a predetermined manner.

In the above system, a complex procedure is involved to implement the time recovery system, because it requires intermediate network nodes to update the time stamp based on the delay encountered in network queues. In addition, the maximum delay of the network has to be known beforehand in order to add the correct amount of smoothing delay in the receiving node. However, the worst delay for different services at different times may vary depending on the network congestion. If the network traffic load is relatively low during the entire service period, then the queuing delay for all packets may be much smaller than the maximum delay. Even so, the jitter smoother will still add an extra delay to keep the overall delay constant. In this case, a shorter overall delay could be obtained by using an improved smoothing algorithm.

Thus, there is a need for a device and method that provide smoothing delay while concurrently minimizing implementation complexity.

#### Brief Description of the Drawings

FIG. 1 is a block diagram schematic of a jitter smoothing system as is known in the art.

FIG. 2 is a block diagram schematic of one embodiment of a device in accordance with the present invention.

FIG. 3 is a block diagram schematic of an embodiment of FIG. 2 with greater particularity.

FIG. 4 is a diagram that illustrates the dynamics of a decoder buffer verifier (DBV) buffer fullness condition where VBR coding is used in the encoder.

FIG. 5 is a diagram that shows the dynamics of the DBV buffer fullness condition for the case of a constant bit-rate (CBR) channel.

FIG. 6 is a block diagram schematic of an H.261 video encoder.

FIG. 7 is a block diagram schematic of an H.261 video decoder.

FIG. 8 is a diagram that shows DBV buffer fullness as a function of time for a decoder buffer verifier using H.261 video coding.
FIG. 9 is a block diagram schematic of one embodiment of a device in accordance with the present invention incorporated into a VBR H.261 video communication system over an asynchronous transfer mode (ATM) network.

FIG. 10 graphically illustrates a frame format for use in an H.261 error correcting frame.

FIG. 11 is a flowchart of one embodiment of the steps of a method in accordance with the present invention.

FIG. 12 is a flowchart showing the step for monitoring a fullness of a hypothetical decoder buffer (HDB) by a decoder buffer verifier (DBV) with greater particularity.

FIG. 13 is a flowchart showing, with greater particularity, the step of reading, by a bitstream generator, video information bits from the video information buffer at a predetermined rate and inserting, by the bitstream generator, a number of stuffing bits into the video information bitstream, wherein the number of stuffing bits is determined according to a predetermined scheme to provide substantially minimized delay jitter smoothing.

FIG. 14 is a block diagram of another embodiment of a hypothetical buffer-based jitter smoothing device for minimizing smoothing delay in a packet video communication system in accordance with the present invention.

#### Detailed Description of a Preferred Embodiment

The jitter smoothing method and device of the present invention, the device being referred to herein as a "minimum-delay jitter smoother", add a minimum delay to the received bitstream and at the same time guarantee that the decoder buffer will not overflow. Instead of performing jitter smoothing at the individual cell level, the present invention implements jitter smoothing at the video frame level. A very important feature of video communications is that a video frame can be displayed only after all the information for the frame is received, implying that some delay jitter may be allowed for the packets which contain information that belongs to the same video frame.
The present invention eliminates the need to recover the time stamp for each packet and hence reduces the system complexity. Furthermore, the present invention does not require prior knowledge of the worst network delay D. Thus, the present invention provides for a minimized jitter smoothing delay with concurrent minimized implementation complexity.

FIG. 2, numeral 200, is a block diagram schematic of one embodiment of a device, a minimum-delay jitter smoother, in accordance with the present invention. In the present invention, the video depacketizer (202) is used to depacketize the received video packets. The depacketized bitstream is stored in a video information buffer (204). Due to the variable delays suffered by different packets, the external decoder buffer may overflow or underflow if no countermeasure is applied during this process. In order to avoid such overflow, a decoder buffer verifier (DBV) (208) is attached to the video information buffer (204) to monitor the fullness of the DBV buffer. Thus, the decoder buffer verifier (DBV) (208) provides substantially the same function as the decoder buffer verifier used at the external encoder. At each frame interval, the DBV buffer fullness is examined and a predetermined condition is checked. Where the predetermined condition is violated, a number of stuffing bits determined by a predetermined scheme is output to the bitstream generator (206) before video bits are sent from the video information buffer (204). The predetermined condition for the DBV determines the smallest number of stuffing bits, so as to minimize the incurred delay at the decoder. The stuffing is performed without introducing decoding ambiguity at the external decoder.

When the present invention generates the minimum smoothing delay, decoder buffer underflow may occur because of delay jitter.
However, decoder buffer underflow usually does not cause a severe problem, since the decoder can always repeat displaying the same video frame while waiting for data of the next frame. Furthermore, decoder buffer underflow typically happens only a few times during a period of continuous transmission of video service. Assuming that the worst network delay is D and the frame period is T, the maximum number of occurrences of decoder buffer underflow U is the smallest integer that is no less than D/T, i.e., $U = \lceil D/T \rceil$. For example, where D = 120 ms (milliseconds) and T = 67 ms, D/T is approximately 1.79 and therefore U = 2.

The present invention smoothes delay jitter of video packets transmitted over packet-switched networks. Thus, the device in accordance with the present invention includes a video depacketizer (202), a video information buffer (204) that is operably coupled to the video depacketizer (202), a decoder buffer verifier (DBV) (208) that is operably coupled to the video information buffer (204), and a bitstream generator (206) that is operably coupled to the video information buffer (204) and to the DBV (208). The video depacketizer (202) is used to receive and depacketize video packets. The video information buffer (204) is used to store the output bits from the video depacketizer (202). The DBV (208) updates a hypothetical decoder buffer (HDB) fullness and verifies the HDB condition based on a predetermined criterion. The bitstream generator (206) reads bits from the video information buffer (204) at a predetermined rate and inserts stuffing bits into the output bitstream whenever the HDB condition, described further below, is violated or whenever the video information buffer (204) is empty.

One embodiment of the minimum-delay jitter smoother device of FIG. 2 is shown with greater particularity in FIG. 3, numeral 300.
The device includes: (A) a video depacketizer (302) that is operably coupled to receive and depacketize video packets, (B) a video information buffer (304) that is operably coupled to receive output bits from the video depacketizer (302), (C) a decoder buffer verifier (DBV) (308) that is operably coupled to the video information buffer (304) and that includes a frame bit counter (310), a hypothetical decoder buffer fullness counter (312) and a buffer condition verifier (314), and (D) a bitstream generator (306) that is operably coupled to the video information buffer (304) and to the DBV (308) and that includes a stuffing bits generator (320), a stuffing bits buffer (318), and an output switch (316).

The HDB fullness counter (312) is an imaginary buffer in the DBV that monitors the output bits from the video information buffer (304) to prevent overflow at the external decoder buffer. The frame bit counter (310) and the HDB fullness counter (312), which are each operably coupled to the video information buffer (304), are used to count a number of bits for each frame $E_n$ and a decoder buffer fullness $B_n$, respectively. The buffer condition verifier (314), which is operably coupled to the frame bit counter (310) and to the HDB fullness counter (312), is activated at every frame interval. If the frame bit counter (310) shows that the HDB fullness counter holds at least one complete frame of bits, then the number of bits corresponding to the oldest frame in the HDB fullness counter is subtracted from the HDB fullness counter. Then a predetermined HDB condition is checked in the buffer condition verifier. Where the predetermined HDB condition is violated, the buffer condition verifier (314) provides to the stuffing bits generator (320) a number equal to the number of stuffing bits $S_n$.
The stuffing bits generator (320) is operably coupled to the video information buffer (304), to the buffer condition verifier (314), and to the stuffing bits buffer (318), and is used to monitor the fullness of both the video information buffer (304) and the stuffing bits buffer (318). The stuffing bits generator (320) is activated to generate stuffing bits in two situations.

In the first situation, after a frame is removed from the HDB fullness counter, a nonzero $S_n$ is input from the buffer condition verifier (314). In this case, the stuffing bits generator (320) is used to generate $S_n$ stuffing bits. Since the stuffing bits must be generated in compliance with a preselected decoder bitstream syntax, the number of generated stuffing bits $\hat{S}_n$ may differ from $S_n$. Thus, $\hat{S}_n$ is chosen to be the smallest possible number that is no less than $S_n$, in order to reduce the incurred delay at the decoder.

In the second situation, when both the video information buffer (304) and the stuffing bits buffer (318) are empty, the stuffing bits generator (320) generates the minimum number of stuffing bits that is in compliance with the preselected decoder bitstream syntax, to ensure that the channel will not be idle.

The stuffing bits buffer (318) is operably coupled to the stuffing bits generator (320) and stores the bits generated by the stuffing bits generator (320). The output switch (316) is operably coupled to the video information buffer (304) and the stuffing bits buffer (318) and is used to pass bits to an output channel. Where the stuffing bits buffer contains at least one bit, the output switch (316) receives input from the stuffing bits buffer (318). Otherwise, the output switch (316) receives input from the video information buffer (304).
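The two trigger situations and the priority rule of the output switch can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the circuit of FIG. 3: the class name `BitstreamGenerator`, the `unit` parameter (assumed to be the smallest stuffing pattern the decoder syntax allows, with legal stuffing counts being multiples of it), and the use of the character `"S"` to mark a stuffing bit are all hypothetical.

```python
from collections import deque
import math


class BitstreamGenerator:
    """Toy model of the stuffing bits generator, stuffing bits buffer and output switch."""

    def __init__(self, unit):
        self.unit = unit          # assumed smallest legal stuffing pattern, in bits
        self.video = deque()      # stands in for the video information buffer
        self.stuffing = deque()   # stands in for the stuffing bits buffer

    def request_stuffing(self, s_n):
        """Situation 1: the verifier reports a nonzero S_n after a frame removal."""
        if s_n <= 0:
            return 0
        # Smallest syntax-compliant count that is no less than S_n.
        s_hat = math.ceil(s_n / self.unit) * self.unit
        self.stuffing.extend("S" * s_hat)
        return s_hat

    def next_bit(self):
        """Output switch: stuffing bits take priority over video bits."""
        if not self.stuffing and not self.video:
            # Situation 2: both buffers are empty, so generate the minimum
            # legal amount of stuffing to keep the channel from idling.
            self.stuffing.extend("S" * self.unit)
        if self.stuffing:
            return self.stuffing.popleft()
        return self.video.popleft()
```

With `unit=11`, a request for 25 stuffing bits is rounded up to 33, and those 33 bits are sent before any waiting video bit.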
The buffer condition verifier (314), together with the frame bit counter (310) and the HDB fullness counter (312), performs substantially the same function as what is called the decoder buffer verifier (DBV) used at the external encoder.

FIG. 4, numeral 400, is a diagram that illustrates the dynamics of the DBV buffer fullness condition in the case where VBR coding is used in the encoder. The t-axis is labeled in units of the video frame display interval. In the CIF (Common Intermediate Format), the frame interval is approximately 33 ms. Suppose that the buffer fullness (402) is $B_{n-1}$ at time $t_{n-1}$, just after frame n-1 is removed from the hypothetical decoder, that the number of coded bits for frame n is $E_n$, and that the output channel rate is R(t) bits per second. Then, in order for the decoder buffer not to overflow, the following inequality has to be satisfied:

$$B_{n-1} + \int_{t_{n-1}}^{t_{n+1}} R(t)\,dt - E_n < S$$

where S (404) is the buffer size and

$$R_{n-1,n+1} = \int_{t_{n-1}}^{t_{n+1}} R(t)\,dt$$

is the number of output bits sent to the channel from time $t_{n-1}$ to $t_{n+1}$. Thus, in order to prevent the decoder buffer from overflowing, one of $E_n$ and R(t) may be adjusted. If $E_n$ is kept the same as what is generated by the encoder, then fewer than $S - B_{n-1} + E_n$ bits are permitted to be transmitted to the channel during the time interval from $t_{n-1}$ to $t_{n+1}$, i.e.

$$R_{n-1,n+1} < S - B_{n-1} + E_n.$$

On the other hand, where the bit rate of the channel is not adjustable, the following inequality must be maintained:

$$E_n > B_{n-1} + R_{n-1,n+1} - S.$$

Where the above relation is violated, a number of stuffing bits $S_n$ is added, wherein

$$S_n = B_{n-1} + R_{n-1,n+1} - S - E_n$$

is utilized to maintain the buffer condition. In the case of a fixed constant bit rate (CBR) channel, such as MPEG1, $R_{n-1,n+1}$ is simplified to

$$R_{n-1,n+1} = R \cdot (t_{n+1} - t_{n-1}) = 2T \cdot R$$

where *R* is the channel rate and *T* is the frame interval.
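As a rough illustration of the check described above, the following Python sketch evaluates the buffer condition for one frame and returns the stuffing count $S_n$. The function names and the sample numbers in the usage note are hypothetical; only the two formulas are taken from the text.

```python
def stuffing_bits_needed(b_prev, e_n, r_out, s_size):
    """Return S_n = B_{n-1} + R_{n-1,n+1} - S - E_n when the condition
    E_n > B_{n-1} + R_{n-1,n+1} - S is violated, otherwise 0.

    b_prev : buffer fullness B_{n-1} just after frame n-1 is removed (bits)
    e_n    : coded bits E_n of frame n
    r_out  : bits R_{n-1,n+1} sent to the channel from t_{n-1} to t_{n+1}
    s_size : decoder buffer size S (bits)
    """
    threshold = b_prev + r_out - s_size
    if e_n > threshold:
        return 0                 # condition holds, no stuffing required
    return threshold - e_n       # formula from the text (0 when e_n equals the threshold)


def cbr_output_bits(rate_bps, frame_interval_s):
    """CBR special case: R_{n-1,n+1} = 2 * T * R."""
    return 2 * frame_interval_s * rate_bps
```

For instance, with a buffer of 65536 bits, a fullness of 60000 bits, 30000 channel bits over the interval and a 5000-bit frame, the threshold is 24464 bits, so 19464 stuffing bits would be called for.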
Thus, when the predetermined buffer condition is violated, bit stuffing must be utilized to avoid decoder buffer overflow.

FIG. 5, numeral 500, is a diagram that illustrates the dynamics of buffer fullness for the case of a CBR channel. Buffer fullness (502) is shown with respect to S (504), the buffer size, over preselected time intervals.

The device and method of the present invention may be utilized to provide delay jitter smoothing in transmission of H.261 bitstreams over ATM networks. ITU-T Recommendation H.261 was developed for videoconferencing applications at CBR rates of $p \times 64$ kbit/s, where p may be any integer from 1 to 30. The input video signal format may be either CIF or QCIF (Quarter CIF), with frame rates of 30/n frames per second, where n can be 1, 2, 3, or 4.

FIG. 6, numeral 600, is a block diagram schematic of an H.261 encoder. A lossy encoder (602), operably coupled to receive video frames, is used to reduce redundancies in an input video signal. Motion-compensated prediction is used to reduce temporal redundancy, and the Discrete Cosine Transform (DCT) is exploited to reduce spatial redundancy. Furthermore, a lossless variable length coder (Lossless VLC Encoder) (604), operably coupled to the lossy encoder (602), is used to reduce statistical redundancy in the output information from the lossy encoder (602). Because the bit rate of an output channel is constant and the bit rate of encoded video is variable, a buffer (606), operably coupled to the lossless VLC encoder (604), is used to store the coded bitstream to smooth out the bit-rate variations. A forward error control (FEC) coding and framing unit (612), operably coupled to the buffer (606), is optionally used to protect against channel errors.
A decoder buffer verifier, which is called the hypothetical reference decoder (HRD) (608) in H.261, operably coupled to the buffer (606), is utilized to keep track of the fullness of a hypothetical decoder buffer in the HRD (610), based on an amount of video information read from the video information buffer, and to provide an HRD output to the lossy encoder (602). The lossy encoder (602) adjusts its output bit rate according to the feedback from the HRD to prevent the decoder buffer from overflowing.

FIG. 7, numeral 700, is a block diagram schematic of an H.261 decoder, which performs the inverse operations of the H.261 encoder and reconstructs a replica of the original signal. The H.261 decoder comprises an optional FEC Checking and De-Framing Unit (702), a buffer (704), a Lossless VLC Decoder (706), and a Lossy Decoder (708) that process input from a channel to provide video frames of the replica.

FIG. 8, numeral 800, illustrates the dynamics of the HRD buffer fullness as a function of time for a decoder buffer verifier using H.261 video coding. The buffer condition defined in the H.261 Recommendation is:

$$E_n > B_{n-1} + R \cdot T - B$$

where the terminology is as described above. Buffer fullness (802) is shown with respect to S (804), the buffer size, and with respect to $B = 4R/29.97$ (806) over preselected time intervals.

FIG. 9, numeral 900, is a block diagram schematic of one embodiment of a device in accordance with the present invention incorporated into a VBR H.261 video communication system over an ATM network. A video source (902) sends video data to an H.261 encoder (904) that encodes data according to a predetermined scheme to provide a coded bitstream and sends the encoded bitstream to a transmitting node (906). The transmitting node (906) receives at least a first coded bitstream from H.261 encoder(s), manipulates the bitstream(s) to form cells for transmission, and transmits the cells to a receiving node (908).
The receiving node (908) converts the received cells to a bitstream for decoding and transmits the bitstream to an H.261 decoder (910). The H.261 decoder (910) decodes the bitstream and sends it to a video displayer (912) to provide a video display.

The transmitting node (906) typically includes a transcoder (914) that is operably coupled to receive constant bit rate (CBR) encoded information from the H.261 encoder (904), a packetizer (916) that is operably coupled to the transcoder (914) to receive a variable bit rate bitstream from the transcoder (914), a multiplexer (MUX) (918) that is operably coupled to the packetizer (916), and a first ATM switch (920) that is operably coupled to the multiplexer (918) for transmitting cells containing the VBR encoded information. The receiving node (908) typically includes a second ATM switch (922) that is operably coupled to receive the cells, a demultiplexer (DEMUX) (924) that is operably coupled to the second ATM switch (922), and a minimum-delay jitter smoother (928) that is operably coupled to the DEMUX (924).

At the transmitting node (906), the transcoder (914) is used to convert the CBR bitstream generated by the H.261 encoder (904) into a VBR H.261 bitstream to minimize bandwidth through statistical multiplexing. The average rate of the transcoder output $\overline{R}$ is selected to be smaller than the output rate of the H.261 coder R, where the rate ratio $R/\overline{R}$ is the maximum achievable statistical gain. The output from the transcoder is packetized by the packetizer (916) into fixed-length ATM cells. Then these cells are multiplexed with cells from other sources and are sent by the first ATM switch (920) to the receiving node (908). Then these cells are demultiplexed at the DEMUX (924). The output cells from the DEMUX are input into the minimum-delay jitter smoother (928), which performs depacketization and delay-jitter smoothing as described above.
The resulting bitstream is transmitted to the H.261 decoder (910), which decodes the bitstream and transmits the decoded bitstream to the video displayer (912).

At each frame interval, the HDB buffer condition in the jitter smoother (928) is checked. Where the condition is violated, the following number of stuffing bits is transmitted:

$$S_n = B_{n-1} + R \cdot T - B - E_n.$$

There are two ways to accomplish bit stuffing in H.261. The first is to use MBA (Macroblock Address) stuffing. This is implemented by a special codeword in a VLC table for Macroblock Addresses, i.e., 0000 0001 111. MBA stuffing can be used immediately after a GOB (group of blocks) or a coded macroblock.

The second technique of stuffing may be realized through error correction framing. Before being transmitted to an output channel, the coded video bitstream is placed into error correction frames as shown in FIG. 10, numeral 1000. An error correction frame is comprised of: (A) one framing bit (1004), (B) data (1006) including a one-bit fill indicator (Fi) (1008) and 492 bits of coded data (1010), and (C) 18 parity bits (1012). The frame alignment pattern (1002) is: $(S_1S_2S_3S_4S_5S_6S_7S_8) = (00011011)$, where $S_1$, $S_2$, $S_3$, $S_4$, $S_5$, $S_6$, $S_7$, and $S_8$ represent frame transmission order. The 18 parity checking bits are calculated against the first 493 bits including the Fi. Error correction framing stuffing is realized by setting Fi to zero (1014). In this case, only 492 consecutive fill bits (all 1's) (1016) are sent.

MBA stuffing is more flexible, since the number of stuffing bits can be as small as 11 bits or any multiple of 11. Utilization of error correction framing stuffing requires insertion of a predetermined number of at least 492 stuffing bits.
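The granularity difference between the two H.261 stuffing techniques can be made concrete with a short Python sketch. It is illustrative only and rests on stated assumptions: the function names are hypothetical, error correction framing stuffing is assumed to be counted in whole fill frames of 492 fill bits each, and the framing, Fi and parity bits are not counted as stuffing.

```python
import math

MBA_STUFFING_CODEWORD = "00000001111"   # 11-bit MBA stuffing codeword given in the text
FILL_BITS_PER_FRAME = 492               # fill bits carried by one error correction frame


def h261_stuffing_demand(b_prev, e_n, rate_bps, frame_interval_s, b_size):
    """S_n = B_{n-1} + R*T - B - E_n, taken as zero when the condition holds."""
    return max(0, math.ceil(b_prev + rate_bps * frame_interval_s - b_size - e_n))


def mba_stuffing(s_n):
    """Smallest whole number of MBA stuffing codewords covering s_n bits."""
    count = math.ceil(s_n / len(MBA_STUFFING_CODEWORD))
    return MBA_STUFFING_CODEWORD * count


def fill_frames_needed(s_n):
    """Number of fill frames (assumed 492 fill bits each) covering s_n bits."""
    return math.ceil(s_n / FILL_BITS_PER_FRAME)
```

Under these assumptions, a demand of 100 stuffing bits costs 110 bits with MBA stuffing (ten codewords) but a full 492-bit fill frame with error correction framing stuffing.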
On the other hand, error correction framing stuffing is much easier to implement since, for MBA stuffing, the boundary of GOBs or MBs must be detected.

FIG. 11, numeral 1100, is a flowchart of one embodiment of the steps of a method in accordance with the present invention. The method is implemented in a network receiving node of a packet video communication system and converts a received digital video packet stream into a continuous bitstream for transmission to a video decoder with a substantially minimized smoothing delay. The method includes the steps of: (A) depacketizing the received video packets and storing the video bits (1102); (B) monitoring a fullness of a hypothetical decoder buffer (1104) by a decoder buffer verifier (DBV); and (C) reading, by a bitstream generator, video information bits from the video information buffer at a predetermined rate and inserting, by the bitstream generator, a number of stuffing bits into the video information bitstream, wherein the number of stuffing bits is determined according to a predetermined scheme (1106).

The predetermined scheme includes inserting the stuffing bits into the bitstream when one of the following occurs: (A) a predetermined hypothetical buffer condition is violated; and (B) the video information buffer is empty.

In one embodiment, the sum of the stuffing bits and the bits for an immediately previous frame removed from the hypothetical decoder equals a predetermined number according to the predetermined hypothetical buffer condition.

As in the device, in the method, the predetermined hypothetical buffer condition may be selected to be of the form:

$$E_n > B_{n-1} + R_{n-1,n+1} - S$$

where $E_n$ is the number of coded bits for frame n, $B_{n-1}$ is the buffer fullness at time $t_{n-1}$, $R_{n-1,n+1}$ is the number of bits sent to the channel from time $t_{n-1}$ to $t_{n+1}$, and S is the decoder buffer size. Violation of said buffer condition occurs when $E_n \leq B_{n-1} + R_{n-1,n+1} - S$.
After each frame is removed from the hypothetical decoder buffer, the HDB fullness is examined and the predetermined hypothetical buffer condition is checked. Where the predetermined hypothetical buffer condition is violated, the stuffing bits are output to the bitstream generator before video bits are sent from the video information buffer. Generally, the minimum number of stuffing bits $S_n$ is determined by $S_n = B_{n-1} + R_{n-1,n+1} - S - E_n$. For an H.261 bitstream, either macroblock address stuffing or error correction framing stuffing can be used.

Again, the predetermined rate for reading video information bits from the video information buffer is variable.

As illustrated in the flow chart of FIG. 12, numeral 1200, the step of monitoring a fullness of a hypothetical decoder buffer by a decoder buffer verifier (DBV) typically includes: (A) utilizing a frame bit counter for counting a number of bits for each frame $E_n$ from the video information buffer and utilizing an HDB fullness counter to count an HDB fullness $B_n$ (1202); (B) utilizing a buffer condition verifier for verifying whether the HDB fullness satisfies a predetermined hypothetical decoder buffer condition at each frame interval after a frame of bits is removed from the hypothetical decoder buffer and, where the predetermined hypothetical decoder buffer condition is violated, providing to the bitstream generator a number equal to the number of stuffing bits $S_n$ (1204).

As illustrated in the flow chart of FIG. 13, numeral 1300, the step of reading, by a bitstream generator, video information bits from the video information buffer at a predetermined rate and inserting, by the bitstream generator,
a number of stuffing bits into the video information bitstream, wherein the number of stuffing bits is determined according to a predetermined scheme to provide substantially minimized delay jitter smoothing, typically includes: (A) utilizing a stuffing bits generator for generating stuffing bits in accordance with the predetermined scheme (1302); (B) utilizing a stuffing bits buffer for storing the generated stuffing bits and providing the stuffing bits to an output switch in accordance with occurrence of one of selected predetermined occurrences (1304); (C) utilizing the output switch (1306) for receiving one of: (C1) input from the stuffing bits buffer upon occurrence of one of the selected predetermined occurrences, and (C2) input from the video information buffer upon nonoccurrence of one of the selected predetermined occurrences.

Typically, the selected predetermined occurrences include: (A) where a frame is removed from the hypothetical decoder buffer, a nonzero $S_n$ is input from the buffer condition verifier, the stuffing bits generator generates $\hat{S}_n$ stuffing bits in compliance with a preselected decoder bitstream syntax, and where the number of generated stuffing bits $\hat{S}_n$ is different from $S_n$, $\hat{S}_n$ is selected to be the smallest number that is no less than $S_n$ in order to reduce incurred delay at a decoder, and (B) where both the video information buffer and the stuffing bits buffer are empty, the stuffing bits generator generates a minimum number of stuffing bits that is in compliance with the preselected decoder bitstream syntax to ensure that the channel will not be idle.

In one embodiment, the external decoder may be selected to be an H.261 decoder.

FIG. 14, numeral 1400, is a block diagram of another embodiment of a hypothetical buffer-based jitter smoothing device for minimizing smoothing delay in a packet video communication system in accordance with the present invention.
The device of FIG. 14 includes: (A) a video depacketizer (1402) that is operably coupled to receive and depacketize digital video packets of a received digital video packet stream into video information bits, (B) an information buffer (1404) that is operably coupled to the depacketizing means (1402), for storing the video information bits, and (C) a hypothetical buffer-based smoothing unit (1406) that is operably coupled to the information buffer (1404), for utilizing a hypothetical buffer to generate and transmit a continuous stuffed video information bitstream at a predetermined rate.

As shown in FIG. 3, the hypothetical buffer-based smoothing unit may be implemented to include (322): (A) a frame bit counter (310), operably coupled to the video information buffer, for counting a number of bits for each video frame $E_n$, (B) a hypothetical buffer fullness counter (312), operably coupled to the video information buffer (304), for counting a hypothetical buffer fullness $B_n$, (C) a buffer condition verifier (314), operably coupled to the frame bit counter (310) and to the hypothetical buffer fullness counter (312), for verifying whether the number of the bits for a preselected frame removed from the hypothetical buffer satisfies a predetermined hypothetical buffer condition, and, where the predetermined hypothetical buffer condition is violated, for providing to the bitstream generator a number equal to the number of stuffing bits $S_n$, (D) a stuffing bits generator (320), operably coupled to the video information buffer (304), to the buffer condition verifier (314), and to a stuffing bit buffer (318), for generating stuffing bits in accordance with the predetermined scheme, (E) the stuffing bits buffer (318), operably coupled to the stuffing bits generator (320), for storing the generated stuffing bits and for providing the stuffing bits to an output switch in accordance with occurrence of one of selected predetermined occurrences, and (F) an output switch (316), operably coupled to the video information buffer (304) and to the stuffing bits buffer (318), for receiving one of: (F1) input from the stuffing bits buffer upon occurrence of one of the selected predetermined occurrences, and (F2) input from the video information buffer upon nonoccurrence of one of the selected predetermined occurrences.

Although embodiments of the present invention are described above, it will be obvious to those skilled in the art that many alterations and modifications may be made without departing from the invention. Accordingly, it is intended that all such alterations and modifications be included within the spirit and scope of the invention as defined in the appended claims.

We claim:

#### Claims:

1. A device in a receiving node of a packet video communication system for converting a received digital video packet stream into a continuous bitstream for transmission to a video decoder, the device comprising:
- (1A) a video depacketizer, operably coupled to receive and depacketize digital video packets of the received digital video packet stream into video information bits;
- (1B) a video information buffer, operably coupled to the video depacketizer for storing the video information bits;
- (1C) a decoder buffer verifier (DBV), operably coupled to the video information buffer, for determining a number of transmitted information bits for a video frame, and for monitoring a fullness of a hypothetical decoder buffer (HDB);
- (1D) a bitstream generator, operably coupled to the video information buffer and to the DBV, for transmitting the video information bits from the video information buffer at a predetermined rate and for inserting a number of stuffing bits into the video information bitstream wherein the number of stuffing bits is determined according to a predetermined scheme to provide a continuous bitstream with a substantially minimized smoothing delay.

2.
The device of claim 1 wherein at least one of 2A-2F:
- (2A) the predetermined scheme includes inserting the stuffing bits into the bitstream when one of 2A1-2A2:
- (2A1) a predetermined hypothetical decoder buffer condition is violated; and
- (2A2) the video information buffer is empty;
- (2B) the fullness, $B_n$, of the hypothetical decoder buffer (HDB) at time $t_n$ is determined according to $B_n = B_{n-1} + R_{n-1,n}$, where $B_{n-1}$ is the buffer fullness at time $t_{n-1}$, and $R_{n-1,n}$ is the number of video information bits transmitted during frame interval n-1 from time $t_{n-1}$ to $t_n$;
- (2C) the video decoder is an H.261 decoder, and, where selected, wherein the bits stuffing is performed utilizing one of 2C1-2C2:
- (2C1) macroblock address stuffing; and
- (2C2) error correction framing;
- (2D) a predetermined rate for reading video information bits from the video information buffer is selectable;
- (2E) the decoder buffer verifier includes 2E1-2E3:
- (2E1) a frame bit counter, operably coupled to the video information buffer, for counting a number of bits for each video frame $E_n$,
- (2E2) a HDB fullness counter, operably coupled to the video information buffer, for counting a HDB fullness $B_n$, and
- (2E3) a buffer condition verifier, operably coupled to the frame bit counter and to the HDB fullness counter, for verifying whether the number of the bits for a preselected frame removed from the HDB satisfies a predetermined hypothetical decoder buffer condition, and, where the predetermined hypothetical decoder buffer condition is violated, for providing to the bitstream generator a number equal to the number of stuffing bits $S_n$;
- (2F) the bitstream generator includes 2F1-2F3:
- (2F1) a stuffing bits generator, operably coupled to the video information buffer, to the buffer condition verifier, and to a stuffing bit buffer, for generating stuffing bits in accordance with the predetermined scheme,
- (2F2) the stuffing bits buffer, operably coupled to the stuffing bits generator, for storing the generated stuffing bits and for providing the stuffing bits to an output switch in accordance with occurrence of one of selected predetermined occurrences,
- (2F3) the output switch, operably coupled to the video information buffer and to the stuffing bits buffer, for receiving one of 2F3a-2F3b:
- (2F3a) input from the stuffing bits buffer upon occurrence of one of the selected predetermined occurrences, and
- (2F3b) input from the video information buffer upon nonoccurrence of one of the selected predetermined occurrences.

3.
The device of claim 2 wherein, for 2A, the predetermined hypothetical decoder buffer condition is violated when $E_n < B_{n-1} + R_{n-1,n+1} - S$, where $E_n$ is the number of information bits removed from the HDB for frame n, $B_{n-1}$ is the HDB fullness at time $t_{n-1}$, $R_{n-1,n+1}$ is the number of information bits transmitted from time $t_{n-1}$ to $t_{n+1}$, and $S$ is the size of the HDB;
and where selected, wherein the predetermined hypothetical buffer condition is checked once every frame interval, and when the predetermined hypothetical buffer condition is violated, the stuffing bits are output to the bitstream generator before sending any video information bits from the video information buffer;
and where further selected, wherein the minimum number of stuffing bits $S_n$ is determined by $S_n = B_{n-1} + R_{n-1,n+1} - S - E_n$.

4. A method in a network receiving node of a packet video communication system for converting a received digital video packet bitstream into a continuous bitstream for transmission to a video decoder, the method comprising the steps of:
- (4A) receiving the digital video packet stream and storing the information bits contained in said packets;
- (4B) monitoring a fullness of a hypothetical decoder buffer by a decoder buffer verifier (DBV); and
- (4C) reading, by a bitstream generator, video information bits from the video information buffer at a predetermined rate and inserting, by the bitstream generator, a number of stuffing bits into the video information bitstream wherein the number of stuffing bits is determined according to a predetermined scheme to provide a continuous bitstream with a substantially minimized smoothing delay.

5.
The method of claim 4 wherein at least one of 5A-5F:
- (5A) the predetermined scheme includes inserting the stuffing bits into the bitstream when one of 5A1-5A2:
- (5A1) a predetermined hypothetical buffer condition is violated; and
- (5A2) the video information buffer is empty;
- (5B) the fullness $B_n$ of a hypothetical decoder buffer (HDB) at time $t_n$, which is just before frame n is removed from the hypothetical decoder, is determined according to $B_n = B_{n-1} + R_{n-1,n}$, where $B_{n-1}$ is the buffer fullness just after frame n-1 is removed from the hypothetical decoder, and $R_{n-1,n}$ is the number of video information bits transmitted during frame interval n-1 from time $t_{n-1}$ to $t_n$;
- (5C) the external decoder is an H.261 decoder, and where selected, wherein the bits stuffing is performed utilizing one of 5C1-5C2:
- (5C1) macroblock address stuffing; and
- (5C2) error correction framing;
- (5D) a predetermined rate for reading video information bits from the video information buffer is selectable;
- (5E) the step of monitoring a fullness of a hypothetical decoder buffer by a decoder buffer verifier (DBV) includes 5E1-5E3:
- (5E1) utilizing a frame bit counter for counting a number of bits for each frame $E_n$ transmitted from the video information buffer,
- (5E2) utilizing a decoder buffer fullness counter for counting a decoder buffer fullness $B_n$,
- (5E3) utilizing a buffer condition verifier for verifying whether the number of bits for a preselected frame removed from the HDB satisfies a predetermined hypothetical decoder buffer condition, and, when the predetermined hypothetical decoder buffer condition is violated, providing to the bitstream generator a number equal to the number of stuffing bits $S_n$; and
- (5F) the step of reading, by a bitstream generator, video information bits from the video information buffer at a predetermined rate and inserting, by the bitstream generator, a number of stuffing bits into the video information bitstream wherein the number of stuffing bits is determined according to a predetermined scheme to provide substantially minimized delay jitter smoothing includes 5F1-5F3:
- (5F1) utilizing a stuffing bits generator for generating stuffing bits in accordance with the predetermined scheme,
- (5F2) utilizing a stuffing bits buffer for storing the generated stuffing bits and providing the stuffing bits to an output switch in accordance with occurrence of one of selected predetermined occurrences,
- (5F3) utilizing the output switch for receiving one of 5F3a-5F3b:
- (5F3a) input from the stuffing bits buffer upon occurrence of one of the selected predetermined occurrences, and
- (5F3b) input from the video information buffer upon nonoccurrence of one of the selected predetermined occurrences,
and where selected, wherein the selected predetermined occurrences include 5F3c-5F3d:
- (5F3c) where a frame is removed from the HDB and a nonzero $S_n$ is inputted from the buffer condition verifier, the stuffing bits generator generates $\hat{S}_n$ stuffing bits in compliance with a preselected decoder bitstream syntax, and where a number of the generated stuffing bits $\hat{S}_n$ is different from $S_n$, $\hat{S}_n$ is selected to be a smallest number that is no less than $S_n$ in order to reduce incurred delay at a decoder, and
- (5F3d) where both the video information buffer and the stuffing bits buffer are empty, the stuffing bits generator generates a minimum number of stuffing bits that is in compliance with the preselected decoder bitstream syntax to ensure that channel will not be idle.

6.
The method of claim 5 wherein, in step 5A, the predetermined hypothetical decoder buffer condition is violated when $E_n < B_{n-1} + R_{n-1,n+1} - S$, where $E_n$ is the number of information bits removed from the hypothetical decoder buffer for frame n, $B_{n-1}$ is the hypothetical decoder buffer fullness at time $t_{n-1}$, $R_{n-1,n+1}$ is the number of information bits transmitted from time $t_{n-1}$ to $t_{n+1}$, and $S$ is the size of the hypothetical decoder buffer,
and where selected, wherein the predetermined hypothetical buffer condition is checked at every frame interval, and when the predetermined hypothetical buffer condition is violated, the stuffing bits are output to the bitstream generator before sending any video information bits from the video information buffer,
and where further selected, wherein the minimum number of stuffing bits $S_n$ is determined by $S_n = B_{n-1} + R_{n-1,n+1} - S - E_n$.

7. A hypothetical buffer-based jitter smoothing device for minimizing smoothing delay in a packet video communication system, the device comprising:
- (7A) a video depacketizer, operably coupled to receive and depacketize digital video packets of a received digital video packet stream into video information bits,
- (7B) an information buffer, operably coupled to the depacketizing means, for storing the video information bits,
- (7C) hypothetical buffer-based smoothing means, operably coupled to the information buffer, for utilizing a hypothetical buffer to generate and transmit a continuous stuffed video information bitstream at a predetermined rate.

8. The hypothetical buffer-based jitter smoothing device of claim 7 wherein a number of stuffing bits inserted into the video information bitstream is determined according to a predetermined scheme.

9.
The hypothetical buffer-based jitter smoothing device of claim 8 wherein at least one of 9A-9B:
- (9A) the predetermined scheme includes inserting the stuffing bits into the bitstream, i.e., bits stuffing, when one of 9A1-9A2:
- (9A1) a predetermined hypothetical buffer condition is violated; and
- (9A2) the video information buffer is empty;
and where selected, wherein at least one of 9B-9C:
- (9B) the predetermined hypothetical buffer condition is violated when $E_n < B_{n-1} + R_{n-1,n+1} - S$, where $E_n$ is the number of information bits removed from the hypothetical buffer for frame n, $B_{n-1}$ is the fullness of said buffer at time $t_{n-1}$, $R_{n-1,n+1}$ is the number of information bits transmitted from time $t_{n-1}$ to $t_{n+1}$, and $S$ is the size of the hypothetical buffer; and
- (9C) the predetermined hypothetical buffer condition is checked once every frame interval, and when the predetermined hypothetical buffer condition is violated, the stuffing bits are inserted into the video information bitstream before sending any video information bits from the video information buffer, and where selected, wherein the minimum number of stuffing bits $S_n$ is determined by $S_n = B_{n-1} + R_{n-1,n+1} - S - E_n$.

10. The hypothetical buffer-based jitter smoothing device of claim 7 wherein at least one of 10A-10C:
- (10A) the hypothetical buffer-based smoothing means determines a number of stuffing bits to be transmitted for a video frame and monitors a fullness of the hypothetical buffer, and where selected at least one of 10A1-10A2:
- (10A1) wherein the fullness, $B_n$, of the hypothetical buffer at time $t_n$ is determined according to $B_n = B_{n-1} + R_{n-1,n}$, where $B_{n-1}$ is the buffer fullness at time $t_{n-1}$, and $R_{n-1,n}$ is the number of video information bits transmitted during frame interval n-1 from time $t_{n-1}$ to $t_n$;
- (10A2) wherein determining the number of stuffing bits to be transmitted for the video frame is performed at a predetermined selectable rate;
- (10B) the continuous stuffed video information bitstream output from the jitter smoothing device is transmitted to an H.261 decoder; and where selected, wherein the bits stuffing is performed utilizing one of 10B1-10B2:
- (10B1) macroblock address stuffing; and
- (10B2) error correction framing; and
- (10C) the hypothetical buffer-based smoothing means includes 10C1-10C6:
- (10C1) a frame bit counter, operably coupled to the video information buffer, for counting a number of bits for each video frame $E_n$,
- (10C2) a hypothetical buffer fullness counter, operably coupled to the video information buffer, for counting a hypothetical buffer fullness $B_n$,
- (10C3) a buffer condition verifier, operably coupled to the frame bit counter and to the hypothetical buffer fullness counter, for verifying whether the number of the bits for a preselected frame removed from the hypothetical buffer satisfies a predetermined hypothetical buffer condition, and, where the predetermined hypothetical buffer condition is violated, for providing to the bitstream generator a number equal to the number of stuffing bits $S_n$,
- (10C4) a stuffing bits generator, operably coupled to the video information buffer, to the buffer condition verifier, and to a stuffing bit buffer, for generating stuffing bits in accordance with the predetermined scheme,
- (10C5) the stuffing bits buffer, operably coupled to the stuffing bits generator, for storing the generated stuffing bits and for providing the stuffing bits to an output switch in accordance with occurrence of one of selected predetermined occurrences, and
- (10C6) the output switch, operably coupled to the video information buffer and to the stuffing bits buffer, for receiving one of 10C6a-10C6b:
- (10C6a) input from the stuffing bits buffer upon occurrence of one of the selected predetermined occurrences, and
- (10C6b) input from the video information buffer upon nonoccurrence of one of the selected predetermined occurrences.

[Drawing sheets: FIG. 3, FIG. 10, FIG. 12, FIG. 13, FIG. 14]

#### INTERNATIONAL SEARCH REPORT

International application No. PCT/US95/01998

A. CLASSIFICATION OF SUBJECT MATTER

IPC(6): H04N 7/00

US CL: 348/15, 463, 465, 466, 467, 497; 370/102

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)

U.S.: 348/15, 463, 465, 466, 467, 497; 370/102

C. DOCUMENTS CONSIDERED TO BE RELEVANT

| Category* | Citation of document, with indication, where appropriate, of the relevant passages | Relevant to claim No. |
|-----------|--------------------------------------------------------------------------------------|-----------------------|
| Y | US, A, 5,121,205 (NG ET AL) 09 June 1992, Figs. 6 and 7 | 1, 4, 7, 8 |
| | US, A, 5,159,447 (HASKELL ET AL) 27 October 1992, Fig. | 1, 4, 7, 8 |
| A,P | US, A, 5,131,013 (CHOI) 14 July 1994, Figs. 4 and 5 | 1, 4, 7, 8 |
| | US, A, 5,287,360 (REGENT ET AL) 18 February 1994, Fig. | 1, 4, 7, 8 |

Further documents are listed in the continuation of Box C. See patent family annex.

\* Special categories of cited documents:

| Code | Meaning |
|------|---------|
| "A" | document defining the general state of the art which is not considered to be of particular relevance |
| "E" | earlier document published on or after the international filing date |
| "L" | document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified) |
| "O" | document referring to an oral disclosure, use, exhibition or other means |
| "P" | document published prior to the international filing date but later than the priority date claimed |
| "T" | later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention |
| "X" | document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone |
| "Y" | document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art |
| "&" | document member of the same patent family |

Date of the actual completion of the international search: 29 MARCH 1995

Date of mailing of the international search report: 05 JUL 1995

Name and mailing address of the ISA/US: Commissioner of Patents and Trademarks, Box PCT, Washington, D.C. 20231

Authorized officer: VICTOR R. KOSTAK

Telephone No. (703) 305-4374