H.239, BFCP, RDP and VbSS Data Sharing within Video Conferences


When Video Conferencing (including desktop and application sharing), there must be a common denominator between the platforms in the conference so that they can all communicate effectively and understand the traffic that is sent and received.

The purpose of this paper is to explain the differences between the various methods (H.239, BFCP, RDP and VbSS) used by the major vendors of H.323, SIP and Skype for Business devices and clients for Data (Desktop or Application) Sharing within Video Conferences, and how these differ from the older T.120 standard previously used via NetMeeting.

It is assumed that the reader has a general knowledge of Video Conferencing systems and the standards involved.

Terminology - be clear about what you mean:

When we talk about data or application sharing or collaboration, we need to be clear as to what we really mean. This is because, depending upon what system we use, data sharing might really mean data showing; or data collaboration might actually be data sharing and not true collaboration, where we hand over control of the shared application to someone else. The distinction will hopefully become clearer as you read this paper.

The old T.120 Data Sharing standard has gone:

For Data Sharing (complete desktops, windows or applications), the ITU initially approved the T.120 standard. The basis of T.120 was originally developed and put forward by Microsoft.

T.120 allows data sharing and true collaboration to the extent that another endpoint could be granted, and then actually take, control of the shared desktop or application. Most vendors offered a T.120 solution by integrating NetMeeting either directly into their PC based systems or indirectly with a PC linked to their non-PC based Settop systems.

T.120 was really a two stage operation. In the first stage, you could share your desktop or application with other participants in the video conference. These participants would then see your shared data, but that's all they could do; they couldn't interact with or change your data. The second stage was true collaboration: once your data was shared, you could hand control over, or a participant could request control and you could grant it to them. Once control was handed over, they had full control and could change, edit or delete your data as if it was theirs. But at any time you could take back control. Hence, this is true data collaboration and not just data sharing (viewing).

However, NetMeeting was last supported in Windows XP; hence T.120 has gone and has been replaced by the H.239 standard, sometimes referred to as the Dual Video standard.

H.239 is the new data sharing standard used by H.323 devices:

The H.239 standard defines how additional media channels are used and managed by H.323 Video Conferencing systems. H.239 introduces the concept of 'data-showing', whereby the PC desktop graphics is converted into a separate media stream and transmitted along with the main video stream. The new common denominator is the media stream, so it does not matter if the endpoint is PC or settop based.

The key point here is that H.239 is the control protocol whilst the actual media (content) is video, typically encoded using H.264 AVC, but it could fall back to H.263.

Endpoints that support H.239 will receive the dual stream (desktop graphics and main video) and then display the desktop graphics and far-end video in separate windows. Endpoints that don't support H.239 will display the desktop graphics instead of the main video (far-end video to them) in one window, which may not be full screen, when data is shared. As soon as sharing stops, these endpoints will revert to displaying the main video in the single window.

But the shared desktop, window or application is actually a video stream, so all you can do is view the video. You cannot be handed control, or request control and then change the data; it's just a video of whatever is being shared to you. So H.239 is not really true data collaboration, it's just data showing.

However, H.239 does allow for audio from the shared application to be included.

All the major H.323 product vendors support the H.239 Dual Video Standard in their latest products. Cisco (Tandberg) have DuoVideo; Polycom have H.239 People+Content and/or Polycom People+Content IP (PPCIP) in their RealPresence Group and HDX series; Lifesize have H.239 support in their Icon, Express, Team, Room and Cloud products; and Yealink have H.239 in their VC800, VC500, VC Desktop and VC Mobile endpoints, whilst the VP-T49G Video Phone can receive H.239 data.

Binary Floor Control Protocol - BFCP - used by SIP devices:

We use the term Generic SIP to identify devices that adhere to the SIP standard, as opposed to devices that use Microsoft's version of SIP, which for clarity we call MS-SIP.

SIP endpoints typically use BFCP which, like H.239, effectively sends the shared desktop or application as a second video stream that the receiving endpoint then displays in either a second window or in place of the 'talking heads' video whilst the sharing is active.

When two endpoints establish a BFCP connection, they must determine which endpoint will act as a floor control server, then the other will act as a floor control client for that specific stream. If there are two streams, then again one endpoint must act as the floor control server, but it does not have to be the same endpoint for each stream.

If you look at a trace of the application sharing, you will see something like:

m=application 3238 UDP/BFCP *

m=application 3238 UDP/BFCP * indicates that this application stream uses UDP port 3238 and that the protocol carried over UDP is BFCP. There is no RTP involved here: BFCP messages are sent directly in UDP packets, and the * is a placeholder because BFCP has no media formats to list.
a=setup:actpass indicates that the sender is willing to take either the active or the passive role when the connection is set up; the answer then selects either active or passive.
a=connection:new indicates that a new connection is to be established.
a=floorctrl:c-s indicates that the sender is willing to act as both a floor control client and a floor control server.
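
As a rough illustration of how these lines can be read programmatically, the following Python sketch parses an m= line and its a= attributes. The names MediaSection, parse_media_section and FLOORCTRL_ROLES are hypothetical, and the BFCP_OFFER sample is simply assembled from the attributes discussed above rather than copied from a real trace.

```python
from dataclasses import dataclass, field

# Roles named by each a=floorctrl value (c-only, s-only, c-s).
FLOORCTRL_ROLES = {
    "c-only": {"client"},
    "s-only": {"server"},
    "c-s": {"client", "server"},
}


@dataclass
class MediaSection:
    media: str
    port: int
    proto: str
    formats: list
    attributes: dict = field(default_factory=dict)


def parse_media_section(lines):
    """Parse an m= line plus the a= lines that belong to it."""
    if not lines or not lines[0].startswith("m="):
        raise ValueError("a media section must start with an m= line")
    media, port, proto, *formats = lines[0][2:].split()
    section = MediaSection(media, int(port), proto, formats)
    for line in lines[1:]:
        if line.startswith("a="):
            # Attributes are either "name:value" or a bare flag.
            name, _, value = line[2:].partition(":")
            section.attributes[name] = value
    return section


# Hypothetical offer assembled from the attributes discussed above.
BFCP_OFFER = [
    "m=application 3238 UDP/BFCP *",
    "a=setup:actpass",
    "a=connection:new",
    "a=floorctrl:c-s",
]

bfcp = parse_media_section(BFCP_OFFER)
offered_roles = FLOORCTRL_ROLES[bfcp.attributes["floorctrl"]]
print(bfcp.proto, bfcp.port, sorted(offered_roles))
```

The point of the sketch is that the transport (UDP/BFCP) is read from the m= line itself, while the floor control roles come from a separate attribute.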

Again, the point here is that BFCP is the control protocol whilst the actual media (content) is video and typically encoded using H.264 AVC.

Major difference in connectivity between H.323 (and generic SIP) devices and Microsoft's Skype for Business:

As we can see from the above, H.323 devices use H.239 to control the media stream whilst SIP devices use BFCP to control the media stream; with the actual media being video.

A key distinction from how Microsoft's Skype for Business works is that both H.323 and generic SIP devices deliver the content within the same connection. When content is shared, it effectively takes bandwidth from the main video (talking heads) stream within the session.

By comparison, Microsoft's Skype for Business typically uses Microsoft's X-H.264UC as the codec for the main video stream, G.722/2 Stereo as the codec for the audio stream and RDP or VbSS to control the content. These streams are created in separate connections, with each connection having its own allocated bandwidth.

With Skype for Business, the content is shared within a separately established connection, so the audio and video sessions don't actually need to exist for SfB to share content. By contrast, H.323 (and generic SIP) conferences must first establish the connection and video stream, then add the content stream to the call.
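
The difference between the two bandwidth models can be sketched with some simple arithmetic. This is only an illustration: the function names, the call rates and the 50% content share are all assumptions chosen for the example, and real endpoints apply their own vendor-specific policies when dividing a call rate.

```python
def shared_connection_split(call_rate_kbps, audio_kbps, content_share):
    """H.323 / generic SIP model: content and main video share one call rate.

    content_share is a hypothetical fraction (0.0 to 1.0) of the video
    budget that is handed to the content stream while sharing is active.
    """
    if not 0.0 <= content_share <= 1.0:
        raise ValueError("content_share must be between 0.0 and 1.0")
    video_budget = call_rate_kbps - audio_kbps
    content = round(video_budget * content_share)
    return {
        "audio": audio_kbps,
        "main_video": video_budget - content,
        "content": content,
    }


def separate_connections(audio_kbps, video_kbps, content_kbps):
    """Skype for Business model: each stream has its own allocation."""
    return {
        "audio": audio_kbps,
        "main_video": video_kbps,
        "content": content_kbps,
        "total": audio_kbps + video_kbps + content_kbps,
    }


# Assumed figures: a 1024 kbps call with 64 kbps of audio.
before_sharing = shared_connection_split(1024, 64, 0.0)
while_sharing = shared_connection_split(1024, 64, 0.5)
print(before_sharing["main_video"], while_sharing["main_video"])
```

In the shared model the main video rate drops as soon as content starts; in the separate model the content allocation is simply added to the total.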

Microsoft's Remote Desktop Protocol - RDP - used by Skype for Business:

Microsoft developed their proprietary Remote Desktop Protocol - RDP and used it within Lync 2013, Skype for Business 2015 and Skype for Business 2016 for Desktop and Application Sharing between clients. More about Skype for Business 2016 later, as it also supports a new method for sharing.

RDP is an extension of the ITU-T T.128 standard for Multipoint Application Sharing, which sits under the T.120 umbrella standard that was originally used by Microsoft's NetMeeting application.

If you look at a trace of the Skype for Business application sharing, you will see something like:

m=applicationsharing 58545 TCP/RTP/AVP 127
a=rtpmap:127 x-data/90000

m=applicationsharing 58545 TCP/RTP/AVP 127 indicates that this particular applicationsharing stream uses IP Port 58545, with RTP embedded in TCP packets, and is assigned a Payload Type of 127.
a=rtpmap:127 x-data/90000 indicates that Payload Type 127 is mapped to the x-data format with a clock rate of 90000 Hz.
a=x-applicationsharing-media-type:rdp indicates that the actual media is RDP.
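
To make the structure of the a=rtpmap line concrete, here is a minimal Python sketch that splits it into its Payload Type, format name and clock rate. The function name parse_rtpmap is hypothetical, and the sketch ignores any optional parameters that may follow the clock rate.

```python
def parse_rtpmap(line):
    """Split 'a=rtpmap:<PT> <name>/<clock rate>' into its three parts."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an a=rtpmap line")
    payload_type, encoding = line[len(prefix):].split(None, 1)
    name, _, rest = encoding.partition("/")
    # Anything after a second '/' (for example a channel count) is ignored.
    clock_rate, _, _ = rest.partition("/")
    return int(payload_type), name, int(clock_rate)


pt, name, clock = parse_rtpmap("a=rtpmap:127 x-data/90000")
print(pt, name, clock)
```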

Microsoft's Video based Screen Sharing - VbSS - used by Skype for Business 2016:

The main limitations of using RDP are low frame rate and high bandwidth consumption. To address these, Microsoft has developed Video based Screen Sharing - VbSS as an alternative method for Desktop Sharing. VbSS is supported by the Skype for Business 2016 client found within Office 2016, from version 16.0.6330.1000 onwards.

The main aims of VbSS are to:

  • Make screen sharing more reliable compared to using RDP
  • Make session setup and the video experience faster. RDP can display ~3 frames per second; whilst
    VbSS can achieve up to 30 fps when using X-H.264UC video
  • Work much better than RDP in low bandwidth conditions; even when sharing fast moving content

To achieve these aims, VbSS uses UDP as its underlying protocol. This is more efficient than the TCP that RDP uses, provided the network connection can support the traffic without loss. There is also a trade-off, with some image integrity and sharpness given up in exchange for reliability, motion and efficiency, but this should not be too obvious to users. Effectively, VbSS sends the content as another video stream (similar to, but not the same as, H.239 or BFCP).

It's important to know that Skype for Business 2016 does not replace RDP with VbSS. Furthermore, there are some restrictions as to when VbSS can be used. Skype for Business 2016 still uses RDP as a fallback or alternative to VbSS when circumstances dictate:

  • There is only one content stream. All endpoints in the conference must support and use VbSS; if an older client that does not support VbSS joins, then everyone seamlessly switches to using RDP
  • Whilst VbSS is supported in both Point-to-Point and Multipoint conferences, it can only be used when Desktop Sharing. RDP is used when sharing a Window (Program) or PowerPoint files.
  • Content sharing with VbSS provides a view-only video stream. If an endpoint is given or requests to take remote control, the session will seamlessly revert to using RDP
  • Once a session switches from using VbSS to RDP, it does not switch back. To use VbSS again, the shared session must be stopped and restarted
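
The restrictions listed above can be sketched as a small piece of decision logic. This is only a model of the rules as described in this paper, not Microsoft's implementation; the class name ContentShare and its parameters are hypothetical.

```python
class ContentShare:
    """Models one sharing session; create a new instance after a restart."""

    def __init__(self):
        # Once the session has switched to RDP it never switches back.
        self._switched_to_rdp = False

    def method(self, all_support_vbss, sharing_desktop, remote_control):
        """Return 'VbSS' or 'RDP' for the current state of the session."""
        vbss_allowed = (
            all_support_vbss        # every endpoint must support VbSS
            and sharing_desktop     # not a Window (Program) or PowerPoint
            and not remote_control  # VbSS is view-only
        )
        if not vbss_allowed:
            self._switched_to_rdp = True
        return "RDP" if self._switched_to_rdp else "VbSS"


share = ContentShare()
first = share.method(True, True, False)   # desktop share, all clients capable
second = share.method(True, True, True)   # someone takes remote control
third = share.method(True, True, False)   # control released, same session
print(first, second, third)
```

Stopping and restarting the share corresponds to creating a new ContentShare, at which point VbSS becomes available again.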

Hence, in a trace of Skype for Business 2016 application sharing you will see both RDP and VbSS being advertised in the SDP (Session Description Protocol) statement. This is so that the session can seamlessly switch from using VbSS to RDP.

The first part, which will look something like the following, shows the new a=x-mediabw session level attribute together with the m=applicationsharing section that advertises RDP:

Content-Type: application/sdp
Content-Transfer-Encoding: 7bit
Content-ID: <037525b415ce510283b11781d2fe0dc8>
Content-Disposition: session; handling=optional
o=- 0 1 IN IP4
c=IN IP4
t=0 0
a=x-mediabw:applicationsharing-video send=12000;recv=12000
m=applicationsharing 58956 TCP/RTP/AVP 127
--- full list of TCP candidates for RDP and crypto data ---
a=rtpmap:127 x-data/90000

m=applicationsharing 58956 TCP/RTP/AVP 127 indicates that this particular applicationsharing stream uses IP Port 58956, with RTP embedded in TCP packets, and is assigned a Payload Type of 127.
a=rtpmap:127 x-data/90000 indicates that Payload Type 127 is mapped to the x-data format with a clock rate of 90000 Hz.
a=x-applicationsharing-media-type:rdp indicates that the actual media is RDP.

The next part shows the additional VbSS section, which advertises the new media session using the m=video statement and the two new a=label:applicationsharing-video and a=x-mediasettings: attributes for VbSS.

m=video 56573 RTP/AVP 122 123
c=IN IP4
a=rtcp-fb:* x-message app send:src,x-pli recv:src,x-pli
--- full list of UDP and TCP candidates for VbSS and crypto data ---
a=rtpmap:122 X-H264UC/90000
a=fmtp:122 packetization-mode=1;mst-mode=NI-TC
a=rtpmap:123 x-ulpfecuc/90000

m=video 56573 RTP/AVP 122 123 indicates that this particular video stream uses IP Port 56573 with RTP and offers Payload Types 122 and 123, listed in order of preference. The a=rtpmap lines map these to X-H264UC and x-ulpfecuc respectively.
a=rtpmap:122 X-H264UC indicates that 122 is the PT for Microsoft's specific version of H.264 SVC, the same that is used for 'talking heads' video.
/90000 indicates the clock rate and is used by all the listed Lync video codecs.
packetization-mode=1 indicates that UCConfig Mode 1 is the maximum supported SVC scalability mode.

UCConfig Mode 1 is the ability to encode separate temporal layers within a single video stream per resolution. That said, with VbSS content sharing, there is only one resolution.

The maximum resolution that H.264 SVC supports is 1920x1080 (1080p) at 30 fps. But this is not advertised in the SDP statement and there is no easy way to identify what it may be. H.264 SVC requires encoding and decoding by the respective SfB clients, so the resolution that can be achieved really depends on the hardware capabilities of the SfB clients.
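
When reading a trace, it can help to check which sharing methods an SDP statement advertises. The Python sketch below looks for the two markers described in this paper: the a=x-applicationsharing-media-type:rdp attribute under m=applicationsharing, and the a=label:applicationsharing-video attribute under m=video. The function name is hypothetical, and SAMPLE_SDP is a shortened sample built from the lines discussed above, with the candidate and crypto lines left out and the position of the a=label line assumed.

```python
def advertised_sharing_methods(sdp_text):
    """Return the content sharing methods found in an SDP statement."""
    methods = []
    current_media = None
    for raw_line in sdp_text.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # Remember which media section the following a= lines belong to.
            current_media = line[2:].split()[0]
        elif (current_media == "applicationsharing"
              and line == "a=x-applicationsharing-media-type:rdp"):
            methods.append("RDP")
        elif (current_media == "video"
              and line == "a=label:applicationsharing-video"):
            methods.append("VbSS")
    return methods


# Shortened sample; attribute order within each section is an assumption.
SAMPLE_SDP = """
m=applicationsharing 58956 TCP/RTP/AVP 127
a=rtpmap:127 x-data/90000
a=x-applicationsharing-media-type:rdp
m=video 56573 RTP/AVP 122 123
a=label:applicationsharing-video
a=rtpmap:122 X-H264UC/90000
a=fmtp:122 packetization-mode=1;mst-mode=NI-TC
a=rtpmap:123 x-ulpfecuc/90000
"""

print(advertised_sharing_methods(SAMPLE_SDP))
```

An m=video section without the applicationsharing-video label is treated as ordinary 'talking heads' video and is not reported as VbSS.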

Issues when Sharing between SIP and Skype for Business:

SIP (and H.323) standards based videoconferencing applications typically DO NOT use RDP or VbSS. Hence, this presents one of the many challenges that need to be overcome when integrating Skype for Business and/or Lync 2013 clients with SIP (and/or H.323) videoconferencing systems, especially if you want the SIP endpoint to display (see) the Skype for Business client's shared application.

Sending content in the other direction, sharing the SIP (or H.323) endpoint's desktop or applications with a Skype for Business or Lync 2013 client, is a slightly lesser issue. These endpoints typically use BFCP or H.239, which effectively sends the desktop or application as a second video stream that the Skype for Business or Lync 2013 client can understand and display in either a second window or in place of the 'talking heads' video whilst the sharing is active.

The main issue is not with the media (video, audio, data) streams used by the different endpoints, it's with the signalling (protocol) streams. One solution is to use additional video infrastructure to transcode between the different endpoints respective media and signalling streams. But this can be expensive, especially if you have to provide and support your own On-Premise infrastructure. An alternative to your own infrastructure is to use a Cloud solution such as the Lifesize Cloud that provides everything on a subscription basis.

The other option is to use an endpoint that supports H.323 and native integration with Skype for Business. The Polycom RealPresence Group series with the Microsoft Integration option provides such a solution. With the latest software (v6.1.7) and integration option, the RealPresence Group natively supports both Microsoft RDP and VbSS signalling and Microsoft's H.264 SVC media. It can now register with either Skype for Business On-Premise servers or use an Office 365 subscription. To these servers, the RealPresence Group system looks like any other Skype for Business 2016 client. Hence, the RealPresence Group system can make either H.323 calls to other H.323 endpoints or Skype for Business calls with other Skype for Business clients.

If the RealPresence Group system is in a MS-SIP conference with a Skype for Business 2016 client, both can use VbSS to send or receive Desktop Sharing; and this can be up to 1080p30 if they use H.264 SVC media.

Note: the RealPresence Group system cannot make generic SIP calls if it is registered with Skype for Business. This is because Skype for Business uses Microsoft's version of SIP (MS-SIP) and the RealPresence Group can only be configured to use one version of SIP, either MS-SIP or generic SIP.