Appendix B: Skype® for Business 2015 Video and Audio Codecs
This paper provides background on the Video and Audio Codecs used by Skype® for Business 2015 and is Appendix B of a series that specifically looks at Microsoft® Skype for Business 2015 (Lync® 2013) and the challenges and solutions for integrating Skype for Business 2015 with H.323 or SIP standards-compliant videoconferencing systems. Hence, it focuses on the codecs used in A/V Conferencing and Application Sharing.
We will look at each of the video and audio codecs available to Skype for Business 2015 for A/V Conferencing and Application Sharing; this will then highlight the differences and challenges that need to be resolved when integrating with H.323 or SIP-based systems.
Within these papers, the terms Lync, Skype, Skype for Business and SfB, unless stated otherwise, all refer to Skype for Business Server 2015. The paper is specifically based on Skype for Business 2015. Whilst Lync 2013 has now been renamed Skype for Business 2015, it is generally backwards compatible with Lync Server 2013.
It is recommended that you look at all the papers listed below for a background into Skype for Business and a detailed explanation of the Codecs, Protocols, Procedures and some of the available solutions.
- Part 1: How Skype for Business 2015 - (Lync 2013) can be Deployed.
- Part 2: Skype for Business 2015 Servers, Roles and their Functions.
- Part 3: Networks & Protocols used by Skype for Business 2015 - (Lync 2013).
- Part 4: Lifesize Cloud integration with Skype for Business 2015 - (Lync 2013).
- Part 5: Polycom Endpoints Native Integration with Skype for Business 2015.
- Part 6: Polycom RealConnect Interoperability with Skype for Business 2015.
- Appendix A: H.264 Video Codecs and UCConfig Modes.
- Appendix C: Video and Audio Codecs used by H.323 and SIP Compliant VC systems.
- Appendix D: How to check what Skype for Business codecs your PC supports.
- Appendix E: How well does your PC support Skype for Business 2015 H.264/SVC.
Microsoft Lync is an evolutionary product for Unified Communications (UC). The initial product, Live Communications Server 2003, was only an Instant Messaging (IM) server. This then evolved through several iterations of Live Communications Server to Office Communications Server and then to Lync Server 2010, when a PBX replacement function was added. It then evolved even further to Lync Server 2013, which added much more, including video conferencing, web and audio conferencing, softphone and PBX replacement and/or integration. Now, Microsoft has renamed Lync to Skype for Business.
Skype for Business Video and Audio Codecs:
We now know that for native integration, a third-party endpoint must share common audio and video codecs with Skype for Business. So let's take a closer look at the latest codecs in Lync 2013 and Skype for Business 2015.
Determining what codecs are used by Skype for Business:
From Appendix A, we know that there are 5 UCConfig Modes (levels) and that Microsoft uses the H.264 SVC technology developed by Polycom. However, in its implementation of H.264 SVC for Skype for Business 2015 and Lync 2013, Microsoft appears to use only Mode 1.
According to the Microsoft document 184.108.40.206 Video Source Request (VSR), which relates to their H.264 SVC, there is a parameter that sets the maximum UCConfig Mode the receiver can support. This document states that values of 2 or higher MUST NOT be used. It also states that a value of 0 MUST NOT be used; therefore the only valid UCConfig Mode is 1: SVC Temporal Scalability with Hierarchical P.
Furthermore, from Appendix D, we know that in a communications trace of the latest Skype for Business 2015 (15.0.4737.1000), the SDP statement in the SIP INVITE captured in the SfB client's Lync-UccApi.UccApilog log file lists all the supported audio and video codecs.
Real-Time Protocol Audio Video Profile - RTP/AVP:
In the RTP/AVP (RTP Audio Video Profile) statement, the format is:
The number assigned to each codec is referred to as its PT or Payload Type. It identifies the actual codec against its name. These numbers range from 0 to 127. Each Payload Type number falls into one of four categories: reserved, assigned to a specific codec, unassigned or dynamic. The dynamic range is 96-127, so as the name implies, the PT number allocated to a specific codec in this range can change between different conference sessions. For example, in the trace below, 122 relates to X-H264UC, which is Microsoft's specific version of H.264/SVC. But as it is in the dynamic range, be aware that in another trace, any number in the 96-127 range could relate to X-H264UC.
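As a minimal sketch of how the Payload Type numbers are resolved, the following Python reads an m= line and its a=rtpmap attributes and reports whether each PT falls in the dynamic range. The SDP text reuses only the video lines quoted in this paper; the function names parse_media and is_dynamic are illustrative assumptions and not part of any Microsoft tool.

```python
import re

# Video lines as quoted in this paper; a real trace carries more attributes.
SDP = """\
m=video 52884 RTP/AVP 122 121 123
a=rtpmap:122 X-H264UC/90000
a=rtpmap:121 x-rtvc1/90000
a=rtpmap:123 x-ulpfecuc/90000
"""

def parse_media(sdp):
    """Return (preference order of PTs, {PT: (codec name, clock rate)})."""
    order = []
    names = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <transport> <PT> <PT> ...
            fields = line[2:].split()
            order = [int(pt) for pt in fields[3:]]
            continue
        match = re.match(r"a=rtpmap:(\d+) ([^/\s]+)/(\d+)", line)
        if match:
            names[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return order, names

def is_dynamic(pt):
    """PTs 96-127 are dynamic: the codec they map to can differ per session."""
    return 96 <= pt <= 127

order, names = parse_media(SDP)
for pt in order:
    name, clock = names[pt]
    kind = "dynamic" if is_dynamic(pt) else "not dynamic"
    print(pt, name, clock, kind)
```

Because the mapping is rebuilt from the rtpmap lines of each session, the sketch never hard-codes 122 as X-H264UC, which is the point of the dynamic range.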
Trace of the Video codecs:
The video trace would typically look like:
Microsoft's H.264 SVC:
m=video 52884 RTP/AVP 122 121 123 indicates that the order of preference for the video codecs is 122, 121, 123.
a=rtpmap:122 X-H264UC indicates that 122 is the PT for Microsoft's specific version of H.264 SVC.
/90000 indicates the clock rate, which is used by all the listed Lync video codecs.
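packetization-mode=1 is read here as indicating that UCConfig Mode 1 is the maximum supported SVC scalability mode. Note that in standard H.264 RTP signalling (RFC 6184), packetization-mode=1 denotes the non-interleaved packetization mode.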
UCConfig Mode 1 is the ability to encode separate temporal layers within a single video stream per resolution.
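To picture what separate temporal layers within a single stream means, here is a small sketch that assigns frames to layers in a dyadic hierarchy. This is a common textbook arrangement and is an assumption for illustration only; the paper does not state how many layers Skype for Business uses or how it orders them. The names temporal_layer and decodable_fps are hypothetical.

```python
def temporal_layer(frame_index, num_layers):
    """Layer id of a frame in a dyadic temporal hierarchy (0 = base layer)."""
    period = 1 << (num_layers - 1)      # frames between base-layer frames
    pos = frame_index % period
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_layers - 1 - trailing_zeros

def decodable_fps(full_fps, num_layers, max_layer):
    """Frame rate left when only layers 0..max_layer are kept."""
    return full_fps / (1 << (num_layers - 1 - max_layer))

layers = [temporal_layer(i, 3) for i in range(8)]
print(layers)
for kept in range(3):
    print("layers 0..%d ->" % kept, decodable_fps(30, 3, kept), "fps")
```

In this arrangement a receiver or a conferencing server can discard the highest layer to halve the frame rate without re-encoding, which is the practical benefit of temporal scalability.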
However, whilst both Lync 2013 and Skype for Business 2015 prefer to use Microsoft's H.264 SVC, which currently supports a maximum of UCConfig Mode 1, future updates and versions of Skype for Business might support higher UCConfig Modes.
The maximum resolution that H.264 SVC supports is 1920x1080 (1080p) at 30 fps. But this is not advertised in the SDP statement and there is no easy way to identify what the maximum for a given client may be. H.264 SVC requires encoding and decoding by the respective SfB clients, so the resolution that can be achieved depends on the SfB client's hardware capabilities. Please see: Appendix E: How well does your PC support Skype for Business 2015 H.264/SVC
Microsoft's Real-Time Video - RTV:
a=rtpmap:121 x-rtvc1 indicates that PT 121 is assigned to Microsoft's Real-Time Video (RTV) codec and that if H.264/SVC is not supported by the receiving endpoint, then RTV is the second choice.
Unlike the a=rtpmap:122 X-H264UC line relating to H.264/SVC, the section on RTV provides much more detail about the Skype for Business client's RTV capabilities in the associated a=x-caps: list.
These indicate that this (Skype for Business 2015) client can support RTV at the following resolutions:
1920x1080; 1280x720; 640x480; 640x360; 352x288; 424x240 and 176x144
Note: Lync 2010 clients can only support RTV with a maximum 1280x720 resolution at 30 frames per second.
Uneven Level Protection FEC:
a=rtpmap:123 x-ulpfecuc/90000 indicates that PT 123 is assigned to Uneven Level Protection FEC. This is used by Lync 2013 clients for out-of-band forward error correction data that is sent separately from the main video stream.
Trace of the Audio codecs:
The audio trace would typically look like:
a=fmtp:104 useinbandfec=1; usedtx=0
a=fmtp:103 useinbandfec=1; usedtx=0
G.722 Stereo is a derivative of G.722 designed to support Lync Room devices that use two microphones for stereo audio pickup.
a=rtpmap:117 G722/8000/2 indicates that PT 117 is assigned to G.722 Stereo. The 8000 is the clock rate, but as with G.722, the Internet Assigned Numbers Authority (IANA) records the clock rate as 8000 when the actual sampling rate is 16000Hz. This is caused by an error in RFC 1890, but to preserve compatibility, Skype for Business 2015 and Lync 2013 clients must declare 8000. The /2 indicates that there are 2 separate audio channels for stereo.
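The rtpmap layout described above (payload type, codec name, clock rate and an optional channel count) can be sketched as follows. The helper names parse_rtpmap and actual_sampling_rate are assumptions for illustration; the only special case encoded is the G.722 clock-rate quirk described in the text.

```python
def parse_rtpmap(line):
    """Split 'a=rtpmap:<PT> <name>/<clock>[/<channels>]' into its parts."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute: %r" % line)
    pt_text, encoding = line[len(prefix):].split(None, 1)
    parts = encoding.strip().split("/")
    name = parts[0]
    clock = int(parts[1])
    # The channel count is optional and defaults to 1 (mono).
    channels = int(parts[2]) if len(parts) > 2 else 1
    return int(pt_text), name, clock, channels

def actual_sampling_rate(name, declared_clock):
    """G.722 is declared as 8000 but is actually sampled at 16000 Hz."""
    return 16000 if name.upper() == "G722" else declared_clock

pt, name, clock, channels = parse_rtpmap("a=rtpmap:117 G722/8000/2")
print(pt, name, clock, channels, actual_sampling_rate(name, clock))
```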
SILK was developed specifically for Skype to replace the SVOPC audio codec used by Skype clients prior to version 4.X, and would be used for Lync 2013 <> Skype audio calls.
a=rtpmap:104 SILK/16000 indicates that PT 104 is assigned to SILK with a clock rate of 16000Hz. The associated a=fmtp:104 useinbandfec=1; usedtx=0 indicates that it supports in-band FEC, meaning that the additional error correction data is carried within the media stream itself.
Similarly, a=rtpmap:103 SILK/8000 indicates that PT 103 is assigned to SILK with a clock rate of 8000Hz, and its associated a=fmtp:103 useinbandfec=1; usedtx=0 indicates that it also supports in-band FEC.
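As a rough sketch, the fmtp parameters for SILK can be read into a dictionary so that the in-band FEC setting can be checked per payload type. The function name parse_fmtp is an assumption; the input lines are the two fmtp lines quoted in this paper.

```python
def parse_fmtp(line):
    """Split 'a=fmtp:<PT> key=value; key=value' into (PT, {key: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute: %r" % line)
    pt_text, params_text = line[len(prefix):].split(None, 1)
    params = {}
    for item in params_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        params[key.strip()] = value.strip()
    return int(pt_text), params

for line in ("a=fmtp:104 useinbandfec=1; usedtx=0",
             "a=fmtp:103 useinbandfec=1; usedtx=0"):
    pt, params = parse_fmtp(line)
    print(pt, "in-band FEC:", params.get("useinbandfec") == "1")
```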
Microsoft's Real-Time Audio - RTA:
RTA is Microsoft's proprietary audio codec and has both wide-band (16000Hz) and narrow-band (8000Hz) derivatives. Microsoft has made RTA available under license for use by third-party clients and devices.
a=rtpmap:114 x-msrta/16000 indicates that PT 114 is assigned to RTA wide-band (16000Hz). The wide-band version is commonly used for in-house peer-to-peer calls.
a=rtpmap:115 x-msrta/8000 indicates that PT 115 is assigned to RTA narrow-band (8000Hz). This version is typically used by the Skype for Business 2015 (Lync 2013) client during outbound PSTN calls to the Mediation Server when the available bandwidth is limited; but this then requires the Mediation Server to transcode RTA<>G.711 for the PSTN connection.
If there is plenty of bandwidth available, then the Skype for Business 2015 (Lync 2013) client would typically send G.711 directly to the Mediation Server so that it does not have to perform any audio transcoding. Even better, if Media Bypass can be applied, then the Skype for Business 2015 (Lync 2013) client can send G.711 audio directly to the relevant IP PBX, SBC or Media Gateway. For more details about Media Bypass and the Mediation Server, please see Part 2: Skype for Business 2015 Servers, Roles and their Functions.
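The three PSTN cases described above can be summarised in a short decision sketch. It only restates the text; the function name pstn_audio_path and its two boolean inputs are hypothetical, and the real client bases its choice on bandwidth policy and negotiation details not covered here. The text does not say what happens when bandwidth is limited and Media Bypass is also available, so the sketch arbitrarily lets the bandwidth limit take priority.

```python
def pstn_audio_path(bandwidth_limited, media_bypass):
    """Return the codec, destination and transcoding need for a PSTN call."""
    if bandwidth_limited:
        # Narrow-band RTA saves bandwidth but must be transcoded to G.711.
        return {"codec": "x-msrta/8000",
                "sent_to": "Mediation Server",
                "transcode": True}
    if media_bypass:
        # G.711 goes straight to the gateway, skipping the Mediation Server.
        return {"codec": "G.711",
                "sent_to": "IP PBX, SBC or Media Gateway",
                "transcode": False}
    # Plenty of bandwidth, no bypass: G.711 to the Mediation Server as-is.
    return {"codec": "G.711",
            "sent_to": "Mediation Server",
            "transcode": False}

print(pstn_audio_path(bandwidth_limited=True, media_bypass=False))
print(pstn_audio_path(bandwidth_limited=False, media_bypass=True))
```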
G.722 is a freely available ITU-T standard 7 kHz wide-band audio codec operating at 48, 56 and 64 kbps, but in practice, data is typically encoded at 64kbps. G.722 is used for VoIP applications on local area networks where bandwidth is readily available. G.722 offers a significant improvement in audio quality when compared to narrow-band codecs such as G.711.
a=rtpmap:9 G722/8000 indicates that PT 9 is assigned to G.722; but as previously mentioned, the IANA records the clock rate as 8000 when it's actually 16000Hz.
G.722.1 is another ITU-T 7 kHz wide-band audio codec operating at 24 and 32 kbps. However, G.722.1 is not a derivative of G.722. It is actually based on Polycom's old Siren 7 codec that they used in the ViewStation.
a=rtpmap:112 G7221/16000 indicates that PT 112 is assigned to G.722.1 with a clock rate of 16000Hz.
SIREN is a family of patented audio codecs originally developed and licensed by PictureTel, who were then acquired by Polycom. There are currently three derivatives of SIREN, namely SIREN 7, SIREN 14 and SIREN 22 and, as their names imply, they support audio bandwidths of 7, 14 and 22 kHz respectively.
In this example SIREN actually refers to SIREN 7. As the codec on which G.722.1 is based, it provides 7 kHz audio, but at 16, 24 and 32 kbps (G.722.1 only operates at 24 and 32 kbps).
a=rtpmap:111 SIREN/16000 indicates that PT 111 is assigned to SIREN with a clock rate of 16000Hz.
G.711 is the ITU-T audio standard that must be supported under the umbrella of the H.320 and H.323 video conferencing standards. Also known as Pulse Code Modulation (PCM), G.711 is a commonly used audio codec where the 300-3400 Hz analogue audio is sampled at a rate of 8000Hz and encoded to provide toll-quality audio in a 64 kbps stream. There are two versions: PCMU (µ-law), which is mainly used in North America, and PCMA (A-law), which is used in most other countries.
a=rtpmap:0 PCMU/8000 indicates that PT 0 is assigned to PCMU (µ-law) with a clock rate of 8000Hz.
a=rtpmap:8 PCMA/8000 indicates that PT 8 is assigned to PCMA (A-law) with a clock rate of 8000Hz.
G.726 is an ITU-T Adaptive Differential Pulse Code Modulation (ADPCM) speech codec that covers voice at 16, 24, 32 and 40 kbps. G.726 supersedes both G.721 (ADPCM at 32 kbps) and G.723 (ADPCM at 24 & 40 kbps). The most common rate is 32 kbps, which effectively halves the required bandwidth compared to G.711. G.726 is mainly used on international trunks in phone networks and is the audio codec for DECT wireless phones.
a=rtpmap:116 AAL2-G726-32/8000 indicates that PT 116 is assigned to a G.726 stream running at 32 kbps with a clock rate of 8000Hz.
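The bit rates quoted for G.711 and G.726 follow directly from the 8000Hz sampling rate multiplied by the number of bits kept per sample. The sketch below works through that arithmetic, assuming 8 bits per sample for G.711 and 2, 3, 4 or 5 bits per sample for the four G.726 rates; the function name bit_rate_kbps is illustrative.

```python
SAMPLE_RATE_HZ = 8000  # narrow-band telephony sampling rate

def bit_rate_kbps(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Payload bit rate in kbps, excluding RTP/UDP/IP overhead."""
    return bits_per_sample * sample_rate_hz / 1000

g711 = bit_rate_kbps(8)
g726 = {bits: bit_rate_kbps(bits) for bits in (2, 3, 4, 5)}

print("G.711:", g711, "kbps")
print("G.726:", g726)
print("G.726-32 relative to G.711:", g726[4] / g711)
```

These figures cover the codec payload only; packet headers add to the bandwidth actually consumed on the network.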
Redundant Audio Data - RED:
Redundant Audio Data - RED is used for out of band forward error correction.
a=rtpmap:97 RED/8000 indicates that PT 97 is assigned to RED with a clock rate of 8000Hz.
Comfort Noise (CN):
Comfort Noise (CN) is synthetic background noise used in audio communications to fill the periods of silence in a transmission, to prevent users from mistakenly thinking that the connection has been dropped.
a=rtpmap:13 CN/8000 indicates that PT 13 is assigned to Comfort Noise with a clock rate of 8000Hz.
a=rtpmap:118 CN/16000 indicates that PT 118 is assigned to Comfort Noise with a clock rate of 16000Hz.
DTMF (Dual-Tone Multi-Frequency):
DTMF (Dual-Tone Multi-Frequency) signals are used to support the telephone events (functions) associated with pushing the dial-pad buttons during a call.
There are 16 standard tones assigned to 0-9, *, # plus the four AUTOVON military tones defined as A, B, C and D. The unique tones created by each key are represented by the values 0-15. Skype for Business 2015 and Lync 2013 clients nevertheless advertise 17 events (0-16), as shown in the associated fmtp attribute. The trace does not say why; one likely explanation is that RFC 2833 assigns event 16 to the hook Flash signal.
a=rtpmap:101 telephone-event/8000 indicates that PT 101 is assigned to sending DTMF signals.
a=fmtp:101 0-16 indicates that there are 17 (0-16) unique tones.
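A short sketch of the event numbering described above: it maps the values 0-15 to the 16 DTMF keys and expands the advertised fmtp range so that any event beyond the DTMF set stands out. The names DTMF_EVENTS and parse_event_range are assumptions for illustration.

```python
# Values 0-15 and the dial-pad keys they represent.
DTMF_EVENTS = {value: str(value) for value in range(10)}
DTMF_EVENTS.update({10: "*", 11: "#", 12: "A", 13: "B", 14: "C", 15: "D"})

def parse_event_range(line):
    """Turn 'a=fmtp:<PT> <low>-<high>' into (PT, range of event values)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute: %r" % line)
    pt_text, spec = line[len(prefix):].split()
    low, high = (int(part) for part in spec.split("-"))
    return int(pt_text), range(low, high + 1)

pt, events = parse_event_range("a=fmtp:101 0-16")
extra = [value for value in events if value not in DTMF_EVENTS]
print(pt, len(events), "events advertised; outside the DTMF set:", extra)
```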
Trace of the Application Sharing:
The applicationsharing trace would typically look like:
Microsoft's Remote Desktop Protocol - RDP:
Microsoft developed their proprietary Remote Desktop Protocol - RDP and used it within Lync 2013 and Skype for Business 2015 for Application Sharing between clients. RDP is an extension of the ITU-T T.128 standard for Multipoint Application Sharing that sits under the T.120 umbrella standard, which was originally used by Microsoft's NetMeeting application.
m=applicationsharing 58545 TCP/RTP/AVP 127 indicates that this particular applicationsharing stream is over IP Port 58545 using RTP that is embedded in TCP packets, and is assigned a Payload Type of 127.
a=rtpmap:127 x-data/90000 indicates that PT 127 refers to sending the data at a clock rate of 90000Hz.
a=x-applicationsharing-media-type:rdp indicates that the actual media is RDP
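The three application sharing lines above can be pulled apart with a few lines of Python. This is a minimal sketch over exactly the lines quoted in this paper; the function name parse_appsharing and the dictionary keys are assumptions, and a real trace contains further attributes that are ignored here.

```python
APP_SHARING_SDP = """\
m=applicationsharing 58545 TCP/RTP/AVP 127
a=rtpmap:127 x-data/90000
a=x-applicationsharing-media-type:rdp
"""

def parse_appsharing(sdp):
    """Extract port, transport, payload type and media type from the SDP."""
    info = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=applicationsharing"):
            _, port, transport, pt = line[2:].split()
            info["port"] = int(port)
            info["transport"] = transport
            info["payload_type"] = int(pt)
        elif line.startswith("a=x-applicationsharing-media-type:"):
            info["media_type"] = line.split(":", 1)[1]
    return info

info = parse_appsharing(APP_SHARING_SDP)
print(info)
```

A gateway that sees media_type set to rdp knows it cannot simply forward the stream to a standards-based endpoint, which is the interoperability challenge noted below.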
Note: Other SIP and H.323 standards-based videoconferencing applications typically DO NOT use RDP for Application Sharing. Hence, this presents one of the many challenges that need to be overcome when integrating Skype for Business 2015 and Lync 2013 clients with these other videoconferencing applications, especially if you want to share the Skype for Business client's applications with the other SIP or H.323 participants.
Sharing the SIP or H.323 endpoint's Desktop or Applications with the Skype for Business 2015 or Lync 2013 clients is a lesser issue, as these endpoints typically use BFCP or H.239, which effectively sends the Desktop or Application as a second video stream that the Skype for Business 2015 or Lync 2013 client can understand and display in either a second window or in place of the 'talking heads' video whilst the sharing is active.
For a complete picture, please take a closer look at all the other papers in this series about Skype for Business 2015.
220.127.116.11 Video Source Request (VSR) "http://msdn.microsoft.com/en-us/library/hh659630.aspx"
Media Codecs in Lync 2013 "http://blog.schertz.name/2014/03/media-codecs-in-lync-2013/"
List of codecs "https://en.wikipedia.org/wiki/List_of_codecs"
Microsoft Lync Server 2013 Unleashed. ISBN-13 978-0-672-33615-7