General H.323 and SIP Firewall Issues and Protocols:
The table above shows that H.323 and SIP require the use of specific static ports as well as a number of dynamic
ports within the range 1024-65535. For H.323 and SIP traffic to cross a firewall, the specific static ports and every
port within the dynamic range would have to be opened to all traffic. This clearly creates a security issue that could
render the firewall ineffective.
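To put the scale of that exposure in numbers, the short Python sketch below counts the distinct ports a "static ports plus full dynamic range" rule set leaves open for each transport. The static port list is an illustrative assumption drawn from the ports named in this article, not a complete inventory.

```python
# Sketch: how many ports a "static ports + full dynamic range" rule set exposes.
# The static list is illustrative, taken from ports named in this article.
STATIC_PORTS = {
    ("tcp", 1720),  # H.323 call signalling (Q.931)
    ("udp", 1719),  # H.323 RAS
    ("udp", 5060),  # SIP
    ("tcp", 5060),  # SIP
    ("tcp", 5061),  # SIP over TLS
}
DYNAMIC_RANGE = range(1024, 65536)  # 1024-65535 inclusive


def exposed_ports(transport: str) -> int:
    """Count the distinct ports left open for one transport protocol."""
    static = {port for proto, port in STATIC_PORTS if proto == transport}
    return len(static | set(DYNAMIC_RANGE))


print(exposed_ports("tcp"), exposed_ports("udp"))
```

Because the well-known ports all fall inside 1024-65535, opening the full range already covers them, so each transport ends up with 64,512 open ports.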
There are several standards-based transport protocols used within H.323 and SIP conferencing. Each packages the
data into packets, with each packet carrying a 'header' that identifies its contents. The protocol chosen is usually
determined by whether reliable or unreliable delivery is needed. Transmission Control Protocol (TCP) is a reliable,
connection-oriented protocol: it detects lost data and retransmits it. It guarantees sequenced, error-free delivery,
but waiting for retransmissions can cause delays and reduced throughput, which is particularly noticeable with audio.
User Datagram Protocol (UDP) is, by contrast, an unreliable protocol within the IP stack in which lost data is simply
dropped in preference to maintaining the flow. Real-time Transport Protocol (RTP) was developed to carry streaming
audio and video and can be used over unicast or IP Multicast. RTP normally runs on top of UDP and adds its own header
containing a timestamp and sequence number. This extra information allows the receiving client to re-order
out-of-sequence packets, discard duplicates and synchronise audio and video after an initial buffering period. The RTP
Control Protocol (RTCP) runs alongside RTP, reporting reception quality such as packet loss and jitter and carrying
the timing information used to keep audio and video in sync.
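As a minimal sketch of how a receiver might use the sequence number and timestamp, the Python below parses the 12-byte fixed RTP header defined in RFC 3550 and then re-orders a burst of packets, discarding a duplicate. The `make_packet` helper, the SSRC value and the toy `reorder` buffer are hypothetical illustrations; a real jitter buffer must also handle sequence-number wrap-around, playout timing and CSRC or extension headers.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RtpHeader:
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550); CSRCs and extensions are ignored."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,          # top two bits of the first byte
        marker=bool(b1 & 0x80),   # top bit of the second byte
        payload_type=b1 & 0x7F,   # remaining seven bits
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def make_packet(seq: int, timestamp: int, payload: bytes = b"") -> bytes:
    """Hypothetical helper: version 2, payload type 0 (PCMU), fixed SSRC."""
    return struct.pack("!BBHII", 0x80, 0, seq, timestamp, 0x1234ABCD) + payload


def reorder(packets: List[bytes]) -> List[int]:
    """Toy jitter buffer: drop duplicate sequence numbers, then sort.
    Ignores 16-bit sequence wrap-around, which a real buffer must handle."""
    buffered = {}
    for pkt in packets:
        hdr = parse_rtp_header(pkt)
        buffered.setdefault(hdr.sequence, hdr)  # first copy wins, duplicates dropped
    return sorted(buffered)


# Packets 1-4 arrive out of order, with packet 2 duplicated.
arrived = [make_packet(s, s * 160) for s in (3, 1, 2, 2, 4)]
print(reorder(arrived))  # [1, 2, 3, 4]
```

The timestamp step of 160 corresponds to 20 ms of audio at an 8 kHz sampling clock.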
Reliable transport is required for control signals and data because they must be received in the proper order and
cannot be lost. Consequently, TCP is used for H.225 call signalling (Q.931) and the H.245 control channel. Unreliable
UDP is used for RAS and for the audio and video streams, where timeliness matters more than guaranteed delivery.
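The mapping of H.323 channels to transports can be summarised in a small lookup table, sketched here in Python. Note that H.225.0 call signalling (the Q.931 messages) runs over TCP port 1720 and only RAS uses UDP port 1719; the H.245 and media ports are negotiated per call and are shown as `None`. The names `Channel` and `H323_CHANNELS` are illustrative only.

```python
from typing import NamedTuple, Optional


class Channel(NamedTuple):
    transport: str
    well_known_port: Optional[int]  # None means negotiated per call


H323_CHANNELS = {
    "RAS (H.225.0)": Channel("udp", 1719),
    "Call signalling (H.225.0/Q.931)": Channel("tcp", 1720),
    "Call control (H.245)": Channel("tcp", None),
    "Audio/video media (RTP/RTCP)": Channel("udp", None),
}


def needs_reliable_transport(name: str) -> bool:
    """TCP channels are the ones that must arrive in order and without loss."""
    return H323_CHANNELS[name].transport == "tcp"


for name, channel in H323_CHANNELS.items():
    port = channel.well_known_port if channel.well_known_port is not None else "dynamic"
    print(f"{name:32} {channel.transport.upper():4} {port}")
```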
However, H.323 and SIP are not the same and should not be confused. They may share codecs such as H.264 video and
G.722.1C audio, be supported on the same video conferencing endpoints and use the same IP ports for media, but they
are fundamentally different protocols with different network and calling procedures (H.323 uses TCP on port 1720,
whereas SIP uses UDP or TCP on port 5060, or TCP with TLS on port 5061), and so they require different firewall
traversal solutions.
H.323 endpoints use H.460 NAT/Firewall Traversal whilst SIP endpoints use a SIP Registrar
to cross firewalls (see below for more details).
H.323 and Intelligent Firewalls:
Q.931 is the call signalling protocol used in setting up and terminating a call; H.323 carries it as H.225 call
signalling over TCP on port 1720. During call setup the endpoints then negotiate dynamic ports for H.245 call control
and data (TCP) and for audio and video (UDP). Clearly, opening every port within the dynamic range would cause
security issues, so the firewall must be able to allow H.323 traffic through on an intelligent basis. Some H.323-aware
('intelligent') firewalls can do this by snooping on the signalling and control channels to determine which dynamic
ports are being used, and then allowing traffic on only those ports while the call is active. However, most firewalls
that claim H.323 support simply open port 1720, and you have to add rules to open each endpoint's specific TCP and
UDP port ranges.
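The "snooping" behaviour of an H.323-aware firewall can be modelled as a table of temporary pinholes keyed by call. This is a toy sketch under stated assumptions: the class `PinholeFirewall`, its methods and the example port numbers are hypothetical, and the signalling inspection itself (parsing H.225 and H.245 messages to find the negotiated ports) is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Port = Tuple[str, int]  # (transport, port number)


@dataclass
class PinholeFirewall:
    """Toy H.323-aware firewall: a static rule for TCP 1720 plus
    per-call pinholes learned by inspecting the signalling."""
    static: Set[Port] = field(default_factory=lambda: {("tcp", 1720)})
    pinholes: Dict[str, Set[Port]] = field(default_factory=dict)

    def learn(self, call_id: str, transport: str, port: int) -> None:
        """Open a pinhole for a dynamic port seen in the call's signalling."""
        self.pinholes.setdefault(call_id, set()).add((transport, port))

    def end_call(self, call_id: str) -> None:
        """Close every pinhole belonging to a call that has been torn down."""
        self.pinholes.pop(call_id, None)

    def allows(self, transport: str, port: int) -> bool:
        if (transport, port) in self.static:
            return True
        return any((transport, port) in holes for holes in self.pinholes.values())


fw = PinholeFirewall()
fw.learn("call-1", "tcp", 3231)  # hypothetical H.245 port
fw.learn("call-1", "udp", 3240)  # hypothetical RTP port
print(fw.allows("udp", 3240))    # True while the call is up
fw.end_call("call-1")
print(fw.allows("udp", 3240))    # False once the call ends
```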
The latest releases of Polycom, Lifesize and ClearOne endpoint software all allow you to specify the dynamic port
ranges used for TCP and UDP. This lets you reduce the number of ports that need to be open, and hence the security
risk. Furthermore, these versions support 'Port Pinholing', so that inbound data can be returned using the same
port as the initiating outbound call. They also support H.460 NAT/Firewall Traversal (see below).
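The benefit of restricting the dynamic ranges is easy to quantify. The sketch below compares the full 1024-65535 range with a narrow configured range; the 3230-3253 range is a hypothetical value chosen for illustration, so check each vendor's documentation for its actual defaults.

```python
def ports_in(start: int, end: int) -> int:
    """Number of ports in an inclusive range."""
    if not 0 < start <= end <= 65535:
        raise ValueError("invalid port range")
    return end - start + 1


full = ports_in(1024, 65535)
restricted = ports_in(3230, 3253)  # hypothetical configured range
print(f"{full} ports open vs {restricted} ports open "
      f"({restricted / full:.3%} of the full range)")
```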
Using NAT to Enhance Security:
When H.323 terminals communicate directly with each other, they must have direct access to each other's IP address,
but this exposes key network information to a potential attacker. By locating the endpoints behind a firewall that
uses NAT, only the public addresses are exposed and most of the internal address information stays hidden.
However, conferencing successfully through a firewall depends on how well the firewall can deal with the
complexities of the H.323 protocol. If the firewall cannot provide dynamic access control based on the control
channel status, then NAT inside the firewall can be used to map an endpoint's internal non-routable IP address to a
public IP address and hence provide access control.
When you specify that an endpoint should use NAT, it embeds the firewall's outside-world IP address in its H.323
signalling messages (the addresses it advertises in H.225 and H.245). This is how the far-end system knows which
outside-world IP address to return the call to. The endpoint cannot advertise its internal IP address, as this is
non-routable and you want it hidden. On receiving inbound traffic, the firewall uses NAT to forward the traffic to
the endpoint. But using NAT can cause issues if you also want to connect over a VPN (see below).
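Strictly, the address the endpoint embeds is carried in its H.225 and H.245 signalling payload, while the firewall's NAT rewrites the packet's actual IP header. The Python sketch below models that choice of advertised address and checks whether an address is routable across the Internet. The function names and the example addresses (an RFC 1918 internal address and a documentation-range public address) are assumptions for illustration.

```python
import ipaddress

# RFC 1918 private blocks: not routable across the public Internet.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def advertised_address(internal_ip: str, firewall_public_ip: str, nat_enabled: bool) -> str:
    """Address the endpoint places in its H.225/H.245 signalling (sketch)."""
    # With NAT enabled the far end is told to send signalling and media to the firewall.
    return firewall_public_ip if nat_enabled else internal_ip


def routable_on_internet(address: str) -> bool:
    addr = ipaddress.ip_address(address)
    return not any(addr in net for net in RFC1918)


internal, public = "192.168.1.20", "203.0.113.10"
for nat in (False, True):
    addr = advertised_address(internal, public, nat)
    print(nat, addr, routable_on_internet(addr))
```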
NAT by itself with H.323 endpoints has a major limitation. Every H.323 endpoint listens on TCP port 1720 for incoming
calls, but a firewall can forward port 1720 on a given public address to only one internal address. To use NAT by
itself, you would therefore need a public IP address for every H.323 endpoint, which is clearly impractical if you
want to deploy several video conferencing devices.
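The one-to-one limitation can be shown with a toy static NAT table keyed by public address and TCP port. In this hypothetical sketch, forwarding TCP 1720 on one public address to a second endpoint is rejected, so each additional endpoint needs its own public address.

```python
from typing import Dict, Tuple


class StaticNat:
    """Toy static NAT: each (public IP, TCP port) forwards to exactly one internal host."""

    def __init__(self) -> None:
        self.forwards: Dict[Tuple[str, int], str] = {}

    def forward(self, public_ip: str, port: int, internal_ip: str) -> None:
        key = (public_ip, port)
        existing = self.forwards.get(key)
        if existing is not None and existing != internal_ip:
            raise ValueError(f"{public_ip}:{port} already forwards to {existing}")
        self.forwards[key] = internal_ip


nat = StaticNat()
nat.forward("203.0.113.10", 1720, "192.168.1.20")      # first endpoint: fine
try:
    nat.forward("203.0.113.10", 1720, "192.168.1.21")  # second endpoint, same address
except ValueError as err:
    print("rejected:", err)
nat.forward("203.0.113.11", 1720, "192.168.1.21")      # needs another public address
```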
This is where an H.323 Gatekeeper can be used. Since the Gatekeeper, via RAS on UDP port 1719 and call setup on TCP
port 1720, is the only system that interacts with H.323 devices outside the firewall, access rules in the firewall
can be set to pass traffic destined for the Gatekeeper or endpoint. But using an H.323 Gatekeeper by itself does
not provide a complete, secure solution. Ideally you need an H.460 NAT/Firewall Traversal solution that incorporates
an H.323 Gatekeeper (see below).
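A minimal sketch of such an access rule, assuming a hypothetical Gatekeeper address, permits outside-world H.323 signalling only when it is addressed to the Gatekeeper on UDP 1719 (RAS) or TCP 1720 (call setup). It deliberately covers only these signalling ports; the negotiated media ports would still need handling, which is one reason a Gatekeeper alone is not a complete solution.

```python
import ipaddress

GATEKEEPER = ipaddress.ip_address("192.168.1.5")   # hypothetical Gatekeeper address
GATEKEEPER_PORTS = {("udp", 1719), ("tcp", 1720)}  # RAS and call setup


def permit_inbound(dst_ip: str, transport: str, dst_port: int) -> bool:
    """Permit outside-world H.323 signalling only if it is addressed to the Gatekeeper."""
    return (ipaddress.ip_address(dst_ip) == GATEKEEPER
            and (transport, dst_port) in GATEKEEPER_PORTS)


print(permit_inbound("192.168.1.5", "udp", 1719))   # True: RAS to the Gatekeeper
print(permit_inbound("192.168.1.20", "tcp", 1720))  # False: direct call to an endpoint
```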
Using VPN or H.235 Encryption:
Creating a Virtual Private Network (VPN) by definition gives you your own private network, so as long as you stay
within this network, you do not need to traverse any firewalls. However, this is not always possible, and you may need
to conference with others outside your own VPN. This can cause a problem, because NAT is typically incompatible with
routers set up for a VPN.
To call an H.323 endpoint over a VPN, you call its IP address, which is usually on a different internal network
segment. With NAT enabled, the H.323 endpoint advertises the external IP address of the firewall in its signalling.
When you make a call over the VPN, this external address is still advertised, so the far-end system on the VPN will
try to return the call to the external address via the outside world rather than over the VPN. The call will fail,
typically with no audio or video. Calls will work to endpoints on the same internal network segment, but not to
endpoints on different segments. Disabling NAT on the endpoint allows calls over the VPN, but then you cannot call
outside-world endpoints. The solution is to use an H.460 NAT/Firewall Traversal device (see below).
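The failure mode reduces to a routing decision at the far end: signalling and media go over the VPN only if the advertised address falls within a subnet reachable through the tunnel. The sketch below assumes a hypothetical VPN route to 10.10.0.0/16 as seen from the far-end site, with example internal and public addresses.

```python
import ipaddress

# Hypothetical: subnets the far-end site can reach through the VPN tunnel.
VPN_ROUTES = [ipaddress.ip_network("10.10.0.0/16")]


def return_path(advertised: str) -> str:
    """Where the far end sends signalling and media for an advertised address."""
    addr = ipaddress.ip_address(advertised)
    if any(addr in net for net in VPN_ROUTES):
        return "vpn"
    return "internet"  # via the outside world, bypassing the VPN


print(return_path("10.10.1.20"))    # NAT disabled: internal address advertised
print(return_path("203.0.113.10"))  # NAT enabled: firewall's public address advertised
```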
When configuring the VPN, be wary of encryption settings, such as a long key, that add processing overhead, as this
can cause unacceptable delay in transmission between sites and impact the overall quality of the video conference.
Similarly, enabling the H.235-compliant AES encryption supported by most endpoints can affect the overall performance
of the conference, especially at low bandwidths.
H.460 NAT/Firewall Traversal:
As mentioned above, when H.323 endpoints are set to use NAT, the outside-world IP address of the firewall is
embedded in their signalling so that the far-end system knows where to return the call; this is part of complying
with the H.323 protocol. However, it typically causes a problem if you have several H.323 endpoints, or when you
want to call another H.323 endpoint over a VPN.
The solution is to implement H.460 NAT/Firewall Traversal or a Session Border Controller (SBC). These typically
consist of two boxes: one outside the firewall in the public domain and the other behind the firewall on the
internal network, which also incorporates an H.323 Gatekeeper function. That said, Edgewater Networks now have a
one-box solution that provides both an H.323 Gatekeeper and SIP Registrar.
As depicted in the H.460 NAT/Firewall Traversal solutions diagram below, the ClearOne Collaborate NetPoint outside
the firewall works in conjunction with ClearOne's Collaborate VCB behind the firewall to provide a two-box H.460
NAT/Firewall Traversal solution, with the Collaborate VCB including Collaborate Central as its embedded H.323
Gatekeeper.
Similarly, Polycom's RealPresence Access Director (RPAD) outside the firewall works in conjunction with its
Distributed Media Application (DMA) behind the firewall to provide an H.460 NAT/Firewall Traversal solution, with
the DMA also providing the H.323 Gatekeeper function. The Polycom DMA can also act as a gateway and transcode calls
between H.323 and SIP.
Edgewater Networks have a one-box solution with the EdgeProtect 4550 that provides both an H.323 Gatekeeper and SIP
Registrar. This is ideally suited for a small office, home office or when there are only a small number of video
conferencing endpoints that need protecting.
Most vendors have now implemented H.460 support in their latest endpoint software revisions.
H.323 endpoints behind the firewall then do not use NAT; they simply register their H.323 ID with the Gatekeeper
using their current internally allocated IP address. H.323 endpoints behind the firewall can then call each other
using their unique H.323 ID, alias or E.164 number, regardless of whether they are on a VPN. External (public)
H.323 endpoints initiate a conference to an endpoint behind the firewall by calling the public IP address of the
firewall traversal solution along with the specific endpoint's H.323 ID, alias or E.164 number.
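Registration and alias resolution can be sketched as a simple mapping from H.323 ID, alias or E.164 number to the endpoint's internal address. The `Gatekeeper` class is a toy model, not a real gatekeeper implementation, and the `alias@public-ip` dial-string format is a hypothetical convention; the exact syntax for dialling an alias at an address varies between vendors.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Gatekeeper:
    """Toy registry: H.323 ID, alias or E.164 number -> internal IP address."""
    registrations: Dict[str, str] = field(default_factory=dict)

    def register(self, internal_ip: str, *aliases: str) -> None:
        for alias in aliases:
            self.registrations[alias] = internal_ip

    def resolve(self, alias: str) -> Optional[str]:
        return self.registrations.get(alias)


def split_dial_string(dial: str) -> Tuple[str, str]:
    """Split a hypothetical 'alias@public-ip' dial string."""
    alias, _, public_ip = dial.partition("@")
    if not alias or not public_ip:
        raise ValueError("expected alias@public-ip")
    return alias, public_ip


gk = Gatekeeper()
gk.register("192.168.1.20", "boardroom", "4001")  # H.323 ID and E.164 number
alias, public_ip = split_dial_string("4001@203.0.113.10")
print(public_ip, "->", gk.resolve(alias))  # the traversal solution routes to 192.168.1.20
```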
Alternatively, some H.323 endpoints such as the Sony PCS-XG80 have two network interfaces, one that
supports NAT for connecting to the outside world and the other that doesn't for connecting internally.
SIP endpoints generally register using a secure login (User Name & Password) with a SIP Registrar.
This provides them with a unique URI that is then used to call the SIP endpoint. For example, a Polycom
HDX6000 might be allocated a URI of firstname.lastname@example.org which could then be called by other SIP
endpoints to initiate a conference.
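In SIP the secure login is normally HTTP Digest authentication (RFC 3261, reusing the RFC 2617 scheme): the registrar challenges with a nonce and the endpoint proves it knows the password without sending it. The sketch below computes the digest response without the optional `qop` parameter and stores the binding from the address-of-record to the endpoint's contact address. The `Registrar` class, user names and addresses are hypothetical, and it omits nonce expiry, `qop` and registration lifetimes that a real registrar would need.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Set


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(user: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """RFC 2617 digest response without qop, as reused by SIP (RFC 3261)."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


class Registrar:
    """Toy SIP registrar; a real one would store HA1 rather than passwords."""

    def __init__(self, realm: str, users: Dict[str, str]) -> None:
        self.realm = realm
        self.users = users                  # user name -> password
        self.nonces: Set[str] = set()       # outstanding challenges
        self.bindings: Dict[str, str] = {}  # address-of-record -> contact

    def challenge(self) -> str:
        nonce = secrets.token_hex(16)
        self.nonces.add(nonce)
        return nonce

    def register(self, user: str, aor: str, contact: str,
                 uri: str, nonce: str, response: str) -> bool:
        password = self.users.get(user)
        if password is None or nonce not in self.nonces:
            return False
        self.nonces.discard(nonce)          # in this sketch each nonce is single-use
        expected = digest_response(user, self.realm, password, "REGISTER", uri, nonce)
        if not hmac.compare_digest(expected, response):
            return False
        self.bindings[aor] = contact
        return True


reg = Registrar("example.org", {"firstname.lastname": "s3cret"})
nonce = reg.challenge()
resp = digest_response("firstname.lastname", "example.org", "s3cret",
                       "REGISTER", "sip:example.org", nonce)
ok = reg.register("firstname.lastname", "sip:firstname.lastname@example.org",
                  "sip:firstname.lastname@192.168.1.30", "sip:example.org", nonce, resp)
print(ok, reg.bindings)
```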
The InGate SIParator models are SIP Registrars that provide a secure SIP firewall traversal solution.
They have several network interfaces and would typically reside outside the firewall or in the firewall's DMZ.
The public network interface would be allocated a public IP address and any internal network interfaces would
be allocated a non-routable IP address. Each User ID also defines which network interface it will use at login,
hence securely separating URIs and devices on either side of the firewall. SIP traffic is routed through the InGate
SIParator, while the firewall continues to block other traffic. Alternatively, you may use a hosted SIP Registrar from a
The Polycom Distributed Media Application (DMA) can also act as a SIP Registrar and, when used in conjunction
with a Polycom RealPresence Access Director (RPAD), can provide a SIP firewall traversal solution.
As depicted in the diagram above, the Edgewater Networks EdgeProtect 4550 is a one-box solution that includes both
an H.323 Gatekeeper and SIP Registrar. It has several network interfaces and would typically reside outside the
firewall or in the firewall's DMZ. The EdgeProtect 4550 is ideally suited for a small office, home office
or when you require only a small number of simultaneous sessions.
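The per-User-ID interface binding described for the InGate SIParator can be sketched as a lookup that accepts a login only on the interface configured for that user. Classifying the arrival interface by source subnet, the user IDs and the 192.168.1.0/24 internal network are all hypothetical simplifications; a real device binds users to its own configured interfaces.

```python
import ipaddress

INTERNAL_NET = ipaddress.ip_network("192.168.1.0/24")  # hypothetical internal side

# Hypothetical configuration: each User ID is tied to one interface.
USER_INTERFACE = {
    "firstname.lastname": "internal",
    "partner.guest": "public",
}


def interface_for(source_ip: str) -> str:
    """Simplification: classify the arrival interface by source subnet."""
    return "internal" if ipaddress.ip_address(source_ip) in INTERNAL_NET else "public"


def accept_login(user_id: str, source_ip: str) -> bool:
    """Accept a registration only on the interface configured for that user."""
    return USER_INTERFACE.get(user_id) == interface_for(source_ip)


print(accept_login("firstname.lastname", "192.168.1.30"))  # True
print(accept_login("firstname.lastname", "198.51.100.7"))  # False: wrong side
```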
SIP traffic is normally routed through the SIP Registrar, so it is this Registrar that determines which media
ports will be used, along with which port and protocol are used for call signalling, setup and registration:
5060 UDP, 5060 TCP or, if using TLS (Transport Layer Security), 5061 TCP.
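The default signalling port and transport follow from the SIP URI itself: a `sips:` URI implies TLS with a default of port 5061, while a `sip:` URI defaults to port 5060, with the transport optionally given by a `transport` parameter. The parser below is a simplified sketch: it assumes UDP when no transport is given, ignores the DNS NAPTR/SRV lookups of RFC 3263 and does not handle IPv6 literals.

```python
from typing import Tuple


def signalling_target(uri: str) -> Tuple[str, str, int]:
    """Return (host, transport, port) for a simplified sip:/sips: URI."""
    scheme, _, rest = uri.partition(":")
    scheme = scheme.lower()
    if scheme not in ("sip", "sips"):
        raise ValueError("not a SIP URI")
    hostport, *params = rest.split(";")
    hostport = hostport.rsplit("@", 1)[-1]  # drop any user part
    host, _, port_text = hostport.partition(":")
    options = dict(p.split("=", 1) for p in params if "=" in p)
    if scheme == "sips":
        transport = "tls"  # sips: always means TLS
    else:
        transport = options.get("transport", "udp").lower()  # assumption: UDP if unspecified
    default_port = 5061 if transport == "tls" else 5060
    port = int(port_text) if port_text else default_port
    return host, transport, port


for uri in ("sip:firstname.lastname@example.org",
            "sip:firstname.lastname@example.org;transport=tcp",
            "sips:firstname.lastname@example.org"):
    print(uri, "->", signalling_target(uri))
```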