SIPCORE D. Worley
Internet-Draft Ariadne Internet Services
Updates: RFC 3263 (if approved) July 6, 2016
Intended status: Standards Track
Expires: January 7, 2017

Contacting Session Initiation Protocol (SIP) Servers in a Dual-Stack IP Network
draft-worley-sipcore-dual-stack-00

Abstract

In a dual-stack (IPv4 and IPv6) environment, the procedures of RFC 3263 by which a Session Initiation Protocol (SIP) client contacts a server may not suffice to provide a good user experience. This document describes "Happy Eyeballs" modifications -- modifications of the procedures of RFC 3263, as well as additional client procedures -- which improve the SIP user experience in many circumstances.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 7, 2017.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

The sections of this document cover a number of topics which arise in dual-stack environments. As this document matures, some of these topics may be split into separate documents. The current text is a very rough draft, including proposed requirements, proposed solutions, observations about SIP systems in practice, and design discussions.

2. Target Selection

2.1. Requirements

2.2. Terminology

a "client" is the entity that wishes to send a SIP message

a "target" or "transport target" is an address/port/protocol triple that is an address for the transport layer of the stack. A target is derived from a SIP URI (for a request) or a host-port (for a response).

a "flow" is a sequence of related messages between the client and a target. If the protocol is connection-oriented, the flow encompasses the connection. If the protocol requires cryptographic setup, the flow encompasses the cryptographic session.

a "probe" is an operation executed on a flow by a client to determine whether it can successfully communicate with the target, without changing the SIP dialog state with the target. Probes can take many forms:

Note that sending an OPTIONS request can be used with any(!) protocol. If the OPTIONS reaches the target, the target is required to respond with either a 200 or 483 response (without forwarding it to another entity). Conveniently, a server can respond to such a request statelessly, so such requests are low-overhead. (Although the [RFC5626] keep-alive methods are even lower overhead.)
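
As a rough illustration of an OPTIONS probe, the sketch below builds the text of a minimal request. It assumes the probe relies on "Max-Forwards: 0", which (per RFC 3261) stops the first element reached from forwarding the request, so that it answers with 200 or 483 itself. The function name, the default From URI, and the parameter names are hypothetical, and actually sending the message on a flow is left out.

```python
import uuid


def build_options_probe(target_uri, local_host, local_port, transport="UDP",
                        from_uri="sip:probe@client.invalid",
                        branch=None, call_id=None, tag=None):
    """Return the text of a minimal SIP OPTIONS request usable as a probe.

    All names and defaults here are illustrative assumptions, not part of
    any specification.
    """
    # RFC 3261 requires branch values to start with the cookie "z9hG4bK".
    branch = branch or "z9hG4bK" + uuid.uuid4().hex
    call_id = call_id or uuid.uuid4().hex
    tag = tag or uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS {target_uri} SIP/2.0",
        f"Via: SIP/2.0/{transport} {local_host}:{local_port};branch={branch}",
        # Max-Forwards: 0 means the first element reached must not forward
        # the request; it answers 200 or 483 itself.
        "Max-Forwards: 0",
        f"To: <{target_uri}>",
        f"From: <{from_uri}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

Either a 200 or a 483 response to such a request would count as a successful probe, since both show that the request reached the target and that a response came back.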

2.3. Solution

The current state of the solution (as I know of it) is:

Note that some SIP messages are time-sensitive for the user experience (e.g., initial INVITEs), while others are not (e.g., a refreshing REGISTER). A client MAY choose not to apply the following rules for non-time-sensitive messages.

Devices MAY change the target order prescribed by RFC 3263/2782. The device SHOULD follow the Happy Eyeballs rules, viz.:

Devices MAY contact targets in any order, including those obtained via different SRV records, notwithstanding the priority/weight specified in the SRV records. But in doing this, they MUST approximate the behavior specified by RFC 3263, in this sense:

(Note that the relative traffic shares between targets that are *not* derived from different SRV records (e.g., alternative A records for a DNS name) are not constrained by this requirement.)

In general, this means that cached reachability information about targets should time out, causing the behavior of the client to revert to RFC 3263 over time.

(Beware that we have to define "reachable" above to include responsiveness -- a high-priority target that has a 5 sec RTT shouldn't be able to commandeer all of the traffic.)
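
One way to picture reachability information that times out is a small cache keyed by target. This is only a sketch: the class name, the time-to-live, and the RTT limit are assumptions chosen for illustration, and nothing here fixes what the actual values should be. A target counts as known-reachable only if it answered recently and answered quickly enough, which folds responsiveness into "reachable".

```python
import time


class ReachabilityCache:
    """Remembers which targets answered recently, and how quickly.

    Hypothetical helper: ttl_seconds and max_rtt_seconds are illustrative
    parameters, not values taken from any specification.
    """

    def __init__(self, ttl_seconds, max_rtt_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_rtt = max_rtt_seconds
        self._clock = clock
        self._entries = {}  # target -> (time recorded, measured RTT)

    def record(self, target, rtt_seconds):
        """Note that the target answered a probe or request just now."""
        self._entries[target] = (self._clock(), rtt_seconds)

    def known_reachable(self, target):
        """True only for a fresh report with an acceptable RTT."""
        entry = self._entries.get(target)
        if entry is None:
            return False
        recorded_at, rtt = entry
        if self._clock() - recorded_at >= self._ttl:
            # The report has expired; the client falls back to RFC 3263
            # ordering for this target until it learns something new.
            del self._entries[target]
            return False
        return rtt <= self._max_rtt
```

When known_reachable() returns False, the client has no basis for deviating from the RFC 3263 order for that target.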

If a client does not have recent reachability information for the flow to a given target, the client SHOULD probe the flow before sending a request to the target.

This is because in the worst case, sending a request commits the client to waiting for a timeout before it can send a duplicate request to another target. Note that probes do not change the SIP dialog state of any entity, so probes can be sent in parallel to multiple targets.
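
Probing several targets in parallel might look roughly like the following sketch. The probe itself is passed in as a coroutine (for example, one that sends OPTIONS and waits for a response); the function names and the overall timeout are assumptions, and a real client would also weigh the SRV-derived order rather than simply taking the first answer.

```python
import asyncio


async def _probe_one(target, probe):
    """Return the target if the probe succeeds, otherwise None."""
    try:
        return target if await probe(target) else None
    except Exception:
        # A probe that fails with an error counts as "not reachable".
        return None


async def first_reachable(targets, probe, timeout):
    """Probe all targets at once; return the first that answers, or None.

    'probe' is a caller-supplied coroutine function taking a target and
    returning True on success (a hypothetical interface).
    """
    if not targets:
        return None
    tasks = [asyncio.ensure_future(_probe_one(t, probe)) for t in targets]
    try:
        for next_done in asyncio.as_completed(tasks, timeout=timeout):
            try:
                winner = await next_done
            except asyncio.TimeoutError:
                return None  # nobody answered within the overall timeout
            if winner is not None:
                return winner
        return None  # every probe finished, none succeeded
    finally:
        # Probes carry no dialog state, so abandoning the rest is harmless.
        for task in tasks:
            task.cancel()
```

Because the probes are independent, a slow or dead target costs nothing beyond its own abandoned probe; the request itself is then sent only to the target that answered.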

Reduce client transaction timeouts: Timer B and Timer F are currently 64*T1, which is 32 seconds with the default T1 of 500 msec.

It seems that reducing the default T1 from 500 msec to 100 msec suffices for this. It seems that RTT to arbitrary places on the Internet can take as long as 500 msec, but RTT to web servers generally takes 100 msec or less. That argues for reducing T1 to 100 msec, which makes timers B/F 6.4 sec. In practice, SIP servers are likely to have connectivity like web servers. But we want global public SIP to work (e.g., in peer-to-peer SIP), so SIP to arbitrary addresses should only rarely time out.

The retransmission schedule specified by RFC 3261 is:

    1st send is at time 0
    2nd send is at time T1
    3rd send is at time 3*T1
    4th send is at time 7*T1
    5th send is at time 15*T1
    6th send is at time 31*T1
    7th send is at time 63*T1
    timer B fires at time 64*T1, terminating the transmission
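
To make the effect of changing T1 concrete, the sketch below computes this doubling schedule for a given T1. The function name is hypothetical, times are integers in msec to avoid rounding, and the uncapped doubling matches the INVITE schedule that ends with timer B.

```python
def send_times(t1_msec, timer_b_multiple=64):
    """Return (send times, timer B expiry) in msec for a given T1.

    The retransmission interval starts at T1 and doubles after each send;
    sending stops when timer B (64*T1) fires.
    """
    timer_b = timer_b_multiple * t1_msec
    times = []
    at = 0
    interval = t1_msec
    while at < timer_b:
        times.append(at)
        at += interval
        interval *= 2
    return times, timer_b


# Default T1 of 500 msec: the last send is at 31.5 sec, timer B at 32 sec.
print(send_times(500))
# Proposed T1 of 100 msec: the last send is at 6.3 sec, timer B at 6.4 sec.
print(send_times(100))
```

With T1 = 100 msec the client still makes seven attempts, but gives up on an unresponsive target after 6.4 seconds rather than 32.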

2.3.1. Problems with reducing T1

Brett Tate notes that there are problems with reducing T1:

3. Client-side NAT

3.1. Requirements

3.2. Discussion

It's clear to me that the problem is *solvable*, because existing SIP systems do handle the client-side NAT problem. E.g., the open-source sipX system has full client-side NAT support. That scheme doesn't require SIP Outbound support in the client at all. NAT support is triggered by the client's requests arriving from an address that is different than what is specified in the request. Support is implemented by manipulating the client's behavior, rewriting requests/responses to substitute IP addresses, and providing (essentially) a TURN server to relay media.

As far as I can remember, sipX's NAT support is recorded and implemented in the standard registration/redirect database. However, NAT support does depend on forcing the client to re-register frequently enough to be assured that the NAT mapping is not released. Since processing re-registrations is by far the bulk of the signaling traffic even without NAT support, this is not a trivial change.

My expectation is that almost all commercial SIP systems have NAT support of this sort.

One difference between this sort of NAT support and Outbound is that NAT support is done only at the registrar/proxy; if there is a separate edge proxy, it only passes UDP messages and can easily be stateless. This might be a significant factor in very large deployments.

Perhaps a significant problem with Outbound is that it has to be implemented in both the phone and the switch, leading to a network effect problem.

At this point, it seems to me that we need to get a better understanding of what people are doing in the market to deal with NATs and find out why they don't use Outbound. (Since Outbound is the standard method, I would think it has a strategic advantage in the technological competition.)

Roman notes:

4. Handoff between Interfaces

4.1. Requirements

Generally, loss of connectivity can be detected by loss of incoming RTCP packets. It looks like the expected RTCP interval is 5 seconds or longer. Intermittent loss of RTP due to network congestion is likely, but we may have to consider detecting loss of RTP as an indicator of loss of connectivity. We have to consider both symmetric loss of connectivity, in which traffic in both directions is lost simultaneously, and asymmetric loss of connectivity, in which traffic in one direction is lost while traffic in the other continues.
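
A minimal sketch of detecting loss of incoming RTCP follows. The 5-second interval comes from the estimate above; the class name and the number of missed intervals tolerated before declaring loss are assumptions made for illustration. Note that this only observes the incoming direction, so it cannot by itself detect asymmetric loss of the UA's outgoing traffic.

```python
import time


class RtcpWatchdog:
    """Flags loss of connectivity when incoming RTCP stops arriving.

    Hypothetical helper: 'missed_allowed' is an illustrative tolerance,
    not a value drawn from any specification.
    """

    def __init__(self, expected_interval=5.0, missed_allowed=3,
                 clock=time.monotonic):
        self._limit = expected_interval * missed_allowed
        self._clock = clock
        self._last_seen = clock()

    def on_rtcp_received(self):
        """Call whenever an RTCP packet arrives from the remote UA."""
        self._last_seen = self._clock()

    def connectivity_lost(self):
        """True once no RTCP has arrived for longer than the limit."""
        return self._clock() - self._last_seen > self._limit
```

With these example values, loss is declared only after more than 15 seconds of silence, which shows why detecting loss of RTP may be needed for faster handoff.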

Restoring RTP (media) connectivity is straightforward once SIP (signaling) connectivity is restored, by executing a re-INVITE to renegotiate RTP listening ports, etc.

4.2. Restoring Signaling Connectivity

I can see two ways to restore SIP connectivity: (1) sending re-INVITE to perform a target refresh, changing the UA's target URI, and (2) initiating a new dialog by sending an INVITE-with-Replaces to the remote target URI in order to replace the dialog with a new dialog.

In either case, the UA should not attempt to modify/replace the dialog before sending an OPTIONS request and receiving a response from the new interface to the URI that will be targeted by the new INVITE. (The round-trip OPTIONS ensures that there is two-way signaling connectivity to the targeted URI.) If the UA has more than one interface that is still working, it probably needs to probe the target URI using each interface (in parallel), because some URIs may not be reachable from some interfaces.

Sending a re-INVITE is a good method if the UA knows that the first URI in the route set can be reached from the UA's new address (interface). It seems to me that this will often not be the case, particularly when handing off between a carrier mobile network and a private WiFi network.

If the route set of the current dialog cannot be maintained, it is possible to create an entirely new dialog by directing an INVITE-with-Replaces to the remote target URI of the dialog. In a perfect world, the remote target URI is a GRUU, and the connectivity of a new INVITE to the GRUU is assured. Unfortunately there is no guarantee that will work, either.

The difficulty is that all the UA knows about the dialog is the route set, and there are no fixed conventions that allow the UA to extract from the route set a URI that can be targeted by an INVITE/Replaces. E.g., if the route set is:

    A: UA's target
    B: record route URI 1
    C: record route URI 2
    D: record route URI 3
    E: remote target

One possibility is probing each route URI with an OPTIONS request. That may not be a reliable test if the URI contains an IP address, especially if the address is in private-use space, as the UA may send the OPTIONS request to a different server that has the same address. If the URI contains a DNS name, however, a successful OPTIONS has probably reached the same server that the route URI indicates.

Absent any system for indicating which URIs are publicly routable (other than the "gr" parameter for GRUUs), we probably have to rely on the fact that most SIP telephones execute transfers using INVITE/Replaces requests that assume that the remote target URI that they see is publicly routable. As a consequence of this, SIP switches perform machinations to ensure that the remote target URIs seen by phones are publicly routable.

If we can assume that remote target URIs are publicly routable, then we can safely recommend that UAs always use INVITE/Replaces to restore signaling.

4.3. Maintaining the Client's GRUU

Since we expect a UA to use a GRUU as its target URI so that remote UAs can target the GRUU to reestablish signaling, a UA must ensure that its GRUU routes to all the addresses by which it is reachable. Generally, this means that the UA must update its registration promptly whenever an interface becomes usable.

However, it looks like there may be some ugly consequences of maintaining multiple mappings for a UA's GRUU -- how does a request get routed to the GRUU, serially or in parallel? Can one use "Request-Disposition: parallel" to force an OPTIONS request to fork in parallel to all of the contacts of a GRUU? The executing UA does not need to know which of the contacts of the remote UA were accessible via the GRUU, but it does need to know quite promptly that some contact of the remote UA is accessible via the GRUU.

OTOH, when the INVITE/Replaces is processed, we don't want it to be delayed due to serial forking to contacts that are no longer accessible, because the timeouts prescribed in SIP are long relative to the time we want handoffs to occur in. But perhaps "Request-Disposition: parallel" can be used here, as the first fork of an INVITE/Replaces to reach a UA will be acted upon and generate a 200 response, and any later arrivals from other forks will receive 481 responses.
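
The 200-then-481 behavior for parallel forks can be sketched on the receiving UA as follows. This is a deliberately simplified model: dialogs are opaque identifiers, and the class and method names are hypothetical.

```python
class ReceivingUA:
    """Toy model of a UA handling forked INVITE/Replaces requests."""

    def __init__(self, dialogs=()):
        self.dialogs = set(dialogs)  # identifiers of currently active dialogs

    def on_invite_with_replaces(self, replaced_dialog, new_dialog):
        """Return the status code the UA would send for this fork."""
        if replaced_dialog not in self.dialogs:
            # The dialog named in Replaces no longer exists, e.g. because
            # an earlier fork of the same INVITE already replaced it.
            return 481
        self.dialogs.discard(replaced_dialog)
        self.dialogs.add(new_dialog)
        return 200


ua = ReceivingUA(dialogs={"old-dialog"})
print(ua.on_invite_with_replaces("old-dialog", "new-dialog"))  # first fork
print(ua.on_invite_with_replaces("old-dialog", "new-dialog"))  # later fork
```

The first fork to arrive is answered with 200 and the later one with 481, so parallel forking does not create duplicate dialogs at the remote UA.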

4.4. Glare

One risk of reestablishing the dialog is that both UAs might attempt to reestablish the dialog at the same time. If both UAs attempt to re-INVITE at the same time, and the invites cross in transit, the "glare" rules will require each UA to reject the other UA's re-INVITE, back off, and resend, as described in RFC 3261 section 14.

If one or both UAs use INVITE/Replaces, various conflicts can occur. It seems to me that the correct way to fix this is to treat the state of "an INVITE/Replaces to revive the dialog is outstanding" as a glare-creating condition that is handled the same way as "a re-INVITE is outstanding".
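
A sketch of how this might fit with the existing back-off rule: RFC 3261 section 14.1 has the UA that generated the Call-ID wait a random 2.1 to 4 seconds, and the other UA wait a random 0 to 2 seconds, both in units of 10 msec, before retrying. The function names below are hypothetical, and counting an outstanding INVITE/Replaces as a glare condition is the proposal made here, not existing standard behavior.

```python
import random


def glare_condition(reinvite_outstanding, replaces_outstanding):
    """Proposed rule: either kind of outstanding request creates glare."""
    return reinvite_outstanding or replaces_outstanding


def glare_backoff_msec(owns_call_id, rng=random):
    """Pick the retry delay in msec, in 10 msec steps (RFC 3261, 14.1)."""
    if owns_call_id:
        # The UA that generated the Call-ID waits 2.1 to 4 seconds.
        return rng.randrange(2100, 4001, 10)
    # The other UA waits 0 to 2 seconds.
    return rng.randrange(0, 2001, 10)
```

Because the two ranges do not overlap, the UA that did not generate the Call-ID always retries first, which breaks the tie regardless of which kind of request caused the glare.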

4.5. Charging Information

Ideally, if a handoff does not take the call outside the domain of a single carrier, the carrier should be given enough information to determine that the new dialog is a logical continuation of the old dialog, so that it can combine the charging records of the two dialogs. In many cases, the carrier can probably determine from the INVITE/Replaces that the new dialog is related to the old dialog. But should there be a rule that requires that the new dialog copy some charging-related information from the old dialog?

5. GRUUs

An essential characteristic of a GRUU is that it's globally accessible. But if the device only implements one address family, or the intervening network carries only one protocol, then a URI isn't accessible to a device that only implements the *other* protocol.

It seems that the theoretical answer is to require a GRUU to be accessible in practice from the global Internet via either address family, but it seems like that would de-GRUU-ize probably most of the GRUUs that are being used in the universe.

This is particularly troublesome if we use GRUUs to solve, e.g., the handoff problem, since a handoff may involve a change of protocol.

6. Security Considerations

There probably aren't any security issues. Copy the security considerations section from draft-ietf-sipcore-dns-dual-stack.

7. IANA Considerations

This document does not require any actions by IANA.

8. Acknowledgments

So far: Brett, Roman

9. References

9.1. Normative References

[RFC3263] Rosenberg, J. and H. Schulzrinne, "Session Initiation Protocol (SIP): Locating SIP Servers", RFC 3263, DOI 10.17487/RFC3263, June 2002.

9.2. Informative References

[RFC5626] Jennings, C., Mahy, R. and F. Audet, "Managing Client-Initiated Connections in the Session Initiation Protocol (SIP)", RFC 5626, DOI 10.17487/RFC5626, October 2009.

Appendix A. Revision History

[Note to RFC Editor: Please remove this entire section upon publication as an RFC.]

Author's Address

Dale R. Worley
Ariadne Internet Services
738 Main St.
Waltham, MA 02451
US

EMail: worley@ariadne.com