INTERNET-DRAFT                                                  L. Coene
Internet Engineering Task Force                                  Siemens
Issued:  February 2002                                              
Expires: July 2002                                              


                           Multirouting 
                 <draft-coene-multi-route-00.txt>


Status of this Memo

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026. Internet-Drafts are working
    documents of the Internet Engineering Task Force (IETF), its areas,
    and its working groups.  Note that other groups may also distribute
    working documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html


Abstract

    This document describes a way to share load across the different
    paths of a multihomed SCTP association at the same time while
    keeping congestion control per path.  The document also describes a
    possible solution to multihoming which would require no routing
    tables on the host and which would try to guarantee non-overlapping
    multihomed paths.  It could possibly reduce the growth of the
    routing table in a router.  The selection of which link to take
    would be a local decision.  The solution is similar to the use of
    links and linksets within a routeset in SS7.


                           Table of Contents

Multirouting                                                          ii
Chapter 1: Introduction                                                2
Chapter 2: Loadsharing within a SCTP association                       3
Chapter 3: Multirouting packets in networks                            5
Chapter 4: Considerations                                              6
Chapter 5: Security considerations                                     8
Chapter 6: References and related work                                 9
Chapter 7: Acknowledgments                                             9

Coene                                                           [Page 1]

Draft                     multirouting                     February 2002

Chapter 8: Author's address                                            9


Editor's note: this draft is going to be split into 2 parts:
- loadsharing within SCTP
- multirouting packets in IP networks (depending on whether this
 technology already exists)

1 Introduction

    Multihoming has the potential to solve some of the Quality of
    Service (QoS), resilience and reliability problems that exist today
    in the Internet. In order to solve these problems, multihoming must
    be able to use all the paths present in a single association at the
    same time, in parallel. The SCTP specification [RFC2960] only allows
    a single (primary) path to be active at any given moment. Only when
    this path experiences trouble (such as transmission no longer being
    possible) will another path be used for the transmission of
    messages. This draft is an attempt to improve this behaviour.


2 Loadsharing within an SCTP association on the host

    A multihomed SCTP association on a host always has more than one
    path over which to send its traffic. The number of paths depends on
    the number of IP addresses exchanged during the setup of the
    association. As each path can have different transmission
    characteristics (such as delay, bandwidth, jitter, etc.), separate
    congestion control processing must be done for each path. (Note: in
    future, IP addresses may be added and removed "on the fly" during
    the active lifetime of the association; this amounts to adding
    paths to and removing paths from the association [ADDIP].)

    At present, the congestion control information is already kept per
    path, as required by [RFC2960]. The information is updated for the
    primary path by the flow of traffic and for the alternate paths by
    the exchange of heartbeat messages. However, the heartbeat timer
    can be very different from the timers used for per-path congestion
    control and retransmission, thus rendering the information from the
    heartbeat useless. Congestion control information concerning a
    single path decays if no traffic is sent over that path. To keep
    the congestion information up to date, heartbeats must be sent at
    intervals in the same range as the congestion control timings,
    which may place a burden of not-so-useful messages (they do NOT
    carry data) on the alternate paths.

    For each path within the association, a separate congestion control
    window is to be maintained within the transport protocol, as the
    congestion control characteristics (for example, the RTT) may (and
    will) differ from path to path. This leads to separate congestion
    control per path. Each path should be seen (in TCP terms) as a
    separate TCP connection, with each TCP connection having a
    different path/route through the network.
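
    The per-path bookkeeping described above could be sketched as
    follows. This is only an illustration under stated assumptions: the
    names PathState, Association, add_path and on_loss are hypothetical,
    the MTU and initial window values are assumed, and the loss reaction
    is a simplified halving rule rather than the full [RFC2960]
    procedure.

```python
from dataclasses import dataclass, field
from typing import Dict

MTU = 1500  # assumed path MTU in bytes, for illustration only


@dataclass
class PathState:
    """Congestion control variables kept for ONE path (hypothetical names)."""
    cwnd: int = 2 * MTU      # congestion window in bytes (assumed start value)
    ssthresh: int = 65535    # slow-start threshold in bytes (assumed)
    outstanding: int = 0     # bytes sent on this path, not yet acknowledged

    def can_send(self, size: int) -> bool:
        # A path may carry more data only while its own window has room.
        return self.outstanding + size <= self.cwnd


@dataclass
class Association:
    """One PathState per destination address of the association."""
    paths: Dict[str, PathState] = field(default_factory=dict)

    def add_path(self, destination: str) -> PathState:
        # Create the state on first use, return the existing one otherwise.
        return self.paths.setdefault(destination, PathState())

    def on_loss(self, destination: str) -> None:
        # Simplified reaction: a loss shrinks the window of that path only.
        path = self.paths[destination]
        path.ssthresh = max(path.cwnd // 2, 2 * MTU)
        path.cwnd = path.ssthresh
```

    Keeping one record per destination address means that a loss on one
    path leaves the windows of the other paths untouched.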


    If all paths are in use (assuming enough traffic is sent and
    received), then the congestion control information for every path
    will remain up to date. This makes a changeover smoother, because
    the traffic of the failed path can be distributed over all the
    remaining active paths. The present SCTP changeover works as
    follows: one path is active, all others are in standby, and a
    changeover goes from the previously single active path to a single
    standby path.  The scheme also allows the endpoints to choose
    whether all paths will be active in parallel or whether there will
    be some standby paths in addition to the active paths.

    When all paths are in use, it is up to some form of distributor
    function in SCTP to distribute the traffic across the different
    paths. The distributor function is an implementation-dependent
    function which can have different, sometimes conflicting, goals.
    For example, one distributor can try to obtain a certain message
    transfer rate across the complete association, while another kind
    of distributor can try to load all paths up to maximum capacity,
    with all paths doing SCTP/TCP-friendly congestion control. Other
    distributors may try to minimise the delay or jitter. For that they
    would need some feedback from the remote side on top of the already
    existing SCTP congestion control mechanism. If that is the case,
    then an SCTP extension may be needed.
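
    As one possible illustration of such a distributor (the draft
    deliberately leaves the function implementation dependent), the
    sketch below picks the path whose congestion window has the most
    free room. The function name pick_path and the (cwnd, outstanding)
    tuple layout are assumptions made for this example only.

```python
from typing import Dict, Optional, Tuple


def pick_path(paths: Dict[str, Tuple[int, int]], size: int) -> Optional[str]:
    """Return the path with the most free window that can take `size` bytes.

    `paths` maps a path name to (cwnd, outstanding), both in bytes.
    Returns None when no path has enough room, so the caller must wait.
    """
    best = None
    best_free = -1
    for name, (cwnd, outstanding) in paths.items():
        free = cwnd - outstanding
        if free >= size and free > best_free:
            best, best_free = name, free
    return best
```

    A distributor aiming at minimal delay or jitter would rank the paths
    by other criteria; only the ranking rule inside the loop would
    change.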

    An SCTP implementation which does NOT support parallel usage of its
    paths must be able to communicate with an implementation which does
    support it. As no additions to the SCTP protocol are required, an
    SCTP full-path implementation (meaning all paths are used in
    parallel) would NOT break an SCTP single-path implementation. The
    single-path implementation will send the SACK for the received
    messages to the source address of those messages. If a SACK is sent
    back that covers messages sent over multiple paths, the congestion
    control information of each of those paths will be updated per
    [RFC2960].

    At present, the application can do this itself by specifying the
    primary path before sending a message to SCTP.


3 Multirouting messages in the network


    In order to obtain the greatest advantage from multihoming, the
    paths within an association should be as distinct as possible. This
    cannot always be guaranteed, for example due to problems occurring
    in networks. As a path is really a sequence of nodes and the links
    between them, a path selection at the host really means taking a
    certain link towards a node.

    At the next hop, a link is selected using the routing information
    present in the packet (at present, the IP address). Multiple links
    can route towards the same, required destination. The way in which
    these links are selected can be diverse.


    In present IP networks, every path has a distinct IP address, thus
    the IP address (or, more precisely, its prefix) becomes the link
    selector. Because there are a lot of prefixes in the network, there
    are a lot of link selections to be made, which increases the size
    of the routing/selection tables. This is, however, what is
    happening now with the present IP multihoming architecture.

    In order to keep the present multihoming solutions working, the
    proposed solution should not adversely impact the present
    multihoming architecture (using different IP addresses for each
    path).

    The solution should allow for the selection of the link on which to
    send out the message. The selection criterion can be carried:

    - inside the IP address (for example, the IPv4/IPv6 address prefix)

    - outside the IP address (for example, the IPv6 flow label or the
      IPv4 TOS field)

    This would also leave the transport layer in control of which
    path/link to send the message out on, thus preserving the
    end-to-end principle.

    A Link selection (LS) parameter would be used in the IP network
    layer to specify the path to be taken in the host; this would
    implicitly specify the outgoing interface/link on the host.  At the
    transport level, a separate congestion control window is to be
    maintained for each link selection, as the congestion control
    characteristics (for example, the RTT) may (and will) differ from
    path to path. This leads to separate congestion control per Link
    selection and implicitly per path. However, routers are then
    required to keep messages with the same (destination and source) IP
    address and link selection on the same path (see paragraph on
    routers).

    In order to limit the number of congestion control windows in the
    transport layer on the host, an upper limit may be specified on the
    Link selection field (for example, 16), so that in this example the
    transport TCB would store at most 16 congestion control windows. If
    fewer than the maximum number of LS values are used, then not all
    possible paths may be used during message transmission. (Example:
    only LS 0..4 are used because the host has only 5 interfaces. If a
    router somewhere within the network has more than 5 links leading
    to the same prefix, the 6th and higher links will never be used by
    the traffic of this association.)

    It is not envisioned to make the LS parameter a negotiated feature
    between the endpoints, as an endpoint has no view whatsoever of the
    number of links associated with a prefix at a router and thus may
    be underutilising the links available on its path.

    The possible Link selection choices are detailed in the following
    paragraphs.


3.1 A part of the IP address used as Link selection

    The selection based on the IP address can be done in two ways:
    either use the most significant part of the IP address or use the
    least significant part of the IP address to make a distinction
    between 2 or more links.

    The present way is to use the most significant part of the IP
    address (better described as a different prefix). Some of the more
    troublesome features of this solution are described in [SCTPMULTI]
    and [DRCN2000].

    An alternative way is to use the least significant part of the IP
    address, meaning that the node would be addressed via a single IP
    address in which, for example, the last 4 bits indicate which link
    the message has to go out on.

    The prefix used to send to this destination would be advertised in
    all of the networks this host is attached to, and the routers would
    be allowed to route towards this destination. This has particular
    difficulties which will be described in the following paragraphs.

    The 4 least significant bits allow up to 16 different links to be
    used. If more links are needed, then the number of bits may be
    increased, but the selection field will then vary in length from
    host to host, which will generate more problems on the host and in
    the network. Therefore a fixed length of 4 bits is advisable, and
    from the experience of other networks which use similar
    technologies, 16 is a good upper limit.
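
    Reading the link selection out of the 4 least significant address
    bits could look like the sketch below. It uses the Python standard
    library ipaddress module; the name link_selector and the constant
    LS_BITS are hypothetical, and the addresses in the usage note come
    from the documentation ranges.

```python
import ipaddress

LS_BITS = 4  # fixed length of the link selection field, as advised above


def link_selector(address: str) -> int:
    """Return the LS value carried in the low LS_BITS bits of an address."""
    value = int(ipaddress.ip_address(address))
    return value & ((1 << LS_BITS) - 1)
```

    With this sketch, link_selector("192.0.2.37") yields 5, since 37 is
    0x25, and link_selector("2001:db8::1f") yields 15. The same function
    covers IPv4 and IPv6 because only the low bits are inspected.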

    If fewer than 16 links are available, then the bits may remain
    unused by the host or can be mapped by the host onto the links
    actually present. The bits themselves are not changed by the host;
    they can be used further down in the network for selecting a link
    towards the destination. Example of a mapping with 2 links: LS 0
    selects link 0, LS 1 selects link 1, LS 2 selects link 0, LS 3
    selects link 1, etc.
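
    The example mapping above is a plain modulo operation. A minimal
    sketch, in which the name map_ls_to_link is hypothetical:

```python
def map_ls_to_link(ls: int, link_count: int) -> int:
    """Map an LS value onto one of `link_count` locally present links."""
    if link_count < 1:
        raise ValueError("at least one link is required")
    # The LS value itself is left untouched; only the local choice wraps.
    return ls % link_count
```

    For 2 links, the LS values 0, 1, 2, 3 map to links 0, 1, 0, 1, as in
    the example.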

    The link selection (LS) bits (or whatever name is suitable) must be
    used to specify the path to be followed, as otherwise transport
    layer congestion control algorithms may go haywire. If a router has
    more than one link towards a certain destination, and the message
    travels through a number of routers with this capability, that
    would necessarily mean that there is (at least in theory) an
    infinite number of paths towards the destination which can be in
    use at the same time. This might be a problem for the present
    congestion control algorithms in TCP and SCTP (this has not yet
    been exhaustively researched, so at present it is a typical
    research issue, see chapter x).

    The congestion control in the transport layer is done on a per-path
    basis. If the link selection is always used to select the same link
    from router to router (except in the case where the link has failed
    and other links have to take over the traffic), then traffic for an
    address (prefix) with a certain link selection would take the same
    path through the network, giving the classic SCTP (and TCP)
    congestion control algorithms a chance to do their job.  The LS
    must not be changed by a router, as that would change the path
    taken through the network (contrary to SLS rotation in ANSI SS7
    networks). The result of this concept is that we get at most 2^n
    paths (n = number of LS bits used) through the network, which would
    also limit the maximum number of congestion control variable sets
    used in SCTP. This is much more manageable than an infinite number
    of paths through the network.

    If a changeover occurs, then the traffic of the failed link is
    moved to other links, congestion will (surely) occur and the
    congestion control algorithm will deal with it by reducing the
    traffic. The transient effects of the changeover are eased if the
    number of links is at most 50% of the number of link selection
    combinations. Example: with LS = 4 bits there are 16 combinations.
    With 8 links and an even distribution, every link carries the
    traffic of 2 LS values. If a link fails, its 2 LS values each go to
    a different link, so the traffic on each takeover link increases by
    50% (it carries its own traffic from 2 LS values and gets the extra
    traffic from 1 LS value).
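
    The 16-combination, 8-link example can be worked through with a
    small sketch. The function names assign and changeover are
    hypothetical, and the modulo assignment and round-robin takeover are
    assumptions chosen for simplicity; the draft does not prescribe a
    particular distribution.

```python
from typing import Dict, List


def assign(ls_count: int, links: List[int]) -> Dict[int, int]:
    """Spread the LS values evenly over the links (modulo assignment)."""
    return {ls: links[ls % len(links)] for ls in range(ls_count)}


def changeover(table: Dict[int, int], failed: int,
               links: List[int]) -> Dict[int, int]:
    """Move the LS values of a failed link, each to a different survivor."""
    survivors = [link for link in links if link != failed]
    moved = sorted(ls for ls, link in table.items() if link == failed)
    new_table = dict(table)
    for i, ls in enumerate(moved):
        new_table[ls] = survivors[i % len(survivors)]
    return new_table
```

    With 16 LS values on links 0..7, every link starts with 2 LS values.
    After link 3 fails, LS 3 moves to link 0 and LS 11 moves to link 1,
    so those two links carry 3 LS values each (a 50% increase) while the
    other survivors stay at 2.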

    It is advisable to include the destination and source IP address in
    the link selection algorithm. This would distribute the traffic
    more evenly over the active links of the host or router. It should
    be noted that such algorithms are implementation dependent and
    would not be the same on all routers.
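
    One possible, implementation-dependent way to fold the addresses
    into the choice is a deterministic hash over source, destination and
    LS. The sketch below uses CRC-32 from the Python standard library
    purely as an example of a stable hash; the name select_link and the
    key layout are assumptions.

```python
import zlib


def select_link(source: str, destination: str, ls: int,
                link_count: int) -> int:
    """Pick a link index from (source, destination, LS).

    The same triple always yields the same link, so messages of one
    path stay together, while different triples are spread out.
    """
    key = f"{source}|{destination}|{ls}".encode("ascii")
    return zlib.crc32(key) % link_count
```

    A stable hash is used rather than a per-process randomised one, so
    that the choice for a given triple does not change across restarts.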

3.2 Impact of Link selection on SCTP

3.2.1 LS using IP address

    If an LS uses the most significant part of the IP address, then for
    every LS there is a different IP address (with a different prefix,
    of course). This allows the classical use of SCTP, as SCTP at
    present implements multihoming by specifying the different IP
    addresses.

    If an LS uses the least significant part of the IP address, then,
    as in the previous case, there is a different IP address for every
    LS (however, now with the same prefix). This still allows the
    classical use of SCTP.

3.2.2 LS outside the IP address

    If a field outside the IP address is used, then changes may be
    required to SCTP for transporting the path selector (= link
    selector) between the 2 end nodes.


    Editor's note:

    - take a look at OSPF, which may have a similar feature for routes
    with different metrics (1 versus 2: traffic is distributed 66%/33%
    or according to another distribution). See the paragraphs on
    equal-cost multipath in the OSPF spec.

    - take a look at the virtual router redundancy WG


4 Considerations

    The proposed solution has shown its merits in SS7 networks, where
    it is heavily used. The reason why it works has to do with the
    transactional nature of the messages flowing through an SS7
    network. That means that congestion occurs only occasionally and
    not continuously, as it does in the Internet.

    The following extreme cases may happen when this scheme is put into
    operation: 

Congestion control

    - Negative: if every message follows a different path towards the
    remote end, it will be very difficult for the end-to-end congestion
    control of SCTP (and TCP) to work properly. Until now, SCTP
    executes its congestion control algorithm across the complete
    association with the explicit notion that only one
    source-destination transport address pair (= a single path) out of
    a set of multihomed addresses (= paths) is used for the data
    transfer. Thus the congestion control is in fact only active on the
    active path (and only one active path is allowed according to
    [RFC2960]). There are exceptions, such as lost SACKs, for which the
    alternate paths may be taken, but these should be regarded as
    exceptions, not the rule.

    It could therefore be very interesting to use and study SCTP with
    load distribution across all its paths, to see whether extending
    the congestion control across all paths of the association would
    break end-to-end congestion control (or not) and increase the
    throughput (or not). It would at least give a clue as to whether
    end-to-end congestion control would continue to work in an
    environment where both hosts and routers have multiple routes, with
    loadsharing at the same time, towards a certain
    destination/network.


    If all paths are in use (and no selection mechanism is used) and
    messages get dropped along one path, then congestion control will
    kick in and reduce traffic, not only for that path but for all
    paths. That means that traffic on the other paths is reduced even
    if there is no congestion on those paths, so the throughput will be
    reduced significantly. This would mean that the case without path
    selection always has less throughput than the case with path
    selection.
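
    The difference can be illustrated with a toy model, which is not a
    measurement and not the [RFC2960] algorithm: a single loss halves a
    window, and the sum of the windows stands in for throughput. The
    function name after_loss and the window values in the usage note are
    assumptions.

```python
from typing import List


def after_loss(cwnds: List[int], lossy_path: int,
               per_path: bool) -> List[int]:
    """Return the windows after one loss on `lossy_path` (toy model).

    With per_path=True only the lossy path is halved; with
    per_path=False the reduction hits every path of the association.
    """
    return [
        cwnd // 2 if (i == lossy_path or not per_path) else cwnd
        for i, cwnd in enumerate(cwnds)
    ]
```

    For three paths with windows of 12000 bytes each and a loss on the
    first path, the shared case leaves a total of 18000 bytes of window
    and the per-path case leaves 30000 bytes.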
  
    - Positive: the positive case should be the reverse of the negative
    case. End-to-end congestion control would act on the congestion of
    the network as a whole and not on the congestion of the links and
    routers along a certain path. That would also indicate that the
    throughput could be higher (not linear in the number of links, but
    better). It would also put spare capacity (if it exists in a
    network) to better use.

Addresses

    Some address classes may simply not be suited to this approach, as
    the last bits of the address are assigned at the factory and thus
    may clash with addresses of other interfaces in the same host or
    router.

Routing protocols

    Routing protocols should distribute prefixes according to routesets
    and not according to links. A routeset may consist of one or more
    links. If there are 2 or more links in a routeset for a certain
    destination and one link goes down, then traffic would still flow
    uninterrupted across the other link or take a completely different
    route (depending on the amount of traffic that went through this
    link or router). This would also mean that the throughput of the
    complete association would be reduced, not cut off, thus giving
    end-to-end congestion control algorithms a better chance to react,
    and the changeover would be far less disruptive.

    The routing protocols would then try to find an alternative link
    and add it to the routeset, or wait until the failed link or router
    gets back into operation. This would be far less disruptive for any
    traffic coming through the neighbourhood.

Different path characteristics

    If a stream with in-sequence delivery is required of SCTP,
    splitting the traffic between 2 or more paths (with radically
    different transmission characteristics, such as short versus long
    delays) may lead to large SACKs, due to the large number of Gap
    reports.
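
    The effect can be made visible with a small simulation sketch. It
    assumes that TSNs start at 1, that TSN i is sent at time i, and that
    odd TSNs travel on a fast path and even TSNs on a slow path; the
    names gap_blocks and max_gap_blocks and the delay values in the
    usage note are hypothetical.

```python
from typing import List, Set, Tuple


def gap_blocks(received: Set[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (cumulative TSN, blocks of TSNs received above it)."""
    cum = 0
    while cum + 1 in received:
        cum += 1
    blocks: List[Tuple[int, int]] = []
    for tsn in sorted(t for t in received if t > cum):
        if blocks and tsn == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], tsn)   # extend the current block
        else:
            blocks.append((tsn, tsn))           # a new gap opens a new block
    return cum, blocks


def max_gap_blocks(count: int, fast_delay: int, slow_delay: int) -> int:
    """Largest number of gap blocks a SACK would need during the transfer."""
    arrivals = sorted(
        (tsn + (fast_delay if tsn % 2 else slow_delay), tsn)
        for tsn in range(1, count + 1)
    )
    received: Set[int] = set()
    worst = 0
    for _, tsn in arrivals:
        received.add(tsn)
        worst = max(worst, len(gap_blocks(received)[1]))
    return worst
```

    For 8 messages with delays of 10 and 50 time units, the receiver
    sees TSNs 1, 3, 5, 7 before any even TSN arrives and needs up to 3
    gap blocks; with equal delays on both paths no gap block is needed.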

Editor's note: elaborate on this further in the next version...


5 Security considerations


    To be completed.

6 References and related work


    [RFC2960] Stewart, R. R., Xie, Q., Morneault, K., Sharp, C.,
    Schwarzbauer, H. J., Taylor, T., Rytina, I., Kalla, M., Zhang,
    L. and Paxson, V., "Stream Control Transmission Protocol", RFC2960,
    October 2000.

    [ROUTER] Draves, R., "Default router preferences and more-specific
    routes", draft-ietf-ipngwg-router-selection-00.txt, work in
    progress.

    [INGRES] Draves, R., "Ingress filtering, Site multihoming and source
    address selection", draft-draves-ipngwg-ingress-filtering-00.txt,
    work in progress.

    [ADDRSEL] Draves, R., "Default Address selection for IPv6",
    draft-ietf-ipngwg-default-addr-select-00.txt, work in progress.

    [SCTPMULTI] Coene, L. (Ed.), "Multihoming issues in the Stream
    Control Transmission Protocol", draft-coene-sctp-multihome-03.txt,
    work in progress.

    [DRCN2000] http://www.sctp.de/papers/drcn2000.pdf


7 Acknowledgments

    The author wishes to thank M. Tuexen, ... and many others for their
    invaluable comments.


8 Author's address

    Lode Coene                  Phone: +32-14-252081
    Siemens Atea                EMail: lode.coene@siemens.atea.be
    Atealaan 34
    B-2200    Herentals
    Belgium

