Internet-Draft MP-BGP Extension for 4map6 Advertisement October 2023
Xie, et al. Expires 24 April 2024 [Page]
Workgroup: Network Working Group
Internet-Draft: draft-xie-idr-mpbgp-extension-4map6-07
Published: October 2023
Intended Status: Standards Track
Expires: 24 April 2024
Authors:
   C. Xie (China Telecom)
   G. Dong (China Telecom)
   X. Li (CERNET Center/Tsinghua University)
   G. Han (Indirection Network Inc.)
   L. Song (Alibaba Cloud)

MP-BGP Extension and the Procedures for IPv4/IPv6 Mapping Advertisement

Abstract

This document defines an MP-BGP extension and the associated procedures for IPv4 service delivery in multi-domain IPv6-only underlay networks. It defines a new BGP path attribute, known as "4map6", to be used in conjunction with the existing AFI/SAFI for IPv4 and IPv6. This attribute is associated with an IPv4/IPv6 address mapping rule that allows IPv4 traffic to cross IPv6-only domains. The behavior of each type of network (IPv4 and IPv6) is also illustrated.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 24 April 2024.

1. Introduction

[I-D.ietf-v6ops-framework-md-ipv6only-underlay] proposes a framework for deploying IPv6-only as the underlay in multi-domain networks, in which IPv4 packets are statelessly translated or encapsulated into IPv6 packets for transmission across IPv6-only underlay domains. To achieve this goal, the framework introduces a specific data structure, the IPv4/IPv6 address mapping rule, to support stateless IPv4-IPv6 packet conversion at the edge of the network. For brevity, the rest of this document refers to the IPv4/IPv6 address mapping rule as the mapping rule. For an incoming IPv4 packet, the ingress PE uses mapping rules to generate the IPv6 source and destination addresses from the IPv4 source and destination addresses of the original packet, and vice versa. Because the mapping rule for the destination IPv4 address identifies the right egress PE by providing its IPv6 mapping prefix, it gives the direction of IPv4 service data transmission throughout the IPv6-only network. The mapping rule corresponding to the destination IPv4 address of a packet therefore has to be exchanged before IPv4 data is transmitted in the IPv6-only network; otherwise, data originating from the IPv4 network will be dropped because no IPv6 mapping prefix is available for its destination address.

When an ingress PE processes incoming IPv4 packets, it can obtain the mapping rule for the source address locally. The mapping rule for the destination address, however, is not generated by the ingress PE and has to be obtained remotely. This document defines an MP-BGP extension in which the BGP UPDATE message carries the mapping rule for IPv4 service delivery. The extension includes a new BGP path attribute, known as "4map6", that corresponds to the NLRI, and a set of related procedures.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Terminology and Reference Topology

In the context of this document, multi-domain underlay networks refer to a network system composed of multiple interconnected autonomous systems (ASes), each of which can serve different scenarios. Multi-domain networks can be operated by one or more network operators. The network shown in Figure 1 is a typical multi-domain IPv6-only underlay network; it is used as the basic scenario to illustrate the MP-BGP extension and its related procedures in this document. The whole network comprises AS1, AS2 and AS3 and provides IPv4 service communication between IPv4 network N1 and IPv4 network N2, which have IPv4 address blocks A1 and A2 respectively. This is consistent with Section 6 of [I-D.ietf-v6ops-framework-md-ipv6only-underlay].


  IPv4 A1          +-------+       +-+       +-----+        IPv4 A2
 +---------+     /    AS1    \    /AS2\    /   AS3   \     +---------+
|   IPv4    |   |+---+  +---+ |  |+--+ |  | +--+ +--+ |   |   IPv4   |
| network N1|---||PE1|--| P1|-|--||P2|-|--|-|P3|-|PE2|-|---|network N2|
 +---------+    |+---+  +---+ |  |+--+ |  | +--+ +--+ |    +---------+
                 \           /    \   /    \         /
                   +-------+       +-+       +------+
 Figure 1: Topology of Typical Multi-domain IPv6-only Networks

PE and P routers are the network devices that constitute the IPv6-only underlay. The definitions of PE and P are consistent with those in [I-D.ietf-v6ops-framework-md-ipv6only-underlay]. Note that in multi-domain networks some ASBRs are not at the edge of the network; in this case, they act as P routers. Each PE router through which an IPv4 address prefix is reachable has a locally configured IPv6 virtual interface (VIF) address. The VIF address, as an ordinary global IPv6 /128 address, must also be injected into the IPv6 IGP so that it is reachable across the multi-domain transit core.

Although the network diagram above contains only a few autonomous systems, the approach in this document applies not only to closed networks (i.e., walled gardens) but also to the open Internet, as long as the network devices in the autonomous systems support the functions defined in this document.

The following term is used in this document:

• Distance metric: the distance to the egress PE, expressed as the number of ASes.

The MP-BGP extension for processing and transmitting mapping rules across domains involves both PE and P routers. Each PE or P router maintains a Mapping Rule Database (MD), as depicted in Figure 2. An MD entry consists of an IPv4 address prefix, the IPv4 address prefix length, the IPv6 mapping prefix of the PE, the IPv6 mapping prefix length, and the distance to the egress. The database shown here is only an example; implementers can design the database structure according to their actual needs.

+----------+----------------+----------+---------------+------------+
|  IPv4    |  IPv4          |  IPv6    | IPv6          | Distance   |
|  Address |  Address       |  Mapping | Mapping       | to the     |
|  Prefix  |  Prefix Length |  Prefix  | Prefix Length | Egress     |
+----------+----------------+----------+---------------+------------+
     Figure 2: Entry of Mapping Rule Database
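
As an illustration only, an MD entry with the columns of Figure 2 could be modeled as in the Python sketch below. The class name MappingRule, the field names, the dictionary keyed by IPv4 prefix, and the documentation prefixes used as sample data are all assumptions made for this example; none of them is defined by this document.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network
from typing import Dict


@dataclass(frozen=True)
class MappingRule:
    """One MD entry, mirroring the columns of Figure 2 (hypothetical layout)."""
    ipv4_prefix: IPv4Network     # IPv4 address prefix and its length
    mapping_prefix: IPv6Network  # IPv6 mapping prefix of the egress PE and its length
    distance: int                # distance to the egress, in number of ASes


# The MD, keyed by IPv4 prefix: one selected rule per prefix (an assumption).
md: Dict[IPv4Network, MappingRule] = {}

# Sample data taken from the documentation address ranges.
rule = MappingRule(
    ipv4_prefix=IPv4Network("198.51.100.0/24"),
    mapping_prefix=IPv6Network("2001:db8:2::/48"),
    distance=2,
)
md[rule.ipv4_prefix] = rule
```

Storing each prefix together with its length in a single network object keeps the two "length" columns of Figure 2 consistent with their prefixes by construction.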

An IPv4 packet sent from IPv4 network N1 traverses the IPv6-only network and reaches the destination network, i.e., IPv4 network N2. Its source and destination addresses are within IPv4 address blocks A1 and A2 respectively. Its ingress to the IPv6-only network is PE1 and its egress is PE2. Before the data packet is transmitted, the address mapping rule corresponding to IPv4 address block A2 has to be conveyed from PE2 to PE1. During its announcement, the mapping rule may pass through intermediate nodes such as P3, P2 and P1 before it finally reaches PE1. A given intermediate P node may receive advertisements of this mapping rule from multiple upstream nodes. To reduce the overall number of advertisement messages, the P node selects among them, updates its local MD, generates a new advertisement based on the selected mapping rule, and sends it to downstream intermediate nodes or PE routers.

This mechanism is also in line with the requirements of emerging scenarios such as DCN for AI infra fabric, as described in Appendix B.

3. MP-BGP Protocol Extension

3.1. NLRI Encoding for Mapping Rule Advertisement

This document specifies a way in which BGP can be used by a given PE to tell other PEs, "If you need to send IPv4 packets whose destination address is within a given IPv4 address block, please send them to me; here is the information you need to properly transform the IPv4 packets into IPv6 ones". Multiprotocol BGP (MP-BGP) [RFC4760] specifies that the set of usable next-hop address families is determined by the Address Family Identifier (AFI) and the Subsequent Address Family Identifier (SAFI). [RFC8950] specifies the extensions that allow advertisement of IPv4 NLRI or VPN-IPv4 NLRI with a next-hop address that belongs to the IPv6 protocol. This document specifies the extensions necessary to support the transmission of mapping rules from any egress PE to any ingress PE, within and across domains.

Because the mechanism is based on the IPv6-only routing paradigm, it uses the combination of AFI 2 and SAFI 1, which identifies Network Layer Reachability Information (NLRI) used for unicast forwarding in IPv6 networks. In addition, to indicate that a BGP UPDATE message carries a mapping rule, the message contains a newly defined BGP path attribute, the 4map6 attribute. With this attribute, the IPv6 mapping prefix and the IPv4 address block can be identified from the NLRI, and the other information needed to properly transform the IPv4 packets can also be obtained. A BGP UPDATE whose MP_REACH_NLRI attribute contains the AFI/SAFI combination above and that also carries the 4map6 path attribute is called 4map6 routing information. The use and meaning of the fields of MP_REACH_NLRI in this case are as follows:

– AFI = 2 (IPv6)

– SAFI = 1 (Unicast)

– Length of Next Hop

– Network Address of Next Hop = When a BGP speaker advertises the 4map6 NLRI via BGP, it uses its own address as the BGP next hop in the MP_REACH_NLRI.

– NLRI = Composite IPv6 address prefix, which is composed of an IPv6 mapping prefix followed by the original IPv4 address prefix; the remaining bits are zero.

The NLRI field is encoded as shown in figure 3:

                  +----------------------------+
                  |       Length    1 octet    |
                  +----------------------------+
                  |       Prefix    variable   |
                  +----------------------------+
                 Figure 3: Format of NLRI Field
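
A minimal Python sketch of this encoding follows. It assumes the usual MP-BGP convention of [RFC4760], in which the Length octet counts bits and the Prefix field carries the minimum number of octets needed to hold them. The function names encode_nlri and decode_nlri are hypothetical.

```python
from ipaddress import IPv6Network


def encode_nlri(prefix: IPv6Network) -> bytes:
    """Encode <Length, Prefix>: a 1-octet bit length, then the minimal prefix octets."""
    nbytes = (prefix.prefixlen + 7) // 8
    return bytes([prefix.prefixlen]) + prefix.network_address.packed[:nbytes]


def decode_nlri(data: bytes) -> IPv6Network:
    """Decode one <Length, Prefix> tuple back into an IPv6 prefix."""
    if not data:
        raise ValueError("empty NLRI")
    bit_len = data[0]
    nbytes = (bit_len + 7) // 8
    if bit_len > 128 or len(data) < 1 + nbytes:
        raise ValueError("malformed NLRI")
    # Pad the prefix octets to 16 bytes; strict=False clears any trailing bits.
    packed = data[1:1 + nbytes].ljust(16, b"\x00")
    return IPv6Network((packed, bit_len), strict=False)
```

For example, a /72 composite prefix occupies ten octets on the wire: one Length octet followed by nine Prefix octets.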

3.2. 4map6 BGP Path Attribute

The 4map6 attribute is a new BGP path attribute defined in this document. It is optional and transitive, and it requires IANA to assign a new BGP path attribute type code. The attribute is composed of the following fields:

           +---------------------------------------------------+
           |     Length of IPv6 Mapping Prefix (1 octet)       |
           +---------------------------------------------------+
           |     Forwarding Type (1 octet)                     |
           +---------------------------------------------------+
           |     Address Origin Type (1 octet)                 |
           +---------------------------------------------------+
           |     IPv4 Original ASN (4 octets)                  |
           +---------------------------------------------------+

                 Figure 4: Encoding of the 4map6 attribute

The use and meaning of these fields are as follows:

a) Length of IPv6 Mapping Prefix

This is a 1-octet field whose value indicates the length of IPv6 mapping prefix.

b) Forwarding Type

This field identifies the IPv4/IPv6 forwarding capability of the egress PE. The data octet can assume the following values:

   Value   Meaning
   0       Translation and encapsulation
   1       Encapsulation
   2       Translation

c) Address Origin Type

The data octet can assume the following values:

   Value   Meaning
   0       Local
   1       Relay

d) IPv4 Original ASN

This field is a copy of the origin AS number in the BGP UPDATE message received from the IPv4 domain. It carries a value only when "Address Origin Type" is 1; otherwise it is NULL.

In addition, when the IPv4 Original ASN is set, the ATTR_SET attribute (type code 128), defined in [RFC6368], can be used to transfer the routing information of the IPv4 network in multi-domain IPv6-only networks.
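
The field layout of Figure 4 can be sketched in Python as a fixed 7-octet value. The sketch assumes network byte order and assumes that a NULL IPv4 Original ASN is encoded as zero; this document states neither. It follows Figure 4 only and covers the attribute value, not the path attribute header, whose type code is still to be assigned. All names are hypothetical.

```python
import struct
from typing import NamedTuple

# Prefix length, forwarding type, origin type (1 octet each), ASN (4 octets).
LAYOUT = "!BBBI"  # "!" selects network byte order without padding (assumption)

FORWARDING_TYPES = {0: "translation and encapsulation", 1: "encapsulation", 2: "translation"}
ORIGIN_LOCAL, ORIGIN_RELAY = 0, 1


class FourMapSix(NamedTuple):
    mapping_prefix_len: int
    forwarding_type: int
    origin_type: int
    ipv4_origin_asn: int  # meaningful only when origin_type is Relay


def _check(attr: FourMapSix) -> None:
    if not 0 <= attr.mapping_prefix_len <= 128:
        raise ValueError("IPv6 mapping prefix length out of range")
    if attr.forwarding_type not in FORWARDING_TYPES:
        raise ValueError("unknown Forwarding Type")
    if attr.origin_type not in (ORIGIN_LOCAL, ORIGIN_RELAY):
        raise ValueError("unknown Address Origin Type")


def encode_value(attr: FourMapSix) -> bytes:
    _check(attr)
    # NULL is represented as 0 when the origin type is Local (assumption).
    asn = attr.ipv4_origin_asn if attr.origin_type == ORIGIN_RELAY else 0
    return struct.pack(LAYOUT, attr.mapping_prefix_len, attr.forwarding_type,
                       attr.origin_type, asn)


def decode_value(data: bytes) -> FourMapSix:
    if len(data) != struct.calcsize(LAYOUT):
        raise ValueError("4map6 value must be exactly 7 octets")
    attr = FourMapSix(*struct.unpack(LAYOUT, data))
    _check(attr)
    return attr
```

Raising ValueError on any out-of-range field gives the caller a single signal on which to hang the error handling described in Section 5.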

3.3. Explicit Withdrawal of IPv4/IPv6 Mapping Rule

When a PE ceases to provide egress service for a given IPv4 address block, it may explicitly withdraw the mapping rules associated with it. Suppose a PE has announced, on a given BGP session, the mapping rule of a given IPv4 address prefix and it now wishes to withdraw that mapping rule. To do so, it may send a BGP UPDATE message with an MP_UNREACH_NLRI attribute.

This encoding of the MP_UNREACH_NLRI attribute is used for explicitly withdrawing the mapping rule for a given IPv4 prefix (on a given BGP session). Note that IPv4 address prefix/IPv6 mapping prefix bindings that were not advertised on the given session cannot be withdrawn by this method.

When using an MP_UNREACH_NLRI attribute to withdraw an IPv4 route whose NLRI was previously specified in an MP_REACH_NLRI attribute, the lengths and values of the respective prefixes must match, and the respective AFI/SAFIs must match. An explicit withdrawal in an AFI/SAFI UPDATE on a given BGP session not only withdraws the binding between the IPv4 address prefix and the IPv6 mapping prefix, it also withdraws the path to that prefix that was previously advertised in an UPDATE on that session.
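
The matching rule can be illustrated with per-session bookkeeping. The Python sketch below assumes that a speaker remembers the composite prefixes it has advertised on each session and only emits a withdrawal for an exact match; the class and method names are hypothetical, and message construction is left out.

```python
from ipaddress import IPv6Network
from typing import Set


class SessionAdvertisements:
    """Tracks the composite prefixes advertised on one BGP session (AFI 2, SAFI 1)."""

    def __init__(self) -> None:
        self._advertised: Set[IPv6Network] = set()

    def advertise(self, nlri: IPv6Network) -> None:
        self._advertised.add(nlri)

    def withdraw(self, nlri: IPv6Network) -> bool:
        """Return True if an MP_UNREACH_NLRI may be sent for this prefix."""
        # Network objects compare equal only if both value and length match.
        if nlri not in self._advertised:
            return False
        self._advertised.remove(nlri)
        return True
```

A withdrawal for a prefix with the same bits but a different length is rejected, as is a second withdrawal of the same prefix.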

4. Operation

4.1. Advertisement of Mapping Rule Update by Egress PE

When a PE router learns IPv4 routing information from the locally attached IPv4 access networks, the control plane of the PE should process the information as follows:

1. Install and maintain local IPv4 routing information in the IPv4 routing database.

2. Install and maintain new entries in the MD database. Each entry should consist of the IPv4 address prefix and the local IPv6 mapping prefix.

3. Advertise the content of each entry in the local MD database in the form of BGP update advertisement to IPv6 peer routers. The process to generate IPv6 route advertisement with 4map6 attribute based on IPv4 route advertisement messages is as follows:

a) Set the values of AFI and SAFI in MP_REACH_NLRI to 2 and 1 respectively;

b) The IPv6 mapping prefix of the egress PE is spliced with the IPv4 address block from the IPv4 route advertisement to form a composite IPv6 address prefix, whose length is denoted by L1. The composite IPv6 address prefix is copied to the prefix field of the NLRI in MP_REACH_NLRI, and the length field of the NLRI is set to L1. The structure of the composite IPv6 address prefix in the NLRI is shown in Figure 5. L2 denotes the length of the IPv6 mapping prefix of PE2, i.e., Pref6-2. When the value of L2 is available, the Length of IPv6 Mapping Prefix field in the 4map6 attribute is set to L2.

c) The value of the origin ASN in the original IPv4 route advertisement is copied to the IPv4 Original ASN field of the 4map6 attribute; the values of Length of AS_Path and AS_Path are copied to the corresponding fields of the ATTR_SET attribute.

         |--------L2--------|
         +------------------+------------------+-------------+
         |  IPv6 Mapping    |   IPv4           |  ...0000... |
         |  Prefix of PE2   |   address prefix |             |
         +------------------+------------------+-------------+
         |-----------------L1------------------|
              Figure 5: Structure of IPv6 prefix in NLRI
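
Step b) and Figure 5 can be sketched in Python as follows. The sketch assumes that only the significant bits of the IPv4 prefix are spliced in, so that L1 equals L2 plus the IPv4 prefix length. The function name composite_prefix and the sample prefixes are hypothetical.

```python
from ipaddress import IPv4Network, IPv6Network


def composite_prefix(mapping: IPv6Network, v4: IPv4Network) -> IPv6Network:
    """Splice the IPv4 prefix bits directly after the IPv6 mapping prefix."""
    l2 = mapping.prefixlen
    l1 = l2 + v4.prefixlen  # assumed relation between L1 and L2
    if l1 > 128:
        raise ValueError("composite prefix exceeds 128 bits")
    # Keep only the significant IPv4 prefix bits, then place them at bit offset L2.
    v4_bits = int(v4.network_address) >> (32 - v4.prefixlen)
    value = int(mapping.network_address) | (v4_bits << (128 - l1))
    return IPv6Network((value, l1))
```

With a /48 mapping prefix 2001:db8:2::/48 and the IPv4 block 198.51.100.0/24, L2 is 48, L1 is 72, and the composite prefix is 2001:db8:2:c633:6400::/72.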

4.2. Receiving Mapping Rule Advertisement by P Router

When a P router receives a BGP UPDATE from neighboring P or PE routers and uses that information to populate its local MD, it follows the procedure below to update the MD and to send the mapping rule advertisement to the next router:

1. Validate the received BGP UPDATE as 4map6 routing information by finding the 4map6 attribute.

2. Extract the IPv4 address prefix, which is encoded in bit positions L2 to L1-1 of the NLRI field, and look it up in the local MD.

– If an entry matching the IPv4 address prefix is found, compare the distance metric in the 4map6 attribute of the BGP advertisement with that of the entry found.

   • If the former is less than the latter, update the entry with the attributes of the BGP advertisement: extract the IPv6 mapping prefix and store it in the entry alongside the IPv4 address prefix. Then advertise the updated content of the entry to IPv6 peer routers in the form of an MP_REACH_NLRI update.

   • Otherwise, keep the entry in the MD unchanged and advertise the content of the entry found to IPv6 peer routers in the form of a BGP UPDATE.

– If no matching entry is found, install and maintain a new entry in the MD with the extracted IPv4 prefix, its corresponding IPv6 mapping prefix and the distance metric to the egress. Then advertise the content of the new entry to IPv6 peer routers in the form of a BGP UPDATE.

It should be noted that this process does not change or affect the IPv6 FIB table of the P router.
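
The select-and-update step of the P router can be sketched in Python as below. The sketch takes the already extracted IPv4 prefix, IPv6 mapping prefix and distance metric as inputs; how the distance metric is carried and how it changes from hop to hop is left out, since the procedure above does not detail it. The Entry class and the function name are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network
from typing import Dict


@dataclass
class Entry:
    mapping_prefix: IPv6Network  # IPv6 mapping prefix of the egress PE
    distance: int                # distance to the egress, in number of ASes


def p_router_receive(md: Dict[IPv4Network, Entry], v4: IPv4Network,
                     mapping: IPv6Network, distance: int) -> Entry:
    """Update the MD if the rule is new or strictly closer; return the entry to advertise."""
    current = md.get(v4)
    if current is None or distance < current.distance:
        md[v4] = Entry(mapping, distance)
    # In every branch of the procedure the (possibly updated) entry is advertised.
    return md[v4]
```

Only the MD is touched here, in line with the note above that the IPv6 FIB of the P router is not affected.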

4.3. Receiving Mapping Rule Update by Ingress PE

When a PE router receives a BGP UPDATE from neighboring P or PE routers and uses that information to populate its local MD and BGP routing database, it follows the procedure below to update the MD and to send IPv4 routing information to its IPv4 peers:

1. Validate the received BGP UPDATE as 4map6 routing information by finding the 4map6 attribute.

2. Extract the IPv6 mapping prefix, which is encoded in bit positions 0 to L2-1 of the NLRI field, and compare it with the PE's own IPv6 mapping prefix. If the two match, proceed to the next step. Otherwise, announce this update to the PE's other BGP peers.

3. Extract the IPv4 address prefix, which is encoded in bit positions L2 to L1-1 of the NLRI field, and look it up in the MD.

– If an entry matching the IPv4 address prefix is found, compare the distance metric in the BGP advertisement with that of the entry found.

   • If the former is less than the latter, update the entry with the 4map6 attributes of the BGP advertisement: extract the IPv6 mapping prefix and store it in the entry alongside the IPv4 address prefix. Then redistribute the new 4map6 routing information to the local IPv4 routing table: set the destination network prefix to the extracted IPv4 address prefix, the next hop to Null, and the output interface to the 4map6 VIF on the local PE router.

   • Otherwise, keep the entry in the MD unchanged.

– If no matching entry is found, install and maintain a new entry in the MD with the extracted IPv4 prefix, its corresponding IPv6 mapping prefix and the distance metric to the egress. Then redistribute the new 4map6 routing information to the local IPv4 routing table: set the destination network prefix to the extracted IPv4 address prefix, the next hop to Null, and the output interface to the 4map6 VIF on the local PE router.
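
Step 3 can be sketched in Python as below: the composite prefix is split at bit offset L2, and the IPv4 route is installed towards the VIF. The sketch assumes that L1 minus L2 is the IPv4 prefix length, and it skips steps 1 and 2. The interface name, the dictionary-based MD and routing table, and the function names are all hypothetical.

```python
from ipaddress import IPv4Network, IPv6Network
from typing import Dict, Tuple

VIF = "4map6-vif"  # hypothetical name of the local 4map6 virtual interface


def split_composite(nlri: IPv6Network, l2: int) -> Tuple[IPv6Network, IPv4Network]:
    """Split a composite prefix of length L1 into (mapping prefix, IPv4 prefix)."""
    l1 = nlri.prefixlen
    v4_len = l1 - l2
    if not 0 <= v4_len <= 32:
        raise ValueError("L1 - L2 is not a valid IPv4 prefix length")
    value = int(nlri.network_address)
    # Bits 0..L2-1: the IPv6 mapping prefix.
    mapping = IPv6Network(((value >> (128 - l2)) << (128 - l2), l2))
    # Bits L2..L1-1: the significant bits of the IPv4 prefix.
    v4_bits = (value >> (128 - l1)) & ((1 << v4_len) - 1)
    v4 = IPv4Network((v4_bits << (32 - v4_len), v4_len))
    return mapping, v4


def ingress_receive(md: Dict[IPv4Network, Tuple[IPv6Network, int]],
                    rib: Dict[IPv4Network, dict],
                    nlri: IPv6Network, l2: int, distance: int) -> bool:
    """Apply step 3; return True if the MD and the IPv4 routing table were updated."""
    mapping, v4 = split_composite(nlri, l2)
    current = md.get(v4)
    if current is not None and distance >= current[1]:
        return False  # keep the existing entry unchanged
    md[v4] = (mapping, distance)
    # Next hop is Null; the output interface is the local 4map6 VIF.
    rib[v4] = {"next_hop": None, "out_interface": VIF}
    return True
```

Here l2 is the value of the Length of IPv6 Mapping Prefix field from the 4map6 attribute, and the NLRI length supplies L1.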

As mentioned in [I-D.ietf-v6ops-framework-md-ipv6only-underlay], multi-domain IPv6-only networks support both translation and encapsulation for IPv4 data delivery at the data forwarding layer. Taking encapsulation as an example, the reachability of the tunnel egress endpoint may change over time, which directly affects the feasibility of IPv4 service delivery. A tunnel that is not feasible at one moment may become feasible later, when its egress endpoint address becomes reachable. The router may then start using the newly feasible tunnel instead of an existing one. The same may happen for a translation-based data path. How this decision is made is outside the scope of this document.

5. Error Handling

When a BGP speaker encounters an error while parsing the 4map6 path attribute, the speaker must treat the update as a withdrawal of the existing routes to the NLRIs included in that update, or discard the update if no such routes exist. A log entry should be raised for local analysis.
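
A minimal Python sketch of this behavior follows. It assumes that routes are tracked as a set of NLRI strings and that the attribute parser signals a malformed value by raising ValueError. The 7-octet length check stands in for a full parser and follows Figure 4; all names are hypothetical.

```python
import logging
from typing import Callable, Iterable, Set

log = logging.getLogger("bgp.4map6")


def parse_value(value: bytes) -> bytes:
    """Stand-in parser: accepts only the 7-octet layout of Figure 4."""
    if len(value) != 7:
        raise ValueError("4map6 value must be exactly 7 octets")
    return value


def process_update(routes: Set[str], nlris: Iterable[str], attr_value: bytes,
                   parse: Callable[[bytes], object] = parse_value) -> str:
    """Install the NLRIs, or withdraw or discard them if the attribute is malformed."""
    nlris = list(nlris)
    try:
        parse(attr_value)
    except ValueError as exc:
        log.warning("malformed 4map6 attribute: %s", exc)  # log entry for local analysis
        existing = [n for n in nlris if n in routes]
        if not existing:
            return "discarded"
        routes.difference_update(existing)  # treat the update as a withdrawal
        return "withdrawn"
    routes.update(nlris)
    return "installed"
```

The return value only reports which branch was taken; a real speaker would also propagate the resulting withdrawal to its peers.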

6. IANA Considerations

IANA is requested to allocate the following code points:

1) A code for the 4map6 path attribute in the "BGP Path Attributes" registry

2) Value xx for 4map6 in the BGP "Capability Codes" registry

Both allocations use this document as the reference.

7. Security Considerations

This extension to MP-BGP does not change the underlying security issues inherent in the existing MP-BGP. One case that needs to be considered is that, because the new mechanism supports translating IPv4 to IPv6 and back to IPv4, packets may bypass filtering that exists in their original network. To address this issue, it is recommended to configure corresponding filtering in IPv6-only networks to handle packets converted from IPv4 packets. However, this is beyond the scope of this document and can be discussed in other documents.

8. References

8.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4760]
Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, DOI 10.17487/RFC4760, <https://www.rfc-editor.org/info/rfc4760>.
[RFC5492]
Scudder, J. and R. Chandra, "Capabilities Advertisement with BGP-4", RFC 5492, DOI 10.17487/RFC5492, <https://www.rfc-editor.org/info/rfc5492>.
[RFC6368]
Marques, P., Raszuk, R., Patel, K., Kumaki, K., and T. Yamagata, "Internal BGP as the Provider/Customer Edge Protocol for BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 6368, DOI 10.17487/RFC6368, <https://www.rfc-editor.org/info/rfc6368>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8950]
Litkowski, S., Agrawal, S., Ananthamurthy, K., and K. Patel, "Advertising IPv4 Network Layer Reachability Information (NLRI) with an IPv6 Next Hop", RFC 8950, DOI 10.17487/RFC8950, <https://www.rfc-editor.org/info/rfc8950>.

8.2. Informative References

[I-D.ietf-v6ops-framework-md-ipv6only-underlay]
Xie, C., Ma, C., Li, X., Mishra, G. S., Boucadair, M., and T. Graf, "Framework of Multi-domain IPv6-only Underlay Networks and IPv4-as-a-Service", Work in Progress, Internet-Draft, draft-ietf-v6ops-framework-md-ipv6only-underlay-03, <https://datatracker.ietf.org/doc/html/draft-ietf-v6ops-framework-md-ipv6only-underlay-03>.

Appendix A. Contributors

The following people have contributed to this document:

Congxiao Bao
CERNET Center/Tsinghua University
Zhongfeng Guo
Alibaba Cloud

Appendix B. IPv6-only DCN for AI-infra fabric

There is an enormous amount of "East-West" traffic inside the data center network (DCN), i.e., flows between DC devices and applications. Upgrading the DCN first to dual-stack and then to IPv6-only is nontrivial. One example is building an AI-infra fabric on an IPv6-only fabric, which reduces data plane encapsulation overhead, simplifies the features required of forwarding chips, and improves data plane performance.

When a DCN plans to transition from dual-stack to IPv6-only, the transition cannot be completed overnight. Considerations and plans should be made to support legacy IPv4 servers and applications when the DCN is IPv6-only. The IPv6-only framework proposed in this memo keeps IPv4 service available when the underlay networks are upgraded to IPv6-only.

As shown in Figure 6, Host 1 and Host 2 are legacy servers with only IPv4 capability. Traffic between Host 1 and Host 2 is carried by the IPv6 network in the DCN. The access switch (ASW) has the ADPT function, which learns IPv4/IPv6 mapping rules and delivers the IPv4 service over the IPv6-only network.


                         Internet
                             ^
                             |
      ^     +----------------+------------------+
      |     |         Data Center Network       |
      |     +----+-------------------------+----+
      |          |                         |
      |     +----+-------------------------+----+
      |     |                                   |
IPv6-only   |             PSW/R1                |AS2
      |     +----+--------------------------+---+
      |          |                          |
      |          |                          |
      v     +----+---+                 +----+---+
    ------- |        |                 |        |
      ^     |ASW/PE1 |AS1              |ASW/PE2 |AS1
      |     +----+---+                 +----+---+\
dualstack        |                          |     \
      |        +-+-+                      +-+-+   +---+
      v        | H1|IPv4              IPv4| H2|   | H3| IPv6
               +---+                      +---+   +---+

    Figure 6: IPv6-only DCN for AI infra fabric

Authors' Addresses

Chongfeng Xie
China Telecom
Beiqijia Town, Changping District
Beijing
102209
China
Guozhen Dong
China Telecom
Beiqijia Town, Changping District
Beijing
102209
China
Xing Li
CERNET Center/Tsinghua University
Shuangqing Road No.30, Haidian District
Beijing
100084
China
Guoliang Han
Indirection Network Inc.
Linjian Song
Alibaba Cloud
Wangjing Qiyang Rd, Chaoyang District
Beijing
100102
China