Internet Engineering Task Force                    R. E. Gilligan (Sun)
INTERNET-DRAFT                                     S. Thomson (Bellcore)
                                                   J. Bound (Digital)
                                                   March 13, 1995

              IPv6 Program Interfaces for BSD Systems

Abstract

In order to implement the version 6 Internet Protocol (IPv6) [1] in an operating system based on Berkeley Unix (4.x BSD), changes must be made to the application program interface (API). TCP/IP applications written for BSD-based operating systems have in the past enjoyed a high degree of portability because most of the systems derived from BSD provide the same API, known informally as "the socket interface". We would like the same portability with IPv6.

This memo presents a set of extensions to the BSD socket API to support IPv6. The changes include a new data structure to carry IPv6 addresses, new name to address translation library functions, new address conversion functions, and some new setsockopt() options. The extensions are designed to provide access to IPv6 features, while introducing a minimum of change into the system and providing complete compatibility for existing IPv4 applications.

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. This Internet Draft expires on September 13, 1995. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

To learn the current status of any Internet-Draft, please check the 1id-abstracts.txt listing contained in the Internet-Drafts Shadow Directories on ds.internic.net, nic.nordu.net, ftp.isi.edu, or munnari.oz.au.

Distribution of this memo is unlimited.
draft-ietf-ipngwg-bsd-api-00.txt                                [Page 1]

INTERNET-DRAFT              IPv6 BSD API Spec                 March 1995

1. Introduction

While IPv4 addresses are 32 bits long, IPv6 nodes are identified by 128-bit addresses. The socket interface API makes the size of an IP address quite visible to an application; virtually all TCP/IP applications for BSD-based systems have knowledge of the size of an IP address. Those parts of the API that expose the addresses need to be extended to accommodate the larger IPv6 address size. This paper defines a set of extensions to the socket interface API to support IPv6.

This specification is preliminary. The API extensions are expected to evolve as we gain more implementation experience.

2. Design Considerations

There are a number of important considerations in designing changes to this well-worn API:

- The extended API should provide both source and binary compatibility for programs written to the original API. That is, existing program binaries should continue to operate when run on a system supporting the new API. In addition, existing applications that are re-compiled and run on a system supporting the new API should continue to operate. Simply put, the API changes for IPv6 should not break existing programs.

- The changes to the API should be as small as possible in order to simplify the task of converting existing IPv4 applications to IPv6.

- Where possible, applications should be able to use the extended API to interoperate with both IPv6 and IPv4 hosts. Applications should not need to know which type of host they are communicating with.

- IPv6 addresses carried in data structures should be 64-bit aligned. This is necessary in order to obtain optimum performance on 64-bit machine architectures.

Because of the importance of providing IPv4 compatibility in the API, our extensions are explicitly designed to operate on machines that provide complete support for both IPv4 and IPv6.
A subset of this API could probably be designed for operation on systems that support only IPv6. However, this is not addressed in this document.

2.1. Overview of Changes

The socket interface API consists of a few distinct components:

- Core socket functions.

- Address data structures.

- Name-to-address translation functions.

- Address conversion functions.

The core socket functions -- those functions that deal with such things as setting up and tearing down TCP connections, and sending and receiving UDP packets -- were designed to be transport independent. Where protocol addresses are passed as function arguments, they are carried via opaque pointers. A protocol specific address data structure is defined for each protocol that the socket functions support. Applications must cast these protocol specific address structures into the generic "sockaddr" data type when using the socket functions. These functions need not change for IPv6, but a new IPv6 specific address data structure is needed.

The "sockaddr_in" structure is the protocol specific data structure for IPv4. This data structure actually includes 8 octets of unused space, and it is tempting to try to use this space to adapt the sockaddr_in structure to IPv6. Unfortunately, the sockaddr_in structure is not large enough to hold the 16-octet IPv6 address as well as the other information (2-octet address family and 2-octet port number) that is needed. So a new address data structure must be defined for IPv6.

The name-to-address translation functions in the socket interface are gethostbyname() and gethostbyaddr(). Gethostbyname() does not provide enough flexibility to accommodate more than one protocol family. To solve this problem, we introduced a new name-to-address translation function which is analogous to gethostbyname(), but supports addresses in both the IPv4 and IPv6 address families.
Gethostbyaddr() does not, strictly speaking, need to be replaced since it carries an address family argument and can be extended to support both address families without introducing compatibility problems. However, we have chosen to introduce a new function to maintain symmetry with the replacement to gethostbyname(). The new functions both carry an address family parameter, so they can be extended to operate with other protocol families in addition to IPv4 and IPv6.

The address conversion functions -- inet_ntoa() and inet_addr() -- convert IPv4 addresses between binary and printable form. These functions are quite specific to 32-bit IPv4 addresses. We have designed two analogous functions which convert both IPv4 and IPv6 addresses, and carry an address type parameter so that they can be extended to other protocol families as well.

Finally, a few miscellaneous features are needed to support IPv6. A new interface is needed in order to support the IPv6 flow label. New interfaces are needed in order to receive IPv6 multicast packets and control the sending of multicast packets. And an interface is necessary in order to pass IPv6 source route information between the application and the system.

3. Implementation Experience

A few issues exposed in experimenting with prototype implementations of IPv6 helped to guide the design of this API.

First, we discovered that, by providing a way to represent the addresses of IPv4 nodes as IPv6 addresses, we could greatly simplify the applications' task of providing IPv4 compatibility. New applications could interoperate with IPv4 nodes by using the new API and expressing the addresses of IPv4 nodes they interoperate with as IPv6 addresses. For example, a client application could open a TCP connection to an IPv4 server by giving the IPv6 representation of the server's IPv4 address in the connect() call.
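
The "IPv6 representation" of an IPv4 address mentioned above is the IPv4-mapped format that section 4.7 describes: the IPv4 address occupies the low-order 32 bits and the high-order 96 bits hold the prefix 0:0:0:0:0:FFFF. The following is a minimal sketch of that encoding, not part of the API defined in this memo. The type sketch_in_addr6 and the functions sketch_map_ipv4() and sketch_is_ipv4_mapped() are hypothetical names introduced only for illustration, and the address is modelled as a plain array of 16 octets in network byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an IPv6 address: 16 octets, network byte order. */
struct sketch_in_addr6 {
    uint8_t octets[16];
};

/* Build the IPv4-mapped form ::FFFF:a.b.c.d from four IPv4 octets. */
static void sketch_map_ipv4(const uint8_t ipv4[4], struct sketch_in_addr6 *out)
{
    assert(ipv4 != NULL && out != NULL);
    memset(out->octets, 0, 10);          /* first 80 bits are zero          */
    out->octets[10] = 0xFF;              /* next 16 bits are all ones       */
    out->octets[11] = 0xFF;
    memcpy(&out->octets[12], ipv4, 4);   /* low-order 32 bits: IPv4 address */
}

/* Return non-zero if the address carries the IPv4-mapped prefix. */
static int sketch_is_ipv4_mapped(const struct sketch_in_addr6 *ap)
{
    static const uint8_t prefix[12] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF, 0xFF
    };

    assert(ap != NULL);
    return memcmp(ap->octets, prefix, sizeof(prefix)) == 0;
}
```

An application following this approach would copy the resulting 16 octets into the address field of the IPv6 socket address structure before calling connect(). The second function only illustrates the kind of prefix test that tells mapped addresses apart from other IPv6 addresses.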
Most applications do not even need to know whether the peer is an IPv4 or IPv6 node. Such applications can simply treat IPv6 addresses as opaque values; they need not understand the "structure" by which IPv4 addresses are encoded within IPv6 addresses. Yet the structure can be decoded by those applications that do need to know whether the peer is IPv6 or IPv4. This should prove to be a significant simplification since most applications will need to interoperate with both IPv4 and IPv6 nodes for some time to come.

Second, we learned that existing applications written to the IPv4 API could be made to interoperate with IPv6 nodes to a limited degree. This technique does not work for all applications, but does for certain applications, such as those that do not "look at" the peer address that is provided by the API (e.g., the source address provided by the recvfrom() function when a UDP packet is received, or the client address returned by the accept() function).

Third, we learned that the common application practice of passing open socket descriptors between processes across an exec() call can cause problems. It is possible, for example, for an application using the extended API to pass an open socket to an older application using the original API. The old application could be confused if the socket functions return IPv6 address structures to it. The solution designed was to provide a mechanism by which applications could have explicit control over what form of addresses are returned.

4. Interface Specification

4.1. New Address Family

A new address family macro, named AF_INET6, is defined in . The AF_INET6 definition is used to distinguish between the original sockaddr_in address data structure, and the new sockaddr_in6 data structure.

A new protocol family macro, named PF_INET6, is defined in .
Like most of the other protocol family macros, this will usually be defined to have the same value as the corresponding address family macro:

    #define PF_INET6 AF_INET6

PF_INET6 is used in the first argument to the socket() function to indicate that an IPv6 socket is being created.

4.2. IPv6 Address Data Structure

A new data structure to hold a single IPv6 address is defined in :

    struct in_addr6 {
        u_long s6_addr[4];    /* IPv6 address */
    };

This data structure contains an array of four 32-bit elements, which make up one 128-bit IPv6 address. The IPv6 address is stored in network byte order.

4.3. Socket Address Structure for 4.3 BSD-Based Systems

In the socket interface, a different protocol-specific data structure is defined to carry the addresses for each protocol suite. Each protocol-specific data structure is designed so it can be cast into a protocol-independent data structure -- the "sockaddr" structure. Each has a "family" field which overlays the "sa_family" of the sockaddr data structure. This field can be used to identify the type of the data structure.

The sockaddr_in structure is the protocol-specific address data structure for IPv4. It is used to pass addresses between applications and the system in the socket functions. We have defined the following structure in to carry IPv6 addresses:

    struct sockaddr_in6 {
        u_short         sin6_family;     /* AF_INET6 */
        u_short         sin6_port;       /* Transport layer port # */
        u_long          sin6_flowlabel;  /* IPv6 flow label */
        struct in_addr6 sin6_addr;       /* IPv6 address */
    };

This structure is designed to be compatible with the sockaddr data structure used in the 4.3 BSD release.

The sin6_family field is used to identify this as a sockaddr_in6 structure. This field is designed to overlay the sa_family field when the buffer is cast to a sockaddr data structure. The value of this field must be AF_INET6.
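
The layout of the 4.3 BSD variant can be checked against the 64-bit alignment goal stated in section 2 with a small sketch. This is an illustration under stated assumptions rather than the definition given in this memo: the struct and function names prefixed with layout_ are hypothetical, and fixed-width types from <stdint.h> replace u_short and u_long, since unsigned long is 64 bits wide on many current systems and would change the sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical replica of the address structure: four 32-bit elements. */
struct layout_in_addr6 {
    uint32_t words[4];                   /* 128-bit address, network order */
};

/* Hypothetical replica of the 4.3 BSD style socket address structure. */
struct layout_sockaddr_in6 {
    uint16_t               sin6_family;     /* address family       */
    uint16_t               sin6_port;       /* transport layer port */
    uint32_t               sin6_flowlabel;  /* flow label           */
    struct layout_in_addr6 sin6_addr;       /* IPv6 address         */
};

/* Offset of the address field from the start of the structure. */
static size_t layout_addr_offset(void)
{
    return offsetof(struct layout_sockaddr_in6, sin6_addr);
}

/* Total size of the structure in octets. */
static size_t layout_total_size(void)
{
    return sizeof(struct layout_sockaddr_in6);
}

/* Abort if the address field does not start on a 64-bit boundary. */
static void layout_check(void)
{
    assert(layout_addr_offset() % 8 == 0);
}
```

With these types the family, port, and flow label fields occupy the first 8 octets, so the address begins at offset 8 and the whole structure is 24 octets long.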
The sin6_port field is used to store the 16-bit UDP or TCP port number. This field is used in the same way as the sin_port field of the sockaddr_in structure. The port number is stored in network byte order.

The sin6_flowlabel field is a 32-bit field that is used to store the 28-bit IPv6 flow label. The IPv6 flow label is represented as the low-order 28 bits of a 32-bit value, which is stored in network byte order in the sin6_flowlabel field. The use of this field is explained in section 4.9.

The sin6_addr field is a single in_addr6 structure (defined in the previous section). This field holds one 128-bit IPv6 address. The address is stored in network byte order.

The ordering of elements in this structure is specifically designed so that the sin6_addr field will be aligned on a 64-bit boundary. This is done for optimum performance on 64-bit architectures.

The data types of the structure elements given here and in the previous section are intended as examples only. System implementations may use other types if they are appropriate for the system they are used on.

4.4. Socket Address Structure for 4.4 BSD-Based Systems

The 4.4 BSD release includes a small, but incompatible change to the socket interface. The "sa_family" field of the sockaddr data structure was changed from a 16-bit value to an 8-bit value, and the space saved is used to hold a length field, named "sa_len". The sockaddr_in6 data structure given in the previous section cannot be correctly cast into the newer sockaddr data structure.
For this reason, we have defined the following alternative IPv6 address data structure to be used on systems based on 4.4 BSD:

    #define SIN6_LEN

    struct sockaddr_in6 {
        u_char          sin6_len;        /* length of this struct */
        u_char          sin6_family;     /* AF_INET6 */
        u_short         sin6_port;       /* Transport layer port # */
        u_long          sin6_flowlabel;  /* IPv6 flow label */
        struct in_addr6 sin6_addr;       /* IPv6 address */
    };

This structure is defined in the header file. The only differences between this data structure and the 4.3 BSD variant are the inclusion of the length field, and the change of the family field to an 8-bit data type. The definitions of all the other fields are identical to the 4.3 BSD variant defined in the previous section.

Systems that provide this version of the sockaddr_in6 data structure must include the SIN6_LEN macro definition in . This macro allows applications to determine whether they are being built on a system that supports the 4.3 BSD or 4.4 BSD variants of the data structure. Applications can be written to run on both systems by simply making their assignments and use of the sin6_len field conditional on the SIN6_LEN macro. For example, to fill in an IPv6 address structure in an application, one might write:

    struct sockaddr_in6 sin6;

    bzero((char *) &sin6, sizeof(struct sockaddr_in6));
    #ifdef SIN6_LEN
    sin6.sin6_len = sizeof(struct sockaddr_in6);
    #endif
    sin6.sin6_family = AF_INET6;
    sin6.sin6_port = htons(23);    /* port is stored in network byte order */

4.5. The Socket Functions

Applications use the socket() function to create a socket descriptor that represents a communication endpoint. The arguments to the socket() function tell the system which protocol to use, and what format address structure will be used in subsequent functions.
For example, to create an IPv4/TCP socket, applications make the call:

    s = socket (PF_INET, SOCK_STREAM, 0);

To create an IPv4/UDP socket, applications make the call:

    s = socket (PF_INET, SOCK_DGRAM, 0);

Applications may create IPv6/TCP and IPv6/UDP sockets by simply using the constant PF_INET6 instead of PF_INET in the first argument. For example, to create an IPv6/TCP socket, applications make the call:

    s = socket (PF_INET6, SOCK_STREAM, 0);

To create an IPv6/UDP socket, applications make the call:

    s = socket (PF_INET6, SOCK_DGRAM, 0);

Once the application has created a PF_INET6 socket, it must use the sockaddr_in6 address structure when passing addresses in to the system. The functions which the application uses to pass addresses into the system are:

    bind()
    connect()
    sendto()

The system will use the sockaddr_in6 address structure to return addresses to applications that are using PF_INET6 sockets. The functions that return an address from the system to an application are:

    accept()
    recvfrom()
    getpeername()
    getsockname()

No changes to the syntax of the socket functions are needed to support IPv6, since all of the "address carrying" functions use an opaque address pointer, and carry an address length as a function argument.

4.6. Compatibility with IPv4 Applications

In order to support the large base of applications using the original API, system implementations must provide complete source and binary compatibility with the original API. This means that systems must continue to support PF_INET sockets and the sockaddr_in address structure. Applications must be able to create IPv4/TCP and IPv4/UDP sockets using the PF_INET constant in the socket() function, as described in the previous section.
Applications should be able to hold a combination of IPv4/TCP, IPv4/UDP, IPv6/TCP and IPv6/UDP sockets simultaneously within the same process.

Applications using the original API should continue to operate as they did on systems supporting only IPv4. That is, they should continue to interoperate with IPv4 nodes. It is not clear, though, how, or even if, those IPv4 applications should interoperate with IPv6 nodes. The open issues section (section 7) discusses some of the alternatives.

4.7. Compatibility with IPv4 Nodes

The API also provides a different type of compatibility: the ability for applications using the extended API to interoperate with IPv4 nodes. This feature uses the IPv4-mapped IPv6 address format defined in the IPv6 addressing architecture specification [3]. This address format allows the IPv4 address of an IPv4 node to be represented as an IPv6 address. The IPv4 address is encoded into the low-order 32 bits of the IPv6 address, and the high-order 96 bits hold the fixed prefix 0:0:0:0:0:FFFF. IPv4-mapped addresses are written as follows:

    ::FFFF:

Applications may use PF_INET6 sockets to open TCP connections to IPv4 nodes, or send UDP packets to IPv4 nodes, by simply encoding the destination's IPv4 address as an IPv4-mapped IPv6 address, and passing that address, within a sockaddr_in6 structure, in the connect() or sendto() call. When applications use PF_INET6 sockets to accept TCP connections from IPv4 nodes, or receive UDP packets from IPv4 nodes, the system returns the peer's address to the application in the accept(), recvfrom(), or getpeername() call using a sockaddr_in6 structure encoded this way.

We expect that few applications will need to know which type of node they are interoperating with.
However, for those applications that do need to know, the following function is provided:

    int is_ipv4_addr (const struct in_addr6 *ap);

The "ap" argument to this function points to a buffer holding an IPv6 address in network byte order. The function returns true (non-zero) if that address is an IPv4-mapped address, and returns 0 otherwise.

When an application using the extended API accepts a TCP connection, or receives a UDP packet, it may determine whether the peer is an IPv4 node by applying the is_ipv4_addr() function to the address returned by accept() or recvfrom().

4.8. Sockets Passed Across exec()

Unix allows open sockets to be passed across an exec() call. It is a relatively common application practice to pass open sockets across exec() calls. Because of this, it is possible for an application using the original API to pass an open PF_INET socket to an application that is expecting to receive a PF_INET6 socket. Similarly, it is possible for an application using the extended API to pass an open PF_INET6 socket to an application using the original API, which would be equipped only to deal with PF_INET sockets. Either of these cases could cause problems, because the application which is passed the open socket might not know how to decode the address structures returned in subsequent socket functions.

To remedy this problem, we have defined a new setsockopt() option that allows an application to "transform" a PF_INET6 socket into a PF_INET socket and vice-versa.

An IPv6 application that is passed an open socket from an unknown process may use the IP_ADDRFORM setsockopt() option to "convert" the socket to PF_INET6. Once that has been done, the system will return sockaddr_in6 address structures in subsequent socket functions.
Similarly, an IPv6 application that is about to pass an open PF_INET6 socket to a program that may not be IPv6 capable may "downgrade" the socket to PF_INET before calling exec(). After that, the system will return sockaddr_in address structures to the application that was exec()'ed.

The macro definition for IP_ADDRFORM is in . The IP_ADDRFORM option is at the IPPROTO_IP level. The only valid option values are PF_INET6 and PF_INET. For example, to convert a PF_INET6 socket to PF_INET, a program would call:

    int addrform = PF_INET;

    if (setsockopt(s, IPPROTO_IP, IP_ADDRFORM,
                   (char *) &addrform, sizeof(addrform)) == -1)
        perror("setsockopt IP_ADDRFORM");

An application may use IP_ADDRFORM in the getsockopt() function to learn whether an open socket is a PF_INET or PF_INET6 socket. For example:

    int addrform;
    int len = sizeof(int);

    if (getsockopt(s, IPPROTO_IP, IP_ADDRFORM,
                   (char *) &addrform, &len) == -1)
        perror("getsockopt IP_ADDRFORM");
    else if (addrform == PF_INET)    /* only inspect the value on success */
        printf("This is an IPv4 socket.\n");
    else if (addrform == PF_INET6)
        printf("This is an IPv6 socket.\n");
    else
        printf("This system is broken.\n");

4.9. Flow Label

The IPv6 header has a 28-bit field to hold a "flow label". Applications have control over what flow label value is used in packets that they originate, and have access to the flow label value of packets that they receive.

The sin6_flowlabel field of the sockaddr_in6 structure is used to carry the flow label between the application and the system. An application may specify a flow label to use in the transmitted packets of an actively opened TCP connection by setting the sin6_flowlabel field of the destination address sockaddr_in6 structure passed in the connect() function. An application may specify the flow label to use in transmitted UDP packets by setting the sin6_flowlabel field of the destination address sockaddr_in6 structure passed in the sendto() function.
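
The encoding of the flow label in the sin6_flowlabel field (the low-order 28 bits of a 32-bit value, held in network byte order, as described in section 4.3) can be sketched as follows. The helper names sketch_flowlabel_encode() and sketch_flowlabel_decode() and the mask macro are hypothetical and are not defined by this memo; htonl() and ntohl() are the usual byte order conversion functions.

```c
#include <arpa/inet.h>   /* htonl(), ntohl() */
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask selecting the low-order 28 bits of the field. */
#define SKETCH_FLOWLABEL_MASK 0x0FFFFFFFu

/* Convert a host-order flow label into the value stored in the field. */
static uint32_t sketch_flowlabel_encode(uint32_t label)
{
    /* The label must fit in 28 bits; the upper 4 bits stay zero. */
    assert((label & ~SKETCH_FLOWLABEL_MASK) == 0);
    return htonl(label);
}

/* Recover the host-order flow label from a stored field value. */
static uint32_t sketch_flowlabel_decode(uint32_t field)
{
    return ntohl(field) & SKETCH_FLOWLABEL_MASK;
}
```

An application would assign the encoded value to sin6_flowlabel before calling connect() or sendto(), and would decode the field of an address handed back by the system before interpreting it.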
If an application does not care what flow label is used, it should set the flow label value to zero.

An application may specify the flow label to use in transmitted packets of a passively accepted TCP connection, by setting the sin6_flowlabel field of the address passed in the bind() function.

The flow label that appeared in received UDP packets is passed up to the application in the sin6_flowlabel field of the source address sockaddr_in6 structure that is returned in the recvfrom() call. The flow label that appeared in the received SYN segment of a passively accepted TCP connection is returned to the application in the source address sin6_flowlabel field of the sockaddr_in6 structure that is passed in the accept() call.

4.10. Handling IPv6 Source Routes

IPv6 makes more use of the source routing mechanism than IPv4. In order for source routing to operate properly, the node receiving a request packet that bears a source route must reverse that source route when sending the reply. In the case of TCP, the reversal can be done in the transport protocol implementation transparently to the application. But in the case of UDP, the application must perform the reversal itself. The transport protocol code cannot perform the reversal for UDP packets because a UDP application may receive a number of requests and generate replies asynchronously. A "reply" sent by an application may not match the "request" most recently passed up to the application.

The API for source routing has two components: providing a source route to be used with originated traffic -- actively opened TCP connections and UDP packets being sent -- and retrieving the source route of received traffic -- passively accepted TCP connections and received UDP packets.

An application may always provide a source route with TCP connections being originated and UDP packets being sent.
But to receive source routes, the application must enable an option.

To provide a source route, an application simply provides an array of sockaddr_in6 data structures in the address argument of the sendto() function (when sending a UDP packet), or the connect() function (when actively opening a TCP connection). The length argument of the function is the total length, in octets, of the array. The elements of the array represent the full source route, including both source and destination identifying address. The elements of the array are ordered from destination to source. That is, the first element of the array represents the destination identifying address, and the last element of the array represents the source identifying address.

If the application provides a source route, the source identifying address cannot be omitted. The sin6_addr field of the source identifying address may be set to zero, however, in which case the system will select an appropriate source address.

The sin6_port field of the destination identifying address must be assigned. The sin6_port field of the source identifying address may be set to zero, in which case the system will select an appropriate source port number. The sin6_port and sin6_flowlabel fields of the intermediate addresses must be set to zero.

The arrangement of the address structures in the address buffer passed to connect() or sendto() is shown in the figure below:

    +--------------------+
    |                    |
    |  sockaddr_in6[0]   |  Destination Identifying Address
    |                    |
    +--------------------+
    |                    |
    |  sockaddr_in6[1]   |  Last Source-Route Hop Address
    |                    |
    +--------------------+
    .                    .
    .                    .
    .                    .
    +--------------------+
    |                    |
    | sockaddr_in6[N-1]  |  First Source-Route Hop Address
    |                    |
    +--------------------+
    |                    |
    |  sockaddr_in6[N]   |  Source Identifying Address
    |                    |
    +--------------------+

        Address buffer when sending a source route

The IP_RCVSRCRT setsockopt() option controls the reception of source routes. The option is disabled by default. Applications must explicitly enable the option using the setsockopt() function in order to receive source routes.

The macro definition for IP_RCVSRCRT is in . The IP_RCVSRCRT option is at the IPPROTO_IP level. An example of how an application might use this option is:

    int on = 1;    /* value == 1 means enable the option */

    if (setsockopt(s, IPPROTO_IP, IP_RCVSRCRT,
                   (char *) &on, sizeof(on)) == -1)
        perror("setsockopt IP_RCVSRCRT");

When the IP_RCVSRCRT option is disabled, only a single sockaddr_in6 address structure is returned to applications in the address argument of the recvfrom() and accept() functions. This address represents the source identifying address of the UDP packet received or the TCP connection accepted.

When the IP_RCVSRCRT option is enabled, the address argument of the recvfrom() function (when receiving UDP packets) and the accept() function (when passively accepting TCP connections) points to an array of sockaddr_in6 structures. When the function returns, the array will hold two elements -- source and destination address -- when the received UDP packet or TCP SYN packet does not carry a source route. The array will hold more than two elements when the received packet carries a source route.

The addresses in the array are ordered from source to destination. That is, the first element of the array holds the source identifying address of the received packet. Following this in the array are the intermediary hops. And the last element of the array holds the destination identifying address.
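
The two orderings described above can be compared with a small sketch. It is a simplified model and not part of the API: each address is reduced to an integer node identifier instead of a sockaddr_in6 structure, and the function names sketch_send_order() and sketch_recv_order() are hypothetical. The hops are always given in the order in which a packet visits them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill "out" in sending order: destination, last hop, ..., first hop,
 * source.  Returns the number of elements written (nhops + 2).
 */
static size_t sketch_send_order(int src, const int *hops, size_t nhops,
                                int dst, int *out)
{
    size_t i, n = 0;

    assert(out != NULL && (hops != NULL || nhops == 0));
    out[n++] = dst;
    for (i = nhops; i > 0; i--)      /* walk the hops backwards */
        out[n++] = hops[i - 1];
    out[n++] = src;
    return n;
}

/*
 * Fill "out" in receiving order: source, first hop, ..., last hop,
 * destination.  Returns the number of elements written (nhops + 2).
 */
static size_t sketch_recv_order(int src, const int *hops, size_t nhops,
                                int dst, int *out)
{
    size_t i, n = 0;

    assert(out != NULL && (hops != NULL || nhops == 0));
    out[n++] = src;
    for (i = 0; i < nhops; i++)      /* walk the hops forwards */
        out[n++] = hops[i];
    out[n++] = dst;
    return n;
}
```

In this model, a request travelling from node 1 through hops 10 and 20 to node 2 is received as the array 1, 10, 20, 2. A reply from node 2 to node 1 along the reversed route (hops 20 then 10) is written in sending order as the same array 1, 10, 20, 2.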
Note that this is the opposite of the order specified for sending. This ordering was chosen so that the address array received in a recvfrom() call can be used in a subsequent sendto() call without requiring the application to re-order the addresses in the array. Similarly, the address array received in an accept() call can be used unchanged in a subsequent connect() call.

The address length argument of the recvfrom() and accept() functions indicates the length, in octets, of the full address array. This argument is a value-result parameter. The application sets the maximum size of the address buffer when it makes the call, and the system modifies the value to return the actual size of the buffer to the application.

The sin6_port field of the first and last array elements (source and destination identifying address) will hold the source and destination UDP or TCP port number of the received packet. The sin6_port field of the intermediate elements of the array will be zero.

The address buffer returned to the application in the recvfrom() or accept() functions when the IP_RCVSRCRT option is enabled is shown below:

    +--------------------+
    |                    |
    |  sockaddr_in6[0]   |  Source Identifying Address
    |                    |
    +--------------------+
    |                    |
    |  sockaddr_in6[1]   |  First Source-Route Hop Address
    |                    |
    +--------------------+
    .                    .
    .                    .
    .                    .
    +--------------------+
    |                    |
    | sockaddr_in6[N-1]  |  Last Source-Route Hop Address
    |                    |
    +--------------------+
    |                    |
    |  sockaddr_in6[N]   |  Destination Identifying Address
    |                    |
    +--------------------+

        Address buffer when receiving a source route

Since IPv6 allows the number of elements in a source route to be very large, it is impractical for all applications that have enabled the reception of source routes to provide buffer space to hold the maximum number of elements. Some applications may choose a buffer size that is appropriate for their own use.
This means that it is possible that a received source route may be too large to fit into the buffer provided by the application. In this circumstance, the system should return only a single address element -- the source identifying address -- to the application. This case is clearly distinguishable to the application because in all other cases, the system returns at least two address elements -- the source and destination identifying addresses.

4.11. Unicast Hop Limit

A new setsockopt() option is used to control the hop limit used in outgoing unicast IPv6 packets. The name of this option is IP_UNICAST_HOPS, and it is used at the IPPROTO_IP layer. The macro definition for IP_UNICAST_HOPS resides in the header file. The following example illustrates how it is used:

    int hoplimit = 10;

    if (setsockopt(s, IPPROTO_IP, IP_UNICAST_HOPS,
                   (char *) &hoplimit, sizeof(hoplimit)) == -1)
        perror("setsockopt IP_UNICAST_HOPS");

When the IP_UNICAST_HOPS option is set with setsockopt(), the option value given is used as the hop limit for all subsequent unicast packets sent via that socket. If the option is not set, the system selects a default value.

The IP_UNICAST_HOPS option may be used in the getsockopt() function to determine the hop limit value that the system will use for subsequent unicast packets sent via that socket. For example:

    int hoplimit;
    int len = sizeof(hoplimit);

    if (getsockopt(s, IPPROTO_IP, IP_UNICAST_HOPS,
                   (char *) &hoplimit, &len) == -1)
        perror("getsockopt IP_UNICAST_HOPS");
    else
        printf("Using %d for hop limit.\n", hoplimit);

4.12. Sending and Receiving Multicast Packets

IPv6 applications may send UDP multicast packets by simply specifying an IPv6 multicast address in the address argument of the sendto() function.

A few setsockopt() options at the IPPROTO_IP layer are used to control some of the parameters of sending multicast packets.
These options are optional: applications may send multicast packets without using these options. The setsockopt() options for controlling the sending of multicast packets are summarized below:

    IP_MULTICAST_IF
        Set the interface to use for outgoing multicast packets.

    IP_MULTICAST_HOPS
        Set the hop limit to use for outgoing multicast packets. (Note a separate option - IP_UNICAST_HOPS - is provided to set the hop limit to use for outgoing unicast packets.)

    IP_MULTICAST_LOOP
        Controls whether outgoing multicast packets sent should be delivered back to the local application. A toggle.

The reception of multicast packets is controlled by the two setsockopt() options summarized below:

    IP_ADD_MEMBERSHIP
        Join a multicast group. Requests that multicast packets sent to a particular multicast address be delivered to this socket.

    IP_DROP_MEMBERSHIP
        Leave a multicast group. Requests that multicast packets sent to a particular multicast address no longer be delivered to this socket.

4.13. Name-to-Address Translation Functions

We have defined two new functions analogous to gethostbyname() and gethostbyaddr() which support addresses in both the IPv4 and IPv6 address families. The names of the new functions are hostname2addr() and addr2hostname(). These functions were designed to have semantics similar to gethostbyname() and gethostbyaddr(), so that existing IPv4 applications can be easily ported to IPv6.

Hostname2addr() is defined similarly to gethostbyname(), but enables applications to specify the type of address to be looked up:

    struct hostent *hostname2addr(const char *name, int af);

This new function looks up the given name in the name service and returns the completed hostent structure if the lookup succeeds, and NULL otherwise. The name argument is the domain name of the host to look up.
   The af argument specifies the type of the address -- IPv4 (AF_INET)
   or IPv6 (AF_INET6) -- to return to the caller in the h_addr_list
   field of the hostent structure.

   If the af argument is AF_INET, hostname2addr() queries the name
   service for IPv4 addresses and, if any are found, returns a hostent
   structure that includes an array of IPv4 addresses.  Each IPv4
   address is encoded in network byte order.

   If the af argument is AF_INET6, the processing is as follows: the
   hostname2addr() function first queries the name service for IPv6
   addresses.  If IPv6 addresses are found, they are returned in an
   array in the hostent structure.  If no IPv6 addresses are found,
   the function queries the name service for IPv4 addresses.  If IPv4
   addresses are found, they are returned as IPv4-mapped IPv6
   addresses.  As in IPv4, each IPv6 address returned in the hostent
   structure is encoded in network byte order.

   The second new function, called addr2hostname(), is defined in
   exactly the same way as the gethostbyaddr() function, except that
   it supports both the IPv4 and IPv6 address families:

       struct hostent *addr2hostname(const void *addr, int len, int af);

   The addr2hostname() function performs an address-to-name lookup on
   the address specified, returning a completed hostent structure if
   the lookup succeeds, or NULL if the lookup fails.  This function
   supports both the AF_INET and AF_INET6 address families.  If the af
   argument is AF_INET, then len must be specified as 4 octets and
   addr must refer to an IPv4 address.  If af is AF_INET6, then len
   must be specified as 16 octets and addr must refer to an IPv6
   address.  If the addr argument is an IPv4-mapped IPv6 address, an
   IPv4 address-to-name lookup is performed on the embedded IPv4
   address.

   A new name-to-address translation library function is now under
   development at Berkeley [2].
   This new function, named getconninfo(), will subsume the
   functionality of gethostbyname() and hostname2addr(), as well as
   the getservbyname() and getservbyport() functions.  The new
   function is specifically designed to be "transport independent", so
   it should be directly usable by IPv6 applications.

   System implementations should provide the addr2hostname() and
   hostname2addr() functions in order to simplify the porting of
   existing IPv4 applications to IPv6.  System implementations may
   also provide the getconninfo() function, once it is defined, so
   that newly written applications can be transport independent.  The
   getconninfo() function is expected to be published as a separate
   specification document; it is not included in this specification.

   Implementations must retain the BSD gethostbyname() and
   gethostbyaddr() functions in order to provide source and binary
   compatibility for existing applications.

4.14. Address Conversion Functions

   BSD Unix provides two functions, inet_addr() and inet_ntoa(), to
   convert an IPv4 address between binary and printable form.  IPv6
   applications need similar functions.  We have defined the following
   two functions to convert both IPv6 and IPv4 addresses:

       int ascii2addr(int af, const char *cp, void *ap);

   and

       char *addr2ascii(int af, const void *ap, int len, char *cp);

   The first function converts an ascii string to an address in the
   address family specified by the af argument.  Currently the AF_INET
   and AF_INET6 address families are supported.  The cp argument
   points to the ascii string being passed in.  The ap argument points
   to a buffer into which the function stores the address.  The
   ascii2addr() function returns the length of the address in octets
   if the conversion succeeds, and -1 otherwise.  The function does
   not modify the storage pointed to by ap if the conversion fails.
   The application must ensure that the buffer referred to by ap is
   large enough to hold the converted address.
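
   The ascii2addr() contract described above (return the address
   length on success, return -1 on failure, and leave the caller's
   buffer untouched on failure) can be illustrated with a minimal
   sketch.  The sketch is not part of this specification.  It assumes
   a system that offers the later POSIX inet_pton() function and the
   struct in6_addr type, neither of which is defined in this document,
   and the names sketch_ascii2addr() and sketch_usage() are
   hypothetical.

```c
#include <arpa/inet.h>   /* inet_pton(): assumed available (POSIX) */
#include <assert.h>      /* assert() */
#include <netinet/in.h>  /* struct in_addr, struct in6_addr */
#include <string.h>      /* memcpy() */
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/*
 * Hypothetical stand-in for ascii2addr().  Returns the address length
 * in octets (4 or 16) on success and -1 on failure.
 */
int sketch_ascii2addr(int af, const char *cp, void *ap)
{
    unsigned char tmp[sizeof(struct in6_addr)]; /* fits either family */
    int len;

    assert(cp != NULL && ap != NULL);

    if (af == AF_INET)
        len = (int) sizeof(struct in_addr);
    else if (af == AF_INET6)
        len = (int) sizeof(struct in6_addr);
    else
        return -1;                  /* unsupported address family */

    /* Convert into scratch space so that ap is untouched on failure. */
    if (inet_pton(af, cp, tmp) != 1)
        return -1;

    memcpy(ap, tmp, (size_t) len);  /* result is in network byte order */
    return len;
}

/* Example use: returns 0 if both conversions succeed, -1 otherwise. */
int sketch_usage(void)
{
    struct in_addr  a4;
    struct in6_addr a6;

    if (sketch_ascii2addr(AF_INET, "192.0.2.1", &a4) != 4)
        return -1;
    if (sketch_ascii2addr(AF_INET6, "::1", &a6) != 16)
        return -1;
    return 0;
}
```

   Converting into a scratch buffer first is one simple way to honor
   the requirement that the storage pointed to by ap is not modified
   when the conversion fails.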

   If the af argument is AF_INET, the function accepts a string in the
   standard IPv4 dotted decimal form:

       ddd.ddd.ddd.ddd

   where ddd is a one to three digit decimal number between 0 and 255.
   If the af argument is AF_INET6, then the function accepts a string
   in one of the standard IPv6 printing forms defined in the
   addressing architecture specification [3].

   The second function converts an address into a printable string.
   The af argument specifies the form of the address.  This can be
   AF_INET or AF_INET6.  The ap argument points to a buffer holding an
   IPv4 address if the af argument is AF_INET, and an IPv6 address if
   the af argument is AF_INET6.  The len argument specifies the length
   in octets of the address pointed to by ap, and must be 4 if af is
   AF_INET, or 16 if af is AF_INET6.  The cp argument points to a
   buffer that the function can use to store the ascii string.  If the
   cp argument is NULL, the function uses its own private static
   buffer.  If the application specifies a cp argument, the buffer
   must be large enough to hold the ascii conversion of the address
   specified as an argument, including the terminating null octet.
   For IPv6 addresses, the buffer must be at least 46 octets.  For
   IPv4 addresses, the buffer must be at least 16 octets.

   The addr2ascii() function returns a pointer to the buffer
   containing the ascii string if the conversion succeeds, and NULL
   otherwise.  The function does not modify the storage pointed to by
   cp if the conversion fails.

5. Security Considerations

   IPv6 provides a number of new security mechanisms, many of which
   need to be accessible to applications.  A companion document
   detailing the extensions to the socket interfaces to support IPv6
   security is being written [4].  At some point in the future, that
   document and this one may be merged into a single API
   specification.

6. Changes from October 1994 Edition

   - Added variant of sockaddr_in6 for 4.4 BSD-based systems (sa_len
     compatibility).

   - Removed references to SIT transition specification, and added
     reference to addressing architecture document, for definition of
     IPv4-mapped addresses.

   - Added a solution to the problem of the application not providing
     enough buffer space to hold a received source route.

   - Moved discussion of IPv4 applications interoperating with IPv6
     nodes to open issues section.

   - Added length parameter to addr2ascii() function to be consistent
     with addr2hostname().

   - Changed IP_MULTICAST_TTL to IP_MULTICAST_HOPS to match IPv6
     terminology, and added IP_UNICAST_HOPS option to match
     IP_MULTICAST_HOPS.

   - Removed specification of numeric values for AF_INET6,
     IP_ADDRFORM, and IP_RCVSRCRT, since they need not be the same on
     different implementations.

   - Added a definition for the in_addr6 IPv6 address data structure.
     Added this so that applications could use sizeof(struct in_addr6)
     to get the size of an IPv6 address, and so that a structured type
     could be used in is_ipv4_addr().

7. Open Issues

   A few open issues for the IPv6 socket interface API specification
   remain, including:

   - The multicast API needs to be documented in more detail.

   - Should we add a timeout parameter to hostname2addr() and
     addr2hostname()?  DNS lookups need to be given some finite
     timeout interval, so it might be useful to let the application
     specify that interval.

   - Can existing IPv4 applications interoperate with IPv6 nodes?

7.1. IPv4 Applications Interoperating with IPv6 Nodes

   This problem primarily has to do with how IPv4 applications
   represent addresses of IPv6 nodes.  What address should be returned
   to the application when an IPv6/UDP packet is received, or an
   IPv6/TCP connection is accepted?
   The peer's address could be any arbitrary 128-bit IPv6 address.
   But the application is only equipped to deal with 32-bit IPv4
   addresses encoded in sockaddr_in data structures.

   We have not discovered any solution that provides completely
   transparent interoperability with IPv6 nodes for applications using
   the original IPv4 API.  However, two techniques that partially
   solve the problem are:

   1) Prohibit communication between IPv4 applications and IPv6
      nodes.  Only UDP packets received from IPv4 nodes would be
      passed up to the application, and only TCP connections received
      from IPv4 nodes would be accepted.  UDP packets from IPv6 nodes
      would be dropped, and TCP connections from IPv6 nodes would be
      refused.

   2) The system could generate a local 32-bit cookie to represent the
      full 128-bit IPv6 address, and pass this value to the
      application.  The system would maintain a mapping from each
      cookie value to the 128-bit IPv6 address that it represents.
      When the application passed a cookie back into the system (for
      example, in a sendto() or connect() call), the system would use
      the 128-bit IPv6 address that the cookie represents.  The cookie
      would have to be chosen so as to be an invalid IPv4 address
      (e.g. an address on net 127.0.0.0), and the system would have to
      make sure that these cookie values did not escape into the
      Internet as the source or destination addresses of IPv4 packets.

   Both of these techniques have drawbacks.  This is an area for
   further study.  System implementors may use one of these techniques
   or implement another solution.

Acknowledgments

   Thanks to the many people who made suggestions and provided
   feedback to earlier revisions of this document.  Comments were
   provided by: Richard Stevens, Dan McDonald, Christian Huitema,
   Steve Deering, Andrew Cherenson, Charles Lynn, Ran Atkinson, Erik
   Nordmark, Glenn Trewitt, Fred Baker, Robert Elz, Dean D. Throop,
   and Francis Dupont.  Craig Partridge suggested the addr2ascii() and
   ascii2addr() functions.
   Ramesh Govindan made a number of contributions and co-authored an
   earlier version of this paper.

References

   [1] R. Hinden.  "Internet Protocol, Version 6 (IPv6)
       Specification".  Internet Draft.  October 1994.

   [2] K. Sklower.  Private communication.

   [3] R. Hinden.  "IP Next Generation Addressing Architecture".
       Internet Draft.  October 1994.

   [4] D. McDonald.  "IPv6 Security API for BSD Sockets".  Internet
       Draft.  30 January 1995.

Authors' Address

   Jim Bound
   Digital Equipment Corporation
   110 Spitbrook Road ZK3-3/U14
   Nashua, NH 03062-2698
   Phone: +1 603 881 0400
   Email: bound@zk3.dec.com

   Susan Thomson
   Bell Communications Research
   MRE 2P-343, 445 South Street
   Morristown, NJ 07960
   Telephone: +1 201 829 4514
   Email: set@thumper.bellcore.com

   Robert E. Gilligan
   Sun Microsystems, Inc.
   2550 Garcia Avenue
   Mailstop UMTV05-44
   Mountain View, CA 94043-1100
   Phone: +1 415 336 1012
   Email: bob.gilligan@eng.sun.com