TCP Maintenance and Minor Extensions (tcpm) L. Guo Internet-Draft F. Xue Intended status: Experimental J. Y. B. Lee Expires: 19 November 2026 CUHK 18 May 2026 Stateful-TCP: Bypassing TCP Slow-Start Using Cached Per-Destination Path Bandwidth draft-lee-tcpm-stateful-tcp-00 Abstract This document specifies Stateful-TCP, an experimental sender-side mechanism that accelerates the startup of TCP connections by reusing path-bandwidth information estimated from earlier connections to the same destination. When a usable estimate is available, the sender bypasses Slow-Start and instead enters a paced startup phase whose initial congestion window and pacing rate are derived from the cached estimate. When no usable estimate is available, the sender falls back to standard Slow-Start. Stateful-TCP is sender-only and does not require any change to the TCP receiver or to the wire format. It is orthogonal to the congestion control algorithm in use after the first round-trip time and may therefore be combined with existing TCP variants such as CUBIC. This document also specifies a Gap-Compensated Bandwidth Estimation (GCBE) procedure used to produce the cached per- destination estimate. Stateful-TCP extends the framework of TCP Control Block Interdependence (RFC 9040) by sharing additional per-destination state across connections. The mechanism is published as Experimental to enable independent implementation, evaluation, and review. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Guo, et al. Expires 19 November 2026 [Page 1] Internet-Draft Stateful-TCP May 2026 Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 19 November 2026. Copyright Notice Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.1. Design Goals . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Architecture . . . . . . . . . . . . . . . . . . . . . . 6 4. The State Cache . . . . . . . . . . . . . . . . . . . . . . . 6 4.1. Entry Format . . . . . . . . . . . . . . . . . . . . . . 6 4.2. Indexing . . . . . . . . . . . . . . . . . . . . . . . . 7 4.3. Lookup Outcomes . . . . . . . . . . . . . . . . . . . . . 7 4.4. Write Conditions . . . . . . . . . . . . . . . . . . . . 7 4.5. Expiry and Eviction . . . . . . . . . . . . . . . . . . . 8 5. Per-Connection State Machine . . . . . . . . . . . . . . . . 8 5.1. Startup Phase . . . . . . . . . . . . . . . . . . . . . . 9 5.2. Estimation Phase . . . . . . . . . . . . . . . . . . . . 10 5.3. Termination Phase . . . . . . . . . . . . . . . . . . . . 10 5.4. Receiver Window Suppression . . . . . . . . . . . . . . . 11 6. Gap-Compensated Bandwidth Estimation (GCBE) . . . . . . . . . 11 6.1. Rationale . . . . . . . . . . . . . . . . . . . . . . . . 11 6.2. Variables . . . . . . . . . . . . . . . . . . . . . . . . 11 6.3. Cycle Boundary . . . . . . . . . . . . . . . . . . . . . 12 6.4. Per-Cycle Estimate . . . . . . . . . . . . . . . . . . . 12 6.5. Stored Value . . . . . . . . . . . . . . . . . . . . . . 13 6.6. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 13 6.7. Edge Cases . . . . . . . . . . . . . . . . . . . . . . . 14 7. Operational Considerations . . . . . . . . . . . . . . . . . 15 Guo, et al. Expires 19 November 2026 [Page 2] Internet-Draft Stateful-TCP May 2026 7.1. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 15 7.2. Relation to RFC 9040 (TCB Interdependence) . . . . . . . 15 7.3. Relation to TCP Fast Open . . . . . . . . . . . . . . . . 15 7.4. Fairness . . . . . . . . . . . . . . . . . . . . . . . . 16 7.5. Aged Estimates . . . . . . . . . . . . . . . . . . . . . 16 8. Security Considerations . . . . . . . . . . . . . . . . . . . 16 8.1. Cache Poisoning . . . . . . . . . . . . . . . . . . . . . 16 8.2. Cache Exhaustion . . . . . . . . . . . . . . . . . . . . 17 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 17 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 11.1. Normative References . . . . . . . . . . . . . . . . . . 17 11.2. Informative References . . . . . . . . . . . . . . . . . 18 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19 1. Introduction TCP [RFC9293] traditionally begins each new connection with a Slow- Start phase [RFC5681], during which the congestion window (cwnd) starts from a small initial value and grows exponentially as acknowledgments are received. Slow-Start is a conservative bandwidth-probing procedure designed to avoid overwhelming network paths whose capacity is unknown to the sender. On modern high bandwidth-delay-product (BDP) paths, however, the time spent in Slow-Start can dominate the completion time of short and medium-sized flows. Existing mitigations include increasing the initial window [RFC6928], refining the Slow-Start exit condition (Hystart++ [RFC9406]), and Limited Slow-Start [RFC3742]. These approaches improve but do not eliminate the underlying problem: a sender that has no information about the path is forced to ramp its sending rate up gradually, regardless of how much bandwidth is actually available. TCP Control Block (TCB) Interdependence [RFC9040] (which obsoletes [RFC2140]) already permits a sender to share a small set of TCB variables, including smoothed RTT and ssthresh, between connections to the same host. This document specifies an experimental extension to that framework. Specifically, an implementation of Stateful-TCP caches an estimated path bandwidth and a minimum RTT per destination and uses them, when available, to bypass Slow-Start on subsequent connections to that destination. The mechanism has three properties that distinguish it from prior work: Guo, et al. Expires 19 November 2026 [Page 3] Internet-Draft Stateful-TCP May 2026 * The startup phase uses both an enlarged initial cwnd _and_ sender pacing. Pacing the first round of transmissions at the previously estimated bottleneck rate prevents the line-rate burst that would otherwise occur and that is the principal cause of buffer overflow when an enlarged initial cwnd is used in isolation. * The receiver advertised window (rwnd) is suppressed during the startup and estimation phases (except when rwnd is zero, which retains its semantics from [RFC9293]) so that small rwnd at the start of a connection does not cap the sending rate. * The bandwidth estimation procedure (GCBE, Section 6) excludes inter-ACK gaps that arise from cwnd-limited transmission to avoid potential underestimation in classical bandwidth estimators during Slow-Start. The design and an evaluation of Stateful-TCP applied to CUBIC [RFC9438] -- referred to in the cited paper as S-Cubic -- are described in [STATEFUL-TCP]. That paper is the normative source of motivation, design rationale, and experimental results; this document defines the on-the-wire-equivalent specification suitable for independent implementation. 2. Conventions and Definitions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. The following terms and symbols are used throughout this document. Units are given in parentheses where applicable. MSS: TCP Maximum Segment Size, in bytes. cwnd: Sender congestion window, in bytes (or, equivalently, in units of MSS). IW: The default initial value of cwnd in the absence of any cached state, as defined by [RFC5681] and updated by [RFC6928]. rwnd: Receiver advertised window, in bytes, as carried in the TCP Window field [RFC9293]. SRTT: Smoothed round-trip time, computed as defined in [RFC6298], in seconds. Guo, et al. Expires 19 November 2026 [Page 4] Internet-Draft Stateful-TCP May 2026 RTTmin: Minimum observed RTT for the connection, in seconds. BDP: Bandwidth-Delay Product, expressed in bytes, equal to the product of an estimated bandwidth and an estimated RTT. bw_est: The path bandwidth estimate produced by GCBE (Section 6), in bytes per second. peer_IP: The IP address of the remote endpoint of a TCP connection, as observed by the sender. State Cache: The sender-local data structure described in Section 4 that maps peer_IP to a tuple {bw_est, RTTmin}. Startup phase: The phase of a connection running Stateful-TCP that extends from the completion of the three-way handshake until the first ACK that advances the cumulative acknowledgment number is received. See Section 5. Estimation phase: The phase that follows the Startup phase and lasts until connection termination, during which GCBE updates the bandwidth estimate. See Section 5. Termination phase: The processing performed when the connection is closed (whether gracefully via FIN or abnormally via RST), during which the cache is updated. See Section 5. 3. Overview 3.1. Design Goals Stateful-TCP is designed to satisfy the following goals. * Avoid the bandwidth underutilization caused by Slow-Start on paths whose capacity is already known to the sender from earlier connections. * Avoid the line-rate startup burst that occurs when an enlarged initial cwnd is used without pacing. * Require no change to the TCP receiver, no new TCP option, and no change to the wire format. * Coexist with any existing TCP congestion control algorithm; the mechanism takes effect only during the Startup phase, after which the connection behaves exactly as defined by its congestion control algorithm. Guo, et al. Expires 19 November 2026 [Page 5] Internet-Draft Stateful-TCP May 2026 * Degrade gracefully to standard Slow-Start when no usable cached state is available or when a cache lookup collides. 3.2. Architecture A Stateful-TCP implementation consists of two components that reside in the TCP sender: * A _State Cache_, which stores {bw_est, RTTmin} per peer_IP across connections and is described in Section 4. * A _per-connection state machine_, comprising the Startup, Estimation, and Termination phases described in Section 5, that consumes and updates the cache and that drives initial cwnd, pacing, and rwnd suppression. Bandwidth estimation during the Estimation phase is performed by GCBE, specified in Section 6. The mechanism extends the framework of [RFC9040]: bw_est and RTTmin are additional, per-destination cached items, used in addition to (not in place of) the items already shared under that framework. 4. The State Cache 4.1. Entry Format A State Cache entry MUST contain at least the following three fields: peer_IP: The IP address of the peer for which the entry was recorded. Implementations MUST store the full IP address (including address family) so that hash collisions can be detected as described in Section 4.3. bw_est: The most recent bandwidth estimate produced by GCBE for the connection that closed and updated this entry. RTTmin: The minimum RTT observed during that connection. Implementations MAY store additional fields, such as the last update time (used for expiry) or fields inherited from the TCB Interdependence framework [RFC9040]. Such fields are out of scope of this specification but MUST NOT alter the semantics of the three required fields. Guo, et al. Expires 19 November 2026 [Page 6] Internet-Draft Stateful-TCP May 2026 4.2. Indexing Implementations MAY implement the State Cache as a hash table indexed by an H-bit hash of peer_IP, where H is implementation-defined. Implementations SHOULD choose H, the table size, and the hash function so that the expected hash-collision rate is low for the workload anticipated. The choice of hash function is local to the sender and is not visible on the wire. 4.3. Lookup Outcomes A cache lookup performed at the start of the Startup phase produces exactly one of the following three outcomes: Miss: The hashed bucket is empty. The connection MUST proceed using standard Slow-Start [RFC5681]. Hit: The hashed bucket is non-empty and the stored peer_IP equals the peer_IP of the new connection. The connection MUST apply the Hit branch of the Startup phase as specified in Section 5.1. Collision: The hashed bucket is non-empty but the stored peer_IP differs from the peer_IP of the new connection. The connection MUST proceed using standard Slow-Start. In the Miss and Collision cases, the cache entry SHALL be updated by the new connection upon its termination, subject to the conditions in Section 4.4. 4.4. Write Conditions When a connection terminates (see Section 5.3), an implementation SHALL update the cache entry indexed by the peer_IP of that connection with the {bw_est, RTTmin} pair observed for the connection, except that the cache MUST NOT be updated when: * bw_est multiplied by RTTmin, expressed in MSS-sized segments, is less than or equal to IW; or * bw_est is not available (for example, the Estimation phase did not complete a single GCBE measurement cycle); or * RTTmin is not available (for example, no valid RTT sample was obtained). Guo, et al. Expires 19 November 2026 [Page 7] Internet-Draft Stateful-TCP May 2026 In the first case, retaining the entry is unnecessary because Stateful-TCP would not produce any benefit by setting the initial CWnd to below the default IW. In the latter two cases, the values would be invalid for subsequent connections. 4.5. Expiry and Eviction Implementations SHOULD associate each cache entry with a maximum age and MUST NOT use an entry whose age exceeds the configured maximum. The maximum age is implementation-defined and SHOULD be configurable. Implementations MUST bound the memory used by the cache. When the bound is reached, an implementation SHOULD evict entries using a least-recently-used (LRU) policy or an equivalent policy that preserves the entries most likely to be useful. This requirement is also a defence against the cache-exhaustion denial-of-service vector discussed in Section 8. 5. Per-Connection State Machine Each connection that uses Stateful-TCP transitions through three phases as illustrated in Figure 1. +---------------------+ | 3-way handshake | | completes | +----------+----------+ | v +---------------------+ | Cache lookup | +---+-------------+---+ | Miss / | Hit | Collision | v v +-------------------+ +---------------------+ | Standard | | Startup phase (Hit) | | Slow-Start | | cwnd <- max(bw_est * | | per RFC 5681 | | RTTmin, IW) | | | | | | | | pacing on at | | | | bw_est | | | | rwnd suppressed | | | | (except rwnd == 0) | +---------+---------+ +----------+----------+ | | | first ACK arrives | | Guo, et al. Expires 19 November 2026 [Page 8] Internet-Draft Stateful-TCP May 2026 | v | +-----------------------+ | | cwnd <- max(in_flight,| | | IW) | | | pacing off | | +-----------+-----------+ | | +----------+-----------+ | v +---------------------+ | Estimation phase | | (GCBE running; | | CC algorithm | | unchanged; | | rwnd suppressed | | except rwnd == 0) | +----------+----------+ | FIN / RST | v +---------------------+ | Termination phase | | (cache write, | | subject to write | | conditions) | +---------------------+ Figure 1: Stateful-TCP per-connection state machine. 5.1. Startup Phase The Startup phase begins immediately after the three-way handshake completes. The sender MUST perform a State Cache lookup as specified in Section 4.3. On a _Miss_ or _Collision_ outcome, the sender MUST proceed using standard Slow-Start [RFC5681] and the Startup phase ends. On a _Hit_ outcome, the sender MUST apply all of the following before transmitting any data segment: 1. Set cwnd to BDP, where BDP is computed as the cached bw_est multiplied by the cached RTTmin and expressed in bytes. The result MUST be at least IW. Guo, et al. Expires 19 November 2026 [Page 9] Internet-Draft Stateful-TCP May 2026 2. Suppress the receiver advertised window, as defined in Section 5.4. 3. Enable sender pacing for outgoing data segments at a rate equal to the cached bw_est. The Startup phase SHALL end when the first ACK that advances the cumulative acknowledgment number is received. At that point the sender MUST: 1. Disable pacing. 2. Set cwnd to the maximum of (a) the number of bytes currently in flight and (b) IW. The rationale for the cwnd update is that the bytes in flight at this instant approximate the path's BDP. Hystart-style premature exits from Slow-Start, including Hystart++ [RFC9406], are not applicable during the Startup phase of a Hit- branch connection because the connection is not in Slow-Start. An implementation MUST NOT apply Hystart-style exit triggers to the Hit- branch Startup phase. 5.2. Estimation Phase The Estimation phase begins immediately after the Startup phase ends and continues until connection termination. During the Estimation phase the connection's congestion control algorithm operates without modification. The sender MUST continue to suppress rwnd as defined in Section 5.4. The sender MUST run GCBE (Section 6) to maintain bw_est and MUST maintain RTTmin as the minimum of all valid RTT samples observed for the connection. 5.3. Termination Phase When the connection terminates -- whether gracefully via FIN exchange or abnormally via RST or local abort -- the sender SHALL attempt a cache update as specified in Section 4.4. The update MUST be applied to the cache bucket corresponding to the connection's peer_IP, regardless of whether the connection used the Hit, Miss, or Collision branch in its Startup phase, except that the update MUST NOT be performed if any of the conditions enumerated in Section 4.4 hold. Guo, et al. Expires 19 November 2026 [Page 10] Internet-Draft Stateful-TCP May 2026 5.4. Receiver Window Suppression During the Startup and Estimation phases, the sender MUST compute its effective send window as equal to cwnd, ignoring rwnd, except in the case described below. When rwnd equals zero, the sender MUST honour the zero window: no new data may be transmitted until the receiver advertises a non-zero rwnd, exactly as required by [RFC9293]. Stateful-TCP MUST NOT override the zero-window semantics. The justification for suppressing rwnd outside the zero-window case is that on contemporary end systems the receiver is, in practice, almost always able to consume incoming data faster than the network delivers it, so flow-control back-pressure is rarely required; allowing a small initial rwnd to cap the sending rate would limit the performance gains of Stateful-TCP. 6. Gap-Compensated Bandwidth Estimation (GCBE) 6.1. Rationale A bandwidth estimator that simply divides the bytes acknowledged in a measurement window by the duration of that window may underestimate the path bandwidth if the sender's transmission has been paused due to cwnd exhaustion. The pause shows up as a long inter-ACK gap, which inflates the denominator of the estimate but not the numerator. This effect is most pronounced when cwnd is small, e.g., during Slow- Start. GCBE compensates by removing, from each measurement cycle, the single ACK whose inter-arrival time is the longest; both its acknowledged bytes and the gap it spans are excluded from the estimate. Note that GCBE does not replace the congestion control algorithm's bandwidth estimator. It operates in parallel and the estimated bandwidth is only used for cache update as described in Section 5.3. 6.2. Variables i: Index of an estimation cycle; i = 0, 1, 2, ... t_i: Wall-clock time at which cycle i begins. d: Current SRTT value, in seconds. n_i: Number of ACKs received during cycle i. h_{i,j}: Wall-clock arrival time of ACK j in cycle i, for j = 0, 1, Guo, et al. Expires 19 November 2026 [Page 11] Internet-Draft Stateful-TCP May 2026 ..., n_i - 1. a_{i,j}: Number of bytes newly acknowledged by ACK j in cycle i. z_i: Index, within cycle i, of the ACK whose inter-arrival time from its predecessor is the largest. c_i: Bandwidth estimate produced by cycle i, in bytes per second. L: Implementation-defined number of recent estimates retained. m: Total number of completed measurement cycles for the connection. bw_est: The bandwidth value that will be written to the cache on connection termination. 6.3. Cycle Boundary Cycle i begins at time t_i. When an ACK is received at time t such that t - t_i is greater than or equal to d (the current SRTT), the cycle SHALL end with that ACK. The next cycle SHALL begin immediately, with t_{i+1} set to t. 6.4. Per-Cycle Estimate Within cycle i, let z_i = argmax ( h_{i,j} - h_{i,j-1} ) j in [1, n_i - 1] The ACK at index 0 of the cycle is excluded from the estimate because the time at which the data it acknowledges was placed on the wire is not known to the sender. The ACK at index z_i is excluded because its long inter-arrival gap is taken to be a transmission-suspension gap. n_i - 1 ------- c_i = | sum a_{i,j} - a_{i, z_i} | j = 1 -------------------------------------------------- ( h_{i, n_i - 1} - h_{i, 0} ) - ( h_{i, z_i} - h_{i, z_i - 1} ) If n_i is less than or equal to an implementation-defined threshold, then the cycle does not contain enough samples to apply the formula above; the implementation MUST in that case skip the cycle (produce no estimate) and continue with the next cycle. Guo, et al. Expires 19 November 2026 [Page 12] Internet-Draft Stateful-TCP May 2026 6.5. Stored Value The implementation MUST retain the L most recent values of c_i. When the cache is updated at connection termination, the value bw_est SHALL be set to the maximum of the retained values: bw_est = max c_i i in last L cycles of the connection Taking the maximum, rather than the most recent value or a smoothed value, prevents an isolated congestion or loss event near the end of the connection from skewing the recorded estimate downward. If fewer than L cycles have completed by the time the connection terminates, the maximum is taken over the completed cycles only. If no cycle has completed, bw_est is unavailable and the cache MUST NOT be updated (see Section 4.4). 6.6. Pseudocode # Inputs: # - on_ack(ack): # called for every ACK that advances the cumulative # acknowledgment number, with: # ack.t = arrival time of the ACK # ack.bytes = bytes newly acknowledged by the ACK # srtt() = current SRTT in seconds # # Configurations: # minACK # minimum number of ACKs required # IW # initial congestion window (segments) # MSS # maximum segment size (bytes) # State (per connection): # t_i # start time of current cycle # acks[] # tuples (t, bytes) for the current cycle # ring # ring buffer of size L holding completed c_i # RTTmin # minimum RTT observed (UNAVAILABLE if none) initialize_gcbe(): t_i = now() acks = [] ring = empty ring buffer of capacity L on_ack(ack): acks.append( (ack.t, ack.bytes) ) if (ack.t - t_i) >= srtt(): # close the current cycle Guo, et al. Expires 19 November 2026 [Page 13] Internet-Draft Stateful-TCP May 2026 if length(acks) > minACK: # find z_i: index of the largest inter-arrival gap z = 1 max_gap = acks[1].t - acks[0].t for j in 2 .. length(acks) - 1: gap = acks[j].t - acks[j-1].t if gap > max_gap: max_gap = gap z = j sum_bytes = 0 for j in 1 .. length(acks) - 1: sum_bytes = sum_bytes + acks[j].bytes sum_bytes = sum_bytes - acks[z].bytes duration = acks[ length(acks) - 1 ].t - acks[0].t duration = duration - max_gap if duration > 0 and sum_bytes > 0: c_i = sum_bytes / duration ring.push(c_i) # start a new cycle t_i = ack.t acks = [] acks.append( (ack.t, ack.bytes) ) on_connection_close(): if ring is not empty: bw_est = max(ring) else: bw_est = UNAVAILABLE # Section 4.4 write conditions; cache MUST NOT be updated when: if bw_est == UNAVAILABLE: return NO_CACHE_UPDATE if RTTmin == UNAVAILABLE: return NO_CACHE_UPDATE if (bw_est * RTTmin) / MSS <= IW: return NO_CACHE_UPDATE return { bw_est, RTTmin } 6.7. Edge Cases * If SRTT changes during a cycle, the change applies starting from the next cycle. Guo, et al. Expires 19 November 2026 [Page 14] Internet-Draft Stateful-TCP May 2026 * If the path is non-bottlenecked during a cycle (cwnd never limits transmission), the longest inter-arrival gap identified by the formula is simply one of the regular inter-ACK gaps, and excluding it amounts to dropping a single sample. * If retransmissions occur within a cycle, the bytes newly acknowledged by an ACK (a_{i,j}) include only the bytes that the cumulative acknowledgment number advances past for the first time; bytes covered solely by SACK ranges [RFC2018] are not counted in a_{i,j}. Implementations that integrate GCBE with modern loss- detection mechanisms such as RACK-TLP [RFC8985] need to apply the same rule. 7. Operational Considerations 7.1. Pacing A Stateful-TCP sender MUST support sender pacing of outgoing data segments during the Hit branch of the Startup phase, at a rate equal to bw_est. In its absence, transmitting an enlarged initial cwnd at line rate is equivalent to the unpaced initial-cwnd schemes that motivated this work and would be expected to cause buffer overflow at the path bottleneck. 7.2. Relation to RFC 9040 (TCB Interdependence) [RFC9040] specifies a framework for sharing a small set of TCB variables, including SRTT and ssthresh, between connections to the same host. Stateful-TCP extends this framework with two additional shared items (bw_est and RTTmin) that are used specifically to drive the Startup phase of new connections. An implementation that already implements [RFC9040] MAY store the additional Stateful-TCP fields in the same per-destination state structure used for RFC 9040 sharing, provided that the conditions in Section 4.4 are observed for the additional fields. 7.3. Relation to TCP Fast Open TCP Fast Open (TFO, [RFC7413]) reduces startup latency by carrying data in the SYN. TFO and Stateful-TCP are orthogonal: TFO accelerates the first round-trip of a connection, while Stateful-TCP accelerates the bandwidth ramp-up that follows. Implementations MAY enable both simultaneously. Guo, et al. Expires 19 November 2026 [Page 15] Internet-Draft Stateful-TCP May 2026 7.4. Fairness A Hit-branch Stateful-TCP connection ramps to its target rate in one RTT instead of over many RTTs. When such a connection shares a bottleneck with one or more standard (Slow-Start) connections, the standard connections will experience the Stateful-TCP connection as a long-lived flow that is already at steady state, rather than as a newcomer that is still ramping up. This shifts the short-term throughput share towards the Stateful-TCP connection. The cited evaluation ([STATEFUL-TCP]) quantifies this effect for S-Cubic and reports that, at the link utilizations measured, both fairness among Stateful-TCP connections of the same kind and friendliness to standard connections remain within ranges considered acceptable for an experimental TCP variant. Implementers and operators SHOULD evaluate fairness in their own deployment environments before enabling Stateful-TCP at scale. 7.5. Aged Estimates The accuracy of a cached estimate degrades as time passes since the estimate was recorded, because path conditions may change. Implementations SHOULD bound the maximum age of an entry as required in Section 4.5. An overestimate larger than the true path bandwidth at the time a Hit-branch connection starts will be corrected by the connection's ordinary congestion control response after the first RTT, although it may cause additional queueing or retransmissions during the first RTT. An underestimate will reduce the bandwidth utilization of the Hit-branch connection but will not cause additional congestion. 8. Security Considerations Stateful-TCP introduces sender-local state that is updated and consumed without any additional information on the wire. It does not change the TCP authentication or integrity properties of the connection itself. Nevertheless, the introduction of cached, peer- keyed state raises several considerations. 8.1. Cache Poisoning The cache is keyed on the peer_IP that the sender observes for the connection that wrote the entry. An off-path attacker cannot directly write entries because doing so would require completing a TCP handshake with the sender while spoofing the victim peer's IP address, which is not possible without on-path or address-spoofing capability. Guo, et al. Expires 19 November 2026 [Page 16] Internet-Draft Stateful-TCP May 2026 An on-path attacker that can complete a TCP handshake from a spoofed peer_IP can induce the sender to record an arbitrary bw_est for that peer. The consequence on a subsequent connection is bounded: an inflated bw_est causes at most one round-trip of paced over- transmission to that peer before ordinary congestion control reasserts itself; a deflated bw_est causes at most one round-trip of under-utilization. Implementations SHOULD consider these bounds when choosing cache expiry policies in environments where on-path attackers are part of the threat model. 8.2. Cache Exhaustion A peer that initiates connections from many distinct source addresses can cause an unbounded cache to grow without bound. As required in Section 4.5, implementations MUST bound cache memory and SHOULD use an LRU or equivalent eviction policy. 9. IANA Considerations This document has no IANA actions. 10. Acknowledgments The mechanism specified here was originally proposed and evaluated in [STATEFUL-TCP]. 11. References 11.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, . [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, DOI 10.17487/RFC6298, June 2011, . [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . Guo, et al. Expires 19 November 2026 [Page 17] Internet-Draft Stateful-TCP May 2026 [RFC9040] Touch, J., Welzl, M., and S. Islam, "TCP Control Block Interdependence", RFC 9040, DOI 10.17487/RFC9040, July 2021, . [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, . [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., "CUBIC for Fast and Long-Distance Networks", RFC 9438, DOI 10.17487/RFC9438, August 2024, . 11.2. Informative References [BBR] Cardwell, N., Cheng, Y., Gunn, C. S., Yeganeh, S. H., and V. Jacobson, "BBR: Congestion-Based Congestion Control", ACM Queue, vol. 14, no. 5, September 2016. [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, DOI 10.17487/RFC2018, October 1996, . [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, DOI 10.17487/RFC2140, April 1997, . [RFC3742] Floyd, S., "Limited Slow-Start for TCP with Large Congestion Windows", RFC 3742, DOI 10.17487/RFC3742, March 2004, . [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, "Increasing TCP's Initial Window", RFC 6928, DOI 10.17487/RFC6928, April 2013, . [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, . [RFC8985] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The RACK-TLP Loss Detection Algorithm for TCP", RFC 8985, DOI 10.17487/RFC8985, February 2021, . Guo, et al. Expires 19 November 2026 [Page 18] Internet-Draft Stateful-TCP May 2026 [RFC9406] Balasubramanian, P., Huang, Y., and M. Olson, "HyStart++: Modified Slow Start for TCP", RFC 9406, DOI 10.17487/RFC9406, May 2023, . [STATEFUL-TCP] Guo, L. and J. Y. B. Lee, "Stateful-TCP - A New Approach to Accelerate TCP Slow-Start", IEEE Access, vol. 8, pp. 195955-195970, DOI 10.1109/ACCESS.2020.3032208, October 2020, . [WESTWOOD] Casetti, C., Gerla, M., Mascolo, S., Sanadidi, M. Y., and R. Wang, "TCP Westwood: Bandwidth Estimation for Enhanced Transport over Wireless Links", Proc. ACM MobiCom, pp. 287-297, July 2001. Authors' Addresses Lingfeng Guo The Chinese University of Hong Kong Department of Information Engineering Shatin, N.T. Hong Kong Email: gl016@ie.cuhk.edu.hk Feiyu Xue The Chinese University of Hong Kong Department of Information Engineering Shatin, N.T. Hong Kong Email: xf024@ie.cuhk.edu.hk Jack Y. B. Lee The Chinese University of Hong Kong Department of Information Engineering Shatin, N.T. Hong Kong Email: jacklee@computer.org Guo, et al. Expires 19 November 2026 [Page 19]