Internet-Draft                   PipeStream                    March 2026

Workgroup:        Individual Submission
Internet-Draft:   draft-krickert-pipestream-01
Published:        March 2026
Intended Status:  Standards Track
Expires:          2 September 2026
Author:           K. Rickert
                  PipeStream AI

PipeStream: A Recursive Entity Streaming Protocol for Distributed Processing over QUIC

Abstract

This document specifies PipeStream, a recursive entity streaming protocol designed for distributed document processing over QUIC transport. PipeStream enables the decomposition ("dehydration") of documents into constituent entities, their transmission across distributed processing nodes, and subsequent rehydration at destination endpoints.

The protocol employs a dual-stream architecture consisting of a data stream for entity payload transmission and a control stream for tracking entity completion status and maintaining consistency. PipeStream defines four hierarchical data layers for entity representation: BlobBag for raw binary data, SemanticLayer for annotated content with metadata, ParsedData for structured extracted information, and CustomEntity for application-specific extensions.

PipeStream is organized into three protocol layers: Layer 0 (Core) provides basic streaming with dehydrate/rehydrate semantics; Layer 1 (Recursive) adds hierarchical scoping and digest propagation; Layer 2 (Resilience) adds yield/resume, claim checks, and completion policies. Implementations MUST support Layer 0 and MAY support Layers 1 and 2.

To ensure consistency across distributed processing pipelines, PipeStream implements checkpoint blocking, whereby processing nodes MUST synchronize at defined points before proceeding. This mechanism guarantees that all constituent parts of a dehydrated document are successfully processed before rehydration operations commence.

About This Document

This note is to be removed before publishing as an RFC.

Status information for this document may be found at https://datatracker.ietf.org/doc/draft-krickert-pipestream/.

Discussion of this document takes place on the Individual Group mailing list (mailto:kristian.rickert@pipestream.ai).

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 2 September 2026.

1. Introduction

1.1. Problem Statement

Distributed document processing pipelines face significant challenges when handling large, complex documents that require multiple stages of transformation, analysis, and enrichment. Traditional batch processing approaches require entire documents to be loaded into memory, processed sequentially, and transmitted in their entirety between processing stages. This methodology introduces substantial latency, excessive memory consumption, and poor utilization of distributed computing resources.

Modern document processing workflows increasingly demand the ability to:

  • Process documents incrementally as data becomes available

  • Distribute processing load across heterogeneous worker nodes

  • Maintain consistency guarantees across parallel processing paths

  • Handle documents of arbitrary size without memory constraints

  • Support recursive decomposition where document parts may themselves be decomposed

  • Scale from single documents to collections of millions of documents

Current approaches based on batch processing and store-and-forward architectures are inefficient for large documents and fail to exploit the inherent parallelism available in distributed processing environments. Furthermore, existing streaming protocols do not provide the consistency semantics required for document processing where the integrity of the rehydrated output depends on the successful processing of all constituent parts.

1.2. PipeStream Overview

PipeStream addresses these challenges by defining a streaming protocol that enables incremental processing with strong consistency guarantees. The protocol is built upon QUIC [RFC9000] transport, leveraging its native support for multiplexed streams, low-latency connection establishment, and reliable delivery semantics.

The fundamental innovation of PipeStream is its treatment of documents as recursive compositions of entities. A document MAY be decomposed into multiple entities, each of which MAY itself be further decomposed, creating a tree structure of processing tasks. This recursive decomposition enables fine-grained parallelism while the protocol's control stream mechanism ensures that all branches of the decomposition tree are tracked and synchronized.

PipeStream employs a dual-stream design:

  1. Data Stream: Carries entity payloads through the processing pipeline. Entities flow through this stream with minimal buffering, enabling low-latency incremental processing.

  2. Control Stream: Carries control information tracking the status of entity decomposition and rehydration. The control stream ensures that all parts of a dehydrated document are accounted for before rehydration proceeds.

1.3. Design Philosophy

PipeStream implements a recursive scatter-gather pattern [scatter-gather] over QUIC streams. A document is "dehydrated" (scattered) at the source into constituent entities, these entities are transmitted and processed in parallel across distributed pipeline stages, and finally the entities are "rehydrated" (gathered) at the destination to reconstitute the complete processed document. The checkpoint blocking mechanism (Section 9.3) provides barrier synchronization semantics analogous to the barrier pattern in parallel computing.

This approach provides several advantages:

  • Incremental Processing: Processing nodes MAY begin work on early entities before the complete document has been transmitted.

  • Parallelism: Independent entities MAY be processed concurrently across multiple worker nodes.

  • Memory Efficiency: No single node is required to hold the complete document in memory.

  • Fault Isolation: Failures in processing individual entities can be detected, reported, and potentially retried without affecting other entities.

  • Consistency: The checkpoint blocking mechanism ensures that rehydration operations proceed only when all constituent parts have been successfully processed.

1.4. Protocol Layering

PipeStream is organized into three protocol layers to accommodate varying deployment requirements:

Table 1

   Protocol Layer   Name         Description
   --------------   ----------   -------------------------------------------------
   Layer 0          Core         Basic streaming, dehydrate/rehydrate, checkpoint
   Layer 1          Recursive    Hierarchical scopes, digest propagation, barriers
   Layer 2          Resilience   Yield/resume, claim checks, completion policies

Implementations MUST support Layer 0. Support for Layers 1 and 2 is OPTIONAL and negotiated during connection establishment.

1.5. Scope

This document specifies the PipeStream protocol including message formats, state machines, error handling, and the interaction between data and control streams. The document defines the four standard data layers but does not mandate specific processing semantics, which are left to application-layer specifications.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.1. Protocol Entities

Entity

The fundamental unit of data flowing through a PipeStream pipeline. An Entity represents either a complete document or a constituent part of a decomposed document. Each Entity possesses a unique identifier within its processing scope and carries payload data in one of the four defined Layer formats. Entities are immutable once created; transformations produce new Entities rather than modifying existing ones.

Document

A logical unit of content submitted to a PipeStream pipeline for processing. A Document enters the pipeline as a single root Entity and MAY be decomposed into multiple Entities during processing. The Document is considered complete when its root Entity (or the rehydrated result of its decomposition) exits the pipeline.

Scope

A hierarchical namespace for Entity IDs. Each scope maintains its own Entity ID space, cursor, and Assembly Manifest. Scopes enable collections to contain documents, documents to contain parts, and parts to contain jobs, each with independent ID management. (Protocol Layer 1)

2.2. Dehydration and Rehydration

Scatter-Gather

The distributed processing pattern implemented by PipeStream. A single input is "scattered" (dehydrated) into multiple parts for parallel processing, and the results are "gathered" (rehydrated) back into a single output. PipeStream extends classical scatter-gather with recursive nesting: any scattered part may itself be scattered further.

Dehydrate (Scatter)

The operation of decomposing a document or Entity into multiple constituent Entities for parallel or distributed processing. When an Entity is dehydrated, the originating node MUST create an Assembly Manifest entry recording the identifiers of all resulting sub-entities. The dehydration operation is recursive; a sub-entity produced by dehydration MAY itself be dehydrated, creating a tree of decomposition. Dehydration transitions data from a solid state (a single stored record) to a fluid state (multiple in-flight entities).

Rehydrate (Gather)

The operation of reassembling multiple Entities back into a single composite Entity or Document. A rehydrate operation MUST NOT proceed until all constituent Entities listed in the corresponding Assembly Manifest entry have been received and processed (or handled according to the Completion Policy). Rehydration transitions data from a fluid state back to a solid state.

Solid State

A document or Entity that exists as a complete, stored record -- either at rest in storage or as a single root Entity entering or exiting a pipeline. Contrast with "fluid state".

Fluid State

A document that has been decomposed into multiple in-flight Entities being processed in parallel across distributed nodes. A document is in the fluid state between dehydration and rehydration. Contrast with "solid state".

2.3. Consistency Mechanisms

Checkpoint

A synchronization point in the processing pipeline where all in-flight Entities MUST reach a consistent state before processing may continue. A checkpoint is considered "satisfied" when all Assembly Manifest entries created before the checkpoint have been resolved.

Barrier

A synchronization point scoped to a specific subtree. Unlike checkpoints, which are global, barriers block only entities that depend on a specific parent's descendants. (Protocol Layer 1)

Control Stream

The control stream that tracks Entity completion status throughout the processing pipeline. The Control Stream is transmitted on a dedicated QUIC stream parallel to the data streams.

Assembly Manifest

A data structure within the Control Stream that tracks the relationship between a composite Entity and its constituent sub-entities produced by dehydration.

Cursor

A pointer to the lowest unresolved Entity ID within a scope. Entity IDs behind the cursor are considered resolved and MAY be recycled. The cursor enables efficient ID space management without global coordination.

2.4. Resilience Mechanisms (Protocol Layer 2)

Yield

A temporary pause in Entity processing, typically due to external dependencies (API calls, rate limiting, human approval). A yielded Entity carries a continuation token enabling resumption without reprocessing.

Claim Check

A detached reference to a deferred Entity that can be queried or resumed independently, potentially in a different session. Claim checks enable asynchronous processing patterns and retry queues.

Completion Policy

A configuration specifying how to handle partial failures during dehydration. Policies include STRICT (all must succeed), LENIENT (continue with partial results), BEST_EFFORT (complete with whatever succeeds), and QUORUM (require minimum success ratio).
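As a non-normative illustration of how these policies might gate a rehydrate decision, the following sketch evaluates a scope's outcome. The exact semantics of LENIENT versus BEST_EFFORT, and the `quorum_ratio` parameter, are interpretive assumptions here; later sections and application configuration govern the real behavior.

```python
def scope_outcome(policy: str, succeeded: int, total: int,
                  quorum_ratio: float = 0.5) -> bool:
    """Decide whether a dehydrated scope may proceed to rehydration
    under each Completion Policy. Illustrative only: `quorum_ratio`
    is an assumed configuration knob, and the LENIENT/BEST_EFFORT
    distinction shown is one possible reading of the definitions."""
    failed = total - succeeded
    if policy == "STRICT":
        return failed == 0               # all children must succeed
    if policy == "LENIENT":
        return succeeded > 0             # continue with partial results
    if policy == "BEST_EFFORT":
        return True                      # complete with whatever succeeded
    if policy == "QUORUM":
        return total > 0 and succeeded / total >= quorum_ratio
    raise ValueError(f"unknown policy: {policy}")
```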

2.5. Data Representation

Data Layer

One of four defined representations for Entity payload data:

  1. BlobBag: Raw binary data with minimal metadata

  2. SemanticLayer: Annotated content with structural and semantic metadata

  3. ParsedData: Structured information extracted from document content

  4. CustomEntity: Application-specific extension Layer

2.6. Additional Terms

Pipeline

A configured sequence of processing stages through which Entities flow.

Processor

A node in the mesh that performs operations on entities (e.g., transformation, dehydration, or rehydration).

Sink

A terminal stage in a pipeline where rehydrated documents are persisted or delivered to an external system.

Stage

A single processing step within a Pipeline.

Scope Digest

A cryptographic summary (Merkle root) of all Entity statuses within a completed scope, propagated to parent scopes for efficient verification. (Protocol Layer 1)

3. Protocol Layers

PipeStream defines three protocol layers that build upon each other. This layered approach allows simple deployments to use only the core protocol while complex deployments can leverage advanced features.

3.1. Layer 0: Core Protocol

Layer 0 provides the fundamental streaming capabilities:

  • Unified Control Frame (UCF) header (1-octet type)

  • Status frame (8-octet bit-packed frame)

  • Entity frame (header + payload)

  • Status codes: PENDING, PROCESSING, COMPLETE, FAILED, CHECKPOINT

  • Assembly Manifest for parent-child tracking

  • Cursor-based Entity ID recycling

  • Single-level dehydrate/rehydrate

  • Checkpoint blocking

All implementations MUST support Layer 0.

3.2. Layer 1: Recursive Extension

Layer 1 adds hierarchical processing capabilities:

  • Scoped Entity ID namespaces (collection -> document -> part -> job)

  • Explicit Depth tracking in status frames

  • SCOPE_DIGEST for Merkle-based subtree completion

  • BARRIER for subtree-scoped synchronization

  • Nested dehydration with depth tracking

Layer 1 is OPTIONAL. Implementations advertise Layer 1 support during capability negotiation.

3.3. Layer 2: Resilience Extension

Layer 2 adds fault tolerance and async processing:

  • YIELDED status with continuation tokens

  • DEFERRED status with claim checks

  • RETRYING, SKIPPED, ABANDONED statuses

  • Completion policies (STRICT, LENIENT, BEST_EFFORT, QUORUM)

  • Claim check query/response frames

  • Stopping point validation

Layer 2 is OPTIONAL and requires Layer 1. Implementations advertise Layer 2 support during capability negotiation.

3.4. Capability Negotiation

During CONNECT, endpoints exchange supported capabilities:

message Capabilities {
  bool layer0_core = 1;           // Always true
  bool layer1_recursive = 2;      // Scoped IDs, digests
  bool layer2_resilience = 3;     // Yield, claim checks
  uint32 max_scope_depth = 4;     // Default: 7 (8 levels, 0-7)
  uint32 max_entities_per_scope = 5;  // Default: 4,294,967,294
                                      // (2^32-2)
  uint32 max_window_size = 6;     // Default: 2,147,483,648 (2^31)
}

Peers negotiate down to common capabilities. If Layer 2 is requested but Layer 1 is not supported, Layer 2 MUST be disabled.
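The down-negotiation rule can be sketched as follows. This is a non-normative illustration: field names mirror the Capabilities message above, booleans are intersected, numeric limits take the pairwise minimum, and the Layer 2 dependency on Layer 1 is enforced last.

```python
from dataclasses import dataclass

@dataclass
class Capabilities:
    """Mirrors the Capabilities message fields (illustrative)."""
    layer0_core: bool = True            # Always true per Section 3.1
    layer1_recursive: bool = False
    layer2_resilience: bool = False
    max_scope_depth: int = 7
    max_entities_per_scope: int = 2**32 - 2
    max_window_size: int = 2**31

def negotiate(local: Capabilities, remote: Capabilities) -> Capabilities:
    """Intersect two capability sets: booleans AND together, limits
    take the minimum, and Layer 2 is disabled unless Layer 1 survives."""
    result = Capabilities(
        layer0_core=True,
        layer1_recursive=local.layer1_recursive and remote.layer1_recursive,
        layer2_resilience=local.layer2_resilience and remote.layer2_resilience,
        max_scope_depth=min(local.max_scope_depth, remote.max_scope_depth),
        max_entities_per_scope=min(local.max_entities_per_scope,
                                   remote.max_entities_per_scope),
        max_window_size=min(local.max_window_size, remote.max_window_size),
    )
    # Layer 2 requires Layer 1 (Section 3.3), so it is negotiated away
    # whenever Layer 1 did not survive the intersection.
    if not result.layer1_recursive:
        result.layer2_resilience = False
    return result
```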

4. Protocol Overview

This section provides a high-level overview of the PipeStream protocol architecture, design principles, and operational model.

4.1. Design Goals

4.1.1. True Streaming Processing

PipeStream MUST enable true streaming document processing where entities are transmitted and processed incrementally as they become available. Implementations MUST NOT buffer complete documents before initiating transmission.

4.1.2. Recursive Decomposition

The protocol MUST support recursive decomposition of entities, wherein a single input entity MAY produce zero, one, or many output entities.

4.1.3. Checkpoint Consistency

PipeStream MUST provide checkpoint blocking semantics to maintain processing consistency across distributed workers.

4.1.4. Control and Data Plane Separation

The protocol MUST maintain strict separation between the control plane (control stream) and the data plane (entities).

4.1.5. QUIC Foundation

PipeStream MUST be implemented over QUIC [RFC9000] to leverage:

  • Native stream multiplexing without head-of-line blocking

  • Built-in flow control at both connection and stream levels

  • TLS 1.3 security by default

  • Connection migration capabilities

4.1.6. Multi-Layer Data Representation

The protocol MUST support four distinct data representation layers:

Table 2

   Layer   Name            Description
   -----   -------------   ---------------------------------
   0       BlobBag         Raw binary data with metadata
   1       SemanticLayer   Annotated content with embeddings
   2       ParsedData      Structured extracted information
   3       CustomEntity    Application-specific extension

4.2. Architecture Summary

PipeStream uses a dual-stream architecture within a single QUIC connection between a Client (Producer) and Server (Consumer):

Table 3

   Stream       Type             Plane     Content
   ----------   --------------   -------   --------------------------------
   Stream 0     Bidirectional    Control   STATUS, SCOPE_DIGEST, BARRIER,
                                           CAPABILITIES, CHECKPOINT
   Streams 2+   Unidirectional   Data      Entity frames (Header + Payload)

4.3. Connection Lifecycle

A PipeStream connection follows this lifecycle:

  1. Establishment: Client initiates QUIC connection with ALPN identifier "pipestream/1"

  2. Capability Exchange: Client and server exchange supported protocol layers and limits

  3. Control Stream Initialization: Client opens Stream 0 as bidirectional Control Stream

  4. Entity Streaming: Entities are transmitted per Sections 5 and 6

  5. Termination: Connection closes via QUIC CONNECTION_CLOSE or application-level shutdown

5. QUIC Stream Mapping

PipeStream leverages the native multiplexing capabilities of QUIC [RFC9000] to provide a clean separation between control coordination and data transmission.

5.1. Control Stream (Stream 0)

The Control Stream provides the control plane for PipeStream operations.

5.1.1. Stream Identification

The Control Stream MUST use QUIC Stream ID 0, which per RFC 9000 is a bidirectional, client-initiated stream.

5.1.2. Usage Rules

  1. The Control Stream MUST be opened immediately upon connection establishment.

  2. Capability negotiation (Section 3.4) MUST occur on Stream 0 before any Entity Streams are opened.

  3. Stream 0 MUST NOT carry entity payload data.

  4. Implementations SHOULD assign the Control Stream a high priority to ensure timely delivery of status updates.

5.1.3. Flow Control Considerations

The Control Stream carries small, bit-packed control frames; STATUS frames have a 12-octet base size. Implementations MUST ensure adequate flow control credits:

  • The initial MAX_STREAM_DATA for Stream 0 SHOULD be at least 8192 octets.

  • Implementations SHOULD NOT block Entity Stream transmission due to Control Stream flow control exhaustion.

5.1.4. Heartbeat Mechanism

QUIC already provides native transport liveness signals (for example, PING and idle timeout handling). Implementations SHOULD rely on those transport mechanisms for connection liveness.

PipeStream heartbeat frames are OPTIONAL and are intended for application-level responsiveness checks (for example, detecting stalled processing logic even when the transport remains healthy). When used, an endpoint sends a STATUS frame with all fields set to their heartbeat values:

Table 4

   Field       Value               Description
   ---------   -----------------   ----------------
   Type        0x50 (STATUS)
   Stat        0x0 (UNSPECIFIED)   Heartbeat signal
   Entity ID   0xFFFFFFFF          CONNECTION_LEVEL
   Scope ID    0x0000              Root scope
   Reserved    0x0000              MUST be zero

When no status updates have been transmitted for KEEPALIVE_TIMEOUT (default: 30 seconds), an endpoint MAY send a heartbeat frame. If no data is received on Stream 0 for 3 * KEEPALIVE_TIMEOUT, the endpoint SHOULD first apply transport-native liveness policy; it MAY close the connection with PIPESTREAM_IDLE_TIMEOUT (0x02) when application-level inactivity policy requires it.
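The heartbeat values in Table 4 produce a fixed 12-octet STATUS frame on the wire. The following non-normative sketch shows that encoding: a 0x50 type octet, 24 bits of zeroed Stat/E/C/D/Flags fields, the CONNECTION_LEVEL entity ID, and zeroed scope and reserved fields.

```python
import struct

def heartbeat_frame() -> bytes:
    """Build the 12-octet heartbeat STATUS frame from Table 4:
    Type 0x50; Stat, E, C, D, and Flags all zero; Entity ID 0xFFFFFFFF
    (CONNECTION_LEVEL); Scope ID 0; Reserved 0."""
    frame_type = 0x50
    packed_bits = 0          # Stat(4)|E|C|D(3)|Flags(15), all zero
    entity_id = 0xFFFFFFFF   # CONNECTION_LEVEL sentinel
    scope_id = 0x0000
    reserved = 0x0000
    # Network byte order: 1-octet type, 3 octets of packed bit fields,
    # then 32-bit entity ID, 16-bit scope ID, 16-bit reserved.
    return struct.pack("!B3sIHH", frame_type,
                       packed_bits.to_bytes(3, "big"),
                       entity_id, scope_id, reserved)
```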

5.1.5. Transport Session vs. Application Session Context

The session-id segment identifies application context for detached or resumable resources (for example, Layer 2 yield/claim-check flows). PipeStream Layer 0 streaming semantics do not depend on this URI scheme.

5.2. Entity Streams (Streams 2+)

Entity Streams carry the actual document entity data.

5.2.1. Unidirectional Data Flow

Entity Streams MUST be unidirectional streams:

Table 5

   Stream Type        Stream ID Formula   Example Stream IDs
   ----------------   -----------------   ------------------
   Client-Initiated   4n + 2 (n >= 0)     2, 6, 10, 14, ...
   Server-Initiated   4n + 3 (n >= 0)     3, 7, 11, 15, ...
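The stream ID allocation above follows the unidirectional stream types of RFC 9000 and can be enumerated directly. A minimal, non-normative sketch:

```python
def entity_stream_ids(initiator: str, count: int) -> list:
    """Enumerate unidirectional Entity Stream IDs per Table 5:
    client-initiated streams use 4n + 2, server-initiated use 4n + 3,
    matching the QUIC stream type encoding in the low two bits."""
    offset = {"client": 2, "server": 3}[initiator]
    return [4 * n + offset for n in range(count)]
```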

5.2.2. One Entity Per Stream

  1. Each Entity Stream MUST carry exactly one entity.

  2. The entity_id in the Entity Frame header MUST be unique within its scope.

  3. Once an entity has been completely transmitted, the sender MUST close the stream.

5.3. Transport Error Mapping

PipeStream error signaling on Stream 0 and QUIC transport signals are complementary. Endpoints SHOULD bridge them so peers receive both transport-level and protocol-level context.

  1. If an Entity Stream is aborted with RESET_STREAM or STOP_SENDING, the endpoint SHOULD emit a corresponding terminal status (FAILED, ABANDONED, or policy-driven equivalent) for that entity on Stream 0.

  2. If PipeStream determines a terminal entity error first (for example, checksum failure or invalid frame), the endpoint SHOULD abort the affected Entity Stream with an appropriate QUIC error and emit the corresponding PipeStream status/error context on Stream 0.

  3. If Stream 0 is reset or becomes unusable, endpoints SHOULD treat this as a control-plane failure and close the connection with PIPESTREAM_CONTROL_RESET (0x03).

  4. On QUIC connection termination (CONNECTION_CLOSE), entities without a previously observed terminal status MUST be treated as failed by local policy.

6. Frame Formats

This section defines the wire formats for PipeStream frames. All multi-octet integer fields are encoded in network byte order (big-endian).

6.1. Control Stream Framing (Stream 0)

To support mixed content (bit-packed frames and Protobuf messages) on the Control Stream, PipeStream uses a Unified Control Frame (UCF) header.

6.1.1. UCF Header

Every message on Stream 0 MUST begin with a 1-octet Frame Type.

Table 6

   Value       Frame Class   Length Encoding      Description
   ---------   -----------   ------------------   ------------------------------
   0x50-0x7F   Fixed         No length prefix     Bit-packed control frames with
                                                  type-defined sizes
   0x80-0xFF   Variable      4-octet Length + N   Variable-size Protobuf-encoded
                                                  control messages

For Fixed frames, the receiver determines frame size from the Frame Type value. For Variable frames, the Type is followed by a 4-octet unsigned integer (big-endian) indicating the length of the Protobuf message that follows.

Variable-frame Length (32 bits):

The payload length in octets, excluding the 1-octet Type and the 4-octet Length field. Receivers MUST reject lengths greater than 16,777,215 octets (16 MiB - 1) with PIPESTREAM_ENTITY_TOO_LARGE (0x06).
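A receiver's UCF dispatch loop can be sketched as below. This is a non-normative simplification: it assumes a fully buffered input, and it uses only the 12-octet base size for STATUS, whereas a real parser must also inspect the C and E bits to consume any trailing extension data (Sections 6.2.3 and 6.5).

```python
import struct

# Total sizes of the Fixed frame types from Table 7 (STATUS shown at
# its 12-octet base size; see the caveat in the surrounding text).
FIXED_SIZES = {0x50: 12, 0x54: 68, 0x55: 8}
MAX_VARIABLE_LEN = 16_777_215  # 16 MiB - 1

def read_ucf_frame(buf: bytes, pos: int = 0):
    """Parse one Unified Control Frame starting at buf[pos].
    Returns (frame_type, body, next_pos)."""
    frame_type = buf[pos]
    if 0x50 <= frame_type <= 0x7F:
        # Fixed class: frame size is implied by the Type value.
        size = FIXED_SIZES[frame_type]
        return frame_type, buf[pos + 1:pos + size], pos + size
    if 0x80 <= frame_type <= 0xFF:
        # Variable class: 4-octet big-endian Length, then Protobuf bytes.
        (length,) = struct.unpack_from("!I", buf, pos + 1)
        if length > MAX_VARIABLE_LEN:
            raise ValueError("PIPESTREAM_ENTITY_TOO_LARGE (0x06)")
        start = pos + 5
        return frame_type, buf[start:start + length], start + length
    raise ValueError("unknown UCF frame type")
```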

6.1.2. Fixed Frame Sizes

The following fixed-size frame types are defined by this document:

Table 7

   Type   Name           Total Size         Notes
   ----   ------------   ----------------   ------------------------------------
   0x50   STATUS         12 octets (base)   16 octets when C=1; larger when E=1
                                            with extension data
   0x54   SCOPE_DIGEST   68 octets          Includes 32-octet Merkle root and
                                            64-bit counters
   0x55   BARRIER        8 octets           No variable extension

6.2. Status Frames (Layer 0)

6.2.1. Status Frame Format (0x50)

The Status Frame reports lifecycle transitions for entities.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Type (0x50)  |Stat(4)|E|C|D(3) |        Flags (15 bits)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Entity ID (32 bits)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Scope ID (16 bits)      |      Reserved (16 bits)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Stat (4 bits):

Status code (see Section 6.2.2).

E (1 bit):

Extended frame flag. Additional extension data follows (Section 6.5).

C (1 bit):

Cursor update flag. A 4-octet cursor value follows (Section 6.2.3).

D (3 bits):

Explicit scope nesting depth (0-7). 0=Root. Layer 1.

Flags (15 bits):

Reserved for future use. MUST be zero when sent and MUST be ignored by receivers.

Entity ID (32 bits):

Unsigned integer identifying the entity.

Scope ID (16 bits):

Identifier for the scope to which this entity belongs.

Reserved (16 bits):

Reserved for future use. MUST be zero when sent and MUST be ignored by receivers.

6.2.2. Status Codes

Table 8

   Value   Name          Layer   Description
   -----   -----------   -----   --------------------------------------
   0x0     UNSPECIFIED   -       Protobuf default / heartbeat signal
   0x1     PENDING       0       Entity announced, not yet transmitting
   0x2     PROCESSING    0       Entity transmission in progress
   0x3     COMPLETE      0       Entity successfully processed
   0x4     FAILED        0       Entity processing failed
   0x5     CHECKPOINT    0       Synchronization barrier
   0x6     DEHYDRATING   0       Dehydrating into children
   0x7     REHYDRATING   0       Rehydrating children
   0x8     YIELDED       2       Paused with continuation token
   0x9     DEFERRED      2       Detached with claim check
   0xA     RETRYING      2       Retry in progress
   0xB     SKIPPED       2       Intentionally skipped
   0xC     ABANDONED     2       Timed out

The base STATUS frame is 12 octets. When C=1, a 4-octet cursor value follows (total 16 octets before any E=1 extension data). When E=1, additional extension bytes follow as defined in Section 6.5.
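The bit packing of the first 32-bit word and the conditional cursor extension can be exercised with a non-normative encoder/decoder sketch (Flags and Reserved are held at zero, and E=1 extension data is not modeled):

```python
import struct

def pack_status(stat, entity_id, scope_id, e=0, c=0, depth=0, cursor=None):
    """Encode a STATUS frame (0x50): after the Type octet, 24 bits carry
    Stat(4)|E(1)|C(1)|D(3)|Flags(15); then a 32-bit Entity ID, 16-bit
    Scope ID, and 16-bit Reserved. When C=1 a 4-octet cursor follows."""
    bits = (stat << 20) | (e << 19) | (c << 18) | (depth << 15)  # Flags = 0
    frame = struct.pack("!B", 0x50) + bits.to_bytes(3, "big")
    frame += struct.pack("!IHH", entity_id, scope_id, 0)
    if c:
        frame += struct.pack("!I", cursor)
    return frame

def unpack_status(frame):
    """Decode the base fields, returning (stat, e, c, depth, entity_id,
    scope_id, cursor); cursor is None when C=0."""
    bits = int.from_bytes(frame[1:4], "big")
    stat = bits >> 20
    e, c = (bits >> 19) & 1, (bits >> 18) & 1
    depth = (bits >> 15) & 0x7
    entity_id, scope_id, _reserved = struct.unpack_from("!IHH", frame, 4)
    cursor = struct.unpack_from("!I", frame, 12)[0] if c else None
    return stat, e, c, depth, entity_id, scope_id, cursor
```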

6.2.3. Cursor Update Extension

When C=1, a 4-octet cursor update follows the status frame:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  New Cursor Value (32 bits)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
New Cursor Value (32 bits):

The numeric value of the new cursor. Entities with IDs lower than this value (modulo circular ID rules) are considered resolved and their IDs MAY be recycled.
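One way to realize the "lower than, modulo circular ID rules" comparison is serial-number arithmetic in the spirit of RFC 1982: an ID counts as behind the cursor when the forward distance from it to the cursor is less than half the 32-bit ID space. This sketch is an assumption about the circular rule, not a normative definition.

```python
def id_resolved(entity_id: int, cursor: int, bits: int = 32) -> bool:
    """Return True when entity_id is behind the cursor and thus
    resolved (its ID may be recycled). Uses serial-number-style
    comparison so the test stays correct across 2^32 wraparound."""
    space = 1 << bits
    half = 1 << (bits - 1)
    distance = (cursor - entity_id) % space  # forward distance to cursor
    return 0 < distance < half
```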

6.3. Scope Digest Frame (0x54)

When Protocol Layer 1 is negotiated, a scope completion is summarized:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Type (0x54)  |   Flags (8)   |          Scope ID (16)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                   Entities Processed (64 bits)                |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                   Entities Succeeded (64 bits)                |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Entities Failed (64 bits)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Entities Deferred (64 bits)                |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Merkle Root (256 bits)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Flags (8 bits):

Reserved for future use. MUST be zero when sent and MUST be ignored by receivers.

Scope ID (16 bits):

Identifier of the scope being summarized.

Entities Processed (64 bits):

The total number of entities that were processed within the scope.

Entities Succeeded (64 bits):

The number of entities that reached a terminal success state.

Entities Failed (64 bits):

The number of entities that reached a terminal failure state.

Entities Deferred (64 bits):

The number of entities that were deferred via claim checks.

Merkle Root (256 bits):

The SHA-256 Merkle root covering all entity statuses in the scope (see Section 9.4).

The SCOPE_DIGEST frame is 68 octets total. The Scope ID MUST match the 16-bit identifier defined in Section 6.2.1.
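The Merkle root field can be computed with a conventional binary-tree construction over per-entity status leaves, as sketched below. The leaf encoding and odd-level pairing rule shown here (hash each leaf, duplicate the last node of an odd level) are illustrative assumptions; Section 9.4 defines the normative construction.

```python
import hashlib

def merkle_root(leaves) -> bytes:
    """Compute a SHA-256 Merkle root over a list of per-entity status
    leaf byte strings. Sketch only: pairing and leaf-encoding rules
    are assumptions pending the normative text in Section 9.4."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    if not level:
        return bytes(32)  # empty scope: all-zero root (an assumption)
    while len(level) > 1:
        if len(level) % 2:              # odd count: duplicate last node
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```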

6.4. Barrier Frame (0x55)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Type (0x55)  |S| Reserved (7)|         Barrier ID (16)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Parent Entity ID (32 bits)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
S (1 bit):

Status (0 = waiting, 1 = released).

Reserved (7 bits):

Reserved for future use. MUST be zero when sent and MUST be ignored by receivers.

Barrier ID (16 bits):

Identifier for the barrier within the scope.

Parent Entity ID (32 bits):

The identifier of the parent entity whose sub-tree is blocked by this barrier.

6.5. Yield and Claim Check Extensions (Layer 2)

When E=1 in a status frame, extension data follows. The length of extension data is determined by the Status code.

If E=1 is set for a Status code that does not define an extension layout in this specification (or a negotiated extension), the receiver MUST treat the frame as malformed and fail processing with PIPESTREAM_ENTITY_INVALID (0x05).

6.5.1. Yield Extension (Stat = 0x8)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Yield Reason  |           Token Length (24 bits)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  Yield Token (variable)                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Yield Reason (8 bits):

The reason for yielding (see Section 6.5.1.1).

Token Length (24 bits):

The length of the Yield Token in bytes (maximum 16,777,215).

Yield Token (variable):

The opaque continuation state.

6.5.1.1. Yield Reason Codes
Table 9

   Value           Name                Description
   -------------   -----------------   ----------------------------
   0x1             EXTERNAL_CALL       Waiting on external service
   0x2             RATE_LIMITED        Voluntary throttle
   0x3             AWAITING_SIBLING    Waiting for specific sibling
   0x4             AWAITING_APPROVAL   Human/workflow gate
   0x5             RESOURCE_BUSY       Semaphore/lock
   0x0, 0x6-0xFF   Reserved            Reserved for future use

6.5.2. Claim Check Extension (Stat = 0x9)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Claim Check ID (64 bits)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                Expiry Timestamp (64 bits, Unix micros)        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Claim Check ID (64 bits):

A cryptographically secure random identifier for the claim.

Expiry Timestamp (64 bits):

Unix epoch timestamp in microseconds when the claim expires.

6.6. Protobuf-Encoded Messages (0x80-0xFF)

Messages in this range are preceded by a 4-octet length field.

Table 10
Type Message Name Reference
0x80 Capabilities Section 3.4
0x81 Checkpoint Section 9.3
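A non-normative reading sketch, assuming the 1-octet frame type precedes the 4-octet length field (per the Variable frame class in Section 11.2):

```python
import struct

def read_variable_frame(buf: bytes, offset: int = 0):
    # Sketch of the Section 6.6 framing: a frame type octet in
    # 0x80-0xFF, a 4-octet big-endian length, then the
    # Protobuf-encoded message body.
    frame_type = buf[offset]
    (length,) = struct.unpack_from(">I", buf, offset + 1)
    start = offset + 5
    body = buf[start:start + length]
    if len(body) != length:
        raise ValueError("truncated frame body")
    return frame_type, body, start + length  # next-frame offset
```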

6.7. Entity Frames

Entity frames carry the actual document entity data on Entity Streams.

6.7.1. Entity Frame Structure

   +---------------------------+
   |    Header Length (4)      |   4 octets, big-endian uint32
   +---------------------------+
   |                           |
   |    Header (Protobuf)      |   Variable length
   |                           |
   +---------------------------+
   |                           |
   |    Payload                |   Variable length (per header)
   |                           |
   +---------------------------+
Header Length (4 octets):

The length of the Protobuf-encoded EntityHeader in bytes.

Header (Protobuf):

The serialized EntityHeader message (see Section 6.7.2).

Payload (variable):

The raw entity data.
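A non-normative sketch of splitting this frame layout; real code would then decode header_bytes as the Protobuf EntityHeader (Section 6.7.2) and cross-check its payload_length against the payload:

```python
import struct

def split_entity_frame(frame: bytes):
    # Layout from Section 6.7.1: 4-octet big-endian header length,
    # Protobuf-encoded EntityHeader, then the raw payload.
    if len(frame) < 4:
        raise ValueError("truncated frame")
    (header_len,) = struct.unpack(">I", frame[:4])
    if len(frame) < 4 + header_len:
        raise ValueError("truncated header")
    header_bytes = frame[4:4 + header_len]
    payload_bytes = frame[4 + header_len:]
    return header_bytes, payload_bytes
```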

6.7.2. Entity Header (Protobuf)

message EntityHeader {
  uint32 entity_id = 1;         // Scope-local identifier
  uint32 parent_id = 2;         // 0 for root entities
  uint32 scope_id = 3;          // Layer 1: scope identifier
  uint32 layer = 4;             // Data layer (0-3)
  string content_type = 5;      // MIME type
  uint64 payload_length = 6;
  bytes checksum = 7;           // SHA-256 (32 bytes)
  map<string, string> metadata = 8;
  ChunkInfo chunk_info = 9;
  CompletionPolicy completion_policy = 10; // Layer 2: failure handling
}

6.7.3. Checksum Algorithm

PipeStream uses SHA-256 [FIPS-180-4] for payload integrity verification. The checksum MUST be exactly 32 octets.

7. Entity Model

7.1. Core Fields

Every PipeStream entity is represented as a PipeDoc message:

Table 11
Field Type Requirement Description
doc_id string REQUIRED Unique document identifier (UUID recommended)
entity_id uint32 REQUIRED Scope-local identifier
ownership OwnershipContext OPTIONAL Multi-tenancy tracking

7.2. Four Data Layers

Each PipeDoc carries entity payload in one of four data layers:

Table 12
Layer Name Content
0 BlobBag Raw binary data: original document bytes, images, attachments
1 SemanticLayer Annotated content: text segments with vector embeddings, NLP annotations, NER, classifications
2 ParsedData Structured extraction: key-value pairs, tables, structured fields
3 CustomEntity Extension point: domain-specific protobuf via google.protobuf.Any

7.3. Cloud-Agnostic Storage Reference

message FileStorageReference {
  string provider = 1;           // Storage provider identifier
  string bucket = 2;             // Bucket/container name
  string key = 3;                // Object key/path
  string region = 4;             // Optional region hint
  map<string, string> attrs = 5; // Provider-specific attributes
  EncryptionMetadata encryption = 6;
}

message EncryptionMetadata {
  string algorithm = 1;          // "AES-256-GCM", "AES-256-CBC"
  string key_provider = 2;       // "aws-kms", "azure-keyvault",
                                 // "gcp-kms", "vault"
  string key_id = 3;             // Key ARN/URI/ID
  bytes wrapped_key = 4;         // Optional: client-side encrypted DEK
  bytes iv = 5;                  // Initialization vector
  map<string, string> context = 6; // Encryption context
}

8. Protocol Operations

This section defines the protocol-level operations that PipeStream endpoints perform during a session. These operations describe the phases of a PipeStream session lifecycle, from connection establishment through entity processing to terminal consumption.

8.1. Overview

A PipeStream session proceeds through four sequential actions:

                +---------------------------------------------+
                |           PipeStream Action Flow            |
                +---------------------------------------------+
                                     |
                                     v
                +---------------------------------------------+
                |                  CONNECT                    |
                |    (Session + Capability Negotiation)       |
                +---------------------------------------------+
                                     |
                                     v
                +---------------------------------------------+
                |                   PARSE                     |
                 |         (Dehydration: 1:N possible)         |
                +---------------------------------------------+
                                     |
                       +-------------+-------------+
                       v             v             v
                +-----------+ +-----------+ +-----------+
                |  PROCESS  | |  PROCESS  | |  PROCESS  |
                |   (1:1)   | |   (1:1)   | |   (N:1)   |
                +-----------+ +-----------+ +-----------+
                       |             |             |
                       +-------------+-------------+
                                     |
                                     v
                +---------------------------------------------+
                |                   SINK                      |
                |          (Terminal Consumption)             |
                +---------------------------------------------+
Table 13
Phase Action Cardinality Description
1 CONNECT 1:1 Session establishment and capability negotiation
2 PARSE 1:N Dehydration: decompose input into entities
3 PROCESS 1:1 or N:1 Transform, rehydrate, aggregate, or pass through entities (parallel)
4 SINK N:1 Terminal consumption: index, store, or notify

8.2. CONNECT Action

The CONNECT action establishes the session with capability negotiation.

8.2.1. ALPN Identifier

ALPN Protocol ID: pipestream/1

8.2.2. Capability Exchange

Immediately after QUIC handshake, peers exchange Capabilities messages on Stream 0.

8.3. PARSE Action

The PARSE action performs dehydration with optional completion policy:

message CompletionPolicy {
  CompletionMode mode = 1;
  uint32 max_retries = 2;        // Default: 3
  uint32 retry_delay_ms = 3;     // Default: 1000
  uint32 timeout_ms = 4;         // Default: 300000 (5 min)
  float min_success_ratio = 5;   // For QUORUM mode
  FailureAction on_timeout = 6;
  FailureAction on_failure = 7;
}

enum CompletionMode {
  COMPLETION_MODE_UNSPECIFIED = 0;  // Default; treat as STRICT
  COMPLETION_MODE_STRICT = 1;       // All children MUST complete
  COMPLETION_MODE_LENIENT = 2;      // Continue with partial results
  COMPLETION_MODE_BEST_EFFORT = 3;  // Complete with whatever succeeds
  COMPLETION_MODE_QUORUM = 4;       // Need min_success_ratio
}

enum FailureAction {
  FAILURE_ACTION_UNSPECIFIED = 0;  // Default; treat as FAIL
  FAILURE_ACTION_FAIL = 1;         // Propagate failure up
  FAILURE_ACTION_SKIP = 2;         // Skip, continue with siblings
  FAILURE_ACTION_RETRY = 3;        // Retry up to max_retries
  FAILURE_ACTION_DEFER = 4;        // Create claim check, continue
}
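As a non-normative sketch, a receiver might evaluate these completion modes over a set of terminal child statuses as follows (the status strings, the function name, and the simplified treatment of LENIENT and BEST_EFFORT as equivalent at this level are illustrative assumptions):

```python
def is_scope_complete(children_status, mode, min_success_ratio=0.0):
    # children_status: terminal statuses such as "complete", "failed",
    # "skipped" (illustrative strings, not wire values).
    total = len(children_status)
    succeeded = children_status.count("complete")
    if mode == "STRICT":
        return succeeded == total      # all children MUST complete
    if mode in ("LENIENT", "BEST_EFFORT"):
        return True                    # proceed with partial results
    if mode == "QUORUM":
        return total > 0 and succeeded / total >= min_success_ratio
    return succeeded == total          # UNSPECIFIED treated as STRICT
```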

8.4. PROCESS Action

Table 14
Mode Description
TRANSFORM 1:1 entity transformation
REHYDRATE N:1 merge of siblings from dehydration
AGGREGATE N:1 with reduction function
PASSTHROUGH Metadata-only modification

8.5. SINK Action

Table 15
Type Description
INDEX Search engine integration (Elasticsearch, Solr, etc.)
STORAGE Blob storage persistence (Object stores, Cloud storage)
NOTIFICATION Webhook/messaging triggers

9. Rehydration Semantics

9.1. Entity ID Lifecycle and Cursor

Entity IDs are managed using a cursor-based circular recycling scheme within the 32-bit ID space. The ID space is divided into three logical regions relative to the current cursor and last_assigned pointers:

Table 16
Region ID Range Description
Recyclable IDs behind cursor Resolved entities; IDs may be reused
In-flight cursor to last_assigned Active entities (PENDING, PROCESSING, etc.)
Free Beyond last_assigned Available for new entity assignment

The window size is computed as (last_assigned - cursor) mod 0xFFFFFFFD. If window_size >= max_window, the sender MUST apply backpressure and stop assigning new IDs until the cursor advances.

Rules:

  1. new_id = (last_assigned + 1) % 0xFFFFFFFD

  2. If new_id == 0, new_id = 1 (skip reserved NULL_ENTITY)

  3. If (new_id - cursor) % 0xFFFFFFFD >= max_window -> STOP, apply backpressure

  4. On COMPLETE/FAILED: mark resolved; if entity_id == cursor, advance cursor

  5. IDs behind cursor are implicitly recyclable
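The assignment and recycling rules above can be sketched as follows (a non-normative Python illustration; the class and method names are invented for this example):

```python
MAX_ID = 0xFFFFFFFD  # size of the circular ID space (Section 9.1)

class EntityIdAllocator:
    def __init__(self, max_window):
        self.cursor = 1          # oldest unresolved ID
        self.last_assigned = 0   # most recently assigned ID
        self.max_window = max_window
        self.resolved = set()    # IDs resolved out of order

    def allocate(self):
        # Rules 1-2: next ID in the circle, skipping NULL_ENTITY (0).
        new_id = (self.last_assigned + 1) % MAX_ID
        if new_id == 0:
            new_id = 1
        # Rule 3: backpressure when the in-flight window is full.
        if (new_id - self.cursor) % MAX_ID >= self.max_window:
            return None          # caller must stop and wait
        self.last_assigned = new_id
        return new_id

    def resolve(self, entity_id):
        # Rules 4-5: mark resolved; advance the cursor past any
        # contiguous run of resolved IDs, making them recyclable.
        self.resolved.add(entity_id)
        while self.cursor in self.resolved:
            self.resolved.discard(self.cursor)
            self.cursor = (self.cursor + 1) % MAX_ID or 1
```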

9.2. Assembly Manifest

Each Assembly Manifest entry tracks:

message AssemblyManifestEntry {
  uint32 parent_id = 1;
  uint32 scope_id = 2;           // Layer 1
  repeated uint32 children_ids = 3;
  repeated EntityStatus children_status = 4;
  CompletionPolicy policy = 5;   // Layer 2
  uint64 created_at = 6;
  ResolutionState state = 7;
}

enum ResolutionState {
  RESOLUTION_STATE_UNSPECIFIED = 0;
  RESOLUTION_STATE_ACTIVE = 1;
  RESOLUTION_STATE_RESOLVED = 2;
  RESOLUTION_STATE_PARTIAL = 3;      // Some children failed/skipped
  RESOLUTION_STATE_FAILED = 4;
}

9.3. Checkpoint Blocking

A checkpoint is satisfied when:

  1. All entities in the checkpoint scope with IDs less than checkpoint_entity_id (considering circular wrap) have reached terminal state.

  2. All Assembly Manifest entries within the checkpoint scope have been resolved.

  3. All nested checkpoints within the checkpoint scope have been satisfied.

The CheckpointFrame (Section 6.6 / Appendix A) carries both identifiers:

message CheckpointFrame {
  string checkpoint_id = 1;
  uint64 sequence_number = 2;
  uint32 checkpoint_entity_id = 3;
  uint32 scope_id = 4;
  uint32 flags = 5;
  uint32 timeout_ms = 6;
}
  • checkpoint_id: an opaque identifier for logging and correlation.

  • checkpoint_entity_id: the numeric ordering key used for barrier evaluation.

Implementations MUST use checkpoint_entity_id (not checkpoint_id) when evaluating Condition 1.

For circular comparison in Condition 1, implementations MUST use the same modulo ordering as cursor management. Define MAX = 0xFFFFFFFD and:

is_before(a, b) = ((b - a + MAX) % MAX) < (MAX / 2)

An entity ID a is considered "less than checkpoint_entity_id b" iff is_before(a, b) is true.
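A direct transcription of this comparison (non-normative sketch):

```python
MAX = 0xFFFFFFFD  # circular ID space, matching Section 9.1

def is_before(a: int, b: int) -> bool:
    # True when a precedes b under modular ordering: the forward
    # distance from a to b is less than half the ID space.
    return ((b - a + MAX) % MAX) < (MAX // 2)
```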

9.4. Scope Digest Propagation (Layer 1)

When a scope completes, the endpoint MUST compute a Scope Digest and propagate it to the parent scope via a SCOPE_DIGEST frame (Section 6.3).

The Merkle root in the Scope Digest is computed as follows:

  1. For each entity in the scope, ordered by Entity ID (ascending), construct a 5-octet leaf value by concatenating:

    • The 4-octet big-endian Entity ID.

    • A 1-octet status field where the lower 4 bits contain the Stat code (Section 6.2.2) and the upper 4 bits are zero.

  2. Compute SHA-256 over each 5-octet leaf to produce leaf hashes.

  3. Build a binary Merkle tree by repeatedly hashing pairs of sibling nodes: SHA-256(left || right). If the number of nodes at any level is odd, the last node is promoted to the next level without hashing.

  4. The root of this tree is the merkle_root value in the SCOPE_DIGEST frame.

This construction is deterministic: any two implementations processing the same set of entity statuses MUST produce the same Merkle root.
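The four steps can be sketched as follows (non-normative; the function name and the `(entity_id, stat_code)` input shape are illustrative):

```python
import hashlib
import struct

def scope_merkle_root(entities):
    # Step 1: 5-octet leaves ordered by Entity ID, ascending:
    # 4-octet big-endian ID, then 1 octet with the Stat code in
    # the lower 4 bits.
    leaves = []
    for entity_id, stat in sorted(entities):
        leaf = struct.pack(">I", entity_id) + bytes([stat & 0x0F])
        # Step 2: SHA-256 over each leaf.
        leaves.append(hashlib.sha256(leaf).digest())
    if not leaves:
        return b""
    # Step 3: hash sibling pairs; an odd node is promoted unhashed.
    level = leaves
    while len(level) > 1:
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]  # Step 4: the merkle_root value
```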

9.5. Rehydration Readiness Tracking

Implementations MUST track Assembly Manifest resolution order using a mechanism that provides O(1) insertion and amortized O(log n) minimum extraction. The tracking mechanism MUST support efficient decrease-key operations to handle out-of-order status updates.

Implementations MAY choose any data structure that satisfies these complexity requirements. See the companion document REFERENCE_IMPLEMENTATION.md for a recommended approach using a Fibonacci heap.

9.6. Stopping Point Validation (Layer 2)

When yielding or deferring, include validation:

message StoppingPointValidation {
  bytes state_checksum = 1;      // Hash of processing state
  uint64 bytes_processed = 2;    // Progress marker
  uint32 children_complete = 3;
  uint32 children_total = 4;
  bool is_resumable = 5;
  string checkpoint_ref = 6;
}

10. Security Considerations

10.1. Transport Security

PipeStream inherits security from QUIC [RFC9000] and TLS 1.3 [RFC8446]. All connections MUST use TLS 1.3 or later. Implementations MUST NOT provide mechanisms to disable encryption.

10.2. Entity Payload Integrity

Each Entity MUST include a SHA-256 checksum in its EntityHeader.

To support true streaming of large entities, implementations MAY begin processing an entity payload before the complete payload has been received and verified. However, the final rehydration or terminal SINK operation MUST NOT be committed until the complete payload checksum has been verified.

If a checksum verification fails, the implementation MUST:

  1. Reject the entity with PIPESTREAM_INTEGRITY_ERROR (0x04).

  2. Discard any partial results or temporary state associated with the entity.

  3. Propagate the failure according to the Completion Policy (Section 8.3).

Implementations that require immediate consistency SHOULD buffer the entire entity and verify the checksum before initiating processing.
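A non-normative sketch of this deferred-commit pattern, in which per-chunk work is staged while the running SHA-256 is updated and all staged results are discarded on mismatch (the `staged` list stands in for real per-chunk processing):

```python
import hashlib

def verify_streaming(chunks, expected_checksum):
    h = hashlib.sha256()
    staged = []                    # tentative results, discardable
    for chunk in chunks:
        h.update(chunk)            # incremental checksum
        staged.append(len(chunk))  # stand-in for real per-chunk work
    if h.digest() != expected_checksum:
        staged.clear()             # discard partial results
        raise ValueError("PIPESTREAM_INTEGRITY_ERROR (0x04)")
    return staged                  # now safe to commit/rehydrate
```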

10.3. Resource Exhaustion

Table 17
Limit Default Description
Max scope depth 7 Prevents recursive bombs (8 levels: 0-7)
Max entities per scope 4,294,967,294 Memory bounds
Max window size 2,147,483,648 Backpressure threshold
Checkpoint timeout 30s Prevents stuck state
Claim check expiry 86400s Garbage collection

Implementations MUST enforce all resource limits listed above. Exceeding any limit MUST result in the corresponding error code (see Section 11.4). Implementations SHOULD allow operators to configure stricter limits than the defaults shown here.

10.4. Amplification Attacks

A single dehydration operation can produce an arbitrary number of child entities from a small input, creating a potential amplification vector. To mitigate this:

  1. Implementations MUST enforce the max_entities_per_scope limit negotiated during capability exchange (Section 3.4). Any dehydration that would exceed this limit MUST be rejected.

  2. Implementations MUST enforce the max_scope_depth limit. A dehydration chain deeper than this limit MUST be rejected with PIPESTREAM_DEPTH_EXCEEDED (0x07).

  3. Implementations SHOULD enforce a configurable ratio between input entity size and total child entity count. A recommended default is no more than 1,000 children per megabyte of parent payload.

  4. The backpressure mechanism (Section 9.1) provides a natural throttle: when the in-flight window fills, no new Entity IDs can be assigned until existing entities complete and the cursor advances. Implementations MUST NOT bypass backpressure for dehydration-generated entities.
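The ratio check recommended in item 3 can be sketched as follows (non-normative; parameter names are illustrative):

```python
def check_dehydration_ratio(parent_payload_bytes, child_count,
                            max_children_per_mb=1000):
    # Reject dehydrations exceeding the configured
    # children-per-megabyte ratio (default from Section 10.4).
    megabytes = max(parent_payload_bytes / 1_000_000, 1e-9)
    return child_count / megabytes <= max_children_per_mb
```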

10.5. Privacy Considerations

PipeStream entity headers and control stream frames carry metadata that may reveal information about the documents being processed, even when payloads are encrypted at the application layer:

  1. Document structure leakage: The number of child entities produced by dehydration, the scope depth, and the Entity ID assignment pattern may reveal the structure of the document being processed (e.g., a document that dehydrates into 50 children is likely a multi-page document). Implementations that require structural privacy SHOULD pad dehydration counts or use fixed decomposition granularity.

  2. Metadata in headers: The content_type, metadata map, and payload_length fields in EntityHeader (Section 6.7) are transmitted in cleartext within the QUIC-encrypted stream. Implementations that require metadata confidentiality beyond transport encryption SHOULD encrypt EntityHeader fields at the application layer and use an opaque content_type such as application/octet-stream.

  3. Traffic analysis: The timing and size of status frames on the Control Stream may correlate with document processing patterns. Implementations operating in privacy-sensitive environments SHOULD send status frames at fixed intervals with padding to obscure processing timing.

  4. Identifiers: The doc_id field in PipeDoc (Section 7.1) and filenames in BlobBag entries are application-layer data but may be logged by intermediate processing nodes. Implementations SHOULD provide mechanisms to redact or pseudonymize identifiers at pipeline boundaries.

10.6. Replay and Token Reuse

10.6.1. Yield Token Replay

Yield tokens (Section 6.5.1) contain opaque continuation state that enables resumption of paused entity processing. A replayed yield token could cause an entity to be processed multiple times or to resume from a stale state. To prevent this:

  1. Implementations MUST associate each yield token with a stable application context identifier (for example, a session identifier) and Entity ID. In Layer 0-only operation, this context MAY be implicit in the active transport connection. For Layer 2 resumptions that can occur across reconnects or different nodes, the context identifier MUST remain stable across transport connections. A yield token MUST be rejected if presented in a different context than the one that issued it, unless the token was explicitly transferred via a claim check.

  2. Implementations MUST invalidate a yield token after it has been consumed for resumption. A second resumption attempt with the same token MUST be rejected.

  3. The StoppingPointValidation (Section 9.6) provides integrity checking at resume time. Implementations MUST verify the state_checksum field before accepting a resumed entity. If the checksum does not match the current state, the resumption MUST be rejected and the entity MUST be reprocessed from the beginning.

10.6.2. Claim Check Replay

Claim checks (Section 6.5.2) are long-lived references that can be redeemed in different sessions. To prevent misuse:

  1. Each claim check carries an expiry_timestamp (Unix epoch microseconds). Implementations MUST reject expired claim checks.

  2. Implementations MUST track redeemed claim check IDs and reject duplicate redemptions. The tracking state MUST persist for at least the claim check expiry duration.

  3. Claim check IDs MUST be generated using a cryptographically secure random number generator to prevent guessing.
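A non-normative sketch combining rules 1 and 2 (the class and method names are invented; persisting the redeemed set for the expiry duration is out of scope here):

```python
import time

class ClaimCheckRegistry:
    def __init__(self):
        self.redeemed = {}  # claim_id -> expiry (Unix micros)

    def redeem(self, claim_id, expiry_micros, now_micros=None):
        now = (now_micros if now_micros is not None
               else int(time.time() * 1e6))
        # Rule 1: reject expired claim checks.
        if now >= expiry_micros:
            raise ValueError("PIPESTREAM_CLAIM_EXPIRED (0x0A)")
        # Rule 2: reject duplicate redemptions.
        if claim_id in self.redeemed:
            raise ValueError("duplicate claim check redemption")
        self.redeemed[claim_id] = expiry_micros
        return True
```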

10.7. Encryption Key Management

When using FileStorageReference with encryption:

  1. Key IDs MUST reference keys in approved providers.

  2. Wrapped keys MUST use approved envelope encryption.

  3. Key rotation MUST be supported via key_id versioning.

  4. Implementations MUST NOT log key material.

  5. Implementations MUST NOT include unwrapped data encryption keys in EntityHeader metadata or Control Stream frames.

11. IANA Considerations

This document requests the creation of several new registries and one ALPN identifier registration. All registries defined in this section use the "Expert Review" policy [RFC8126] for new assignments.

11.1. ALPN Identifier Registration

Table 18
Protocol Identification Sequence Reference
PipeStream Version 1 "pipestream/1" this document

11.2. PipeStream Frame Type Registry

IANA is requested to create the "PipeStream Frame Types" registry. Values are categorized into Fixed frames (size determined by the frame type, no length prefix) in 0x50-0x7F and Variable frames (4-octet length prefix) in 0x80-0xFF. Values 0xC0-0xFF are reserved for private use.

Table 19
Value Frame Type Name Class Size Layer Reference
0x50 STATUS Fixed 12 octets (base) 0 Section 6.2
0x54 SCOPE_DIGEST Fixed 68 octets 1 Section 6.3
0x55 BARRIER Fixed 8 octets 1 Section 6.4
0x56-0x7F Reserved Fixed - - this document
0x80 CAPABILITIES Var Length-prefixed 0 Section 3.4
0x81 CHECKPOINT Var Length-prefixed 0 Section 9.3
0x82-0xBF Reserved Var - - this document

11.3. PipeStream Status Code Registry

IANA is requested to create the "PipeStream Status Codes" registry. Status codes are 4-bit values (0x0-0xF). Values 0xD-0xF are reserved for future Standards Action.

Table 20
Value Name Layer Description
0x0 UNSPECIFIED - Protobuf default / heartbeat
0x1 PENDING 0 Entity announced
0x2 PROCESSING 0 In progress
0x3 COMPLETE 0 Success
0x4 FAILED 0 Failed
0x5 CHECKPOINT 0 Barrier
0x6 DEHYDRATING 0 Dehydrating into children
0x7 REHYDRATING 0 Rehydrating children
0x8 YIELDED 2 Paused
0x9 DEFERRED 2 Claim check issued
0xA RETRYING 2 Retry in progress
0xB SKIPPED 2 Intentionally skipped
0xC ABANDONED 2 Timed out

11.4. PipeStream Error Code Registry

IANA is requested to create the "PipeStream Error Codes" registry. Values in the range 0x00-0x3F are assigned by Expert Review. Values in the range 0x40-0xFF are reserved for private use.

Table 21
Value Name Description
0x00 PIPESTREAM_NO_ERROR Graceful shutdown
0x01 PIPESTREAM_INTERNAL_ERROR Implementation error
0x02 PIPESTREAM_IDLE_TIMEOUT Idle timeout
0x03 PIPESTREAM_CONTROL_RESET Control stream must reset
0x04 PIPESTREAM_INTEGRITY_ERROR Checksum failed
0x05 PIPESTREAM_ENTITY_INVALID Invalid format
0x06 PIPESTREAM_ENTITY_TOO_LARGE Size exceeded
0x07 PIPESTREAM_DEPTH_EXCEEDED Scope depth exceeded
0x08 PIPESTREAM_WINDOW_EXCEEDED Window full
0x09 PIPESTREAM_SCOPE_INVALID Invalid scope
0x0A PIPESTREAM_CLAIM_EXPIRED Claim check expired
0x0B PIPESTREAM_CLAIM_NOT_FOUND Claim check not found
0x0C PIPESTREAM_LAYER_UNSUPPORTED Protocol layer not supported

11.5. URI Scheme Registration

pipestream-URI = "pipestream://" authority "/" session-id
                 ["/" scope-path] ["/" entity-id]

scope-path = scope-id *("." scope-id)

The session-id segment identifies application context for detached or resumable resources (for example, Layer 2 yield/claim-check flows). PipeStream Layer 0 streaming semantics do not depend on this URI scheme.

Examples:

  • pipestream://processor.example.com/a1b2c3d4

  • pipestream://processor.example.com:8443/a1b2c3d4/1.42/e5f6
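A non-normative sketch of splitting this URI form into its components (the function name is illustrative; missing optional segments come back as None):

```python
from urllib.parse import urlsplit

def parse_pipestream_uri(uri: str):
    parts = urlsplit(uri)
    if parts.scheme != "pipestream":
        raise ValueError("not a pipestream URI")
    # Path segments: session-id, then optional scope-path and entity-id.
    segments = [s for s in parts.path.split("/") if s]
    session_id = segments[0] if segments else None
    scope_path = segments[1] if len(segments) > 1 else None
    entity_id = segments[2] if len(segments) > 2 else None
    return parts.netloc, session_id, scope_path, entity_id
```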

12. References

12.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8126]
Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, <https://www.rfc-editor.org/rfc/rfc8126>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
[RFC8446]
Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, <https://www.rfc-editor.org/rfc/rfc8446>.
[RFC9000]
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Multiplexed and Secure Transport", RFC 9000, DOI 10.17487/RFC9000, <https://www.rfc-editor.org/rfc/rfc9000>.

12.2. Informative References

[FIPS-180-4]
National Institute of Standards and Technology, "Secure Hash Standard (SHS)", FIPS PUB 180-4.
[scatter-gather]
Lea, D., "The Scatter-Gather Design Pattern", DOI 10.1007/978-1-4612-1260-6, <https://doi.org/10.1007/978-1-4612-1260-6>.

Appendix A. Protobuf Schema Reference

A.1. Protocol-Level Messages

// Copyright 2026 PipeStream AI
//
// PipeStream Protocol - IETF draft protocol for recursive entity
// streaming over QUIC. Defines the wire-format messages for Layers 0-2
// of the PipeStream architecture: core streaming, recursive scoping,
// and resilience.
//
// Edition 2023 is used for closed enums (critical for wire-protocol
// safety) and implicit field presence (distinguishing "not set" from
// zero values). In this edition, all fields have explicit presence
// by default, making the 'optional' keyword unnecessary.

edition = "2023";

package pipestream.protocol.v1;

import "google/protobuf/any.proto";

// All enums in this file are CLOSED. Unknown enum values received on
// the wire MUST be rejected. This is essential because status codes
// are encoded as 4-bit values in the status frame wire format;
// accepting unknown values could cause undefined behavior in state
// machines and cursor advancement.
option features.enum_type = CLOSED;

// Capabilities describes the feature set supported by a PipeStream
// endpoint. Exchanged during the CONNECT handshake so that both
// sides can negotiate which protocol layers and resource limits
// apply to the session.
message Capabilities {
  // Whether the endpoint supports Layer 0 (core entity streaming).
  bool layer0_core = 1;

  // Whether the endpoint supports Layer 1 (recursive scoping and
  // dehydration).
  bool layer1_recursive = 2;

  // Whether the endpoint supports Layer 2 (resilience, yield, and
  // claim-check). Requires Layer 1 support; if layer1_recursive is
  // false, this MUST be false.
  bool layer2_resilience = 3;

  // Maximum nesting depth allowed for recursive scopes.
  // Default is 7 (8 levels: 0-7).
  uint32 max_scope_depth = 4;

  // Maximum number of entities permitted within a single scope.
  uint32 max_entities_per_scope = 5;

  // Maximum flow-control window size, in number of entities.
  uint32 max_window_size = 6;
}

// EntityHeader is sent at the beginning of each entity stream to
// describe the payload that follows. It carries identity, lineage,
// content metadata, chunking information, and the completion policy
// that governs how partial failures of this entity's children are
// handled.
message EntityHeader {
  // Scope-local entity identifier.
  uint32 entity_id = 1;

  // Identifier of the parent entity.
  uint32 parent_id = 2;

  // Identifier of the scope.
  uint32 scope_id = 3;

  // Data layer (0-3).
  uint32 layer = 4;

  // MIME content type.
  string content_type = 5;

  // Length in bytes of the complete entity payload, before any
  // chunking.
  uint64 payload_length = 6;

  // SHA-256 integrity checksum.
  bytes checksum = 7;

  // Arbitrary metadata.
  map<string, string> metadata = 8;

  // Chunking information.
  ChunkInfo chunk_info = 9;

  // Resilience completion policy.
  CompletionPolicy completion_policy = 10;
}

// ChunkInfo describes how a payload is divided into chunks.
message ChunkInfo {
  // Total number of chunks.
  uint32 total_chunks = 1;

  // Zero-based chunk index.
  uint32 chunk_index = 2;

  // Byte offset within the complete payload.
  uint64 chunk_offset = 3;
}

// CompletionPolicy controls Layer 2 resilience behavior.
message CompletionPolicy {
  // Mode for evaluating completion.
  CompletionMode mode = 1;

  // Maximum retry attempts.
  uint32 max_retries = 2;

  // Delay between retries in milliseconds.
  uint32 retry_delay_ms = 3;

  // Maximum wait time in milliseconds.
  uint32 timeout_ms = 4;

  // Minimum success ratio for QUORUM mode.
  float min_success_ratio = 5;

  // Action on timeout.
  FailureAction on_timeout = 6;

  // Action on failure.
  FailureAction on_failure = 7;
}

// CompletionMode specifies completion evaluation strategies.
enum CompletionMode {
  COMPLETION_MODE_UNSPECIFIED = 0;
  COMPLETION_MODE_STRICT = 1;
  COMPLETION_MODE_LENIENT = 2;
  COMPLETION_MODE_BEST_EFFORT = 3;
  COMPLETION_MODE_QUORUM = 4;
}

// FailureAction specifies error handling behaviors.
enum FailureAction {
  FAILURE_ACTION_UNSPECIFIED = 0;
  FAILURE_ACTION_FAIL = 1;
  FAILURE_ACTION_SKIP = 2;
  FAILURE_ACTION_RETRY = 3;
  FAILURE_ACTION_DEFER = 4;
}

// EntityStatus represents the lifecycle state of an entity.
enum EntityStatus {
  ENTITY_STATUS_UNSPECIFIED = 0;
  ENTITY_STATUS_PENDING = 1;
  ENTITY_STATUS_PROCESSING = 2;
  ENTITY_STATUS_COMPLETE = 3;
  ENTITY_STATUS_FAILED = 4;
  ENTITY_STATUS_CHECKPOINT = 5;
  ENTITY_STATUS_DEHYDRATING = 6;
  ENTITY_STATUS_REHYDRATING = 7;
  ENTITY_STATUS_YIELDED = 8;
  ENTITY_STATUS_DEFERRED = 9;
  ENTITY_STATUS_RETRYING = 10;
  ENTITY_STATUS_SKIPPED = 11;
  ENTITY_STATUS_ABANDONED = 12;
}

// StatusFrame is the Protobuf representation of a status transition.
message StatusFrame {
  // Identifier of the entity.
  uint32 entity_id = 1;

  // Identifier of the scope.
  uint32 scope_id = 2;

  // Current status.
  EntityStatus status = 3;

  // Optional extension data.
  google.protobuf.Any extended_data = 4;
}

// CheckpointFrame (Protobuf, Type 0x81)
message CheckpointFrame {
  // Unique checkpoint identifier.
  string checkpoint_id = 1;

  // Monotonic sequence number.
  uint64 sequence_number = 2;

  // Numeric ordering key for barrier evaluation.
  uint32 checkpoint_entity_id = 3;

  // Scope to which this checkpoint applies.
  uint32 scope_id = 4;

  // Checkpoint flags.
  uint32 flags = 5;

  // Maximum wait time in milliseconds.
  uint32 timeout_ms = 6;
}

// AssemblyManifestEntry tracks parent-child relationships.
message AssemblyManifestEntry {
  // Identifier of the parent entity.
  uint32 parent_id = 1;

  // Scope identifier.
  uint32 scope_id = 2;

  // Ordered child identifiers.
  repeated uint32 children_ids = 3;

  // Current status of each child.
  repeated EntityStatus children_status = 4;

  // Governing completion policy.
  CompletionPolicy policy = 5;

  // Creation timestamp (Unix epoch microseconds).
  uint64 created_at = 6;

  // Current resolution state.
  ResolutionState state = 7;
}

// ResolutionState tracks Assembly Manifest completion.
enum ResolutionState {
  RESOLUTION_STATE_UNSPECIFIED = 0;
  RESOLUTION_STATE_ACTIVE = 1;
  RESOLUTION_STATE_RESOLVED = 2;
  RESOLUTION_STATE_PARTIAL = 3;
  RESOLUTION_STATE_FAILED = 4;
}

// YieldToken captures continuation state for paused entities.
message YieldToken {
  // Reason for yielding.
  YieldReason reason = 1;

  // Opaque continuation state.
  bytes continuation_state = 2;

  // Validation data for resumption.
  StoppingPointValidation validation = 3;
}

// YieldReason describes why processing was yielded.
enum YieldReason {
  YIELD_REASON_UNSPECIFIED = 0;
  YIELD_REASON_EXTERNAL_CALL = 1;
  YIELD_REASON_RATE_LIMITED = 2;
  YIELD_REASON_AWAITING_SIBLING = 3;
  YIELD_REASON_AWAITING_APPROVAL = 4;
  YIELD_REASON_RESOURCE_BUSY = 5;
}

// ClaimCheck is a Layer 2 deferred-processing reference.
message ClaimCheck {
  // Unique claim identifier.
  uint64 claim_id = 1;

  // Identifier of the deferred entity.
  uint32 entity_id = 2;

  // Scope identifier.
  uint32 scope_id = 3;

  // Expiry timestamp (Unix epoch microseconds).
  uint64 expiry_timestamp = 4;

  // Validation data.
  StoppingPointValidation validation = 5;
}

// StoppingPointValidation captures a snapshot of progress.
message StoppingPointValidation {
  // Hash of internal state.
  bytes state_checksum = 1;

  // Payload bytes consumed.
  uint64 bytes_processed = 2;

  // Completed child count.
  uint32 children_complete = 3;

  // Total expected child count.
  uint32 children_total = 4;

  // Resumption capability flag.
  bool is_resumable = 5;

  // Last passed checkpoint reference.
  string checkpoint_ref = 6;
}

// ScopeDigest is a Layer 1 summary of a completed scope.
message ScopeDigest {
  // Identifier of the scope.
  uint32 scope_id = 1;

  // Total processed count.
  uint64 entities_processed = 2;

  // Total succeeded count.
  uint64 entities_succeeded = 3;

  // Total failed count.
  uint64 entities_failed = 4;

  // Total deferred count.
  uint64 entities_deferred = 5;

  // Merkle root hash.
  bytes merkle_root = 6;
}

// PipeDoc represents the top-level document envelope for an entity.
message PipeDoc {
  // Unique document identifier.
  string doc_id = 1;

  // Identifier of the entity.
  uint32 entity_id = 2;

  // Ownership and access context.
  OwnershipContext ownership = 3;
}

// OwnershipContext defines multi-tenancy and access control for
// entities.
message OwnershipContext {
  // Entity owner identifier.
  string owner_id = 1;

  // Group identifier.
  string group_id = 2;

  // List of access scopes.
  repeated string scopes = 3;
}

// FileStorageReference provides a location for external data.
message FileStorageReference {
  // Storage provider identifier.
  string provider = 1;

  // Bucket or container name.
  string bucket = 2;

  // Object key or path.
  string key = 3;

  // Optional region hint.
  string region = 4;

  // Provider-specific attributes.
  map<string, string> attrs = 5;

  // Encryption metadata.
  EncryptionMetadata encryption = 6;
}

// EncryptionMetadata defines encryption parameters.
message EncryptionMetadata {
  // Encryption algorithm.
  string algorithm = 1;

  // Key provider identifier.
  string key_provider = 2;

  // Encryption key identifier.
  string key_id = 3;

  // Optional wrapped DEK.
  bytes wrapped_key = 4;

  // Initialization vector.
  bytes iv = 5;

  // Additional encryption context.
  map<string, string> context = 6;
}

Appendix B. Protocol Layer Capability Matrix

Table 22
Feature Layer 0 Layer 1 Layer 2
Unified status frame (96-bit base) X X X
Entity streaming X X X
PENDING/PROCESSING/COMPLETE/FAILED X X X
Checkpoint blocking X X X
Assembly Manifest X X X
Cursor-based ID recycling X X X
Scoped status fields (Scope ID, depth)   X X
Hierarchical scopes   X X
Scope digest (Merkle)   X X
Barrier (subtree sync)   X X
YIELDED status     X
DEFERRED status     X
Claim checks     X
Completion policies     X
SKIPPED/ABANDONED statuses     X

Author's Address

Kristian Rickert
PipeStream AI