| Internet-Draft | PipeStream | March 2026 |
| Rickert | Expires 2 September 2026 | [Page] |
This document specifies PipeStream, a recursive entity streaming protocol designed for distributed document processing over QUIC transport. PipeStream enables the decomposition ("dehydration") of documents into constituent entities, their transmission across distributed processing nodes, and subsequent rehydration at destination endpoints.¶
The protocol employs a dual-stream architecture consisting of a data stream for entity payload transmission and a control stream for tracking entity completion status and maintaining consistency. PipeStream defines four hierarchical data layers for entity representation: BlobBag for raw binary data, SemanticLayer for annotated content with metadata, ParsedData for structured extracted information, and CustomEntity for application-specific extensions.¶
PipeStream is organized into three protocol layers: Layer 0 (Core) provides basic streaming with dehydrate/rehydrate semantics; Layer 1 (Recursive) adds hierarchical scoping and digest propagation; Layer 2 (Resilience) adds yield/resume, claim checks, and completion policies. Implementations MUST support Layer 0 and MAY support Layers 1 and 2.¶
To ensure consistency across distributed processing pipelines, PipeStream implements checkpoint blocking, whereby processing nodes MUST synchronize at defined points before proceeding. This mechanism guarantees that all constituent parts of a dehydrated document are successfully processed before rehydration operations commence.¶
This note is to be removed before publishing as an RFC.¶
Status information for this document may be found at https://datatracker.ietf.org/doc/draft-krickert-pipestream/.¶
Discussion of this document takes place on the Individual Group mailing list (mailto:kristian.rickert@pipestream.ai).¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 September 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Distributed document processing pipelines face significant challenges when handling large, complex documents that require multiple stages of transformation, analysis, and enrichment. Traditional batch processing approaches require entire documents to be loaded into memory, processed sequentially, and transmitted in their entirety between processing stages. This methodology introduces substantial latency, excessive memory consumption, and poor utilization of distributed computing resources.¶
Modern document processing workflows increasingly demand the ability to:¶
Process documents incrementally as data becomes available¶
Distribute processing load across heterogeneous worker nodes¶
Maintain consistency guarantees across parallel processing paths¶
Handle documents of arbitrary size without memory constraints¶
Support recursive decomposition where document parts may themselves be decomposed¶
Scale from single documents to collections of millions of documents¶
Current approaches based on batch processing and store-and-forward architectures are inefficient for large documents and fail to exploit the inherent parallelism available in distributed processing environments. Furthermore, existing streaming protocols do not provide the consistency semantics required for document processing where the integrity of the rehydrated output depends on the successful processing of all constituent parts.¶
PipeStream addresses these challenges by defining a streaming protocol that enables incremental processing with strong consistency guarantees. The protocol is built upon QUIC [RFC9000] transport, leveraging its native support for multiplexed streams, low-latency connection establishment, and reliable delivery semantics.¶
The fundamental innovation of PipeStream is its treatment of documents as recursive compositions of entities. A document MAY be decomposed into multiple entities, each of which MAY itself be further decomposed, creating a tree structure of processing tasks. This recursive decomposition enables fine-grained parallelism while the protocol's control stream mechanism ensures that all branches of the decomposition tree are tracked and synchronized.¶
PipeStream employs a dual-stream design:¶
Data Stream: Carries entity payloads through the processing pipeline. Entities flow through this stream with minimal buffering, enabling low-latency incremental processing.¶
Control Stream: Carries control information tracking the status of entity decomposition and rehydration. The control stream ensures that all parts of a dehydrated document are accounted for before rehydration proceeds.¶
PipeStream implements a recursive scatter-gather pattern [scatter-gather] over QUIC streams. A document is "dehydrated" (scattered) at the source into constituent entities, these entities are transmitted and processed in parallel across distributed pipeline stages, and finally the entities are "rehydrated" (gathered) at the destination to reconstitute the complete processed document. The checkpoint blocking mechanism (Section 9.3) provides barrier synchronization semantics analogous to the barrier pattern in parallel computing.¶
This approach provides several advantages:¶
Incremental Processing: Processing nodes MAY begin work on early entities before the complete document has been transmitted.¶
Parallelism: Independent entities MAY be processed concurrently across multiple worker nodes.¶
Memory Efficiency: No single node is required to hold the complete document in memory.¶
Fault Isolation: Failures in processing individual entities can be detected, reported, and potentially retried without affecting other entities.¶
Consistency: The checkpoint blocking mechanism ensures that rehydration operations proceed only when all constituent parts have been successfully processed.¶
PipeStream is organized into three protocol layers to accommodate varying deployment requirements:¶
| Protocol Layer | Name | Description |
|---|---|---|
| Layer 0 | Core | Basic streaming, dehydrate/rehydrate, checkpoint |
| Layer 1 | Recursive | Hierarchical scopes, digest propagation, barriers |
| Layer 2 | Resilience | Yield/resume, claim checks, completion policies |
Implementations MUST support Layer 0. Support for Layers 1 and 2 is OPTIONAL and negotiated during connection establishment.¶
This document specifies the PipeStream protocol including message formats, state machines, error handling, and the interaction between data and control streams. The document defines the four standard data layers but does not mandate specific processing semantics, which are left to application-layer specifications.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The fundamental unit of data flowing through a PipeStream pipeline. An Entity represents either a complete document or a constituent part of a decomposed document. Each Entity possesses a unique identifier within its processing scope and carries payload data in one of the four defined Layer formats. Entities are immutable once created; transformations produce new Entities rather than modifying existing ones.¶
A logical unit of content submitted to a PipeStream pipeline for processing. A Document enters the pipeline as a single root Entity and MAY be decomposed into multiple Entities during processing. The Document is considered complete when its root Entity (or the rehydrated result of its decomposition) exits the pipeline.¶
A hierarchical namespace for Entity IDs. Each scope maintains its own Entity ID space, cursor, and Assembly Manifest. Scopes enable collections to contain documents, documents to contain parts, and parts to contain jobs, each with independent ID management. (Protocol Layer 1)¶
The distributed processing pattern implemented by PipeStream. A single input is "scattered" (dehydrated) into multiple parts for parallel processing, and the results are "gathered" (rehydrated) back into a single output. PipeStream extends classical scatter-gather with recursive nesting: any scattered part may itself be scattered further.¶
The operation of decomposing a document or Entity into multiple constituent Entities for parallel or distributed processing. When an Entity is dehydrated, the originating node MUST create an Assembly Manifest entry recording the identifiers of all resulting sub-entities. The dehydration operation is recursive; a sub-entity produced by dehydration MAY itself be dehydrated, creating a tree of decomposition. Dehydration transitions data from a solid state (a single stored record) to a fluid state (multiple in-flight entities).¶
The operation of reassembling multiple Entities back into a single composite Entity or Document. A rehydrate operation MUST NOT proceed until all constituent Entities listed in the corresponding Assembly Manifest entry have been received and processed (or handled according to the Completion Policy). Rehydration transitions data from a fluid state back to a solid state.¶
A document or Entity that exists as a complete, stored record -- either at rest in storage or as a single root Entity entering or exiting a pipeline. Contrast with "fluid state".¶
A document that has been decomposed into multiple in-flight Entities being processed in parallel across distributed nodes. A document is in the fluid state between dehydration and rehydration. Contrast with "solid state".¶
A synchronization point in the processing pipeline where all in-flight Entities MUST reach a consistent state before processing may continue. A checkpoint is considered "satisfied" when all Assembly Manifest entries created before the checkpoint have been resolved.¶
A synchronization point scoped to a specific subtree. Unlike checkpoints which are global, barriers block only entities dependent on a specific parent's descendants. (Protocol Layer 1)¶
The control stream that tracks Entity completion status throughout the processing pipeline. The Control Stream is transmitted on a dedicated QUIC stream parallel to the data streams.¶
A data structure within the Control Stream that tracks the relationship between a composite Entity and its constituent sub-entities produced by dehydration.¶
A pointer to the lowest unresolved Entity ID within a scope. Entity IDs behind the cursor are considered resolved and MAY be recycled. The cursor enables efficient ID space management without global coordination.¶
A temporary pause in Entity processing, typically due to external dependencies (API calls, rate limiting, human approval). A yielded Entity carries a continuation token enabling resumption without reprocessing.¶
A detached reference to a deferred Entity that can be queried or resumed independently, potentially in a different session. Claim checks enable asynchronous processing patterns and retry queues.¶
A configuration specifying how to handle partial failures during dehydration. Policies include STRICT (all must succeed), LENIENT (continue with partial results), BEST_EFFORT (complete with whatever succeeds), and QUORUM (require minimum success ratio).¶
One of four defined representations for Entity payload data: BlobBag, SemanticLayer, ParsedData, or CustomEntity.¶
A configured sequence of processing stages through which Entities flow.¶
A node in the mesh that performs operations on entities (e.g., transformation, dehydration, or rehydration).¶
A terminal stage in a pipeline where rehydrated documents are persisted or delivered to an external system.¶
A single processing step within a Pipeline.¶
A cryptographic summary (Merkle root) of all Entity statuses within a completed scope, propagated to parent scopes for efficient verification. (Protocol Layer 1)¶
PipeStream defines three protocol layers that build upon each other. This layered approach allows simple deployments to use only the core protocol while complex deployments can leverage advanced features.¶
Layer 0 provides the fundamental streaming capabilities:¶
Unified Control Frame (UCF) header (1-octet type)¶
Status frame (8-octet bit-packed frame)¶
Entity frame (header + payload)¶
Status codes: PENDING, PROCESSING, COMPLETE, FAILED, CHECKPOINT¶
Assembly Manifest for parent-child tracking¶
Cursor-based Entity ID recycling¶
Single-level dehydrate/rehydrate¶
Checkpoint blocking¶
All implementations MUST support Layer 0.¶
Layer 1 adds hierarchical processing capabilities:¶
Scoped Entity ID namespaces (collection -> document -> part -> job)¶
Explicit Depth tracking in status frames¶
SCOPE_DIGEST for Merkle-based subtree completion¶
BARRIER for subtree-scoped synchronization¶
Nested dehydration with depth tracking¶
Layer 1 is OPTIONAL. Implementations advertise Layer 1 support during capability negotiation.¶
Layer 2 adds fault tolerance and async processing:¶
YIELDED status with continuation tokens¶
DEFERRED status with claim checks¶
RETRYING, SKIPPED, ABANDONED statuses¶
Completion policies (STRICT, LENIENT, BEST_EFFORT, QUORUM)¶
Claim check query/response frames¶
Stopping point validation¶
Layer 2 is OPTIONAL and requires Layer 1. Implementations advertise Layer 2 support during capability negotiation.¶
During CONNECT, endpoints exchange supported capabilities:¶
message Capabilities {
bool layer0_core = 1; // Always true
bool layer1_recursive = 2; // Scoped IDs, digests
bool layer2_resilience = 3; // Yield, claim checks
uint32 max_scope_depth = 4; // Default: 7 (8 levels, 0-7)
uint32 max_entities_per_scope = 5; // Default: 4,294,967,294
// (2^32-2)
uint32 max_window_size = 6; // Default: 2,147,483,648 (2^31)
}
¶
Peers negotiate down to common capabilities. If Layer 2 is requested but Layer 1 is not supported, Layer 2 MUST be disabled.¶
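The downward negotiation rule can be sketched as follows. This is a non-normative illustration: plain dictionaries stand in for the Capabilities message, the `negotiate` helper name is not defined by this specification, and taking the minimum of each numeric limit is one reasonable reading of "negotiate down".

```python
# Non-normative sketch of capability negotiation (Section 3.4).
# Field names mirror the Capabilities message; dict form is illustrative.
def negotiate(local: dict, remote: dict) -> dict:
    agreed = {
        "layer0_core": True,  # Layer 0 support is mandatory
        "layer1_recursive": local["layer1_recursive"] and remote["layer1_recursive"],
        "layer2_resilience": local["layer2_resilience"] and remote["layer2_resilience"],
        # Numeric limits negotiate down to the smaller of the two offers.
        "max_scope_depth": min(local["max_scope_depth"], remote["max_scope_depth"]),
        "max_entities_per_scope": min(local["max_entities_per_scope"],
                                      remote["max_entities_per_scope"]),
        "max_window_size": min(local["max_window_size"], remote["max_window_size"]),
    }
    # Layer 2 requires Layer 1: disable it if Layer 1 was negotiated away.
    if not agreed["layer1_recursive"]:
        agreed["layer2_resilience"] = False
    return agreed
```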
This section provides a high-level overview of the PipeStream protocol architecture, design principles, and operational model.¶
PipeStream MUST enable true streaming document processing where entities are transmitted and processed incrementally as they become available. Implementations MUST NOT buffer complete documents before initiating transmission.¶
The protocol MUST support recursive decomposition of entities, wherein a single input entity MAY produce zero, one, or many output entities.¶
PipeStream MUST provide checkpoint blocking semantics to maintain processing consistency across distributed workers.¶
The protocol MUST maintain strict separation between the control plane (control stream) and the data plane (entities).¶
PipeStream MUST be implemented over QUIC [RFC9000] to leverage its native stream multiplexing, low-latency connection establishment, and reliable delivery semantics.¶
The protocol MUST support four distinct data representation layers:¶
| Layer | Name | Description |
|---|---|---|
| 0 | BlobBag | Raw binary data with metadata |
| 1 | SemanticLayer | Annotated content with embeddings |
| 2 | ParsedData | Structured extracted information |
| 3 | CustomEntity | Application-specific extension |
PipeStream uses a dual-stream architecture within a single QUIC connection between a Client (Producer) and Server (Consumer):¶
| Stream | Type | Plane | Content |
|---|---|---|---|
| Stream 0 | Bidirectional | Control | STATUS, SCOPE_DIGEST, BARRIER, CAPABILITIES, CHECKPOINT |
| Streams 2+ | Unidirectional | Data | Entity frames (Header + Payload) |
A PipeStream connection follows this lifecycle:¶
Establishment: Client initiates QUIC connection with ALPN identifier "pipestream/1"¶
Capability Exchange: Client and server exchange supported protocol layers and limits¶
Control Stream Initialization: Client opens Stream 0 as bidirectional Control Stream¶
Entity Streaming: Entities are transmitted per Sections 5 and 6¶
Termination: Connection closes via QUIC CONNECTION_CLOSE or application-level shutdown¶
PipeStream leverages the native multiplexing capabilities of QUIC [RFC9000] to provide a clean separation between control coordination and data transmission.¶
The Control Stream provides the control plane for PipeStream operations.¶
The Control Stream MUST use QUIC Stream ID 0, which per RFC 9000 is a bidirectional, client-initiated stream.¶
The Control Stream MUST be opened immediately upon connection establishment.¶
Capability negotiation (Section 3.4) MUST occur on Stream 0 before any Entity Streams are opened.¶
Stream 0 MUST NOT carry entity payload data.¶
Implementations SHOULD assign the Control Stream a high priority to ensure timely delivery of status updates.¶
The Control Stream carries small, bit-packed control frames; a base STATUS frame is 12 octets. Implementations MUST ensure that the Control Stream has adequate flow control credit so that status updates are not blocked behind entity data.¶
QUIC already provides native transport liveness signals (for example, PING and idle timeout handling). Implementations SHOULD rely on those transport mechanisms for connection liveness.¶
PipeStream heartbeat frames are OPTIONAL and are intended for application-level responsiveness checks (for example, detecting stalled processing logic even when the transport remains healthy). When used, an endpoint sends a STATUS frame with all fields set to their heartbeat values:¶
| Field | Value | Description |
|---|---|---|
| Type | 0x50 (STATUS) | |
| Stat | 0x0 (UNSPECIFIED) | Heartbeat signal |
| Entity ID | 0xFFFFFFFF | CONNECTION_LEVEL |
| Scope ID | 0x0000 | Root scope |
| Reserved | 0x0000 | MUST be zero |
When no status updates have been transmitted for KEEPALIVE_TIMEOUT (default: 30 seconds), an endpoint MAY send a heartbeat frame. If no data is received on Stream 0 for 3 * KEEPALIVE_TIMEOUT, the endpoint SHOULD first apply transport-native liveness policy; it MAY close the connection with PIPESTREAM_IDLE_TIMEOUT (0x02) when application-level inactivity policy requires it.¶
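The heartbeat frame described by the table above can be constructed as a 12-octet STATUS frame with all bit fields zeroed. The following non-normative sketch (the `heartbeat_frame` helper name is illustrative) shows the exact octets:

```python
import struct

def heartbeat_frame() -> bytes:
    """Build the 12-octet heartbeat STATUS frame from the table above."""
    frame = bytes([0x50])            # Type: STATUS
    frame += (0).to_bytes(3, "big")  # Stat=UNSPECIFIED, E=C=0, D=0, Flags=0
    # Entity ID = CONNECTION_LEVEL, Scope ID = root, Reserved MUST be zero.
    frame += struct.pack(">IHH", 0xFFFFFFFF, 0x0000, 0x0000)
    return frame
```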
The session-id segment identifies application context for detached or resumable resources (for example, Layer 2 yield/claim-check flows). PipeStream Layer 0 streaming semantics do not depend on this URI scheme.¶
Entity Streams carry the actual document entity data.¶
Entity Streams MUST be unidirectional streams:¶
| Stream Type | Stream ID Formula | Example Stream IDs |
|---|---|---|
| Client-Initiated | 4n + 2 (n >= 0) | 2, 6, 10, 14, ... |
| Server-Initiated | 4n + 3 (n >= 0) | 3, 7, 11, 15, ... |
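The stream ID arithmetic above follows the RFC 9000 convention that the two low-order bits of a stream ID encode initiator and directionality. A non-normative classification sketch (the `stream_role` helper name is illustrative):

```python
def stream_role(stream_id: int) -> str:
    """Classify a stream ID per the table above (RFC 9000 low-order bits)."""
    if stream_id == 0:
        return "control"        # Stream 0: the PipeStream Control Stream
    kind = stream_id % 4        # RFC 9000 stream-type bits
    if kind == 2:
        return "client-data"    # client-initiated unidirectional: 2, 6, 10, ...
    if kind == 3:
        return "server-data"    # server-initiated unidirectional: 3, 7, 11, ...
    return "unused"             # other bidirectional streams are not Entity Streams
```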
PipeStream error signaling on Stream 0 and QUIC transport signals are complementary. Endpoints SHOULD bridge them so peers receive both transport-level and protocol-level context.¶
If an Entity Stream is aborted with RESET_STREAM or STOP_SENDING, the endpoint SHOULD emit a corresponding terminal status (FAILED, ABANDONED, or policy-driven equivalent) for that entity on Stream 0.¶
If PipeStream determines a terminal entity error first (for example, checksum failure or invalid frame), the endpoint SHOULD abort the affected Entity Stream with an appropriate QUIC error and emit the corresponding PipeStream status/error context on Stream 0.¶
If Stream 0 is reset or becomes unusable, endpoints SHOULD treat this as a control-plane failure and close the connection with PIPESTREAM_CONTROL_RESET (0x03).¶
On QUIC connection termination (CONNECTION_CLOSE), entities without a previously observed terminal status MUST be treated as failed by local policy.¶
This section defines the wire formats for PipeStream frames. All multi-octet integer fields are encoded in network byte order (big-endian).¶
To support mixed content (bit-packed frames and Protobuf messages) on the Control Stream, PipeStream uses a Unified Control Frame (UCF) header.¶
Every message on Stream 0 MUST begin with a 1-octet Frame Type.¶
| Value | Frame Class | Length Encoding | Description |
|---|---|---|---|
| 0x50-0x7F | Fixed | No length prefix | Bit-packed control frames with type-defined sizes |
| 0x80-0xFF | Variable | 4-octet Length + N | Variable-size Protobuf-encoded control messages |
For Fixed frames, the receiver determines frame size from the Frame Type value. For Variable frames, the Type is followed by a 4-octet unsigned integer (big-endian) indicating the length of the Protobuf message that follows.¶
The payload length in octets, excluding the 1-octet Type and the 4-octet Length field. Receivers MUST reject lengths greater than 16,777,215 octets (16 MiB - 1) with PIPESTREAM_ENTITY_TOO_LARGE (0x06).¶
The following fixed-size frame types are defined by this document:¶
| Type | Name | Total Size | Notes |
|---|---|---|---|
| 0x50 | STATUS | 12 octets (base) | 16 octets when C=1; larger when E=1 with extension data |
| 0x54 | SCOPE_DIGEST | 68 octets | Includes 32-octet Merkle root and 64-bit counters |
| 0x55 | BARRIER | 8 octets | No variable extension |
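The UCF dispatch rule can be sketched as follows. This non-normative example uses only the base fixed sizes from the table above; C=1/E=1 STATUS extension data is not handled, and the `read_frame` helper name is illustrative:

```python
import struct

FIXED_SIZES = {0x50: 12, 0x54: 68, 0x55: 8}  # base sizes from the table above
MAX_VAR_LEN = 16_777_215                     # 16 MiB - 1

def read_frame(buf: bytes):
    """Read one UCF frame from buf; return (type, body, octets consumed).

    Base fixed sizes only: STATUS cursor/extension data is not handled here.
    """
    ftype = buf[0]
    if 0x50 <= ftype <= 0x7F:                # Fixed class: size implied by type
        size = FIXED_SIZES[ftype]            # unknown fixed types would be an error
        return ftype, buf[1:size], size
    if 0x80 <= ftype <= 0xFF:                # Variable class: 4-octet length prefix
        (length,) = struct.unpack(">I", buf[1:5])
        if length > MAX_VAR_LEN:
            raise ValueError("PIPESTREAM_ENTITY_TOO_LARGE (0x06)")
        return ftype, buf[5:5 + length], 5 + length
    raise ValueError("unknown frame class")
```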
The Status Frame reports lifecycle transitions for entities.¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type (0x50) |Stat(4)|E|C|D(3) | Flags (15 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Entity ID (32 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Scope ID (16 bits) | Reserved (16 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
Status code (see Section 6.2.2).¶
Extended frame flag. Additional extension data follows (Section 6.5).¶
Cursor update flag. A 4-octet cursor value follows (Section 6.2.3).¶
Explicit scope nesting depth (0-7). 0=Root. Layer 1.¶
Reserved for future use. MUST be zero when sent and MUST be ignored by receivers.¶
Unsigned integer identifying the entity.¶
Identifier for the scope to which this entity belongs.¶
Reserved for future use. MUST be zero when sent and MUST be ignored by receivers.¶
| Value | Name | Layer | Description |
|---|---|---|---|
| 0x0 | UNSPECIFIED | - | Protobuf default / heartbeat signal |
| 0x1 | PENDING | 0 | Entity announced, not yet transmitting |
| 0x2 | PROCESSING | 0 | Entity transmission in progress |
| 0x3 | COMPLETE | 0 | Entity successfully processed |
| 0x4 | FAILED | 0 | Entity processing failed |
| 0x5 | CHECKPOINT | 0 | Synchronization barrier |
| 0x6 | DEHYDRATING | 0 | Dehydrating into children |
| 0x7 | REHYDRATING | 0 | Rehydrating children |
| 0x8 | YIELDED | 2 | Paused with continuation token |
| 0x9 | DEFERRED | 2 | Detached with claim check |
| 0xA | RETRYING | 2 | Retry in progress |
| 0xB | SKIPPED | 2 | Intentionally skipped |
| 0xC | ABANDONED | 2 | Timed out |
The base STATUS frame is 12 octets. When C=1, a 4-octet cursor value follows (total 16 octets before any E=1 extension data). When E=1, additional extension bytes follow as defined in Section 6.5.¶
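The bit packing of the STATUS frame can be illustrated with the following non-normative sketch (the helper names are illustrative; E=1 extension bodies are omitted):

```python
import struct

def pack_status(stat, entity_id, scope_id, *, e=0, c=0, depth=0, flags=0, cursor=None):
    """Pack a base STATUS frame; appends the 4-octet cursor when c=1."""
    # First 32-bit word: Type(8) | Stat(4) | E(1) | C(1) | D(3) | Flags(15)
    bits = (stat << 20) | (e << 19) | (c << 18) | (depth << 15) | flags
    frame = bytes([0x50]) + bits.to_bytes(3, "big")
    frame += struct.pack(">IHH", entity_id, scope_id, 0)  # Reserved MUST be zero
    if c:
        frame += struct.pack(">I", cursor)
    return frame

def unpack_status(frame):
    bits = int.from_bytes(frame[1:4], "big")
    stat = bits >> 20
    e, c = (bits >> 19) & 1, (bits >> 18) & 1
    depth = (bits >> 15) & 0x7
    flags = bits & 0x7FFF
    entity_id, scope_id, _reserved = struct.unpack(">IHH", frame[4:12])
    cursor = struct.unpack(">I", frame[12:16])[0] if c else None
    return stat, e, c, depth, flags, entity_id, scope_id, cursor
```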
When C=1, a 4-octet cursor update follows the status frame:¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| New Cursor Value (32 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
The numeric value of the new cursor. Entities with IDs lower than this value (modulo circular ID rules) are considered resolved and their IDs MAY be recycled.¶
When Protocol Layer 1 is negotiated, a scope completion is summarized:¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type (0x54) | Flags (8) | Scope ID (16) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Entities Processed (64 bits) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Entities Succeeded (64 bits) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Entities Failed (64 bits) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Entities Deferred (64 bits) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Merkle Root (256 bits) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
Reserved for future use. MUST be zero when sent and MUST be ignored by receivers.¶
Identifier of the scope being summarized.¶
The total number of entities that were processed within the scope.¶
The number of entities that reached a terminal success state.¶
The number of entities that reached a terminal failure state.¶
The number of entities that were deferred via claim checks.¶
The SHA-256 Merkle root covering all entity statuses in the scope (see Section 9.4).¶
The SCOPE_DIGEST frame is 68 octets total. The Scope ID MUST match the 16-bit identifier defined in Section 6.2.1.¶
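The shape of the Merkle computation can be sketched as follows. This is a non-normative illustration only: the exact leaf encoding, hashing order, and odd-level padding rules are defined in Section 9.4, and duplicating the last node on odd levels is an assumption of this sketch.

```python
import hashlib

def merkle_root(leaves):
    """Illustrative SHA-256 Merkle root over encoded entity-status leaves."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    if not level:
        return hashlib.sha256(b"").digest()   # empty-scope convention (assumed)
    while len(level) > 1:
        if len(level) % 2:                    # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(a + b).digest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]
```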
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type (0x55) |S| Reserved (7) | Barrier ID (16) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Parent Entity ID (32 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
Status (0 = waiting, 1 = released).¶
Reserved for future use. MUST be zero when sent and MUST be ignored by receivers.¶
Identifier for the barrier within the scope.¶
The identifier of the parent entity whose sub-tree is blocked by this barrier.¶
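The 8-octet BARRIER frame packs into a single `struct` call, as in this non-normative sketch (the `pack_barrier` helper name is illustrative):

```python
import struct

def pack_barrier(released: bool, barrier_id: int, parent_entity_id: int) -> bytes:
    """Pack the 8-octet BARRIER frame (Type 0x55)."""
    s_byte = 0x80 if released else 0x00  # S is the high bit; low 7 bits reserved
    return struct.pack(">BBHI", 0x55, s_byte, barrier_id, parent_entity_id)
```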
When E=1 in a status frame, extension data follows. The length of extension data is determined by the Status code.¶
If E=1 is set for a Status code that does not define an extension layout in this specification (or a negotiated extension), the receiver MUST treat the frame as malformed and fail processing with PIPESTREAM_ENTITY_INVALID (0x05).¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Yield Reason | Token Length (24 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Yield Token (variable) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
The reason for yielding (see Section 6.5.1.1).¶
The length of the Yield Token in bytes (maximum 16,777,215).¶
The opaque continuation state.¶
| Value | Name | Description |
|---|---|---|
| 0x1 | EXTERNAL_CALL | Waiting on external service |
| 0x2 | RATE_LIMITED | Voluntary throttle |
| 0x3 | AWAITING_SIBLING | Waiting for specific sibling |
| 0x4 | AWAITING_APPROVAL | Human/workflow gate |
| 0x5 | RESOURCE_BUSY | Semaphore/lock |
| 0x0, 0x06-0xFF | Reserved | Reserved for future use |
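The yield extension layout above (1-octet reason, 24-bit token length, opaque token) can be packed as in this non-normative sketch (the helper name is illustrative):

```python
def pack_yield_extension(reason: int, token: bytes) -> bytes:
    """Pack the YIELDED extension: reason octet + 24-bit length + token."""
    if len(token) > 16_777_215:
        raise ValueError("token exceeds 24-bit Token Length field")
    return bytes([reason]) + len(token).to_bytes(3, "big") + token
```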
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Claim Check ID (64 bits) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Expiry Timestamp (64 bits, Unix micros) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
Messages in this range are preceded by a 4-octet length field.¶
| Type | Message Name | Reference |
|---|---|---|
| 0x80 | Capabilities | Section 3.4 |
| 0x81 | Checkpoint | Section 9.3 |
Entity frames carry the actual document entity data on Entity Streams.¶
+---------------------------+
| Header Length (4)         |  4 octets, big-endian uint32
+---------------------------+
|                           |
| Header (Protobuf)         |  Variable length
|                           |
+---------------------------+
|                           |
| Payload                   |  Variable length (per header)
|                           |
+---------------------------+
¶
message EntityHeader {
uint32 entity_id = 1; // Scope-local identifier
uint32 parent_id = 2; // 0 for root entities
uint32 scope_id = 3; // Layer 1: scope identifier
uint32 layer = 4; // Data layer (0-3)
string content_type = 5; // MIME type
uint64 payload_length = 6;
bytes checksum = 7; // SHA-256 (32 bytes)
map<string, string> metadata = 8;
ChunkInfo chunk_info = 9;
CompletionPolicy completion_policy = 10; // Layer 2: failure handling
}
¶
PipeStream uses SHA-256 [FIPS-180-4] for payload integrity verification. The checksum MUST be exactly 32 octets.¶
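Receivers can verify the `checksum` field of the EntityHeader against the received payload as in this non-normative sketch (the `verify_payload` helper name is illustrative):

```python
import hashlib

def verify_payload(payload: bytes, checksum: bytes) -> bool:
    """Check an EntityHeader checksum against the received payload."""
    if len(checksum) != 32:        # the checksum MUST be exactly 32 octets
        return False
    return hashlib.sha256(payload).digest() == checksum
```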
Every PipeStream entity is represented as a PipeDoc message:¶
| Field | Type | Requirement | Description |
|---|---|---|---|
| doc_id | string | REQUIRED | Unique document identifier (UUID recommended) |
| entity_id | uint32 | REQUIRED | Scope-local identifier |
| ownership | OwnershipContext | OPTIONAL | Multi-tenancy tracking |
Each PipeDoc carries entity payload in one of four data layers:¶
| Layer | Name | Content |
|---|---|---|
| 0 | BlobBag | Raw binary data: original document bytes, images, attachments |
| 1 | SemanticLayer | Annotated content: text segments with vector embeddings, NLP annotations, NER, classifications |
| 2 | ParsedData | Structured extraction: key-value pairs, tables, structured fields |
| 3 | CustomEntity | Extension point: domain-specific protobuf via google.protobuf.Any |
message FileStorageReference {
string provider = 1; // Storage provider identifier
string bucket = 2; // Bucket/container name
string key = 3; // Object key/path
string region = 4; // Optional region hint
map<string, string> attrs = 5; // Provider-specific attributes
EncryptionMetadata encryption = 6;
}
message EncryptionMetadata {
string algorithm = 1; // "AES-256-GCM", "AES-256-CBC"
string key_provider = 2; // "aws-kms", "azure-keyvault",
// "gcp-kms", "vault"
string key_id = 3; // Key ARN/URI/ID
bytes wrapped_key = 4; // Optional: client-side encrypted DEK
bytes iv = 5; // Initialization vector
map<string, string> context = 6; // Encryption context
}
¶
This section defines the protocol-level operations that PipeStream endpoints perform during a session. These operations describe the phases of a PipeStream session lifecycle, from connection establishment through entity processing to terminal consumption.¶
A PipeStream session proceeds through four sequential actions:¶
+---------------------------------------------+
| PipeStream Action Flow |
+---------------------------------------------+
|
v
+---------------------------------------------+
| CONNECT |
| (Session + Capability Negotiation) |
+---------------------------------------------+
|
v
+---------------------------------------------+
| PARSE |
| (Dehydration: 1:N possible) |
+---------------------------------------------+
|
+-------------+-------------+
v v v
+-----------+ +-----------+ +-----------+
| PROCESS | | PROCESS | | PROCESS |
| (1:1) | | (1:1) | | (N:1) |
+-----------+ +-----------+ +-----------+
| | |
+-------------+-------------+
|
v
+---------------------------------------------+
| SINK |
| (Terminal Consumption) |
+---------------------------------------------+
¶
| Phase | Action | Cardinality | Description |
|---|---|---|---|
| 1 | CONNECT | 1:1 | Session establishment and capability negotiation |
| 2 | PARSE | 1:N | Dehydration: decompose input into entities |
| 3 | PROCESS | 1:1 or N:1 | Transform, rehydrate, aggregate, or pass through entities (parallel) |
| 4 | SINK | N:1 | Terminal consumption: index, store, or notify |
The CONNECT action establishes the session with capability negotiation.¶
ALPN Protocol ID: pipestream/1¶
Immediately after QUIC handshake, peers exchange Capabilities messages on Stream 0.¶
The PARSE action performs dehydration with optional completion policy:¶
message CompletionPolicy {
CompletionMode mode = 1;
uint32 max_retries = 2; // Default: 3
uint32 retry_delay_ms = 3; // Default: 1000
uint32 timeout_ms = 4; // Default: 300000 (5 min)
float min_success_ratio = 5; // For QUORUM mode
FailureAction on_timeout = 6;
FailureAction on_failure = 7;
}
enum CompletionMode {
COMPLETION_MODE_UNSPECIFIED = 0; // Default; treat as STRICT
COMPLETION_MODE_STRICT = 1; // All children MUST complete
COMPLETION_MODE_LENIENT = 2; // Continue with partial results
COMPLETION_MODE_BEST_EFFORT = 3; // Complete with whatever succeeds
COMPLETION_MODE_QUORUM = 4; // Need min_success_ratio
}
enum FailureAction {
FAILURE_ACTION_UNSPECIFIED = 0; // Default; treat as FAIL
FAILURE_ACTION_FAIL = 1; // Propagate failure up
FAILURE_ACTION_SKIP = 2; // Skip, continue with siblings
FAILURE_ACTION_RETRY = 3; // Retry up to max_retries
FAILURE_ACTION_DEFER = 4; // Create claim check, continue
}
¶
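The completion modes above can be evaluated against a set of terminal child statuses; a non-normative Python sketch, using status names as plain strings:

```python
def is_satisfied(mode: str, statuses: list, min_success_ratio: float = 0.0) -> bool:
    """Evaluate a CompletionMode over terminal child statuses.

    statuses: list of status names, e.g. ["COMPLETE", "FAILED", "SKIPPED"].
    """
    ok = sum(1 for s in statuses if s == "COMPLETE")
    if mode in ("STRICT", "UNSPECIFIED"):   # UNSPECIFIED is treated as STRICT
        return ok == len(statuses)
    if mode == "QUORUM":                    # need at least min_success_ratio
        return bool(statuses) and ok / len(statuses) >= min_success_ratio
    return True                             # LENIENT / BEST_EFFORT accept partial results
```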
The PROCESS action operates in one of four processing modes:¶
| Mode | Description |
|---|---|
| TRANSFORM | 1:1 entity transformation |
| REHYDRATE | N:1 merge of siblings from dehydration |
| AGGREGATE | N:1 with reduction function |
| PASSTHROUGH | Metadata-only modification |
The SINK action performs terminal consumption. The following sink types are defined:¶
| Type | Description |
|---|---|
| INDEX | Search engine integration (Elasticsearch, Solr, etc.) |
| STORAGE | Blob storage persistence (Object stores, Cloud storage) |
| NOTIFICATION | Webhook/messaging triggers |
Entity IDs are managed using a cursor-based circular recycling scheme within the 32-bit ID space. The ID space is divided into three logical regions relative to the current cursor and last_assigned pointers:¶
| Region | ID Range | Description |
|---|---|---|
| Recyclable | IDs behind cursor | Resolved entities; IDs may be reused |
| In-flight | cursor to last_assigned | Active entities (PENDING, PROCESSING, etc.) |
| Free | Beyond last_assigned | Available for new entity assignment |
The window size is computed as (last_assigned - cursor) mod 0xFFFFFFFD. If window_size >= max_window, the sender MUST apply backpressure and stop assigning new IDs until the cursor advances.¶
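The window computation and backpressure check above can be sketched as follows (non-normative):

```python
MAX = 0xFFFFFFFD  # modulus of the circular Entity ID space

def window_size(cursor: int, last_assigned: int) -> int:
    # Wraparound-safe distance from cursor to last_assigned.
    return (last_assigned - cursor) % MAX

def may_assign(cursor: int, last_assigned: int, max_window: int) -> bool:
    # The sender MUST stop assigning new IDs once the in-flight window is full,
    # and resume only after the cursor advances.
    return window_size(cursor, last_assigned) < max_window
```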
Each Assembly Manifest entry tracks:¶
message AssemblyManifestEntry {
uint32 parent_id = 1;
uint32 scope_id = 2; // Layer 1
repeated uint32 children_ids = 3;
repeated EntityStatus children_status = 4;
CompletionPolicy policy = 5; // Layer 2
uint64 created_at = 6;
ResolutionState state = 7;
}
enum ResolutionState {
RESOLUTION_STATE_UNSPECIFIED = 0;
RESOLUTION_STATE_ACTIVE = 1;
RESOLUTION_STATE_RESOLVED = 2;
RESOLUTION_STATE_PARTIAL = 3; // Some children failed/skipped
RESOLUTION_STATE_FAILED = 4;
}
¶
A checkpoint is satisfied when all of the following conditions hold:¶
Condition 1: All entities in the checkpoint scope with IDs less than checkpoint_entity_id (considering circular wrap) have reached terminal state.¶
Condition 2: All Assembly Manifest entries within the checkpoint scope have been resolved.¶
Condition 3: All nested checkpoints within the checkpoint scope have been satisfied.¶
CheckpointFrame (Section 6.6 / Appendix A) carries both identifiers:¶
message CheckpointFrame {
string checkpoint_id = 1;
uint64 sequence_number = 2;
uint32 checkpoint_entity_id = 3;
uint32 scope_id = 4;
uint32 flags = 5;
uint32 timeout_ms = 6;
}
¶
checkpoint_id: an opaque identifier for logging and correlation.¶
checkpoint_entity_id: the numeric ordering key used for barrier evaluation.¶
Implementations MUST use checkpoint_entity_id (not checkpoint_id) when evaluating Condition 1.¶
For circular comparison in Condition 1, implementations MUST use the same modulo ordering as cursor management. Define MAX = 0xFFFFFFFD and:¶
is_before(a, b) = ((b - a + MAX) % MAX) < (MAX / 2)¶
An entity ID a is considered "less than checkpoint_entity_id b" iff is_before(a, b) is true.¶
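A direct, non-normative transcription of is_before:

```python
MAX = 0xFFFFFFFD  # modulus of the circular Entity ID space

def is_before(a: int, b: int) -> bool:
    # Circular "a precedes b" test: true when a is within the half of the
    # wrapped ID space that lies behind b.
    return ((b - a + MAX) % MAX) < (MAX // 2)
```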
When a scope completes, the endpoint MUST compute a Scope Digest and propagate it to the parent scope via a SCOPE_DIGEST frame (Section 6.3).¶
The Merkle root in the Scope Digest is computed as follows:¶
For each entity in the scope, ordered by Entity ID (ascending), construct a 5-octet leaf value by concatenating the 4-octet Entity ID (network byte order) with the 1-octet final entity status.¶
Compute SHA-256 over each 5-octet leaf to produce leaf hashes.¶
Build a binary Merkle tree by repeatedly hashing pairs of sibling nodes: SHA-256(left || right). If the number of nodes at any level is odd, the last node is promoted to the next level without hashing.¶
The root of this tree is the merkle_root value in the SCOPE_DIGEST frame.¶
This construction is deterministic: any two implementations processing the same set of entity statuses MUST produce the same Merkle root.¶
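A non-normative sketch of the construction, assuming each 5-octet leaf is the 4-octet big-endian Entity ID followed by a 1-octet status code:

```python
import hashlib

def merkle_root(entities) -> bytes:
    """Compute the Scope Digest Merkle root.

    entities: iterable of (entity_id, status_code) pairs; sorted by
    Entity ID before hashing, so input order does not matter.
    """
    leaves = [
        hashlib.sha256(eid.to_bytes(4, "big") + bytes([status])).digest()
        for eid, status in sorted(entities)
    ]
    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # Hash sibling pairs: SHA-256(left || right).
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2:
            nxt.append(level[-1])   # odd node promoted unhashed
        level = nxt
    return level[0] if level else b""
```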
Implementations MUST track Assembly Manifest resolution order using a mechanism that provides O(1) insertion and amortized O(log n) minimum extraction. The tracking mechanism MUST support efficient decrease-key operations to handle out-of-order status updates.¶
Implementations MAY choose any data structure that satisfies these complexity requirements. See the companion document REFERENCE_IMPLEMENTATION.md for a recommended approach using a Fibonacci heap.¶
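As one alternative to a Fibonacci heap, a binary heap with lazy deletion emulates decrease-key by pushing a fresh entry and discarding stale ones on extraction; its insertion is O(log n) rather than O(1), which is adequate in practice. A non-normative sketch:

```python
import heapq

class ResolutionTracker:
    """Tracks Assembly Manifest resolution order via a lazy-deletion heap."""

    def __init__(self):
        self._heap = []       # (priority, parent_id) entries, possibly stale
        self._current = {}    # parent_id -> latest priority

    def insert(self, parent_id, priority):
        self._current[parent_id] = priority
        heapq.heappush(self._heap, (priority, parent_id))

    # decrease-key has the same mechanics: push the new, lower priority;
    # the old entry becomes stale and is skipped at extraction time.
    decrease_key = insert

    def extract_min(self):
        while self._heap:
            priority, parent_id = heapq.heappop(self._heap)
            if self._current.get(parent_id) == priority:   # not stale
                del self._current[parent_id]
                return parent_id
        return None
```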
When yielding or deferring, the endpoint includes a StoppingPointValidation message:¶
message StoppingPointValidation {
bytes state_checksum = 1; // Hash of processing state
uint64 bytes_processed = 2; // Progress marker
uint32 children_complete = 3;
uint32 children_total = 4;
bool is_resumable = 5;
string checkpoint_ref = 6;
}
¶
PipeStream inherits security from QUIC [RFC9000] and TLS 1.3 [RFC8446]. All connections MUST use TLS 1.3 or later. Implementations MUST NOT provide mechanisms to disable encryption.¶
Each Entity MUST include a SHA-256 checksum in its EntityHeader.¶
To support true streaming of large entities, implementations MAY begin processing an entity payload before the complete payload has been received and verified. However, the final rehydration or terminal SINK operation MUST NOT be committed until the complete payload checksum has been verified.¶
If a checksum verification fails, the implementation MUST:¶
1. Reject the entity with PIPESTREAM_INTEGRITY_ERROR (0x04).¶
2. Discard any partial results or temporary state associated with the entity.¶
3. Propagate the failure according to the Completion Policy (Section 8.3).¶
Implementations that require immediate consistency SHOULD buffer the entire entity and verify the checksum before initiating processing.¶
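A non-normative sketch of speculative processing with deferred commit; the on_chunk and on_reject hooks are illustrative caller-supplied callbacks, not part of the protocol:

```python
import hashlib

def verify_streaming(chunks, expected_checksum: bytes, on_chunk, on_reject) -> bool:
    """Process chunks speculatively; commit only if the SHA-256 matches.

    Processing MAY begin before the full payload is verified, but the
    terminal commit waits for the complete-payload checksum.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)     # incremental checksum over the streamed payload
        on_chunk(chunk)     # speculative processing hook
    if h.digest() != expected_checksum:
        on_reject()         # MUST discard partial results, then reject the
        return False        # entity with PIPESTREAM_INTEGRITY_ERROR (0x04)
    return True
```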
| Limit | Default | Description |
|---|---|---|
| Max scope depth | 7 | Prevents recursive bombs (8 levels: 0-7) |
| Max entities per scope | 4,294,967,294 | Memory bounds |
| Max window size | 2,147,483,648 | Backpressure threshold |
| Checkpoint timeout | 30s | Prevents stuck state |
| Claim check expiry | 86400s | Garbage collection |
Implementations MUST enforce all resource limits listed above. Exceeding any limit MUST result in the corresponding error code (see Section 11.4). Implementations SHOULD allow operators to configure stricter limits than the defaults shown here.¶
A single dehydration operation can produce an arbitrary number of child entities from a small input, creating a potential amplification vector. To mitigate this:¶
Implementations MUST enforce the max_entities_per_scope limit negotiated during capability exchange (Section 3.4). Any dehydration that would exceed this limit MUST be rejected.¶
Implementations MUST enforce the max_scope_depth limit. A dehydration chain deeper than this limit MUST be rejected with PIPESTREAM_DEPTH_EXCEEDED (0x07).¶
Implementations SHOULD enforce a configurable ratio between input entity size and total child entity count. A recommended default is no more than 1,000 children per megabyte of parent payload.¶
The backpressure mechanism (Section 9.1) provides a natural throttle: when the in-flight window fills, no new Entity IDs can be assigned until existing entities complete and the cursor advances. Implementations MUST NOT bypass backpressure for dehydration-generated entities.¶
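The recommended ratio check can be sketched as follows (non-normative; the 1 MB floor for small parents is an assumption of this sketch, not part of the specification):

```python
def dehydration_allowed(parent_payload_bytes: int, child_count: int,
                        max_children_per_mb: int = 1000) -> bool:
    # Child budget scales with parent payload size; the 1 MB floor keeps
    # very small parents from being unable to dehydrate at all.
    megabytes = max(parent_payload_bytes, 1_000_000) / 1_000_000
    return child_count <= megabytes * max_children_per_mb
```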
PipeStream entity headers and control stream frames carry metadata that may reveal information about the documents being processed, even when payloads are encrypted at the application layer:¶
Document structure leakage: The number of child entities produced by dehydration, the scope depth, and the Entity ID assignment pattern may reveal the structure of the document being processed (e.g., a document that dehydrates into 50 children is likely a multi-page document). Implementations that require structural privacy SHOULD pad dehydration counts or use fixed decomposition granularity.¶
Metadata in headers: The content_type, metadata map, and payload_length fields in EntityHeader (Section 6.7) are transmitted in cleartext within the QUIC-encrypted stream. Implementations that require metadata confidentiality beyond transport encryption SHOULD encrypt EntityHeader fields at the application layer and use an opaque content_type such as application/octet-stream.¶
Traffic analysis: The timing and size of status frames on the Control Stream may correlate with document processing patterns. Implementations operating in privacy-sensitive environments SHOULD send status frames at fixed intervals with padding to obscure processing timing.¶
Identifiers: The doc_id field in PipeDoc (Section 7.1) and filenames in BlobBag entries are application-layer data but may be logged by intermediate processing nodes. Implementations SHOULD provide mechanisms to redact or pseudonymize identifiers at pipeline boundaries.¶
Yield tokens (Section 6.5.1) contain opaque continuation state that enables resumption of paused entity processing. A replayed yield token could cause an entity to be processed multiple times or to resume from a stale state. To prevent this:¶
Implementations MUST associate each yield token with a stable application context identifier (for example, a session identifier) and Entity ID. In Layer 0-only operation, this context MAY be implicit in the active transport connection. For Layer 2 resumptions that can occur across reconnects or different nodes, the context identifier MUST remain stable across transport connections. A yield token MUST be rejected if presented in a different context than the one that issued it, unless the token was explicitly transferred via a claim check.¶
Implementations MUST invalidate a yield token after it has been consumed for resumption. A second resumption attempt with the same token MUST be rejected.¶
The StoppingPointValidation (Section 9.6) provides integrity checking at resume time. Implementations MUST verify the state_checksum field before accepting a resumed entity. If the checksum does not match the current state, the resumption MUST be rejected and the entity MUST be reprocessed from the beginning.¶
Claim checks (Section 6.5.2) are long-lived references that can be redeemed in different sessions. To prevent misuse:¶
Each claim check carries an expiry_timestamp (Unix epoch microseconds). Implementations MUST reject expired claim checks.¶
Implementations MUST track redeemed claim check IDs and reject duplicate redemptions. The tracking state MUST persist for at least the claim check expiry duration.¶
Claim check IDs MUST be generated using a cryptographically secure random number generator to prevent guessing.¶
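A non-normative example of generating a claim identifier from a CSPRNG, sized for the uint64 claim_id field:

```python
import secrets

def new_claim_id() -> int:
    # Draw 64 random bits from the operating system's CSPRNG so that
    # claim check IDs are unguessable.
    return secrets.randbits(64)
```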
When using FileStorageReference with encryption:¶
Key IDs MUST reference keys in approved providers.¶
Wrapped keys MUST use approved envelope encryption.¶
Key rotation MUST be supported via key_id versioning.¶
Implementations MUST NOT log key material.¶
Implementations MUST NOT include unwrapped data encryption keys in EntityHeader metadata or Control Stream frames.¶
This document requests the creation of several new registries and one ALPN identifier registration. All registries defined in this section use the "Expert Review" policy [RFC8126] for new assignments.¶
| Protocol | Identification Sequence | Reference |
|---|---|---|
| PipeStream Version 1 | "pipestream/1" | this document |
IANA is requested to create the "PipeStream Frame Types" registry. Values are categorized into Fixed (type-sized, no length prefix) frames in 0x50-0x7F and Variable (4-octet length prefix) frames in 0x80-0xFF. Values 0xC0-0xFF are reserved for private use.¶
| Value | Frame Type Name | Class | Size | Layer | Reference |
|---|---|---|---|---|---|
| 0x50 | STATUS | Fixed | 12 octets base | 0 | Section 6.2 |
| 0x54 | SCOPE_DIGEST | Fixed | 68 octets | 1 | Section 6.3 |
| 0x55 | BARRIER | Fixed | 8 octets | 1 | Section 6.4 |
| 0x56-0x7F | Reserved | Fixed | - | - | this document |
| 0x80 | CAPABILITIES | Var | Length-prefixed | 0 | Section 3.4 |
| 0x81 | CHECKPOINT | Var | Length-prefixed | 0 | Section 9.3 |
| 0x82-0xBF | Reserved | Var | - | - | this document |
IANA is requested to create the "PipeStream Status Codes" registry. Status codes are 4-bit values (0x0-0xF). Values 0xD-0xF are reserved for future Standards Action.¶
| Value | Name | Layer | Description |
|---|---|---|---|
| 0x0 | UNSPECIFIED | - | Protobuf default / heartbeat |
| 0x1 | PENDING | 0 | Entity announced |
| 0x2 | PROCESSING | 0 | In progress |
| 0x3 | COMPLETE | 0 | Success |
| 0x4 | FAILED | 0 | Failed |
| 0x5 | CHECKPOINT | 0 | Barrier |
| 0x6 | DEHYDRATING | 0 | Dehydrating into children |
| 0x7 | REHYDRATING | 0 | Rehydrating children |
| 0x8 | YIELDED | 2 | Paused |
| 0x9 | DEFERRED | 2 | Claim check issued |
| 0xA | RETRYING | 2 | Retry in progress |
| 0xB | SKIPPED | 2 | Intentionally skipped |
| 0xC | ABANDONED | 2 | Timed out |
IANA is requested to create the "PipeStream Error Codes" registry. Values in the range 0x00-0x3F are assigned by Expert Review. Values in the range 0x40-0xFF are reserved for private use.¶
| Value | Name | Description |
|---|---|---|
| 0x00 | PIPESTREAM_NO_ERROR | Graceful shutdown |
| 0x01 | PIPESTREAM_INTERNAL_ERROR | Implementation error |
| 0x02 | PIPESTREAM_IDLE_TIMEOUT | Idle timeout |
| 0x03 | PIPESTREAM_CONTROL_RESET | Control stream must reset |
| 0x04 | PIPESTREAM_INTEGRITY_ERROR | Checksum failed |
| 0x05 | PIPESTREAM_ENTITY_INVALID | Invalid format |
| 0x06 | PIPESTREAM_ENTITY_TOO_LARGE | Size exceeded |
| 0x07 | PIPESTREAM_DEPTH_EXCEEDED | Scope depth exceeded |
| 0x08 | PIPESTREAM_WINDOW_EXCEEDED | Window full |
| 0x09 | PIPESTREAM_SCOPE_INVALID | Invalid scope |
| 0x0A | PIPESTREAM_CLAIM_EXPIRED | Claim check expired |
| 0x0B | PIPESTREAM_CLAIM_NOT_FOUND | Claim check not found |
| 0x0C | PIPESTREAM_LAYER_UNSUPPORTED | Protocol layer not supported |
The session-id segment identifies application context for detached or resumable resources (for example, Layer 2 yield/claim-check flows). PipeStream Layer 0 streaming semantics do not depend on this URI scheme.¶
pipestream-URI = "pipestream://" authority "/" session-id
["/" scope-path] ["/" entity-id]
scope-path = scope-id *("." scope-id)
¶
Examples:
- pipestream://processor.example.com/a1b2c3d4
- pipestream://processor.example.com:8443/a1b2c3d4/1.42/e5f6¶
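A non-normative parser for the ABNF above; note that a single trailing all-digit segment is ambiguous between scope-path and entity-id, and this sketch resolves it as a scope-path:

```python
import re

# Mirrors: "pipestream://" authority "/" session-id ["/" scope-path] ["/" entity-id]
# where scope-path = scope-id *("." scope-id), scope-ids assumed decimal here.
_PIPESTREAM_URI = re.compile(
    r"^pipestream://(?P<authority>[^/]+)"
    r"/(?P<session_id>[^/]+)"
    r"(?:/(?P<scope_path>[0-9]+(?:\.[0-9]+)*))?"
    r"(?:/(?P<entity_id>[^/]+))?$"
)

def parse_pipestream_uri(uri: str):
    """Return the URI components as a dict, or None if it does not match."""
    m = _PIPESTREAM_URI.match(uri)
    return m.groupdict() if m else None
```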
// Copyright 2026 PipeStream AI
//
// PipeStream Protocol - IETF draft protocol for recursive entity
// streaming over QUIC. Defines the wire-format messages for Layers 0-2
// of the PipeStream architecture: core streaming, recursive scoping,
// and resilience.
//
// Edition 2023 is used for closed enums (critical for wire-protocol
// safety) and explicit field presence (distinguishing "not set" from
// zero values). In this edition, all fields have explicit presence
// by default, making the 'optional' keyword unnecessary.
edition = "2023";
package pipestream.protocol.v1;
import "google/protobuf/any.proto";
// All enums in this file are CLOSED. Unknown enum values received on
// the wire MUST be rejected. This is essential because status codes
// are encoded as 4-bit values in the status frame wire format;
// accepting unknown values could cause undefined behavior in state
// machines and cursor advancement.
option features.enum_type = CLOSED;
// Capabilities describes the feature set supported by a PipeStream
// endpoint. Exchanged during the CONNECT handshake so that both
// sides can negotiate which protocol layers and resource limits
// apply to the session.
message Capabilities {
// Whether the endpoint supports Layer 0 (core entity streaming).
bool layer0_core = 1;
// Whether the endpoint supports Layer 1 (recursive scoping and
// dehydration).
bool layer1_recursive = 2;
// Whether the endpoint supports Layer 2 (resilience, yield, and
// claim-check). Requires Layer 1 support; if layer1_recursive is
// false, this MUST be false.
bool layer2_resilience = 3;
// Maximum nesting depth allowed for recursive scopes.
// Default is 7 (8 levels: 0-7).
uint32 max_scope_depth = 4;
// Maximum number of entities permitted within a single scope.
uint32 max_entities_per_scope = 5;
// Maximum flow-control window size, in number of entities.
uint32 max_window_size = 6;
}
// EntityHeader is sent at the beginning of each entity stream to
// describe the payload that follows. It carries identity, lineage,
// content metadata, chunking information, and the completion policy
// that governs how partial failures of this entity's children are
// handled.
message EntityHeader {
// Scope-local entity identifier.
uint32 entity_id = 1;
// Identifier of the parent entity.
uint32 parent_id = 2;
// Identifier of the scope.
uint32 scope_id = 3;
// Data layer (0-3).
uint32 layer = 4;
// MIME content type.
string content_type = 5;
// Length in bytes of the complete entity payload, before any
// chunking.
uint64 payload_length = 6;
// SHA-256 integrity checksum.
bytes checksum = 7;
// Arbitrary metadata.
map<string, string> metadata = 8;
// Chunking information.
ChunkInfo chunk_info = 9;
// Resilience completion policy.
CompletionPolicy completion_policy = 10;
}
// ChunkInfo describes how a payload is divided into chunks.
message ChunkInfo {
// Total number of chunks.
uint32 total_chunks = 1;
// Zero-based chunk index.
uint32 chunk_index = 2;
// Byte offset within the complete payload.
uint64 chunk_offset = 3;
}
// CompletionPolicy controls Layer 2 resilience behavior.
message CompletionPolicy {
// Mode for evaluating completion.
CompletionMode mode = 1;
// Maximum retry attempts.
uint32 max_retries = 2;
// Delay between retries in milliseconds.
uint32 retry_delay_ms = 3;
// Maximum wait time in milliseconds.
uint32 timeout_ms = 4;
// Minimum success ratio for QUORUM mode.
float min_success_ratio = 5;
// Action on timeout.
FailureAction on_timeout = 6;
// Action on failure.
FailureAction on_failure = 7;
}
// CompletionMode specifies completion evaluation strategies.
enum CompletionMode {
COMPLETION_MODE_UNSPECIFIED = 0;
COMPLETION_MODE_STRICT = 1;
COMPLETION_MODE_LENIENT = 2;
COMPLETION_MODE_BEST_EFFORT = 3;
COMPLETION_MODE_QUORUM = 4;
}
// FailureAction specifies error handling behaviors.
enum FailureAction {
FAILURE_ACTION_UNSPECIFIED = 0;
FAILURE_ACTION_FAIL = 1;
FAILURE_ACTION_SKIP = 2;
FAILURE_ACTION_RETRY = 3;
FAILURE_ACTION_DEFER = 4;
}
// EntityStatus represents the lifecycle state of an entity.
enum EntityStatus {
ENTITY_STATUS_UNSPECIFIED = 0;
ENTITY_STATUS_PENDING = 1;
ENTITY_STATUS_PROCESSING = 2;
ENTITY_STATUS_COMPLETE = 3;
ENTITY_STATUS_FAILED = 4;
ENTITY_STATUS_CHECKPOINT = 5;
ENTITY_STATUS_DEHYDRATING = 6;
ENTITY_STATUS_REHYDRATING = 7;
ENTITY_STATUS_YIELDED = 8;
ENTITY_STATUS_DEFERRED = 9;
ENTITY_STATUS_RETRYING = 10;
ENTITY_STATUS_SKIPPED = 11;
ENTITY_STATUS_ABANDONED = 12;
}
// StatusFrame is the Protobuf representation of a status transition.
message StatusFrame {
// Identifier of the entity.
uint32 entity_id = 1;
// Identifier of the scope.
uint32 scope_id = 2;
// Current status.
EntityStatus status = 3;
// Optional extension data.
google.protobuf.Any extended_data = 4;
}
// CheckpointFrame (Protobuf, Type 0x81)
message CheckpointFrame {
// Unique checkpoint identifier.
string checkpoint_id = 1;
// Monotonic sequence number.
uint64 sequence_number = 2;
// Numeric ordering key for barrier evaluation.
uint32 checkpoint_entity_id = 3;
// Scope to which this checkpoint applies.
uint32 scope_id = 4;
// Checkpoint flags.
uint32 flags = 5;
// Maximum wait time in milliseconds.
uint32 timeout_ms = 6;
}
// AssemblyManifestEntry tracks parent-child relationships.
message AssemblyManifestEntry {
// Identifier of the parent entity.
uint32 parent_id = 1;
// Scope identifier.
uint32 scope_id = 2;
// Ordered child identifiers.
repeated uint32 children_ids = 3;
// Current status of each child.
repeated EntityStatus children_status = 4;
// Governing completion policy.
CompletionPolicy policy = 5;
// Creation timestamp (Unix epoch microseconds).
uint64 created_at = 6;
// Current resolution state.
ResolutionState state = 7;
}
// ResolutionState tracks Assembly Manifest completion.
enum ResolutionState {
RESOLUTION_STATE_UNSPECIFIED = 0;
RESOLUTION_STATE_ACTIVE = 1;
RESOLUTION_STATE_RESOLVED = 2;
RESOLUTION_STATE_PARTIAL = 3;
RESOLUTION_STATE_FAILED = 4;
}
// YieldToken captures continuation state for paused entities.
message YieldToken {
// Reason for yielding.
YieldReason reason = 1;
// Opaque continuation state.
bytes continuation_state = 2;
// Validation data for resumption.
StoppingPointValidation validation = 3;
}
// YieldReason describes why processing was yielded.
enum YieldReason {
YIELD_REASON_UNSPECIFIED = 0;
YIELD_REASON_EXTERNAL_CALL = 1;
YIELD_REASON_RATE_LIMITED = 2;
YIELD_REASON_AWAITING_SIBLING = 3;
YIELD_REASON_AWAITING_APPROVAL = 4;
YIELD_REASON_RESOURCE_BUSY = 5;
}
// ClaimCheck is a Layer 2 deferred-processing reference.
message ClaimCheck {
// Unique claim identifier.
uint64 claim_id = 1;
// Identifier of the deferred entity.
uint32 entity_id = 2;
// Scope identifier.
uint32 scope_id = 3;
// Expiry timestamp (Unix epoch microseconds).
uint64 expiry_timestamp = 4;
// Validation data.
StoppingPointValidation validation = 5;
}
// StoppingPointValidation captures a snapshot of progress.
message StoppingPointValidation {
// Hash of internal state.
bytes state_checksum = 1;
// Payload bytes consumed.
uint64 bytes_processed = 2;
// Completed child count.
uint32 children_complete = 3;
// Total expected child count.
uint32 children_total = 4;
// Resumption capability flag.
bool is_resumable = 5;
// Last passed checkpoint reference.
string checkpoint_ref = 6;
}
// ScopeDigest is a Layer 1 summary of a completed scope.
message ScopeDigest {
// Identifier of the scope.
uint32 scope_id = 1;
// Total processed count.
uint64 entities_processed = 2;
// Total succeeded count.
uint64 entities_succeeded = 3;
// Total failed count.
uint64 entities_failed = 4;
// Total deferred count.
uint64 entities_deferred = 5;
// Merkle root hash.
bytes merkle_root = 6;
}
// PipeDoc represents the top-level document envelope for an entity.
message PipeDoc {
// Unique document identifier.
string doc_id = 1;
// Identifier of the entity.
uint32 entity_id = 2;
// Ownership and access context.
OwnershipContext ownership = 3;
}
// OwnershipContext defines multi-tenancy and access control for
// entities.
message OwnershipContext {
// Entity owner identifier.
string owner_id = 1;
// Group identifier.
string group_id = 2;
// List of access scopes.
repeated string scopes = 3;
}
// FileStorageReference provides a location for external data.
message FileStorageReference {
// Storage provider identifier.
string provider = 1;
// Bucket or container name.
string bucket = 2;
// Object key or path.
string key = 3;
// Optional region hint.
string region = 4;
// Provider-specific attributes.
map<string, string> attrs = 5;
// Encryption metadata.
EncryptionMetadata encryption = 6;
}
// EncryptionMetadata defines encryption parameters.
message EncryptionMetadata {
// Encryption algorithm.
string algorithm = 1;
// Key provider identifier.
string key_provider = 2;
// Encryption key identifier.
string key_id = 3;
// Optional wrapped DEK.
bytes wrapped_key = 4;
// Initialization vector.
bytes iv = 5;
// Additional encryption context.
map<string, string> context = 6;
}
¶
| Feature | Layer 0 | Layer 1 | Layer 2 |
|---|---|---|---|
| Unified status frame (96-bit base) | X | X | X |
| Entity streaming | X | X | X |
| PENDING/PROCESSING/COMPLETE/FAILED | X | X | X |
| Checkpoint blocking | X | X | X |
| Assembly Manifest | X | X | X |
| Cursor-based ID recycling | X | X | X |
| Scoped status fields (Scope ID, depth) | | X | X |
| Hierarchical scopes | | X | X |
| Scope digest (Merkle) | | X | X |
| Barrier (subtree sync) | | X | X |
| YIELDED status | | | X |
| DEFERRED status | | | X |
| Claim checks | | | X |
| Completion policies | | | X |
| SKIPPED/ABANDONED statuses | | | X |