
QUIC REALITY Kernel DCO Vision Roadmap

Last updated: 2026-05-10

Purpose

This document describes the long-term plan for a fully kernel-resident QUIC REALITY data channel offload for NexusNet. The goal is to reach WireGuard- or OpenVPN-DCO-class packet-forwarding performance while preserving true wire-compatible QUIC REALITY packets on the public network.

The immediate L3 tunnel plan remains user-space first. This DCO roadmap is a future architecture target that should shape today's ABI, key-management, and packet-format decisions.

Definition

"QUIC REALITY Kernel DCO" means:

  • User space performs policy, configuration, REALITY authentication, TLS 1.3 handshake, and QUIC transport-parameter negotiation.
  • Kernel space owns the established 1-RTT QUIC DATAGRAM data path:
    • L3 packet ingress/egress.
    • UDP encapsulation.
    • QUIC short-header packet protection.
    • Header protection.
    • Packet numbers.
    • DATAGRAM frame encode/decode.
    • ACK range generation.
    • ACK processing.
    • RTT/loss/congestion/pacing.
    • MTU/GSO/GRO.
    • Per-peer stats and drop accounting.

This is still DCO, not a full TLS implementation in the kernel. Moving the REALITY/TLS handshake into the kernel is explicitly out of scope until the data channel is proven and there is a hard reason to do it.

Why This Exists

User-space QUIC REALITY has unavoidable packet costs:

  • TUN read/write crosses the user/kernel boundary.
  • UDP socket send/receive crosses the user/kernel boundary.
  • Each packet runs QUIC packet protection and header protection in user space.
  • Each packet touches packet-number, ACK, loss, congestion, pacing, and frame parsing state.
  • DATAGRAM is unreliable, but it is still part of a congestion-controlled QUIC connection.

OpenVPN DCO gains speed by moving encryption and packet handling into the kernel. WireGuard is fast because it is a kernel L3 tunnel with a compact protocol and direct skb processing. QUIC REALITY DCO needs the same kind of data-path shift, but its protocol surface is larger than either WireGuard's protocol or the OpenVPN data channel.

Non-Negotiable Goal

The public wire image must remain valid QUIC.

That means:

  • UDP payloads are parseable as QUIC v1 packets.
  • Established data packets use legal QUIC short headers.
  • Packet protection follows QUIC-TLS key derivation and AEAD rules.
  • Header protection is applied and removed correctly.
  • DATAGRAM frames are encoded according to RFC 9221.
  • DATAGRAM frames are ack-eliciting but are not retransmitted on loss.
  • Congestion control and pacing are enforced before transmitting DATAGRAM packets.
  • ACK frames and ACK ranges are legal and useful to a standard QUIC peer.
  • Key updates follow QUIC rules and do not create timing side channels.

If we choose a faster non-QUIC kernel encapsulation later, it must be named separately, for example L3PTP-DCO, and must not be called QUIC REALITY.

Relationship to Current Roadmaps

  • l3ptp-reality.md: user-space L3 point-to-point tunnel.
  • quic-reality-datagram.md: user-space DATAGRAM performance work.
  • quic-reality.md: QUIC REALITY transport and STREAM/TCP work.
  • This document: future kernel data channel offload target.

Key Architecture Decision

Start with a Linux kernel module and a user-space control agent.

Practical language choice:

  • Kernel module hot path should be C first unless Rust-for-Linux networking APIs are mature enough for skb, UDP tunnel, crypto, netdevice, NAPI, and generic netlink work in the target kernel versions.
  • User-space control, tests, key derivation checks, and golden vectors remain Rust.
  • The ABI must be language-neutral: generic netlink messages, fixed-size binary structs, and explicit versioning.

Rust should still define the source-of-truth model in NexusNet user space, but we should not block DCO on Rust kernel API availability.
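To make the language-neutrality requirement concrete, a minimal sketch of what a fixed-size, versioned binary header for DCO control messages could look like. All field names and the layout are illustrative assumptions, not the final ABI:

```rust
// Hypothetical sketch: a fixed-size, versioned header that every generic
// netlink payload in the DCO ABI could carry. Field names are illustrative.
#[repr(C, packed)]
#[derive(Clone, Copy)]
struct NqdMsgHeader {
    abi_major: u16,  // incompatible changes bump this; kernel rejects mismatches
    abi_minor: u16,  // additive, backward-compatible changes only
    generation: u32, // config-object generation number
}

impl NqdMsgHeader {
    // Serialize with an explicit byte order so C and Rust peers agree
    // on the wire image regardless of compiler or architecture.
    fn to_bytes(&self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[0..2].copy_from_slice(&self.abi_major.to_le_bytes());
        out[2..4].copy_from_slice(&self.abi_minor.to_le_bytes());
        out[4..8].copy_from_slice(&self.generation.to_le_bytes());
        out
    }
}

fn main() {
    let hdr = NqdMsgHeader { abi_major: 1, abi_minor: 0, generation: 42 };
    assert_eq!(hdr.to_bytes().len(), 8);
}
```

Explicit serialization like this keeps the ABI independent of any one language's struct layout rules.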

Kernel/User-Space Split

User Space Owns

  • Backend policy and config.
  • REALITY credentials and X25519 identity.
  • QUIC Initial and Handshake packet handling.
  • TLS 1.3 transcript.
  • REALITY ClientHello authentication.
  • Server certificate/certificate verify behavior.
  • QUIC transport parameter negotiation.
  • Connection admission control.
  • Peer routing policy.
  • L3 network lifecycle.
  • Kernel DCO session creation/update/deletion.
  • Recovery fallback to user-space QUIC when DCO is unavailable.

Kernel Owns

  • Virtual L3 netdevice.
  • Per-peer QUIC 1-RTT state.
  • UDP socket or UDP tunnel transmit/receive.
  • QUIC short-header packet open/seal.
  • DATAGRAM frame parse/build.
  • ACK range bookkeeping.
  • Loss and congestion state.
  • Pacing and batching.
  • MTU and fragmentation avoidance.
  • Key phase and key update state.
  • GSO/GRO and multi-queue scheduling.
  • Per-connection and per-peer counters.

Shared Contract

User space installs an established QUIC session into the kernel only after the REALITY/QUIC handshake completes.

The installed session includes:

  • Direction: client or server.
  • QUIC version.
  • Local and peer UDP addresses.
  • Local and peer connection IDs.
  • Current packet number.
  • Largest received packet number.
  • Current key phase.
  • 1-RTT TX secret or derived key/IV/header-protection key.
  • 1-RTT RX secret or derived key/IV/header-protection key.
  • Optional next RX/TX secrets for key update.
  • Negotiated cipher suite.
  • Negotiated max_datagram_frame_size.
  • max_udp_payload_size.
  • ACK delay exponent.
  • Max ACK delay.
  • Initial RTT estimate.
  • Congestion profile.
  • MTU.
  • L3 network ID.
  • Peer allowed source prefixes.
  • Route epoch.
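As an illustration only, the installed-session object above might be expressed as a language-neutral `#[repr(C)]` struct along these lines. Field names, widths, and the 20-byte CID bound (the QUIC v1 maximum) are assumptions for the sketch, not the final layout:

```rust
// Illustrative session-install object; NOT the final ABI layout.
#[repr(C)]
struct NqdSession {
    direction: u8,                // 0 = client, 1 = server
    quic_version: u32,            // 0x00000001 for QUIC v1
    local_cid_len: u8,
    local_cid: [u8; 20],          // QUIC v1 CIDs are at most 20 bytes
    peer_cid_len: u8,
    peer_cid: [u8; 20],
    next_tx_pn: u64,              // current packet number
    largest_rx_pn: u64,           // largest received packet number
    key_phase: u8,
    cipher_suite: u16,            // IANA TLS cipher suite ID
    max_datagram_frame_size: u64,
    max_udp_payload_size: u64,
    ack_delay_exponent: u8,
    max_ack_delay_ms: u32,
    initial_rtt_us: u32,
    mtu: u16,
    l3_network_id: u32,
    route_epoch: u64,
    // Secrets and derived keys travel in separate write-only netlink
    // attributes, never inside this readable object.
}

fn main() {
    assert!(std::mem::size_of::<NqdSession>() >= 100);
}
```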

Logical Components

text
nexus-agent userspace
  ├─ REALITY / QUIC handshake
  ├─ policy and route manager
  ├─ DCO generic-netlink client
  └─ fallback user-space dataplane

nexus_quic_dco.ko
  ├─ generic netlink family: nexus_quic_dco
  ├─ virtual netdevice: nexusq<N>
  ├─ peer/session table
  ├─ UDP tunnel socket layer
  ├─ QUIC packet protection engine
  ├─ DATAGRAM codec
  ├─ ACK/loss/congestion/pacing
  ├─ GSO/GRO and multi-queue hooks
  └─ stats/debugfs/tracepoints

Kernel ABI

Use generic netlink for control. The ABI must be versioned from day one.

Suggested commands:

text
NQD_CMD_GET_CAPS
NQD_CMD_CREATE_DEVICE
NQD_CMD_DELETE_DEVICE
NQD_CMD_ADD_SESSION
NQD_CMD_UPDATE_SESSION
NQD_CMD_DELETE_SESSION
NQD_CMD_ADD_ROUTE
NQD_CMD_DELETE_ROUTE
NQD_CMD_SET_KEYS
NQD_CMD_KEY_UPDATE
NQD_CMD_GET_STATS
NQD_CMD_FLUSH_STATS
NQD_CMD_SET_DEBUG

Suggested async events:

text
NQD_EVENT_SESSION_UP
NQD_EVENT_SESSION_DOWN
NQD_EVENT_KEY_UPDATE_NEEDED
NQD_EVENT_KEY_UPDATE_DONE
NQD_EVENT_PATH_MTU_CHANGED
NQD_EVENT_ROUTE_EPOCH_MISMATCH
NQD_EVENT_REPLAY_OR_DUPLICATE_DROP
NQD_EVENT_CONGESTION_STATE
NQD_EVENT_FATAL_ERROR

ABI rules:

  • Every message has ABI version.
  • Every config object has a generation number.
  • Secrets are write-only from user space to kernel.
  • Secrets are never returned in stats or debug dumps.
  • Unknown mandatory attributes fail closed.
  • Unknown optional attributes are ignored.
  • All object deletion is idempotent.
  • Every session is scoped to a network namespace.
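One way the fail-closed/ignore split could be implemented is a criticality bit in the attribute type, similar in spirit to netlink's type-mask conventions. The bit layout and names here are assumptions for illustration:

```rust
// Hypothetical "critical" bit: sender marks attributes the receiver must
// understand. Unknown critical attributes fail closed; unknown optional
// attributes are skipped.
const NQD_ATTR_CRITICAL: u16 = 0x8000;

fn validate_attrs(known: &[u16], attrs: &[u16]) -> Result<(), u16> {
    for &attr in attrs {
        let id = attr & !NQD_ATTR_CRITICAL;
        if !known.contains(&id) && attr & NQD_ATTR_CRITICAL != 0 {
            return Err(id); // unknown mandatory attribute: reject the message
        }
        // Unknown optional attributes fall through and are ignored,
        // which is what lets old kernels accept newer user space.
    }
    Ok(())
}

fn main() {
    let known = [1u16, 2, 3];
    assert!(validate_attrs(&known, &[1, 7]).is_ok()); // 7 is unknown-optional
    assert_eq!(validate_attrs(&known, &[1, 7 | NQD_ATTR_CRITICAL]), Err(7));
}
```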

Data Plane

TX Path

text
kernel IP stack
  -> nexusq netdevice xmit
  -> route/policy lookup
  -> peer/session lookup
  -> MTU check
  -> QUIC DATAGRAM frame build
  -> ACK/PING coalescing if needed
  -> packet number allocation
  -> AEAD seal
  -> header protection
  -> UDP skb build
  -> pacing/qdisc/GSO
  -> physical NIC

TX invariants:

  • One IP packet maps to one QUIC DATAGRAM frame unless future segmentation is explicitly negotiated.
  • Oversized IP packets are dropped with counters and, where feasible, ICMP Packet Too Big is generated.
  • DATAGRAM payload loss is not retransmitted.
  • Congestion controller can drop or delay DATAGRAM frames.
  • ACK-only packets are not mixed into L3 payload accounting.
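The MTU check in the TX path reduces to a per-packet overhead budget. A sketch, assuming a fixed 4-byte packet-number encoding and a DATAGRAM frame with no length field (legal per RFC 9221 when the frame fills the packet):

```rust
// How many bytes of inner IP packet fit in one QUIC DATAGRAM frame,
// given the negotiated UDP payload budget. Overheads follow RFC 9000/9221;
// the fixed 4-byte packet number is a simplifying assumption.
fn datagram_payload_budget(max_udp_payload: usize, peer_cid_len: usize) -> usize {
    let short_header = 1 + peer_cid_len + 4; // flags byte + DCID + 4-byte PN
    let aead_tag = 16;                       // AES-GCM / ChaCha20-Poly1305 tag
    let frame_header = 1;                    // DATAGRAM type 0x30, no length
    max_udp_payload.saturating_sub(short_header + aead_tag + frame_header)
}

fn main() {
    // e.g. a 1350-byte UDP budget with an 8-byte CID leaves 1320 bytes for L3.
    assert_eq!(datagram_payload_budget(1350, 8), 1320);
}
```

Any inner IP packet above this budget is dropped and counted, per the TX invariants above.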

RX Path

text
physical NIC
  -> UDP receive
  -> QUIC CID/session lookup
  -> header protection removal
  -> packet number reconstruction
  -> AEAD open
  -> duplicate/replay check
  -> ACK range update
  -> frame parse
  -> DATAGRAM context decode
  -> IP source-prefix validation
  -> skb inject into nexusq receive path
  -> kernel IP stack

RX invariants:

  • Decryption, packet-number recovery, and duplicate handling must not leak useful timing side channels.
  • Invalid packets are dropped silently unless rate-limited debug is enabled.
  • Peer source-prefix policy is enforced before injecting skb into the host stack.
  • ACK ranges are updated even when the DATAGRAM payload is dropped after QUIC validation.
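The packet-number reconstruction step in the RX path is fully specified by RFC 9000 Appendix A.3; a user-space reference like this can double as a golden-vector generator for the kernel's KUnit tests:

```rust
// Packet-number reconstruction per RFC 9000 Appendix A.3, adapted to u64
// (the extra `expected >= hwin` guard prevents unsigned underflow).
fn decode_packet_number(largest_rx: u64, truncated_pn: u64, pn_bits: u32) -> u64 {
    let expected = largest_rx + 1;
    let win = 1u64 << pn_bits;   // size of the truncated-PN window
    let hwin = win / 2;
    let mask = win - 1;
    let candidate = (expected & !mask) | truncated_pn;
    if expected >= hwin && candidate <= expected - hwin
        && candidate < (1u64 << 62) - win
    {
        candidate + win
    } else if candidate > expected + hwin && candidate >= win {
        candidate - win
    } else {
        candidate
    }
}

fn main() {
    // Worked example from RFC 9000 A.3: largest received 0xa82f30ea,
    // 16-bit truncated PN 0x9b32 decodes to 0xa82f9b32.
    assert_eq!(decode_packet_number(0xa82f30ea, 0x9b32, 16), 0xa82f9b32);
}
```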

QUIC Subset

Initial DCO subset:

  • QUIC v1.
  • 1-RTT short-header packets only.
  • DATAGRAM frames.
  • ACK frames.
  • PING frames.
  • CONNECTION_CLOSE frames.
  • PATH_CHALLENGE/PATH_RESPONSE, but only after the first path-management phase.
  • Key phase bit and key update.

Explicitly excluded from DCO v1:

  • Initial packet generation.
  • Handshake packets.
  • CRYPTO frames.
  • STREAM frames.
  • 0-RTT.
  • Retry.
  • Connection migration.
  • Multipath.
  • HTTP/3.

Future DCO versions can add:

  • Multiple connection IDs.
  • Connection migration.
  • Multipath striping.
  • CONNECT-IP HTTP Datagram context mapping.
  • Optional STREAM offload for TCP-over-QUIC if the DATAGRAM path is stable.

Crypto Design

The kernel must implement QUIC packet protection, not TLS records.

Required primitives:

  • AES-128-GCM and AES-256-GCM AEAD.
  • ChaCha20-Poly1305 AEAD if negotiated.
  • AES-based header protection.
  • ChaCha20-based header protection.
  • HKDF expand for key update if secrets, not raw keys, are installed.
  • Constant-time key selection where feasible.
  • Explicit key zeroization on session teardown.
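Header protection itself is only an XOR step once the 5-byte mask has been produced by the AES or ChaCha primitive (RFC 9001 §5.4). A sketch of the apply/remove step for short-header packets, with the mask supplied by the caller:

```rust
// Apply (or, since XOR is its own inverse, remove) short-header protection
// per RFC 9001 §5.4. The 5-byte mask comes from the AES/ChaCha header
// protection primitive, sampled from ciphertext; it is a parameter here.
fn apply_short_header_protection(
    packet: &mut [u8],
    pn_offset: usize, // byte offset of the packet number field
    pn_len: usize,    // 1..=4; on RX this is read after unmasking byte 0
    mask: &[u8; 5],
) {
    // Short header: protect the low 5 bits of the first byte
    // (reserved bits, key phase, PN length).
    packet[0] ^= mask[0] & 0x1f;
    for i in 0..pn_len {
        packet[pn_offset + i] ^= mask[1 + i];
    }
}

fn main() {
    let mut pkt = vec![0x43u8, 1, 2, 3, 4, 5, 6, 7, 8, 0xaa, 0xbb];
    let orig = pkt.clone();
    let mask = [0x1f, 0x11, 0x22, 0x33, 0x44];
    apply_short_header_protection(&mut pkt, 9, 2, &mask);
    assert_ne!(pkt, orig); // protected
    apply_short_header_protection(&mut pkt, 9, 2, &mask);
    assert_eq!(pkt, orig); // round trip restores the original header
}
```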

Key installation options:

  1. Install derived packet keys:
    • Simpler kernel.
    • User space handles HKDF.
    • Key update requires user-space coordination or preinstalled next keys.
  2. Install QUIC traffic secrets:
    • Kernel can derive next keys.
    • Larger kernel crypto surface.
    • Better for autonomous key update.

Preferred path:

  • DCO v1 installs derived current keys and precomputed next receive keys.
  • DCO v2 installs traffic secrets and lets kernel perform QUIC key update.

ACK, Loss, and Congestion

DATAGRAM frames are not retransmitted, but their packets still drive QUIC recovery and congestion behavior.

Kernel state must include:

  • Received packet-number ranges.
  • ACK delay timer.
  • Largest acknowledged.
  • In-flight packet metadata.
  • RTT sample state.
  • Loss detection.
  • PTO timer.
  • Congestion window.
  • Pacing rate.
  • ECN accounting if enabled.
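The "received packet-number ranges" item is the core of ACK generation. A minimal sketch of that state: insert a PN, merge touching ranges, and read ranges back largest-first as an ACK frame encoder would. A kernel version would additionally bound the range count and memory:

```rust
use std::collections::BTreeMap;

// Received packet-number ranges, keyed by range start (inclusive start/end).
#[derive(Default)]
struct AckRanges {
    ranges: BTreeMap<u64, u64>, // start -> end
}

impl AckRanges {
    fn insert(&mut self, pn: u64) {
        let (mut start, mut end) = (pn, pn);
        // Merge with a preceding range that touches or contains pn.
        if let Some((&s, &e)) = self.ranges.range(..=pn).next_back() {
            if e >= pn {
                return; // duplicate: already covered
            }
            if e + 1 == pn {
                start = s;
                self.ranges.remove(&s);
            }
        }
        // Merge with a following range that starts at pn + 1.
        if let Some(&e) = self.ranges.get(&(pn + 1)) {
            end = e;
            self.ranges.remove(&(pn + 1));
        }
        self.ranges.insert(start, end);
    }

    // ACK frames list ranges from largest to smallest.
    fn largest_first(&self) -> Vec<(u64, u64)> {
        self.ranges.iter().rev().map(|(&s, &e)| (s, e)).collect()
    }
}

fn main() {
    let mut acks = AckRanges::default();
    for pn in [0u64, 1, 2, 5, 6, 9] {
        acks.insert(pn);
    }
    assert_eq!(acks.largest_first(), vec![(9, 9), (5, 6), (0, 2)]);
}
```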

Congestion profiles:

  • fixed_rate_test: only for lab tests.
  • cubic_like: conservative default.
  • bbr_like: long-term high-throughput target.

Acceptance rule:

  • DCO must never bypass congestion control for public-network mode.
  • Lab-only modes must require explicit config and clear metrics labels.

Netdevice Model

Expose one virtual L3 device per L3 network:

text
nexusq0
nexusq1
...

The device behaves like a routed L3 tunnel:

  • ndo_start_xmit handles outbound IP skb.
  • RX path calls netif_rx/NAPI equivalent for inbound IP skb.
  • Multi-queue is enabled for per-flow parallelism.
  • Queue selection uses a stable flow hash.
  • Device MTU reflects negotiated QUIC DATAGRAM budget.
  • Device stats include QUIC and L3 counters.

Why netdevice instead of user-space TUN:

  • No TUN user-space copy.
  • Kernel IP stack can route directly into DCO.
  • qdisc, pacing, GSO/GRO, and CPU queueing become available.
  • It matches WireGuard/OpenVPN-DCO style deployment.

Performance Target

Local target:

  • Single tunnel: at least 3 Gbit/s full-duplex on commodity x86.
  • Aggregate across multiple tunnels/queues: 10 Gbit/s class target.
  • Packet size: MTU-safe 1100-1200-byte QUIC DATAGRAM payloads.
  • Loss under local controlled 3G test: below 0.5%.
  • CPU: materially lower than user-space QUIC REALITY at the same bitrate.

WAN target:

  • Correctness and congestion safety first.
  • No dependence on fragmented UDP.
  • Stable pacing under loss/reorder.
  • Predictable behavior under NAT rebinding once migration is implemented.

Observability

Required counters:

  • L3 TX/RX packets and bytes.
  • QUIC TX/RX packets and bytes.
  • DATAGRAM frames TX/RX.
  • ACK frames TX/RX.
  • AEAD seal/open failures.
  • Header protection failures.
  • Packet-number decode failures.
  • Duplicate packet drops.
  • Replay-window drops.
  • Source-prefix policy drops.
  • MTU drops.
  • Congestion drops.
  • Pacing delays.
  • Key update count.
  • Key update failures.
  • Route epoch drops.
  • Queue drops by queue.
  • GSO packets/segments.
  • GRO coalesced packets.

Required debug surfaces:

  • Generic netlink stats.
  • ip -s link show nexusq0.
  • Optional debugfs per-session dump without secrets.
  • Tracepoints for packet drop reasons.
  • Rate-limited kernel logs for control-plane errors.

Security Boundaries

  • Only CAP_NET_ADMIN can manage DCO devices and sessions.
  • Sessions are scoped to Linux network namespaces.
  • Secrets are never readable after installation.
  • Kernel logs must never print keys, secrets, private CIDs, or raw payloads.
  • User-space control plane must bind peer identity to session install.
  • Kernel validates peer endpoint tuple unless migration is enabled.
  • Anti-spoofing applies before skb injection.
  • DCO must fail closed on malformed config.
  • DCO must support immediate session deletion for credential revocation.

Phased Plan

Phase 0: Specification and Golden Vectors

Deliverables:

  • Write formal kernel/user ABI.
  • Define session object and key object layouts.
  • Generate QUIC short-header packet protection golden vectors from crates/quic-core.
  • Generate DATAGRAM frame encode/decode vectors.
  • Generate ACK range vectors.
  • Define DCO capability negotiation in agent.

Acceptance:

  • User-space tests can validate every binary struct and packet vector before a kernel module exists.
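One shared building block both the DATAGRAM and ACK vector generators depend on is the QUIC variable-length integer (RFC 9000 §16). A self-contained encode/decode pair, checked against the RFC's own worked example:

```rust
// QUIC variable-length integer encode/decode per RFC 9000 §16.
// The top two bits of the first byte select a 1/2/4/8-byte encoding.
fn varint_encode(v: u64, out: &mut Vec<u8>) {
    match v {
        0..=0x3f => out.push(v as u8),
        0x40..=0x3fff => out.extend_from_slice(&((v as u16 | 0x4000).to_be_bytes())),
        0x4000..=0x3fff_ffff => {
            out.extend_from_slice(&((v as u32 | 0x8000_0000).to_be_bytes()))
        }
        _ => out.extend_from_slice(&((v | 0xc000_0000_0000_0000).to_be_bytes())),
    }
}

fn varint_decode(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    let len = 1usize << (first >> 6);
    if buf.len() < len {
        return None;
    }
    let mut v = (first & 0x3f) as u64;
    for b in &buf[1..len] {
        v = (v << 8) | *b as u64;
    }
    Some((v, len))
}

fn main() {
    // RFC 9000 Appendix A example: c2197c5eff14e88c <-> 151,288,809,941,952,652.
    let mut buf = Vec::new();
    varint_encode(151_288_809_941_952_652, &mut buf);
    assert_eq!(buf, [0xc2, 0x19, 0x7c, 0x5e, 0xff, 0x14, 0xe8, 0x8c]);
    assert_eq!(varint_decode(&buf), Some((151_288_809_941_952_652, 8)));
}
```

Byte-exact vectors for cases like this are what let the KUnit tests in later phases compare kernel output against user space.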

Phase 1: Kernel Skeleton

Deliverables:

  • nexus_quic_dco.ko builds out-of-tree.
  • Generic netlink family registers.
  • GET_CAPS works.
  • Virtual netdevice create/delete works.
  • Basic stats are visible.
  • No crypto or packet forwarding yet.

Acceptance:

  • Load/unload cycles do not leak netdevices or netlink families.
  • Network namespace create/delete cycles are clean.

Phase 2: Session Table and Control ABI

Deliverables:

  • Add session create/update/delete.
  • Add CID lookup table.
  • Add peer endpoint tuple validation.
  • Add route epoch and peer prefix policy.
  • Add write-only key install path.
  • Add stats per session.

Acceptance:

  • User space can install an established dummy session and retrieve stats.
  • Duplicate CIDs and invalid key lengths fail closed.

Phase 3: Kernel QUIC Packet Protection

Deliverables:

  • Implement 1-RTT short-header seal/open.
  • Implement header protection apply/remove.
  • Implement packet number encode/decode.
  • Support AES-GCM first.
  • Add ChaCha20-Poly1305 after AES-GCM passes.
  • Add KUnit tests using golden vectors.

Acceptance:

  • Kernel output matches user-space quic-core packet vectors byte-for-byte.
  • Kernel can open user-space generated packets and user space can open kernel generated packets.

Phase 4: DATAGRAM Data Path

Deliverables:

  • TX path: IP skb to QUIC DATAGRAM packet.
  • RX path: QUIC DATAGRAM packet to IP skb.
  • MTU enforcement.
  • Source-prefix validation.
  • Basic ACK generation.
  • Basic ACK processing.

Acceptance:

  • Two network namespaces can ping over DCO-generated QUIC DATAGRAM packets.
  • Wireshark/tcpdump sees valid QUIC packets, not a custom UDP format.

Phase 5: ACK, Loss, RTT, and Congestion

Deliverables:

  • ACK range storage.
  • ACK decimation for DATAGRAM-only traffic.
  • RTT sampling.
  • Packet-threshold and time-threshold loss.
  • PTO timer.
  • Congestion window.
  • Pacing.
  • Lab-only fixed-rate mode.

Acceptance:

  • tc netem loss/reorder tests stay stable.
  • DATAGRAM loss is counted but not retransmitted.
  • Public-network mode never sends above congestion allowance.

Phase 6: Performance Work

Deliverables:

  • Multi-queue netdevice.
  • Per-flow queue selection.
  • Per-CPU session hot state.
  • GSO transmit.
  • GRO receive.
  • Batched crypto.
  • Lock contention profiling.
  • CPU affinity guidance.

Acceptance:

  • Local single-tunnel test reaches 3 Gbit/s.
  • Aggregate multi-queue test reaches 10 Gbit/s class on suitable hardware.
  • CPU per Gbit is substantially below user-space QUIC REALITY.

Phase 7: Key Update and Lifetime Management

Deliverables:

  • Key phase tracking.
  • Preinstalled next-key receive path.
  • Kernel/user key update events.
  • Safe old-key discard timers.
  • Session idle timeout.
  • Immediate revocation/delete.

Acceptance:

  • Long-running iperf survives key updates.
  • Forced credential revocation stops traffic quickly.

Phase 8: Path Management

Deliverables:

  • PATH_CHALLENGE/PATH_RESPONSE.
  • NAT rebinding detection.
  • Optional endpoint migration policy.
  • PMTU probing.
  • ICMP Packet Too Big handling.

Acceptance:

  • Controlled endpoint tuple changes work only when migration is enabled.
  • MTU changes do not create persistent blackholes.

Phase 9: Agent Integration

Deliverables:

  • Agent detects DCO capability.
  • Agent performs QUIC REALITY handshake in user space.
  • Agent installs established sessions into kernel.
  • Agent falls back to user-space L3PTP if DCO fails or is unavailable.
  • Backend and frontend expose DCO status.

Acceptance:

  • Same L3 network can run in userspace, dco-preferred, or dco-required mode.
  • dco-required fails closed if the module or capabilities are missing.

Phase 10: Interop and Wire Compatibility

Deliverables:

  • Packet captures decode as QUIC v1.
  • Kernel DCO packets are accepted by the user-space QUIC REALITY peer.
  • User-space QUIC REALITY packets are accepted by kernel DCO.
  • DATAGRAM transport parameter limits are enforced.
  • ACK/loss behavior conforms to RFC expectations.

Acceptance:

  • The DCO path can be disabled and the same peer falls back to user-space QUIC REALITY without changing public wire protocol.

Phase 11: Production Hardening

Deliverables:

  • KASAN/KMSAN/KCSAN test passes.
  • Syzkaller target or fuzz harness for netlink and packet parser.
  • Fault injection for alloc failures.
  • Route cleanup on module unload.
  • Upgrade/downgrade compatibility.
  • Kernel version compatibility matrix.
  • Secure crash/debug procedure.

Acceptance:

  • Module can survive repeated create/delete, namespace churn, malformed packets, and bad netlink input without kernel warnings.

Test Matrix

Correctness:

  • KUnit packet protection tests.
  • KUnit DATAGRAM parser tests.
  • KUnit ACK range tests.
  • User-space/kernel golden vector cross-tests.
  • Network namespace ping tests.
  • iperf3 TCP and UDP over L3PTP DCO.
  • tc netem loss/reorder/jitter tests.
  • MTU blackhole tests.
  • Key update tests.
  • Session revocation tests.

Performance:

  • Single queue vs multi-queue.
  • AES-GCM vs ChaCha20-Poly1305.
  • GSO on/off.
  • GRO on/off.
  • 1G, 3G, 10G targets.
  • 1100B, 1150B, 1200-budget-clamped payloads.
  • Small-packet pps stress.
  • Long-run thermal/CPU stability.

Security:

  • Malformed short-header packets.
  • Forged CID.
  • Forged key phase.
  • Replay and duplicate packets.
  • Prefix spoofing.
  • Netlink fuzzing.
  • Namespace isolation.
  • Key zeroization checks.

Major Risks

Protocol complexity:

  • QUIC recovery and congestion are much larger than WireGuard's data path.
  • ACK correctness matters even though DATAGRAM payloads are not retransmitted.

Kernel safety:

  • skb lifetime, crypto API usage, timers, and netlink parsing are hard to get right.
  • A kernel crash is a host crash.

Wire compatibility:

  • Any shortcut that changes packet protection, packet numbers, ACK behavior, or congestion semantics can make the result non-QUIC.

Maintenance:

  • Linux networking APIs change across kernels.
  • Rust-for-Linux availability may not cover the hot path we need.
  • Out-of-tree modules carry packaging and support cost.

Security:

  • Installing QUIC secrets into the kernel expands the blast radius of a kernel vulnerability.
  • Debug tooling must be designed so secrets cannot leak accidentally.

Go / No-Go Gates

Gate 1: ABI confidence

  • User-space L3PTP is stable.
  • Kernel ABI is reviewed.
  • Golden vectors exist.

Gate 2: Kernel skeleton confidence

  • Load/unload and netns lifecycle are clean.
  • Generic netlink fuzzing has no obvious crashes.

Gate 3: Wire compatibility

  • Kernel packet protection matches user-space vectors.
  • Captures decode as legal QUIC.

Gate 4: Performance

  • Kernel DCO beats user-space QUIC REALITY by a meaningful margin.
  • At least 3 Gbit/s local L3 tunnel is demonstrated.

Gate 5: Safety

  • KASAN/KCSAN/KMSAN/fuzz results are clean enough for controlled deployment.

Immediate Preparation

Do not start with a kernel module immediately. First, make today's user-space work DCO-ready:

  1. Keep L3PTP packet format separate from QUIC framing.
  2. Add explicit session/key export structs in user space.
  3. Generate packet protection golden vectors.
  4. Add dco_mode = disabled | preferred | required to the future L3 config.
  5. Add capability negotiation to the agent model.
  6. Keep route/device lifecycle separate from user-space TUN internals.

These steps are useful even if DCO is delayed, and they prevent the current L3 implementation from painting us into a user-space-only corner.
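The dco_mode setting from step 4 combined with the agent fallback rule can be sketched as a small decision function. Names are illustrative, not an existing NexusNet API:

```rust
// Hypothetical sketch of dco_mode semantics: `Preferred` silently falls
// back to the user-space dataplane, `Required` fails closed.
#[derive(Clone, Copy, PartialEq, Debug)]
enum DcoMode {
    Disabled,
    Preferred,
    Required,
}

#[derive(Debug, PartialEq)]
enum Dataplane {
    UserSpace,
    KernelDco,
}

fn select_dataplane(mode: DcoMode, dco_available: bool) -> Result<Dataplane, &'static str> {
    match (mode, dco_available) {
        (DcoMode::Disabled, _) => Ok(Dataplane::UserSpace),
        (DcoMode::Preferred, true) | (DcoMode::Required, true) => Ok(Dataplane::KernelDco),
        (DcoMode::Preferred, false) => Ok(Dataplane::UserSpace),
        (DcoMode::Required, false) => {
            Err("dco_mode=required but kernel DCO is unavailable")
        }
    }
}

fn main() {
    assert_eq!(select_dataplane(DcoMode::Preferred, false), Ok(Dataplane::UserSpace));
    assert!(select_dataplane(DcoMode::Required, false).is_err());
}
```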

References

  • OpenVPN Data Channel Offload: kernel-space encryption and packet handling to reduce copies and context switches.
  • Linux kTLS documentation: kTLS handles the TLS record layer and does not replace the TLS handshake; QUIC does not use TLS records for 1-RTT data.
  • RFC 9000: QUIC transport, packet numbers, packet spaces, ACK behavior.
  • RFC 9001: QUIC packet protection, header protection, key update.
  • RFC 9221: QUIC DATAGRAM frames, unreliable delivery, ACK/congestion behavior, MTU limitations.
