ByteMux 1.0.0

ByteMux

A low-latency outbound multiplexer for high-throughput .NET servers that need to merge targeted and broadcast traffic into per-connection TCP streams.

License: MPL-2.0 | .NET | AOT Compatible

What It Is

ByteMux is a purpose-built outbound convergence layer for workloads where many independent producers need to inject data into the same connection.

It combines:

  • targeted per-connection delivery
  • shared broadcast delivery
  • lock-free hot-path signaling
  • shard-owned draining
  • socket-oriented batching
  • near-zero steady-state allocations

ByteMux is not a generic queue library or messaging framework. It is a complete solution for one specific problem: per-connection fan-in with low-latency egress.

When to Use ByteMux

ByteMux is a good fit when:

  • multiple threads or subsystems need to inject packets into the same TCP connection
  • you need both targeted traffic and shared broadcast traffic
  • latency and GC pressure matter more than general-purpose flexibility
  • you want explicit control over scheduling, batching, and drop behavior
  • you prefer shard-owned serialized egress over ad hoc per-source flushing

ByteMux is probably not the right fit when:

  • standard async socket writes are already sufficient
  • you only need a generic queue, channel, or ring buffer
  • your workload does not have true multi-producer fan-in
  • simplicity matters more than extreme control over the hot path
  • you need a full networking framework or actor runtime

Why It Exists

ByteMux was built for a high-throughput game proxy that sits between players and a game server. The proxy forwards packets in both directions, but it also needs to inject new packets into a player's stream from outside the normal packet flow: targeted messages for individual connections and broadcasts for large groups of subscribers.

That requirement becomes difficult under real constraints. The proxy handles thousands of concurrent connections. Each connection has its own processing pipeline. Injected packets must merge into that pipeline without stalling it, with tight latency targets, minimal GC pressure, and no locks on the hot path.

We evaluated existing approaches before building ByteMux. LMAX Disruptor for .NET is excellent, but it is a concurrency primitive rather than a complete outbound convergence model. Its shape did not match our workload: many independent producers converging on one connection, not one producer writing to many consumers.

We also explored channel-based and pipeline-based designs, but for this latency budget they still left too much coordination, scheduling, and policy overhead on the hot path. We did not need a better queue in isolation. We needed a purpose-built solution for per-connection fan-in and broadcast delivery.

That is what ByteMux provides.

Key Ideas

  • One TxMux per connection
    Each live connection owns its own outbound multiplexer.

  • Two delivery paths
    ByteQueue handles targeted traffic. ByteTopic handles broadcast traffic.

  • Dirty-bit signaling
    Producers mark sources as ready using per-connection dirty bitmasks.

  • Shard-owned draining
    Dedicated shard workers drain signaled muxes and serialize socket egress.

  • Coalesced output
    Drainers batch frames before writing to the sink.

Status

ByteMux is currently source-only and intended for advanced .NET networking scenarios where the workload and latency targets justify a specialized outbound pipeline.

Non-Goals

ByteMux is not:

  • a drop-in replacement for System.Threading.Channels
  • a generic ring buffer package
  • a general-purpose async messaging framework
  • an actor runtime
  • a full socket framework

Architecture

The execution model is simple:

  1. producers enqueue or append data
  2. they signal readiness through dirty bitmasks
  3. shard workers drain signaled muxes
  4. drainers coalesce frames
  5. the sink writes the batch

The same flow as a diagram:
 External Thread A          External Thread B         Topic Writer
       |                          |                       |
 QueuePublisher              QueuePublisher         TopicPublisher
 .NotifyDirty()              .NotifyDirty()           .Append()
       |                          |                       |
       +----------+---------------+            signals all subscribers
                  |                                       |
                  v                                       v
        +---------------------------------------------------------+
        |  TxMux  (one per connection)                            |
        |  queueDirtyMask : uint32  (1 bit per registered queue)  |
        |  topicDirtyMask : uint32  (1 bit per subscribed topic)  |
        +---------------------+-----------------------------------+
                              |  ShardWork  (gate + latch)
                              v
                    +------------------+
                    |  TxShard         |  (dedicated worker thread)
                    |  worker loop     |
                    +--------+---------+
                             |
                    TxMuxDrainer.DrainOnce()
                    (coalesce frames into batch)
                             |
                    ITxSink.TrySend()
                    (SocketSaeaTxSink -> socket)

Components

Component            Responsibility
TxFabric             Owns the shard pool and assigns connections to shards
TxMux                Per-connection multiplexer with queue/topic dirty masks
TxShard              Dedicated worker thread that drains scheduled muxes
ShardWork            Gate and latch used to avoid duplicate scheduling and capture signal-during-drain races
ByteQueue            SPSC ring buffer for targeted packet delivery
ByteTopic            Single-writer, multi-reader append log for broadcast delivery
QueuePublisher       Enqueue and signal handle for a ByteQueue
TopicPublisher       Append and fan-out handle for a ByteTopic
TopicSubscription    Subscription handle for a topic; disposing it unsubscribes and releases the slot
TxMuxDrainer         Drain policy abstraction for batching and fairness
ITxSink              Pluggable output backend such as socket, stream, or null sink
IEgressTransformer   Optional in-place transform hook for encryption, framing, or compression

Hard Limits

Each TxMux can hold at most:

  • 32 registered queues

  • 32 topic subscriptions

These limits are fixed by the internal SlotTable32 used for queue and topic slot management. They cannot be expanded at runtime.

Both TryAddQueue and TrySubscribe return false when their respective table is full. They do not throw and do not emit an error automatically.

That means their return values must always be checked. Ignoring them silently drops the source registration.

If a single connection needs more than 32 independent targeted sources, or more than 32 topic subscriptions, the design should be revisited. Common approaches include multiplexing multiple upstream producers into fewer ByteQueue instances or sharing a queue across multiple producers before data reaches ByteMux.
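
As a rough illustration of the multiplexing approach mentioned above, the sketch below funnels many upstream producers through one forwarding loop, so only that loop ever touches the queue. ProducerFunnel and its tryEnqueue delegate are hypothetical names, not ByteMux APIs; the delegate stands in for whatever enqueue-and-signal call the application makes. The bounded channel adds allocations and a hop, so this trades some hot-path efficiency for fewer slots.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical helper, not part of ByteMux: merges many producers into a
// single forwarding loop so one queue slot can serve all of them.
public sealed class ProducerFunnel
{
    private readonly Channel<byte[]> _inbox;
    private readonly Func<byte[], bool> _tryEnqueue;

    public ProducerFunnel(int capacity, Func<byte[], bool> tryEnqueue)
    {
        _tryEnqueue = tryEnqueue ?? throw new ArgumentNullException(nameof(tryEnqueue));
        _inbox = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,                    // only the forwarding loop reads
            SingleWriter = false,                   // any number of producers may post
            FullMode = BoundedChannelFullMode.Wait  // TryWrite returns false when full
        });
    }

    // Called by any upstream producer. Returns false when the funnel is full or completed.
    public bool TryPost(byte[] frame) => _inbox.Writer.TryWrite(frame);

    // Signals that no more frames will be posted.
    public void Complete() => _inbox.Writer.Complete();

    // The single consumer: forwards frames to the (assumed) enqueue delegate
    // and returns how many frames the delegate accepted.
    public async Task<int> RunAsync(CancellationToken ct = default)
    {
        int forwarded = 0;
        await foreach (byte[] frame in _inbox.Reader.ReadAllAsync(ct))
        {
            if (_tryEnqueue(frame))
                forwarded++;
        }
        return forwarded;
    }
}
```

In this arrangement the delegate would typically wrap a TryEnqueue call followed by NotifyDirty on the one registered queue; frames rejected by the delegate are simply not counted, and a real application would choose its own drop or retry policy.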

Getting Started

ByteMux is currently source-only. Add the src/ByteMux project to your solution.

A typical setup looks like this:

  1. start a TxFabric

  2. attach one TxMux per live connection

  3. register one or more ByteQueue instances for targeted traffic

  4. register shared ByteTopic instances for broadcast traffic

  5. let shard workers drain signaled muxes into the configured sink

Targeted Queue Example

Use a queue when one connection needs targeted packets from an external producer. ByteQueue is single-producer, so register one queue per producer.

// Create the fabric (owns shard threads)
var fabric = new TxFabric(new TxFabricOptions { ShardCount = 4 });
fabric.Start();

// Create a drainer and attach it to a connection
var sink = new SocketSaeaTxSink(socket);
var drainer = new FairCoalescedTxMuxDrainer(
    sink,
    new FairCoalescedDrainerOptions
    {
        BatchSize = 64 * 1024
    });

TxMux mux = fabric.AttachDrainer(drainer);

// Register a queue on this connection
var queue = new ByteQueue(new ByteQueueOptions
{
    ByteCapacity = 64 * 1024,
    DescCapacity = 1024
});

if (!mux.TryAddQueue(queue, out QueuePublisher? publisher))
    throw new InvalidOperationException("TxMux queue table is full.");

// From the queue's producer thread (ByteQueue is SPSC): enqueue, then signal
ReadOnlySpan<byte> packet = ...;
if (queue.TryEnqueue(packet))
    publisher!.NotifyDirty();

Broadcast Topic Example

Use a topic when one payload should be written once and delivered to many subscribed connections.

// Create a shared topic (one writer, many readers)
var topic = new ByteTopic(new ByteTopicOptions
{
    ByteCapacity = 256 * 1024,
    DescCapacity = 4096
});

var topicPublisher = new TopicPublisher(topic);

// Subscribe each connection's mux to the topic
if (!topicPublisher.TrySubscribe(
        mux,
        out TopicPublisher.TopicSubscription subscription))
{
    throw new InvalidOperationException("TxMux topic table is full.");
}

// From the writer thread: append once, all subscribers are signaled
ReadOnlySpan<byte> broadcastPacket = ...;
topicPublisher.Append(broadcastPacket);

Shard Selection

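AttachDrainer accepts an optional ShardHint that controls which shard owns the connection. Use exactly one of the following forms per connection.

// Option 1: round-robin (default)
TxMux mux = fabric.AttachDrainer(drainer);

// Option 2: pin by remote endpoint (stable assignment per client IP:port)
TxMux mux = fabric.AttachDrainer(
    drainer,
    ShardHint.FromEndPoint(remoteEndPoint));

// Option 3: explicit shard index
TxMux mux = fabric.AttachDrainer(
    drainer,
    ShardHint.FromIndex(2));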

Lifecycle and Disposal

ByteMux has a strict ownership hierarchy and disposal order.

Ownership model:

  • TxFabric owns the shard worker threads

  • TxFabric does not own TxMux instances

  • the caller owns each TxMux

  • TxMux must be disposed before TxFabric

Disposing TxFabric first stops the shard threads while a TxMux may still hold references to shard-owned scheduling state. Always dispose muxes before disposing the fabric.

Recommended disposal order:

  1. QueuePublisher.Dispose()

  2. TopicSubscription.Dispose()

  3. TopicPublisher.Dispose()

  4. TxMux.Dispose()

  5. TxFabric.Dispose()

Failing to dispose a QueuePublisher or TopicSubscription leaks a slot in the corresponding table, which continues to count against the per-TxMux 32-slot limit.

TopicSubscription is itself IDisposable. Unsubscribing is done by disposing the subscription, not by calling a removal method on TopicPublisher.

Teardown Example

// Teardown in reverse ownership order
publisher?.Dispose();
subscription.Dispose();
topicPublisher.Dispose();
mux.Dispose();
fabric.Dispose();

Core Concepts

Queue Path

ByteQueue is a lock-free SPSC ring buffer. One producer enqueues frames while the shard worker dequeues and drains them. The ring is backed by pooled memory, so the hot path performs no allocation. When the ring is full, TryEnqueue returns false and the caller decides how to handle backpressure.

Zero-Copy Queue Writes

TryEnqueue(ReadOnlySpan<byte>) is the simplest way to publish a frame, but it copies the caller's data into the ring.

For callers that want to construct frames directly inside the ring, or already hold data in a form such as ReadOnlySequence<byte>, ByteQueue also supports an acquire/commit path that avoids that copy.

// 1. Reserve space in the ring
if (!queue.TryAcquire(payloadLength, out FrameLease lease))
{
    // ring is full
    return;
}

// 2. Write directly into the lease
lease.CopyFrom(sourceSpan);

// 3. Commit the frame
queue.Commit(in lease);

// 4. Signal as normal
publisher!.NotifyDirty();

FrameLease is a two-segment write window over the circular ring:

  • First

  • Second

When the frame fits without wrapping the ring boundary, Second is empty and the lease is effectively single-segment. When the frame crosses the ring boundary, the payload is split across both spans.

FrameLease.CopyFrom(ReadOnlySpan<byte>) and FrameLease.CopyFrom(in ReadOnlySequence<byte>) handle that split automatically.
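
To make the two-segment idea concrete, here is a minimal standalone sketch of a wrap-aware copy into a circular buffer. It is not the FrameLease implementation; RingWriteSketch and CopyWrapped are hypothetical names, and the ring is just a plain span.

```csharp
using System;

// Illustrative stand-in for a wrap-aware write window. Not ByteMux code.
public static class RingWriteSketch
{
    // Copies source into ring starting at offset. If the frame would run past
    // the end of the ring, the remainder continues at index 0.
    // Returns the lengths of the first and second segments.
    public static (int First, int Second) CopyWrapped(
        ReadOnlySpan<byte> source, Span<byte> ring, int offset)
    {
        if (source.Length > ring.Length)
            throw new ArgumentException("Frame is larger than the ring.", nameof(source));
        if ((uint)offset >= (uint)ring.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        int firstLen = Math.Min(source.Length, ring.Length - offset);
        int secondLen = source.Length - firstLen;

        // Segment 1: from offset up to the end of the ring.
        source.Slice(0, firstLen).CopyTo(ring.Slice(offset, firstLen));
        // Segment 2: whatever is left, starting again at the beginning.
        source.Slice(firstLen).CopyTo(ring.Slice(0, secondLen));

        return (firstLen, secondLen);
    }
}
```

When the frame fits before the end of the ring, the second length is zero, which corresponds to the single-segment case described above.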

Prefer acquire/commit when:

  • the frame is being constructed in-place

  • the source data already exists as a ReadOnlySequence<byte>

  • eliminating the extra copy matters

For callers that cannot handle split writes, TryAcquireContiguous is available as a stricter variant that either returns a guaranteed single-segment lease or indicates that a wrap would occur.

For advanced lease access, LeasePrimitives provides wrap-aware helpers for copying and typed reads, including operations such as:

  • TryCopyOut

  • TryReadUInt16LE

  • TryReadUInt32LE

  • TryReadInt32LE

These helpers handle boundary crossing without heap allocation.
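
The following sketch shows one way such a wrap-aware typed read can work. It is written against two plain spans rather than a real lease, and WrapReadSketch is a hypothetical name; the actual LeasePrimitives signatures may differ.

```csharp
using System;
using System.Buffers.Binary;

// Illustrative only: reads from the logical concatenation of two segments.
public static class WrapReadSketch
{
    public static bool TryReadUInt32LE(
        ReadOnlySpan<byte> first, ReadOnlySpan<byte> second, int offset, out uint value)
    {
        value = 0;
        if (offset < 0 || (long)first.Length + second.Length - offset < 4)
            return false; // not enough bytes at this offset

        if (offset <= first.Length - 4)
        {
            // All four bytes live in the first segment.
            value = BinaryPrimitives.ReadUInt32LittleEndian(first.Slice(offset, 4));
            return true;
        }

        if (offset >= first.Length)
        {
            // All four bytes live in the second segment.
            value = BinaryPrimitives.ReadUInt32LittleEndian(second.Slice(offset - first.Length, 4));
            return true;
        }

        // The value straddles the boundary: assemble it in a stack buffer.
        Span<byte> tmp = stackalloc byte[4];
        int head = first.Length - offset;
        first.Slice(offset, head).CopyTo(tmp);
        second.Slice(0, 4 - head).CopyTo(tmp.Slice(head));
        value = BinaryPrimitives.ReadUInt32LittleEndian(tmp);
        return true;
    }
}
```

The stackalloc buffer keeps the straddling case off the heap, which matches the no-allocation property described above.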

Topic Path

ByteTopic is an append-only circular log with a single writer and any number of independent readers. Each subscriber owns its own cursor. The writer publishes once, and each subscriber advances independently.

If a slow subscriber is lapped by the writer, FastForwardIfLapped advances that subscriber to the oldest retained message and records the drop count. This keeps the writer unblocked.
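
The lapping check can be pictured with the simplified cursor arithmetic below. This is a sketch under stated assumptions, not the library's FastForwardIfLapped: it assumes sequence numbers start at 0, that writerSeq is the next sequence to be written, and that retention is bounded only by the descriptor count (the real log also has a byte capacity). TopicCursorSketch and SkipToOldestRetained are hypothetical names.

```csharp
using System;

// Illustrative cursor arithmetic for a drop-oldest log. Not ByteMux code.
public static class TopicCursorSketch
{
    // If the cursor has fallen behind the oldest retained message, moves it
    // forward and returns how many messages were skipped; otherwise returns 0.
    public static long SkipToOldestRetained(ref long cursor, long writerSeq, int descCapacity)
    {
        if (descCapacity <= 0)
            throw new ArgumentOutOfRangeException(nameof(descCapacity));

        // Retained window is [writerSeq - descCapacity, writerSeq), clamped at 0.
        long oldestRetained = Math.Max(0, writerSeq - descCapacity);
        if (cursor >= oldestRetained)
            return 0; // not lapped

        long dropped = oldestRetained - cursor;
        cursor = oldestRetained;
        return dropped;
    }
}
```

The writer never waits in this model: the subscriber alone pays for being slow, by losing the messages counted in the return value.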

TopicStartMode controls where a new subscriber begins:

Value            Behavior
Head             Receive only future messages
OldestRetained   Replay everything currently retained

Drainer Strategies

Drainer                     When to Use
FairCoalescedTxMuxDrainer   General-purpose option with fairness controls and IEgressTransformer support
CoalescedTxMuxDrainer       Simpler strategy that drains queues first, then topics, into one pooled buffer
DirectStreamTxMuxDrainer    Writes directly to a Stream when batching is handled elsewhere

All drainers return a DrainResult that drives scheduling:

Result      Meaning
Completed   Batch sent; may cool down if no pending signals remain
Pending     Async send is in flight; shard will be rescheduled on completion
Busy        Sink cannot accept data yet; requeue immediately
NoWork      Nothing to drain; park
Closed      Sink is closed; stop draining
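
The table above can be read as a small decision function. The sketch below encodes it with local stand-in enums so that it compiles on its own; SketchDrainResult mirrors the result names from the table, while ShardAction and the signalsPending parameter are assumptions made for illustration and are not ByteMux types.

```csharp
using System;

// Local stand-in that mirrors the result names in the table. Not the library type.
public enum SketchDrainResult { Completed, Pending, Busy, NoWork, Closed }

// Hypothetical scheduling actions a shard worker might take.
public enum ShardAction { Requeue, AwaitSendCompletion, Park, Detach }

public static class DrainSchedulingSketch
{
    public static ShardAction Next(SketchDrainResult result, bool signalsPending) =>
        result switch
        {
            // Batch sent: keep going only if more signals arrived meanwhile.
            SketchDrainResult.Completed => signalsPending ? ShardAction.Requeue : ShardAction.Park,
            // Async send in flight: the completion callback reschedules.
            SketchDrainResult.Pending => ShardAction.AwaitSendCompletion,
            // Sink cannot accept data yet: try again right away.
            SketchDrainResult.Busy => ShardAction.Requeue,
            // Nothing to drain.
            SketchDrainResult.NoWork => ShardAction.Park,
            // Sink is closed: stop draining this mux.
            SketchDrainResult.Closed => ShardAction.Detach,
            _ => throw new ArgumentOutOfRangeException(nameof(result))
        };
}
```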

Egress Transforms

IEgressTransformer allows in-place packet transformation before a batch reaches the sink. Typical uses include encryption, framing, and compression.

Implement MaxOverhead to declare how many bytes the transform may prepend, and implement Transform to perform the operation. The drainer reserves space before each packet so the source and destination spans can overlap safely.
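
As an illustration of the headroom idea, the sketch below prepends a 2-byte little-endian length prefix into space reserved immediately before the payload, so the payload bytes never move. The class and its Transform signature are hypothetical and do not implement the real IEgressTransformer interface, whose exact shape is not shown here.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical framing transform. Assumes the caller has left at least
// MaxOverhead free bytes directly in front of the payload.
public sealed class LengthPrefixSketch
{
    // Number of bytes this transform may prepend.
    public int MaxOverhead => sizeof(ushort);

    // buffer layout: [ ... headroom ... | payload ... ]
    // Writes the prefix into the headroom and returns the framed range.
    public (int Start, int Length) Transform(Span<byte> buffer, int payloadOffset, int payloadLength)
    {
        if (payloadOffset < MaxOverhead)
            throw new ArgumentException("Not enough headroom before the payload.", nameof(payloadOffset));
        if (payloadLength < 0 || payloadLength > ushort.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(payloadLength));
        if (payloadLength > buffer.Length - payloadOffset)
            throw new ArgumentException("Payload extends past the buffer.", nameof(payloadLength));

        int start = payloadOffset - MaxOverhead;
        BinaryPrimitives.WriteUInt16LittleEndian(
            buffer.Slice(start, MaxOverhead), (ushort)payloadLength);

        return (start, MaxOverhead + payloadLength);
    }
}
```

Because the prefix lands in reserved space, no second buffer is needed; a transform that changes the payload itself (encryption, compression) would additionally rewrite the payload bytes in place.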

Configuration

TxFabricOptions

Property     Default                      Description
ShardCount   Environment.ProcessorCount   Number of shard worker threads

TxShardOptions

Property               Default   Description
ThreadPriority         Normal    Worker thread priority
PostWorkSpinDuration   (none)    Spin duration after drain before parking

ByteQueueOptions

Property                Default      Description
ByteCapacity            (required)   Ring buffer size in bytes; must be a power of two
DescCapacity            (required)   Maximum frames in flight; must be a power of two
EnableLatencyTracking   false        Record enqueue timestamps for latency measurement

ByteTopicOptions

Property                Default      Description
ByteCapacity            (required)   Log size in bytes; must be a power of two
DescCapacity            (required)   Maximum retained messages; must be a power of two
EnableLatencyTracking   false        Record append timestamps for latency measurement
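
Since both capacities must be powers of two, a small helper can round a requested size up before it is passed to the options. The sketch below uses System.Numerics.BitOperations; CapacitySketch is a hypothetical name. The Wrap method shows the usual reason rings want power-of-two sizes (index wrapping with a mask instead of a modulo), which is an assumption about ByteMux internals rather than something stated above.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for preparing capacity values.
public static class CapacitySketch
{
    // Rounds a positive size up to the next power of two (or returns it unchanged).
    public static int RoundUpToPowerOfTwo(int requested)
    {
        if (requested <= 0 || requested > (1 << 30))
            throw new ArgumentOutOfRangeException(nameof(requested));
        return (int)BitOperations.RoundUpToPowerOf2((uint)requested);
    }

    // With a power-of-two capacity, wrapping a monotonically growing
    // position into the ring is a single AND.
    public static int Wrap(long position, int capacity) =>
        (int)(position & (capacity - 1));
}
```

For example, a requested 100,000 bytes would become 131,072, while 65,536 is already a power of two and stays unchanged.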

FairCoalescedDrainerOptions

Property            Default      Description
BatchSize           (required)   Coalescing buffer size in bytes
QueueSharePercent   50           Budget percentage reserved for queues; the remainder goes to topics
MaxMsgsPerBatch     (none)       Optional cap on messages per drain cycle
MaxBytesPerBatch    (none)       Optional cap on bytes per drain cycle
SourceBurstLimit    (none)       Optional cap on messages drained per source per cycle
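
To give a feel for QueueSharePercent, the sketch below splits a batch budget between queues and topics. It only illustrates the arithmetic; BudgetSketch is a hypothetical name, and whether the drainer lets one side borrow the other's unused share is not covered here.

```csharp
using System;

// Illustrative budget arithmetic. Not ByteMux code.
public static class BudgetSketch
{
    public static (int QueueBytes, int TopicBytes) Split(int batchSize, int queueSharePercent)
    {
        if (batchSize < 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        if (queueSharePercent < 0 || queueSharePercent > 100)
            throw new ArgumentOutOfRangeException(nameof(queueSharePercent));

        // Widen to long so large batch sizes cannot overflow the multiply.
        int queueBytes = (int)((long)batchSize * queueSharePercent / 100);
        return (queueBytes, batchSize - queueBytes);
    }
}
```

With a 64 KiB batch and the default share of 50, each side gets 32,768 bytes; at 75, queues get 49,152 bytes and topics 16,384.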

Benchmarks and Simulation

tests/ByteMux.Simulation stress-tests the full stack under realistic load. The simulation outputs per-second P50 and P99 drain latency along with GC allocation per tick.

dotnet run --project tests/ByteMux.Simulation -c Release

The simulation project is also configured for Native AOT publish:

dotnet publish tests/ByteMux.Simulation -c Release -r win-x64

Simulation Results

The simulation suite exercises ByteMux under steady-state multi-connection load using FairCoalescedTxMuxDrainer with a NullSink.

Test environment: AMD Ryzen 5 5600X (6 cores / 12 threads), Windows 10 IoT Enterprise LTSC
Run profile: 5-minute steady-state, first 30 seconds excluded as warmup, 600 connections across 6 shards, 64-byte packets

These results are directional indicators of library overhead under controlled conditions. They are useful for regression tracking and internal comparison, not as universal end-to-end performance claims.

Scenario A: 1 queue per connection

600 connections × 1 queue × 20 msg/s = 12,000 messages/second

  • P50 drain latency: low single-digit microseconds
  • P99 drain latency: tens of microseconds, comfortably below 100 µs

Scenario B: 3 queues per connection, 3 shared topics

600 connections × 3 queues × 20 msg/s
+ 3 topics × 600 subscribers × 20 msg/s
= 36,000 messages/second

  • P50 drain latency: single- to low double-digit microseconds
  • P99 drain latency: generally below 200 µs, with tail variation influenced by OS scheduling

tests/ByteMux.Benchmarks contains focused BenchmarkDotNet benchmarks for queue throughput, drainer fairness, and socket-versus-stream comparisons.

dotnet run --project tests/ByteMux.Benchmarks -c Release

Design Notes

Bitmask Deduplication

Interlocked.Or on a 32-bit dirty mask is the full signaling mechanism. If multiple producers signal the same connection during the same window, those signals collapse into a single drain cycle with no queued notifications and no locking.
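
A minimal sketch of this kind of signaling is shown below. DirtyMaskSketch is a hypothetical class, not the TxMux implementation; it relies only on Interlocked.Or and Interlocked.Exchange for uint, both available since .NET 5.

```csharp
using System;
using System.Threading;

// Illustrative dirty mask: one bit per source slot. Not ByteMux code.
public sealed class DirtyMaskSketch
{
    private uint _mask;

    // Marks a slot dirty. Returns true only for the signal that moved the
    // mask from empty to non-empty, which is the one that needs to schedule
    // a drain. Later signals in the same window collapse into it.
    public bool Signal(int slot)
    {
        if ((uint)slot >= 32)
            throw new ArgumentOutOfRangeException(nameof(slot));

        uint previous = Interlocked.Or(ref _mask, 1u << slot);
        return previous == 0;
    }

    // Drainer side: atomically takes every dirty bit and clears the mask.
    public uint Take() => Interlocked.Exchange(ref _mask, 0u);
}
```

Any number of producers can call Signal concurrently; the drainer sees the union of their bits in a single Take, so there is no notification queue to grow.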

Gate and Latch Scheduling

ShardWork ensures a TxMux appears in a shard run queue at most once at a time. Its gate prevents duplicate scheduling, and its latch captures signals that arrive while a drain is already in progress.
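
One common way to build such a gate and latch is sketched below. It is an assumption about the general technique, not the actual ShardWork code, and GateLatchSketch is a hypothetical name. The gate is held while the mux is queued or draining; the latch remembers that a signal arrived in the meantime.

```csharp
using System.Threading;

// Illustrative gate-and-latch pair. Not ByteMux code.
public sealed class GateLatchSketch
{
    private int _gate;   // 1 while the mux is queued or being drained
    private int _latch;  // 1 if a signal arrived since the last BeginDrain

    // Producer side. Returns true if the caller should put the mux on the
    // shard's run queue; false if it is already queued or draining.
    public bool Signal()
    {
        Interlocked.Exchange(ref _latch, 1);
        return Interlocked.CompareExchange(ref _gate, 1, 0) == 0;
    }

    // Worker side: call before reading any sources so that signals arriving
    // from now on are captured by the latch.
    public void BeginDrain() => Interlocked.Exchange(ref _latch, 0);

    // Worker side: releases the gate, then re-acquires it if a signal was
    // latched during the drain. Returns true if the mux must be requeued.
    public bool EndDrain()
    {
        Interlocked.Exchange(ref _gate, 0);
        return Volatile.Read(ref _latch) == 1
            && Interlocked.CompareExchange(ref _gate, 1, 0) == 0;
    }
}
```

Setting the latch before attempting the gate means a producer that loses the gate race has already left evidence for the worker to find in EndDrain. The cost is an occasional extra drain that finds nothing to do.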

SPSC Queue Semantics

ByteQueue uses Volatile.Read and Volatile.Write on head and tail indices. Producer and consumer state remain separate, and only head/tail visibility crosses the boundary.
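
The head/tail discipline can be seen in a much smaller form in the sketch below: a fixed-size SPSC ring of ints. It is a generic illustration of the pattern, not ByteQueue itself, which stores variable-length frames with separate byte and descriptor rings.

```csharp
using System;
using System.Threading;

// Minimal single-producer, single-consumer ring. Not ByteMux code.
public sealed class SpscRingSketch
{
    private readonly int[] _items;
    private readonly int _mask;
    private int _head; // next slot to read; written only by the consumer
    private int _tail; // next slot to write; written only by the producer

    public SpscRingSketch(int capacityPow2)
    {
        if (capacityPow2 <= 0 || (capacityPow2 & (capacityPow2 - 1)) != 0)
            throw new ArgumentException("Capacity must be a power of two.", nameof(capacityPow2));
        _items = new int[capacityPow2];
        _mask = capacityPow2 - 1;
    }

    // Producer thread only.
    public bool TryEnqueue(int item)
    {
        int tail = _tail; // producer-owned, so a plain read is enough
        if (tail - Volatile.Read(ref _head) == _items.Length)
            return false; // full: caller decides how to handle backpressure

        _items[tail & _mask] = item;
        Volatile.Write(ref _tail, tail + 1); // release: publishes the item
        return true;
    }

    // Consumer thread only.
    public bool TryDequeue(out int item)
    {
        int head = _head; // consumer-owned, so a plain read is enough
        if (head == Volatile.Read(ref _tail))
        {
            item = default;
            return false; // empty
        }

        item = _items[head & _mask];
        Volatile.Write(ref _head, head + 1); // release: frees the slot
        return true;
    }
}
```

Each index has exactly one writer, so no compare-and-swap is needed; the only cross-thread traffic is the acquire read of the other side's index.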

Topic Publish Ordering

ByteTopic writes the descriptor Seq field last with a Volatile.Write release. Readers observe Seq using Volatile.Read acquire semantics. A reader that sees a valid sequence number is guaranteed to also see the fully written payload.

Drop-Oldest Broadcast Semantics

Topics never block the writer. When the log wraps, slow subscribers are automatically fast-forwarded to the oldest retained message. For broadcast-style workloads, dropping older frames is often preferable to applying backpressure to the writer.

License

Mozilla Public License 2.0

SPDX-License-Identifier: MPL-2.0

Target Frameworks

net9.0 is compatible and is the target framework included in the package. The platform-specific net9.0 variants (android, browser, ios, maccatalyst, macos, tvos, windows) and net10.0 with the same variants were computed as compatible.

Dependencies (net9.0): none.

Version           Downloads   Last Updated
1.0.0             115         4/11/2026
1.0.0-preview.1   52          4/11/2026