ByteMux 1.0.0

.NET 9.0

dotnet add package ByteMux --version 1.0.0

ByteMux

A low-latency outbound multiplexer for high-throughput .NET servers that need to merge targeted and broadcast traffic into per-connection TCP streams.

What It Is

ByteMux is a purpose-built outbound convergence layer for workloads where many independent producers need to inject data into the same connection.

It combines:

targeted per-connection delivery
shared broadcast delivery
lock-free hot-path signaling
shard-owned draining
socket-oriented batching
near-zero steady-state allocations

ByteMux is not a generic queue library or messaging framework. It is a complete solution for one specific problem: per-connection fan-in with low-latency egress.

When to Use ByteMux

ByteMux is a good fit when:

multiple threads or subsystems need to inject packets into the same TCP connection
you need both targeted traffic and shared broadcast traffic
latency and GC pressure matter more than general-purpose flexibility
you want explicit control over scheduling, batching, and drop behavior
you prefer shard-owned serialized egress over ad hoc per-source flushing

ByteMux is probably not the right fit when:

standard async socket writes are already sufficient
you only need a generic queue, channel, or ring buffer
your workload does not have true multi-producer fan-in
simplicity matters more than extreme control over the hot path
you need a full networking framework or actor runtime

Why It Exists

ByteMux was built for a high-throughput game proxy that sits between players and a game server. The proxy forwards packets in both directions, but it also needs to inject new packets into a player's stream from outside the normal packet flow: targeted messages for individual connections and broadcasts for large groups of subscribers.

That requirement becomes difficult under real constraints. The proxy handles thousands of concurrent connections. Each connection has its own processing pipeline. Injected packets must merge into that pipeline without stalling it, with tight latency targets, minimal GC pressure, and no locks on the hot path.

We evaluated existing approaches before building ByteMux. LMAX Disruptor for .NET is excellent, but it is a concurrency primitive rather than a complete outbound convergence model. Its shape did not match our workload: many independent producers converging on one connection, not one producer writing to many consumers.

We also explored channel-based and pipeline-based designs, but for this latency budget they still left too much coordination, scheduling, and policy overhead on the hot path. We did not need a better queue in isolation. We needed a purpose-built solution for per-connection fan-in and broadcast delivery.

That is what ByteMux provides.

Key Ideas

One TxMux per connection
Each live connection owns its own outbound multiplexer.
Two delivery paths
ByteQueue handles targeted traffic. ByteTopic handles broadcast traffic.
Dirty-bit signaling
Producers mark sources as ready using per-connection dirty bitmasks.
Shard-owned draining
Dedicated shard workers drain signaled muxes and serialize socket egress.
Coalesced output
Drainers batch frames before writing to the sink.

Status

ByteMux is currently source-only and intended for advanced .NET networking scenarios where the workload and latency targets justify a specialized outbound pipeline.

Non-Goals

ByteMux is not:

a drop-in replacement for System.Threading.Channels
a generic ring buffer package
a general-purpose async messaging framework
an actor runtime
a full socket framework

Architecture

The execution model is simple:

producers enqueue or append data
they signal readiness through dirty bitmasks
shard workers drain signaled muxes
drainers coalesce frames
the sink writes the batch

 External Thread A          External Thread B         Topic Writer
       |                          |                       |
 QueuePublisher              QueuePublisher         TopicPublisher
 .NotifyDirty()              .NotifyDirty()           .Append()
       |                          |                       |
       +----------+---------------+            signals all subscribers
                  |                                       |
                  v                                       v
        +---------------------------------------------------------+
        |  TxMux  (one per connection)                            |
        |  queueDirtyMask : uint32  (1 bit per registered queue)  |
        |  topicDirtyMask : uint32  (1 bit per subscribed topic)  |
        +---------------------+-----------------------------------+
                              |  ShardWork  (gate + latch)
                              v
                    +------------------+
                    |  TxShard         |  (dedicated worker thread)
                    |  worker loop     |
                    +--------+---------+
                             |
                    TxMuxDrainer.DrainOnce()
                    (coalesce frames into batch)
                             |
                    ITxSink.TrySend()
                    (SocketSaeaTxSink -> socket)

Components

Component	Responsibility
`TxFabric`	Owns the shard pool and assigns connections to shards
`TxMux`	Per-connection multiplexer with queue/topic dirty masks
`TxShard`	Dedicated worker thread that drains scheduled muxes
`ShardWork`	Gate and latch used to avoid duplicate scheduling and capture signal-during-drain races
`ByteQueue`	SPSC ring buffer for targeted packet delivery
`ByteTopic`	Single-writer, multi-reader append log for broadcast delivery
`QueuePublisher`	Enqueue and signal handle for a `ByteQueue`
`TopicPublisher`	Append and fan-out handle for a `ByteTopic`
`TopicSubscription`	Subscription handle for a topic; disposing it unsubscribes and releases the slot
`TxMuxDrainer`	Drain policy abstraction for batching and fairness
`ITxSink`	Pluggable output backend such as socket, stream, or null sink
`IEgressTransformer`	Optional in-place transform hook for encryption, framing, or compression

Hard Limits

Each TxMux can hold at most:

32 registered queues
32 topic subscriptions

These limits are fixed by the internal SlotTable32 used for queue and topic slot management. They cannot be expanded at runtime.

Both TryAddQueue and TrySubscribe return false when their respective table is full. They do not throw and do not emit an error automatically.

That means their return values must always be checked. Ignoring them silently drops the source registration.

If a single connection needs more than 32 independent targeted sources, or more than 32 topic subscriptions, the design should be revisited. Common approaches include multiplexing multiple upstream producers into fewer ByteQueue instances or sharing a queue across multiple producers before data reaches ByteMux.
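
The 32-slot ceiling follows naturally when slot ownership is tracked in a single 32-bit word. The sketch below shows that idea only. `SlotBitmap32` and its members are hypothetical names for illustration, not the library's `SlotTable32`, whose internals are not shown in this README.

```csharp
using System;
using System.Numerics;
using System.Threading;

// Hypothetical illustration; not ByteMux's SlotTable32.
public sealed class SlotBitmap32
{
    private uint _used; // bit i set => slot i is taken

    // Claims the lowest free slot, or returns false when all 32 are taken.
    public bool TryAcquire(out int slot)
    {
        while (true)
        {
            uint seen = Volatile.Read(ref _used);
            if (seen == uint.MaxValue)
            {
                slot = -1;
                return false; // table full: the caller must check this result
            }

            int free = BitOperations.TrailingZeroCount(~seen);
            uint next = seen | (1u << free);

            // Retry if another thread changed the word in the meantime.
            if (Interlocked.CompareExchange(ref _used, next, seen) == seen)
            {
                slot = free;
                return true;
            }
        }
    }

    // Frees a slot so it no longer counts against the 32-slot limit.
    public void Release(int slot) => Interlocked.And(ref _used, ~(1u << slot));
}
```

In this model a forgotten `Release` keeps its bit set forever, which mirrors the slot-leak behavior described under Lifecycle and Disposal.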

Getting Started

ByteMux is currently source-only. Add the src/ByteMux project to your solution.

A typical setup looks like this:

start a TxFabric
attach one TxMux per live connection
register one or more ByteQueue instances for targeted traffic
register shared ByteTopic instances for broadcast traffic
let shard workers drain signaled muxes into the configured sink

Targeted Queue Example

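Use a queue when a connection needs targeted packets from an external producer. ByteQueue is SPSC, so register one queue per producer, or serialize multiple producers before they reach a shared queue.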

// Create the fabric (owns shard threads)
var fabric = new TxFabric(new TxFabricOptions { ShardCount = 4 });
fabric.Start();

// Create a drainer and attach it to a connection
var sink = new SocketSaeaTxSink(socket);
var drainer = new FairCoalescedTxMuxDrainer(
    sink,
    new FairCoalescedDrainerOptions
    {
        BatchSize = 64 * 1024
    });

TxMux mux = fabric.AttachDrainer(drainer);

// Register a queue on this connection
var queue = new ByteQueue(new ByteQueueOptions
{
    ByteCapacity = 64 * 1024,
    DescCapacity = 1024
});

if (!mux.TryAddQueue(queue, out QueuePublisher? publisher))
    throw new InvalidOperationException("TxMux queue table is full.");

// From the queue's single producer (any thread, but only one at a time): enqueue and signal
ReadOnlySpan<byte> packet = ...;
if (queue.TryEnqueue(packet))
    publisher!.NotifyDirty();

Broadcast Topic Example

Use a topic when one payload should be written once and delivered to many subscribed connections.

// Create a shared topic (one writer, many readers)
var topic = new ByteTopic(new ByteTopicOptions
{
    ByteCapacity = 256 * 1024,
    DescCapacity = 4096
});

var topicPublisher = new TopicPublisher(topic);

// Subscribe each connection's mux to the topic
if (!topicPublisher.TrySubscribe(
        mux,
        out TopicPublisher.TopicSubscription subscription))
{
    throw new InvalidOperationException("TxMux topic table is full.");
}

// From the writer thread: append once, all subscribers are signaled
ReadOnlySpan<byte> broadcastPacket = ...;
topicPublisher.Append(broadcastPacket);

Shard Selection

// The three calls below are alternatives; use exactly one per connection.

// Round-robin (default)
TxMux mux = fabric.AttachDrainer(drainer);

// Pin by remote endpoint (stable assignment per client IP:port)
TxMux mux = fabric.AttachDrainer(
    drainer,
    ShardHint.FromEndPoint(remoteEndPoint));

// Explicit shard index
TxMux mux = fabric.AttachDrainer(
    drainer,
    ShardHint.FromIndex(2));
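
For intuition, endpoint pinning can be thought of as hashing the remote endpoint and reducing it modulo the shard count. The sketch below is an assumption about the general technique, not a description of how `ShardHint.FromEndPoint` is implemented. `EndpointSharding` and `ShardFor` are hypothetical names.

```csharp
using System;
using System.Net;

// Hypothetical illustration; ShardHint's real mapping is not shown in this README.
public static class EndpointSharding
{
    // Maps an endpoint to a shard index in [0, shardCount).
    // Equal endpoints map to the same shard within one process.
    public static int ShardFor(EndPoint remote, int shardCount)
    {
        if (remote is null) throw new ArgumentNullException(nameof(remote));
        if (shardCount <= 0) throw new ArgumentOutOfRangeException(nameof(shardCount));

        // Reinterpret as uint so a negative hash code cannot yield a negative index.
        uint hash = unchecked((uint)remote.GetHashCode());
        return (int)(hash % (uint)shardCount);
    }
}
```

Hash codes are not guaranteed to be stable across processes or runtime versions, so a mapping like this is only suitable for in-process assignment.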

Lifecycle and Disposal

ByteMux has a strict ownership hierarchy and disposal order.

Ownership model:

TxFabric owns the shard worker threads
TxFabric does not own TxMux instances
the caller owns each TxMux
TxMux must be disposed before TxFabric

Disposing TxFabric first stops the shard threads while a TxMux may still hold references to shard-owned scheduling state. Always dispose muxes before disposing the fabric.

Recommended disposal order:

QueuePublisher.Dispose()
TopicSubscription.Dispose()
TopicPublisher.Dispose()
TxMux.Dispose()
TxFabric.Dispose()

Failing to dispose a QueuePublisher or TopicSubscription leaks a slot in the corresponding table, which continues to count against the per-TxMux 32-slot limit.

TopicSubscription is itself IDisposable. Unsubscribing is done by disposing the subscription, not by calling a removal method on TopicPublisher.

Teardown Example

// Teardown in reverse ownership order
publisher?.Dispose();
subscription.Dispose();
topicPublisher.Dispose();
mux.Dispose();
fabric.Dispose();

Core Concepts

Queue Path

ByteQueue is a lock-free SPSC ring buffer. One producer enqueues frames while the shard worker dequeues and drains them. The ring is backed by pooled memory, so the hot path performs no allocation. When the ring is full, TryEnqueue returns false and the caller decides how to handle backpressure.

Zero-Copy Queue Writes

TryEnqueue(ReadOnlySpan<byte>) is the simplest way to publish a frame, but it copies the caller's data into the ring.

For callers that want to construct frames directly inside the ring, or already hold data in a form such as ReadOnlySequence<byte>, ByteQueue also supports an acquire/commit path that avoids that copy.

// 1. Reserve space in the ring
if (!queue.TryAcquire(payloadLength, out FrameLease lease))
{
    // ring is full
    return;
}

// 2. Write directly into the lease
lease.CopyFrom(sourceSpan);

// 3. Commit the frame
queue.Commit(in lease);

// 4. Signal as normal
publisher!.NotifyDirty();

FrameLease is a two-segment write window over the circular ring:

First
Second

When the frame fits without wrapping the ring boundary, Second is empty and the lease is effectively single-segment. When the frame crosses the ring boundary, the payload is split across both spans.

FrameLease.CopyFrom(ReadOnlySpan<byte>) and FrameLease.CopyFrom(in ReadOnlySequence<byte>) handle that split automatically.
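
To make the split concrete, here is a minimal sketch of a wrap-aware copy into a circular buffer. It illustrates the two-segment idea only. `SplitCopy.CopyWrapped` is a hypothetical helper, not part of ByteMux, and it does not model leases or commit.

```csharp
using System;

// Hypothetical illustration of a two-segment write over a circular buffer.
public static class SplitCopy
{
    // Copies source into ring starting at offset, wrapping to index 0 at the end.
    // Returns the number of bytes that landed in the first segment.
    public static int CopyWrapped(ReadOnlySpan<byte> source, Span<byte> ring, int offset)
    {
        if ((uint)offset >= (uint)ring.Length) throw new ArgumentOutOfRangeException(nameof(offset));
        if (source.Length > ring.Length) throw new ArgumentException("Frame larger than ring.", nameof(source));

        // First segment: from offset up to the end of the ring.
        int firstLength = Math.Min(source.Length, ring.Length - offset);
        source.Slice(0, firstLength).CopyTo(ring.Slice(offset));

        // Second segment: the wrapped remainder; empty when the frame did not wrap.
        source.Slice(firstLength).CopyTo(ring);
        return firstLength;
    }
}
```

With an 8-byte ring, an offset of 6, and a 4-byte frame, two bytes land at indices 6 and 7 and the remaining two at indices 0 and 1.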

Prefer acquire/commit when:

the frame is being constructed in-place
the source data already exists as a ReadOnlySequence<byte>
eliminating the extra copy matters

For callers that cannot handle split writes, TryAcquireContiguous is available as a stricter variant that either returns a guaranteed single-segment lease or indicates that a wrap would occur.

For advanced lease access, LeasePrimitives provides wrap-aware helpers for copying and typed reads, including operations such as:

TryCopyOut
TryReadUInt16LE
TryReadUInt32LE
TryReadInt32LE

These helpers handle boundary crossing without heap allocation.

Topic Path

ByteTopic is an append-only circular log with a single writer and any number of independent readers. Each subscriber owns its own cursor. The writer publishes once, and each subscriber advances independently.

If a slow subscriber is lapped by the writer, FastForwardIfLapped advances that subscriber to the oldest retained message and records the drop count. This keeps the writer unblocked.
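
The fast-forward rule reduces to simple cursor arithmetic. The sketch below assumes sequence numbers that count messages from zero and a log that retains the most recent `capacity` messages. `LapMath.FastForward` is a hypothetical helper for illustration, not the library's `FastForwardIfLapped`.

```csharp
using System;

// Hypothetical illustration of drop-oldest cursor arithmetic.
public static class LapMath
{
    // writerCount: total messages appended so far.
    // capacity:    number of messages the log retains.
    // Returns how many messages the subscriber lost and moves its cursor
    // to the oldest retained message when it has been lapped.
    public static long FastForward(ref long cursor, long writerCount, long capacity)
    {
        long oldestRetained = Math.Max(0, writerCount - capacity);
        if (cursor >= oldestRetained)
        {
            return 0; // not lapped; nothing dropped
        }

        long dropped = oldestRetained - cursor;
        cursor = oldestRetained;
        return dropped;
    }
}
```

For example, with a capacity of 4 and 10 messages written, a subscriber still at message 2 is moved to message 6 and records 4 drops.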

TopicStartMode controls where a new subscriber begins:

Value	Behavior
`Head`	Receive only future messages
`OldestRetained`	Replay everything currently retained

Drainer Strategies

Drainer	When to Use
`FairCoalescedTxMuxDrainer`	General-purpose option with fairness controls and `IEgressTransformer` support
`CoalescedTxMuxDrainer`	Simpler strategy that drains queues first, then topics, into one pooled buffer
`DirectStreamTxMuxDrainer`	Writes directly to a `Stream` when batching is handled elsewhere

All drainers return a DrainResult that drives scheduling:

Result	Meaning
`Completed`	Batch sent; may cool down if no pending signals remain
`Pending`	Async send is in flight; shard will be rescheduled on completion
`Busy`	Sink cannot accept data yet; requeue immediately
`NoWork`	Nothing to drain; park
`Closed`	Sink is closed; stop draining
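
One way to read the table is as a decision function from drain result to the shard's next action. The sketch below restates the table in code. `DrainOutcome`, `NextStep`, and `DrainScheduling` are hypothetical names that mirror the table rows; they are not the library's types.

```csharp
using System;

// Hypothetical mirror of the DrainResult table; not ByteMux's own enum.
public enum DrainOutcome { Completed, Pending, Busy, NoWork, Closed }

public enum NextStep
{
    CoolDownIfIdle,        // batch sent; cool down unless new signals are pending
    WaitForSendCompletion, // async send in flight; reschedule when it completes
    RequeueNow,            // sink not ready; try again immediately
    Park,                  // nothing to drain
    Stop                   // sink closed; stop draining this mux
}

public static class DrainScheduling
{
    public static NextStep Decide(DrainOutcome outcome) => outcome switch
    {
        DrainOutcome.Completed => NextStep.CoolDownIfIdle,
        DrainOutcome.Pending => NextStep.WaitForSendCompletion,
        DrainOutcome.Busy => NextStep.RequeueNow,
        DrainOutcome.NoWork => NextStep.Park,
        DrainOutcome.Closed => NextStep.Stop,
        _ => throw new ArgumentOutOfRangeException(nameof(outcome))
    };
}
```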

Egress Transforms

IEgressTransformer allows in-place packet transformation before a batch reaches the sink. Typical uses include encryption, framing, and compression.

Implement MaxOverhead to declare how many bytes the transform may prepend, and implement Transform to perform the operation. The drainer reserves space before each packet so the source and destination spans can overlap safely.
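
The reserved-headroom idea can be sketched with a length-prefix framer. This README does not show the `IEgressTransformer` signature, so the code below deliberately does not implement it. `HeadroomFraming` is a hypothetical standalone helper, and the 2-byte little-endian prefix is an assumed wire format chosen only for illustration.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical illustration of in-place framing with reserved headroom.
public static class HeadroomFraming
{
    // Bytes the transform may prepend; the caller reserves this much before the payload.
    public const int MaxOverhead = 2;

    // Expects buffer laid out as [MaxOverhead bytes of headroom][payload].
    // Writes the payload length into the headroom and returns the total frame length.
    // The payload itself is never moved, so no second buffer is needed.
    public static int Frame(Span<byte> buffer, int payloadLength)
    {
        if (payloadLength < 0 || payloadLength > ushort.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(payloadLength));
        if (buffer.Length < MaxOverhead + payloadLength)
            throw new ArgumentException("Buffer too small for headroom plus payload.", nameof(buffer));

        BinaryPrimitives.WriteUInt16LittleEndian(buffer, (ushort)payloadLength);
        return MaxOverhead + payloadLength;
    }
}
```

Because the headroom sits directly in front of the payload, the framed output occupies the same memory as the input, which is the property the reserved space is meant to provide.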

Configuration

`TxFabricOptions`

Property	Default	Description
`ShardCount`	`Environment.ProcessorCount`	Number of shard worker threads

`TxShardOptions`

Property	Default	Description
`ThreadPriority`	`Normal`	Worker thread priority
`PostWorkSpinDuration`	(none)	Spin duration after drain before parking

`ByteQueueOptions`

Property	Default	Description
`ByteCapacity`	(required)	Ring buffer size in bytes; must be a power of two
`DescCapacity`	(required)	Maximum frames in flight; must be a power of two
`EnableLatencyTracking`	`false`	Record enqueue timestamps for latency measurement

`ByteTopicOptions`

Property	Default	Description
`ByteCapacity`	(required)	Log size in bytes; must be a power of two
`DescCapacity`	(required)	Maximum retained messages; must be a power of two
`EnableLatencyTracking`	`false`	Record append timestamps for latency measurement

`FairCoalescedDrainerOptions`

Property	Default	Description
`BatchSize`	(required)	Coalescing buffer size in bytes
`QueueSharePercent`	`50`	Budget percentage reserved for queues; the remainder goes to topics
`MaxMsgsPerBatch`	(none)	Optional cap on messages per drain cycle
`MaxBytesPerBatch`	(none)	Optional cap on bytes per drain cycle
`SourceBurstLimit`	(none)	Optional cap on messages drained per source per cycle
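
As a rough illustration of how a percentage share could translate into byte budgets, the sketch below splits a batch between queues and topics. How `FairCoalescedTxMuxDrainer` actually applies `QueueSharePercent` is not specified in this README, so treat this as an assumption. `BatchBudget.Split` is a hypothetical helper.

```csharp
using System;

// Hypothetical illustration of a percentage-based budget split.
public static class BatchBudget
{
    // Returns the byte budget for queues and the remainder for topics.
    public static (int QueueBytes, int TopicBytes) Split(int batchSize, int queueSharePercent)
    {
        if (batchSize < 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        if (queueSharePercent < 0 || queueSharePercent > 100)
            throw new ArgumentOutOfRangeException(nameof(queueSharePercent));

        // Widen to long so large batch sizes cannot overflow during the multiply.
        int queueBytes = (int)((long)batchSize * queueSharePercent / 100);
        return (queueBytes, batchSize - queueBytes);
    }
}
```

With the 64 KiB batch from the queue example and the default share of 50, each path would get 32,768 bytes under this model.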

Benchmarks and Simulation

tests/ByteMux.Simulation stress-tests the full stack under realistic load. The simulation outputs per-second P50 and P99 drain latency along with GC allocation per tick.

dotnet run --project tests/ByteMux.Simulation -c Release

The simulation project is also configured for Native AOT publish:

dotnet publish tests/ByteMux.Simulation -c Release -r win-x64

Simulation Results

The simulation suite exercises ByteMux under steady-state multi-connection load using FairCoalescedTxMuxDrainer with a NullSink.

Test environment: AMD Ryzen 5 5600X (6 cores / 12 threads), Windows 10 IoT Enterprise LTSC
Run profile: 5-minute steady-state, first 30 seconds excluded as warmup, 600 connections across 6 shards, 64-byte packets

These results are directional indicators of library overhead under controlled conditions. They are useful for regression tracking and internal comparison, not as universal end-to-end performance claims.

Scenario A — 1 queue per connection

600 connections × 1 queue × 20 msg/s = 12,000 messages/second

P50 drain latency: low single-digit microseconds
P99 drain latency: tens of microseconds, comfortably below 100 µs

Scenario B — 3 queues per connection, 3 shared topics

600 connections × 3 queues × 20 msg/s = 36,000 queue messages/second

3 topics × 600 subscribers × 20 msg/s = 36,000 topic deliveries/second

P50 drain latency: single- to low double-digit microseconds
P99 drain latency: generally below 200 µs, with tail variation influenced by OS scheduling

tests/ByteMux.Benchmarks contains focused BenchmarkDotNet benchmarks for queue throughput, drainer fairness, and socket-versus-stream comparisons.

dotnet run --project tests/ByteMux.Benchmarks -c Release

Design Notes

Bitmask Deduplication

Interlocked.Or on a 32-bit dirty mask is the full signaling mechanism. If multiple producers signal the same connection during the same window, those signals collapse into a single drain cycle with no queued notifications and no locking.
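
A minimal sketch of this collapse, assuming one bit per source slot. `DirtyMask` is a hypothetical type for illustration, not the `TxMux` implementation. `Interlocked.Or` returns the previous value, which lets a producer tell whether it was the first to dirty the mask.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of dirty-bit signaling; not ByteMux's TxMux.
public sealed class DirtyMask
{
    private uint _mask;

    // Producer side. Returns true only when the mask went from empty to
    // non-empty, i.e. when this caller should request a drain.
    public bool Signal(int slot)
    {
        uint previous = Interlocked.Or(ref _mask, 1u << slot);
        return previous == 0;
    }

    // Drainer side. Atomically takes every pending bit and clears the mask.
    public uint TakeAll() => Interlocked.Exchange(ref _mask, 0u);
}
```

Repeated signals for the same slot, or for other slots while the mask is already non-empty, change at most one bit each and never queue a second notification.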

Gate and Latch Scheduling

ShardWork ensures a TxMux appears in a shard run queue at most once at a time. Its gate prevents duplicate scheduling, and its latch captures signals that arrive while a drain is already in progress.
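
The gate and latch interplay can be sketched with two integer flags. This is an assumed model of the pattern, not the `ShardWork` source. `GateLatch` and its method names are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of gate + latch scheduling; not ByteMux's ShardWork.
public sealed class GateLatch
{
    private int _gate;  // 1 while the mux is queued or being drained
    private int _latch; // 1 when a signal has arrived since the last drain began

    // Producer: returns true when the caller must enqueue the mux on its shard.
    public bool Signal()
    {
        Interlocked.Exchange(ref _latch, 1);
        return Interlocked.CompareExchange(ref _gate, 1, 0) == 0;
    }

    // Worker: call before draining so later signals are detected.
    public void BeginDrain() => Interlocked.Exchange(ref _latch, 0);

    // Worker: call after draining. Returns true when a signal arrived during
    // the drain and the mux must be enqueued again.
    public bool EndDrain()
    {
        Interlocked.Exchange(ref _gate, 0); // open the gate first
        return Volatile.Read(ref _latch) == 1
            && Interlocked.CompareExchange(ref _gate, 1, 0) == 0;
    }
}
```

Opening the gate before re-checking the latch means a late signal is picked up either by the worker (which sees the latch) or by the producer (which wins the gate), and the compare-exchange ensures only one of them enqueues the mux.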

SPSC Queue Semantics

ByteQueue uses Volatile.Read and Volatile.Write on head and tail indices. Producer and consumer state remain separate, and only head/tail visibility crosses the boundary.
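
The same head/tail discipline can be shown on a much simpler ring of fixed-size values. The sketch below is not `ByteQueue`. `SpscRing` is a hypothetical type that stores `long` values instead of variable-length frames, and it is safe only with exactly one producer thread and one consumer thread.

```csharp
using System;
using System.Numerics;
using System.Threading;

// Hypothetical illustration of SPSC head/tail publication; not ByteMux's ByteQueue.
public sealed class SpscRing
{
    private readonly long[] _slots;
    private readonly int _mask;
    private long _head; // next index to read; written only by the consumer
    private long _tail; // next index to write; written only by the producer

    public SpscRing(int capacity)
    {
        if (!BitOperations.IsPow2(capacity))
            throw new ArgumentException("Capacity must be a power of two.", nameof(capacity));
        _slots = new long[capacity];
        _mask = capacity - 1; // power-of-two size turns modulo into a bit mask
    }

    // Producer only.
    public bool TryEnqueue(long value)
    {
        long tail = _tail;
        if (tail - Volatile.Read(ref _head) == _slots.Length) return false; // full
        _slots[tail & _mask] = value;
        Volatile.Write(ref _tail, tail + 1); // publish after the slot is written
        return true;
    }

    // Consumer only.
    public bool TryDequeue(out long value)
    {
        long head = _head;
        if (head == Volatile.Read(ref _tail)) { value = 0; return false; } // empty
        value = _slots[head & _mask];
        Volatile.Write(ref _head, head + 1); // release the slot back to the producer
        return true;
    }
}
```

The index mask is also why ring capacities like `ByteCapacity` and `DescCapacity` are commonly required to be powers of two.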

Topic Publish Ordering

ByteTopic writes the descriptor Seq field last with a Volatile.Write release. Readers observe Seq using Volatile.Read acquire semantics. A reader that sees a valid sequence number is guaranteed to also see the fully written payload.
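
A minimal sketch of the payload-then-sequence ordering, using a single `long` as the payload. `SeqLog` is a hypothetical type, not `ByteTopic`. It shows only the publish ordering: a real reader of an overwriting log must also detect a slot being rewritten while it is read, which this sketch leaves out.

```csharp
using System;
using System.Numerics;
using System.Threading;

// Hypothetical illustration of "write Seq last" publication; not ByteMux's ByteTopic.
public sealed class SeqLog
{
    private readonly long[] _seq;     // 0 = never written; otherwise message number + 1
    private readonly long[] _payload;
    private readonly int _mask;
    private long _next;               // writer-owned count of appended messages

    public SeqLog(int capacity)
    {
        if (!BitOperations.IsPow2(capacity))
            throw new ArgumentException("Capacity must be a power of two.", nameof(capacity));
        _seq = new long[capacity];
        _payload = new long[capacity];
        _mask = capacity - 1;
    }

    // Single writer only.
    public void Append(long value)
    {
        long n = _next;
        int slot = (int)(n & _mask);
        _payload[slot] = value;                // 1. write the payload
        Volatile.Write(ref _seq[slot], n + 1); // 2. publish Seq last (release)
        _next = n + 1;
    }

    // Any reader. Succeeds only when the slot currently holds message n.
    public bool TryRead(long n, out long value)
    {
        int slot = (int)(n & _mask);
        if (Volatile.Read(ref _seq[slot]) != n + 1) // acquire
        {
            value = 0;
            return false; // not yet published, or already overwritten
        }
        value = _payload[slot];
        return true;
    }
}
```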

Drop-Oldest Broadcast Semantics

Topics never block the writer. When the log wraps, slow subscribers are automatically fast-forwarded to the oldest retained message. For broadcast-style workloads, dropping older frames is often preferable to applying backpressure to the writer.

License

Mozilla Public License 2.0

SPDX-License-Identifier: MPL-2.0

Product	Compatible and additional computed target framework versions.
.NET	net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net9.0
- No dependencies.


Version	Downloads	Last Updated
1.0.0	115	4/11/2026
1.0.0-preview.1	52	4/11/2026