WebFlux 0.1.9


WebFlux

A .NET SDK for preprocessing web content for RAG (Retrieval-Augmented Generation) systems.

Overview

WebFlux processes web content into chunks optimized for RAG systems. It handles web crawling, content extraction, and intelligent chunking with support for multiple content formats.

Installation

dotnet add package WebFlux

Quick Start

using WebFlux;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Register your AI service implementations
services.AddScoped<ITextEmbeddingService, YourEmbeddingService>();
services.AddScoped<ITextCompletionService, YourLLMService>(); // Optional

// Add WebFlux
services.AddWebFlux();

var provider = services.BuildServiceProvider();
var processor = provider.GetRequiredService<IWebContentProcessor>();

// Process a website
await foreach (var result in processor.ProcessWithProgressAsync("https://example.com"))
{
    if (result.IsSuccess && result.Result != null)
    {
        foreach (var chunk in result.Result)
        {
            Console.WriteLine($"Chunk {chunk.ChunkIndex}: {chunk.Content}");
        }
    }
}

Features

  • Interface-Based Design: Bring your own AI services (OpenAI, Anthropic, Azure, local models)
  • Multiple Chunking Strategies: Auto, Smart, Semantic, Intelligent, MemoryOptimized, Paragraph, FixedSize, DomStructure
  • Content Formats: HTML, Markdown, JSON, XML, PDF
  • Web Standards: robots.txt, sitemap.xml, ai.txt, llms.txt, manifest.json
  • Streaming: Process large websites incrementally with IAsyncEnumerable
  • Parallel Processing: Concurrent crawling and processing
  • Rich Metadata: Web document metadata extraction (SEO, Open Graph, Schema.org, Twitter Cards)
  • Progress Tracking: Real-time batch crawling progress with detailed statistics

Chunking Strategies

Strategy          Use Case
Auto              Automatically selects best strategy based on content
Smart             Structured HTML documentation
Semantic          General web pages and articles
Intelligent       Blogs and knowledge bases
MemoryOptimized   Large documents with memory constraints
Paragraph         Markdown with natural boundaries
FixedSize         Uniform chunks for testing
DomStructure      HTML DOM structure-based chunking preserving semantic boundaries
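
If you prefer to choose a strategy explicitly instead of relying on Auto, the table can be turned into a small helper. The sketch below is only an illustration: the class name StrategyHints, the method Suggest, and the size threshold are hypothetical and not part of WebFlux, and this is not how Auto decides internally. It returns strategy names as strings, matching how the Configuration example sets Strategy.

```csharp
using System;

// Hypothetical helper, not part of WebFlux: maps simple content traits to
// the strategy names listed in the table above.
public static class StrategyHints
{
    // Assumed cut-off for "large documents"; tune for your own workload.
    private const long LargeDocumentBytes = 5_000_000;

    public static string Suggest(string contentType, long contentLength)
    {
        if (contentType is null) throw new ArgumentNullException(nameof(contentType));

        // Large documents with memory constraints
        if (contentLength > LargeDocumentBytes) return "MemoryOptimized";

        // Markdown with natural boundaries
        if (contentType.StartsWith("text/markdown", StringComparison.OrdinalIgnoreCase))
            return "Paragraph";

        // Everything else: let the library pick
        return "Auto";
    }
}
```

Falling back to Auto for anything the helper does not recognize keeps the heuristic conservative.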

Core Interfaces

WebFlux uses the Interface Provider pattern. You provide AI service implementations, and WebFlux handles crawling, extraction, and chunking.

Required AI Services

ITextEmbeddingService (Required)

Vector embedding generation for semantic chunking:

public interface ITextEmbeddingService
{
    Task<float[]> GetEmbeddingAsync(string text, CancellationToken cancellationToken = default);
    Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(IReadOnlyList<string> texts, CancellationToken cancellationToken = default);
    int MaxTokens { get; }
    int EmbeddingDimension { get; }
}
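
For local testing without a real embedding provider, a deterministic stand-in can be useful. The following is a minimal sketch, assuming only the interface shown above (repeated here so the sample compiles on its own). HashingEmbeddingService and its MaxTokens and EmbeddingDimension values are hypothetical placeholders; the vectors are hashed bag-of-words counts, not real semantic embeddings.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Copied from the README so the sketch is self-contained;
// in a real project this interface comes from the WebFlux package.
public interface ITextEmbeddingService
{
    Task<float[]> GetEmbeddingAsync(string text, CancellationToken cancellationToken = default);
    Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(IReadOnlyList<string> texts, CancellationToken cancellationToken = default);
    int MaxTokens { get; }
    int EmbeddingDimension { get; }
}

// Hypothetical test double: hashes each word into one of 64 buckets.
public sealed class HashingEmbeddingService : ITextEmbeddingService
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public int MaxTokens => 8192;        // arbitrary placeholder
    public int EmbeddingDimension => 64; // arbitrary placeholder

    public Task<float[]> GetEmbeddingAsync(string text, CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();
        return Task.FromResult(Embed(text));
    }

    public Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(IReadOnlyList<string> texts, CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();
        var vectors = new List<float[]>(texts.Count);
        foreach (var text in texts) vectors.Add(Embed(text));
        return Task.FromResult<IReadOnlyList<float[]>>(vectors);
    }

    private float[] Embed(string text)
    {
        var vector = new float[EmbeddingDimension];
        foreach (var word in text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // FNV-1a hash: stable across runs, unlike string.GetHashCode.
            uint hash = 2166136261;
            unchecked
            {
                foreach (char c in word) { hash ^= c; hash *= 16777619; }
            }
            vector[hash % (uint)EmbeddingDimension] += 1f;
        }

        // L2-normalize so non-empty inputs have unit length.
        double norm = 0;
        foreach (var v in vector) norm += v * v;
        norm = Math.Sqrt(norm);
        if (norm > 0)
            for (int i = 0; i < vector.Length; i++) vector[i] = (float)(vector[i] / norm);
        return vector;
    }
}
```

Such a stub can be registered in place of YourEmbeddingService from the Quick Start while wiring things up, then swapped for a real provider.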

Optional AI Services

ITextCompletionService (Optional)

LLM text completion for multimodal processing and content reconstruction:

public interface ITextCompletionService
{
    Task<string> CompleteAsync(string prompt, TextCompletionOptions? options = null, CancellationToken cancellationToken = default);
    IAsyncEnumerable<string> CompleteStreamAsync(string prompt, TextCompletionOptions? options = null, CancellationToken cancellationToken = default);
    Task<bool> IsAvailableAsync(CancellationToken cancellationToken = default);
}

IImageToTextService (Optional)

Image-to-text conversion for multimodal content:

public interface IImageToTextService
{
    Task<string> ConvertImageToTextAsync(string imageUrl, ImageToTextOptions? options = null, CancellationToken cancellationToken = default);
    Task<string> ExtractTextFromImageAsync(string imageUrl, CancellationToken cancellationToken = default);
    Task<bool> IsAvailableAsync(CancellationToken cancellationToken = default);
}

Main Processor

IWebContentProcessor

The main entry point for all web content processing:

// Single URL processing
var chunks = await processor.ProcessUrlAsync("https://example.com");

// Website crawling (streaming)
await foreach (var chunk in processor.ProcessWebsiteAsync(url, crawlOptions, chunkOptions))
{
    // Process chunk
}

// Batch processing
var results = await processor.ProcessUrlsBatchAsync(urls, chunkOptions);
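
Because the streaming call is consumed with await foreach, the caller can stop early or pass a cancellation token. The sketch below shows that consumption pattern with a stand-in stream, so it runs without the package. StreamingDemo, FakeChunksAsync, and TakeAsync are hypothetical names; whether ProcessWebsiteAsync itself observes a token passed this way is an assumption not confirmed by this README.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class StreamingDemo
{
    // Hypothetical stand-in for a streaming source such as
    // processor.ProcessWebsiteAsync(...); yields strings instead of chunks.
    public static async IAsyncEnumerable<string> FakeChunksAsync(
        int count,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < count; i++)
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield(); // simulate asynchronous work
            yield return $"chunk {i}";
        }
    }

    // Reads at most `limit` items, then stops enumerating the stream.
    public static async Task<List<string>> TakeAsync(
        IAsyncEnumerable<string> source,
        int limit,
        CancellationToken cancellationToken = default)
    {
        var taken = new List<string>();
        if (limit <= 0) return taken;

        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            taken.Add(item);
            if (taken.Count >= limit) break; // disposes the enumerator early
        }
        return taken;
    }
}
```

Breaking out of the loop disposes the underlying enumerator, so a well-behaved producer stops doing work once the consumer has enough.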

Extensibility

IChunkingStrategy

Implement custom chunking strategies:

public interface IChunkingStrategy
{
    string Name { get; }
    string Description { get; }
    Task<IReadOnlyList<WebContentChunk>> ChunkAsync(ExtractedContent content, ChunkingOptions? options = null, CancellationToken cancellationToken = default);
}

IProgressReporter & IEventPublisher

Monitor processing progress and subscribe to system events:

// Progress monitoring
await foreach (var progress in progressReporter.MonitorProgressAsync(jobId))
{
    Console.WriteLine($"Progress: {progress.Progress:P0}");
}

// Event subscription
eventPublisher.Subscribe<PageProcessedEvent>(async evt => await LogEvent(evt));

For detailed implementation examples, see the Tutorial.

Configuration

var options = new CrawlOptions
{
    MaxDepth = 3,
    MaxPages = 100,
    RespectRobotsTxt = true,
    UserAgent = "MyBot/1.0"
};

var chunkOptions = new ChunkingOptions
{
    Strategy = "Auto",
    MaxChunkSize = 512,
    OverlapSize = 64
};

await foreach (var result in processor.ProcessWithProgressAsync(url, options, chunkOptions))
{
    // Handle results
}
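
To make MaxChunkSize and OverlapSize concrete, here is a minimal sliding-window sketch. It is not WebFlux's FixedSize implementation: OverlapChunker is a hypothetical name, and the sizes are counted in characters here, whereas the unit WebFlux uses (characters or tokens) is not stated in this README.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of fixed-size chunking with overlap.
public static class OverlapChunker
{
    // Splits text into windows of at most maxChunkSize characters. Each
    // window starts (maxChunkSize - overlapSize) characters after the
    // previous one, so neighbours share overlapSize characters.
    public static List<string> Split(string text, int maxChunkSize, int overlapSize)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (maxChunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxChunkSize));
        if (overlapSize < 0 || overlapSize >= maxChunkSize)
            throw new ArgumentOutOfRangeException(nameof(overlapSize));

        var chunks = new List<string>();
        int step = maxChunkSize - overlapSize;

        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(maxChunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));

            // Stop once a window has reached the end of the text.
            if (start + length >= text.Length) break;
        }
        return chunks;
    }
}
```

With the values from the example above (512 and 64), each window begins 448 characters after the previous one. The overlap gives text near a chunk boundary some surrounding context in both neighbouring chunks.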

Documentation

License

MIT License - see LICENSE file for details.

Support

Target frameworks (.NET)
Compatible: net10.0
Computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows

NuGet packages (1)

Showing the top 1 NuGet packages that depend on WebFlux:

  • FluxIndex.SDK: Complete RAG infrastructure with FileFlux integration, FluxCurator preprocessing, and FluxImprover quality enhancement. AI providers are externally injectable.

Version   Downloads   Last Updated
0.1.9     140         1/19/2026
0.1.8     599         12/11/2025
0.1.7     182         11/23/2025
0.1.6     247         11/14/2025
0.1.5     177         11/2/2025
0.1.4     151         10/31/2025
0.1.3     188         10/12/2025
0.1.2     172         10/2/2025
0.1.1     368         9/18/2025
0.1.0     274         9/17/2025