ElBruno.LocalLLMs 0.6.1

There is a newer version of this package available.
See the version list below for details.

ElBruno.LocalLLMs


Run local LLMs in .NET through IChatClient 🧠

Run local LLMs in .NET through IChatClient — the same interface you'd use for Azure OpenAI, Ollama, or any other provider. Powered by ONNX Runtime GenAI.

Features

  • 🔌 IChatClient implementation — seamless integration with Microsoft.Extensions.AI
  • 📦 Automatic model download — models are fetched from HuggingFace on first use
  • 🚀 Zero friction — works out of the box with sensible defaults (Phi-3.5 mini)
  • 🖥️ Multi-hardware — CPU, CUDA, and DirectML execution providers
  • 💉 DI-friendly — register with AddLocalLLMs() in ASP.NET Core
  • 🔄 Streaming — token-by-token streaming via GetStreamingResponseAsync
  • 📊 Multi-model — switch between Phi-3.5, Phi-4, Qwen2.5, Llama 3.2, and more
  • 🎯 Fine-tuned models — pre-trained Qwen2.5 variants for tool calling and RAG (guide)

Installation

dotnet add package ElBruno.LocalLLMs

Then add one runtime package depending on your target hardware:

# 🖥️ CPU (works everywhere — required for CPU-only apps):
dotnet add package Microsoft.ML.OnnxRuntimeGenAI

# 🟢 NVIDIA GPU (CUDA):
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.Cuda

# 🔵 Any Windows GPU — AMD, Intel, NVIDIA (DirectML):
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.DirectML

⚠️ Add exactly one runtime package. Do not reference both Microsoft.ML.OnnxRuntimeGenAI and Microsoft.ML.OnnxRuntimeGenAI.Cuda simultaneously — the native binaries conflict and GPU support will silently fail.

🚀 The library defaults to ExecutionProvider.Auto — it tries GPU first and falls back to CPU automatically. No code changes needed.
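To set the provider explicitly rather than rely on the default, a sketch along these lines may work. It assumes LocalLLMsOptions exposes the same ExecutionProvider property that the AddLocalLLMs() callback sets in the Dependency Injection section. Only ExecutionProvider.Auto and ExecutionProvider.DirectML appear in this README, so no other enum members are used here.

```csharp
using System;
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

// Assumption: LocalLLMsOptions has an ExecutionProvider property, like the
// options object configured in the AddLocalLLMs() example.
var options = new LocalLLMsOptions
{
    Model = KnownModels.Phi35MiniInstruct,
    // Auto is the documented default (GPU first, then CPU).
    // Swap in ExecutionProvider.DirectML to request DirectML explicitly.
    ExecutionProvider = ExecutionProvider.Auto
};

using var client = await LocalChatClient.CreateAsync(options);

var response = await client.GetResponseAsync([
    new(ChatRole.User, "Say hello in one sentence.")
]);

Console.WriteLine(response.Text);
```

Whichever provider is chosen, the matching runtime package from the list above still has to be the one referenced by the project.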

Quick Start

using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

// Create a local chat client (downloads Phi-3.5 mini on first run)
using var client = await LocalChatClient.CreateAsync();

var response = await client.GetResponseAsync([
    new(ChatRole.User, "What is the capital of France?")
]);

Console.WriteLine(response.Text);
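For a conversation with more than one turn, the usual Microsoft.Extensions.AI pattern is to keep the message history yourself and pass it in full on every call. The following is a minimal sketch of that pattern, not something taken from the library's samples; the system prompt and questions are placeholders.

```csharp
using System;
using System.Collections.Generic;
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync();

// The client only sees the messages passed on each call,
// so the caller owns the conversation history.
List<ChatMessage> history =
[
    new(ChatRole.System, "You are a concise assistant.")
];

string[] questions =
[
    "What is the capital of France?",
    "Which river runs through it?"
];

foreach (var question in questions)
{
    history.Add(new ChatMessage(ChatRole.User, question));

    var response = await client.GetResponseAsync(history);
    Console.WriteLine($"> {question}");
    Console.WriteLine(response.Text);

    // Feed the reply back so the next turn can resolve "it".
    history.Add(new ChatMessage(ChatRole.Assistant, response.Text));
}
```

Because the history grows with every turn, long conversations will eventually need trimming to stay inside the model's context window.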

Streaming

using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
    Model = KnownModels.Phi35MiniInstruct
});

await foreach (var update in client.GetStreamingResponseAsync([
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "Explain quantum computing in simple terms.")
]))
{
    Console.Write(update.Text);
}
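A streamed response is often both printed as it arrives and kept as a whole string. The sketch below does both and adds a timeout through a CancellationToken, using the standard IChatClient signature from Microsoft.Extensions.AI. Whether generation actually stops early depends on the library honoring the token, which this README does not confirm; the 60-second limit is an arbitrary placeholder.

```csharp
using System;
using System.Text;
using System.Threading;
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync();

// Work against the interface so the call matches the
// Microsoft.Extensions.AI signature (messages, options, cancellationToken).
IChatClient chat = client;

ChatMessage[] messages =
[
    new(ChatRole.User, "Explain quantum computing in simple terms.")
];

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
var fullText = new StringBuilder();

try
{
    await foreach (var update in chat.GetStreamingResponseAsync(messages, null, cts.Token))
    {
        Console.Write(update.Text);   // show tokens as they arrive
        fullText.Append(update.Text); // keep the complete answer
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine();
    Console.WriteLine("[stopped after timeout]");
}

Console.WriteLine();
Console.WriteLine($"Received {fullText.Length} characters.");
```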

Model Metadata

Inspect model capabilities at runtime — context window size, model name, and vocabulary:

using ElBruno.LocalLLMs;

using var client = await LocalChatClient.CreateAsync();

var metadata = client.ModelInfo;
Console.WriteLine($"Model:          {metadata?.ModelName}");
Console.WriteLine($"Context window: {metadata?.MaxSequenceLength}");
Console.WriteLine($"Vocab size:     {metadata?.VocabSize}");

This is useful for prompt-length validation, adaptive chunking, and model selection logic.
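As an illustration of prompt-length validation, here is a small helper that checks an estimated token count against a context window. PromptBudget and its members are hypothetical names, not part of the library, and the estimate of roughly four characters per token is a coarse heuristic rather than the model's real tokenizer.

```csharp
using System;

// Hypothetical helper, not part of ElBruno.LocalLLMs.
public static class PromptBudget
{
    // Rough heuristic: about 4 characters per token. The model's actual
    // tokenizer may produce noticeably different counts.
    public static int EstimateTokens(string text) =>
        (int)Math.Ceiling(text.Length / 4.0);

    // True when the estimated prompt size plus the tokens reserved for
    // the model's reply fits inside the context window.
    public static bool Fits(string prompt, int maxSequenceLength, int reservedForOutput = 256) =>
        EstimateTokens(prompt) + reservedForOutput <= maxSequenceLength;
}
```

When metadata is available, its MaxSequenceLength value would be the natural argument for maxSequenceLength; a prompt that does not fit can then be shortened or split before it is sent.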

Dependency Injection

builder.Services.AddLocalLLMs(options =>
{
    options.Model = KnownModels.Phi35MiniInstruct;
    options.ExecutionProvider = ExecutionProvider.DirectML;
});

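// Inject IChatClient anywhere
public class MyService(IChatClient chatClient)
{
    public async Task<string> AskAsync(string question)
    {
        var response = await chatClient.GetResponseAsync(question);
        return response.Text;
    }
}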
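For context, a complete ASP.NET Core minimal API using this registration might look like the sketch below. It assumes a project built with the Microsoft.NET.Sdk.Web SDK and implicit usings enabled; the /ask route and the q query parameter are illustrative choices, not part of the library.

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

var builder = WebApplication.CreateBuilder(args);

// Registers IChatClient in the container, as in the snippet above.
builder.Services.AddLocalLLMs(options =>
{
    options.Model = KnownModels.Phi35MiniInstruct;
});

var app = builder.Build();

// GET /ask?q=... : IChatClient is resolved from the container,
// and the request's cancellation token is passed through.
app.MapGet("/ask", async (string q, IChatClient chat, CancellationToken ct) =>
{
    var response = await chat.GetResponseAsync(q, cancellationToken: ct);
    return Results.Text(response.Text);
});

app.Run();
```

The first request may be slow if the model still has to be downloaded or loaded, so a warm-up call at startup can be worth considering.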

Supported Models

| Tier | Model | Parameters | ONNX | ID |
|---|---|---|---|---|
| ⚪ Tiny | TinyLlama-1.1B-Chat | 1.1B | ✅ Native | tinyllama-1.1b-chat |
| ⚪ Tiny | SmolLM2-1.7B-Instruct | 1.7B | ✅ Native | smollm2-1.7b-instruct |
| ⚪ Tiny | Qwen2.5-0.5B-Instruct | 0.5B | ✅ Native | qwen2.5-0.5b-instruct |
| ⚪ Tiny | Qwen2.5-1.5B-Instruct | 1.5B | ✅ Native | qwen2.5-1.5b-instruct |
| ⚪ Tiny | Gemma-2B-IT | 2B | ✅ Native | gemma-2b-it |
| ⚪ Tiny | StableLM-2-1.6B-Chat | 1.6B | 🔄 Convert | stablelm-2-1.6b-chat |
| 🟢 Small | Phi-3.5 mini instruct | 3.8B | ✅ Native | phi-3.5-mini-instruct |
| 🟢 Small | Qwen2.5-3B-Instruct | 3B | ✅ Native | qwen2.5-3b-instruct |
| 🟢 Small | Llama-3.2-3B-Instruct | 3B | ✅ Native | llama-3.2-3b-instruct |
| 🟢 Small | Gemma-2-2B-IT | 2B | ✅ Native | gemma-2-2b-it |
| 🟡 Medium | Qwen2.5-7B-Instruct | 7B | ✅ Native | qwen2.5-7b-instruct |
| 🟡 Medium | Llama-3.1-8B-Instruct | 8B | ✅ Native | llama-3.1-8b-instruct |
| 🟡 Medium | Mistral-7B-Instruct-v0.3 | 7B | ✅ Native | mistral-7b-instruct-v0.3 |
| 🟡 Medium | Gemma-2-9B-IT | 9B | ✅ Native | gemma-2-9b-it |
| 🟡 Medium | Phi-4 | 14B | ✅ Native | phi-4 |
| 🟡 Medium | DeepSeek-R1-Distill-Qwen-14B | 14B | ✅ Native | deepseek-r1-distill-qwen-14b |
| 🟡 Medium | Mistral-Small-24B-Instruct | 24B | ✅ Native | mistral-small-24b-instruct |
| 🔴 Large | Qwen2.5-14B-Instruct | 14B | ✅ Native | qwen2.5-14b-instruct |
| 🔴 Large | Qwen2.5-32B-Instruct | 32B | ✅ Native | qwen2.5-32b-instruct |
| 🔴 Large | Llama-3.3-70B-Instruct | 70B | ✅ ONNX | llama-3.3-70b-instruct |
| 🔴 Large | Mixtral-8x7B-Instruct-v0.1 | 8x7B | 🔄 Convert | mixtral-8x7b-instruct-v0.1 |
| 🔴 Large | DeepSeek-R1-Distill-Llama-70B | 70B | 🔄 Convert | deepseek-r1-distill-llama-70b |
| 🔴 Large | Command-R (35B) | 35B | 🔄 Convert | command-r-35b |

Fine-Tuned Models

Ready-to-use fine-tuned variants optimized for specific tasks. A fine-tuned 0.5B model often matches or exceeds a base 1.5B model on its specialized task.

| Model | Size | Task | HuggingFace ID |
|---|---|---|---|
| Qwen2.5-0.5B-ToolCalling | ~1 GB | Tool/function calling | elbruno/Qwen2.5-0.5B-LocalLLMs-ToolCalling |
| Qwen2.5-0.5B-RAG | ~1 GB | RAG with citations | elbruno/Qwen2.5-0.5B-LocalLLMs-RAG |
| Qwen2.5-0.5B-Instruct | ~1 GB | General-purpose | elbruno/Qwen2.5-0.5B-LocalLLMs-Instruct |

See the Supported Models Guide for detailed model cards, performance benchmarks, and selection guidance.

Samples

| Sample | Description |
|---|---|
| HelloChat | Minimal console chat |
| StreamingChat | Token-by-token streaming |
| MultiModelChat | Switch models at runtime |
| DependencyInjection | ASP.NET Core DI registration |
| ToolCallingAgent | Function calling and tool use |
| FineTunedToolCalling | Fine-tuned model for improved tool calling |
| RagChatbot | RAG pipeline with document retrieval |
| ConsoleAppDemo | Interactive console application |

Requirements

  • .NET 8.0 or .NET 10.0
  • CPU (default), NVIDIA GPU (CUDA), or Windows GPU (DirectML)
  • ~2-8 GB disk space per model (depending on size and quantization)

Building from Source

git clone https://github.com/elbruno/ElBruno.LocalLLMs.git
cd ElBruno.LocalLLMs
dotnet restore ElBruno.LocalLLMs.slnx
dotnet build ElBruno.LocalLLMs.slnx
dotnet test ElBruno.LocalLLMs.slnx --framework net8.0

Documentation

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

👋 About the Author

Made with ❤️ by Bruno Capuano (ElBruno)

🙏 Acknowledgments

Compatible and additional computed target framework versions:

| Product | Compatible | Computed |
|---|---|---|
| .NET | net8.0 | net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows |

NuGet packages (3)

Showing the top 3 NuGet packages that depend on ElBruno.LocalLLMs:

| Package | Description |
|---|---|
| ElBruno.ModelContextProtocol.MCPToolRouter | Semantic routing for Model Context Protocol (MCP) tool definitions using local embeddings. Indexes MCP tools and returns the most relevant tools for a given prompt via vector search. |
| ElBruno.LocalLLMs.Rag | RAG (Retrieval-Augmented Generation) pipeline for ElBruno.LocalLLMs. Provides document chunking, embedding storage, and semantic search. |
| ElBruno.LocalLLMs.BitNet | BitNet 1.58-bit LLM inference using bitnet.cpp. IChatClient implementation for Microsoft.Extensions.AI. |

GitHub repositories

This package is not used by any popular GitHub repositories.

| Version | Downloads | Last Updated |
|---|---|---|
| 0.16.0 | 197 | 4/17/2026 |
| 0.15.0 | 138 | 4/16/2026 |
| 0.11.0 | 144 | 4/4/2026 |
| 0.9.0 | 116 | 4/4/2026 |
| 0.7.2 | 172 | 3/28/2026 |
| 0.7.1 | 130 | 3/28/2026 |
| 0.7.0 | 119 | 3/28/2026 |
| 0.6.1 | 125 | 3/28/2026 |
| 0.6.0 | 120 | 3/28/2026 |
| 0.5.0 | 161 | 3/28/2026 |
| 0.1.8 | 108 | 3/19/2026 |
| 0.1.7 | 105 | 3/18/2026 |
| 0.1.6 | 103 | 3/18/2026 |
| 0.1.0 | 107 | 3/18/2026 |