whfmt.FileFormatCatalog 1.0.0

There is a newer version of this package available.
See the version list below for details.

whfmt.FileFormatCatalog

675+ embedded file format and language definitions for automatic format detection and syntax highlighting.
Cross-platform net8.0 — works in any .NET 8 application.

dotnet add package whfmt.FileFormatCatalog

Full documentation: whfmt-FileFormatCatalog-guide.md — API reference, architecture, integration guides (Level 1–3), and .whfmt format specification.


About

This catalog grew out of the format detection engine inside WpfHexEditorIDE — a full-featured binary/text IDE built on WPF. Every time a file is opened, the IDE needs to know what it is, which editor to route it to, and how to syntax-highlight it. Rather than hardcoding rules, we built a declarative .whfmt format — a JSON definition file that captures magic bytes, extensions, MIME types, entropy hints, quality scores, and syntax grammars in one place.

Over time the catalog grew to 675+ definitions covering everything from Nintendo ROMs and audio codecs to machine learning models and certificate formats. The syntax grammar side expanded to 35 languages to drive the built-in code editor.

This package extracts that catalog as a standalone, cross-platform library — useful for any application that needs to identify files, route them to the right handler, or provide syntax highlighting without taking a dependency on a full IDE framework.


Quick Start

1 — Add the using directives

using WpfHexEditor.Core.Definitions;
using WpfHexEditor.Core.Contracts;

var catalog = EmbeddedFormatCatalog.Instance;

2 — Detect a format by extension

EmbeddedFormatEntry? entry = catalog.GetByExtension(".zip");
Console.WriteLine(entry?.Name);            // "ZIP Archive"
Console.WriteLine(entry?.PreferredEditor); // "structure-editor"

3 — Detect by magic bytes

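// Pass at least the first 16 bytes; 512 bytes recommended.
// Read through a stream so that files shorter than 512 bytes do not throw.
using var fs = File.OpenRead("unknown.bin");
var header = new byte[512];
int read = fs.Read(header, 0, header.Length);
EmbeddedFormatEntry? detected = catalog.DetectFromBytes(header.AsSpan(0, read));
Console.WriteLine(detected?.Name);         // e.g. "PNG Image"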

4 — Detect by MIME type

// Named byMime so it does not clash with "entry" from step 2 when the snippets share a scope
EmbeddedFormatEntry? byMime = catalog.GetByMimeType("image/png");

5 — Browse a category

// Enum overload — IntelliSense, no typos
IReadOnlyList<EmbeddedFormatEntry> games = catalog.GetByCategory(FormatCategory.Game);

// String overload — for dynamic/runtime scenarios
IReadOnlyList<EmbeddedFormatEntry> same = catalog.GetByCategory("Game");

6 — Extract a syntax grammar for a code editor

EmbeddedFormatEntry? cs = catalog.GetByExtension(".cs");
if (cs?.HasSyntaxDefinition == true)
{
    string? grammar = catalog.GetSyntaxDefinitionJson(cs.ResourceKey);
    // Feed grammar into your tokenizer / syntax highlighter
}

7 — Access the full JSON or schema

// Full .whfmt JSON for any entry (cached).
// "entry" is a nullable lookup result from an earlier step, so guard against null.
string? json = entry is null ? null : catalog.GetJson(entry.ResourceKey);

// Embedded JSON schema: enum overload (recommended)
string? schema = catalog.GetSchemaJson(SchemaName.Whfmt);

// String overload
string? sameSchema = catalog.GetSchemaJson("whfmt");

8 — Route to the right editor

IReadOnlyList<string> editors = catalog.GetCompatibleEditorIds("report.pdf");
// ["hex-editor", "structure-editor"]

Fast Startup — PreWarm

// Call once from a background thread at startup to pre-load all JSON into cache
await Task.Run(() => EmbeddedFormatCatalog.Instance.PreWarm());
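
If other startup work can proceed while the cache fills, one option is to keep the warm-up task and await it only before the first lookup. The sketch below is self-contained and makes assumptions: the local PreWarm function is a hypothetical stand-in for EmbeddedFormatCatalog.Instance.PreWarm(), simulated with a short sleep, and the variable names are illustrative.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for EmbeddedFormatCatalog.Instance.PreWarm();
// it simulates slow cache loading with a short sleep.
bool warmed = false;
void PreWarm()
{
    Thread.Sleep(50);
    warmed = true;
}

// Start the warm-up on a thread-pool thread and keep the task.
Task warmUp = Task.Run(() => PreWarm());

// ... other startup work can run here while the cache loads ...

// Await before the first lookup; an exception thrown by PreWarm surfaces here.
await warmUp;
Console.WriteLine(warmed);   // prints True
```

Holding on to the task rather than discarding it means a failure during warm-up is observed at the await instead of being lost.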

Advanced Examples

Batch folder scanner — group files by detected category

var catalog = EmbeddedFormatCatalog.Instance;

var byCategory = Directory
    .EnumerateFiles(@"C:\Downloads", "*.*", SearchOption.AllDirectories)
    .Select(path =>
    {
        // Try extension first (fast), fall back to magic bytes (accurate)
        var entry = catalog.GetByExtension(Path.GetExtension(path));
        if (entry is null)
        {
            using var fs = File.OpenRead(path);
            var header = new byte[512];
            int read = fs.Read(header, 0, header.Length);
            entry = catalog.DetectFromBytes(header.AsSpan(0, read));
        }
        return (Path: path, Category: entry?.Category ?? "Unknown", Entry: entry);
    })
    .GroupBy(f => f.Category)
    .OrderByDescending(g => g.Count());

foreach (var group in byCategory)
    Console.WriteLine($"{group.Key}: {group.Count()} file(s)");

// Pull only the game ROMs using the enum
var roms = catalog.GetByCategory(FormatCategory.Game);
Console.WriteLine($"Known game formats: {roms.Count}");

Magic-byte validator — detect extension spoofing

var catalog = EmbeddedFormatCatalog.Instance;

bool IsExtensionSpoofed(string filePath)
{
    var byExtension = catalog.GetByExtension(Path.GetExtension(filePath));
    if (byExtension is null) return false; // unknown format — skip

    using var fs = File.OpenRead(filePath);
    var header = new byte[512];
    int read = fs.Read(header, 0, header.Length);
    var byBytes = catalog.DetectFromBytes(header.AsSpan(0, read));

    // Spoofed if bytes point to a different known format
    return byBytes is not null && byBytes.ResourceKey != byExtension.ResourceKey;
}

// Usage
if (IsExtensionSpoofed(@"C:\uploads\invoice.pdf"))
    Console.WriteLine("Warning: file content does not match its extension.");

Grammar loader — wire syntax highlighting into a custom editor

var catalog = EmbeddedFormatCatalog.Instance;

// Load grammars only for the Programming category (enum — no typo risk)
var languages = catalog.GetByCategory(FormatCategory.Programming)
    .Where(e => e.HasSyntaxDefinition)
    .OrderBy(e => e.Name);

foreach (var lang in languages)
{
    string? grammarJson = catalog.GetSyntaxDefinitionJson(lang.ResourceKey);
    if (grammarJson is null) continue;

    // Deserialize into your tokenizer model and register
    // MyTokenizerRegistry.Register(lang.Name, grammarJson);
    Console.WriteLine($"Loaded grammar: {lang.Name} ({lang.Extensions.FirstOrDefault()})");
}
// Output: Loaded grammar: C# (.cs), Loaded grammar: Python (.py), ...

// Validate your own .whfmt file against the embedded schema
string? whfmtSchema = catalog.GetSchemaJson(SchemaName.Whfmt);
// Pass whfmtSchema to your JSON schema validator (e.g. JsonSchema.Net)

MIME negotiation — extension ↔ MIME bidirectional mapping

var catalog = EmbeddedFormatCatalog.Instance;

// Extension → MIME (e.g. for HTTP Content-Type)
string? GetMimeType(string extension)
    => catalog.GetByExtension(extension)?.MimeTypes?.FirstOrDefault();

// MIME → canonical extension (e.g. for file download naming)
string? GetExtension(string mimeType)
    => catalog.GetByMimeType(mimeType)?.Extensions.FirstOrDefault();

// Examples
Console.WriteLine(GetMimeType(".png"));          // "image/png"
Console.WriteLine(GetMimeType(".zip"));          // "application/zip"
Console.WriteLine(GetExtension("image/png"));    // ".png"
Console.WriteLine(GetExtension("audio/mpeg"));   // ".mp3"

Features

Format Detection

  • 675+ embedded .whfmt definitions — extension, MIME type, and magic-byte lookup
  • DetectFromBytes(ReadOnlySpan<byte>) — zero-alloc magic-byte scoring across all signatures
  • GetByExtension, GetByMimeType, GetByCategory — multiple lookup strategies
  • GetCompatibleEditorIds — returns all compatible editor IDs for a given file path
  • 27 categories: Archives, Audio, Images, Game, Documents, Video, System, 3D, and more
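
To illustrate what magic-byte scoring means in general, here is a minimal, self-contained sketch. It is not the library's implementation: the signature table, the DetectName helper, and the scoring rule (longest matching signature wins) are assumptions made for the example. Only the three byte sequences are real, being the well-known PNG, ZIP, and PDF headers.

```csharp
using System;

// Hypothetical signature table: (format name, offset of the magic bytes, expected bytes).
var signatures = new (string Name, int Offset, byte[] Magic)[]
{
    ("PNG Image",    0, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }),
    ("ZIP Archive",  0, new byte[] { 0x50, 0x4B, 0x03, 0x04 }),
    ("PDF Document", 0, new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }),
};

// Returns the name of the best-scoring signature, or null when nothing matches.
// Score = number of matched magic bytes, so longer (more specific) signatures win.
string? DetectName(ReadOnlySpan<byte> header)
{
    string? best = null;
    int bestScore = 0;
    foreach (var (name, offset, magic) in signatures)
    {
        if (header.Length < offset + magic.Length) continue;                     // header too short
        if (!header.Slice(offset, magic.Length).SequenceEqual(magic)) continue;  // bytes differ
        if (magic.Length > bestScore) { best = name; bestScore = magic.Length; }
    }
    return best;
}

byte[] sample = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
Console.WriteLine(DetectName(sample) ?? "unknown");   // prints "ZIP Archive"
```

Taking a ReadOnlySpan<byte> lets the caller pass a slice of a reused buffer without copying, which is the same shape as the DetectFromBytes(ReadOnlySpan<byte>) signature listed above.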

Syntax Highlighting

  • 35 language grammars with syntaxDefinition blocks (C#, Python, JS/TS, Go, Rust, Java, Kotlin, Swift, Dart, PHP, Ruby, Lua, SQL, YAML, TOML, Markdown, and more)
  • GetSyntaxDefinitionJson(resourceKey) — raw grammar JSON ready for a tokenizer
  • HasSyntaxDefinition flag for fast filtering

Enum API

  • FormatCategory enum — all 27 categories with IntelliSense, no string typos
  • SchemaName enum — Whfmt, Whcd, Whdbg, Whidews, Whscd
  • All enum overloads delegate to string overloads — both variants always available

Performance

  • Singleton with lazy thread-safe initialization (LazyInitializer)
  • Entries backed by FrozenSet<T>: O(1) average-case membership lookups
  • JSON cache — each resource key read once, then served from memory
  • PreWarm() — pre-load all JSON on a background thread before first use
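
The caching and frozen-set points above can be sketched with standard .NET 8 types. This is an illustration under stated assumptions, not the package's code: LoadJson is a hypothetical stand-in for reading an embedded resource, its payload is a placeholder, and the cache layout (a ConcurrentDictionary of Lazy<string>) is one common way to get "read once, then serve from memory".

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Frozen;
using System.Threading;

// Hypothetical stand-in for reading an embedded resource; counts how often it runs.
int loads = 0;
string LoadJson(string key)
{
    Interlocked.Increment(ref loads);
    return $"{{\"key\":\"{key}\"}}";   // placeholder payload, not real .whfmt content
}

// Cache of Lazy<string>: Lazy's default thread-safety mode runs the loader
// at most once per key, even if two threads request the same key at once.
var cache = new ConcurrentDictionary<string, Lazy<string>>(StringComparer.Ordinal);
string GetJson(string key) =>
    cache.GetOrAdd(key, k => new Lazy<string>(() => LoadJson(k))).Value;

// Frozen set: built once, then optimized for read-only membership tests.
FrozenSet<string> knownExtensions =
    new[] { ".zip", ".png", ".pdf" }.ToFrozenSet(StringComparer.OrdinalIgnoreCase);

GetJson("zip");
GetJson("zip");                                        // second call is served from the cache
Console.WriteLine(loads);                              // prints 1
Console.WriteLine(knownExtensions.Contains(".PNG"));   // prints True
```

Wrapping values in Lazy<string> matters because ConcurrentDictionary.GetOrAdd may invoke its factory more than once under contention; only the cheap Lazy wrapper is then duplicated, not the load itself.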

What's New in 1.0.0

  • New: Initial NuGet release — cross-platform net8.0.
  • New: FormatCategory enum — type-safe overload for GetByCategory().
  • New: SchemaName enum — type-safe overload for GetSchemaJson().
  • New: DetectFromBytes(ReadOnlySpan<byte>) — magic-byte detection across all 675+ signatures.
  • New: GetByMimeType(string) — MIME type lookup.
  • New: GetByCategory(string/FormatCategory) — category browsing.
  • New: GetSchemaJson(string/SchemaName) — access to 5 embedded JSON schemas.
  • New: MimeTypes and Signatures fields on EmbeddedFormatEntry.

Included Assemblies

Both bundled inside the package — zero external NuGet dependencies:

Assembly                        Purpose
WpfHexEditor.Core.Definitions   EmbeddedFormatCatalog singleton + 675+ embedded .whfmt definitions
WpfHexEditor.Core.Contracts     IEmbeddedFormatCatalog, EmbeddedFormatEntry, FormatCategory, SchemaName

License

GNU Affero General Public License v3.0 (AGPL-3.0)

Compatible and additional computed target framework versions

  • Included in package: net8.0
  • Computed as compatible: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows; net9.0 and net10.0, each with the same platform variants (android, browser, ios, maccatalyst, macos, tvos, windows)

Dependencies

  • net8.0: no dependencies.

NuGet packages (2)

Showing the top 2 NuGet packages that depend on whfmt.FileFormatCatalog:

  • whfmt.Fuzz: Generate format-aware mutant files for fuzzing parsers and decoders. Uses whfmt fuzz strategies (boundary_values, enum_sweep, corrupt_signature, bit_flip, overflow, random_bytes, byte_swap, truncate, duplicate) declared in 799 format definitions (schema v3). Automatically recomputes checksums after mutation. Cross-platform net8.0.

  • whfmt.Analysis: Field-level semantic diff between two binary files using whfmt format definitions. Groups entries, ignores noise fields (timestamps, checksums), and surfaces meaningful structural changes. Powered by 799 whfmt.FileFormatCatalog definitions (schema v3). Cross-platform net8.0.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version   Downloads   Last Updated
1.4.0     56          5/21/2026
1.3.2     133         5/12/2026
1.3.1     138         5/7/2026
1.3.0     91          5/7/2026
1.2.0     104         5/1/2026
1.1.1     97          4/28/2026
1.1.0     96          4/28/2026
1.0.0     110         4/16/2026