Undoc 0.1.11
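- .NET CLI: dotnet add package Undoc --version 0.1.11
- Package Manager: NuGet\Install-Package Undoc -Version 0.1.11
- PackageReference: <PackageReference Include="Undoc" Version="0.1.11" />
- Central Package Management: <PackageVersion Include="Undoc" Version="0.1.11" /> in Directory.Packages.props, plus <PackageReference Include="Undoc" /> in the project file
- Paket CLI: paket add Undoc --version 0.1.11
- Script & Interactive: #r "nuget: Undoc, 0.1.11"
- File-based apps: #:package Undoc@0.1.11
- Cake addin: #addin nuget:?package=Undoc&version=0.1.11
- Cake tool: #tool nuget:?package=Undoc&version=0.1.11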
Undoc
High-performance Microsoft Office document extraction to Markdown for .NET.
Installation
dotnet add package Undoc
Usage
Basic Usage
using Undoc;
// Parse a document
using var doc = UndocDocument.ParseFile("document.docx");
// Convert to Markdown
var markdown = doc.ToMarkdown();
Console.WriteLine(markdown);
// Convert to plain text
var text = doc.ToText();
// Convert to JSON
var json = doc.ToJson();
With Markdown Options
using Undoc;
using var doc = UndocDocument.ParseFile("document.xlsx");
var options = new MarkdownOptions
{
    IncludeFrontmatter = true,
    ParagraphSpacing = true
};
var markdown = doc.ToMarkdown(options);
Parse from Bytes
using Undoc;
byte[] data = File.ReadAllBytes("document.pptx");
using var doc = UndocDocument.ParseBytes(data);
var markdown = doc.ToMarkdown();
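When the document arrives as a stream (an upload or a blob download, for example) rather than a file on disk, one possible approach is to buffer it into a byte array and hand that to ParseBytes. The sketch below assumes nothing beyond the ParseBytes and ToMarkdown calls shown above; the ReadAllBytes helper is hypothetical and not part of Undoc.

```csharp
using System;
using System.IO;
using Undoc;

// Hypothetical helper (not part of Undoc): buffer any readable stream into a byte array.
static byte[] ReadAllBytes(Stream input)
{
    using var buffer = new MemoryStream();
    input.CopyTo(buffer);
    return buffer.ToArray();
}

// Any readable Stream works here; a FileStream stands in for an upload or blob stream.
using var stream = File.OpenRead("document.pptx");
byte[] data = ReadAllBytes(stream);

using var doc = UndocDocument.ParseBytes(data);
Console.WriteLine(doc.ToMarkdown());
```

Buffering keeps the whole file in memory, so for very large documents it may be preferable to write the stream to a temporary file and use ParseFile instead.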
Extract Resources (Images)
using Undoc;
using var doc = UndocDocument.ParseFile("document.docx");
// Get all resource IDs
var resourceIds = doc.GetResourceIds();
foreach (var id in resourceIds)
{
    // Get resource metadata
    using var info = doc.GetResourceInfo(id);
    var filename = info?.RootElement.GetProperty("filename").GetString();
    Console.WriteLine($"Resource: {filename}");
    // Get resource binary data
    var data = doc.GetResourceData(id);
    if (data != null && filename != null)
    {
        File.WriteAllBytes(filename, data);
    }
}
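Because each filename comes from metadata stored inside the document, a cautious variation of the loop above writes resources into a dedicated directory and keeps only the file-name part of each name. This is a sketch under assumptions: the SafeResourcePath helper and the "resources" directory name are hypothetical, and GetResourceData is assumed to return a byte array, as its use with File.WriteAllBytes above suggests.

```csharp
using System;
using System.IO;
using Undoc;

// Hypothetical helper (not part of Undoc): drop any directory part so a name
// such as "../../evil.png" cannot escape the output directory.
static string SafeResourcePath(string directory, string filename)
    => Path.Combine(directory, Path.GetFileName(filename));

const string outputDir = "resources"; // assumed output location
Directory.CreateDirectory(outputDir);

using var doc = UndocDocument.ParseFile("document.docx");
foreach (var id in doc.GetResourceIds())
{
    using var info = doc.GetResourceInfo(id);
    var filename = info?.RootElement.GetProperty("filename").GetString();
    var data = doc.GetResourceData(id);
    if (data != null && !string.IsNullOrEmpty(filename))
    {
        File.WriteAllBytes(SafeResourcePath(outputDir, filename), data);
    }
}
```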
Document Metadata
using Undoc;
using var doc = UndocDocument.ParseFile("document.docx");
Console.WriteLine($"Title: {doc.Title}");
Console.WriteLine($"Author: {doc.Author}");
Console.WriteLine($"Sections: {doc.SectionCount}");
Console.WriteLine($"Resources: {doc.ResourceCount}");
Console.WriteLine($"Library Version: {UndocDocument.Version}");
Supported Formats
- DOCX - Microsoft Word documents
- XLSX - Microsoft Excel spreadsheets
- PPTX - Microsoft PowerPoint presentations
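Since only these three formats are listed, a caller may want to check the file extension before attempting to parse. The following is a minimal sketch using only the .NET base library; IsSupported is a hypothetical helper, not an Undoc API, and an extension check says nothing about whether the file contents are actually valid.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Extensions taken from the Supported Formats list; compared case-insensitively.
var supported = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".docx", ".xlsx", ".pptx"
};

// Hypothetical helper (not part of Undoc).
bool IsSupported(string path) => supported.Contains(Path.GetExtension(path));

Console.WriteLine(IsSupported("report.DOCX")); // True
Console.WriteLine(IsSupported("notes.pdf"));   // False
```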
Features
- RAG-Ready Output: Structured Markdown optimized for RAG/LLM applications
- High Performance: Native Rust implementation via P/Invoke
- Asset Extraction: Images and embedded resources
- Metadata Preservation: Document properties, styles, formatting
- Cross-Platform: Windows, Linux, macOS (Intel & ARM)
API Reference
UndocDocument Class
Static Methods
- ParseFile(string path) - Parse document from file path
- ParseBytes(byte[] data) - Parse document from bytes
Instance Methods
- ToMarkdown(MarkdownOptions? options) - Convert to Markdown
- ToText() - Convert to plain text
- ToJson(bool compact) - Convert to JSON
- PlainText() - Get plain text (fast extraction)
- GetResourceIds() - List of resource IDs
- GetResourceInfo(string id) - Resource metadata as JsonDocument
- GetResourceData(string id) - Resource binary data
Properties
- Title - Document title
- Author - Document author
- SectionCount - Number of sections
- ResourceCount - Number of resources
- Version (static) - Library version
MarkdownOptions Class
- IncludeFrontmatter - Include YAML frontmatter
- EscapeSpecialChars - Escape special characters
- ParagraphSpacing - Add extra paragraph spacing
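The reference above lists a few members that the usage examples do not exercise: EscapeSpecialChars, PlainText() and the compact parameter of ToJson. The sketch below shows how they might be called. It rests on assumptions: that EscapeSpecialChars is a settable bool like the other options, and that passing true for compact produces JSON without indentation. Neither is confirmed by the text above.

```csharp
using System;
using Undoc;

using var doc = UndocDocument.ParseFile("document.docx");

// Assumption: EscapeSpecialChars is a bool property like the other options.
var options = new MarkdownOptions
{
    EscapeSpecialChars = true
};
Console.WriteLine(doc.ToMarkdown(options));

// PlainText() is described in the reference as the fast plain-text path.
Console.WriteLine(doc.PlainText());

// Assumption: compact = true yields JSON without indentation.
Console.WriteLine(doc.ToJson(true));
```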
License
MIT License - see LICENSE for details.
| Product | Versions (compatible and additional computed target frameworks) |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
- net8.0: No dependencies.
NuGet packages (2)
Showing the top 2 NuGet packages that depend on Undoc:
| Package | Downloads |
|---|---|
| FileFlux: Complete document processing SDK optimized for RAG systems. Transform PDF, DOCX, Excel, PowerPoint, Markdown and other formats into high-quality chunks with intelligent semantic boundary detection. Includes advanced chunking strategies, metadata extraction, and performance optimization. | |
| FileFlux.Core: Pure document extraction SDK for RAG systems. Zero AI dependencies. Extract text from PDF, DOCX, Excel, PowerPoint, Markdown, HTML, and text files. Provides IDocumentReader interface and implementations. Use FileFlux.Core for extraction-only scenarios. For AI-enhanced extraction (image OCR, captioning), use the FileFlux package. | |
GitHub repositories
This package is not used by any popular GitHub repositories.