Unhwp 0.1.14
Unhwp
High-performance .NET library for converting HWP/HWPX Korean word processor documents to Markdown.
Installation
dotnet add package Unhwp
Or via NuGet Package Manager:
Install-Package Unhwp
Quick Start
using Unhwp;
// Simple conversion
string markdown = UnhwpConverter.ToMarkdown("document.hwp");
Console.WriteLine(markdown);
// Extract plain text
string text = UnhwpConverter.ExtractText("document.hwp");
// Full parsing with images
using var result = UnhwpConverter.Parse("document.hwp");
Console.WriteLine(result.Markdown);
Console.WriteLine($"Sections: {result.SectionCount}");
Console.WriteLine($"Paragraphs: {result.ParagraphCount}");
// Save images (create the target directory first, in case Save does not)
Directory.CreateDirectory("output");
foreach (var img in result.Images)
{
    img.Save($"output/{img.Name}");
}
Features
- Fast: Native Rust library with zero-copy parsing
- Complete: Extracts text, tables, images, and document structure
- Clean Output: Optional cleanup pipeline for polished Markdown
- Format Support: HWP 5.0, HWPX, and HWP 3.x (legacy)
API Reference
UnhwpConverter (Static Class)
Properties
- Version: Gets the library version string
- SupportedFormats: Gets a description of supported formats
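The two properties above are handy for startup diagnostics. A minimal sketch, assuming both are static members of UnhwpConverter (as the "Static Class" heading suggests) and that their values can be formatted as text:

```csharp
using System;
using Unhwp;

// Log which library build is loaded and which formats it reports.
// Interpolation is used so no assumption is made about the exact property types.
Console.WriteLine($"Unhwp version: {UnhwpConverter.Version}");
Console.WriteLine($"Supported formats: {UnhwpConverter.SupportedFormats}");
```

Printing these once at startup makes it easier to confirm that the native library was found on the current platform.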
Methods
DetectFormat(string path) -> DocumentFormat
Detect the format of a document file.
var format = UnhwpConverter.DetectFormat("document.hwp");
if (format == DocumentFormat.Hwp5)
    Console.WriteLine("HWP 5.0 format");
Parse(string path, RenderOptions? options = null) -> ParseResult
Parse a document with full access to content and images.
using var result = UnhwpConverter.Parse("document.hwp");
Console.WriteLine(result.Markdown);
Console.WriteLine(result.Text);
foreach (var img in result.Images)
    Console.WriteLine($"{img.Name}: {img.Data.Length} bytes");
ParseBytes(byte[] data, RenderOptions? options = null) -> ParseResult
Parse a document from a byte array.
byte[] documentBytes = File.ReadAllBytes("document.hwp");
using var result = UnhwpConverter.ParseBytes(documentBytes);
Console.WriteLine(result.Markdown);
ToMarkdown(string path) -> string
Convert an HWP/HWPX document to Markdown.
string markdown = UnhwpConverter.ToMarkdown("document.hwp");
ToMarkdownWithCleanup(string path, CleanupOptions? options = null) -> string
Convert with optional cleanup.
string markdown = UnhwpConverter.ToMarkdownWithCleanup(
    "document.hwp",
    CleanupOptions.Aggressive
);
ExtractText(string path) -> string
Extract plain text content.
string text = UnhwpConverter.ExtractText("document.hwp");
Classes
ParseResult
Result of parsing a document. Implements IDisposable.
Properties:
- Markdown: Rendered Markdown content
- Text: Plain text content
- RawContent: Content without cleanup
- SectionCount: Number of sections
- ParagraphCount: Number of paragraphs
- ImageCount: Number of images
- Images: List of extracted images
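As a rough illustration of the properties listed above, the following sketch prints a one-screen summary of a parsed document. It assumes that Markdown, Text, and RawContent are all strings (the listing does not give their types), and "document.hwp" is a placeholder path.

```csharp
using System;
using Unhwp;

// "document.hwp" is a placeholder path.
using var result = UnhwpConverter.Parse("document.hwp");

// Counts as documented on ParseResult.
Console.WriteLine($"Sections:   {result.SectionCount}");
Console.WriteLine($"Paragraphs: {result.ParagraphCount}");
Console.WriteLine($"Images:     {result.ImageCount}");

// Assumption: the three content properties are strings.
Console.WriteLine($"Markdown:   {result.Markdown.Length} chars");
Console.WriteLine($"Plain text: {result.Text.Length} chars");
Console.WriteLine($"Raw:        {result.RawContent.Length} chars");
```

Comparing the Markdown and RawContent lengths is a quick way to see how much the cleanup step changed the output.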
RenderOptions
Options for Markdown rendering.
var opts = new RenderOptions
{
    IncludeFrontmatter = true,
    ImagePathPrefix = "images/",
    TableFallback = TableFallback.Html,
    PreserveLineBreaks = false,
    EscapeSpecialChars = true
};
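The options object only takes effect once it is passed to a parse call. A minimal sketch using the documented Parse(string path, RenderOptions? options) overload: the output layout is hypothetical, and it is an assumption that ImagePathPrefix is prepended to image references in the rendered Markdown, so the images are saved under a matching folder.

```csharp
using System;
using System.IO;
using Unhwp;

var opts = new RenderOptions
{
    ImagePathPrefix = "images/",        // assumed to prefix image links in the Markdown
    TableFallback = TableFallback.Html  // enum value documented under TableFallback
};

// "document.hwp" and the "output" layout are placeholders.
using var result = UnhwpConverter.Parse("document.hwp", opts);

// Write the Markdown next to an images/ folder that mirrors the prefix.
string imageDir = Path.Combine("output", "images");
Directory.CreateDirectory(imageDir);
File.WriteAllText(Path.Combine("output", "document.md"), result.Markdown);

foreach (var img in result.Images)
{
    img.Save(Path.Combine(imageDir, img.Name));
}
```

If the prefix and the folder name diverge, the image links in the Markdown file will most likely not resolve, so it helps to derive both from one value.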
CleanupOptions
Options for output cleanup.
// Presets
var minimal = CleanupOptions.Minimal;
var defaultOpts = CleanupOptions.Default;
var aggressive = CleanupOptions.Aggressive;
var disabled = CleanupOptions.Disabled;
// Custom
var custom = new CleanupOptions
{
    Enabled = true,
    Preset = CleanupPreset.Default,
    DetectMojibake = true,
    PreserveFrontmatter = true
};
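A custom options object is passed the same way as a preset. The sketch below feeds one to ToMarkdownWithCleanup and prints the output size next to that of plain ToMarkdown. Whether plain ToMarkdown applies any cleanup of its own is not stated here, so the comparison is only indicative; "document.hwp" is a placeholder path.

```csharp
using System;
using Unhwp;

var cleanup = new CleanupOptions
{
    Enabled = true,
    Preset = CleanupPreset.Aggressive,
    DetectMojibake = true,
    PreserveFrontmatter = true
};

// "document.hwp" is a placeholder path.
string plain = UnhwpConverter.ToMarkdown("document.hwp");
string cleaned = UnhwpConverter.ToMarkdownWithCleanup("document.hwp", cleanup);

Console.WriteLine($"ToMarkdown:            {plain.Length} chars");
Console.WriteLine($"ToMarkdownWithCleanup: {cleaned.Length} chars");
```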
UnhwpImage
Represents an extracted image.
Properties:
- Name: Image filename
- Data: Image data as byte array
Methods:
- Save(string path): Save image to file
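Because Data is exposed as a byte array, images can also be written without Save, for example when the target path needs sanitizing first. A minimal sketch: SafeTarget is a hypothetical helper, not part of the library, and it is an assumption that Name may contain directory separators at all.

```csharp
using System;
using System.IO;
using Unhwp;

// Hypothetical helper: keep only the file-name part so a name containing
// directory separators cannot escape the target directory.
static string SafeTarget(string directory, string imageName) =>
    Path.Combine(directory, Path.GetFileName(imageName));

string outputDir = "output";
Directory.CreateDirectory(outputDir);

// "document.hwp" is a placeholder path.
using var result = UnhwpConverter.Parse("document.hwp");
foreach (var img in result.Images)
{
    string target = SafeTarget(outputDir, img.Name);
    File.WriteAllBytes(target, img.Data); // Data is documented as a byte array
    Console.WriteLine($"{target}: {img.Data.Length} bytes");
}
```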
Enums
DocumentFormat
- Unknown: Unknown format
- Hwp5: HWP 5.0 binary format
- Hwpx: HWPX XML format
- Hwp3: HWP 3.x legacy format
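One possible use of these values is to check a file before converting it. The sketch below maps each member to a label and skips files reported as Unknown. Describe is a hypothetical helper, and how DetectFormat behaves for a missing or unreadable file is not covered here.

```csharp
using System;
using Unhwp;

// Hypothetical helper: map each documented enum member to a readable label.
static string Describe(DocumentFormat format) => format switch
{
    DocumentFormat.Hwp5 => "HWP 5.0 binary format",
    DocumentFormat.Hwpx => "HWPX XML format",
    DocumentFormat.Hwp3 => "HWP 3.x legacy format",
    _ => "Unknown format"
};

// "document.hwp" is a placeholder path.
var format = UnhwpConverter.DetectFormat("document.hwp");
if (format == DocumentFormat.Unknown)
{
    Console.WriteLine("Not a recognized HWP/HWPX file; skipping.");
}
else
{
    Console.WriteLine($"{Describe(format)} detected; converting.");
    Console.WriteLine(UnhwpConverter.ToMarkdown("document.hwp"));
}
```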
TableFallback
- Markdown: Render as Markdown tables
- Html: Render as HTML tables
- Text: Render as plain text
CleanupPreset
- Minimal: Minimal cleanup
- Default: Balanced cleanup
- Aggressive: Maximum cleanup
Platform Support
- Windows (x64)
- Linux (x64)
- macOS (x64, ARM64)
Target Frameworks
- .NET 6.0, 7.0, 8.0, 9.0, 10.0
- .NET Standard 2.0, 2.1
License
MIT License - see LICENSE for details.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 is compatible. |
| .NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETStandard 2.0: System.Memory (>= 4.5.5)
- .NETStandard 2.1: No dependencies.
- net10.0: No dependencies.
- net6.0: No dependencies.
- net7.0: No dependencies.
- net8.0: No dependencies.
- net9.0: No dependencies.
NuGet packages (2)
Showing the top 2 NuGet packages that depend on Unhwp:
| Package | Downloads |
|---|---|
| FileFlux: Complete document processing SDK optimized for RAG systems. Transform PDF, DOCX, Excel, PowerPoint, Markdown and other formats into high-quality chunks with intelligent semantic boundary detection. Includes advanced chunking strategies, metadata extraction, and performance optimization. | |
| FileFlux.Core: Pure document extraction SDK for RAG systems. Zero AI dependencies. Extract text from PDF, DOCX, Excel, PowerPoint, Markdown, HTML, and text files. Provides IDocumentReader interface and implementations. Use FileFlux.Core for extraction-only scenarios. For AI-enhanced extraction (image OCR, captioning), use the FileFlux package. | |
GitHub repositories
This package is not used by any popular GitHub repositories.