Textify 0.1.0-beta

HTML to plaintext conversion library (.NET Standard 2.0).

This is a prerelease version of Textify.
Package Manager:
Install-Package Textify -Version 0.1.0-beta

.NET CLI:
dotnet add package Textify --version 0.1.0-beta

PackageReference (for projects that support PackageReference, copy this XML node into the project file to reference the package):
<PackageReference Include="Textify" Version="0.1.0-beta" />

Paket CLI (the NuGet Team does not provide support for this client; please contact its maintainers for support):
paket add Textify --version 0.1.0-beta

Textify

An HTML to plaintext conversion library for .NET Standard 2.0 written in C#.

Features

  • Supports HTML headings, paragraphs, containers, lists and tables (basic support)
  • Takes an HTML string as an input or an INode from AngleSharp
  • Outputs a readable text representation of the web page
  • Targets .NET Standard 2.0
  • Full test coverage

Installation

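Install from NuGet. Only a prerelease version is published so far, so specify the version explicitly:

Install-Package Textify -Version 0.1.0-beta

or

dotnet add package Textify --version 0.1.0-beta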

Usage

HtmlToTextConverter converter = new HtmlToTextConverter();
string output = converter.Convert(html);

By default, the whole page will be converted.

To convert only part of the page, parse it yourself with AngleSharp and pass the INode you want to convert. You don't need to install AngleSharp separately, because Textify already depends on it.

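using AngleSharp.Dom;
using AngleSharp.Html.Dom;
using AngleSharp.Html.Parser;

// Parse the page and select the element to convert.
HtmlParser parser = new HtmlParser();
IHtmlDocument doc = parser.ParseDocument(html);
IElement element = doc.QuerySelector("#main"); // null if nothing matches

// Convert only the selected element.
HtmlToTextConverter converter = new HtmlToTextConverter();
string output = converter.Convert(element);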
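
QuerySelector returns null when no element matches the selector, so it can be worth guarding against that before converting. The following is a minimal sketch of one way to do it, not part of Textify itself: the FragmentConverter class and ConvertFragment method are hypothetical names, and the Textify namespace in the using directive is an assumption. It relies only on the two Convert overloads shown above (HTML string and INode).

```csharp
using AngleSharp.Dom;
using AngleSharp.Html.Dom;
using AngleSharp.Html.Parser;
using Textify; // assumed namespace of HtmlToTextConverter

// Hypothetical helper, not part of the Textify API.
public static class FragmentConverter
{
    // Converts the first element matching the CSS selector,
    // or the whole page when nothing matches.
    public static string ConvertFragment(string html, string selector)
    {
        HtmlParser parser = new HtmlParser();
        IHtmlDocument doc = parser.ParseDocument(html);

        // QuerySelector returns null when no element matches.
        IElement element = doc.QuerySelector(selector);

        HtmlToTextConverter converter = new HtmlToTextConverter();
        return element == null
            ? converter.Convert(html)      // fall back to the whole page
            : converter.Convert(element);  // convert only the matched element
    }
}
```

With the example input below, the selector "article" would select the article element, whereas "#main" matches nothing (the page has a main element but no element with id="main"), so the helper would fall back to converting the whole page.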

Example

Input:

<div id="page">
    <header>
        <a href="/" class="site-logo">
            <img src="logo.png" alt="Logo" />
        </a>
        <h1>
            Site title
        </h1>
    </header>
    <main>
        <article>
            <h2>Article title</h2>
            
            <p>
                <strong>Lorem ipsum</strong> dolor sit amet, consectetur adipiscing elit,
                sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
            </p>

            <p>Ut enim ad minim veniam, quis nostrud exercitation ullamco
            laboris nisi ut aliquip ex ea commodo consequat.</p>
            
            Here is a list of things anyway:

            <ul>
                <li>One</li>
                <li>Two</li>
                <li>Three</li>
            </ul>

            But maybe a table is nicer:<br><br>
            
            <table>
                <thead>
                    <th>Key</th>
                    <th>Value</th>
                </thead>
                <tr>
                    <td>One</td>
                    <td>Value</td>
                </tr>
            </table>
        </article>
    </main>
</div>

Output:

[IMG: Logo]

++++++++++
Site title
++++++++++

-------------
Article title
-------------

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Here is a list of things anyway:

* One
* Two
* Three

But maybe a table is nicer:

| Key | Value |

| One | Value |

License

MIT license.

Thanks to Jay Taylor for the inspiration with his html2text Go module.

Version History

Version      Downloads   Last updated
0.1.0-beta   89          11/29/2019