Devlooped.Web
1.2.2
dotnet add package Devlooped.Web --version 1.2.2
NuGet\Install-Package Devlooped.Web -Version 1.2.2
<PackageReference Include="Devlooped.Web" Version="1.2.2" />
paket add Devlooped.Web --version 1.2.2
#r "nuget: Devlooped.Web, 1.2.2"
// Install Devlooped.Web as a Cake Addin
#addin nuget:?package=Devlooped.Web&version=1.2.2
// Install Devlooped.Web as a Cake Tool
#tool nuget:?package=Devlooped.Web&version=1.2.2
Read HTML as XML and query it with CSS over XLinq (or HtmlAgilityPack killer 😉).
No need to learn an entirely new object model for a page 🤘. This makes it the most productive and lean library for web scraping using the latest and greatest that .NET can offer.
Usage
using System.Xml.Linq;
using Devlooped.Web;
XDocument page = HtmlDocument.Load("page.html");
IEnumerable<XElement> elements = page.CssSelectElements("div.menuitem");
XElement title = page.CssSelectElement("html head meta[name=title]");
By default, HtmlDocument.Load will skip the non-content elements script and style, turn all element names into lower case, and ignore all XML namespaces (useful when loading XHTML, for example) for easier querying. These options, as well as granular whitespace handling, can be configured using the overloads receiving an HtmlReaderSettings.
The underlying parsing is performed by the amazing SgmlReader library by Microsoft's Chris Lovett.
In addition, the following extension methods make it easier to work with XML documents where you want to query with CSS or XPath without having to deal with XML namespaces:
using System.Xml;
using System.Xml.Linq;
using Devlooped.Web;
var doc = XDocument.Load("doc.xml");
// Will remove all xmlns declarations, and allow querying elements
// as if none had namespaces, returns the root element
XElement nons = doc.RemoveNamespaces();
// Alternatively, you can also ignore at the XmlReader level
using var reader = XmlReader.Create("doc.xml").IgnoreNamespaces();
doc = XDocument.Load(reader);
// Finally, you can also skip elements at the reader level
// (a distinct variable name avoids redeclaring `reader` in the same scope)
using var skipping = XmlReader.Create("doc.xml").SkipElements("foo", "bar");
doc = XDocument.Load(skipping);
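To make the idea of namespace-free querying concrete, here is a minimal sketch of namespace stripping written with plain System.Xml.Linq only. The class NamespaceSketch and the method StripNamespaces are hypothetical names used for illustration; this is not the library's implementation of RemoveNamespaces, and unlike a complete solution it leaves prefixed attribute names untouched.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceSketch
{
    // Hypothetical helper (not part of Devlooped.Web): strips element
    // namespaces in place and returns the same root element.
    public static XElement StripNamespaces(XElement root)
    {
        // Snapshot the elements first so the tree is not enumerated while edited.
        foreach (var element in root.DescendantsAndSelf().ToList())
        {
            // Drop xmlns="..." and xmlns:prefix="..." declarations.
            element.Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
            // Keep only the local part of the element name.
            element.Name = element.Name.LocalName;
        }
        return root;
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            "<feed xmlns='http://www.w3.org/2005/Atom'><entry><title>Hello</title></entry></feed>");

        var root = StripNamespaces(doc.Root);

        // Elements can now be addressed by their local names alone.
        Console.WriteLine(root.Element("entry")?.Element("title")?.Value); // prints "Hello"
    }
}
```

Without the stripping step, the same lookup would need an XNamespace prefix on every element name, which is the friction the extension methods above are meant to remove.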
CSS
At the moment, the library supports the following CSS selector features:
And all combinators
Non-CSS features:
text() pseudo-attribute selector: selects the node text contents, as specified in the XPath text() location path. Can be used instead of an attribute name selector, such as div[text()=foo]. All attribute value selectors are also supported:
- [text()=val]: Represents an element whose text contents are exactly "val".
- [text()~=val]: Represents an element whose text contents are a whitespace-separated list of words, one of which is exactly "val". If "val" contains whitespace, it will never represent anything (since the words are separated by spaces). Also, if "val" is the empty string, it will never represent anything.
- [text()|=val]: Represents an element whose text contents are either exactly "val" or begin with "val" immediately followed by "-" (U+002D).
- [text()^=val]: Represents an element whose text contents begin with the prefix "val". If "val" is the empty string then the selector does not represent anything.
- [text()$=val]: Represents an element whose text contents end with the suffix "val". If "val" is the empty string then the selector does not represent anything.
- [text()*=val]: Represents an element whose text contents contain at least one instance of the substring "val". If "val" is the empty string then the selector does not represent anything.
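The matching rules above can be summarized as a small predicate. The following is only a sketch of the described semantics, not the library's code: TextMatchSketch and Matches are hypothetical names, and the use of case-sensitive (ordinal) comparison is an assumption the description does not confirm.

```csharp
using System;
using System.Linq;

public static class TextMatchSketch
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Hypothetical helper: applies one attribute-value operator to an
    // element's text contents, following the rules listed above.
    public static bool Matches(string op, string text, string val) => op switch
    {
        // Exact match.
        "=" => text == val,
        // One of the whitespace-separated words equals val; never matches
        // when val is empty or itself contains whitespace.
        "~=" => val.Length > 0 && !val.Any(char.IsWhiteSpace) &&
                text.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries).Contains(val),
        // Exactly val, or val immediately followed by "-".
        "|=" => text == val || text.StartsWith(val + "-", StringComparison.Ordinal),
        // Prefix, suffix and substring never match an empty val.
        "^=" => val.Length > 0 && text.StartsWith(val, StringComparison.Ordinal),
        "$=" => val.Length > 0 && text.EndsWith(val, StringComparison.Ordinal),
        "*=" => val.Length > 0 && text.Contains(val),
        _ => throw new ArgumentException($"Unknown operator: {op}", nameof(op)),
    };

    public static void Main()
    {
        Console.WriteLine(Matches("~=", "hello big world", "big")); // True
        Console.WriteLine(Matches("|=", "en-US", "en"));            // True
        Console.WriteLine(Matches("^=", "foo", ""));                // False
    }
}
```

Read this way, a selector such as div[text()^=Chapter] would select div elements whose text starts with "Chapter".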
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETStandard 2.0
  - Microsoft.Xml.SgmlReader (>= 1.8.30)
  - Superpower (>= 3.0.0)
NuGet packages (3)
Showing the top 3 NuGet packages that depend on Devlooped.Web:
Package | Downloads |
---|---|
Devlooped.Xml.Css: Superseded by Devlooped.Web | |
Devlooped.Html: Superseded by Devlooped.Web | |
Devlooped.Epub: Lightweight read-only API for processing EPUB documents. | |
GitHub repositories
This package is not used by any popular GitHub repositories.