1. HtmlAgilityPack

    An agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't have to understand XPath or XSLT to use it). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to...

  2. Spidy

    An F# web crawler.

  3. MisterHexCrawler

    A simple web crawler that returns an IObservable, using Reactive Extensions (Rx) and async/await.

  4. SkyScraper

    Web scraper / crawler / spider. Supports the robots exclusion protocol (robots.txt) and user-agent handling.

  5. Arachnophobia

    A library that includes an HttpHandler and an HttpModule that tell spiders not to index your ASP.NET site.