CRMScraper.Library 1.1.52

There is a newer version of this package available.
See the version list below for details.
.NET CLI:

    dotnet add package CRMScraper.Library --version 1.1.52

Package Manager Console (Visual Studio; uses the NuGet module's version of Install-Package):

    NuGet\Install-Package CRMScraper.Library -Version 1.1.52

PackageReference (for projects that support it, copy this XML node into the project file):

    <PackageReference Include="CRMScraper.Library" Version="1.1.52" />

Paket CLI:

    paket add CRMScraper.Library --version 1.1.52

Script & Interactive (the #r directive works in F# Interactive and Polyglot Notebooks; copy it into the interactive tool or the script source):

    #r "nuget: CRMScraper.Library, 1.1.52"

Cake:

    // Install CRMScraper.Library as a Cake Addin
    #addin nuget:?package=CRMScraper.Library&version=1.1.52

    // Install CRMScraper.Library as a Cake Tool
    #tool nuget:?package=CRMScraper.Library&version=1.1.52

CRM Scraper

CRM Scraper is a .NET library for scraping CRM (Customer Relationship Management) systems and extracting their data. It handles static websites through HTML parsing and dynamic, JavaScript-rendered websites through Playwright.

Features

  • HTML Parsing: Scrape static websites using HtmlAgilityPack for extracting structured data.
  • Dynamic Content Scraping: Scrape JavaScript-heavy websites using Playwright to render dynamic content.
  • Extensible API: Built with flexibility in mind, allowing users to extend the scraper as per their use case.
  • Retry Mechanism: Built-in retry mechanism with exponential backoff for failed requests.
  • Concurrent Scraping: Supports concurrent scraping tasks to speed up large-scale data extraction.
  • Unit Tests: Extensive test coverage using xUnit for core functionalities.
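
The retry feature listed above can be pictured with a minimal sketch. This is not the library's implementation: `RetrySketch`, `RetryAsync`, and their parameters are hypothetical names chosen for illustration, and the choice to retry only on `HttpRequestException` is an assumption.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Hypothetical helper, not the library's API: runs `action`, retrying on
    // HttpRequestException and doubling the delay after each failed attempt.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> action, int maxAttempts = 3, int baseDelayMs = 200)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // Delay grows as baseDelayMs * 2^(attempt - 1): 200, 400, 800, ...
                int delayMs = baseDelayMs * (1 << (attempt - 1));
                await Task.Delay(delayMs);
            }
        }
    }
}
```

Once the final attempt fails, the exception filter no longer matches and the exception propagates to the caller, so failures are never silently swallowed.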

Project Structure

.
├── ScraperConsoleApp                # Console application to manually test the library
├── src
│   ├── CRMScraper.Library           # Main library containing scraper logic
│   └── CRMScraper.Tests             # Unit tests for the library
├── TestResults                      # Test result artifacts, including coverage reports
├── .github                          # GitHub Actions for CI/CD
└── scraping_service_library_net.sln # Solution file

Library Components

  • ScraperClient: Core scraping logic that handles page requests, both static and dynamic.
  • ScraperTaskExecutor: Manages the execution of scraping tasks concurrently.
  • PageElementsExtractor: Service that handles the extraction of JavaScript and API links from the page.
  • ScraperHelperService: Provides helper methods such as retry logic for scraping.
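
To make the idea of extracting JavaScript and API links concrete, here is a rough sketch using only regular expressions from the base class library. It does not reflect how PageElementsExtractor works internally: `LinkExtractionSketch` and its methods are hypothetical, and treating any quoted URL or path that contains "/api/" as an API link is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkExtractionSketch
{
    // Matches the src attribute of <script> tags (single or double quotes).
    private static readonly Regex ScriptSrc = new Regex(
        "<script[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Matches quoted URLs or paths containing "/api/" (an assumed convention).
    private static readonly Regex ApiUrl = new Regex(
        "[\"']((?:https?://[^\"'\\s]+)?/api/[^\"'\\s]*)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the distinct script URLs referenced by the page.
    public static List<string> ExtractScriptLinks(string html) =>
        ScriptSrc.Matches(html).Select(m => m.Groups[1].Value).Distinct().ToList();

    // Returns the distinct API-looking URLs found anywhere in the page source.
    public static List<string> ExtractApiLinks(string html) =>
        ApiUrl.Matches(html).Select(m => m.Groups[1].Value).Distinct().ToList();
}
```

Regular expressions are brittle on malformed markup; a real HTML parser such as HtmlAgilityPack is the more robust choice for production extraction.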

Getting Started

Prerequisites

  • .NET 8 SDK or later
  • Playwright (for dynamic content scraping)

Installing

  1. Clone the repository:

    git clone https://github.com/yourusername/scraping_service_library_net.git
    cd scraping_service_library_net
    
  2. Restore dependencies:

    dotnet restore
    
  3. Build the project:

    dotnet build --configuration Release
    
  4. Run the console application:

    cd ScraperConsoleApp
    dotnet run
    

Running Tests

The project uses xUnit for unit tests and coverlet for code coverage. To run the tests and generate a coverage report:

dotnet test --configuration Release --collect:"XPlat Code Coverage" --results-directory TestResults/ --logger "trx;LogFileName=TestResults.trx"

CI/CD

This project uses GitHub Actions for continuous integration and deployment. The CI pipeline performs the following tasks:

  • Build the project
  • Run unit tests with code coverage
  • Generate a NuGet package and upload it as an artifact

The .github/workflows/dotnet-ci.yml file defines the build and test steps.

Creating a NuGet Package

To create a NuGet package, use the following command:

dotnet pack --configuration Release --output ./nupkgs

Usage

Add the CRMScraper.Library package to your project, then create a ScraperClient and scrape a page:

using System;
using System.Net.Http;
using CRMScraper.Library;
using CRMScraper.Library.Core;

// Dispose the HttpClient when the program ends.
using var httpClient = new HttpClient();
var scraperClient = new ScraperClient(httpClient, new PageElementsExtractor());

// Fetch the page and print its HTML.
var result = await scraperClient.ScrapePageAsync("https://example.com");
Console.WriteLine(result.HtmlContent);
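
When several pages need scraping, one possible approach is to fan out the calls while capping how many run at once. The sketch below is an illustration only and is not the library's ScraperTaskExecutor: `ConcurrentScrapeSketch`, `ScrapeAllAsync`, and `maxParallel` are hypothetical names, and the `fetch` delegate stands in for whatever call performs the scrape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentScrapeSketch
{
    // Hypothetical helper: runs `fetch` for every URL with at most
    // `maxParallel` calls in flight, returning results in input order.
    public static async Task<TResult[]> ScrapeAllAsync<TResult>(
        IEnumerable<string> urls, Func<string, Task<TResult>> fetch, int maxParallel = 4)
    {
        using var gate = new SemaphoreSlim(maxParallel);

        var tasks = urls.Select(async url =>
        {
            await gate.WaitAsync();          // wait for a free slot
            try { return await fetch(url); }
            finally { gate.Release(); }      // free the slot even on failure
        }).ToList();                         // ToList starts each task exactly once

        return await Task.WhenAll(tasks);
    }
}
```

With the client from the example above, a call might look like `await ConcurrentScrapeSketch.ScrapeAllAsync(urls, url => scraperClient.ScrapePageAsync(url))`, assuming ScrapePageAsync returns a task of its result type as the usage example suggests.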

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue. For larger changes, feel free to fork the repository and submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Compatible and additional computed target framework versions

.NET: net8.0 is compatible. The following were computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version   Downloads   Last updated
1.1.95    139         9/18/2024
1.1.92    104         9/18/2024
1.1.89    103         9/18/2024
1.1.84    96          9/18/2024
1.1.79    108         9/18/2024
1.1.65    91          9/17/2024
1.1.58    92          9/17/2024
1.1.52    94          9/17/2024