RuiJi.Net.Core
1.0.2-beta
A newer version of this package is available. See the version list below for details.
dotnet add package RuiJi.Net.Core --version 1.0.2-beta
NuGet\Install-Package RuiJi.Net.Core -Version 1.0.2-beta
<PackageReference Include="RuiJi.Net.Core" Version="1.0.2-beta" />
paket add RuiJi.Net.Core --version 1.0.2-beta
#r "nuget: RuiJi.Net.Core, 1.0.2-beta"
// Install RuiJi.Net.Core as a Cake Addin
#addin nuget:?package=RuiJi.Net.Core&version=1.0.2-beta&prerelease
// Install RuiJi.Net.Core as a Cake Tool
#tool nuget:?package=RuiJi.Net.Core&version=1.0.2-beta&prerelease
RuiJi.Net Documentation
http://www.ruijihg.com/archives/ruijinet/getting-started
RuiJi.Net.Core Sample
crawl automatically using the local IP
var crawler = new RuiJiCrawler();
var request = new Request("https://www.baidu.com");
var response = crawler.Request(request);
crawl with a specific IP
var crawler = new RuiJiCrawler();
var request = new Request("https://www.baidu.com");
request.Ip = "192.168.31.196";
var response = crawler.Request(request);
crawl with proxy
var crawler = new RuiJiCrawler();
var request = new Request("https://www.baidu.com");
request.Proxy = new RequestProxy("223.93.172.248", 3128);
var response = crawler.Request(request);
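For comparison, here is a minimal sketch of what routing a request through a proxy looks like with the plain .NET `HttpClient` stack. It is not RuiJi.Net code and makes no claim about how `RequestProxy` is implemented internally; `ProxySketch` and `CreateHandler` are hypothetical names introduced only for this illustration.

```csharp
using System.Net;
using System.Net.Http;

// Hypothetical helper, not part of RuiJi.Net.
public static class ProxySketch
{
    // Builds an HttpClientHandler that sends every request through the given proxy.
    public static HttpClientHandler CreateHandler(string host, int port)
    {
        return new HttpClientHandler
        {
            Proxy = new WebProxy(host, port), // proxy address and port
            UseProxy = true                   // make sure the proxy is actually used
        };
    }
}
```

A caller would pass the handler to `new HttpClient(ProxySketch.CreateHandler("223.93.172.248", 3128))` and then issue requests as usual.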
extract url
var crawler = new RuiJiCrawler();
var request = new Request("https://www.oschina.net/blog");
var response = crawler.Request(request);
var content = response.Data.ToString();
var parser = new RuiJiParser();
var eb = parser.ParseExtract("css a.blog-title-link[href]\nexp https://my.oschina.net/*/blog/*");
var result = RuiJiExtracter.Extract(content, eb.Block);
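The `exp` line above filters the extracted links with a `*` wildcard pattern. As a rough illustration of that kind of filter, the sketch below converts a wildcard pattern into an anchored regular expression. It is an assumption about the general technique, not RuiJi.Net's actual matcher: `WildcardSketch` is a hypothetical name, and here `*` matches any run of characters, including `/`, which may differ from the library's behavior.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of RuiJi.Net.
public static class WildcardSketch
{
    // Converts a '*' wildcard pattern into an anchored, case-insensitive regex.
    public static Regex ToRegex(string pattern)
    {
        // Escape regex metacharacters first, then turn each escaped '*' into ".*".
        var escaped = Regex.Escape(pattern).Replace("\\*", ".*");
        return new Regex("^" + escaped + "$", RegexOptions.IgnoreCase);
    }

    // Returns true when the whole URL matches the wildcard pattern.
    public static bool IsMatch(string pattern, string url)
    {
        return ToRegex(pattern).IsMatch(url);
    }
}
```

With this sketch, `WildcardSketch.IsMatch("https://my.oschina.net/*/blog/*", "https://my.oschina.net/zhupingqi/blog/1826317")` is true, while the listing page `https://www.oschina.net/blog` does not match.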
extract tile
var crawler = new RuiJiCrawler();
var request = new Request("http://www.ruijihg.com/archives/category/tech/bigdata");
var response = crawler.Request(request);
var content = response.Data.ToString();
var parser = new RuiJiParser();
var eb = parser.ParseExtract(@"[tile]
css article:html
[meta]
#title
css .entry-header:text
#summary
css .entry-header + p:text
ex /Read more »/ -e");
var result = RuiJiExtracter.Extract(content, eb.Block);
extract meta
var crawler = new RuiJiCrawler();
var request = new Request("https://my.oschina.net/zhupingqi/blog/1826317");
var response = crawler.Request(request);
var content = response.Data.ToString();
var parser = new RuiJiParser();
var eb = parser.ParseExtract(@"[meta]
#title
css h1.header:text
#author
css div.blog-meta .avatar + span:text
#date
css div.blog-meta > div.item:first:text
regS /发布于/ 1
#words_i
css div.blog-meta > div.item:eq(1):text
regS / / 1
#content
css #articleContent:html");
var result = RuiJiExtracter.Extract(content, eb.Block);
detect MIME type
var crawler = new RuiJiCrawler();
var request = new Request("http://img10.jiuxian.com/2018/0111/cd51bb851410404388155b3ec2c505cf4.jpg");
var response = crawler.Request(request);
var ex = response.Extensions;
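How RuiJi.Net fills `response.Extensions` is not documented here. One common way to detect a content type without trusting the URL or headers is to inspect the leading "magic bytes" of the payload. The sketch below shows that technique for a few well-known signatures; `MimeSniffSketch` is a hypothetical name and the signature list is deliberately short.

```csharp
// Hypothetical helper, not part of RuiJi.Net.
public static class MimeSniffSketch
{
    // Guesses a MIME type from the first bytes of the payload.
    public static string Detect(byte[] data)
    {
        if (StartsWith(data, 0xFF, 0xD8, 0xFF)) return "image/jpeg";            // JPEG
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "image/png";       // .PNG
        if (StartsWith(data, 0x47, 0x49, 0x46, 0x38)) return "image/gif";       // GIF8
        if (StartsWith(data, 0x25, 0x50, 0x44, 0x46)) return "application/pdf"; // %PDF
        return "application/octet-stream";                                      // unknown
    }

    // Returns true when data begins with the given signature bytes.
    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (var i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

For the JPEG URL in the sample above, a sniffer like this would be expected to report `image/jpeg` once the response bytes are available.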
run js
var request = new Request("https://gitee.com/zhupingqi/RuiJi.Net");
request.RunJS = true;
var crawler = new RuiJiCrawler();
var response = crawler.Request(request);
cookie
var request = new Request("https://gitee.com/zhupingqi/RuiJi.Net");
request.Cookie = "xxxxxx";
var crawler = new RuiJiCrawler();
var response = crawler.Request(request);
More Features
Please visit my GitHub or my website:
http://www.ruijihg.com/archives/ruijinet/getting-started
RuiJi.Net Cluster is waiting for you
RuiJi.Net is a distributed crawler framework for .NET, written in C#. Major features include a distributed crawler, a distributed extractor, and managed cookies. It supports IP polling using the server's public network addresses as well as proxy servers.
RuiJi.Net has an extraction model called RuiJi Expression, which divides a web page into blocks, tiles, and metas. You can extract a web page with a RuiJi Expression and save the expression to a text file or a database.
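To make the layout of an expression more concrete, the sketch below groups the lines of a `[meta]` section by their `#name` headers, as seen in the samples above. It is only an illustration of the text format as it appears in those samples, not a substitute for `RuiJiParser`; `ExpressionSketch` and `ReadMeta` are hypothetical names, and the real parser may treat sections and selectors differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of RuiJi.Net.
public static class ExpressionSketch
{
    // Maps each "#name" in the [meta] section to the selector lines that follow it.
    public static Dictionary<string, List<string>> ReadMeta(string expression)
    {
        var fields = new Dictionary<string, List<string>>();
        List<string> current = null;
        var inMeta = false;

        var lines = expression.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0) continue;

            if (line.StartsWith("[", StringComparison.Ordinal) && line.EndsWith("]", StringComparison.Ordinal))
            {
                // A section header such as [tile] or [meta] starts a new section.
                inMeta = line == "[meta]";
                current = null;
            }
            else if (inMeta && line.StartsWith("#", StringComparison.Ordinal))
            {
                // "#title" starts a new meta field named "title".
                current = new List<string>();
                fields[line.Substring(1)] = current;
            }
            else if (current != null)
            {
                // Any other line is a selector belonging to the current field.
                current.Add(line);
            }
        }
        return fields;
    }
}
```

Applied to the "extract meta" expression above, this sketch would yield the fields `title`, `author`, `date`, `words_i`, and `content`, each with its one or two selector lines.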
RuiJi.Net has more features, including extraction rule matching by URL wildcard and page feature, paged extraction, URL functions, a cookie manager and cookie channels, and many selectors for cleaning data.
If you like it, please star my project. It will give me more motivation to improve it.
Product | Versions |
---|---|
.NET Framework | net46 net461 net462 net463 net47 net471 net472 net48 net481 |
Dependencies
- CsQuery (>= 1.3.4)
- DiffPlex (>= 1.2.1)
- ICSharpCode.SharpZipLib.dll (>= 0.85.4.369)
- log4net (>= 1.2.10)
- Newtonsoft.Json (>= 6.0.4)
- PhantomJS (>= 2.1.1)
- RestSharp (>= 106.2.2)
- UDE.CSharp (>= 1.1.0)
- UniversalTypeConverter (>= 1.0.5)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
1.2.2 | 556 | 8/21/2020 |
1.2.1 | 561 | 8/21/2020 |
1.2.0 | 487 | 6/22/2020 |
1.1.9 | 462 | 6/22/2020 |
1.1.8 | 426 | 6/22/2020 |
1.1.7 | 802 | 10/8/2018 |
1.1.6 | 742 | 10/6/2018 |
1.1.5 | 716 | 10/5/2018 |
1.1.4 | 742 | 10/4/2018 |
1.1.3 | 697 | 9/25/2018 |
1.1.1 | 699 | 9/25/2018 |
1.1.0 | 719 | 9/20/2018 |
1.0.6-beta | 599 | 7/30/2018 |
1.0.5-beta | 732 | 7/17/2018 |
1.0.4-beta | 636 | 7/7/2018 |
1.0.3-beta | 710 | 7/4/2018 |
1.0.2-beta | 765 | 6/25/2018 |
1.0.1-beta | 721 | 6/22/2018 |