Pyther.Parser 0.2.2

dotnet add package Pyther.Parser --version 0.2.2
NuGet\Install-Package Pyther.Parser -Version 0.2.2
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Pyther.Parser" Version="0.2.2" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Pyther.Parser --version 0.2.2
#r "nuget: Pyther.Parser, 0.2.2"
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install Pyther.Parser as a Cake Addin
#addin nuget:?package=Pyther.Parser&version=0.2.2

// Install Pyther.Parser as a Cake Tool
#tool nuget:?package=Pyther.Parser&version=0.2.2

CSV File Reader/Writer

  • namespace Pyther.Parser.CSV;

Features

  • easy to use
  • extreme fast (take a look at the end of this document)
  • lightweight code
  • can handle csv files of any size with minimal memory footprint
  • can transform rows into associative records, dynamic objects or objects of any class
  • lots of optional settings

Quickstart

Read CSV Files

One of many ways to read CSV files:

using Pyther.Parser.CSV;

var csv = new CSVReader();

foreach (var record in csv.ReadRecordFromPath(@"C:\orders.csv"))
{
    Console.WriteLine(record["customer-lastname"].ToString());
}

Write CSV Files

One of many ways to write CSV files:

using Pyther.Parser.CSV;

var csv = new CSVWriter(@"C:\heroes.csv");

csv.Headers.Add("Name", "FirstName", "LastName", "Height", "Remarks");
csv.Write("Parker, Peter", "Peter", "Parker", 175.3, "Arguments example");

CSVReader

Example Data

Lets take a csv like this as an example:

alternate text is missing from this package README image order.csv

We will discover 4 different ways to parse this file.

1st Way - Read as Row

  • returns a System.Collections.Generic.List of object per entry
  • access by index
var csv = new CSVReader();

foreach (List<object> row in csv.ReadRowFromPath(@"C:\orders.csv"))
{
    string remark = row[5].ToString() ?? "";
    string lastname = row[2].ToString() ?? "";
}

2nd Way - Read as Record

  • returns a Pyther.Parser.CSV.Record per entry
  • access by index or column header name
var csv = new CSVReader();

foreach (Record row in csv.ReadRecordFromPath(@"C:\orders.csv"))
{
    string remark = row[5].ToString() ?? "";
    string lastname = row["customer-lastname"].ToString() ?? "";
}
  • if you only need to access the record by header name or index, you can use one of these values as the second paramater:
    • RecordFlags.Indexed ... allow access by index only
    • RecordFlags.Associative ... allow access by name only
    • RecordFlags.Both ... allow access by index and name (default)

3rd Way - Read as Dynamic Object

  • returns a dynamic object
  • HeaderTransformMethod defines how the column name should be transformed
    • TransformMethods.KebabCaseToTitleCase: customer-lastnameCustomerName
  • access by object property (named from colum header)
var csv = new CSVReader(new Settings()
{
    HeaderTransformMethod = TransformMethods.Auto
});

foreach (dynamic obj in csv.ReadDynamicFromPath(@"C:\orders.csv"))
{
    string remark = obj.Remark.ToString() ?? "";
    string lastname = obj.CustomerLastname.ToString() ?? "";
}
  • Hint: This is the slowest way

4th Way - Read as Object

lets say we have an object like

public class Order
{
    public string? OrderId { get; set; }
    public string? CustomerFirstname { get; set; }
    public string? CustomerLastname { get; set; }
    public string? CustomerPhone { get; set; }
    public DateTime DateOfPurchase { get; set; }
    public string? Remark { get; set; }
}

we can read each line transformed to this object:

var csv = new CSVReader(new Settings()
{
    HeaderTransformMethod = TransformMethods.Auto
});

foreach (var obj in csv.ReadObjectFromPath<Order>(@"C:\orders.csv"))
{
    string remark = obj.Remark;
    string lastname = obj.CustomerLastname;
}
  • You can also populate an existing object by using it as second argument. This way you can also recycle an object to improve performance.
Order myOrder = new Order();
foreach (var _ in csv.ReadObjectFromPath(@"C:\orders.csv", myOrder))
{
    Console.WriteLine($"{csv.RowId,3} | {myOrder.OrderId} / {myOrder.DateOfPurchase}");
}

CSVWriter

  • no matter what way you choose. After initialization you don't have to care about enclosures, delimters, escapes, aso.

Create the csv object

var csv = new CSVWriter(@"C:\heroes.csv");
Defining the Column Headers
  • by default headers are optional, but are required if you want to write (dynamic) objects
  • let create the following headers: Name, FirstName, LastName, Height and Remarks:
csv.Headers.Add("Name", "FirstName", "LastName", "Height", "Remarks");
// or
csv.Headers.
    Add("Name").
    Add("FirstName").
    Add("LastName").
    Add("Height").
    Add("Remarks");
// or
csv.Headers.
    Add("Name", "FirstName").
    Add("LastName").
    Add("Height", "Remarks");

after that you can write the headers to the file. If you skip this, they will be written at the time the first record was written.

csv.WriteHeader();

1st Way - Write from Arguments

  • all parameters are object
csv.Write("Parker, Peter", "Peter", "Parker", 175.3, "Arguments example");

2nd Way - Write from Lists

var row = new List<object>{"Parker, Peter", "Peter", "Parker", 175.3, "List of objects example" };
csv.Write(row);

or

var row = new List<string>{"Parker, Peter", "Peter", "Parker", "175.3", "List of string example" };
csv.Write(row);

3rd Way - Write from Records

  • Hint: on all cases, the ordering doesn't matter
  1. from associative records
var rec = new Record();
rec["Name"] = "Parker, Peter";
rec["FirstName"] = "Peter";
rec["Height"] = 175.3;
rec["LastName"] = "Parker";
rec["Remarks"] = "List of mixed Record example";

csv.Write(rec);
  1. from indexed records
var rec = new Record();
rec[0] = "Parker, Peter";
rec[1] = "Peter";
rec[2] = "Parker";
rec[3] = 175.3;
rec[4] = "List of indexed Record example";

csv.Write(rec);
  1. or mixed
var rec = new Record();
rec["Name"] = "Parker, Peter";
rec["FirstName"] = "Peter";
rec[2] = "Parker";
rec["Height"] = 175.3;
rec["Remarks"] = "List of mixed Record example";

csv.Write(rec);

Performance Hint: if you know the amount of columns upfront (what is almost always the case), you should give this information as the first constructor argument:

var rec = new Record(5);
...

4th Way - Write from Dynamic Objects

You can also write dynamic objects

dynamic obj = new ExpandoObject();
obj.Name = "Parker, Peter";
obj.FirstName = "Peter";
obj.LastName = "Parker";
obj.Height = 175.3;
obj.Remarks = "dynamic object example";

csv.WriteDynamic(obj);

5th Way - Write from Custom Objects

Lets say we have the following Person class

class Person
{
    public string? FirstName { get; set; }
    public string? LastName { get; set; }
    public string? Name => LastName + ", " + FirstName;
    public double Height { get; set; }
    public string? Remarks { get; set; }
}

with the following example data

var person = new Person()
{
    FirstName = "Peter",
    LastName = "Parker",
    Height = 175.3,
    Remarks = "custom object example"
};

we can simply write it the following way

csv.Write(person);

6th Way - Write Custom Objects using nested paths

Lets say we have two model class:

class Order
{
    public string? Id { get; set; }
    public Address? Billing { get; set; }
    public Address? Shipping { get; set; }
}

class Address
{
    public string? FirstName { get; set; }
    public string? LastName { get; set; }
    public string? Company { get; set; }
}

with the following example data

Order order = new()
{
    Id = "123",
    Shipping = new Address()
    {
        FirstName = "Peter",
        LastName = "Parker",
        Company = "Marvel"
    }
};

and we have the following csv headers:

csv.Headers
    .Add("Id")
    .Add("Billing.FirstName", "Billing.LastName", "Billing.Company")
    .Add("Shipping.FirstName", "Shipping.LastName", "Shipping.Company");

we can simply write it the following way

csv.WriteNested(order);

and we get the follow result (remember order.Billing was not set)

Id,Billing.FirstName,Billing.LastName,Billing.Company,Shipping.FirstName,Shipping.LastName,Shipping.Company

123,,,,Peter,Parker,Marvel

Settings

You can affect the way how the csv file will be parsed using a Pyther.Parser.CSV.Settings object as a constrructor parameter:

var csv = new CSVReader(new Settings()
{
    ...
});

Encoding

  • Defines the file encoding.
  • type: Encoding
  • default: Encoding.UTF8

BufferSize

  • The buffer size used to read the file.
  • type: int
  • default: 1MB

RecordSeparator

  • Defines how the records are separated.
  • type: string
  • default: Environment.NewLine

Delimeter

  • Defines how the columns/cells are separated.
  • type: string
  • default: ,

Enclosure

  • Defines the field enclosure.
  • type: string
  • default: "

ForceEnclosure

  • Should values always be enclosed?
  • type: bool
  • default: false

Escape

  • An optional escape symbol.
  • type: string
  • default: \

EscapeByDoubling

  • Enable/Disable escaping using the enclosure symbol twice.
  • type: bool
  • default: false

HasHeaders

  • Does the CSV contain headers?
  • type: bool
  • default: true

HeaderTransformMethod

  • Callback method to transform column header names.
  • type: Func<string, string>? (string) → string
  • default: null

CellTransformMethod

  • Callback method to transform cell content.
  • type: Func<object, int, string?, object>? (cell data, column index, column name) → object
  • default: null

AutoTrim

  • Auto trim cell values?
  • type: bool
  • default: true

IgnoreEmptyLines

  • Ignore empty lines?
  • type: bool
  • default: true

FormatProvider

  • Format provider used when writing data (null means current culture)
  • type IFormatProvider
  • default: CultureInfo.InvariantCulture

ErrorToManyColumns

  • How to handle the error, if there are more column headers than record cells.
  • type: ErrorHandling
  • default: ErrorHandling.TryToSolve
  • values: Ignore, TryToSolve orThrow

ErrorToFewColumns

  • How to handle the error, if there are less column headers than record cells.
  • type: ErrorHandling
  • default: ErrorHandling.TryToSolve
  • values: Ignore, TryToSolve orThrow

ErrorInvalidClassProperty

  • How to handle the error, if a property doesn't exists in the object.
  • type: ErrorHandling
  • default: ErrorHandling.TryToSolve
  • values: Ignore, TryToSolve orThrow

How can I ...

... limit the result

Since the Read....() method returns an IEnumerable, you can use all methods they define. This include the Take() , Skip(), Where() aso. :

// skip 2 and get 5 records
foreach (var obj in csv.ReadRecord(@"C:\orders.csv").Skip(2).Take(5))
{
    ...
}

... transform cell data during parsing

First create a callback method, that is called for each cell:

  • Arguments
    • data ... raw cell data
    • columnIndex ... the index of the column of the current cell
    • columnName ... If headers are given, this argument will hold the column name of the current cell
  • Return
    • This method has to return final cell data
private static object MyCellTransform(object data, int columnIndex, string? columnName)
{
    switch (columnName)
    {
        case "DateOfPurchase":
            return DateTime.Parse((string)data).ToUniversalTime();
        default:
            return data;
    }
}

Set the method in the settings


var csv = new CSVReader(new Settings()
{
    ...
    CellTransformMethod = MyCellTransform
});```

Performance

  • Test System
    • Intel Core i5 13600KF 14x 5.1 GHz
    • 32GB DDR4-RAM PC-3600
    • NVME M.2 SSD 1TB Kingston KC3000
    • Windows 11 Pro 64-Bit

Scenario 1 (typical case)

  • Test Scenario
    • real world data (shop orders)
    • 80 col x 100k rows = 8 Mio cells
    • ~75 MB
    • Average of 5 iterations
ReadRow ReadRecord ReadRecord (indexed) ReadDynamic ReadObject
Time in Seconds 0.7918 0.8926 0.7074 3.7348 1.598
Cells per Seconds 10.10 Mio 8.96 Mio 11.31 Mio 2..14 Mio 5.01 Mio

Scenario 2 (optimized case)

  • Test Scenario
    • same as above using amazon orders, with the following options:
      • Enclosure = null
      • Delimeter = '\t'
      • Escape = null
      • RecordSeparator = Environment.NewLine
    • these options will use the fast path for parsing
ReadRow ReadRecord ReadRecord (indexed) ReadDynamic ReadObject
Time in Seconds 0.312 0.472 0.320 3.011 0.968
Cells per Seconds 25.64 Mio 16.95 Mio 25 Mio 2.66 Mio 8.26 Mio

Open Topics

  • CSVWriter: Allow Header alias or Transform
  • CSVWriter: ErrorToManyColumns, ErrorToFewColumns, ErrorInvalidClassProperty
Product Compatible and additional computed target framework versions.
.NET net6.0 is compatible.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 was computed.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.
  • net6.0

    • No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
0.2.2 178 12/29/2023
0.2.1 109 12/27/2023
0.2.0 97 12/27/2023
0.1.0 116 12/10/2023