A .NET library to read and write Apache Parquet files.
Install-Package Parquet.Net -Version 1.3.0
dotnet add package Parquet.Net --version 1.3.0
<PackageReference Include="Parquet.Net" Version="1.3.0" />
paket add Parquet.Net --version 1.3.0
- .NET byte and sbyte types are supported
- DataSet has a new .Merge method that merges two datasets, even if their rows and columns are incompatible
- the dependency on Snappy.Sharp, which conflicted with projects targeting .NET 4.5, has been removed completely
- dependency on System.ValueTuple is removed
- the Apache Thrift dependency was replaced by a custom build that has zero downstream dependencies. The original dependency caused problems for projects using ASP.NET Core with specific Kestrel versions because, surprisingly, Apache Thrift referenced a web hosting framework.
- INT64 (C# long) type is supported (#194)
- Decimal data type is fully supported (#209). This includes support for simple System.Decimal and for decimal types with different scales and precisions. Decimals are encoded using all three encodings from the Parquet specification; this can be switched off for compatibility with older systems. Decimals are fully compatible with Hive and Impala, which have some edge cases that do not comply with the Parquet specification. Thanks to @dmitryPavliv and @nzapolski for making this possible.
- fixed a flaw in dictionary encoding implementation affecting files written for AWS Impala (#193)
- fixed a crash that occurred when a column contained only a single value and that value was null (#198)
- Reader supports nested structures.
- Parquet output is now compatible with AWS Athena
- Writer can append data to existing file
- Parquet metadata sets page sizes according to standard
- Schema and SchemaElement have a Show method that returns a human-readable representation
- files that mix encodings between data pages no longer produce wrong row counts or occasional crashes (#183)
- string-encoded fields that are not annotated properly no longer crash the Parquet.Net reader (#138)
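
The decimal item above (#209) mentions several encodings chosen by precision and scale. As a rough illustration of what the Parquet specification describes for DECIMAL, the sketch below picks a physical type from the precision and converts a System.Decimal to its unscaled, big-endian two's-complement form. It uses only the .NET base class library; the class and method names (DecimalEncodingSketch, PhysicalTypeFor, ToUnscaled, ToBigEndianBytes) are hypothetical and are not part of the Parquet.Net API, and this is not necessarily how Parquet.Net implements it.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of Parquet.Net.
public static class DecimalEncodingSketch
{
    // Smallest physical type the Parquet specification permits for a
    // DECIMAL of the given precision (total number of decimal digits).
    public static string PhysicalTypeFor(int precision)
    {
        if (precision < 1) throw new ArgumentOutOfRangeException(nameof(precision));
        if (precision <= 9) return "INT32";
        if (precision <= 18) return "INT64";
        return "FIXED_LEN_BYTE_ARRAY";
    }

    // Converts a decimal to its unscaled integer: value * 10^scale.
    // Throws if the value has more fractional digits than the scale allows.
    public static BigInteger ToUnscaled(decimal value, int scale)
    {
        if (scale < 0 || scale > 28) throw new ArgumentOutOfRangeException(nameof(scale));
        decimal factor = 1m;
        for (int i = 0; i < scale; i++) factor *= 10m;

        decimal scaled = value * factor;
        if (scaled != decimal.Truncate(scaled))
            throw new ArgumentException("Value has more fractional digits than the scale.");
        return new BigInteger(scaled);
    }

    // Big-endian two's-complement bytes, sign-extended to a fixed length,
    // which is the layout used for byte-array backed decimals.
    public static byte[] ToBigEndianBytes(BigInteger unscaled, int length)
    {
        byte[] littleEndian = unscaled.ToByteArray(); // minimal, little-endian
        if (littleEndian.Length > length)
            throw new OverflowException("Value does not fit in the requested length.");

        byte pad = unscaled.Sign < 0 ? (byte)0xFF : (byte)0x00;
        byte[] result = new byte[length];
        for (int i = 0; i < length; i++) result[i] = pad;
        for (int i = 0; i < littleEndian.Length; i++)
            result[length - 1 - i] = littleEndian[i];
        return result;
    }
}
```

For example, under these assumptions 123.45 with scale 2 has the unscaled value 12345, and a precision of 5 fits in INT32; a precision of 20 would need a byte-array representation.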
The top GitHub repository that depends on Parquet.Net:
ML.NET is an open source and cross-platform machine learning framework for .NET.