Axis.Dia 0.0.1

.NET CLI:
dotnet add package Axis.Dia --version 0.0.1

Package Manager:
NuGet\Install-Package Axis.Dia -Version 0.0.1
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="Axis.Dia" Version="0.0.1" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add Axis.Dia --version 0.0.1

Script & Interactive:
#r "nuget: Axis.Dia, 0.0.1"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

Cake:
// Install Axis.Dia as a Cake Addin
#addin nuget:?package=Axis.Dia&version=0.0.1

// Install Axis.Dia as a Cake Tool
#tool nuget:?package=Axis.Dia&version=0.0.1

Dia

Data representation specification and implementation, supporting both textual (human readable) and binary formats.

Contents

  1. Introduction
  2. Specification
  3. Annotation
  4. Bool
  5. Integer
  6. Decimal
  7. Instant
  8. String
  9. Symbol
  10. Clob
  11. Blob
  12. List
  13. Record
  14. Appendix (keywords, var-bytes, etc.)

<a id="Introduction"></a> Introduction

Dia is yet another data representation format, born from the need for more features than json offers; accordingly, it is a superset of json.

<a id="Specification"></a> Specification

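Dia recognizes 10 data types; annotations, a special use of the Symbol type that can be attached to any value, are listed alongside them as entry 0. Every type allows for the absence of a value, which is represented by null. The entries are:

  0. Annotation (not a type in its own right; see Annotation)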
  1. Bool
  2. Int
  3. Decimal
  4. Instant
  5. String
  6. Symbol
  7. Clob
  8. Blob
  9. List
  10. Record

Every dia value represents data of the corresponding type, or the absence of data. All values may also have an optional annotation list attached to them.

As already stated, Dia supports both a textual and a binary representation of values built from the above types. The following sections discuss each type and its representation in the 2 formats. Before proceeding, however, a general overview of the binary format is needed, since some concepts are shared among the types.

1. Data Packet

At the root of the Dia data model is the data packet: a list of 0 or more Dia values, each of which may be of any of the supported types. Data is transmitted, stored, and consumed in this form.

2. Binary Representation

In binary representation, every value is a self-contained entity, independent of other values: each value is represented by a series of bytes that can be read and interpreted without any ancillary information. The only exception to this rule is the symbol type, explained in detail later.

Generally speaking, the first byte of each Dia value, henceforth called the type-metadata, represents its type. Dia supports 10 distinct types, identified by the values 1 to 10 (0x1 to 0xA), which fit comfortably in the 4 bits reserved for the type identifier (up to 16 values). For values of the bool type, this single byte is enough to encapsulate the entire data.

The first 4 bits (index 0 - 3) are reserved to represent actual type identifiers, bit 5 (index 4) is reserved for indicating if annotations exist on the value, bit 6 (index 5) is reserved to indicate if the value is null (if set), while the remaining 2 bits (index 6, 7) are left for each type to use as it pleases.

Immediately after the type-metadata comes an optional byte group for annotations, followed by a byte group for the type's payload.

Dia binary packets are a sequential list of dia binary values.
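The type-metadata layout described above can be sketched as a small Python helper. This is illustrative only, not the Axis.Dia API: the class and constant names are hypothetical, and the sketch assumes bit index 0 is the least-significant bit, which matches the [7654 3210] bracket diagrams used throughout (e.g. annotated = [...1 ....]).

```python
from dataclasses import dataclass

# Type identifiers as listed in each type's "Type-Metadata" entry.
TYPE_NAMES = {
    0x1: "Bool", 0x2: "Int", 0x3: "Decimal", 0x4: "Instant", 0x5: "String",
    0x6: "Symbol", 0x7: "Clob", 0x8: "Blob", 0x9: "List", 0xA: "Record",
}
ANNOTATED_BIT = 0x10  # bit index 4
NULL_BIT = 0x20       # bit index 5


@dataclass(frozen=True)
class TypeMetadata:
    type_id: int
    annotated: bool = False
    is_null: bool = False
    custom: int = 0  # the 2 type-specific bits (index 6 and 7), value 0-3

    def to_byte(self) -> int:
        if self.type_id not in TYPE_NAMES:
            raise ValueError(f"unknown type id {self.type_id:#x}")
        if not 0 <= self.custom <= 3:
            raise ValueError("custom bits must be in the range 0-3")
        byte = self.type_id | (self.custom << 6)
        if self.annotated:
            byte |= ANNOTATED_BIT
        if self.is_null:
            byte |= NULL_BIT
        return byte

    @classmethod
    def from_byte(cls, byte: int) -> "TypeMetadata":
        type_id = byte & 0x0F  # low 4 bits: type identifier
        if type_id not in TYPE_NAMES:
            raise ValueError(f"unknown type id {type_id:#x}")
        return cls(type_id, bool(byte & ANNOTATED_BIT), bool(byte & NULL_BIT), byte >> 6)
```

For example, `TypeMetadata(0x2, custom=1).to_byte()` yields `0x42`, the Int16 metadata byte `[01.. 0010]` used in the Integer section.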

3. Text Representation

As stated previously, the textual representation of dia is a superset of json; it is essentially json + annotations + extra-data-types.

<a id="Annotation"></a> Annotation

An annotation is not a bona fide Dia type, but a special use of the Symbol type. An annotation tags a value with extra (possibly semantic) meaning, and its interpretation is usually left to the user of the data. Annotations come in 2 flavors:

  1. Tags, which are symbols in the identifier format, and cannot be equivalent to Dia keywords.

  2. Attributes (name/value pairs), written as <tag>@<text>, where <tag> is the name and <text> is the value. A symbol is called an attribute when it conforms to the pattern /^[a-zA-Z_](([.-])?[a-zA-Z0-9_])*@.+\z/.

Note: Annotations are never null (null symbols).

Text Representation

In textual format, annotations appear before the value they annotate, each terminated by the "::" delimiter. The following examples illustrate this:

valid::annotations::<dia-value> // valid
 
in valid::att-ributes::<dia-value> // invalid
  
'scale@metric'::'expiry-notation@false positive'::<dia-value> // valid
  
'genre@horror'::<dia-value> // valid
  
abc@xyz::<dia-value> // invalid
Binary Representation

Identical to Symbol, but preceded by a var-byte count representing the number of annotations that follow.

<a id="Bool"></a> 1. Bool

The Bool type represents typical boolean values: true, and false, in addition to null.

Text Representation

Bool values are represented by the case-insensitive "true" and "false", and the case-sensitive "null.bool".

Binary Representation
Type-Metadata Byte
  • [.... 0001] (0x1)
Description

The type-metadata byte is sufficient for encapsulating all instances of the bool value. So a bool value will be represented with only 1 byte.

  • true: [.1.. 0001]
  • false: [.0.. 0001]
  • null: [..1. 0001]
  • annotated: [...1 0001]
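Because a bool fits entirely in its type-metadata byte, encoding and decoding reduce to setting and testing single bits. The Python sketch below is a hypothetical illustration, not the Axis.Dia API; it uses the Bool type identifier 0x1 from the Type-Metadata entry, with the true flag in bit 6, null in bit 5 and annotated in bit 4.

```python
from typing import Optional

BOOL_TYPE_ID = 0x1
ANNOTATED_BIT = 0x10  # bit index 4
NULL_BIT = 0x20       # bit index 5
TRUE_BIT = 0x40       # bit index 6: set for true, clear for false


def encode_bool(value: Optional[bool], annotated: bool = False) -> int:
    """Encode a Dia bool (None meaning null) as its single type-metadata byte."""
    byte = BOOL_TYPE_ID
    if value is None:
        byte |= NULL_BIT
    elif value:
        byte |= TRUE_BIT
    if annotated:
        byte |= ANNOTATED_BIT  # the annotation byte group would follow this byte
    return byte


def decode_bool(byte: int) -> Optional[bool]:
    """Decode a bool type-metadata byte; returns None for null."""
    if (byte & 0x0F) != BOOL_TYPE_ID:
        raise ValueError("not a bool value")
    if byte & NULL_BIT:
        return None
    return bool(byte & TRUE_BIT)
```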

<a id="Int"></a> 2. Integer

The int type represents signed, unlimited mathematical integer values, in addition to null.

Text Representation

Integers come in 3 flavors:

  1. Decimal integers: represented as an optional negative sign, and a series of digits possibly separated by an underscore.
  2. Hex integers: represented by the prefix "0x", followed by an unlimited sequence of any valid hexadecimal characters (0-9, a-f, A-F), possibly separated by an underscore.
  3. Binary integers: represented by the prefix "0b", followed by an unlimited sequence of zeros and ones, possibly separated by an underscore.

Examples:
// decimal integers
0 // valid
01 // valid
1 // valid
000 // valid
34565454 // valid
343_454_53 // valid
_545 // invalid
4345_ // invalid

// hex integers
0x01 // valid
0X0 // valid
0x // invalid
0x000a // valid
0xA // valid
0xx // invalid
0x545yt345 // invalid

// binary integers
0b // invalid
0B0 // invalid
0b0000011010110 // valid
0b11010110 // valid
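The literal rules above can be captured with three regular expressions. The following Python sketch is a hypothetical validator, not the Axis.Dia parser; it assumes single underscores between digits, a '-' sign only on decimal integers, "0x"/"0X" for hex and only lowercase "0b" for binary (following the examples), and it accepts any run of 0 and 1 digits after "0b", as the prose describes.

```python
import re

# Hypothetical patterns inferred from the rules and examples above (not taken
# from Axis.Dia): single underscores may separate digits but may not lead or
# trail; only decimal literals take a '-' sign.
DECIMAL = re.compile(r"-?[0-9]+(?:_[0-9]+)*")
HEX = re.compile(r"0[xX][0-9a-fA-F]+(?:_[0-9a-fA-F]+)*")
BINARY = re.compile(r"0b[01]+(?:_[01]+)*")


def parse_int(text: str) -> int:
    """Return the value of a Dia integer literal, or raise ValueError."""
    if HEX.fullmatch(text):
        return int(text[2:].replace("_", ""), 16)
    if BINARY.fullmatch(text):
        return int(text[2:].replace("_", ""), 2)
    if DECIMAL.fullmatch(text):
        return int(text.replace("_", ""), 10)
    raise ValueError(f"not a valid Dia integer literal: {text!r}")
```

For example, `parse_int("0x000a")` returns 10 and `parse_int("343_454_53")` returns 34345453, while `"_545"` and `"0x545yt345"` raise `ValueError`.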
Binary Representation
Type-Metadata
  • [.... 0010] (0x2)
Custom-Metadata
  • annotated: [...1 0010]
  • null: [..1. 0010]
  • Int8: [00.. 0010]
  • Int16: [01.. 0010]
  • Int32: [10.. 0010]
  • Intxx: [11.. 0010]
Description

Integers require additional bytes besides the type-metadata for their data to be represented.

Dia supports unlimited/arbitrary-length integers, so special consideration is taken to cater for this.

Examples
  1. To represent the value '42' as an Int8 value, we need 2 bytes:
    • 00 [00.. 0010]
    • 01 [0010 1010]
  2. To represent the value '1228' as an Int16 value, we need 3 bytes:
    • 00 [01.. 0010]
    • 01 [1100 1100]
    • 02 [0000 0100]
  3. To represent the value '86443187' as an Int32 value, we need 5 bytes:
    • 00 [10.. 0010]
    • 01 [1011 0011]
    • 02 [0000 0100]
    • 03 [0010 0111]
    • 04 [0000 0101]
  4. To represent an arbitrarily large integer, e.g. '9223372036854775807000981123', as an Intxx value, we need 15 bytes (the type-metadata plus 14 var-byte data bytes):
    • 00 [11.. 0010]
    • 01 [1000 0011]
    • 02 [1101 1101]
    • 03 [1101 0000]
    • 04 [1010 0011]
    • 05 [1111 1100]
    • 06 [1111 1111]
    • 07 [1111 1111]
    • 08 [1111 1111]
    • 09 [1111 1111]
    • 10 [1111 1111]
    • 11 [1001 0011]
    • 12 [1110 1011]
    • 13 [1101 1100]
    • 14 [0000 0011]

See Var-byte in the Appendix for a fuller explanation of how var-bytes are represented.
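The four examples above are consistent with a simple rule: fixed-width values are little-endian, and Intxx data is a var-byte sequence whose first byte carries the least-significant 7 bits. The Python sketch below reproduces all four byte sequences. It is a hypothetical illustration, not the Axis.Dia encoder: it assumes the writer picks the smallest of Int8/Int16/Int32 that fits, using two's-complement bytes, and otherwise falls back to Intxx. How negative Intxx values are encoded is not described, so the sketch refuses them.

```python
INT_TYPE_ID = 0x2
# (custom bits for type-metadata bits 6-7, payload width in bytes)
FIXED_WIDTHS = [(0b00, 1), (0b01, 2), (0b10, 4)]  # Int8, Int16, Int32
INTXX = 0b11


def encode_varbyte(value: int) -> bytes:
    """7 data bits per byte, least-significant group first; bit 7 = 'more follows'."""
    if value < 0:
        raise ValueError("this var-byte sketch only handles non-negative values")
    out = bytearray()
    while True:
        group, value = value & 0x7F, value >> 7
        if value:
            out.append(group | 0x80)  # overflow bit: another byte follows
        else:
            out.append(group)
            return bytes(out)


def encode_int(value: int) -> bytes:
    """Encode a Dia int (type-metadata byte + payload) under the assumptions above."""
    for custom, width in FIXED_WIDTHS:
        limit = 1 << (8 * width - 1)
        if -limit <= value < limit:
            meta = INT_TYPE_ID | (custom << 6)
            return bytes([meta]) + value.to_bytes(width, "little", signed=True)
    if value < 0:
        raise NotImplementedError("encoding of large negative Intxx values is not described")
    return bytes([INT_TYPE_ID | (INTXX << 6)]) + encode_varbyte(value)
```

For example, `encode_int(1228)` gives the bytes `42 CC 04`, matching example 2.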

<a id="Decimal"></a> 3. Decimal

The Decimal type represents signed, unlimited mathematical floating point values, in addition to null.

Text Representation

Decimals come in 2 flavors:

  1. Regular decimal notation
  2. Scientific/exponent decimal notation

Examples:
// regular notation
0.0
123.0
-0.44544

// exponent notation
0.0e0
2.5E-1
2345.0E+06

With either notation, the fractional part of the number must always be present. The exponent notation is the regular notation followed by "E" (or "e"), an optional sign, and the exponent digits.
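A Python sketch of a validator for these two notations follows. The pattern is a hypothetical reading of the examples (optional '-', integer digits, a mandatory '.' with fractional digits, then an optional exponent); underscores and a leading '+' are assumed not to be allowed, since the text does not mention them. Python's `decimal.Decimal` is used only to hold the parsed value exactly.

```python
import re
from decimal import Decimal

# Hypothetical pattern: the fractional part is mandatory, the exponent optional.
DECIMAL_LITERAL = re.compile(r"-?[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?")


def parse_decimal(text: str) -> Decimal:
    """Return the exact value of a Dia decimal literal, or raise ValueError."""
    if not DECIMAL_LITERAL.fullmatch(text):
        raise ValueError(f"not a valid Dia decimal literal: {text!r}")
    return Decimal(text)
```

Note that a literal such as `1` is rejected here because, without a fractional part, it is an integer rather than a decimal.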

Binary Representation
Type-Metadata
  • [.... 0011] (0x3)
Custom-Metadata
  • annotated: [...1 0011]
  • null: [..1. 0011]
  • Decimal16: [00.. 0011]
  • BigDecimal: [01.. 0011]
Description

The binary representation for decimals comes in 2 flavors:

  1. A straightforward adaptation of the .NET binary representation of decimal (System.Decimal), i.e. 16 bytes of data.
  2. The custom format for big (arbitrary size) decimal values. BigDecimals are a tuple of an int (scale), and an arbitrary-length integer (significand) represented by a var-byte value.
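The BigDecimal tuple can be illustrated by splitting a value into its scale and significand, so that value = significand × 10^-scale. The Python sketch below shows only that decomposition, using the standard `decimal` module; the function names are hypothetical, and the byte layout of the scale is not sketched because the text only says it is an int. Exponent-notation inputs such as 2345.0E+06 yield a negative scale; whether Dia permits that is not stated.

```python
from decimal import Decimal


def to_scale_significand(value: Decimal) -> tuple[int, int]:
    """Split value into (scale, significand) with value == significand * 10**-scale."""
    if not value.is_finite():
        raise ValueError("NaN and infinity have no scale/significand form")
    sign, digits, exponent = value.as_tuple()
    significand = int("".join(map(str, digits)))
    return -exponent, (-significand if sign else significand)


def from_scale_significand(scale: int, significand: int) -> Decimal:
    """Rebuild the Decimal exactly (no context rounding) from its two parts."""
    digits = tuple(int(d) for d in str(abs(significand)))
    return Decimal((1 if significand < 0 else 0, digits, -scale))
```

For example, `-0.44544` splits into scale 5 and significand -44544; the significand is the arbitrary-length integer that would then be written as a var-byte value.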

<a id="Instant"></a> 4. Instant

The Instant type represents a timestamp, in addition to null. Instants can be written at various precisions; unspecified components assume default values (for example, an omitted time of day is 00:00:00.0).

Text Representation

In text, instants are represented in the format: <year>-<month>-<day><delimiter><hour>:<minute>:<second>.<sub-seconds><time-zone>.

  • Mandatory components: year, month, day, delimiter
  • Optional components: hour, minute, second, sub-seconds, time-zone. Optional components must be used in the order of their appearance: e.g., year-month-day-delimiter-second.sub-second is invalid. The exception is the time-zone component, which can appear independently of the presence of the other optional components.

Examples:
2023-02-13TZ // <year>-<month>-<day><delimiter><time-zone> -> 2023/02/13 00:00:00.0 +00:00
1993-09-27T12 +00:00 // <year>-<month>-<day><delimiter><hour><time-zone> -> 1993/09/27 12:00:00.0 +00:00
1993-09-27T12:31 // <year>-<month>-<day><delimiter><hour>:<minute> -> 1993/09/27 12:31:00.0
1993-09-27T12:31:08 // <year>-<month>-<day><delimiter><hour>:<minute>:<seconds> -> 1993/09/27 12:31:08.0
1993-09-27T12:31:08.0023319 // <year>-<month>-<day><delimiter><hour>:<minute>:<seconds>.<sub-second> -> 1993/09/27 12:31:08.0023319
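The ordering rule for the optional components maps naturally onto nested optional groups in a regular expression. The Python sketch below is a hypothetical parser, not the Axis.Dia one: it assumes 'T' is the only delimiter, two-digit month/day/hour/minute/second, at most 7 sub-second digits (one tick is 100 nanoseconds), and a time zone of 'Z' or ±hh:mm that may be preceded by whitespace, as in the second example.

```python
import re

# Nested optional groups: minute requires hour, second requires minute,
# sub-seconds require second; the time zone is independent of all of them.
INSTANT = re.compile(
    r"(?P<year>\d+)-(?P<month>\d{2})-(?P<day>\d{2})T"
    r"(?:(?P<hour>\d{2})(?::(?P<minute>\d{2})(?::(?P<second>\d{2})"
    r"(?:\.(?P<sub>\d{1,7}))?)?)?)?"
    r"(?:\s*(?P<tz>Z|[+-]\d{2}:\d{2}))?"
)


def parse_instant(text: str) -> dict:
    """Parse a textual instant into components, filling defaults for omitted ones."""
    match = INSTANT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid Dia instant: {text!r}")
    parts = match.groupdict()
    result = {name: int(parts[name] or 0)
              for name in ("year", "month", "day", "hour", "minute", "second")}
    # Sub-seconds are a decimal fraction of a second; pad to 7 digits to get ticks.
    result["ticks"] = int((parts["sub"] or "").ljust(7, "0"))
    result["tz"] = parts["tz"]
    if not (1 <= result["month"] <= 12 and 1 <= result["day"] <= 31
            and result["hour"] < 24 and result["minute"] < 60 and result["second"] < 60):
        raise ValueError(f"component out of range: {text!r}")
    return result
```

With these assumptions, `1993-09-27T12:31:08.0023319` yields 23319 ticks, and a string that skips the hour and minute (such as `1993-09-27T08.0023319`) is rejected.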
Binary Representation
Type-Metadata
  • [.... 0100] (0x4)
  • annotated: [...1 0100]
  • null: [..1. 0100]
Custom-Metadata

...

Description

The instant is made of 8 components (year, month, day, hour, minute, second, sub-seconds and time-zone), each with its own binary representation; the first 3 are mandatory. In practice, however, only 5 components are stored, because Hour, Minute and Second are packed into a single HMS component, and Month and Day into a single MD component.

Components include (in the order they appear in the data stream):

  1. Year
  2. MD (Month-Day)
  3. HMS (Hour-Minute-Seconds) (optional)
  4. Sub-seconds (optional)
  5. Time-zone (optional)
Year

The year is a special component. It represents an ever-increasing value, so a fixed amount of space cannot be reserved for it; instead, it is stored as a var-byte. Because var-bytes can store an arbitrary stream of bits, the year component also acts as a "bit-lender" for other components whose data overflows whole bytes. Specifically, the Day component borrows the 1 bit it needs from the first bit of the Year component, meaning a single right-shift of the bits restores the original value of the year.

  • Data type: var-byte
  • Capacity: variable
Month + Day

12 months require 4 bits to encapsulate, while 31 days require 5 bits. Together, this is a byte and one bit. As stated above, the single bit is borrowed from the first bit-position of the year component.

  • Data type: byte
  • Capacity: 1.125
  • Arrangement:
    • Month: 00 [.... xxxx]
    • Day: 00 [xxxx ....], M0 [.... ...x] (M0 denotes byte 0 of the custom metadata)
Hour + Minute + Second

24 hours (0-23) require 5 bits, while minutes and seconds (0-59 each) require 6 bits each, yielding a total of 17 bits: 2 bytes and an extra bit. The extra bit is borrowed from the custom metadata.

  • Data type: byte
  • Capacity: 2.125
  • Arrangement:
    • Seconds: 00 [..xx xxxx]
    • Minutes: 00 [xx.. ....], 01 [.... xxxx]
    • Hours: 01 [xxxx ....], M0 [.... ..x.]
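The MD and HMS arrangements amount to packing the components into a 9-bit and a 17-bit integer and spilling the top bit into M0. The Python sketch below follows the Arrangement bullets (extra Day bit in M0 bit 0, extra Hour bit in M0 bit 1). It is hypothetical: the function names are illustrative, and it assumes the bit that spills into M0 is each component's most-significant bit and that the two HMS bytes are stored in the order 00, 01 shown above.

```python
def pack_month_day(month: int, day: int) -> tuple[int, int]:
    """Return (md_byte, m0_bits): month in bits 0-3, day in bits 4-8; bit 8 -> M0 bit 0."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("month or day out of range")
    packed = month | (day << 4)
    return packed & 0xFF, (packed >> 8) & 0x1


def unpack_month_day(md_byte: int, m0: int) -> tuple[int, int]:
    packed = md_byte | ((m0 & 0x1) << 8)
    return packed & 0xF, packed >> 4


def pack_hms(hour: int, minute: int, second: int) -> tuple[bytes, int]:
    """Return (HMS bytes 00-01, m0_bits): seconds bits 0-5, minutes 6-11, hours 12-16."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59 and 0 <= second <= 59):
        raise ValueError("time component out of range")
    packed = second | (minute << 6) | (hour << 12)
    # Bit 16 (the hour's top bit) goes to M0 bit 1.
    return (packed & 0xFFFF).to_bytes(2, "little"), ((packed >> 16) & 0x1) << 1


def unpack_hms(hms: bytes, m0: int) -> tuple[int, int, int]:
    packed = int.from_bytes(hms, "little") | (((m0 >> 1) & 0x1) << 16)
    return packed >> 12, (packed >> 6) & 0x3F, packed & 0x3F
```

For example, 27 September packs into the MD byte `0xB9` with the day's fifth bit (1) carried in M0.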
Sub-seconds

The unit of the sub-second component is the tick, which is 100 nanoseconds. A second contains 10,000,000 ticks, so the sub-second value ranges from 0 to 9,999,999, which requires 24 bits (3 bytes).

  • Data type: byte
  • Capacity: 3
Time-Zone

The time zone is stored as a sign bit indicating the direction (positive for east, negative for west), 4 bits for the hour range (0-12), and 6 bits for the minute range (0-59): a total of 11 bits.

  • Data type: bits
  • Capacity: 1.375 (11 bits)
  • Arrangement:
    • Sign: M0 [.... .x..]
    • Hour: 00 [xx.. ....], M0 [...x x...]
    • Minutes: 00 [..xx xxxx]
Overall arrangement
  • 00 [type-metadata]
  • 01 [custom-metadata]
  • 02 [year]+
  • 03 [month-day]
  • 04 [HMS]
  • 05* [HMS]
  • 06 [sub-seconds]
  • 07 [sub-seconds]
  • 08 [sub-seconds]
  • 09* [time-zone]

<a id="String"></a> 5. String

The string maintains its ubiquitous definition here: a sequence of unicode-encoded characters, in addition to null.

Text Representation

There are 2 flavors of this: Single-line string, Multi-line string. Both representations are delimiter-enclosed, and support escaping.

1. Single-line string

This is a delimiter-enclosed group of characters that must appear on a single line, so a line break within the string must be written as an escape sequence (such as \n). The enclosing delimiter for the single-line string is ". Examples:

"a valid string"

"another valid string with \n escaped new-line"

"another valid string with \u11EC a unicode escaped character"

"invalid
string"
2. Multi-line string

This slightly more complex flavor supports literal new-line characters, and special escape sequences that allow for user-friendly formatting of text. The multi-line string opens with @" and closes with ". Examples:

@"Valid string"

@"Valid string
with new line"

@"Valid string
with new line, and another \n escaped new line"

// special escaping for improved formatting/readability
@"\
        This string has the special new-line escape that \
        essentially \"swallows\" all white-space characters \
        up until the next non-whitespace character, or the \
        end of the text. To use a regular back-slash, write '\\'.
        In this case, all of the white spaces preceding the regular \
        back-slash are included in the text.
"
<a id="Escapes"></a> Escape sequences

The following escape sequences are supported:

  • \0: Null character
  • \a: Bell
  • \b: Backspace
  • \t: Tab
  • \v: Vertical tab
  • \n: New line
  • \r: Carriage return
  • \f: Form feed
  • \": Double quote
  • \\: Backslash
  • \NL: Whitespace escape: a backslash followed by a new line and any number of whitespace characters
  • \xHH: 1-byte character escape
  • \uHHHH: 2-byte character escape
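Decoding these escapes is a single left-to-right scan. The Python sketch below is a hypothetical helper (not part of Axis.Dia) that decodes the text between a string's delimiters. It treats \NL as swallowing the line break and every whitespace character after it, as the multi-line example describes, and it does not handle the symbol-only \' or the clob-only \>> escapes.

```python
import string

SIMPLE_ESCAPES = {
    "0": "\0", "a": "\a", "b": "\b", "t": "\t", "v": "\v",
    "n": "\n", "r": "\r", "f": "\f", '"': '"', "\\": "\\",
}


def _hex_char(body: str, start: int, length: int) -> str:
    """Decode exactly `length` hex digits starting at `start` into one character."""
    digits = body[start:start + length]
    if len(digits) != length or not all(c in string.hexdigits for c in digits):
        raise ValueError(f"bad hex escape at index {start}")
    return chr(int(digits, 16))


def unescape(body: str) -> str:
    """Decode the escape sequences in the text between a string's delimiters."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("dangling backslash")
        code = body[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code == "x":
            out.append(_hex_char(body, i + 2, 2))  # \xHH
            i += 4
        elif code == "u":
            out.append(_hex_char(body, i + 2, 4))  # \uHHHH
            i += 6
        elif code in "\r\n":
            # \NL: drop the line break and every whitespace character after it.
            i += 2
            while i < len(body) and body[i].isspace():
                i += 1
        else:
            raise ValueError(f"unknown escape \\{code}")
    return "".join(out)
```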
Binary Representation

Type-Metadata

  • [.... 0101] (0x5)

Custom-Metadata

  • annotated: [...1 0101]
  • null: [..1. 0101]

Description

String data is stored as unicode, i.e. 2 bytes per character. A character-count component, placed right after the type-metadata and represented as a var-byte, indicates how many characters (groups of 2 bytes) the string contains.
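The string layout (type-metadata, var-byte character count, then 2 bytes per character) can be sketched in Python as follows. This is a hypothetical illustration, not the Axis.Dia writer: it assumes UTF-16LE (what .NET calls Encoding.Unicode), that the count is the number of UTF-16 code units, that the var-byte carries the least-significant 7 bits first, and that a null string has no payload after its type-metadata byte.

```python
from typing import Optional

STRING_TYPE_ID = 0x5
NULL_BIT = 0x20


def encode_varbyte(value: int) -> bytes:
    """7 data bits per byte, least-significant group first; bit 7 = 'more follows'."""
    out = bytearray()
    while True:
        group, value = value & 0x7F, value >> 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_string(value: Optional[str]) -> bytes:
    """Encode a Dia string (None meaning null) under the assumptions above."""
    if value is None:
        return bytes([STRING_TYPE_ID | NULL_BIT])
    data = value.encode("utf-16-le")
    # Character count = number of 2-byte units, not the byte count.
    return bytes([STRING_TYPE_ID]) + encode_varbyte(len(data) // 2) + data
```

For example, `"hi"` encodes to `05 02 68 00 69 00`: the type-metadata, a count of 2, then two 2-byte characters.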

<a id="Symbol"></a> 6. Symbol

A symbol is similar to a string, but with a restriction: it is a sequence of ONLY printable ascii characters. There are 3 kinds of symbols, depending on the nature of the character sequence:

  1. When the sequence conforms to the pattern /^[a-zA-Z_](([.-])?[a-zA-Z0-9_])*\z/, the symbol is called an identifier. In the identifier form, symbols cannot be equal to the Dia keywords.

  2. When the sequence conforms to the attribute pattern described under Annotation, the symbol is called an attribute.

  3. All other character sequences, matching neither of the above, are classified as general symbols.

Two symbols are equivalent if they contain the same sequence of characters. Also note that, since symbols are restricted to printable ascii characters, escape sequences are never decoded: they are kept in their escaped form.

Symbols provide an API for extracting attributes or identifiers where present; escape sequences are processed and reduced to their actual unicode characters only when extracting attributes.

Textual Representation

Textually, all symbols except identifiers MUST be enclosed by the ' delimiter. For identifiers, the single quotes can be omitted. When quotes are used, however, they cannot enclose an empty sequence.

All escape sequences listed under Escape sequences are supported except \NL; \' is additionally supported.

Examples:

null.symbol // valid, representing a null symbol
abc // valid identifier symbol

'abc' // valid identifier symbol, equivalent to the previous symbol

abc_xyz // valid

abc.xyz // valid

abc-xyz // valid

symbol // invalid

symbol.something.else // valid

null // invalid (keyword)

null_something // valid

null.something // valid

'another valid symbol' // valid

'also valid \n\x4e \u1c2f symbol' // valid
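Classifying a symbol and deciding whether it needs quotes can be sketched with the identifier and attribute patterns. The Python sketch below is hypothetical, not the Axis.Dia API. The full keyword list is not given in this document, so `KEYWORDS` is a placeholder holding only "null" (the one keyword named in the examples); with the real list, entries such as `symbol` above would presumably also be rejected as bare identifiers. The sketch also assumes the symbol text already holds any escape sequences in escaped form, so it is wrapped in quotes without further escaping.

```python
import re

IDENTIFIER = re.compile(r"[a-zA-Z_](?:[.-]?[a-zA-Z0-9_])*")
ATTRIBUTE = re.compile(r"[a-zA-Z_](?:[.-]?[a-zA-Z0-9_])*@.+")

# Placeholder: the complete Dia keyword list is not given here.
KEYWORDS = {"null"}


def classify_symbol(text: str) -> str:
    """Return 'identifier', 'attribute' or 'general' for a symbol's characters."""
    if IDENTIFIER.fullmatch(text) and text not in KEYWORDS:
        return "identifier"
    if ATTRIBUTE.fullmatch(text):
        return "attribute"
    return "general"


def to_text(text: str) -> str:
    """Render a symbol, omitting the single quotes only for identifiers."""
    if text == "":
        raise ValueError("quoted symbols cannot be empty")
    return text if classify_symbol(text) == "identifier" else f"'{text}'"
```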
Binary Representation

Symbols are the only data type that needs contextual information while reading/writing a binary stream. This is because symbols tend to repeat a lot, and can benefit from some form of compression. The compression process used is simple:

  1. Differentiate between a binary representation of a regular symbol, and a symbol ID.
  2. The first time a regular symbol is encountered, it is read/written as a regular symbol, and an ID is created for it using a sequentially incremented integer value and stored in a symbol table. This table is built and used ONLY during the course of reading/writing.
  3. For reading scenarios, when a symbol ID is encountered, it means it has already been read before, so the actual value is resolved from the table mentioned above.
  4. For writing scenarios, subsequent encounters of the symbol will be resolved from the table, and the IDs will be written to the stream.
Type-Metadata
  • [.... 0110] (0x6)
Custom-Metadata
  • annotated: [...1 0110]
  • null: [..1. 0110]
  • regular symbol: [.... 0110]
  • symbol ID: [.1.. 0110]
Description

As with strings, data is stored as unicode: the symbol uses a var-byte to store the character count (not the byte count), followed by the bytes of the characters.
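The symbol-table compression described above can be sketched as a writer and a reader that each build their own table during one pass. This Python sketch is hypothetical, not the Axis.Dia implementation: it assumes IDs start at 0, that a symbol-ID value (`[.1.. 0110]`, i.e. 0x46) is followed by the ID as a var-byte, that characters are UTF-16LE, and it ignores the null and annotated flags.

```python
SYMBOL_TYPE_ID = 0x6
SYMBOL_ID_BIT = 0x40  # [.1.. 0110]: the payload is a symbol ID, not characters


def encode_varbyte(value: int) -> bytes:
    out = bytearray()
    while True:
        group, value = value & 0x7F, value >> 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varbyte(data: bytes, pos: int) -> tuple[int, int]:
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


class SymbolWriter:
    def __init__(self) -> None:
        self._ids: dict[str, int] = {}  # lives only for the duration of one write

    def write(self, symbol: str) -> bytes:
        if symbol in self._ids:  # seen before: write only its ID
            return bytes([SYMBOL_TYPE_ID | SYMBOL_ID_BIT]) + encode_varbyte(self._ids[symbol])
        self._ids[symbol] = len(self._ids)  # sequential IDs, assumed to start at 0
        data = symbol.encode("utf-16-le")
        return bytes([SYMBOL_TYPE_ID]) + encode_varbyte(len(data) // 2) + data


class SymbolReader:
    def __init__(self) -> None:
        self._table: list[str] = []  # index == symbol ID

    def read(self, data: bytes, pos: int) -> tuple[str, int]:
        meta = data[pos]
        if (meta & 0x0F) != SYMBOL_TYPE_ID:
            raise ValueError("not a symbol")
        if meta & SYMBOL_ID_BIT:
            symbol_id, pos = decode_varbyte(data, pos + 1)
            return self._table[symbol_id], pos
        count, pos = decode_varbyte(data, pos + 1)
        end = pos + 2 * count
        symbol = data[pos:end].decode("utf-16-le")
        self._table.append(symbol)
        return symbol, end
```

Writing `abc`, `xyz`, `abc` in that order produces the full characters twice and then the 2-byte ID reference `46 00` for the repeated `abc`.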

<a id="Clob"></a> 7. Clob

The Clob type represents a sequence of unicode characters whose meaning is left for the interpretation of the applications/systems that utilize it.

Text Representation

The textual representation of Clobs is identical to the multi-line string representation, with one exception: owing to their distinct enclosing delimiters, they have an extra escape sequence. The enclosing delimiters are << and >>.

All escape sequences listed under Escape sequences are supported, including \>>; and while the \NL escape sequence is supported, the version that accepts alignment parameters is not recognized for clobs.

Examples:

example 1
<<
    clob stuff. Can be anything at all, however, use of the greater-than symbol \> must be escaped.
>>

example 2
<<\
   all of the white-spaces preceding the '\\' are ignored.
>>

example 3
<<
    function asString(x) {
        return x.toString("some values come here, even \\n new lines are accepted");
    }
>>
Binary Representation

Type-Metadata

  • [.... 0111] (0x7)

Custom-Metadata

  • annotated: [...1 0111]
  • null: [..1. 0111]

Description

Following the type-metadata is a var-byte value that represents the number of expected bytes (since Clobs are ascii characters, each character is one byte). Following the var-byte value is the actual byte sequence of the Clob.

<a id="Blob"></a> 8. Blob

The blob is, as the name implies, a block of raw bytes.

Text Representation

Blob values are represented textually as a delimiter-enclosed base-64 encoding of the actual bytes. Enclosing delimiters are < and >. Whitespaces and comments are allowed between the delimiters and the base-64 data.

Examples:

<
    VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=
>
<VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=>
< VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4= >
Binary Representation

Type-Metadata

  • [.... 1000] (0x8)

Custom-Metadata

  • annotated: [...1 1000]
  • null: [..1. 1000]

Description

Following the type-metadata is a var-byte value that represents the number of expected bytes, followed by the actual byte sequence of the Blob.
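Both blob representations are easy to sketch with Python's standard `base64` module. The helpers below are hypothetical, not the Axis.Dia API: the text parser strips the < > delimiters and any whitespace but does not handle comments, and the binary encoder assumes the byte-count var-byte carries the least-significant 7 bits first.

```python
import base64
import binascii

BLOB_TYPE_ID = 0x8


def parse_blob_text(text: str) -> bytes:
    """Decode a textual blob such as '< VGhl... >' (comments are not handled)."""
    text = text.strip()
    if text.startswith("<<") or not (text.startswith("<") and text.endswith(">")):
        raise ValueError("not a blob literal")
    payload = "".join(text[1:-1].split())  # drop all whitespace inside the delimiters
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base-64 data") from exc


def encode_varbyte(value: int) -> bytes:
    out = bytearray()
    while True:
        group, value = value & 0x7F, value >> 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_blob(data: bytes) -> bytes:
    """Type-metadata, var-byte byte count, then the raw bytes."""
    return bytes([BLOB_TYPE_ID]) + encode_varbyte(len(data)) + data
```

All three example literals above decode to the 44 bytes of "The quick brown fox jumps over the lazy dog."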

<a id="List"></a> 9. List

A list is a sequence of zero or more Dia values.

Text Representation

Similar to json, this is represented as a delimiter enclosed sequence of comma separated values. Delimiters are [, and ]. Between each value can appear whitespaces or comments.

Examples:

[ 234, 3.45, true, [], 1992-04-05T, <abcxyz>, << clob text>>, annotated::"string value"]
Binary Representation

Type-Metadata

  • [.... 1001] (0x9)

Custom-Metadata

  • annotated: [...1 1001]
  • null: [..1. 1001]

Description

Similar to the blob, a var-byte value follows the type-metadata, representing the number of items in the list. Following the count are the binary representations of each value, laid out serially.

<a id="Record"></a> 10. Record

A record is a sequence of zero or more Dia properties. A property is a key-value pair, where the key is a non-null symbol, and the value is any legal Dia value. Records do not support duplicate property names.

Text Representation

Similar to json, this is represented as a delimiter enclosed sequence of comma separated properties. Delimiters are {, and }. Between each property can appear whitespaces or comments. Each property in turn consists of a symbol and any dia-value, separated by a colon :. Again, whitespaces/comments can appear anywhere between these elements.

Note that the property names can also be enclosed in " (double quotes); this way, valid json also becomes a valid Dia structure.

Examples:

// valid
{ something: true}

// valid
{
    something: 2345.54,
    annotation::"key" : < b64_bytes= >,
    'Key@value'::again::'property' : bleh::34
}
Binary Representation

Type-Metadata

  • [.... 1010] (0xA)

Custom-Metadata

  • annotated: [...1 1010]
  • null: [..1. 1010]

Description

The record is similar to the list, i.e. the var-byte count represents the number of properties in the record. Following the property count, the properties are laid out serially, each with the key/name first and the value last.

<a id="Appendix"></a> Appendix

<a id="Var-byte"></a> Var-byte

A variable-length byte representation. Each var-byte is a regular 1-byte number, except that its most-significant (sign) bit signals 'overflow': if the bit is set, another var-byte follows, containing more bits of the data. Reading a var-byte value requires removing all the overflow bits and concatenating the remaining 7-bit groups; as the Intxx example under Integer shows, the first byte carries the least-significant group.
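The var-byte scheme appears to match what is commonly called unsigned LEB128. The Python sketch below (hypothetical helper names, not the Axis.Dia API) encodes and decodes non-negative values; the decoder also returns the position just past the value, which is how a reader would continue through a stream. How negative values are represented is not described, so the encoder rejects them.

```python
def encode_varbyte(value: int) -> bytes:
    """Emit 7-bit groups, least-significant first; bit 7 set means 'more follows'."""
    if value < 0:
        raise ValueError("negative var-byte values are not described")
    out = bytearray()
    while True:
        group, value = value & 0x7F, value >> 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varbyte(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position after the last var-byte)."""
    value = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated var-byte")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # strip the overflow bit, append 7 bits
        shift += 7
        if not byte & 0x80:
            return value, pos
```

Decoding the 14 data bytes of the Intxx example in the Integer section with this function yields 9223372036854775807000981123.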

<a id="Comments"></a> Comments

Comments are...

<a id="Keywords"></a> Keywords

Keywords include...

Product compatibility (additional computed target framework versions):
.NET: net7.0 is compatible. Computed: net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.


Version history:
  • 0.0.1: 143 downloads, last updated 9/26/2023