Temporalio.Extensions.OpenTelemetry 0.1.0-beta2

This is a prerelease version of Temporalio.Extensions.OpenTelemetry.

OpenTelemetry Support

This extension adds OpenTelemetry tracing support to the Temporal .NET SDK.

Temporal .NET SDK OpenTelemetry support consists of an interceptor that creates spans (i.e. .NET diagnostic activities) to trace execution. A special approach must be taken for workflows since they can be interrupted and resumed elsewhere while OpenTelemetry spans cannot.

Although generic .NET System.Diagnostics activities are used, this extension is OpenTelemetry-specific because it relies on OpenTelemetry propagation to serialize spans across processes/SDKs. Users who need generic tracing or similar features can use this code as a guide for writing their own interceptor.

⚠️ UNDER ACTIVE DEVELOPMENT

This SDK is under active development and has not released a stable version yet. APIs may change in incompatible ways until the SDK is marked stable.

Quick Start

Add the Temporalio.Extensions.OpenTelemetry package from NuGet. For example, using the dotnet CLI:

dotnet add package Temporalio.Extensions.OpenTelemetry --prerelease

In addition to configuring the OpenTelemetry tracer provider with the proper sources, the Temporalio.Extensions.OpenTelemetry.TracingInterceptor class must be set as the interceptor when creating a client. For example, this sets up tracing to the console:

using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using Temporalio.Client;
using Temporalio.Extensions.OpenTelemetry;
using Temporalio.Worker;

// Set up the tracer provider
using var tracerProvider = Sdk.CreateTracerProviderBuilder().
    AddSource(
        TracingInterceptor.ClientSource.Name,
        TracingInterceptor.WorkflowsSource.Name,
        TracingInterceptor.ActivitiesSource.Name).
    AddConsoleExporter().
    Build();

// Create a client to localhost on "default" namespace with the tracing
// interceptor
var client = await TemporalClient.ConnectAsync(new("localhost:7233")
{
    Interceptors = new[] { new TracingInterceptor() },
});

// Now client and worker calls are traced...

How it Works

The tracing interceptor uses OpenTelemetry context propagation to serialize the currently active diagnostic activity as a Temporal header that is then built back into the activity context for use by downstream code.

Outbound calls like starting a workflow from a client or starting an activity from a workflow will create a diagnostic activity and serialize it to the outbound Temporal header. Inbound calls like executing an activity or handling a signal will deserialize the header as a context and use it for new diagnostic activities.

This means, notwithstanding workflow tracing caveats discussed later, if a client starts a workflow that executes an activity that itself uses a Temporal client to start a workflow, the spans will be parented properly across all processes. This works even across Temporal SDK languages.
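The outbound/inbound flow described above can be sketched with plain System.Diagnostics APIs. This is only an illustration of the idea, not the interceptor's implementation: the dictionary stands in for the Temporal header, the "traceparent" key and the PropagationSketch class are hypothetical, and the real interceptor uses OpenTelemetry propagators with its own header key and encoding.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class PropagationSketch
{
    // Hypothetical source name, for illustration only.
    private static readonly ActivitySource Source = new("PropagationSketch");

    // Hypothetical header key; the real interceptor chooses its own key and encoding.
    private const string HeaderKey = "traceparent";

    static PropagationSketch()
    {
        // Use W3C IDs so that Activity.Id is a valid W3C traceparent value.
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        // Without a listener, ActivitySource.StartActivity returns null.
        ActivitySource.AddActivityListener(new ActivityListener
        {
            ShouldListenTo = source => true,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        });
    }

    // Outbound side: start a diagnostic activity and write its context into the headers.
    public static Activity? StartOutbound(string name, IDictionary<string, string> headers)
    {
        var activity = Source.StartActivity(name, ActivityKind.Client);
        if (activity?.Id is { } id)
        {
            headers[HeaderKey] = id;
        }
        return activity;
    }

    // Inbound side: rebuild the context from the headers and use it as the
    // explicit parent of a new diagnostic activity.
    public static Activity? StartInbound(string name, IDictionary<string, string> headers)
    {
        if (headers.TryGetValue(HeaderKey, out var traceParent) &&
            ActivityContext.TryParse(traceParent, null, out var parent))
        {
            return Source.StartActivity(name, ActivityKind.Server, parent);
        }
        // No usable header: start an activity without a propagated parent.
        return Source.StartActivity(name, ActivityKind.Server);
    }
}
```

With this sketch, an activity returned by StartInbound shares the trace ID of the activity returned by StartOutbound and has it as its parent span, even though only a string crossed the "process boundary".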

Client and Activity Tracing

When a client starts a workflow, sends a signal, or issues a query, a new diagnostic activity is created for the life of the call. The diagnostic activity context is serialized to the Temporal header which is then received by the worker on the workflow side and set as the current parent context for other diagnostic activities.

When a worker receives a Temporal activity execution, it starts a diagnostic activity for the entirety of the attempt, recording an exception on failure.

Since these use .NET diagnostic activities in traditional ways, they can be combined with normal diagnostic activity use. It is therefore perfectly normal to create a diagnostic activity that surrounds the client call, or to create a diagnostic activity inside of the Temporal activity.

Workflow Tracing

Both the .NET diagnostic API and OpenTelemetry require activities/spans to be started and stopped in process. This works well for normal imperative code that is not distributed. But Temporal workflows are distributed functions that can be interrupted and resumed elsewhere. .NET diagnostic activities cannot be resumed elsewhere so there is a feature gap that must be taken into account.

OpenTelemetry supports propagating spans (i.e. .NET diagnostic activities) across process boundaries, which Temporal uses in all SDKs to resume traces across workers (and even across languages). However, .NET does not support implicitly attaching the extracted span context, so this feature gap must also be taken into account.

Also, Temporal resumes workflow code by deterministically replaying already-executed steps. Therefore, if those steps have external effects, Temporal must skip them during replay. Temporal does this with Workflow.Logger by not logging during replay. However, with .NET diagnostic activities, one cannot rehydrate an existing diagnostic activity so that it is not recorded part of the time but can still be used for proper parenting of new diagnostic activities. Activities can only be created in order to be started and stopped; they cannot be created deterministically from existing span/trace IDs. So a solution must be provided for this as well.

To recap, here are the traditional problems with using simple .NET/OpenTelemetry distributed tracing for workflows:

  • .NET does not support resuming/recreating diagnostic activities or otherwise preemptible code
  • .NET does not support an implicit parent diagnostic activity context, only implicit parent diagnostic activities
  • .NET does not support deterministic span/trace IDs
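The last point in the list above can be observed directly with System.Diagnostics. The following is a minimal sketch, assuming W3C ID format; the ReplayIdSketch class and the "MyStep" name are hypothetical. Running the same step twice, as a replay would, produces a different span ID each time.

```csharp
using System.Diagnostics;

public static class ReplayIdSketch
{
    // Hypothetical source name, for illustration only.
    private static readonly ActivitySource Source = new("ReplayIdSketch");

    static ReplayIdSketch()
    {
        // W3C format is assumed so that SpanId is populated.
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        // Without a listener, ActivitySource.StartActivity returns null.
        ActivitySource.AddActivityListener(new ActivityListener
        {
            ShouldListenTo = source => true,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        });
    }

    // Simulates executing one workflow step and returns the span ID it produced.
    public static string RunStepOnce()
    {
        using var activity = Source.StartActivity("MyStep");
        return activity!.SpanId.ToHexString();
    }
}
```

Two calls to RunStepOnce() return two different 16-character hex span IDs, so a replayed step cannot simply "become" the span that was recorded the first time.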

Solution

In order to have an implicit parent context without creating a diagnostic activity, and have it be somewhat resumable, a wrapper for diagnostic activity context + diagnostic activity called WorkflowActivity has been created. This can be just a context or the actual diagnostic activity. To create a diagnostic workflow activity, the ActivitySource extension TrackWorkflowDiagnosticActivity can be used. This uses the underlying diagnostic activity source, but immediately starts and stops the .NET diagnostic activity it wraps. This is because a diagnostic activity cannot represent the time taken by the code within it, since that code may complete in another process, which .NET does not allow. So the diagnostic activity's time range in workflows is immediate and does not matter. However, parentage is still mostly supported.

How Workflow Tracing Works

On workflow inbound calls for executing a workflow, handling a signal, handling a query, or handling an update, a new diagnostic activity is created only when not replaying (a query is never considered "replaying" in this case). A new diagnostic workflow activity wrapper, however, is always created with the context from the Temporal header (i.e. the diagnostic activity created on client workflow start). Although the diagnostic activity is started and then stopped immediately, it becomes the parent for diagnostic activities in that same-worker-process cached workflow instance. Workflows are removed from the cache (and therefore replayed from the beginning on the next run) when they throw an exception or are forced out of the cache for LRU reasons.

Diagnostic activities created for signals, queries, and updates are parented to the client-outbound diagnostic activity that started the workflow, and only linked to the client-outbound diagnostic activity that invoked the signal/query/update.
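The parent-versus-link distinction can be illustrated with the standard ActivitySource.StartActivity overload that accepts both a parent context and links. This is a sketch only: the SignalLinkSketch class and its method names are hypothetical and do not reflect how the interceptor is written internally.

```csharp
using System.Diagnostics;

public static class SignalLinkSketch
{
    // Hypothetical source name, for illustration only.
    private static readonly ActivitySource Source = new("SignalLinkSketch");

    static SignalLinkSketch()
    {
        // W3C format is assumed so that trace and span IDs are populated.
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        // Without a listener, ActivitySource.StartActivity returns null.
        ActivitySource.AddActivityListener(new ActivityListener
        {
            ShouldListenTo = source => true,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        });
    }

    // Stands in for a client-outbound call: creates a short-lived activity
    // and returns the context that would travel in the header.
    public static ActivityContext NewClientContext(string name)
    {
        using var activity = Source.StartActivity(name, ActivityKind.Client);
        return activity!.Context;
    }

    // Starts an activity parented to the call that started the workflow and
    // linked (not parented) to the call that sent the signal.
    public static Activity? StartSignalActivity(
        string name, ActivityContext workflowStart, ActivityContext signalCall) =>
        Source.StartActivity(
            name,
            ActivityKind.Server,
            workflowStart,
            links: new[] { new ActivityLink(signalCall) });
}
```

The resulting activity stays in the trace of the workflow start, while the link lets a tracing backend navigate to the separate trace of the signal call.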

If a workflow/update fails or if a workflow task fails (i.e. workflow suspension due to workflow/signal/update exception that was not a Temporal exception), a new diagnostic activity is created representing that failure. The diagnostic activity representing the run of the workflow is not only already completed (as they all are) but it may not even be created because this could be replaying on a different worker.

Outbound calls from a workflow for scheduling an activity, scheduling a local activity, starting a child, signalling a child, or signalling an external workflow will create a diagnostic activity only when not replaying. This is then serialized into the header of the outbound call to be deserialized on the worker that accepts it.

Overall, this means that with no cache eviction (i.e. runs to completion on the same process it started without non-Temporal exception), diagnostic activities will be properly ordered/parented. However, when a replay is needed, the diagnostic activities may not be parented the same. But they will all be parented to the same outer diagnostic activity that was created on client outbound. Also, in cases of task failure (or worker crash or similar), a diagnostic activity may be duplicated since it's not "replaying" when Temporal is continually trying to proceed with new code past task failure.

⚠️ WARNING Do not use the .NET diagnostic activity API inside of workflows. It is inherently non-deterministic and can lead to unpredictable behavior/traces during replay (which often only surfaces in failure scenarios).

Creating Diagnostic Activities in Workflows

Users can create their own diagnostic workflow activities via the ActivitySource extension TrackWorkflowDiagnosticActivity. The result is IDisposable like a normal System.Diagnostics.Activity, but unlike a diagnostic activity, the diagnostic workflow activity may not result in a real diagnostic activity during replay. Also, it is started/stopped immediately. It is simply placed as async-local until disposed so it can implicitly become the parent of any others.
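To make the behavior described above concrete, here is a minimal sketch of the wrapper idea using only System.Diagnostics and AsyncLocal. It is not the extension's WorkflowActivity: the WorkflowSpanSketch class, its Track method, and the explicit isReplaying and inherited parameters are all hypothetical simplifications. It assumes W3C ID format (the default on .NET 5 and later) and that an ActivityListener is registered for the source.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for the wrapper idea; NOT the extension's WorkflowActivity.
public sealed class WorkflowSpanSketch : IDisposable
{
    private static readonly AsyncLocal<WorkflowSpanSketch?> CurrentLocal = new();
    private readonly WorkflowSpanSketch? previous;

    private WorkflowSpanSketch(ActivityContext context, bool recorded)
    {
        Context = context;
        Recorded = recorded;
        // Become the implicit parent until disposed.
        previous = CurrentLocal.Value;
        CurrentLocal.Value = this;
    }

    // The wrapper that is currently the implicit parent, if any.
    public static WorkflowSpanSketch? Current => CurrentLocal.Value;

    // Context used to parent later spans.
    public ActivityContext Context { get; }

    // Whether a real diagnostic activity was emitted for this wrapper.
    public bool Recorded { get; }

    public static WorkflowSpanSketch Track(
        ActivitySource source, string name, ActivityContext inherited, bool isReplaying)
    {
        // Prefer the async-local wrapper as parent, else the inherited (header) context.
        var parent = Current?.Context ?? inherited;
        if (!isReplaying)
        {
            // Start and immediately stop: only the context is kept, not the duration.
            using var activity = source.StartActivity(name, ActivityKind.Internal, parent);
            if (activity != null)
            {
                return new WorkflowSpanSketch(activity.Context, recorded: true);
            }
        }
        // Replaying (or nobody is listening): emit nothing, carry the parent context forward.
        return new WorkflowSpanSketch(parent, recorded: false);
    }

    // Restore whichever wrapper was the implicit parent before this one.
    public void Dispose() => CurrentLocal.Value = previous;
}
```

The design choice worth noting is that the wrapper always exists, even when no diagnostic activity is emitted, so code nested inside it still finds a parent context during replay.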

Workflow Tracing Example

For example, take the following code:

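using System.Diagnostics;
using Temporalio.Activities;
using Temporalio.Extensions.OpenTelemetry;
using Temporalio.Workflows;

// MyActivities (with DoThing1Async and DoThing2Async) is assumed to be defined elsewhere.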

[Workflow]
public class MyWorkflow
{
    public static readonly ActivitySource CustomSource = new("MyCustomSource");

    [WorkflowRun]
    public async Task RunAsync()
    {
        await Workflow.ExecuteActivityAsync(
            (MyActivities act) => act.DoThing1Async(),
            new() { StartToCloseTimeout = TimeSpan.FromSeconds(10) });
        using (CustomSource.TrackWorkflowDiagnosticActivity("MyCustomActivity"))
        {
            await Workflow.ExecuteActivityAsync(
                (MyActivities act) => act.DoThing2Async(),
                new() { StartToCloseTimeout = TimeSpan.FromSeconds(10) });
        }
    }
}

So running this workflow might have the following diagnostic activities in this hierarchy:

  • StartWorkflow:MyWorkflow
    • RunWorkflow:MyWorkflow
      • StartActivity:DoThing1
        • RunActivity:DoThing1
      • MyCustomActivity
        • StartActivity:DoThing2
          • RunActivity:DoThing2
      • CompleteWorkflow:MyWorkflow

But if, say, the worker crashed after starting the first activity, it might look like:

  • StartWorkflow:MyWorkflow
    • RunWorkflow:MyWorkflow
      • StartActivity:DoThing1
        • RunActivity:DoThing1
    • MyCustomActivity
      • StartActivity:DoThing2
        • RunActivity:DoThing2
    • CompleteWorkflow:MyWorkflow

Notice how some diagnostic activities are now not under the RunWorkflow:MyWorkflow. This is because the workflow resumes on a different process but, due to .NET and OpenTelemetry limitations, a diagnostic activity cannot be resumed on a different process.

Compatible and additional computed target framework versions

  • .NET (computed): net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows
  • .NET Core (computed): netcoreapp2.0, netcoreapp2.1, netcoreapp2.2, netcoreapp3.0, netcoreapp3.1
  • .NET Standard: netstandard2.0 (compatible), netstandard2.1 (computed)
  • .NET Framework: net462 (compatible); net461, net463, net47, net471, net472, net48, net481 (computed)
  • MonoAndroid (computed): monoandroid
  • MonoMac (computed): monomac
  • MonoTouch (computed): monotouch
  • Tizen (computed): tizen40, tizen60
  • Xamarin.iOS (computed): xamarinios
  • Xamarin.Mac (computed): xamarinmac
  • Xamarin.TVOS (computed): xamarintvos
  • Xamarin.WatchOS (computed): xamarinwatchos

NuGet packages (1)

Showing the top 1 NuGet packages that depend on Temporalio.Extensions.OpenTelemetry:

  • InfinityFlow.Temporal.Migrator

GitHub repositories

This package is not used by any popular GitHub repositories.

Version        Downloads   Last updated
1.4.0          371         12/19/2024
1.3.1          25,382      9/11/2024
1.3.0          8,845       8/14/2024
1.2.0          16,258      6/27/2024
1.1.2          4,017       6/5/2024
1.1.1          6,701       5/10/2024
1.1.0          1,402       5/7/2024
1.0.0          26,997      12/5/2023
0.1.0-beta2    1,979       10/30/2023
0.1.0-beta1    4,760       7/24/2023
0.1.0-alpha6   231         7/17/2023