
Proposal: Telemetry for sending a message #4500

@compulim

Please vote on this feature if you want it to be implemented.

Is your feature request related to a problem? Please describe.

Although Web Chat does not collect any telemetry data itself, it currently does not emit data points that would be helpful in some telemetry scenarios.

We would like to see the following in the telemetry log:

  • Reliability of sending a message
    • Being unable to track reliability because the telemetry data itself failed to be collected (e.g. due to a network error) is understandable and acceptable
  • Time taken to send a message successfully

Describe the suggestion or request in detail

Emit telemetry events:

  • "sending" when the activity appears in the transcript
  • "send successful" when the activity status turns into "Just now."
  • "send failure" when the activity status turns into "Send failed. Retry."

Since "send failure" is not a terminal state, we could see sequences like the following in the telemetry log:

  • "sending" -> "send failure" -> "send successful"
  • "sending" -> "send failure" -> (increased styleOptions.sendTimeout) -> "sending" -> "send failure"
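To illustrate how such sequences could be interpreted when analyzing the log, here is a minimal sketch. It assumes each logged event has the hypothetical shape `{ name, activityKey, timestamp }` (timestamp in milliseconds); these field names are not part of any existing Web Chat API. An activity whose last event is "sending" has no terminal outcome, which is exactly the case (e.g. the browser being closed mid-send) that only a separately emitted "sending" event can reveal.

```javascript
// Sketch: summarize per-activity telemetry events into reliability outcomes.
// Assumption: each event looks like { name, activityKey, timestamp } (hypothetical shape).
function summarizeSendOutcomes(events) {
  // Group events by activity key in chronological order.
  const byKey = new Map();
  for (const event of [...events].sort((a, b) => a.timestamp - b.timestamp)) {
    if (!byKey.has(event.activityKey)) {
      byKey.set(event.activityKey, []);
    }
    byKey.get(event.activityKey).push(event);
  }

  const outcomes = new Map();
  for (const [activityKey, sequence] of byKey) {
    const first = sequence[0];
    const last = sequence[sequence.length - 1];
    let outcome;
    if (last.name === 'send successful') {
      outcome = 'successful';
    } else if (last.name === 'send failure') {
      outcome = 'failed';
    } else {
      outcome = 'unknown'; // Last event is "sending": no terminal event was ever logged.
    }

    outcomes.set(activityKey, {
      outcome,
      // Time taken is only reported for sends that eventually succeeded,
      // measured from the first "sending" (so it includes any retries).
      durationMs: outcome === 'successful' ? last.timestamp - first.timestamp : undefined
    });
  }
  return outcomes;
}

// Usage: a1 fails once and then succeeds; a2 never reaches a terminal state.
const outcomes = summarizeSendOutcomes([
  { name: 'sending', activityKey: 'a1', timestamp: 0 },
  { name: 'send failure', activityKey: 'a1', timestamp: 20000 },
  { name: 'send successful', activityKey: 'a1', timestamp: 25000 },
  { name: 'sending', activityKey: 'a2', timestamp: 1000 }
]);
// outcomes.get('a1') → { outcome: 'successful', durationMs: 25000 }
// outcomes.get('a2') → { outcome: 'unknown', durationMs: undefined }
```

Reliability could then be computed as the share of "successful" outcomes among all activities, with "unknown" counted separately rather than silently dropped.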

Because we need to emit "sending" to measure reliability, we should not use useTrackTiming to measure the time taken to send a message. See the "Additional context" section for details.

This could possibly be implemented as a <SendStatusTelemetryProvider> component that subscribes to useActivities and useSendStatusByActivityKey.

<SendStatusTelemetryProvider> would keep track of which activities were updated and perform the following work on every updated activity:

  • If the activity is from the end-user:
    • If the send status reported by useSendStatusByActivityKey has changed, call the handler returned by useTrackEvent
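The per-activity logic above could look roughly like the following framework-agnostic sketch. It avoids React so it runs standalone: `prevStatusByKey` and `nextStatusByKey` stand in for two successive values obtained from useSendStatusByActivityKey, `isFromUser` stands in for a role check on the activity, and `trackEvent` stands in for the handler returned by useTrackEvent. The status strings ('sending', 'send failed', 'sent') and the `Map` shape are assumptions for illustration, not a confirmed Web Chat API.

```javascript
// Assumed send status values mapped to the telemetry event names proposed above.
const EVENT_NAME_BY_SEND_STATUS = Object.freeze({
  sending: 'sending',
  'send failed': 'send failure',
  sent: 'send successful'
});

// Compare two snapshots of send status (Map<activityKey, status>) and emit one
// telemetry event for every end-user activity whose status changed.
function emitSendStatusChanges(prevStatusByKey, nextStatusByKey, isFromUser, trackEvent) {
  for (const [activityKey, status] of nextStatusByKey) {
    if (!isFromUser(activityKey)) {
      continue; // Only activities sent by the end-user are tracked.
    }
    if (prevStatusByKey.get(activityKey) === status) {
      continue; // Unchanged status: nothing to report.
    }
    const eventName = EVENT_NAME_BY_SEND_STATUS[status];
    if (eventName) {
      trackEvent(eventName, { activityKey });
    }
  }
}

// Usage with stubbed inputs: appear, fail, then an unchanged snapshot.
const logged = [];
const track = (name, data) => logged.push({ name, ...data });
const fromUser = () => true;

emitSendStatusChanges(new Map(), new Map([['a1', 'sending']]), fromUser, track);
emitSendStatusChanges(new Map([['a1', 'sending']]), new Map([['a1', 'send failed']]), fromUser, track);
emitSendStatusChanges(new Map([['a1', 'send failed']]), new Map([['a1', 'send failed']]), fromUser, track);
// logged → [{ name: 'sending', activityKey: 'a1' }, { name: 'send failure', activityKey: 'a1' }]
```

Diffing snapshots, rather than reacting to individual updates, keeps the component idempotent: re-rendering with the same send status never emits a duplicate event.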

Describe alternatives you have considered

No response

Additional context

Azure Application Insights SDK

The following Azure Application Insights API calls are copied from https://github.com/microsoft/ApplicationInsights-JS#sending-telemetry-to-the-azure-portal.

appInsights.trackEvent({name: 'some event'});
appInsights.trackPageView({name: 'some page'});
appInsights.trackPageViewPerformance({name : 'some page', url: 'some url'});
appInsights.trackException({exception: new Error('some error')});
appInsights.trackTrace({message: 'some trace'});
appInsights.trackMetric({name: 'some metric', average: 42});
appInsights.trackDependencyData({absoluteUrl: 'some url', responseCode: 200, method: 'GET', id: 'some id'});
appInsights.startTrackPage("pageName");
appInsights.stopTrackPage("pageName", null, {customProp1: "some value"});
appInsights.startTrackEvent("event");
appInsights.stopTrackEvent("event", null, {customProp1: "some value"});
appInsights.flush();

Timing.js is the class that Application Insights uses to measure time. In AnalyticsPlugin.ts, each startTrackEvent/stopTrackEvent pair is turned into a single trackEvent call:

properties.duration = duration.toString();
_self.trackEvent({ name, properties, measurements } as IEventTelemetry);

We could use trackEvent directly instead, or, in Web Chat terms, useTrackEvent.
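A minimal sketch of that idea, assuming a trackEvent callback that accepts the same `{ name, properties }` shape as the Application Insights call above: "sending" is emitted immediately, and the elapsed time is attached to "send successful" as `properties.duration`, stringified the way the AnalyticsPlugin.ts excerpt does. The `createSendTimer` helper and the injectable `now` clock are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: emits "sending" right away (so abandoned sends remain visible)
// and attaches the elapsed time to the terminal "send successful" event.
function createSendTimer(trackEvent, now = () => Date.now()) {
  const startedAt = new Map(); // activityKey -> timestamp of the first "sending"

  return {
    sending(activityKey) {
      if (!startedAt.has(activityKey)) {
        startedAt.set(activityKey, now());
      }
      trackEvent({ name: 'sending', properties: { activityKey } });
    },
    failed(activityKey) {
      // Not terminal: keep the start time so a later success measures the full duration.
      trackEvent({ name: 'send failure', properties: { activityKey } });
    },
    succeeded(activityKey) {
      const start = startedAt.get(activityKey);
      startedAt.delete(activityKey);
      if (start === undefined) {
        // No matching "sending" was observed; report the event without a duration.
        trackEvent({ name: 'send successful', properties: { activityKey } });
        return;
      }
      const duration = now() - start;
      trackEvent({ name: 'send successful', properties: { activityKey, duration: duration.toString() } });
    }
  };
}

// Usage with a fake clock: send, fail after 500 ms, succeed 2.5 s later.
let clock = 1000;
const emitted = [];
const timer = createSendTimer((event) => emitted.push(event), () => clock);
timer.sending('a1');
clock = 1500;
timer.failed('a1');
clock = 4000;
timer.succeeded('a1');
// emitted[2].properties.duration → '3000'
```

Unlike startTrackEvent/stopTrackEvent, this produces three events instead of one, but the first of them is already in the log before the network call can fail or the page can close.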

Google Analytics SDK (gtag.js)

Copied from https://developers.google.com/analytics/devguides/collection/gtagjs/user-timings.

gtag('event', 'timing_complete', {
  'name' : 'load',
  'value' : 3549,
  'event_category' : 'JS Dependencies'
});
| Parameter name | Data type | Required | Description |
|---|---|---|---|
| name | string | Yes | A string to identify the variable being recorded (e.g. 'load'). |
| value | integer | Yes | The number of milliseconds in elapsed time to report to Google Analytics (e.g. 20). |
| event_category | string | No | A string for categorizing all user timing variables into logical groups (e.g. 'JS Dependencies'). |
| event_label | string | No | A string that can be used to add flexibility in visualizing user timings in the reports (e.g. 'Google CDN'). |

Only one call (timing_complete) is needed for timing, and it should be emitted when timing ends. Google Analytics has no concept of a "start event".
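For comparison, a sketch of how a successful send could be reported with a single timing_complete call. `gtag` is stubbed here so the example runs outside a browser, and the `send_message` name and `Web Chat` category are hypothetical values. Note that the only call happens when timing ends, so a send abandoned midway leaves no trace at all.

```javascript
// Stub standing in for the global gtag() function loaded by gtag.js.
const gtagCalls = [];
function gtag(...args) {
  gtagCalls.push(args);
}

// Report the elapsed time of a successful send as one timing_complete event.
// Nothing is emitted at the start, so an abandoned send produces no call.
function reportSendTiming(startedAtMs, completedAtMs) {
  gtag('event', 'timing_complete', {
    name: 'send_message', // hypothetical variable name
    value: Math.round(completedAtMs - startedAtMs), // integer milliseconds, as the table requires
    event_category: 'Web Chat' // hypothetical category
  });
}

reportSendTiming(1000, 3549.4);
// gtagCalls[0][2].value → 2549
```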

Conclusion

Neither Application Insights nor Google Analytics emits two events for timing:

  • Application Insights: emit trackEvent
  • Google Analytics: emit timing_complete

To track the reliability of a potentially long network call, we should not rely on the telemetry provider's native timing implementation. This is because it does not emit a start event, so if the browser is closed prematurely, there is no record that the send was ever attempted.

Their native implementation seems suitable only for measuring performance (e.g. how long it takes to compress an image), rather than reliability-related performance (e.g. how long it takes to send a message).
