Archival
This repo was archived by the Apollo Security team on 2023-05-26
Apollo Tracing
[2022-02-16] Notice: This tracing format was designed to provide tracing data from graphs to the Apollo Engine `engineproxy`, a project which was retired in 2018. We learned that a trace format which describes resolvers with a flat list of paths (with no way to aggregate similar nodes or repeated path prefixes) was inefficient enough to have real impacts on server performance, and so we have not been actively developing consumers or producers of this format for several years. Apollo Server (as of v3) no longer ships with support for producing this format, and `engineproxy`, which consumed it, is no longer supported. We suggest that people looking for formats for describing performance traces consider either the Apollo Studio protobuf-based trace format or a more generic format such as OpenTelemetry.
Apollo Tracing is a GraphQL extension for performance tracing.
Thanks to the community, Apollo Tracing already works with most popular GraphQL server libraries, including Node, Ruby, Scala, Java, Elixir, Go and .NET, and it enables you to easily get resolver-level performance information as part of a GraphQL response.
Apollo Tracing works by including data in the extensions field of the GraphQL response, which is reserved by the GraphQL spec for extra information that a server wants to return. That way, you have access to performance traces alongside the data returned by your query.
It’s already supported by Apollo Engine, and we’re excited to see what other kinds of integrations people can build on top of this format.
We think this format is broadly useful, and we’d love to work with you to add support for it to your tools of choice. If you’re looking for a first idea, we especially think it would be great to see support for Apollo Tracing in GraphiQL and the Apollo Client developer tools!
If you’re interested in working on support for other GraphQL servers, or integrations with more tools, please get in touch on the #apollo-tracing channel on the Apollo Slack.
Supported GraphQL Servers
Response Format
The GraphQL specification allows servers to include additional information as part of the response under an `extensions` key:
> The response map may also contain an entry with key `extensions`. This entry, if set, must have a map as its value. This entry is reserved for implementors to extend the protocol however they see fit, and hence there are no additional restrictions on its contents.
Apollo Tracing exposes trace data for an individual request under a `tracing` key in `extensions`:
{
  "data": <>,
  "errors": <>,
  "extensions": {
    "tracing": {
      "version": 1,
      "startTime": <>,
      "endTime": <>,
      "duration": <>,
      "parsing": {
        "startOffset": <>,
        "duration": <>
      },
      "validation": {
        "startOffset": <>,
        "duration": <>
      },
      "execution": {
        "resolvers": [
          {
            "path": [<>, ...],
            "parentType": <>,
            "fieldName": <>,
            "returnType": <>,
            "startOffset": <>,
            "duration": <>
          },
          ...
        ]
      }
    }
  }
}
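As a rough illustration of how a client might read this structure, the sketch below pulls the `tracing` entry out of an already-decoded response and checks that the keys shown above are present. The function name `extract_tracing` and the choice to treat `parsing` and `validation` as optional are assumptions made for this example, not something the format itself prescribes.

```python
from typing import Any, Dict, Optional

# Top-level keys of "tracing" that this sketch insists on (an assumption:
# "parsing" and "validation" are treated as optional here).
REQUIRED_TRACING_KEYS = ("version", "startTime", "endTime", "duration", "execution")

# Keys shown above for each entry of execution.resolvers.
REQUIRED_RESOLVER_KEYS = (
    "path", "parentType", "fieldName", "returnType", "startOffset", "duration",
)


def extract_tracing(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the tracing payload of a decoded GraphQL response, or None if absent.

    Raises ValueError when a payload is present but lacks an expected key.
    """
    tracing = (response.get("extensions") or {}).get("tracing")
    if tracing is None:
        return None
    missing = [key for key in REQUIRED_TRACING_KEYS if key not in tracing]
    if missing:
        raise ValueError(f"tracing payload is missing keys: {missing}")
    for resolver in tracing["execution"].get("resolvers", []):
        missing = [key for key in REQUIRED_RESOLVER_KEYS if key not in resolver]
        if missing:
            raise ValueError(f"resolver entry is missing keys: {missing}")
    return tracing
```

A response without an `extensions` entry simply yields `None`, so the same code path works against servers that do not emit tracing data.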
Collected data
- The `startTime` and `endTime` of the request are timestamps in RFC 3339 format with at least millisecond but up to nanosecond precision (depending on platform support).
Some more details (adapted from the description of the JSON encoding of Protobuf's Timestamp type):
> A timestamp is encoded as a string in the RFC 3339 format. That is, the format is "{year}-{month}-{day}T{hour}:{min}:{sec}[.{frac_sec}]Z" where {year} is always expressed using four digits while {month}, {day}, {hour}, {min}, and {sec} are zero-padded to two digits each. The fractional seconds, which can go up to 9 digits (i.e. up to 1 nanosecond resolution), are optional. The "Z" suffix indicates the timezone ("UTC"); the timezone is required, though only UTC (as indicated by "Z") is presently supported. For example, "2017-01-15T01:30:15.01Z" encodes 15.01 seconds past 01:30 UTC on January 15, 2017. In JavaScript, one can convert a Date object to this format using the standard `toISOString()` method. In Python, a standard `datetime.datetime` object can be converted to this format using `strftime` with the time format spec '%Y-%m-%dT%H:%M:%S.%fZ'. Likewise, in Java, one can use the Joda Time's `ISODateTimeFormat.dateTime()` to obtain a formatter capable of generating timestamps in this format.
- Resolver timings should be collected in nanoseconds using a monotonic clock like `process.hrtime()` in Node.js or `System.nanoTime()` in Java.
The limited precision of numbers in JavaScript is not an issue for our purposes, because `Number.MAX_SAFE_INTEGER` nanoseconds is about 104 days, which should be plenty even for long-running requests!
- The server should keep the start time of the request both as wall time and as monotonic time, and use the monotonic time to calculate `startOffset`s and `duration`s (for the request as a whole and for individual resolver calls, see below).
- The `duration` of a request is in nanoseconds, relative to the request start, as an integer.
- The `startOffset` of parsing, validation, or a resolver call is in nanoseconds, relative to the request start, as an integer.
- The `duration` of parsing, validation, or a resolver call is in nanoseconds, relative to the start of that phase or resolver call, as an integer.
The end of a resolver call represents the return of a value for a field, but it does not include resolving subfields. If an asynchronous value such as a promise is returned from a resolver however, the resolver call isn't considered to have ended until the asynchronous value has been resolved.
- The `path` is the response path of the current resolver in a format similar to the error result format specified in the GraphQL specification:
> This field should be a list of path segments starting at the root of the response and ending with the field associated with the error. Path segments that represent fields should be strings, and path segments that represent list indices should be 0-indexed integers. If the error happens in an aliased field, the path to the error should use the aliased name, since it represents a path in the response, not in the query.
- `parentType`, `fieldName` and `returnType` are strings that reflect the runtime type information usually passed to resolvers (e.g. in the `info` argument for `graphql-js`).
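The collection rules above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `RequestTrace` class, its method names, and the `rfc3339` helper are hypothetical and not part of any Apollo library. It uses `time.time_ns()` for wall time and `time.monotonic_ns()` as the monotonic clock, and it derives `endTime` by adding the monotonic duration to the wall-clock start, which is one possible choice rather than a requirement of the format.

```python
import time
from datetime import datetime, timezone
from typing import Any, Dict, List, Sequence, Union


def rfc3339(wall_time_ns: int) -> str:
    """Format nanoseconds since the Unix epoch as an RFC 3339 UTC timestamp.

    Fractional seconds are truncated to millisecond precision.
    """
    seconds, nanos = divmod(wall_time_ns, 1_000_000_000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%S") + f".{nanos // 1_000_000:03d}Z"


class RequestTrace:
    """Hypothetical helper that accumulates timings for a single request."""

    def __init__(self) -> None:
        self.start_wall_ns = time.time_ns()       # wall time, for startTime/endTime
        self.start_mono_ns = time.monotonic_ns()  # monotonic time, for offsets
        self.phases: Dict[str, Dict[str, int]] = {}
        self.resolvers: List[Dict[str, Any]] = []

    def offset_ns(self) -> int:
        """Nanoseconds elapsed since the request start, as an integer."""
        return time.monotonic_ns() - self.start_mono_ns

    def record_phase(self, name: str, start_offset_ns: int) -> None:
        """Record "parsing" or "validation" as ending now."""
        self.phases[name] = {
            "startOffset": start_offset_ns,
            "duration": self.offset_ns() - start_offset_ns,
        }

    def record_resolver(
        self,
        path: Sequence[Union[str, int]],
        parent_type: str,
        field_name: str,
        return_type: str,
        start_offset_ns: int,
    ) -> None:
        """Record a resolver call that started at start_offset_ns and ends now."""
        self.resolvers.append({
            "path": list(path),
            "parentType": parent_type,
            "fieldName": field_name,
            "returnType": return_type,
            "startOffset": start_offset_ns,
            "duration": self.offset_ns() - start_offset_ns,
        })

    def finish(self) -> Dict[str, Any]:
        """Build the value for extensions.tracing."""
        duration = self.offset_ns()
        return {
            "version": 1,
            "startTime": rfc3339(self.start_wall_ns),
            "endTime": rfc3339(self.start_wall_ns + duration),
            "duration": duration,
            **self.phases,
            "execution": {"resolvers": self.resolvers},
        }


trace = RequestTrace()
started = trace.offset_ns()
name = "R2-D2"  # stand-in for running a real resolver
trace.record_resolver(["hero", "name"], "Droid", "name", "String!", started)
payload = trace.finish()  # would be returned under extensions.tracing
```

Python integers are arbitrary precision, so the `Number.MAX_SAFE_INTEGER` concern noted for JavaScript does not arise in this sketch.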
Example
query {
  hero {
    name
    friends {
      name
    }
  }
}
{
  "data": {
    "hero": {
      "name": "R2-D2",
      "friends": [
        {
          "name": "Luke Skywalker"
        },
        {
          "name": "Han Solo"
        },
        {
          "name": "Leia Organa"
        }
      ]
    }
  },
  "extensions": {
    "tracing": {
      "version": 1,
      "startTime": "2017-07-28T14:20:32.106Z",
      "endTime": "2017-07-28T14:20:32.109Z",
      "duration": 2694443,
      "parsing": {
        "startOffset": 34953,
        "duration": 351736
      },
      "validation": {
        "startOffset": 412349,
        "duration": 670107
      },
      "execution": {
        "resolvers": [
          {
            "path": [
              "hero"
            ],
            "parentType": "Query",
            "fieldName": "hero",
            "returnType": "Character",
            "startOffset": 1172456,
            "duration": 215657
          },
          {
            "path": [
              "hero",
              "name"
            ],
            "parentType": "Droid",
            "fieldName": "name",
            "returnType": "String!",
            "startOffset": 1903307,
            "duration": 73098
          },
          {
            "path": [
              "hero",
              "friends"
            ],
            "parentType": "Droid",
            "fieldName": "friends",
            "returnType": "[Character]",
            "startOffset": 1992644,
            "duration": 522178
          },
          {
            "path": [
              "hero",
              "friends",
              0,
              "name"
            ],
            "parentType": "Human",
            "fieldName": "name",
            "returnType": "String!",
            "startOffset": 2445097,
            "duration": 18902
          },
          {
            "path": [
              "hero",
              "friends",
              1,
              "name"
            ],
            "parentType": "Human",
            "fieldName": "name",
            "returnType": "String!",
            "startOffset": 2488750,
            "duration": 2141
          },
          {
            "path": [
              "hero",
              "friends",
              2,
              "name"
            ],
            "parentType": "Human",
            "fieldName": "name",
            "returnType": "String!",
            "startOffset": 2501461,
            "duration": 1657
          }
        ]
      }
    }
  }
}
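Because the format lists every resolver call separately, a consumer has to group entries itself to get per-field figures. The sketch below does that for the resolver entries of the example above (with `returnType` omitted for brevity); the function names are made up for this illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Resolver entries copied from the example response above.
RESOLVERS: List[Dict[str, Any]] = [
    {"path": ["hero"], "parentType": "Query", "fieldName": "hero",
     "startOffset": 1172456, "duration": 215657},
    {"path": ["hero", "name"], "parentType": "Droid", "fieldName": "name",
     "startOffset": 1903307, "duration": 73098},
    {"path": ["hero", "friends"], "parentType": "Droid", "fieldName": "friends",
     "startOffset": 1992644, "duration": 522178},
    {"path": ["hero", "friends", 0, "name"], "parentType": "Human", "fieldName": "name",
     "startOffset": 2445097, "duration": 18902},
    {"path": ["hero", "friends", 1, "name"], "parentType": "Human", "fieldName": "name",
     "startOffset": 2488750, "duration": 2141},
    {"path": ["hero", "friends", 2, "name"], "parentType": "Human", "fieldName": "name",
     "startOffset": 2501461, "duration": 1657},
]


def total_duration_by_field(resolvers: List[Dict[str, Any]]) -> Dict[str, int]:
    """Sum resolver durations (in nanoseconds) per "ParentType.fieldName"."""
    totals: Dict[str, int] = defaultdict(int)
    for resolver in resolvers:
        key = f"{resolver['parentType']}.{resolver['fieldName']}"
        totals[key] += resolver["duration"]
    return dict(totals)


def slowest(resolvers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the resolver entry with the largest duration."""
    return max(resolvers, key=lambda resolver: resolver["duration"])


print(total_duration_by_field(RESOLVERS))
# {'Query.hero': 215657, 'Droid.name': 73098, 'Droid.friends': 522178, 'Human.name': 22700}
print(slowest(RESOLVERS)["path"])
# ['hero', 'friends']
```

The three `Human.name` calls collapse into a single total here, which is the kind of aggregation the flat list of paths leaves to the consumer.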
Compression
We recommend that people enable compression in their GraphQL server, because the tracing format adds to the response size, but compresses well.
Although we tried other approaches to make the tracing format more compact (including deduplication of keys, common items, and structure) this complicated generating and interpreting trace data, and didn't bring the size down as much as compressing the entire HTTP response body does.
In our tests on Node.js, the processing overhead of compression is less than the overhead of sending additional bytes for an uncompressed response. But more test results from different server environments are definitely welcome, so we can help people make an informed decision about this.
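To get a feel for how well the format compresses, one could compare raw and gzip-compressed sizes of a response body. The sketch below does this in Python on synthetic data: the `synthetic_response` function, the number of resolvers, and all timing values are invented for the illustration, so the resulting sizes say nothing about any real server.

```python
import gzip
import json
from typing import Any, Dict


def synthetic_response(friend_count: int) -> Dict[str, Any]:
    """Build a made-up response with one resolver entry per friend name."""
    resolvers = [
        {
            "path": ["hero", "friends", index, "name"],
            "parentType": "Human",
            "fieldName": "name",
            "returnType": "String!",
            "startOffset": 2445097 + index * 12711,  # arbitrary spacing
            "duration": 1657 + (index * 37) % 500,   # arbitrary variation
        }
        for index in range(friend_count)
    ]
    last = resolvers[-1]
    return {
        "data": {
            "hero": {"friends": [{"name": f"Friend {i}"} for i in range(friend_count)]}
        },
        "extensions": {
            "tracing": {
                "version": 1,
                # Fixed placeholder timestamps; only the payload size matters here.
                "startTime": "2017-07-28T14:20:32.106Z",
                "endTime": "2017-07-28T14:20:32.111Z",
                "duration": last["startOffset"] + last["duration"] + 1000,
                "execution": {"resolvers": resolvers},
            }
        },
    }


body = json.dumps(synthetic_response(200)).encode("utf-8")
compressed = gzip.compress(body)
print(f"raw: {len(body)} bytes, gzip: {len(compressed)} bytes")
```

The repeated keys and path prefixes are what a general-purpose compressor removes; measuring your own responses is the only way to know the actual ratio.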