Tracing

Tracing is the process of following a request through all of your systems and recording where it went, which gives you essential visibility into your application's behavior.

Conceptual Idea

When it comes to observability data, we often talk about three pillars:

Logging: a set of lists of events through systems.
Tracing: a set of directed graphs of events through systems.
Metrics: an aggregated value representing events that occurred at a given location in a system.

Tracing represents individual requests flowing through your system. Unlike logs, which are only loosely related to entries before and after them, traces consist of a collection of spans. Each span maintains a reference to (1) the trace to which it belongs and (2) its parent span (unless it is the root span). This structure creates a tree that maps a request's journey through your system.

A trace is essentially a tree of spans. The spans are the nodes in the tree, and they are connected via edges represented by the parent_span_id property of each span.
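As a rough illustration of this structure, here is a minimal sketch of a span record in Python. Only trace_id, span_id, and parent_span_id come from the description above; the `name` field, the example values, and the helper functions are assumptions made for illustration, not the schema of any particular tracing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str                  # the trace this span belongs to
    span_id: str                   # unique ID of this span
    parent_span_id: Optional[str]  # None only for the root span
    name: str                      # hypothetical label for the unit of work


# One trace with three spans: a root and two descendants (made-up values).
spans = [
    Span("t1", "a", None, "GET /checkout"),
    Span("t1", "b", "a", "load cart"),
    Span("t1", "c", "b", "SELECT cart_items"),
]


def is_root(span: Span) -> bool:
    """A span with no parent is the root of its trace."""
    return span.parent_span_id is None


def edges(spans):
    """Return the (parent_span_id, span_id) pairs that form the tree's edges."""
    return [(s.parent_span_id, s.span_id) for s in spans if not is_root(s)]
```

Every non-root span contributes exactly one edge, which is why a trace with a single root forms a tree rather than an arbitrary graph.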

Analogy

Think of tracing like following a shopper in your mall and recording their journey—where they entered, which shop they visited first, what they interacted with inside, when they exited, which shops they visited next, and eventually when they left the mall.

As they move through the mall, you write notes about what happened and send them in mini-envelopes to the back office. Whenever a shopper passes through a checkpoint, you place a sticker on their back containing a unique ID for this specific journey and an ID for the note you're writing at this particular location. As the shopper continues through the mall, each checkpoint updates the sticker and sends its observations to the back office.

Shopper enters mall: Start writing your letter, assign a trace_id and span_id. Keep the letter in your pocket. Write the trace_id and span_id on a sticker and place it on the shopper's back.
- Shopper goes to first shop: Read the trace_id and span_id from the sticker. Start a new letter, reusing the trace_id, assigning parent_span_id = span_id, and creating a new span_id. Write the existing trace_id and new span_id on a fresh sticker, and place it over the existing one. Keep this letter in your pocket.
- Shopper walks out of the first shop: Remove your sticker, which uncovers the one underneath. On your letter, record how long the visit took and what happened. Put the letter in an envelope and send it to the back office.
- Shopper goes to the second shop: Repeat the process...
Shopper exits mall: Finalize your letter with the duration and details of what you observed at the exit. Send it to the back office.

Essentially, each station does two things:

It records information about what it saw from its own perspective.
It applies a sticker on the shopper's back so that the next station knows about the tracking IDs for this shopper.
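The two duties of a station can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real tracing library is implemented: `station`, `back_office`, and `new_id` are hypothetical names, the sticker is modeled as a plain (trace_id, span_id) tuple, and the back office is just an in-memory list.

```python
import time
import uuid
from contextlib import contextmanager

back_office = []  # stands in for the collector that receives finished letters


def new_id() -> str:
    """Generate a random ID (hypothetical format: 16 hex characters)."""
    return uuid.uuid4().hex[:16]


@contextmanager
def station(name, sticker=None):
    """Record one span for `name` and yield the sticker for the next station.

    `sticker` is the (trace_id, span_id) pair read off the shopper's back,
    or None at the first checkpoint (the mall entrance).
    """
    trace_id, parent_span_id = sticker if sticker else (new_id(), None)
    span_id = new_id()
    start = time.monotonic()
    try:
        yield (trace_id, span_id)  # the fresh sticker placed over the old one
    finally:
        # Leaving the station: finalize the letter and send it off.
        back_office.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_span_id": parent_span_id,
            "name": name,
            "duration_s": time.monotonic() - start,
        })


with station("mall") as mall_sticker:
    with station("first shop", mall_sticker):
        pass  # whatever happens inside the shop
    with station("second shop", mall_sticker):
        pass
```

Note that letters arrive at the back office in the order the stations finish, so the mall's own letter is sent last even though its span started first.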

When the back office receives all these letters, it assembles them by trace_id and then constructs a tree by connecting them through the parent_span_id = span_id relationships.
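The back office's job might look something like the following sketch. It assumes letters are plain dictionaries carrying the three IDs described above; the `assemble` function and the nested `{"span_id", "children"}` output shape are illustrative choices, not a real collector's API.

```python
from collections import defaultdict


def assemble(letters):
    """Group letters by trace_id, then link each one to its parent.

    Returns {trace_id: [root nodes]}, where each node is a dict of the
    form {"span_id": ..., "children": [...]}.
    """
    by_trace = defaultdict(list)
    for letter in letters:
        by_trace[letter["trace_id"]].append(letter)

    trees = {}
    for trace_id, group in by_trace.items():
        nodes = {l["span_id"]: {"span_id": l["span_id"], "children": []} for l in group}
        roots = []
        for l in group:
            parent = nodes.get(l["parent_span_id"])
            if parent is None:
                # Either the root span, or a span whose parent letter never arrived.
                roots.append(nodes[l["span_id"]])
            else:
                parent["children"].append(nodes[l["span_id"]])
        trees[trace_id] = roots
    return trees


# Letters can arrive out of order and interleaved across traces.
letters = [
    {"trace_id": "t1", "span_id": "shop1", "parent_span_id": "mall"},
    {"trace_id": "t2", "span_id": "mall", "parent_span_id": None},
    {"trace_id": "t1", "span_id": "mall", "parent_span_id": None},
    {"trace_id": "t1", "span_id": "shop2", "parent_span_id": "mall"},
]
trees = assemble(letters)
```

Span IDs only need to be unique within a trace here, because the grouping step happens before any parent lookup.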

Comparison to Logging

Tracing and logging are ultimately quite similar but differ in a few specific ways.

Logging often assigns a request_id to logs and propagates it down the request path. This is functionally equivalent to trace_id.
Logging eschews the propagation and maintenance of span_id.
In practice, tracing spans often record “logs” on them, so they are almost like a container for logs.

Both logging and tracing give you a similar cross-section of a system. In terms of power, a tracing system can do everything that a logging system can do, but the reverse is not true. In that sense, logging is a subset of tracing. A badly implemented tracing system, where span context is not propagated between callers and callees, essentially degrades into a logging system.
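One way to see the subset relationship is to project spans onto log lines. The sketch below assumes spans are dictionaries with a hypothetical `start` timestamp and `name` field; `to_log_lines` is an illustrative helper, not part of any library.

```python
def to_log_lines(spans):
    """Project spans onto flat, time-ordered log lines keyed by request_id."""
    ordered = sorted(spans, key=lambda s: s["start"])
    return [f'{s["start"]} request_id={s["trace_id"]} {s["name"]}' for s in ordered]


spans = [
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "start": 2, "name": "load cart"},
    {"trace_id": "t1", "span_id": "a", "parent_span_id": None, "start": 1, "name": "GET /checkout"},
]
lines = to_log_lines(spans)
```

The projection keeps the trace_id (as request_id) and the ordering, but drops span_id and parent_span_id. Going the other way is not possible: from the log lines alone, there is no way to recover which event was the parent of which.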

This is all true and great in theory, but in practice, logging and tracing differ in quite drastic ways. It all boils down to expectations and usage.

Users of logging systems expect data retention for long-term investigations—spanning weeks, months, and years. Logs typically migrate to cold storage and serve critical security functions, such as detecting persistent threats that may have originated months earlier.

On the other hand, users of tracing systems expect to emit a huge volume of span data, instrumenting many more call sites than a logger normally would. They also expect to query a live set of recent tracing data (from the last minutes, hours, or days). When a user inspects a trace, they expect to see fine details about every span in the request flow.

Expectations	Logging	Tracing
Retention	Long term	Short term
Volume	Less	Massive
Query pattern	Scan, window	By `trace_id`, aggregates
Sampling	None	Tolerated
Sub-sampling	None	Tolerated

Retention: how long the data point is kept in complete form in storage.
Volume: instantaneous measurement of the quantity of data ingested (rate).
Query pattern: how the data is typically accessed.
Sampling: whether all data is ingested in the first place.
Sub-sampling: whether ingested data of a certain age gets reshaped before moving to a different storage tier—for example, erasing records and details from ingested spans or recording only high-level summaries for data older than a given period.
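To make the last two definitions concrete, here is a hedged sketch of each. Both functions are hypothetical: hashing the trace_id is one common way to make a sampling decision consistent across all spans of a trace, and the 7-day cutoff and the list of summary fields are arbitrary values chosen for the example.

```python
import hashlib


def keep_trace(trace_id: str, ratio: float) -> bool:
    """Sampling: decide at ingestion whether a whole trace is kept.

    Hashing the trace_id means every span of a trace gets the same answer.
    """
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < ratio * 2**64


def sub_sample(span: dict, age_days: float, max_detail_age_days: float = 7.0) -> dict:
    """Sub-sampling: keep only a high-level summary of spans past the cutoff."""
    if age_days <= max_detail_age_days:
        return span
    summary_keys = ("trace_id", "span_id", "parent_span_id", "name", "duration_s")
    return {k: span[k] for k in summary_keys if k in span}
```

With `ratio=1.0` every trace is kept and with `ratio=0.0` none are; anything in between trades completeness for a lower ingestion volume.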

While logging and tracing share fundamental similarities in theory, their dramatically different user experience expectations necessitate vastly different implementations in most observability platforms.

Localhost in Humanlog

The localhost experience of Humanlog doesn't distinguish between logging and tracing in terms of retention. However, if your application emits a lot of spans, you can expect them to make up most of your databases' storage footprint. Future work will help manage this footprint. Please reach out via our community channels to let us know if you hit problems with this so we can prioritize the work.

Because Humanlog on localhost is a dev tool, we recommend that you do not sample your spans in development, so that you see what your work is doing with complete fidelity. In development environments, you're typically generating much less data than in production, and seeing everything is worth more than the resource savings that sampling provides at scale.

What's Next?

Need help or want to give feedback? Join our community channels.