From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997

The Density Trace

A density trace is an alternative to the histogram that is particularly effective for bimodal or multimodal data, because a histogram often gives a poor representation of the shape of such distributions.

This is part of the help file from NCSS Jr. It nicely explains the purpose of this display and how it is constructed. Don’t try to make one without a statistics program!

The NCSS Jr Help File

The histogram is widely used and needs little explanation. However, it does have drawbacks. The number and width of the intervals are subjective choices, yet they can have an impact on the look of the histogram: slightly different boundary values can give dramatically different-looking histograms. (You can experiment with NCSS to see the impact of changing the number of bins on the look of the histogram.)
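The effect of boundary placement can be seen with a few lines of code. The sketch below counts values into equal-width bins; the function name histogram_counts and the small dataset are hypothetical, chosen only so that two tight clusters are easy to follow. Keeping the bin width fixed at 1 and moving the starting boundary by half a bin turns a clearly two-peaked histogram into a flat one.

```python
import math

def histogram_counts(data, start, width, nbins):
    """Count values in nbins intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * nbins
    for value in data:
        k = math.floor((value - start) / width)
        if 0 <= k < nbins:  # values outside the binned range are ignored
            counts[k] += 1
    return counts

# Hypothetical data: two clusters, one near 1 and one near 3.
data = [0.8, 0.9, 1.1, 1.2, 2.8, 2.9, 3.1, 3.2]

# Same bin width, different starting boundary.
print(histogram_counts(data, start=0.5, width=1.0, nbins=3))  # [4, 0, 4]: two peaks
print(histogram_counts(data, start=0.0, width=1.0, nbins=4))  # [2, 2, 2, 2]: flat
```

Only the position of the boundaries changed between the two calls, yet the apparent shape of the distribution changed completely.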

Another problem with the histogram is that the rectangles make it appear that the data are spread uniformly throughout each interval, which is often not the case. Also, the "skyscraper" look of the histogram doesn’t resemble the rather smooth nature of the data’s distribution.

These complaints against the histogram have led to many innovations. One of the newest and most popular techniques for displaying the distribution of data is the density trace.

Density refers to the relative frequency (concentration) of data points along the data range. Mathematically, the density at a value x is defined as the fraction of data values per unit of measurement that lie in an interval centered at x. Once you pick a suitable interval width, you can calculate the density at any (and every) x value. If you calculate the density at, say, 50 values and connect them, you’ll have a density trace.
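The definition above translates directly into code: count the values in an interval centred at x, then divide by the number of values and by the interval width to get a fraction per unit of measurement. This is a minimal sketch; the function name density_at and the half-open interval convention (lower end included, upper end excluded) are assumptions for illustration, not taken from NCSS.

```python
def density_at(x, data, width):
    """Fraction of the data values per unit of measurement lying in an
    interval of the given width centred at x."""
    if width <= 0:
        raise ValueError("width must be positive")
    half = width / 2.0
    # Assumed convention: half-open interval [x - half, x + half).
    inside = sum(1 for value in data if x - half <= value < x + half)
    return inside / (len(data) * width)

# Hypothetical data for illustration only.
data = [1.0, 2.0, 2.5, 3.0, 6.0]
# 3 of the 5 values lie in [1.5, 3.5), so the density is 3 / (5 * 2) = 0.3.
print(density_at(2.5, data, width=2.0))
```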

In this program, the interval width is specified as a percentage of the data range. As you increase the percentage, you increase the amount of data included in each density calculation, which makes the chart smoother. The following four density traces were made from the same data using increasing percentages. Note how much more appealing these charts are than the histogram.
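Putting the pieces together, a basic density trace evaluates the density at evenly spaced points across the data range, with the interval width set as a percentage of that range. The sketch below assumes 50 evaluation points (the figure used in the text) and unweighted counting; the name density_trace is hypothetical, and NCSS additionally applies the weighting described in the next paragraph.

```python
def density_trace(data, percent, npoints=50):
    """Return (x, density) pairs at npoints evenly spaced values from min to max.

    Assumption: the interval width is percent% of the data range, and every
    point inside the interval counts equally (no weighting).
    """
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data range must be non-zero")
    if npoints < 2:
        raise ValueError("need at least two evaluation points")
    width = percent / 100.0 * (hi - lo)
    half = width / 2.0
    n = len(data)
    trace = []
    for i in range(npoints):
        x = lo + (hi - lo) * i / (npoints - 1)
        inside = sum(1 for v in data if x - half <= v < x + half)
        trace.append((x, inside / (n * width)))
    return trace

# Hypothetical usage: larger percentages include more data in each window.
sample = [1.0, 1.2, 1.3, 2.0, 4.1, 4.3, 4.4, 5.0]
for pct in (5, 20, 40, 60):
    ys = [y for _, y in density_trace(sample, pct)]
    print(pct, round(max(ys), 3))
```

Connecting the returned points with straight lines gives the trace; plotting is left out to keep the sketch dependency-free.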

Density Trace at 5% Data Range

Density Trace at 20% Data Range

Density Trace at 40% Data Range

Density Trace at 60% Data Range

As the interval width is increased, data points further and further from the center value are included. To reduce the influence of points far removed from the center value, we use a weighting scheme in which a point's weight decreases with its distance from the center value. The weight function used is half of a cosine curve with its peak at the center value. It decreases symmetrically to zero, beyond which a weight of zero is applied. Hence, points have less and less impact on the density trace the further they are from the center.
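The weighting can be sketched as follows. The exact formula NCSS uses is not given here, so this assumes "half the cosine function" means the positive half-cycle cos(pi*u/2) for -1 <= u <= 1, where u is the distance from the center divided by half the interval width, and it scales the weights by pi/4 so that the weight function integrates to 1. The function name cosine_density is hypothetical.

```python
import math

def cosine_density(x, data, width):
    """Density at x using a half-cosine weight function (assumed form).

    A point at distance d from x, with h = width / 2, receives weight
    cos(pi * d / (2h)) when d <= h and weight 0 beyond that.
    The pi/4 factor normalises the weights so the trace has total area 1.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    h = width / 2.0
    total = 0.0
    for value in data:
        u = (value - x) / h
        if abs(u) <= 1.0:
            total += math.cos(math.pi * u / 2.0)  # 1 at the center, 0 at the edge
    return (math.pi / 4.0) * total / (len(data) * h)
```

Compared with simple counting, a point just inside the edge of the interval now contributes almost nothing, so the trace changes smoothly as points enter and leave the interval.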

Another way to think of the density trace is to imagine constructing 1000 histograms of the same data using slightly different boundary positions and taking the average rectangle height at each of 50 values along the data range. This would give you a smoothed histogram with many of the same properties as the density trace. Hence, the density trace can be thought of as a smoothed histogram in which the choice of bin boundaries and the number of bins do not come into play.
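This averaging idea is known as an averaged shifted histogram, and it can be sketched directly. The function name ash_density is hypothetical, and the shifts here are evenly spaced fractions of one bin width rather than the "slightly different" boundaries of the text. For a single value at 0 with bin width 1, the average height falls off linearly from 1 at the value to 0.5 at half a bin away.

```python
import math

def ash_density(x, data, width, shifts):
    """Average, at x, the heights of `shifts` histograms of bin width `width`.

    Histogram k has bin edges at k*width/shifts + j*width for integer j.
    Each histogram's height is count / (n * width), as for a density histogram.
    """
    if width <= 0 or shifts < 1:
        raise ValueError("width must be positive and shifts at least 1")
    n = len(data)
    total = 0.0
    for k in range(shifts):
        origin = k * width / shifts
        bin_of_x = math.floor((x - origin) / width)
        # Count the data values that share x's bin in this shifted histogram.
        count = sum(1 for v in data if math.floor((v - origin) / width) == bin_of_x)
        total += count / (n * width)
    return total / shifts

print(ash_density(0.0, [0.0], 1.0, 1000))  # 1.0
print(ash_density(0.5, [0.0], 1.0, 1000))  # 0.5
```

As the number of shifts grows, this average approaches a density estimate with triangular weights, which is why it behaves much like a density trace: the arbitrary placement of any single set of boundaries is averaged away.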

The OldFaith Dataset

The duration and interruption times of the Old Faithful geyser are the classic example of a dataset in which the shape of the histogram is influenced by the width of the bins. The displays below show three different histograms constructed from the duration data. As these examples show, unlike the histogram, the density trace is not affected by either the width or the number of bins.