Drawing a Distribution Curve in Excel

I have long asserted that Excel is one of the most powerful tools ever created. It’s also one of the most underutilized - many don’t even realize all the things it can do - as well as one of the most mis-utilized, at least compared to what was intended to be used for.

People use Excel for project plans, taking notes, making lists, as an address book, and a new use that I just saw today - keeping a list of motivational quotes (you know who you are, Steve!). Part of the problem is that there is not an app like OmniOutliner on the PC - one of the cool surprises that I had when getting my Mac.

One of the things you would expect to be able to do with Excel is some statistical analysis. Excel does come with some stats functions that are pretty useful and go way beyond my knowledge of math, but there’s one basic thing it doesn’t do - at least not without some coaxing: plot a frequency distribution curve of the values of a set of data.

For example, I recently wanted to analyze a set of data that represented the times, in minutes, that a certain process took to complete. I had about 40 measurements of this process, and I wanted to see how they were grouped together. I wanted to draw a graph of the instances of the various times to get a visual representation of how the times were distributed across the different values. I wanted to have something that looked like this:
example of a Poisson distribution

After playing around in Excel a bit I couldn’t find a way to make one of these graphs. After a bit of searching, I found a host of Windows apps that can draw nice frequency distribution graphs - some of which appear useful, most of which are expensive. I worked with Excel for a while and came up with a way to effectively draw a frequency distribution graph.

You do this by making use of the frequency function in Excel and then drawing a graphs with two series of the same data.

First, you decide which frequency groups you want to plot. My data fell in the range of 1 to over 1500 minutes, with a lot of the data towards lower numbers. I decided to group my data in blocks of 25 minutes. This means that I want to see how many of the processes I measured took between 0 and 25 minutes, how many took between 25 and 50 minutes, etc. I set a maximum of 450 for my analysis, so the final group was how many of the processes took 450 minutes or more.

I then setup by data groupings as numbers in successive cells of a column in a spreadsheet:
frequency distribution - column A

Next, I used the frequency function to have Excel count the number of occurrences of my data values that fall within each group. When you use the function across a range of cells like this, you have to enter it as an array function. Array functions in Excel are very powerful, but somewhat difficult to understand at first. Essentially they let you do operations on a group of cells at one time, rather than just one cell at a time.

The formula I will use is the following, where the DataSeries!B2:B42 is the group of cells that contain my actual data, and the A7:A25 is the range of cells that contains the frequency groupings I just talked about above:

=FREQUENCY(DataSeries!B2:B42,A7:A25)

To enter this, you first highlight all the cells in the output range - B6 to B25 in my case above. Note that I highlight one cell past the last number in my grouping range so that Excel will put the number of instances greater than the last number in the grouping range.

After you highlight all the cells in the output range, you then enter the frequency formula above in the data entry form at the top of the Excel window:
entering the formula

Now, rather than pressing <ENTER> after typing in the formula, press <CTRL><SHIFT><ENTER>. This enters the formula as an array formula and tells Excel to apply the formula to all the highlighted cells. After doing this, I had the counts of data that I wanted:
frequency counts

You can now quite easily create a graph with this data that almost is what we’re after. I chose to use an area graph like this:
frequency distribution graph 1

To make this look a little bit nicer and less choppy, I added another series to this graph using a line graph and smoothing the line. First, I simply created a copy of the data values in column B, e.g. cell C7 would be


=B7

Now you have a copy of the frequency data in the worksheet:
data with a series copy
In the graph, add a data series (right-click on the graph, choose Source Data, then click on the Series tab, and click add) consisting of the values in the copied column (column C in my example). Then make this data series a line graph (right-click on the new data series on the graph, then select graph type, and choose the line graph.
choose line graph
The line graph now forms a line along the top of the area graph. But, we want to make it smoothed. So, double-click on the line graph and select the Smoothed line option like this:
format data series
Now, we have a nicer looking frequency distribution graph:
Frequency Distribution Graph
For a nice touch, you can paste the graph as a picture in somewhere like PowerPoint and then draw a vertical line to represent the median.
Frequency Distribution Graph with median line
There you go - hopefully you’ll be able to use this to visually represent your data better in Excel.

9 Responses to “Drawing a Distribution Curve in Excel”


  1. 1yoh

    this is helpful however which chart type do you choose to make the orignal graph

  2. Hi,

    Excellent Tutorial but can you please show us the snapshot of Excel where you select the graph type for the first (initial) graph you create. You mention : “I chose to use an area graph like this:”

    I am not able to create that first graph itself !

  3. 3Eric

    Very useful and clear, thanks a lot!
    Took me 6 min to do it!

  4. 4Richard

    This was a great step-by-step tutorial!

    Thanks,

    Richard

  5. 5Dan

    Just what I was looking for… worked great, and saved me a ton of time! thanks!

  6. 6Briony

    Thanks, it’s a really clear tutorial. Unfortunately its
    not quite working for me… When I hit a little box opens up saying ‘Cannot find ‘DataSeries’. Copy from:’ and then it’s trying to help me open microsoft excel documents, even though I’ve given it the data series - in my case, =FREQUENCY(DataSeries!W7:W20,V29:V38)

    I’m using excel on a mac. Any ideas?

  7. 7Briony

    Ah-ha.

    Ok, I’ve figured it out. What works for me, on a mac (in the UK, if that makes any difference), is the formula:

    =FREQUENCY(AC6:AC24,AA28:AA37)

    or, in your original example,

    =FREQUENCY(B2:B42,A7:A25)

    then

    B

  8. 8Briony

    Somehow a part of my last comment is missing:

    then apple shift enter

  9. Hey Dude, really thanks for this wonderful publish, I was working on this and fortunately found this article online and did same as you explained and I got the good graph…thanks a lot dude…..

Leave a Reply