Presenting Your Data

Definitions

Bar Chart: a chart with horizontal bars that show frequencies or percentages for the categories. It is a graphical equivalent of a summary table.

Chartjunk: unnecessary details in a chart. They can make the chart misleading and/or difficult to read.

Classes: intervals for the numerical data.

Column Chart: a chart with vertical columns that show frequencies or percentages for the categories. It is a graphical equivalent of a summary table.

Contingency Table: a table/matrix that displays the frequencies of the observations with the same values for two variables. Also called a two-way table.

Cumulative Frequency Distribution: for each value, shows the number of observations that are equal to or below that particular value. It is a table equivalent of an ogive.

Cumulative Percentage Distribution: for each value, shows the percentage of observations that are equal to or below that particular value. It is a table equivalent of an ogive.

Frequency Distribution: for each value, shows the number of observations that are equal to that particular value. It is a table equivalent of a histogram.

Histogram: a chart with bars that show frequencies or percentages for the classes. The area of a bar is proportional to the frequency or percentage.

Line Plot: a line with categories marked on it, and with stacked Xs above each category mark, where each X is for one observation that falls into that category. It is a primitive analog of a column chart, and a graphical equivalent of a summary table.

Ogive: the curve of a cumulative distribution function. It is a graphical equivalent of a cumulative distribution table.

Ordered Array: the values sorted in order, from the smallest to the largest.

Pareto Chart: a combination of a bar chart and an ogive. The categories must be sorted from the highest frequency to the lowest frequency.

Percentage Distribution: for each value, shows the percentage (of the total number of observations, for all values) of observations that are equal to that particular value. It is a table equivalent of a histogram.

Percentage Polygon: the percentage frequency of each class is represented with a dot, and the dots are connected with a line. This is similar to a histogram.

Pie Chart: a circle divided into sectors that each represent a proportion of the whole.

Scatter Plot: a graph with the variables on the axes where each observation is plotted as a dot with the coordinates equal to the values of the two variables for that observation.

Stem-and-Leaf Display: an alternative to a histogram. Each value is split into a stem and a leaf, a list of the stems is written in a column, and beside each stem the leaves are recorded in a row.

Summary Table: list of the categories along with their counts or percentages. It is a table equivalent of a bar/column chart.

Time-Series Plot: displays observations on the y-axis against equally spaced time intervals on the x-axis.

These are the illustrations I have used in class:

Stem-and-Leaf Display at Minato Mirai train station in Yokohama, Japan.

Notes

The term "summary table" is used for all kinds of summaries. We will reserve it, in this class, for a simple summary of the counts/frequencies or percentages.
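To make the narrow meaning of "summary table" concrete, here is a minimal sketch in Python. The responses and the function name summary_table are made up for illustration; they do not come from the textbook.

```python
from collections import Counter

# Hypothetical responses for one categorical variable (made up for illustration).
responses = ["bus", "car", "train", "car", "bus", "car", "bike", "train", "car", "bus"]

def summary_table(values):
    """Return (category, count, percentage) rows, one per category, in alphabetical order."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in sorted(counts.items())]

rows = summary_table(responses)
for cat, n, pct in rows:
    print(f"{cat:<6} {n:>3} {pct:>6.1f}%")
```

Each row is one category with its count and its percentage of all observations, which is exactly what a bar chart or a column chart would draw.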

A contingency table is most often useful when we summarize the relationship between two variables each taking on two or three values.
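A contingency table can be built by counting how many observations share each pair of values. The sketch below assumes two categorical variables with two values each; the observations and the function name contingency_table are hypothetical.

```python
from collections import Counter

# Hypothetical observations: (group, answer) pairs, made up for illustration.
observations = [
    ("F", "yes"), ("F", "no"), ("M", "yes"), ("F", "yes"),
    ("M", "no"), ("M", "no"), ("F", "yes"), ("M", "yes"),
]

def contingency_table(pairs):
    """Count the observations for every (row value, column value) combination."""
    counts = Counter(pairs)
    row_values = sorted({r for r, _ in pairs})
    col_values = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    table = {r: {c: counts[(r, c)] for c in col_values} for r in row_values}
    return table, row_values, col_values

table, row_values, col_values = contingency_table(observations)
print("     " + " ".join(f"{c:>4}" for c in col_values))
for r in row_values:
    print(f"{r:<4} " + " ".join(f"{table[r][c]:>4}" for c in col_values))
```

With more than two or three values per variable the table grows quickly, which is one reason it works best for small numbers of categories.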

An ordered array is limited in use:

A frequency distribution is for numerical (not for categorical) data.
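The following sketch shows one way to tabulate numerical data into classes and derive the frequency, percentage, and cumulative distributions from the same counts. The data values, the class boundaries (start at 10, width 10), and the function name frequency_distribution are assumptions made for illustration. The classes include the lower boundary and exclude the upper one, so the cumulative columns count the observations below each class's upper boundary.

```python
# Hypothetical numerical data (made up for illustration).
data = [12, 27, 35, 18, 42, 23, 31, 29, 15, 38, 21, 33]

def frequency_distribution(values, start, width, n_classes):
    """Count values in equal-width classes [lower, upper); return one row per class."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:  # values outside the classes are ignored
            counts[index] += 1
    total = sum(counts)
    rows, running = [], 0
    for i, n in enumerate(counts):
        running += n
        lower = start + i * width
        rows.append({
            "class": (lower, lower + width),
            "frequency": n,
            "percentage": 100 * n / total,
            "cumulative_frequency": running,
            "cumulative_percentage": 100 * running / total,
        })
    return rows

rows = frequency_distribution(data, start=10, width=10, n_classes=4)
for row in rows:
    lower, upper = row["class"]
    print(f"{lower}-{upper}: {row['frequency']:>2} {row['percentage']:>5.1f}% "
          f"cum {row['cumulative_frequency']:>2} {row['cumulative_percentage']:>5.1f}%")
```

The frequency and percentage columns are what a histogram draws; the cumulative columns are what an ogive draws.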

The line plot conveys the same information as the column chart.
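A line plot is simple enough to draw as plain text, which makes the likeness to a column chart easy to see. This is only a sketch: the observations are made up, the function name line_plot is hypothetical, and it assumes single-character category labels so that the Xs line up.

```python
from collections import Counter

# Hypothetical categorical observations (made up for illustration).
values = ["A", "B", "A", "C", "A", "B"]

def line_plot(values):
    """Return text lines with one X stacked above a category for each observation."""
    counts = Counter(values)
    categories = sorted(counts)
    height = max(counts.values())
    lines = []
    # Draw from the top level down; a category gets an X if its count reaches that level.
    for level in range(height, 0, -1):
        row = " ".join("X" if counts[c] >= level else " " for c in categories)
        lines.append(row.rstrip())
    lines.append(" ".join(categories))
    return lines

plot = line_plot(values)
print("\n".join(plot))
```

The height of each stack of Xs equals the height of the corresponding column in a column chart.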

The bar chart and the column chart convey exactly the same information.

The pie chart is used to show the parts of a whole.

The Pareto chart is a combination of a bar chart and an ogive. The categories must be sorted from the highest frequency to the lowest frequency.
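The two ingredients of a Pareto chart, the sorted frequencies and the cumulative percentages, can be computed as in the sketch below. The complaint categories and the function name pareto_rows are hypothetical, and breaking ties alphabetically is an arbitrary choice made here.

```python
from collections import Counter

# Hypothetical complaint categories (made up for illustration).
complaints = ["late", "damaged", "late", "wrong item", "late", "damaged", "late", "other"]

def pareto_rows(values):
    """Return (category, frequency, cumulative percentage) sorted by descending frequency."""
    counts = Counter(values)
    total = len(values)
    # Highest frequency first; ties are broken alphabetically (an arbitrary choice).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows, running = [], 0
    for cat, n in ordered:
        running += n
        rows.append((cat, n, 100 * running / total))
    return rows

rows = pareto_rows(complaints)
for cat, n, cum_pct in rows:
    print(f"{cat:<11} {n:>2} {cum_pct:>6.1f}%")
```

The frequencies give the heights of the bars, and the cumulative percentages give the points of the ogive drawn over them.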

The main advantage of the stem-and-leaf display is that it preserves the original data while showing the distribution.
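A short sketch shows why the original data survive in a stem-and-leaf display. It assumes non-negative whole numbers with the tens as the stem and the units as the leaf; the data values and the function name stem_and_leaf are made up for illustration.

```python
from collections import defaultdict

# Hypothetical numerical data (made up for illustration).
data = [12, 27, 35, 18, 42, 23, 31, 29, 15, 38, 21, 33]

def stem_and_leaf(values):
    """Split each non-negative integer into a stem (tens) and a leaf (units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # List every stem between the smallest and the largest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}")
    return lines

display = stem_and_leaf(data)
print("\n".join(display))
```

Every value can be read back from the display (stem 2 with leaf 7 is 27), while the lengths of the rows show the shape of the distribution the way the bars of a histogram do.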

A histogram is a graphical counterpart of a frequency distribution.

A percentage polygon is (usually) also a graphical counterpart of a frequency distribution.

An ogive is a graphical counterpart of a cumulative frequency distribution.

A scatter plot shows the relationship between two variables.

A time-series plot shows how a variable behaves over time.

Avoid chartjunk. It is annoying, unprofessional, and misleading.

Read These

Chapter 2. Organizing and Visualizing Variables in the textbook:

2.1 Organizing Categorical Variables (pp. 38-40)

2.2 Organizing Numerical Variables (pp. 42-48)

2.3 Visualizing Categorical Variables (pp. 51-55)

2.4 Visualizing Numerical Variables (pp. 57-62)

2.5 Visualizing Two Numerical Variables (pp. 65-67)

2.7 Challenges in Organizing and Visualizing Variables (pp. 70-74)

Watch This

Figure 0030.040. A Beginner's Guide to Graphing Data. Notice the terminology. Paul Anderson calls a time-series plot in his illustration a "line graph." He may be using this term to refer to a whole family of graphs, all using a line to present the data, of which a time-series plot is one.

Answer These

The video clip in Figure 0030.040, A Beginner's Guide to Graphing Data, tells you how to graph. Use it to answer the following two questions:

(from 0030.040) Explain what is bad about the pie chart Paul Anderson gives as an example.

(from 0030.040) The vertical axis does not start at 0 in Atmospheric Carbon Dioxide. Try to present the same graph with the vertical axis starting at 0. Does it appear to tell a different story about the changes in atmospheric carbon dioxide? If you were trying to present the data and downplay the changes, would you use the graph with the vertical axis starting at 0 or the one shown in the clip? If you were trying to present the data and emphasize the changes, would you use the graph with the vertical axis starting at 0 or the one shown in the clip?

Do problem 2.14 (p. 49 in the textbook).

Do problem 2.27 (p. 56 in the textbook). You do not need the files it mentions; just use the tables given in the question.

Do problem 2.48 (p. 67 in the textbook).

This one is a little more difficult: How many people under 21 are in the work force (see the histogram below)? Explain.

the histogram for the work force participation by age group