COMPHISTOGRAM Statement

Dictionary of Options

The following entries describe the options in detail. All options apply with high resolution graphics output.

ANNOKEY
specifies that annotation requested with the ANNOTATE= option is to be applied only to the key cell. By default, annotation is applied to all of the cells. Use the CLASSKEY= option to specify the key cell.

ANNOTATE=SAS-data-set
ANNO=SAS-data-set
specifies an input data set containing annotate variables as described in SAS/GRAPH Software: Reference. You can use this data set to add features to the comparative histogram. The ANNOTATE= data set you specify in the COMPHISTOGRAM statement is used for all plots created by the statement. You can also specify an ANNOTATE= data set in the PROC CAPABILITY statement to enhance all plots created by the procedure; for more information, see "ANNOTATE= Data Sets" .

BARWIDTH=value
specifies the width of the histogram bars in screen percent units.

C=value-list | MISE
specifies the standardized bandwidth parameter c for kernel density estimates requested with the KERNEL option. You can specify up to five values to display multiple estimates in each cell. You can also specify the keyword MISE to request the bandwidth parameter that minimizes the estimated mean integrated square error (MISE). For example, consider the following statements (for more information, see "Kernel Density Estimates" ):

   proc capability;
      comphist length / class=batch kernel(c = 0.5 1.0 mise);
   run;


The KERNEL option displays three density estimates. The first two have standardized bandwidths of 0.5 and 1.0, respectively. The third has a bandwidth parameter that minimizes the MISE. You can also use the C= and K= options (K= specifies kernel type) to display multiple estimates. For example, consider the following statements:

   proc capability;
      comphist length / class = batch
                        kernel(c = 0.75 k = normal triangular);
   run;

Here two estimates are displayed. The first uses a normal kernel and bandwidth parameter of 0.75, and the second uses a triangular kernel and a bandwidth parameter of 0.75. In general, if more kernel types are specified than bandwidth parameters, the last bandwidth parameter in the list will be repeated for the remaining estimates. Likewise, if more bandwidth parameters are specified than kernel types, the last kernel type will be repeated for the remaining estimates. The default is MISE.

CAXIS=color
CAXES=color
CA=color
specifies the color for the axes, tick marks, and target line. The default is the first color in the device color list.

CBARLINE=color
specifies the color of the outline of the histogram bars. This option overrides the C= option in the SYMBOL1 statement. The default is the first color in the device color list.

CFILL=color
specifies a color used to fill the bars of the histograms (or the areas under a fitted curve if you also specify the FILL option). See the entry for the FILL option for additional details. See Output 3.1.1 and Example 3.2 for examples. Refer to SAS/GRAPH Software: Reference for a list of colors. By default, bars and curve areas are not filled.

CFRAME=color
specifies the color for the area enclosed by the axes and the frame. This area is not filled by default. The CFRAME= option cannot be used with the NOFRAME option, the CTILES= option, or the variable _CTILE_ in a CLASSSPEC= data set.

CFRAMENLEG=color | EMPTY
specifies that the legend requested with the NLEGEND option (or the variable _TILELB_ in a CLASSSPEC= data set) is to be framed and that the frame is to be filled with the color indicated. If you specify CFRAMENLEG=EMPTY, a frame is drawn but not filled with a color.

CFRAMESIDE=color
specifies the color for filling the frame area for the row labels displayed along the left side of a comparative histogram requested with the CLASS= option. This color is also used to fill the frame area for the label of the corresponding CLASS= variable (if a label is associated with the variable). See Output 3.2.1 for an example. By default, these areas are not filled.

CFRAMETOP=color
specifies the color for filling the frame area for the column labels displayed across the top of a comparative histogram requested with the CLASS= option. This color is also used to fill the frame area for the label of the corresponding CLASS= variable (if a label is associated with the variable). See Output 3.2.1 for an example. By default, these areas are not filled.

CGRID=color
specifies the color for grid lines requested with the GRID option. The default is the first color in the device color list. If you use CGRID=, you do not need to specify the GRID option.

CHREF=color
specifies the color for lines requested with the HREF= option. The default is the first color in the device color list.

CLASS=variable
CLASS=(variable1 variable2)
specifies that a comparative histogram is to be created using the levels of the variables (also referred to as class-variables or classification variables).

If you specify a single variable, a one-way comparative histogram is created. The observations in the input data set are sorted by the formatted values (levels) of the variable. A separate histogram is created for the process variable values in each level, and these component histograms are arranged in an array to form the comparative histogram. Uniform horizontal and vertical axes are used to facilitate comparisons. For an example, see Figure 3.2.

If you specify two classification variables, a two-way comparative histogram is created. The observations in the input data set are cross-classified according to the values (levels) of these variables. A separate histogram is created for the process variable values in each cell of the cross-classification, and these component histograms are arranged in a matrix to form the comparative histogram. The levels of variable1 are used to label the rows of the matrix, and the levels of variable2 are used to label the columns of the matrix. Uniform horizontal and vertical axes are used to facilitate comparisons. For an example, see Output 3.2.1.

Classification variables can be numeric or character, and the length of a character variable cannot exceed 16. Formatted values are used to determine the levels. You can specify whether missing values are to be treated as a level with the MISSING1 and MISSING2 options.

If a label is associated with a classification variable, the label is displayed on the comparative histogram. The variable label is displayed parallel to the column (or row) labels. For an example, see Figure 3.2.
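
As a minimal sketch of how formatted values define the levels, the following statements assume a hypothetical data set Parts that contains a process variable Width and a numeric variable Shift coded 1, 2, and 3. The data set name, the variable Shift, and the format name SHIFTFMT are assumptions made for illustration only:

```sas
/* Hypothetical format that maps internal codes to level labels */
proc format;
   value shiftfmt 1='Day' 2='Evening' 3='Night';
run;

/* Parts, Width, and Shift are assumed names for illustration */
proc capability data=parts noprint;
   format shift shiftfmt.;
   comphist width / class=shift;
run;
```

Because the levels are built from formatted values, the cells in this sketch would be labeled Day, Evening, and Night rather than 1, 2, and 3.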

CLASSKEY='value'
CLASSKEY=('value1' 'value2')
specifies the key cell in a comparative histogram requested with the CLASS= option. The bin size and midpoints are first determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. Thus, the choice of the key cell determines the uniform horizontal axis used for all cells.

If you specify CLASS=variable, you can specify CLASSKEY='value' to identify the key cell as the level for which variable is equal to value. The value can have up to 16 characters, and you must specify a formatted value. By default, the levels are sorted in the order determined by the ORDER1= option, and the key cell is the level that occurs first in this order. The cells are displayed in this order from top to bottom (or left to right), and, consequently, the key cell is displayed at the top or at the left. If you specify a different key cell with the CLASSKEY= option, this cell is displayed at the top or at the left unless you also specify the NOKEYMOVE option.

If you specify CLASS=(variable1 variable2), you can specify CLASSKEY=('value1' 'value2') to identify the key cell as the level for which variable1 is equal to value1 and variable2 is equal to value2. Here, value1 and value2 must be formatted values, and they must be enclosed in quotes. For an example of the CLASSKEY= option with a two-way comparative histogram, see Output 3.2.1. By default, the levels of variable1 are sorted in the order determined by the ORDER1= option, and within each of these levels, the levels of variable2 are sorted in the order determined by the ORDER2= option. The default key cell is the combination of levels of variable1 and variable2 that occurs first in this order. The cells are displayed in order of variable1 from top to bottom and in order of variable2 from left to right. Consequently, the default key cell is displayed in the upper left corner. If you specify a different key cell with the CLASSKEY= option, this cell is displayed in the upper left corner unless you also specify the NOKEYMOVE option.
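
The following sketch shows one way the two-way form of the CLASSKEY= option might be combined with the NOKEYMOVE option. It assumes a hypothetical data set Parts in which Supplier and Shift are character variables with the levels shown; none of these names or values come from this chapter:

```sas
/* Parts, Supplier, Shift, and the levels 'B' and 'Night' are assumed */
proc capability data=parts noprint;
   comphist width / class=(supplier shift)
                    classkey=('B' 'Night')  /* key cell: Supplier=B, Shift=Night */
                    nokeymove;              /* keep the cells in their sorted positions */
run;
```

Under these assumptions the horizontal axis is determined from the cell for supplier B on the night shift, while the layout of the cells is left in the ORDER1= and ORDER2= order.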

CLASSSPEC=SAS-data-set
CLASSSPECS=SAS-data-set
specifies a data set that provides distinct specification limits for each cell, as well as a color, legend, and label for the corresponding tile. The following table lists the variables that are read from a CLASSSPECS= data set:

Variable Name               Description
BY variables                subsets the data set
Classification variables    specifies the structure of the comparative histogram
_VAR_                       specifies name of process variable (must be character variable of length 8)
_LSL_                       specifies lower specification limit for tile
_TARGET_                    specifies target value for tile
_USL_                       specifies upper specification limit for tile
_CTILE_                     specifies background color for tiles (must be character variable of length 8)
_TILELG_                    specifies text displayed in color tile legend at bottom of comparative histogram (character variable of length not greater than 16)
_TILELB_                    specifies text displayed in corner of each tile (character variable of length not greater than 16)


If you specify a CLASSSPEC= data set, you cannot use the SPEC statement or a SPEC= data set. If you use a BY statement, the CLASSSPEC= data set must contain one observation for each unique combination of process and classification variables within each BY group. See Example 3.1 for an example of a CLASSSPECS= data set.
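
A CLASSSPEC= data set with the variables listed above could be built with a DATA step along the following lines. This is only a sketch: the data set names Limits and Parts, the classification variable Supplier, and all of the limit, color, and label values are invented for illustration.

```sas
/* One observation per combination of process variable and class level. */
/* All names and values here are assumptions, not taken from Example 3.1. */
data limits;
   length _var_ $ 8 supplier $ 16 _ctile_ $ 8 _tilelg_ _tilelb_ $ 16;
   input _var_ $ supplier $ _lsl_ _target_ _usl_ _ctile_ $ _tilelg_ $ _tilelb_ $;
   datalines;
width A 9.5 10.0 10.5 white Approved Lot-A
width B 9.0 10.0 11.0 yellow Probation Lot-B
;
run;

proc capability data=parts noprint;
   comphist width / class=supplier classspec=limits;
run;
```

The LENGTH statement is there to satisfy the length requirements in the table (8 for _VAR_ and _CTILE_, at most 16 for _TILELG_ and _TILELB_).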

Also note that

COLOR=color
specifies the color of the normal density curve or the kernel density estimate curve. Enclose the COLOR= option in parentheses after the NORMAL option or the KERNEL option. See Output 3.1.1 for an example.

CPROP=color
specifies the color for a horizontal bar whose length (relative to the width of the tile) indicates the proportion of the total frequency that is represented by the corresponding cell. For an example, see Figure 3.3. Empty bars are displayed if you specify CPROP=EMPTY. By default, bars are not displayed.

CTEXT=color
CT=color
specifies the color for tick mark labels and axis labels. The default is the color specified for the CTEXT= option in the most recent GOPTIONS statement.

CVREF=color
specifies the color for lines requested with the VREF= option. The default is the first color in the device color list.

DESCRIPTION='string'
DES='string'
specifies a description, up to 40 characters, that appears in the PROC GREPLAY master menu. The default is the variable name.

FILL
fills areas under a fitted density curve with colors and patterns. Enclose the FILL option in parentheses after the keyword NORMAL or KERNEL. Depending on the area to be filled (outside or between the specification limits), you can specify the color and pattern with options in the SPEC statement and the COMPHISTOGRAM statement, as summarized in the following table:

Area Under Curve                      Statement    Option
between specification limits          COMPHIST     CFILL=color
                                      COMPHIST     PFILL=pattern
left of lower specification limit     SPEC         CLEFT=color
                                      SPEC         PLEFT=pattern
right of upper specification limit    SPEC         CRIGHT=color
                                      SPEC         PRIGHT=pattern

If you do not display specification limits, you can use the CFILL= and PFILL= options to specify the color and pattern for the entire area under the curve. Solid fills are used by default if patterns are not specified. You can specify the FILL option with only one fitted curve. For an example, see Output 3.1.1. Refer to SAS/GRAPH Software: Reference for a list of available patterns and colors. If you do not specify the FILL option but you do specify the options in the preceding table, the colors and patterns are applied to the corresponding areas under the histogram.
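
The division of labor between the two statements can be sketched as follows. The data set Parts, the limit values, and the colors are assumptions chosen for illustration; the LSL= and USL= options belong to the SPEC statement, which is documented separately:

```sas
/* Parts and the limits 9.5 and 10.5 are assumed for illustration */
proc capability data=parts noprint;
   spec lsl=9.5 usl=10.5
        cleft=red cright=red;       /* areas outside the specification limits */
   comphist width / class=supplier
                    normal(fill)    /* fill under the fitted normal curve */
                    cfill=green;    /* area between the specification limits */
run;
```

Because no PFILL=, PLEFT=, or PRIGHT= pattern is given in this sketch, solid fills would be used.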

FONT=font
specifies a software font for text used outside the framed areas of a comparative histogram (labels for axes, tick marks, and so forth). This font takes precedence over the FTEXT= font specified in a GOPTIONS statement. Refer to SAS/GRAPH Software: Reference for a list of fonts.

GRID
adds a grid to the comparative histogram. Grid lines are horizontal lines positioned at major tick marks on the vertical axis.

HEIGHT=value
specifies the height in percent screen units of text for axis labels, tick mark labels, and legends. The HEIGHT= option takes precedence over the HTEXT= option in the GOPTIONS statement.

HOFFSET=value
specifies the offset in percent screen units at both ends of the horizontal axis. Specify HOFFSET=0 to eliminate the default offset.

HREF=value-list
draws reference lines perpendicular to the horizontal axis at the values specified. For an illustration, see Output 4.1.1.

HREFLABELS='label1'...'labeln'
HREFLABEL='label1'...'labeln'
HREFLAB='label1'...'labeln'
specifies labels for the lines requested with the HREF= option. The number of labels must equal the number of lines. Enclose the labels in quotes. Labels can be up to 16 characters. For an illustration, see Output 4.1.1.
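
As a brief sketch, the following statements request two labeled reference lines on the horizontal axis. The data set Parts, the reference values, and the label text are assumptions made for illustration:

```sas
/* Two reference lines, so two labels are supplied */
proc capability data=parts noprint;
   comphist width / class=supplier
                    href=9.5 10.5
                    hreflabels='Low' 'High'
                    chref=blue;
run;
```

Note that the number of quoted labels matches the number of HREF= values, as the entry above requires.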

HREFLABPOS=n
specifies the vertical position of HREFLABELS= labels as follows: 1 positions the labels along the top of the histogram; 2 staggers the labels from top to bottom; 3 positions the labels along the bottom. The default is 1.

INFONT=font
specifies a software font for text used inside the framed areas of the comparative histogram (such as sample size legends). The INFONT= option takes precedence over the FTEXT= option in the GOPTIONS statement. Refer to SAS/GRAPH Software: Reference for a list of fonts.

INHEIGHT=value
specifies the height in percent screen units of text used inside the framed areas of the comparative histogram (such as sample size legends). The default height is the height you specify with the HEIGHT= option. If you do not specify the HEIGHT= option, the default height is the height you specify with the HTEXT= option in the GOPTIONS statement.

INTERTILE=value
specifies the distance in horizontal percent screen units between tiles. For an example, see Figure 3.3. By default, the tiles are contiguous.
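
For instance, the tiles might be separated and given empty proportion bars as in the following sketch, where the data set Parts and the spacing value are assumptions:

```sas
/* Separate the tiles and draw unfilled proportion-of-total bars */
proc capability data=parts noprint;
   comphist width / class=supplier
                    intertile=1.0
                    cprop=empty;
run;
```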

K=NORMAL | TRIANGULAR | QUADRATIC
specifies the type of kernel (normal, triangular, or quadratic) used to compute kernel density estimates requested with the KERNEL option. Enclose the K= option in parentheses after the keyword KERNEL. You can specify a single type or a list of types. If you specify more estimates than types, the last kernel type in the list is used for the remaining estimates. By default, a normal kernel is used.

KERNEL<( kernel-options )>
requests a kernel density estimate for each cell of the comparative histogram. You can specify the kernel-options described in the following table:

FILL      specifies that the area under the curve is to be filled
COLOR=    specifies the color of the curve
L=        specifies the line style for the curve
W=        specifies the width of the curve
K=        specifies the type of kernel
C=        specifies the smoothing parameter

See Output 3.1.1 for an example. By default, the estimate is based on the AMISE method. For more information, see "Kernel Density Estimates" .

L=linetype
specifies the line type for a normal or kernel density estimate curve. Enclose the L= option in parentheses after the NORMAL option or the KERNEL option. If you use the L= option with the KERNEL option, you can specify a single line type or a list of line types. Refer to SAS/GRAPH Software: Reference for a list of available line types. The default is 1, which produces a solid line.
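
When several kernel estimates are requested, a list of line types can help tell them apart. The following sketch assumes a hypothetical data set Parts and pairs two bandwidth values with two line types:

```sas
/* Two estimates: c=0.5 with line type 1, c=1.0 with line type 2 */
proc capability data=parts noprint;
   comphist width / class=supplier
                    kernel(c=0.5 1.0 l=1 2);
run;
```

Here the first estimate would be drawn as a solid line (line type 1) and the second with line type 2.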

LGRID=n
specifies the line type for the grid requested with the GRID option. If you use the LGRID= option, you do not need to specify the GRID option. The default is 1, which produces a solid line.

LHREF=n
LH=n
specifies the line type for lines requested with the HREF= option. The default is 2, which produces a dashed line.

LVREF=n
LV=n
specifies the line type for lines requested with the VREF= option. The default is 2, which produces a dashed line.

MAXNBIN=n
specifies the maximum number of bins to be displayed. This option is useful in situations where the scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number of bins may be so great that each cell histogram is scaled into a narrow region. By limiting the number of bins with the MAXNBIN= option, you can narrow the window about the data distribution in the key cell. Note that the MAXNBIN= option provides an alternative to the MAXSIGMAS= option.

MAXSIGMAS=value
limits the number of bins to be displayed to a range of value standard deviations (of the data in the key cell) above and below the mean of the data in the key cell. This option is useful in situations where the scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. If the cell scales differ considerably, however, the resulting number of bins may be so great that each cell histogram is scaled into a narrow region. By limiting the number of bins with the MAXSIGMAS= option, you narrow the window about the data distribution in the key cell. Note that the MAXSIGMAS= option provides an alternative to the MAXNBIN= option.
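
A sketch of this option in use, assuming a hypothetical data set Parts in which one supplier has a much wider spread than the key cell (the level 'A' is also an assumption):

```sas
/* Restrict the bins to within 3 standard deviations of the key-cell mean */
proc capability data=parts noprint;
   comphist width / class=supplier
                    classkey='A'
                    maxsigmas=3;
run;
```

Replacing MAXSIGMAS=3 with a MAXNBIN= value would be the alternative way to narrow the window, capping the number of bins directly rather than through the key-cell standard deviation.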

MIDPOINTS=value-list | KEY | UNIFORM
specifies how midpoints are determined for the bins in the comparative histogram. The method you specify is used for all process variables analyzed with the COMPHISTOGRAM statement.

If you specify MIDPOINTS=value-list, the values must be listed in increasing order and must be evenly spaced. The difference between consecutive midpoints is used as the width of the histogram bars. If the range of the values does not cover the range of the data as well as any specification limits (LSL and USL) that are given, the list is extended in either direction as necessary. See Example 3.1 for an illustration.

If you specify MIDPOINTS=KEY, the procedure first determines the midpoints for the data in the key cell. The initial number of midpoints is based on the number of observations in the key cell using the method of Terrell and Scott (1985). The midpoint list for the key cell is then extended in either direction as necessary until it spans the data in the remaining cells. If you specify MIDPOINTS=UNIFORM, the procedure determines the midpoints using all the observations as if there were no cells. In other words, the number of midpoints is based on the total sample size using the method of Terrell and Scott (1985).

By default, MIDPOINTS=KEY. However, if the key cell contains no observations, the default is MIDPOINTS=UNIFORM.
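
An explicit midpoint list that satisfies the increasing, evenly spaced requirement might look like the following sketch. The data set Parts and the range of values are assumptions made for illustration:

```sas
/* Midpoints 9.0, 9.25, ..., 11.0; the bar width is the 0.25 spacing */
proc capability data=parts noprint;
   comphist width / class=supplier
                    midpoints=9.0 to 11.0 by 0.25;
run;
```

If some observations or specification limits fell outside 9.0 to 11.0, the list would be extended in either direction as described above.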

MISSING1
specifies that missing values of the first CLASS= variable are to be treated as a level of the CLASS= variable. If the first CLASS= variable is a character variable, a missing value is defined as a blank internal (unformatted) value. If the first CLASS= variable is numeric, a missing value is defined as any of the SAS System missing values. If you do not specify MISSING1, observations for which the first CLASS= variable is missing are excluded from the analysis.

MISSING2
specifies that missing values of the second CLASS= variable are to be treated as a level of the CLASS= variable. If the second CLASS= variable is a character variable, a missing value is defined as a blank internal (unformatted) value. If the second CLASS= variable is numeric, a missing value is defined as any of the SAS System missing values. If you do not specify MISSING2, observations for which the second CLASS= variable is missing are excluded from the analysis.

MU=value
specifies the parameter \mu for the normal density curves requested with the NORMAL option. Enclose the MU= option in parentheses after the NORMAL option. The default value is the sample mean of the observations in the cell.

NAME='string'
specifies a name for the plot, up to eight characters, that appears in the PROC GREPLAY master menu. The default is 'CAPABILI'.

NCOLS=n
NCOL=n
specifies the number of columns in a comparative histogram. You can use the NCOLS= option with the NROWS= option if you specify two CLASS= variables. See Output 3.2.1 for an example of a two-way comparative histogram using the NCOLS= option. By default, NCOLS=1 (and NROWS=2) if you specify only one CLASS= variable, and NCOLS=2 (and NROWS=2) if you specify two CLASS= variables.

NLEGEND<='label'>
specifies the form of a legend that is displayed inside each tile and indicates the sample size of the cell. The following two forms are available:

See Figure 3.2 for an example. You can use the CFRAMENLEG= option to frame the sample size legend. The variable _TILELB_ in a CLASSSPECS= data set overrides the NLEGEND option. By default, no legend is displayed.

NLEGENDPOS=NW | NE
specifies the position of the legend requested with the NLEGEND option or the variable _TILELB_ in a CLASSSPEC= data set. If NLEGENDPOS=NW, the legend is displayed in the northwest corner of the tile; if NLEGENDPOS=NE, the legend is displayed in the northeast corner of the tile. See Figure 3.2 for an illustration. The default is NE.
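
The legend options can be combined as in the following sketch, which assumes a hypothetical data set Parts:

```sas
/* Framed, unfilled sample size legend in the northwest corner of each tile */
proc capability data=parts noprint;
   comphist width / class=supplier
                    nlegend
                    cframenleg=empty
                    nlegendpos=nw;
run;
```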

NOBARS
suppresses the display of the bars in a comparative histogram.

NOCHART
suppresses the creation of a comparative histogram. This is an alias for NOPLOT.

NOFRAME
suppresses the frame around each tile. The NOFRAME option cannot be specified with the CFRAME= option.

NOHLABEL
suppresses the label for the horizontal axis. This is useful for avoiding clutter.

NOKEYMOVE
suppresses the rearrangement of cells that occurs by default when you use the CLASSKEY= option to specify the key cell. For details, see the entry for the CLASSKEY= option.

NOPLOT
suppresses the creation of a comparative histogram. This option is useful when you are using the COMPHISTOGRAM statement solely to create an output data set.

NORMAL<(normal-options)>
displays a normal density curve for each cell of the comparative histogram. The equation of the normal density curve is
p(x) = \frac{h \times 100\%}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right) \quad \text{for } -\infty < x < \infty

where

   \mu = mean
   \sigma = standard deviation (\sigma > 0)
   h = width of histogram interval

If you specify values for \mu and \sigma with the MU= and SIGMA= normal-options, the same curve is displayed for each cell. By default, a distinct curve is displayed for each cell based on the sample mean and standard deviation for that cell. For example, the following statements display a distinct curve for each level of the variable SUPPLIER:

   proc capability noprint;
      comphist width / class=supplier normal(color=red l=2);
   run;


The curves are drawn in red with a line style of 2 (a dashed line). See Figure 3.3 for another illustration. Table 3.1 lists options that can be specified in parentheses after the NORMAL option.

NOVLABEL
suppresses the label for the vertical axis.

NOVTICK
suppresses the tick marks and tick mark labels for the vertical axis. If you specify the NOVTICK option, the NOVLABEL option is assumed.

NROWS=n
NROW=n
specifies the number of rows in a comparative histogram. You can use the NROWS= option with the NCOLS= option if you specify two CLASS= variables. See Figure 3.2 for a one-way comparative histogram using the NROWS= option, and see Output 3.2.1 for a two-way comparative histogram using the NROWS= and NCOLS= options. The default is 2.
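
For a two-way layout, the NROWS= and NCOLS= options might be set together as in this sketch. The data set Parts and the classification variables Supplier (two levels) and Shift (three levels) are assumptions:

```sas
/* 2 rows for the levels of Supplier, 3 columns for the levels of Shift */
proc capability data=parts noprint;
   comphist width / class=(supplier shift)
                    nrows=2
                    ncols=3;
run;
```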

ORDER1=INTERNAL | FORMATTED | DATA | FREQ
specifies the display order for the values of the first CLASS= variable.

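The levels of the first CLASS= variable are always constructed using the formatted values of the variable, and the formatted values are always used to label the rows (columns) of a comparative histogram. You can use the ORDER1= option to determine the order of the rows (columns) corresponding to these values, as follows:

INTERNAL     orders the levels by the internal (unformatted) values of the variable
FORMATTED    orders the levels by the formatted values of the variable
DATA         orders the levels in the order in which they first appear in the input data set
FREQ         orders the levels by decreasing frequency count

By default, ORDER1=INTERNAL.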

ORDER2=INTERNAL | FORMATTED | DATA | FREQ
specifies the display order for the values of the second CLASS= variable.

The levels of the second CLASS= variable are always constructed using the formatted values of the variable, and the formatted values are always used to label the columns of a two-way comparative histogram. You can use the ORDER2= option to determine the order of the columns.

The layout of a two-way comparative histogram is determined by using the ORDER1= option to obtain the order of the rows from top to bottom (recall that ORDER1=INTERNAL by default). Then the ORDER2= option is applied to the observations corresponding to the first row to obtain the order of the columns from left to right. If any columns remain unordered (that is, the categories are unbalanced), the ORDER2= option is applied to the observations in the second row, and so on, until all the columns have been ordered.

The values of the ORDER2= option are interpreted as described for the ORDER1= option. By default, ORDER2=INTERNAL.
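
As a sketch, the following statements ask for both classification variables to be ordered by formatted value instead of the default internal value. The data set Parts and the variables Supplier and Shift are assumptions made for illustration:

```sas
/* Rows ordered by ORDER1=, columns ordered by ORDER2= */
proc capability data=parts noprint;
   comphist width / class=(supplier shift)
                    order1=formatted
                    order2=formatted;
run;
```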

OUTHISTOGRAM=SAS-data-set
creates a SAS data set that saves the midpoints of the histogram intervals, the observed percent of observations in each interval, and (optionally) the percent of observations in each interval estimated from a fitted normal distribution.
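
The following sketch saves the histogram information without drawing the display and then lists it. The data set names Parts and Bins are assumptions; the variables written to the output data set are not described in this entry, so the listing simply prints whatever is saved:

```sas
/* NOPLOT suppresses the display; only the output data set is created */
proc capability data=parts noprint;
   comphist width / class=supplier
                    normal
                    outhistogram=bins
                    noplot;
run;

/* Inspect the saved midpoints and percents */
proc print data=bins;
run;
```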

PFILL=pattern
specifies a pattern used to fill the bars of the histograms (or the areas under a fitted curve if you also specify the FILL option). See the entries for the CFILL= and FILL options for additional details. Refer to SAS/GRAPH Software: Reference for a list of pattern values. By default, the bars and curve areas are not filled.

RTINCLUDE
includes the right endpoint of each histogram interval in that interval. The left endpoint is included by default.

SIGMA=value
specifies the parameter \sigma for normal density curves requested with the NORMAL option. Enclose the SIGMA= option in parentheses after the NORMAL option. The default value is the sample standard deviation of the observations in the cell.
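
To overlay the same reference curve on every cell, the MU= and SIGMA= options can be given together, as in this sketch. The data set Parts and the parameter values are assumptions made for illustration:

```sas
/* Fixed parameters: an identical normal curve is drawn in each cell */
proc capability data=parts noprint;
   comphist width / class=supplier
                    normal(mu=10 sigma=0.2 color=blue w=2);
run;
```

Omitting MU= and SIGMA= would instead give each cell its own curve based on the sample mean and standard deviation of that cell.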

TILELEGLABEL='label'
specifies a label displayed to the left of the legend that is created when you provide _CTILE_ and _TILELG_ variables in a CLASSSPEC= data set. The label can be up to 16 characters and must be enclosed in quotes. The default label is Tiles:.

TURNVLABEL
TURNVLABELS
specifies that the characters in the labels for the vertical axis are to be turned and strung out vertically. This happens by default when a hardware font is used.

VAXIS=value-list
specifies tick mark values for the vertical axis. The values must be equally spaced and in increasing order, and the first value must be zero. You must scale the values in the same units as the bars (see the VSCALE= option), and the last value must be greater than or equal to the height of the largest bar. See Output 3.2.1 for an example.

VAXISLABEL='label'
specifies a label (up to 40 characters) for the vertical axis.

VOFFSET=value
specifies the offset in percent screen units at the upper end of the vertical axis.

VREF=value-list
draws reference lines perpendicular to the vertical axis at the values specified. For an illustration, see Output 2.2.1.

VREFLABELS='label1'...'labeln'
VREFLABEL='label1'...'labeln'
VREFLAB='label1'...'labeln'
specifies labels for the lines requested with the VREF= option. The number of labels must equal the number of lines. Enclose the labels in quotes. Labels can be up to 16 characters. For an illustration, see Output 2.2.1.

VREFLABPOS=n
specifies the horizontal position of VREFLABELS= labels as follows: VREFLABPOS=1 positions the labels at the left of the tile, and VREFLABPOS=2 positions the labels at the right. The default is 1.

VSCALE=PERCENT | COUNT | PROPORTION
specifies the scale of the vertical axis. The value COUNT scales the data in units of the number of observations per data unit. The value PERCENT scales the data in units of percent of observations per data unit. The value PROPORTION scales the data in units of proportion of observations per data unit. The default is PERCENT.
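
The VSCALE= and VAXIS= options are typically chosen together, because the tick values must be in the units of the bars. The following sketch assumes a hypothetical data set Parts in which no bar represents more than 40 observations:

```sas
/* Vertical axis in counts, with ticks at 0, 10, 20, 30, 40 */
proc capability data=parts noprint;
   comphist width / class=supplier
                    vscale=count
                    vaxis=0 to 40 by 10;
run;
```

The list starts at zero and is evenly spaced, as the VAXIS= entry requires; the upper value of 40 is an assumption and would need to be at least the height of the tallest bar in real data.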

W=n
specifies the width in pixels of the curve. Enclose the W= option in parentheses after the NORMAL option or the KERNEL option. The default is 1.

WAXIS=n
specifies the line thickness (in pixels) for the axes and frame. The default is 1.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.