The UNIVARIATE Procedure

PROC UNIVARIATE Statement


PROC UNIVARIATE <option(s)>;

To do this Use this option
Specify the input data set DATA=
Specify the input data set that contains annotate variables ANNOTATE=
Specify the SAS catalog to save high-resolution graphics output GOUT=
Control the statistical analysis

Request all statistics and tables that the FREQ, MODES, NEXTRVAL=, PLOT, and CIBASIC options generate ALL

Specify the confidence level for the confidence limits ALPHA=

Request confidence limits for the mean, standard deviation, and variance based on normally distributed data CIBASIC

Request confidence limits for quantiles using a distribution-free method CIPCTLDF

Request confidence limits for quantiles based on normally distributed data CIPCTLNORMAL

Exclude observations with nonpositive weights from the analysis EXCLNPWGT

Specify the value of the mean or location parameter MU0=

Specify the number of extreme observations displayed NEXTROBS=

Specify the number of extreme values displayed NEXTRVAL=

Request tests for normality NORMAL

Specify the mathematical definition used to compute quantiles PCTLDEF=

Compute robust estimates of scale ROBUSTSCALE

Specify the units to round the analysis variable prior to computing statistics ROUND=

Compute trimmed means TRIMMED=

Specify the variance divisor VARDEF=

Compute Winsorized means WINSORIZED=
Control the displayed output

Request a frequency table FREQ

Request a table that shows the number of observations greater than, equal to, and less than MU0= LOCCOUNT

Request a table of all possible modes MODES

Suppress side-by-side plots NOBYPLOT

Suppress tables of descriptive statistics NOPRINT

Create low-resolution stem-and-leaf, box, and normal probability plots PLOTS

Specify the approximate number of rows the plots use PLOTSIZE=


Options

ALL
requests all statistics and tables that the FREQ, MODES, NEXTRVAL=5, PLOT, and CIBASIC options generate. If the analysis variables are not weighted, this option also requests the statistics and tables that the CIPCTLDF, CIPCTLNORMAL, LOCCOUNT, NORMAL, ROBUSTSCALE, TRIMMED=.25, and WINSORIZED=.25 options generate. PROC UNIVARIATE also uses any values that you specify for ALPHA=, MU0=, NEXTRVAL=, CIBASIC, CIPCTLDF, CIPCTLNORMAL, TRIMMED=, or WINSORIZED= to produce the output.

ALPHA=value
specifies the default confidence level to compute confidence limits. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: .05
Range: between 0 and 1
Main discussion: Confidence Limits for Parameters of the Normal Distribution
Featured in: Performing a Sign Test Using Paired Data and Examining the Data Distribution and Saving Percentiles
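Example: As a minimal sketch, the following step sets ALPHA=.10 in the PROC statement so that the limits requested by CIBASIC are 90 percent limits. The data set Scores and the variable Score are hypothetical names used only for illustration.

```sas
/* Hypothetical data: ten test scores */
data scores;
   input score @@;
   datalines;
72 85 91 68 77 80 95 88 74 83
;
run;

/* ALPHA=.10 gives (1 - .10) x 100 = 90 percent confidence limits */
proc univariate data=scores alpha=.10 cibasic;
   var score;
run;
```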

ANNOTATE=SAS-data-set
specifies an input data set that contains annotate variables as described in SAS/GRAPH Software: Reference. You can use this data set to add features to your high-resolution graphics. PROC UNIVARIATE adds the features in this data set to every high-resolution graph that is produced in the PROC step.
Alias: ANNO=
Interaction: PROC UNIVARIATE does not use the ANNOTATE= data set unless you create a high-resolution graph with the HISTOGRAM, PROBPLOT, or QQPLOT statement.
Tip: Use the ANNOTATE= option in the HISTOGRAM, PROBPLOT, or QQPLOT statement if you want to add a feature to a specific graphics display.

CIBASIC<(<TYPE=keyword> <ALPHA=value>)>
requests confidence limits for the mean, standard deviation, and variance based on the assumption that the data are normally distributed. For large sample sizes, this assumption is not required for the mean because of the Central Limit Theorem.

TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED.
Default: TWOSIDED

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Requirement: You must use the default value of VARDEF=, which is DF.
Main discussion: Confidence Limits for Parameters of the Normal Distribution
Featured in: Performing a Sign Test Using Paired Data and Examining the Data Distribution and Saving Percentiles
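Example: The sketch below illustrates how the CIBASIC suboptions can differ from the PROC statement setting. The data set Yield and the variable Bushels are hypothetical. ALPHA=.10 in the PROC statement remains the default for any other option that computes confidence limits.

```sas
/* Hypothetical data: crop yields for twelve plots */
data yield;
   input bushels @@;
   datalines;
41.2 39.8 44.1 40.5 42.7 43.3 38.9 41.9 40.1 42.2 43.8 39.5
;
run;

/* CIBASIC requests one-sided lower 99 percent limits,       */
/* overriding the PROC statement value ALPHA=.10 for CIBASIC */
proc univariate data=yield alpha=.10 cibasic(type=lower alpha=.01);
   var bushels;
run;
```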

CIPCTLDF<(<TYPE=keyword> <ALPHA=value>)>
requests confidence limits for quantiles by using a method that is distribution-free. In other words, no specific parametric distribution such as the normal is assumed for the data. PROC UNIVARIATE uses order statistics (ranks) to compute the confidence limits as described by Hahn and Meeker (1991).

TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, SYMMETRIC, or ASYMMETRIC.
Default: SYMMETRIC

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Alias: CIQUANTDF
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Confidence Limits for Quantiles
Featured in: Performing a Sign Test Using Paired Data

CIPCTLNORMAL<(<TYPE=keyword> <ALPHA=value>)>
requests confidence limits for quantiles based on the assumption that the data are normally distributed.

TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED.
Default: TWOSIDED

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Alias: CIQUANTNORMAL
Requirement: You must use the default value of VARDEF=, which is DF.
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Confidence Limits for Quantiles
Featured in: Examining the Data Distribution and Saving Percentiles
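Example: To compare the two kinds of quantile limits, a step along the following lines requests both CIPCTLDF and CIPCTLNORMAL in one run. The data set Waits and the variable Minutes are hypothetical, and no WEIGHT statement is used because both options are unavailable with weights.

```sas
/* Hypothetical data: twenty waiting times in minutes */
data waits;
   input minutes @@;
   datalines;
12 15 9 22 18 14 11 30 16 13 19 21 10 17 25 14 12 20 16 15
;
run;

/* Distribution-free limits (asymmetric type) and normal-theory */
/* limits (default TWOSIDED), both at the default ALPHA=.05     */
proc univariate data=waits cipctldf(type=asymmetric) cipctlnormal;
   var minutes;
run;
```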

DATA=SAS-data-set
specifies the input SAS data set.
Main discussion: Input Data Sets

EXCLNPWGT
excludes observations with nonpositive weight values (zero or negative) from the analysis. By default, PROC UNIVARIATE treats observations with negative weights like those with zero weights and counts them in the total number of observations.
Requirement: You must use a WEIGHT statement.
See also: WEIGHT Statement
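Example: A minimal sketch, assuming a hypothetical data set Readings with an analysis variable Value and a weight variable Wt. Two of the five observations have nonpositive weights, so EXCLNPWGT removes them from the analysis.

```sas
/* Hypothetical data: the third weight is zero, the fourth is negative */
data readings;
   input value wt @@;
   datalines;
10.2 1  11.5 2  9.8 0  12.1 -1  10.9 3
;
run;

/* EXCLNPWGT requires a WEIGHT statement */
proc univariate data=readings exclnpwgt;
   var value;
   weight wt;
run;
```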

FREQ
requests a frequency table that consists of the variable values, frequencies, cell percentages, and cumulative percentages.
Interaction: If you specify the WEIGHT statement, PROC UNIVARIATE includes the weighted count in the table and uses this value to compute the percentages.
Featured in: Rounding an Analysis Variable and Identifying Extreme Values

GOUT=graphics-catalog
specifies the SAS catalog that PROC UNIVARIATE uses to save the high-resolution graphics output.
Tip: If you omit the libref, PROC UNIVARIATE looks for the catalog in the temporary library called WORK and creates the catalog if it does not exist.
See also: For information on storing graphics output in SAS catalogs, see SAS/GRAPH Software: Reference

LOCCOUNT
requests a table that shows the number of observations greater than, equal to, and less than the value of MU0=. PROC UNIVARIATE uses these values to construct the sign test and the signed rank test.
Restriction: This option is not available if you specify a WEIGHT statement.
See also: MU0=
Featured in: Performing a Sign Test Using Paired Data

MODES
requests a table of all possible modes. By default, when the data contain multiple modes, PROC UNIVARIATE displays the lowest mode in the table of basic statistical measures. When all the values are unique, PROC UNIVARIATE does not produce a table of modes.
Alias: MODE
Main discussion: Calculating the Mode
Featured in: Performing a Sign Test Using Paired Data

MU0=value(s)
specifies the value of the mean or location parameter ($\mu_0$) in the null hypothesis for tests of location. If you specify one value, PROC UNIVARIATE tests the same null hypothesis for all analysis variables. If you specify multiple values, a VAR statement is required, and PROC UNIVARIATE tests a different null hypothesis for each analysis variable in the corresponding order.
Alias: LOCATION=
Default: 0
Main discussion: Tests for Location
Example: The following statement tests if the mean of the first variable equals 0 and the mean of the second variable equals 0.5.
proc univariate mu0=0 0.5;
Featured in: Examining the Data Distribution and Saving Percentiles
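Example: Because multiple MU0= values require a VAR statement, a complete step might look like the following sketch. The data set Changes and the variables Diff1 and Diff2 are hypothetical.

```sas
/* Hypothetical data: two difference variables, six observations */
data changes;
   input diff1 diff2 @@;
   datalines;
0.3 0.7  -0.1 0.4  0.2 0.6  0.0 0.5  -0.4 0.9  0.5 0.3
;
run;

/* Tests location 0 for diff1 and location 0.5 for diff2, */
/* matching the order of the variables in the VAR statement */
proc univariate data=changes mu0=0 0.5;
   var diff1 diff2;
run;
```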

NEXTROBS=n
specifies the number of extreme observations that PROC UNIVARIATE lists in the table of extreme observations. The table lists the n lowest observations and the n highest observations.
Default: 5
Range: an integer between 0 and half the maximum number of observations
Tip: Use NEXTROBS=0 to suppress the table of extreme observations.
Featured in: Rounding an Analysis Variable and Identifying Extreme Values and Creating Schematic Plots and an Output Data Set with BY Groups

NEXTRVAL=n
specifies the number of extreme values that PROC UNIVARIATE lists in the table of extreme values. The table lists the n lowest unique values and the n highest unique values.
Default: 0
Range: an integer between 0 and half the maximum number of observations
Featured in: Rounding an Analysis Variable and Identifying Extreme Values
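Example: The difference between NEXTROBS= and NEXTRVAL= shows up when the data contain ties. The sketch below uses a hypothetical data set Temps whose lowest value occurs twice.

```sas
/* Hypothetical data: fifteen temperatures with repeated values */
data temps;
   input temp @@;
   datalines;
61 61 63 64 64 64 66 67 70 70 71 75 75 78 80
;
run;

/* NEXTROBS=3: the three lowest observations are 61, 61, 63     */
/* NEXTRVAL=3: the three lowest unique values are 61, 63, 64    */
proc univariate data=temps nextrobs=3 nextrval=3;
   var temp;
run;
```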

NOBYPLOT
suppresses side-by-side box plots when you use the BY statement and the ALL option or the PLOT option in the PROC statement.

NOPRINT
suppresses all the tables of descriptive statistics that the PROC UNIVARIATE statement creates. NOPRINT does not suppress the tables that the HISTOGRAM statement creates.
Tip: Use NOPRINT when you want to create an OUT= output data set only.
Featured in: Creating an Output Data Set with Multiple Analysis Variables and Fitting Density Curves
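Example: A minimal sketch of the pattern described in the tip, using an OUTPUT statement to save statistics while NOPRINT suppresses the printed tables. The data set names Scores and Stats and the variable names Score, Avg, and Sd are hypothetical.

```sas
/* Hypothetical data: ten test scores */
data scores;
   input score @@;
   datalines;
72 85 91 68 77 80 95 88 74 83
;
run;

/* Suppress the descriptive tables and keep only the output data set */
proc univariate data=scores noprint;
   var score;
   output out=stats mean=avg std=sd;
run;

/* Display the saved statistics */
proc print data=stats;
run;
```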

NORMAL
requests tests for normality that include the Shapiro-Wilk test and a series of goodness-of-fit tests based on the empirical distribution function.
Alias: NORMALTEST
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Goodness-of-Fit Tests
Featured in: Examining the Data Distribution and Saving Percentiles

PCTLDEF=value
specifies the definition that PROC UNIVARIATE uses to calculate quantiles.
Alias: DEF=
Default: 5
Range: 1, 2, 3, 4, 5
Restriction: You cannot use PCTLDEF= when you compute weighted quantiles.
Main discussion: Percentile and Related Statistics
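Example: The following sketch selects definition 4 in place of the default definition 5. The data set Scores and the variable Score are hypothetical. No WEIGHT statement appears because PCTLDEF= cannot be used with weighted quantiles.

```sas
/* Hypothetical data: ten test scores */
data scores;
   input score @@;
   datalines;
72 85 91 68 77 80 95 88 74 83
;
run;

/* Quantiles are computed with definition 4 rather than the default 5 */
proc univariate data=scores pctldef=4;
   var score;
run;
```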

PLOTS
produces a stem-and-leaf plot (or a horizontal bar chart), a box plot, and a normal probability plot. If you use a BY statement, side-by-side box plots that are labeled Schematic Plots appear after the univariate analysis for the last BY group.
Alias: PLOT
Main discussion: Generating Line Printer Plots
Featured in: Examining the Data Distribution and Saving Percentiles and Creating Schematic Plots and an Output Data Set with BY Groups

PLOTSIZE=n
specifies the approximate number of rows that the plots use. If n is larger than the value of the SAS system option PAGESIZE=, PROC UNIVARIATE uses the value of PAGESIZE=. If n is less than eight, PROC UNIVARIATE uses eight rows to draw the plots.
Default: the value of PAGESIZE=
Range: 8 to the value of PAGESIZE=
Featured in: Examining the Data Distribution and Saving Percentiles and Creating Schematic Plots and an Output Data Set with BY Groups

ROBUSTSCALE
produces a table with robust estimates of scale. The statistics include the interquartile range, Gini's mean difference, the median absolute deviation about the median (MAD), and two statistics proposed by Rousseeuw and Croux (1993), $Q_n$ and $S_n$.
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Robust Measures of Scale
Featured in: Computing Robust Estimators

ROUND=unit(s)
specifies the units to use to round the analysis variables prior to computing statistics. If you specify one unit, PROC UNIVARIATE uses this unit to round all analysis variables. If you specify multiple units, a VAR statement is required, and each unit rounds the values of the corresponding analysis variable. If ROUND=0, no rounding occurs.
Default: 0
Tip: ROUND= reduces the number of unique variable values, thereby reducing the memory requirements.
Range: ≥ 0
Main discussion: Rounding
Example: To make 1 the rounding unit for the first analysis variable and 0.5 the rounding unit for the second analysis variable, submit the statement
proc univariate round=1 0.5;
Featured in: Rounding an Analysis Variable and Identifying Extreme Values

TRIMMED=value(s) <(<TYPE=keyword> <ALPHA=value>)>
requests a table of trimmed means, where value specifies the number or the proportion of observations that PROC UNIVARIATE trims. If value is a proportion p between 0 and .5, the number of observations that PROC UNIVARIATE trims is the smallest integer that is greater than or equal to np, where n is the number of observations.

TYPE=keyword
specifies the type of confidence limit for the mean, where keyword is LOWER, UPPER, or TWOSIDED.
Default: TWOSIDED

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Alias: TRIM=
Range: between 0 and half the number of nonmissing observations. When a proportion is specified, value must be less than .5.
Requirement: To compute confidence limits for the mean and the Student's t test, you must use the default value of VARDEF=, which is DF.
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Trimmed Means
Featured in: Computing Robust Estimators
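Example: As a worked illustration of the rule for proportions, the sketch below uses a hypothetical data set Times with n = 12 observations and TRIMMED=.1, so the number of observations trimmed is the smallest integer greater than or equal to 12 × .1 = 1.2, which is 2.

```sas
/* Hypothetical data: twelve completion times, one unusually large */
data times;
   input seconds @@;
   datalines;
31 29 35 33 30 28 34 32 36 30 31 58
;
run;

/* p = .1 and n = 12, so the trimming count is ceil(1.2) = 2 */
proc univariate data=times trimmed=.1;
   var seconds;
run;
```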

VARDEF=divisor
specifies the divisor to use in the calculation of variances and standard deviations. The following table, Possible Values for VARDEF=, shows the possible values for divisor and the associated divisors.

Possible Values for VARDEF=
Value         Divisor                     Formula for Divisor
DF            degrees of freedom          $n - 1$
N             number of observations      $n$
WDF           sum of weights minus one    $(\sum_i w_i) - 1$
WEIGHT|WGT    sum of weights              $\sum_i w_i$

The procedure computes the variance as $CSS/d$, where $d$ is the divisor and $CSS$ is the corrected sum of squares, which equals $\sum_i (x_i - \bar{x})^2$. When you weight the analysis variables, $CSS$ equals $\sum_i w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the weighted mean.
Default: DF
Requirement: To compute the standard error of the mean, confidence limits, and Student's t test, use the default value of VARDEF=.
Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of $\sigma^2$, where the variance of the ith observation is $\mathrm{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith observation. This yields an estimate of the variance of an observation with unit weight.
Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of an observation with average weight.
See also: Keywords and Formulas and WEIGHT Statement
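Example: A minimal sketch of a weighted analysis that uses the sum of the weights as the variance divisor. The data set Readings and the variables Value and Wt are hypothetical. Because VARDEF= is not left at its default, the standard error of the mean, confidence limits, and Student's t test should not be expected in the output.

```sas
/* Hypothetical data: five measurements with positive weights */
data readings;
   input value wt @@;
   datalines;
10.2 1  11.5 2  9.8 1  12.1 4  10.9 3
;
run;

/* VARDEF=WGT divides the weighted corrected sum of squares */
/* by the sum of the weights, here 1 + 2 + 1 + 4 + 3 = 11   */
proc univariate data=readings vardef=wgt;
   var value;
   weight wt;
run;
```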

WINSORIZED=value(s) <(<TYPE=keyword> <ALPHA=value>)>
requests a table of Winsorized means, where value is the number or the proportion of observations that PROC UNIVARIATE uses to compute the Winsorized mean. If value is a proportion p between 0 and .5, the number of observations that PROC UNIVARIATE uses is equal to the smallest integer that is greater than or equal to np, where n is the number of observations.

TYPE=keyword
specifies the type of confidence limit for the mean, where keyword is LOWER, UPPER, or TWOSIDED.
Default: TWOSIDED

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Alias: WINSOR=
Range: between 0 and half the number of nonmissing observations. When a proportion is specified, value must be less than .5.
Requirement: To compute confidence limits and the Student's t test, you must use the default value of VARDEF=, which is DF.
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Winsorized Means
Featured in: Computing Robust Estimators
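Example: The following sketch specifies two values, one as a count and one as a proportion, to illustrate both forms. The data set Times and the variable Seconds are hypothetical. With n = 12, the proportion .25 corresponds to 12 × .25 = 3 observations.

```sas
/* Hypothetical data: twelve completion times, one unusually large */
data times;
   input seconds @@;
   datalines;
31 29 35 33 30 28 34 32 36 30 31 58
;
run;

/* First value is a count (2); second is a proportion (.25 -> 3) */
proc univariate data=times winsorized=2 .25;
   var seconds;
run;
```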



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.