Statements with the Same Function in Multiple Procedures

WEIGHT


Specifies weights for analysis variables in the statistical calculations.

Tip: You can use a WEIGHT statement and a FREQ statement in the same step of any procedure that supports both statements.


WEIGHT variable;


Required Arguments

variable
specifies a numeric variable whose values weight the values of the analysis variables. The values of the variable do not have to be integers. The following table shows how the procedure handles a weight value that is zero, negative, or missing:

Weight value    The procedure ...
0               counts the observation in the total number of observations
less than 0     converts the weight value to zero and counts the observation in the total number of observations
missing         excludes the observation from the analysis

If a procedure handles nonpositive weight values differently, that behavior is described in the WEIGHT statement syntax for the individual procedure.

Prior to Version 7 of the SAS System, no base procedure excluded observations with missing weights from the analysis. Most SAS/STAT procedures, such as PROC GLM, have always excluded observations with missing, negative, or zero weights from the analysis. You can get the same behavior in a base procedure that supports the WEIGHT statement by specifying the EXCLNPWGT option in the PROC statement.

The procedure substitutes the value of the WEIGHT variable for $w_i$, which appears in Keywords and Formulas.
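As a sketch of where $w_i$ enters the calculations, these are the standard weighted forms of the mean and of the corrected sum of squares (CSS), the building blocks of the weighted variance. Keywords and Formulas remains the authoritative list of the formulas each procedure uses.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
\mathrm{CSS} = \sum_{i=1}^{n} w_i \left(x_i - \bar{x}_w\right)^2
```

When every $w_i = 1$ (no WEIGHT statement), these reduce to the ordinary mean and corrected sum of squares.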


Procedures That Support the WEIGHT Statement

Note: In PROC FREQ, the value of the variable in the WEIGHT statement represents the frequency of occurrence for each observation. See WEIGHT Statement for more information.


Calculating Weighted Statistics
The procedures that support the WEIGHT statement also support the VARDEF= option, which lets you specify a divisor to use in the calculation of the variance and standard deviation.
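The divisor $d$ selected by VARDEF= is applied to the weighted corrected sum of squares, $\mathrm{CSS} = \sum_i w_i (x_i - \bar{x}_w)^2$, where $n$ is the number of observations. The following summarizes the VARDEF= values as documented for PROC MEANS; check each procedure's own syntax for the values it accepts.

```latex
s^2 = \frac{\mathrm{CSS}}{d}, \qquad s = \sqrt{s^2}, \qquad
d =
\begin{cases}
n - 1 & \text{VARDEF=DF (default)} \\
n & \text{VARDEF=N} \\
\left(\sum_i w_i\right) - 1 & \text{VARDEF=WDF} \\
\sum_i w_i & \text{VARDEF=WEIGHT or VARDEF=WGT}
\end{cases}
```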

By using a WEIGHT statement to compute moments, you assume that the ith observation has a variance that is equal to $\sigma^2 / w_i$. When you specify VARDEF=DF (the default), the computed variance is a weighted least squares estimate of $\sigma^2$. Similarly, the computed standard deviation is an estimate of $\sigma$. Note that the computed variance is not an estimate of the variance of the ith observation, because that variance involves the observation's weight, which varies from observation to observation.
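One way to see why VARDEF=DF yields an estimate of $\sigma^2$ is the following derivation sketch. It assumes the $x_i$ are independent with common mean $\mu$ and $\operatorname{Var}(x_i) = \sigma^2 / w_i$, and writes $W = \sum_i w_i$. Under these assumptions the expected weighted corrected sum of squares is $(n-1)\sigma^2$, so dividing it by $n-1$ gives an unbiased estimate of $\sigma^2$.

```latex
\begin{aligned}
\bar{x}_w &= \frac{1}{W}\sum_i w_i x_i,
  \qquad \operatorname{Var}(\bar{x}_w) = \frac{1}{W^2}\sum_i w_i^2 \,\frac{\sigma^2}{w_i} = \frac{\sigma^2}{W} \\
\sum_i w_i (x_i - \bar{x}_w)^2 &= \sum_i w_i (x_i - \mu)^2 - W(\bar{x}_w - \mu)^2 \\
E\!\left[\sum_i w_i (x_i - \bar{x}_w)^2\right] &= \sum_i w_i \,\frac{\sigma^2}{w_i} - W\,\frac{\sigma^2}{W} = (n-1)\,\sigma^2
\end{aligned}
```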

If the values of your variable are counts that represent the number of occurrences of each observation, use this variable in the FREQ statement rather than in the WEIGHT statement. In this case, because the values are counts, they should be integers. (The FREQ statement truncates any noninteger values.) The variance that is computed with a FREQ variable is an estimate of the common variance, $\sigma^2$, of the observations.
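For comparison, here is a sketch of the FREQ-based calculation with the default VARDEF=DF. Each observation with frequency $f_i$ is counted $f_i$ times, so $n$ becomes $\sum_i f_i$ and the result is the ordinary sample variance of the expanded data:

```latex
\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i},
\qquad
s^2 = \frac{\sum_i f_i \left(x_i - \bar{x}\right)^2}{\left(\sum_i f_i\right) - 1}
```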

Note: If your data come from a stratified sample where the weights $w_i$ represent the strata weights, neither the WEIGHT statement nor the FREQ statement provides appropriate stratified estimates of the mean, variance, or variance of the mean. To perform the appropriate analysis, consider using PROC SURVEYMEANS, which is a SAS/STAT procedure that is documented in the SAS/STAT User's Guide.


Example
As an example of the WEIGHT statement, suppose 20 people are asked to estimate the size of a 12-inch-wide object. Each person is placed at a different distance from the object. As the distance from the object increases, the estimates should become less precise.

The SAS data set SIZE contains the estimate (ObjectSize) at each distance (Distance), and the precision (Precision) for each estimate. Notice that the largest deviation (an overestimate by 8 inches) came at the largest distance (25 feet). As a measure of precision, 1/Distance gives more weight to estimates that were made closer to the object and less weight to estimates that were made at greater distances.

The following statements create the data set SIZE:

options nodate pageno=1 linesize=64 pagesize=60;

data size;
   input Distance ObjectSize @@;
   Precision=1/distance;
   datalines;
 5 12   5  8   5 12   5 10
10 17  10 13  10 10  10 12
15 10  15 14  15 19  15 13
20 17  20 14  20  9  20 19
25 12  25 10  25 20  25 15
;
run;
The following PROC MEANS step computes the average estimate of the object size while ignoring the weights. Without a WEIGHT variable, PROC MEANS uses the default weight of 1 for every observation, so the estimates made at all distances count equally. The resulting average overestimates the object's actual size (12 inches) by 1.3 inches.
proc means data=size maxdec=3 n mean var stddev;
   var objectsize;
   title1 'Unweighted Analysis of the SIZE Data Set';
run;
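The unweighted mean can be checked by hand from the data lines. Summing the four estimates at each distance (5, 10, 15, 20, and 25 feet) gives 42, 52, 56, 59, and 57:

```latex
\bar{x} = \frac{42 + 52 + 56 + 59 + 57}{20} = \frac{266}{20} = 13.3,
\qquad 13.3 - 12 = 1.3 \text{ inches}
```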
The next two PROC MEANS steps use the precision measure (Precision) in the WEIGHT statement and show the effect of different values of the VARDEF= option. The first step also creates an output data set, WTSTATS, that contains the variance and standard deviation. By downweighting the estimates made at greater distances, the weighted analysis produces an average estimate that is closer to the actual size.
proc means data=size maxdec=3 n mean var stddev;
   weight precision;
   var objectsize;
   output out=wtstats var=Est_SigmaSq std=Est_Sigma;
   title1 'Weighted Analysis Using Default VARDEF=DF';
run;
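Because every observation at a given distance has the same weight, $w_i = 1/\text{Distance}_i$, the weighted mean from the step above can be checked by hand using the per-distance sums of the estimates (42, 52, 56, 59, and 57) and the four observations at each distance:

```latex
\begin{aligned}
\sum_i w_i &= 4\left(\tfrac{1}{5} + \tfrac{1}{10} + \tfrac{1}{15} + \tfrac{1}{20} + \tfrac{1}{25}\right) = \tfrac{137}{75} \approx 1.827 \\
\sum_i w_i x_i &= \tfrac{42}{5} + \tfrac{52}{10} + \tfrac{56}{15} + \tfrac{59}{20} + \tfrac{57}{25} = \tfrac{6769}{300} \approx 22.563 \\
\bar{x}_w &= \frac{6769/300}{137/75} = \frac{6769}{548} \approx 12.352
\end{aligned}
```

The weighted mean is within about 0.35 inch of the actual 12-inch width. VARDEF= affects only the variance and standard deviation, so both weighted steps report this same mean.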

proc means data=size maxdec=3 n mean var std
                     vardef=weight;
   weight precision;
   var objectsize;
   title1 'Weighted Analysis Using VARDEF=WEIGHT';
run;
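As a sketch under the model $\operatorname{Var}(x_i) = \sigma^2 / w_i$, the weighted corrected sum of squares, $\mathrm{CSS} = \sum_i w_i (x_i - \bar{x}_w)^2$, has expectation $(n-1)\sigma^2$. Dividing by the two VARDEF= divisors used in these steps gives the following expectations, where $\bar{w} = \sum_i w_i / n$ is the average weight:

```latex
E\!\left[\frac{\mathrm{CSS}}{n-1}\right] = \sigma^2
\quad \text{(VARDEF=DF)},
\qquad
E\!\left[\frac{\mathrm{CSS}}{\sum_i w_i}\right]
  = \frac{(n-1)\,\sigma^2}{n\,\bar{w}}
  = \left(\frac{n-1}{n}\right)\frac{\sigma^2}{\bar{w}}
\quad \text{(VARDEF=WEIGHT)}
```

For the SIZE data, $n = 20$ and $\sum_i w_i = 137/75$, so the VARDEF=WEIGHT variance is exactly $19 / (137/75) = 1425/137 \approx 10.40$ times the VARDEF=DF variance. The large factor reflects the small average weight, $\bar{w} = 137/1500 \approx 0.091$.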
In the first PROC MEANS step, the computed variance is an estimate of $\sigma^2$, where the variance of the ith observation is assumed to be $\sigma^2 / w_i$ and $w_i$ is the weight for the ith observation. In the second PROC MEANS step, the computed variance is an estimate of $\left(\frac{n-1}{n}\right)\frac{\sigma^2}{\bar{w}}$, where $\bar{w}$ is the average weight. For large n, this is an approximate estimate of the variance of an observation with average weight.

The following statements create and print a data set with the weighted variance and weighted standard deviation of each observation. The DATA step combines the output data set that contains the variance and the standard deviation from the weighted analysis with the original data set. The variance of each observation is computed by dividing Est_SigmaSq, the estimate of $\sigma^2$ from the weighted analysis when VARDEF=DF, by each observation's weight (Precision). The standard deviation of each observation is computed by dividing Est_Sigma, the estimate of $\sigma$ from the weighted analysis when VARDEF=DF, by the square root of each observation's weight (Precision).
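data wtsize(drop=_freq_ _type_);
   set size;
      /* Read the one-observation WTSTATS data set once; values read */
      /* with SET are retained for every observation of SIZE.        */
   if _n_=1 then set wtstats;
   Est_VarObs=est_sigmasq/precision;
   Est_StdObs=est_sigma/sqrt(precision);
run;
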
proc print data=wtsize noobs;
   title 'Weighted Statistics';
   by distance;
   format est_varobs est_stdobs
          est_sigmasq est_sigma precision 6.3;
run;
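Because Precision is $1/\text{Distance}$ in this example, the divisions in the DATA step amount to multiplying by the distance. With $s^2$ = Est_SigmaSq and $s$ = Est_Sigma:

```latex
\text{Est\_VarObs}_i = \frac{s^2}{w_i} = s^2 \cdot \text{Distance}_i,
\qquad
\text{Est\_StdObs}_i = \frac{s}{\sqrt{w_i}} = s \sqrt{\text{Distance}_i}
```

So the estimated variance at 25 feet is five times the estimated variance at 5 feet, and the estimated standard deviation is $\sqrt{5} \approx 2.236$ times as large.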

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.