The SURVEYREG Procedure

Example 62.6: Stratum Collapse

In a stratified sample, some strata might contain only one sampling unit. When this happens, PROC SURVEYREG collapses the strata that contain a single sampling unit into a pooled stratum. For more detailed information on stratum collapse, see the section "Stratum Collapse".

Suppose that you have the following data.

   data Sample; 
      input Stratum X Y; 
      datalines;
   10 0 0
   10 1 1
   11 1 1
   11 1 2
   12 3 3
   33 4 4
   14 6 7
   12 3 4
   ;

The variable Stratum is the stratification variable, the variable X is the independent variable, and the variable Y is the dependent variable. You want to regress Y on X. In the data set Sample, Stratum=33 and Stratum=14 each contain only one observation. By default, PROC SURVEYREG collapses these two strata into one pooled stratum in the regression analysis.
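Before running the regression, you can confirm which strata contain a single observation. The following statements are a minimal sketch, not part of the original example; they use PROC FREQ to count the observations in each stratum of the data set Sample.

```sas
/* Count the observations in each stratum of Sample.   */
/* NOCUM and NOPERCENT suppress the cumulative and     */
/* percentage columns so only the frequency is shown.  */
proc freq data=Sample;
   tables Stratum / nocum nopercent;
run;
```

With the data shown above, strata 10, 11, and 12 should each show a frequency of 2, and strata 14 and 33 a frequency of 1.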

To input the finite population correction information, you create the SAS data set StratumTotal.

   data StratumTotal; 
      input Stratum _TOTAL_;
      datalines;
   10 10
   11 20
   12 32
   33 40
   33 45
   14 50
   15  .
   66 70
   ;

The variable Stratum is the stratification variable, and the variable _TOTAL_ contains the stratum totals. The data set StratumTotal contains more strata than the data set Sample. In addition, StratumTotal contains more than one observation with a stratum total for Stratum=33:

   33 40
   33 45

PROC SURVEYREG allows this type of input. The procedure ignores the strata that are not present in the data set Sample; when a stratum has multiple entries, the procedure uses the first observation. In this example, Stratum=33 therefore has the stratum total _TOTAL_=40.
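If you prefer to make the choice of stratum total explicit rather than rely on the first-observation rule, you can remove the duplicate entries yourself. The following statements are a sketch that mimics the behavior described above; they are not the procedure's internal method, and the output data set name StratumTotalFirst is hypothetical.

```sas
/* Keep only the first observation for each value of Stratum. */
/* EQUALS preserves the original order within a BY group, so  */
/* NODUPKEY retains the entry that appears first in the input */
/* (for Stratum=33, the observation with _TOTAL_=40).         */
proc sort data=StratumTotal out=StratumTotalFirst nodupkey equals;
   by Stratum;
run;

proc print data=StratumTotalFirst noobs;
run;
```

The strata that do not appear in Sample (15 and 66) remain in StratumTotalFirst; as noted above, the procedure ignores them.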

The following SAS statements perform the regression analysis.

   title1 'Stratified Sample with Single Sampling Unit in Strata';
   title2 'With Stratum Collapse';
   proc SURVEYREG data=Sample total=StratumTotal;
      strata Stratum/list;
      model Y=X;
   run;

Output 62.6.1: Summary of Data and Regression
 
Stratified Sample with Single Sampling Unit in Strata
With Stratum Collapse

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Data Summary
Number of Observations 8
Mean of Y 2.75000
Sum of Y 22.00000
 
Design Summary
Number of Strata 5
Number of Strata Collapsed 2
 
Fit Statistics
R-square 0.9555
Root MSE 0.5129
Denominator DF 4

Output 62.6.1 shows that the input data set contains a total of 5 strata and that 2 of them are collapsed into a pooled stratum. Because of the collapse, the denominator degrees of freedom is 4 (see the section "Denominator Degrees of Freedom").

Output 62.6.2: Stratification Information
 
Stratified Sample with Single Sampling Unit in Strata
With Stratum Collapse

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Stratum Information

Stratum                                  Population   Sampling
  Index   Collapsed   Stratum   N Obs         Total       Rate
      1                    10       2            10       0.20
      2                    11       2            20       0.10
      3                    12       2            32       0.06
      4   Yes              14       1            50       0.02
      5   Yes              33       1            40       0.03
      0   Pooled                    2            90       0.02

NOTE: Strata with only one observation are collapsed into the stratum with Stratum Index "0".


Output 62.6.2 displays the stratification information, including stratum collapse. Under the column Collapsed, the fourth stratum (Stratum Index=4) and the fifth stratum (Stratum Index=5) are marked as "Yes," which indicates that these two strata are collapsed into the pooled stratum (Stratum Index=0). The sampling rate for the pooled stratum is 2%, which is combined from the fourth stratum and the fifth stratum (see the section "Sampling Rate of the Pooled Stratum from Collapse").
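The pooled sampling rate can be checked by hand from the values in Output 62.6.2. The following DATA step is a minimal sketch that assumes the pooled rate is the combined number of observations divided by the combined population total of the collapsed strata. That assumption is consistent with the reported N Obs of 2 and Population Total of 90; the section cited above gives the procedure's definition. The variable names are illustrative.

```sas
data _null_;
   nPooled     = 1 + 1;    /* N Obs for strata 14 and 33            */
   totalPooled = 50 + 40;  /* Population totals for strata 14 and 33 */
   rate = nPooled / totalPooled;   /* 2 / 90 */
   put rate= 6.4;          /* write the rate to the SAS log */
run;
```

The result, about 0.022, rounds to the 0.02 that appears in the pooled row of the table.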

Output 62.6.3: Parameter Estimates and Effect Tests
 
Stratified Sample with Single Sampling Unit in Strata
With Stratum Collapse

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Tests of Model Effects
Effect Num DF F Value Pr > F
Model 1 155.62 0.0002
Intercept 1 0.24 0.6503
X 1 155.62 0.0002

NOTE: The denominator degrees of freedom for the F tests is 4.

 

Estimated Regression Coefficients
Parameter Estimate Standard Error t Value Pr > |t|
Intercept 0.13004484 0.26578532 0.49 0.6503
X 1.10313901 0.08842825 12.47 0.0002

NOTE: The denominator degrees of freedom for the t tests is 4.


Output 62.6.3 displays the parameter estimates and the tests of the significance of the model effects.

Alternatively, if you prefer not to collapse the strata that contain a single sampling unit, you can specify the NOCOLLAPSE option in the STRATA statement.

   title1 'Stratified Sample with Single Sampling Unit in Strata';
   title2 'Without Stratum Collapse';
   proc SURVEYREG data=Sample total=StratumTotal;
      strata Stratum/list nocollapse;
      model Y=X;
   run;

Output 62.6.4: Summary of Data and Regression
 
Stratified Sample with Single Sampling Unit in Strata
Without Stratum Collapse

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Data Summary
Number of Observations 8
Mean of Y 2.75000
Sum of Y 22.00000
 
Design Summary
Number of Strata 5
 
Fit Statistics
R-square 0.9555
Root MSE 0.5129
Denominator DF 3

Unlike Output 62.6.1, Output 62.6.4 contains no stratum collapse information. The denominator degrees of freedom is 3 instead of the 4 reported in Output 62.6.1.
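The two denominator degrees of freedom can be reproduced by hand. The following DATA step is a minimal sketch that assumes the rule for a design without clusters: the number of observations minus the number of strata actually used, where collapsing replaces the single-unit strata with one pooled stratum. This rule is stated here as an assumption because it reproduces both reported values; see the section "Denominator Degrees of Freedom" for the procedure's definition. The variable names are illustrative.

```sas
data _null_;
   nObs       = 8;   /* Number of Observations           */
   nStrata    = 5;   /* Number of Strata                 */
   nCollapsed = 2;   /* Number of Strata Collapsed       */

   /* With collapse: the 2 single-unit strata become 1 pooled stratum. */
   strataWithCollapse = nStrata - nCollapsed + 1;   /* 4 strata */
   dfWithCollapse     = nObs - strataWithCollapse;  /* 8 - 4    */

   /* With NOCOLLAPSE: all 5 strata are used as they are. */
   dfNoCollapse = nObs - nStrata;                   /* 8 - 5    */

   put dfWithCollapse= dfNoCollapse=;   /* write both values to the log */
run;
```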

Output 62.6.5: Stratification Information
 
Stratified Sample with Single Sampling Unit in Strata
Without Stratum Collapse

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Stratum Information

Stratum                      Population   Sampling
  Index   Stratum   N Obs         Total       Rate
      1        10       2            10       0.20
      2        11       2            20       0.10
      3        12       2            32       0.06
      4        14       1            50       0.02
      5        33       1            40       0.03

In Output 62.6.5, although the fourth stratum and the fifth stratum each contain only one observation, no stratum collapse occurs, in contrast to Output 62.6.2.

Output 62.6.6: Parameter Estimates and Effect Tests
 
Stratified Sample with Single Sampling Unit in Strata
Without Stratum Collapse

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Tests of Model Effects
Effect Num DF F Value Pr > F
Model 1 391.94 0.0003
Intercept 1 0.25 0.6508
X 1 391.94 0.0003

NOTE: The denominator degrees of freedom for the F tests is 3.

 

Estimated Regression Coefficients
Parameter Estimate Standard Error t Value Pr > |t|
Intercept 0.13004484 0.25957741 0.50 0.6508
X 1.10313901 0.05572135 19.80 0.0003

NOTE: The denominator degrees of freedom for the t tests is 3.


Because the strata are not collapsed, the standard error estimates of the parameters differ from those in Output 62.6.3, and the tests of the significance of the model effects differ as well. The parameter estimates themselves are the same in both analyses.
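To see how the change in standard error carries through to the tests, you can recompute the test statistics for X from the printed estimates. The following DATA step is a sketch that relies on two standard relations: the t value is the estimate divided by its standard error, and the F value for a single-degree-of-freedom effect is the square of that t value. The variable names are illustrative, and the inputs are copied from Output 62.6.3 and Output 62.6.6.

```sas
data _null_;
   estimate     = 1.10313901;   /* estimate for X, same in both analyses  */
   seCollapse   = 0.08842825;   /* standard error with stratum collapse   */
   seNoCollapse = 0.05572135;   /* standard error with NOCOLLAPSE         */

   tCollapse   = estimate / seCollapse;
   tNoCollapse = estimate / seNoCollapse;

   fCollapse   = tCollapse**2;     /* F value for the 1-DF effect X */
   fNoCollapse = tNoCollapse**2;

   put tCollapse= 8.2 fCollapse= 8.2;       /* compare with Output 62.6.3 */
   put tNoCollapse= 8.2 fNoCollapse= 8.2;   /* compare with Output 62.6.6 */
run;
```

The values written to the log should agree with the printed t and F values for X, approximately 12.47 and 155.62 with collapse and 19.80 and 391.94 without. The smaller standard error under NOCOLLAPSE yields the larger test statistics, although those tests use 3 rather than 4 denominator degrees of freedom.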


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.