The SURVEYREG Procedure

Stratified Sampling

Suppose that the previous student sample is actually drawn by stratified sampling. The strata are the grades in the junior high school: 7th, 8th, and 9th. Within each stratum, a simple random sample is selected. Table 62.1 provides the number of students in each grade.

Table 62.1: Students in Grades
Grade    Number of Students
7        1,824
8        1,025
9        1,151
Total    4,000

In order to analyze this sample using PROC SURVEYREG, you need to input the stratification information by creating a SAS data set for Table 62.1. The following SAS statements create a data set called StudentTotal.

   data StudentTotal;
      input Grade _TOTAL_; 
      datalines;
   7 1824
   8 1025
   9 1151
   ;

The variable Grade is the stratification variable, and the variable _TOTAL_ contains the total numbers of students in the strata in the survey population. PROC SURVEYREG requires you to use the keyword _TOTAL_ as the name of the variable that contains the population total information.
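
As a quick consistency check, the stratum totals in StudentTotal should add up to the population size of 4,000 given in Table 62.1. The following is a minimal sketch, assuming the StudentTotal data set was created as shown above. The column name PopulationTotal is illustrative only and is not something PROC SURVEYREG uses.

```sas
/* Sum the stratum population totals.            */
/* The three totals are 1824 + 1025 + 1151.      */
proc sql;
   select sum(_TOTAL_) as PopulationTotal
      from StudentTotal;
quit;
```

If the sum does not match the known population size, the totals data set likely contains a data entry error that would distort the sampling rates used in the analysis.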

The following statements demonstrate how you can fit the linear model while incorporating the sample design information (stratification).

   title1 'Ice Cream Spending Analysis';
   title2 'Stratified Simple Random Sampling Design';
   proc surveyreg data=IceCream total=StudentTotal;
      strata Grade / list;
      class Kids;
      model Spending = Income Kids / solution;
   run;

Compared with the statements in the section "Simple Random Sampling", these statements replace the previous TOTAL=4000 option with the TOTAL=StudentTotal option. When the population totals and sample sizes differ among strata, the population totals must be provided by a data set.

The STRATA statement specifies the stratification variable Grade. The LIST option in the STRATA statement requests that the stratification information be included in the output.

 
Ice Cream Spending Analysis
Stratified Simple Random Sampling Design

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending

Data Summary
Number of Observations 40
Mean of Spending 8.75000
Sum of Spending 350.00000
 
Design Summary
Number of Strata 3
 
Fit Statistics
R-square 0.8132
Root MSE 2.4506
Denominator DF 37
Figure 62.4: Summary of the Regression

Figure 62.4 summarizes the data information, the sample design information, and the fit information. Because of the stratification, the denominator degrees of freedom for the F tests and t tests are 37 (the 40 observations minus the 3 strata), which differs from the analysis in Figure 62.1.

 
Ice Cream Spending Analysis
Stratified Simple Random Sampling Design

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending

Stratum Information
Stratum Index   Grade   N Obs   Population Total   Sampling Rate
1               7       20      1824               0.01
2               8       9       1025               0.01
3               9       11      1151               0.01
 
Class Level Information
Class Variable Levels Values
Kids 4 1 2 3 4
Figure 62.5: Stratification and Classification Information

Figure 62.5 identifies the strata and displays, for each stratum, the number of observations (the sample size), the total number of students, and the calculated sampling rate (the sampling fraction). It also lists the levels of the classification variable Kids.
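
The sampling rate in each stratum is the number of observations divided by the population total. The following sketch recomputes the rates from the counts shown in Figure 62.5. The data set name StratumRates and the variable names NObs, PopTotal, and Rate are assumptions made for this illustration; they are not produced by PROC SURVEYREG.

```sas
/* Recompute the sampling rates from the columns of Figure 62.5. */
data StratumRates;
   input Grade NObs PopTotal;
   Rate = NObs / PopTotal;      /* sampling fraction n_h / N_h */
   datalines;
7 20 1824
8  9 1025
9 11 1151
;

/* Show four decimal places so the differences among strata are visible */
proc print data=StratumRates noobs;
   format Rate 8.4;
run;
```

The unrounded rates are approximately 0.0110, 0.0088, and 0.0096, so each displays as 0.01 in the figure even though the rates differ among strata.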

 
Ice Cream Spending Analysis
Stratified Simple Random Sampling Design

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending

ANOVA for Dependent Variable Spending
Source DF Sum of Squares Mean Square F Value Pr > F
Model 4 915.310 228.8274 38.10 <.0001
Error 35 210.190 6.0054    
Corrected Total 39 1125.500      
 
Tests of Model Effects
Effect Num DF F Value Pr > F
Model 4 114.60 <.0001
Intercept 1 150.05 <.0001
Income 1 317.63 <.0001
Kids 3 0.93 0.4355

NOTE: The denominator degrees of freedom for the F tests is 37.

Figure 62.6: Testing Effects

Figure 62.6 displays the ANOVA table for the regression and the tests for the significance of model effects under the stratified sample design. The Income effect is significant (p < 0.0001), whereas the Kids effect is not significant at the 5% level (p = 0.4355).
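
The derived quantities in Figures 62.4 and 62.6 can be checked by hand from the sums of squares. The following DATA step is a rough sketch of that arithmetic, not a description of how the procedure computes its output internally. All variable names are illustrative, and the inputs are the rounded values printed in the figures, so the results should agree with the figures only up to rounding.

```sas
/* Reproduce derived quantities from the sums of squares in Figure 62.6. */
data _null_;
   ModelSS = 915.310;  ModelDF = 4;
   ErrorSS = 210.190;  ErrorDF = 35;

   ModelMS = ModelSS / ModelDF;              /* mean square for the model  */
   ErrorMS = ErrorSS / ErrorDF;              /* mean square error          */
   FValue  = ModelMS / ErrorMS;              /* F value in the ANOVA table */
   RSquare = ModelSS / (ModelSS + ErrorSS);  /* model SS / corrected total */
   RootMSE = sqrt(ErrorMS);                  /* root mean square error     */

   /* p-value for the Kids effect: F = 0.93 with 3 and 37 df.   */
   /* The F value is rounded, so the result is approximate.     */
   KidsP = 1 - probf(0.93, 3, 37);

   put ModelMS= ErrorMS= FValue= RSquare= RootMSE= KidsP=;
run;
```

The values written to the log can be compared with the Mean Square, F Value, R-square, and Root MSE entries in the figures, and with the Pr > F entry for Kids.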

 
Ice Cream Spending Analysis
Stratified Simple Random Sampling Design

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending

Estimated Regression Coefficients
Parameter Estimate Standard Error t Value Pr > |t|
Intercept -26.084677 2.48241893 -10.51 <.0001
Income 0.775330 0.04350401 17.82 <.0001
Kids 1 0.897655 1.11778377 0.80 0.4271
Kids 2 1.494032 1.25209199 1.19 0.2404
Kids 3 -0.513181 1.36853454 -0.37 0.7098
Kids 4 0.000000 0.00000000 . .

NOTE: The denominator degrees of freedom for the t tests is 37.
Matrix X'X is singular and a generalized inverse was used to solve the normal equations. Estimates are not unique.

Figure 62.7: Regression Coefficients

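Figure 62.7 displays the regression coefficient estimates for the stratified sample, together with their standard errors and the associated t tests. The estimate for Kids 4 is zero because Kids is a CLASS variable whose last level serves as the reference level, which is also why the note reports that X'X is singular.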
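
Each t value in Figure 62.7 is the estimate divided by its standard error, and the p-value is a two-sided probability from a t distribution with 37 degrees of freedom. The following sketch recomputes both from the printed estimates. The data set name CoefCheck and its variable names are assumptions for this illustration, and the parameter labels are written without spaces (Kids1 rather than Kids 1) so that list input can read them.

```sas
/* Recompute t values and two-sided p-values from Figure 62.7. */
data CoefCheck;
   length Parameter $ 9;
   input Parameter $ Estimate StdErr;
   tValue = Estimate / StdErr;
   ProbT  = 2 * (1 - probt(abs(tValue), 37));   /* 37 denominator df */
   datalines;
Intercept -26.084677 2.48241893
Income 0.775330 0.04350401
Kids1 0.897655 1.11778377
Kids2 1.494032 1.25209199
Kids3 -0.513181 1.36853454
;

proc print data=CoefCheck noobs;
   format tValue 8.2 ProbT 8.4;
run;
```

The reference level Kids 4 is omitted because its standard error is zero and no t test is reported for it. The printed values should match the figure up to rounding.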

You can request other statistics and tests using PROC SURVEYREG. You can also analyze data from a more complex sample design. The remainder of this chapter provides more detailed information.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.