The SURVEYMEANS Procedure

Stratified Sampling

Suppose that the sample of students described in the previous section was actually selected using stratified random sampling. In stratified sampling, the study population is divided into nonoverlapping strata, and samples are selected independently from each stratum.

The list of students in this junior high school was stratified by grade, yielding three strata: grades 7, 8, and 9. A simple random sample of students was selected from each grade. Table 61.1 shows the total number of students in each grade.

Table 61.1: Number of Students by Grade
Grade    Number of Students
7        1,824
8        1,025
9        1,151
Total    4,000

To analyze this stratified sample from a finite population, you need to provide the population totals for each stratum to PROC SURVEYMEANS. The SAS data set named StudentTotal contains the information from Table 61.1.

   data StudentTotal;
      input Grade _total_;   /* stratum identifier, stratum population total */
      datalines;
   7 1824
   8 1025
   9 1151
   ;

The variable Grade is the stratum identification variable, and the variable _TOTAL_ contains the total number of students for each stratum. PROC SURVEYMEANS requires you to store the stratum population totals in a variable named _TOTAL_.

The procedure uses the stratum population totals to adjust variance estimates for the effects of sampling from a finite population. If you do not provide population totals or sampling rates, then the procedure assumes that the proportion of the population in the sample is very small, and it does not include a finite population correction in the computations.
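As a sketch of what this correction looks like for a single stratum, the sampling rate and the finite population correction factor can be written as follows. The symbols n_h, N_h, and f_h are notation introduced here for illustration: n_h is the number of sampled students in stratum h, and N_h is the stratum population total from Table 61.1.

```latex
f_h = \frac{n_h}{N_h}, \qquad \mathrm{fpc}_h = 1 - f_h
```

The variance contribution of stratum h is multiplied by this factor. When neither population totals nor sampling rates are provided, f_h is treated as 0, so the factor is 1 and no correction is applied.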

The following SAS statements perform the analysis of the survey data.

   title1 'Analysis of Ice Cream Spending';
   title2 'Stratified Simple Random Sampling Design';
   proc surveymeans data=IceCream total=StudentTotal;
      strata Grade / list;
      var Spending Group;
   run;

The PROC SURVEYMEANS statement invokes the procedure. The DATA= option names the SAS data set IceCream as the input data set to be analyzed. The TOTAL= option names the data set StudentTotal as the input data set containing the stratum population totals. Unlike the analysis in the "Simple Random Sampling" section, which specified TOTAL=4000, this analysis specifies TOTAL=StudentTotal. Because the population totals differ from stratum to stratum in this stratified design, they are provided to PROC SURVEYMEANS in a SAS data set rather than as a single value.

The STRATA statement identifies the stratification variable Grade. The LIST option in the STRATA statement requests that the procedure display stratum information.

Analysis of Ice Cream Spending
Stratified Simple Random Sampling Design

The SURVEYMEANS Procedure

Data Summary
Number of Strata          3
Number of Observations   40

Class Level Information
Class Variable   Levels   Values
Group                 2   less more

Figure 61.2: Data Summary

Figure 61.2 provides information on the input data set. There are three strata in the design, and 40 observations in the sample. The categorical variable Group has two levels, 'less' and 'more'.

Stratum Information

Stratum           Population   Sampling
  Index   Grade        Total       Rate   N Obs   Variable        N
      1       7         1824       0.01      20   Spending       20
                                                  Group = less   17
                                                  Group = more    3
      2       8         1025       0.01       9   Spending        9
                                                  Group = less    0
                                                  Group = more    9
      3       9         1151       0.01      11   Spending       11
                                                  Group = less    6
                                                  Group = more    5

Figure 61.3: Stratum Information

Figure 61.3 displays information for each stratum. The table displays a Stratum Index and the values of the STRATA variable. The Stratum Index identifies each stratum by a sequentially assigned number. For each stratum, the table gives the population total (total number of students), the sampling rate, and the sample size. The stratum sampling rate is the ratio of the number of students in the sample to the number of students in the population for that stratum. The table also lists each analysis variable and the number of observations in the stratum for that variable. For categorical variables, the table lists each level and the number of sample observations in that level.
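Because the sampling rate is just the stratum sample size divided by the stratum population total, the same design information could also be supplied as rates rather than totals. The following is a minimal sketch under stated assumptions, not part of the original example: StudentRate, SampleSize, and PopTotal are hypothetical names chosen here, the sample sizes are the N Obs values from Figure 61.3, and the RATE= option is assumed to read stratum rates from a variable named _RATE_.

```sas
/* Sketch: build stratum sampling rates from the sample sizes in      */
/* Figure 61.3 and the population totals in Table 61.1.               */
/* StudentRate, SampleSize, and PopTotal are illustrative names.      */
data StudentRate;
   input Grade SampleSize PopTotal;
   _rate_ = SampleSize / PopTotal;   /* unrounded sampling rate */
   keep Grade _rate_;
   datalines;
7 20 1824
8  9 1025
9 11 1151
;

/* Supply the rates with RATE= in place of TOTAL=StudentTotal. */
proc surveymeans data=IceCream rate=StudentRate;
   strata Grade / list;
   var Spending Group;
run;
```

Since the rates are computed from the same totals, this run would be expected to apply the same finite population correction as the TOTAL=StudentTotal analysis. The Sampling Rate column in Figure 61.3 shows these rates rounded to two decimal places, which is why all three strata display 0.01.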

Statistics

                              Std Error     Lower 95%     Upper 95%
Variable        N      Mean     of Mean   CL for Mean   CL for Mean
Spending       40  8.750000    0.530531      7.675043      9.824957
Group = less   23  0.575000    0.059299      0.454850      0.695150
Group = more   17  0.425000    0.059299      0.304850      0.545150

Figure 61.4: Ice Cream Spending Analysis, Stratified SRS Design

Figure 61.4 shows that the estimate of average weekly ice cream expense is $8.75 for students in this school, with a standard error of $0.53, and a 95% confidence interval from $7.68 to $9.82. The mean estimate of $8.75 is the same as the one shown in Figure 61.1, which was computed under the assumption of a simple random sampling design without stratification. However, the standard error computed for the stratified design is $0.53, which is less than the standard error of $0.84 shown in Figure 61.1.
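The confidence limits in Figure 61.4 can be checked by hand from the printed mean and standard error. The following DATA step is a sketch that assumes the degrees of freedom equal the number of observations minus the number of strata (40 - 3 = 37), the usual rule for a stratified design without clustering. This section does not state the degrees of freedom, so treat that value as an assumption; the variable names are illustrative.

```sas
/* Sketch: rebuild the 95% confidence limits for mean Spending from   */
/* the values printed in Figure 61.4.                                 */
data _null_;
   n_obs    = 40;                  /* Number of Observations, Figure 61.2 */
   n_strata = 3;                   /* Number of Strata, Figure 61.2       */
   df       = n_obs - n_strata;    /* assumed degrees of freedom          */
   est      = 8.750000;            /* Mean, Figure 61.4                   */
   se       = 0.530531;            /* Std Error of Mean, Figure 61.4      */
   t        = tinv(0.975, df);     /* two-sided 95% t multiplier          */
   lower    = est - t * se;
   upper    = est + t * se;
   put df= t= lower= upper=;       /* results are written to the SAS log  */
run;
```

The limits written to the log should agree with the Lower and Upper 95% CL values in Figure 61.4 up to rounding of the printed standard error. Substituting the estimate and standard error for a Group level checks the limits for the proportions in the same way.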

Figure 61.4 also shows that an estimated 57.5% of all students spend less than $10 weekly on ice cream, and 42.5% spend more. Both proportions have a standard error of 0.059, or about 5.9 percentage points.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.