The SURVEYREG Procedure

Example 62.7: Domain Analysis

Recall that in the section "Getting Started," you collected a stratified simple random sample from a junior high school to examine how household income and the number of children in a household affect students' average weekly spending for ice cream. You can use the same sample to estimate the average weekly spending separately for male students and for female students. This kind of analysis is often called domain analysis (or subgroup analysis). The following example shows how to perform a domain analysis with PROC SURVEYREG.

   data IceCreamData;
      input Grade Spending Income Gender$ @@;
      if Gender='M' then Male=1; else Male=0;
      if Gender='F' then Female=1; else Female=0;
      datalines; 
    7   7  39  M   7   7  38  F   8  12  47  F 
    9  10  47  M   7   1  34  M   7  10  43  M
    7   3  44  M   8  20  60  F   8  19  57  M
    7   2  35  M   7   2  36  F   9  15  51  F
    8  16  53  F   7   6  37  F   7   6  41  M
    7   6  39  M   9  15  50  M   8  17  57  F
    8  14  46  M   9   8  41  M   9   8  41  F
    9   7  47  F   7   3  39  F   7  12  50  M
    7   4  43  M   9  14  46  F   8  18  58  M
    9   9  44  F   7   2  37  F   7   1  37  M
    7   4  44  M   7  11  42  M   9   8  41  M 
    8  10  42  M   8  13  46  F   7   2  40  F
    9   6  45  F   9  11  45  M   7   2  36  F
    7   9  46  F
    ;

In the data set IceCreamData, the variable Grade indicates a student's grade, which is the stratification variable. The variable Spending contains the dollar amount of each student's average weekly spending for ice cream. The variable Income specifies the household income, in thousands of dollars. The variable Gender indicates a student's gender. Male and Female are two indicator variables that identify the subgroups of male and female students, respectively.

   data StudentTotal;
      input Grade _TOTAL_; 
      datalines;
   7 1824
   8 1025
   9 1151
   ;

In the data set StudentTotal, the variable Grade is the stratification variable, and the variable _TOTAL_ contains the total number of students in each stratum of the survey population.
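
Before fitting the models, it can help to see how the sampled students are distributed across strata and domains. The following step is a minimal sketch that is not part of the original example; it uses only the variables defined above.

   /* Cross-tabulate stratum (Grade) by domain (Gender) */
   proc freq data=IceCreamData;
      tables Grade*Gender / norow nocol nopercent;
   run;

The table shows the number of male and female students sampled in each grade. In this sample there are 21 male and 19 female students in total.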

The following statements demonstrate how you can estimate the average spending in the subgroup of male students.

   title1 'Ice Cream Spending Analysis';
   title2 'Domain Analysis for Subgroup: Male Students';
   proc surveyreg data=IceCreamData total=StudentTotal;
      strata Grade; 
      model Spending = Male / noint; 
      ods select ParameterEstimates;
   run;

Output 62.7.1: Domain Analysis for Male Students
 
Ice Cream Spending Analysis
Domain Analysis for Subgroup: Male Students

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending

             Estimated Regression Coefficients

                             Standard
Parameter       Estimate        Error    t Value    Pr > |t|
Male          8.57142857   0.97971846       8.75      <.0001

NOTE: The denominator degrees of freedom for the t tests is 37.


Output 62.7.1 shows that the average spending for the subgroup of male students is $8.57, with a standard error of $0.98.
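
The point estimate can be checked by hand. With a single indicator regressor and no intercept, the least squares coefficient is sum(Male*Spending) / sum(Male*Male). Because Male is either 0 or 1 and this example specifies no WEIGHT statement, the ratio reduces to the total spending of the male students divided by the number of male students, 180 / 21 = 8.5714. The following PROC SQL step is a sketch of that check and is not part of the original example; the column name MaleMean is chosen here for illustration.

   proc sql;
      /* no-intercept least squares coefficient for the indicator Male */
      select sum(Male*Spending) / sum(Male*Male) as MaleMean
         from IceCreamData;
   quit;

This check reproduces only the estimate. The standard error in Output 62.7.1 depends on the stratified design and is not recovered by this calculation.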

Similarly, you can obtain a domain analysis for the subgroup of female students with the following statements.

   title1 'Ice Cream Spending Analysis';
   title2 'Domain Analysis for Subgroup: Female Students';
   proc surveyreg data=IceCreamData total=StudentTotal;
      strata Grade /list; 
      model Spending = Female / noint; 
   run;

Output 62.7.2: Domain Analysis for Female Students
 
Ice Cream Spending Analysis
Domain Analysis for Subgroup: Female Students

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending

             Estimated Regression Coefficients

                             Standard
Parameter       Estimate        Error    t Value    Pr > |t|
Female        8.94736842   1.06370643       8.41      <.0001

NOTE: The denominator degrees of freedom for the t tests is 37.


Output 62.7.2 shows that the average spending for the subgroup of female students is $8.95, with a standard error of $1.06.
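
As an alternative sketch, later SAS releases provide a DOMAIN statement in PROC SURVEYMEANS that requests both subgroup means in a single step. This assumes a release in which the DOMAIN statement is available; it is not part of the original example, and its output is not shown or compared here.

   title1 'Ice Cream Spending Analysis';
   title2 'Domain Means by Gender (DOMAIN statement sketch)';
   proc surveymeans data=IceCreamData total=StudentTotal mean stderr;
      strata Grade;
      var Spending;
      domain Gender;   /* one set of estimates per Gender level */
   run;

With this approach the indicator variables Male and Female are not needed, because the procedure forms the domains from the levels of Gender.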

Note that you would not obtain the same results by using a subset of your sample, for example, by restricting the analysis to male students using a WHERE clause or a BY statement. This is because the domain sample size is not fixed in the original sample design, but is actually a random variable. The variance estimation for the domain mean must include this variability of the sample size. Refer to Cochran (1977) for more details.
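
To see the difference described above, you can run the subset analysis for comparison. The following statements are a sketch only, and the resulting output is not shown. The WHERE statement removes the female students before the analysis, so the indicator Male equals 1 for every remaining observation.

   title1 'Ice Cream Spending Analysis';
   title2 'Subset Analysis (for comparison only): Male Students';
   proc surveyreg data=IceCreamData total=StudentTotal;
      where Gender='M';   /* female students are dropped, not treated as a domain */
      strata Grade;
      model Spending = Male / noint;
      ods select ParameterEstimates;
   run;

In this unweighted example the point estimate should coincide with the domain mean. The standard error and the degrees of freedom, however, are computed as if the male students alone made up the stratified sample, so they should not be expected to match Output 62.7.1.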


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.