The SURVEYSELECT Procedure
The data set TravelExpense contains the dollar amount of all employee travel expense transactions during the past month.
data TravelExpense;
   input ID$ Amount @@;
   if (Amount < 500) then Level='1_Low ';
   else if (Amount > 1500) then Level='3_High';
   else Level='2_Avg ';
   datalines;
110  237.18   002  567.89   234  118.50   743   74.38   411 1287.23
782  258.10   216  325.36   174  218.38   568 1670.80   302  134.71
285 2020.70   314   47.80   139 1183.45   775  330.54   425  780.10
506  895.80   239  620.10   011  420.18   672  979.66   142  810.25
738  670.85   192  314.58   243   87.50   263 1893.40   496  753.30
332  540.65   486 2580.35   614  230.56   654  185.60   308  688.43
784  505.14   017  205.48   162  650.42   289 1348.34   691   30.50
545 2214.80   517  940.35   382  217.85   024  142.90   478  806.90
107  560.72
;
In the SAS data set TravelExpense, the variable ID identifies the travel expense report. The variable Amount contains the dollar amount of the reported expense. The variable Level equals '1_Low', '2_Avg', or '3_High', depending on the value of Amount.
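As a quick check on how the Level assignment divides the sampling frame, a frequency table can be requested. This is a minimal sketch rather than part of the original example; it uses only the TravelExpense data set defined above.

```sas
/* Count the expense reports that fall into each value of Level. */
proc freq data=TravelExpense;
   tables Level / nocum;
run;
```

With the data lines shown above, the counts should be 18 reports for '1_Low', 18 for '2_Avg', and 5 for '3_High'.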
In the sample design for this audit, expense reports are stratified by Level. This ensures that each of these expense levels is included in the sample and also permits a disproportionate allocation of the sample, selecting proportionately more of the expense reports from the higher levels. Within strata, the sample of expense reports is selected with probability proportional to the amount of the expense, thus giving a greater chance of selection to larger expenses. In auditing terms, this is known as monetary-unit sampling. Refer to Wilburn (1984).
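To make the idea of selection proportional to Amount concrete, the sketch below computes each report's relative size within its stratum and multiplies it by a stratum sample size. It assumes that, under PPS selection without replacement, a unit's selection probability is its stratum sample size times its relative size, so that product should not exceed 1. The data set name SizeCheck, the variables StratumTotal, RelativeSize, StratumN, and ExpectedProb, and the stratum sample sizes 6, 10, and 4 are illustrative choices for this sketch.

```sas
/* Relative size of each report within its stratum.                   */
/* PROC SQL remerges the stratum total back onto the detail rows and  */
/* writes a NOTE about the remerge to the log.                        */
proc sql;
   create table SizeCheck as
   select Level, ID, Amount,
          sum(Amount)          as StratumTotal,
          Amount / sum(Amount) as RelativeSize
   from TravelExpense
   group by Level
   order by Level, RelativeSize desc;
quit;

/* Attach an illustrative stratum sample size and compute the         */
/* selection probability this design would imply for each report.     */
data SizeCheck;
   set SizeCheck;
   if Level = '1_Low' then StratumN = 6;
   else if Level = '2_Avg' then StratumN = 10;
   else StratumN = 4;
   ExpectedProb = StratumN * RelativeSize;
run;

proc print data=SizeCheck;
   var Level ID Amount StratumTotal RelativeSize ExpectedProb;
run;
```

Any report whose ExpectedProb exceeded 1 would be too large, relative to its stratum, for the requested stratum sample size.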
PROC SURVEYSELECT requires that the input data set be sorted by the STRATA variables. The following PROC SORT statements sort the TravelExpense data set by the stratification variable Level.
proc sort data=TravelExpense;
   by Level;
run;
The following PROC PRINT statements display the sampling frame data set TravelExpense, which contains 41 observations.
title1 'Travel Expense Audit';
proc print data=TravelExpense;
run;

Output 63.3.1: Sampling Frame
title1 'Travel Expense Audit';
proc surveyselect data=TravelExpense method=pps
                  n=(6 10 4) seed=47279 out=AuditSample;
   size Amount;
   strata Level;
run;
The STRATA statement names the stratification variable Level. The SIZE statement specifies the size measure variable Amount. In the PROC SURVEYSELECT statement, the METHOD=PPS option requests sample selection with probability proportional to size and without replacement. The N=(6 10 4) option specifies the stratum sample sizes, listing them in the same order that the strata appear in the sorted TravelExpense data set. The sample size of 6 corresponds to the first stratum, Level = '1_Low'; the sample size of 10 corresponds to the second stratum, Level = '2_Avg'; and the sample size of 4 corresponds to the last stratum, Level = '3_High'. The SEED=47279 option specifies 47279 as the initial seed for random number generation.
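Because a parenthesized list depends on the order of the strata, it can be easier to keep the stratum sample sizes in a data set. The following is a sketch only. It assumes that the N= (SAMPSIZE=) option also accepts a SAS data set that contains the STRATA variables and a variable named _NSIZE_, with the strata in the same order as in the input data set. The names StratumSizes and AuditSample2 are hypothetical.

```sas
/* Hypothetical data set with one sample size per stratum. */
data StratumSizes;
   length Level $ 6;      /* match the length of Level in TravelExpense */
   input Level $ _NSIZE_;
   datalines;
1_Low   6
2_Avg  10
3_High  4
;

/* Same design as above, with sample sizes read from StratumSizes. */
proc surveyselect data=TravelExpense method=pps
                  n=StratumSizes seed=47279 out=AuditSample2;
   size Amount;
   strata Level;
run;
```

Keeping the sizes beside the stratum labels makes the allocation explicit and guards against listing the values in the wrong order.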
Output 63.3.2 displays the output from PROC SURVEYSELECT. A total of 20 expense reports are selected for audit. The data set AuditSample contains the sample of travel expense reports.
Output 63.3.2: Sample Selection Summary
title1 'Travel Expense Audit Sample';
title2 'Sample Selected by Stratified PPS Design';
proc print data=AuditSample;
run;

Output 63.3.3: Audit Sample
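One way to sanity-check the selected sample is to count the selected reports in each stratum and form a weighted total of Amount. The sketch below assumes that AuditSample includes a sampling weight variable named SamplingWeight, as PROC SURVEYSELECT typically provides for PPS selection. If that variable is not present in your output data set, the DATA step needs to be adapted. The names AuditCheck and WeightedAmount are illustrative.

```sas
/* Weight each sampled amount by its sampling weight (assumed variable). */
data AuditCheck;
   set AuditSample;
   WeightedAmount = Amount * SamplingWeight;
run;

/* N shows the number of reports selected per stratum;                  */
/* SUM of WeightedAmount estimates the total expense in each stratum.   */
proc means data=AuditCheck n sum;
   class Level;
   var WeightedAmount;
run;
```

The N column should match the requested stratum sample sizes of 6, 10, and 4.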
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.