Example 22.11: Predicted Probabilities
Suppose you have collected marketing research data to
examine the relationship between a prospect's likelihood
of buying your product and their education and income.
Specifically, the variables are as follows.
| Variable | Levels | Interpretation |
| Education | high, low | prospect's education level |
| Income | high, low | prospect's income level |
| Purchase | yes, no | Did prospect purchase product? |
The following statements first create a data set,
loan, that contains the marketing research data, then they use
the CATMOD procedure to fit a model, obtain the parameter
estimates, and obtain the predicted probabilities of
interest. These statements produce Output 22.11.1 through
Output 22.11.5.
data loan;
   input Education $ Income $ Purchase $ wt;
   datalines;
high high yes 54
high high no 23
high low yes 41
high low no 12
low high yes 35
low high no 42
low low yes 19
low low no 8
;

ods output PredictedValues=Predicted
           (keep=Education Income PredFunction);
proc catmod data=loan order=data;
   weight wt;
   response marginals;
   model Purchase=Education Income / pred;
run;

proc sort data=Predicted;
   by descending PredFunction;
run;

proc print data=Predicted;
run;
Notice that the preceding statements use the Output Delivery
System (ODS) to output the predicted values instead of
the OUT= option, though either can be used.
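For comparison, the OUT= route mentioned above might look like the following sketch. It assumes that the OUT= option is specified in the RESPONSE statement and that the resulting data set holds the predicted response function in a variable named _PRED_ alongside the population variables Education and Income. The data set name PredOut is hypothetical and does not appear in the original example.

```sas
/* Sketch only: assumes OUT= in the RESPONSE statement writes the   */
/* predicted response functions to the variable _PRED_.             */
/* PredOut is a made-up data set name used for illustration.        */
proc catmod data=loan order=data;
   weight wt;
   response marginals / out=PredOut;
   model Purchase=Education Income;
run;

proc sort data=PredOut;
   by descending _pred_;
run;

proc print data=PredOut;
   var Education Income _pred_;
run;
```

If these assumptions hold, the printed values of _PRED_ should correspond to the PredFunction values that the ODS OUTPUT statement captures.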
Output 22.11.1: Marketing Research Data: Obtaining Predicted Probabilities
| Response | Purchase | Response Levels | 2 |
| Weight Variable | wt | Populations | 4 |
| Data Set | LOAN | Total Frequency | 234 |
| Frequency Missing | 0 | Observations | 8 |
Output 22.11.2: Profiles and Design Matrix
Population Profiles
| Sample | Education | Income | Sample Size |
| 1 | high | high | 77 |
| 2 | high | low | 53 |
| 3 | low | high | 77 |
| 4 | low | low | 27 |

Response Profiles
| Response | Purchase |
| 1 | yes |
| 2 | no |

| Sample | Response Function | Design Matrix 1 | Design Matrix 2 | Design Matrix 3 |
| 1 | 0.70130 | 1 | 1 | 1 |
| 2 | 0.77358 | 1 | 1 | -1 |
| 3 | 0.45455 | 1 | -1 | 1 |
| 4 | 0.70370 | 1 | -1 | -1 |
Output 22.11.3: ANOVA Table and Parameter Estimates
Analysis of Variance
| Source | DF | Chi-Square | Pr > ChiSq |
| Intercept | 1 | 418.36 | <.0001 |
| Education | 1 | 8.85 | 0.0029 |
| Income | 1 | 4.70 | 0.0302 |
| Residual | 1 | 1.84 | 0.1745 |

Analysis of Weighted Least Squares Estimates
| Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
| Intercept | 1 | 0.6481 | 0.0317 | 418.36 | <.0001 |
| Education | 2 | 0.0924 | 0.0311 | 8.85 | 0.0029 |
| Income | 3 | -0.0675 | 0.0312 | 4.70 | 0.0302 |
Output 22.11.4: Predicted Values and Residuals
Predicted Values for Response Functions
| Sample | Education | Income | Function Number | Observed Function | Observed Standard Error | Predicted Function | Predicted Standard Error | Residual |
| 1 | high | high | 1 | 0.7012987 | 0.052158 | 0.67293982 | 0.047794 | 0.02835888 |
| 2 | high | low | 1 | 0.77358491 | 0.057487 | 0.80803395 | 0.051586 | -0.034449 |
| 3 | low | high | 1 | 0.45454545 | 0.056744 | 0.48811031 | 0.051077 | -0.0335649 |
| 4 | low | low | 1 | 0.7037037 | 0.087877 | 0.62320444 | 0.064867 | 0.08049927 |
Output 22.11.5: Predicted Probabilities Data Set
| Obs | Education | Income | PredFunction |
| 1 | high | low | 0.80803395 |
| 2 | high | high | 0.67293982 |
| 3 | low | low | 0.62320444 |
| 4 | low | high | 0.48811031 |
You can use the predicted values (values of PredFunction in
Output 22.11.5) as scores representing the
likelihood that a randomly chosen subject from one of these
populations will purchase the product. Notice that the
Response Profiles in Output 22.11.2 show you that the first
sorted level of Purchase is "yes," indicating
that the predicted probabilities are for
Pr(Purchase='yes'). For example, someone with high
education and low income has an estimated probability of
purchase of 0.808. As with any response function estimate
given by PROC CATMOD, this estimate can be obtained by
multiplying the row of the design matrix that corresponds
to the sample (sample number 2 in this case) by the vector
of parameter estimates:
(1*0.6481)+(1*0.0924)+(-1*(-0.0675)) = 0.8080.
This ranking of scores can help in decision making (for
example, with respect to allocation of advertising dollars,
choice of advertising media, choice of print media, and so
on).
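The hand calculation described above can be repeated for all four samples with a short DATA step. The following is a minimal sketch, not part of the original example: the data set name Check and the variables b0, b_educ, b_inc, x_educ, x_inc, and PredByHand are made up for illustration. The parameter estimates are copied from Output 22.11.3 and the design matrix rows from Output 22.11.2.

```sas
/* Illustrative check only; all names below are hypothetical. */
data Check;
   /* Parameter estimates from Output 22.11.3, rounded to four decimals */
   retain b0 0.6481 b_educ 0.0924 b_inc -0.0675;
   input Sample Education $ Income $ x_educ x_inc;
   /* Design matrix row (1, x_educ, x_inc) times the parameter vector */
   PredByHand = b0 + x_educ*b_educ + x_inc*b_inc;
   datalines;
1 high high  1  1
2 high low   1 -1
3 low  high -1  1
4 low  low  -1 -1
;

proc sort data=Check;
   by descending PredByHand;
run;

proc print data=Check noobs;
   var Sample Education Income PredByHand;
   format PredByHand 6.4;
run;
```

Because the estimates are entered to only four decimal places, the computed values (0.8080, 0.6730, 0.6232, and 0.4882) agree with the PredFunction values in Output 22.11.5 to about three decimal places, and the ranking of the four populations is the same.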
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.