
STAT 350: Lecture 30

Power and Sample Size Calculations: Examples

SAMPLE SIZE NEEDED using t test: SAND and FIBRE example.

Now, for the same assumed values of the parameters, how many replicates of the basic design (using the 9 combinations of sand and fibre contents) would I need to get a power of 0.95? The matrix $X^TX$ for m replicates of the design actually used is m times the same matrix for 1 replicate. This means that $(X^TX)^{-1}$ will be 1/m times the same quantity for 1 replicate, so the standard error of the estimate is divided by $\sqrt{m}$. Thus the value of $\delta$ for m replicates will be $\sqrt{m}$ times the value for our design, which was 2. With m replicates the degrees of freedom for the t-test will be 18m-4. We now need to find a value of m so that, in the row of Table B 5 for 18m-4 degrees of freedom and the column corresponding to

$$\delta = 2\sqrt{m}$$

we find 0.95. To simplify, we assume that the solution m is quite large and use the last line of the table (the $\infty$ line). We get $\delta$ between 3 and 4, say about 3.75. Now set $2\sqrt{m} = 3.75$ and solve to find $m = (3.75/2)^2 \approx 3.52$, which would have to be rounded up to 4, meaning a total sample size of $4 \times 18 = 72$. For this value of m the non-centrality parameter is actually $2\sqrt{4} = 4$ (not the target of 3.75, because of rounding) and the power is 0.98. Notice that for this value of m the degrees of freedom for error are $72 - 4 = 68$, which is so far down the table that the powers are not much different from the $\infty$ line.

Technically, it would be easy to use 3.5 replicates: each combination of SAND and FIBRE would be tried 7 times, giving 63-4=59 degrees of freedom for error and $\delta = 2\sqrt{3.5} \approx 3.74$. The achieved power would then be quite close to 0.95.
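The table look-up can be mimicked numerically. The sketch below is a minimal illustration, assuming the normal limit of the t distribution, which is what the $\infty$ line of Table B 5 represents; the helper names `t_power_large_df` and `replicates_needed` are hypothetical and only the Python standard library is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def t_power_large_df(delta, alpha=0.05):
    """Power of a two-sided level-alpha t test with non-centrality delta
    when the error df are effectively infinite (normal limit)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta - z) + STD_NORMAL.cdf(-delta - z)


def replicates_needed(delta_one_rep, target=0.95, alpha=0.05):
    """Smallest whole m for which delta_one_rep * sqrt(m) reaches the target power."""
    m = 1
    while t_power_large_df(delta_one_rep * m ** 0.5, alpha) < target:
        m += 1
    return m


m = replicates_needed(2.0)  # delta = 2 for one copy of the 18-point design
print(m, round(t_power_large_df(2.0 * m ** 0.5), 3))
print(round(t_power_large_df(2.0 * 3.5 ** 0.5), 3))  # 3.5 replicates (7 runs per combination)
```

Because the normal approximation solves for $\delta$ exactly instead of interpolating between table columns, the unrounded m it implies is a little below 3.52, but it still rounds up to 4 replicates, and 3.5 replicates come out slightly above 0.95.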

POWER of F test: SAND and FIBRE example.

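Now consider the power of the test that all the higher order terms are 0 in the model

$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 F_i + \beta_3 S_i^2 + \beta_4 F_i^2 + \beta_5 S_i F_i + \epsilon_i$$

where $S_i$ and $F_i$ are the sand and fibre contents; that is, the power of the F test of $H_0: \beta_3 = \beta_4 = \beta_5 = 0$.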

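You will need to specify the non-centrality parameter for this F test. In general, the non-centrality parameter for an F test based on $\nu_1$ numerator degrees of freedom is given by

$$\lambda = \frac{\mu^T (I - H_0)\,\mu}{\sigma^2}$$

where $\mu = E(Y)$ is the mean vector under the full model and $H_0$ is the hat matrix of the restricted model, so that the numerator is the error sum of squares from regressing $\mu$ on the columns of the restricted model.

This quantity needs to be worked out algebraically for each separate case; however, some general points can be made.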

Now consider the sand and fibre example and assume that the coefficients of $\text{fibre}^2$, $\text{sand}^2$ and $\text{sand}\times\text{fibre}$ are $-0.004$, $-0.005$ and $0.001$ respectively. The following SAS code computes the required numerator.

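  data plaster;
    infile 'plaster.dat';
    input sand fibre hardness strength;
    /* assumed higher-order part of the mean: fibre^2, sand^2 and sand*fibre terms */
    newx = -0.004*fibre*fibre - 0.005*sand*sand
           + 0.001*sand*fibre;
  run;
  /* error SS of newx on intercept, sand, fibre = numerator of the non-centrality */
  proc reg data=plaster;
    model newx = sand fibre;
  run;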
The output shows that the error sum of squares from regressing newx on sand, fibre and an intercept is 31.1875. Taking $\sigma^2$ to be 7 we get a noncentrality parameter of roughly $31.1875/7 \approx 4.46$. Now compute the quantity $\phi = \sqrt{\lambda/(\nu_1+1)} = \sqrt{4.46/4} \approx 1.06$ needed for Table B 11. For 3 numerator and 18-6=12 denominator degrees of freedom we get a power between 0.27 and 0.56, but close to 0.27.
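The same recipe as the SAS step (regress the assumed higher-order part of the mean on the terms kept under the null hypothesis, then divide the error sum of squares by $\sigma^2$) can be sketched in plain Python. The linear part of the mean drops out because $(I - H_0)$ annihilates it, which is why only the higher-order terms enter `newx`. The plaster data are not reproduced here, so the usage below runs on a hypothetical 3 by 3 grid with sand and fibre levels 0, 1, 2; its result is not the 31.1875 of the lecture, and all function names are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_ss(y, x_rows):
    """Error sum of squares from the least-squares regression of y on the columns of x_rows."""
    p = len(x_rows[0])
    xtx = [[sum(r[j] * r[k] for r in x_rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(x_rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x_rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def noncentrality(design, b_sand2, b_fibre2, b_sandfibre, sigma2):
    """lambda = ||(I - H_0) mu||^2 / sigma^2 for H_0: the three higher-order coefficients are 0.

    design is a list of (sand, fibre) pairs; the b_* arguments are the assumed
    coefficients of sand^2, fibre^2 and sand*fibre (the analogue of newx).
    """
    newx = [b_sand2 * s * s + b_fibre2 * f * f + b_sandfibre * s * f for s, f in design]
    x0 = [[1.0, s, f] for s, f in design]  # intercept, sand, fibre: terms kept under H_0
    return residual_ss(newx, x0) / sigma2


# Hypothetical 3 x 3 grid (NOT the plaster design), one observation per cell.
toy = [(s, f) for s in (0, 1, 2) for f in (0, 1, 2)]
print(noncentrality(toy, 1.0, 0.0, 0.0, sigma2=1.0))
```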

SAMPLE SIZE for F test: SAND and FIBRE example.

Now, for the same basic problem and parameter values, how many times would we need to replicate the design to get a power of 0.95? Again the non-centrality parameter for m replicates is m times that for 1 replicate; in terms of the parameter $\phi$ used in the tables, the value is proportional to $\sqrt{m}$. With m replicates we now have 18m-6 denominator degrees of freedom. Again, if 18m-6 is reasonably large, we can use the $\infty$ line and see that $\phi_1\sqrt{m}$, where $\phi_1 \approx 1.06$ is the value for one replicate, must be around 2.2, making m roughly 4 ($m \approx (2.2/1.06)^2 \approx 4.3$).
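A rough numerical check of this argument is possible on the $\infty$ line, where $\nu_1$ times the F statistic becomes a non-central chi-square with $\nu_1$ degrees of freedom and non-centrality $\lambda$. The sketch below uses only the Python standard library and illustrative function names; it computes that limiting power from the Poisson-mixture representation of the non-central chi-square and searches for the smallest m. It assumes $\sigma^2 = 7$, so $\lambda_1 = 31.1875/7$ for one replicate, and that the table parameter is $\phi = \sqrt{\lambda/(\nu_1+1)}$. With these inputs the search stops at m = 4, in line with the table reading.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x, df):
    return gamma_p(df / 2.0, x / 2.0)


def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_sf(x, df, lam):
    """P(X > x) for non-central chi-square(df, lam): a Poisson(lam/2) mixture of chi-square(df + 2j)."""
    half = lam / 2.0
    weight = math.exp(-half)
    cdf, j = 0.0, 0
    while True:
        cdf += weight * chi2_cdf(x, df + 2 * j)
        j += 1
        weight *= half / j
        if j > half and weight < 1e-16:  # remaining Poisson tail is negligible
            return 1.0 - cdf


def f_power_infinite_df(lam, df1, alpha=0.05):
    """Power of the level-alpha F test when the error df tend to infinity."""
    return ncx2_sf(chi2_quantile(1.0 - alpha, df1), df1, lam)


lam1 = 31.1875 / 7.0              # one replicate of the 18-point design, sigma^2 = 7 assumed
phi1 = math.sqrt(lam1 / (3 + 1))  # table parameter phi for nu_1 = 3
m = 1
while f_power_infinite_df(m * lam1, 3) < 0.95:
    m += 1                        # lambda grows like m, so phi grows like sqrt(m)
print(round(phi1, 3), m, round(phi1 * math.sqrt(m), 2))
```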

Table B 12 can be used directly. Table B 12 gives values of n/r, where n is the total sample size, the degrees of freedom in the numerator of the F-test are r-1, the degrees of freedom for error are n-r, and the non-centrality parameter is expressed through $\Delta$ by

$$\lambda = \frac{n}{r}\,\frac{\Delta^2}{2\sigma^2}.$$

If your basic design has $n_0$ data points and p parameters, and your F test is based on $\nu_1$ degrees of freedom, then when you replicate the design m times you get $mn_0$ total data points, $mn_0 - p$ degrees of freedom for error and $\nu_1$ degrees of freedom for the numerator of the F test.

To use the table take $r = \nu_1 + 1$. Then work out $\Delta/\sigma$ by taking the value of the noncentrality parameter $\lambda_1$ for one replicate of the basic design and computing (set n/r = m and $\lambda = m\lambda_1$ in the table's non-centrality relation)

$$\frac{\Delta}{\sigma} = \sqrt{2\lambda_1}.$$

Look up n/r in the table and take that to be m. You will be making a small mistake unless $n_0 = p = r$ (which is the case for the overall F test in the basic ANOVA table). The problem is that you will be pretending you have $r(m-1)$ degrees of freedom for error instead of $mn_0 - p$. As long as these are both large all is well.

In our example, for a power of 0.95 and m replicates of the 18-point design, the non-centrality parameter is $m\lambda_1$ with $\lambda_1$ as above. We have r = 3+1 = 4. We get $\Delta/\sigma = \sqrt{2\lambda_1} \approx 3.0$. For a level 0.05 test we then look on page 1362 and get m=5, for a total sample size of 90. The degrees of freedom for error will really be 90-6 = 84, but the table pretends that they will be $r(m-1) = 4 \times 4 = 16$. The latter is pretty small, and a small number of error df decreases the power of a test. This means that m=5 is probably an overestimate of the required sample size.

A better answer can be had by looking at replicates of the 9-point design. For 9 data points the noncentrality parameter would have been $\lambda_1/2$. This would give $\Delta/\sigma = \sqrt{\lambda_1} \approx 2.1$ and m of 9 or 10. For m=10 we would have the same design as before, with 90 data points. For m=9 we would have only 81 data points. At this point you go back to Table B 11 to work out the power properly for 81 or 90 data points and see if 81 is enough.
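The bookkeeping for Table B 12 is simple enough to script. The sketch below (hypothetical function names, standard-library Python) assumes the table's relation $\lambda = (n/r)\,\Delta^2/(2\sigma^2)$, so that setting n/r = m and $\lambda = m\lambda_1$ gives $\Delta/\sigma = \sqrt{2\lambda_1}$, and it assumes $\lambda_1 = 31.1875/7$ for the 18-point design. It also reports the error degrees of freedom the replicated design really has next to the ones the table assumes.

```python
import math


def table_b12_entry(lam1, df1):
    """Return (r, Delta/sigma) for entering Table B 12.

    Assumes lambda = (n/r) * Delta**2 / (2 * sigma**2); with n/r = m and
    lambda = m * lam1 this gives Delta/sigma = sqrt(2 * lam1).
    """
    return df1 + 1, math.sqrt(2.0 * lam1)


def error_df(m, n0, p, r):
    """(actual error df for m replicates of an n0-point, p-parameter design, error df the table assumes)."""
    return m * n0 - p, r * (m - 1)


lam1 = 31.1875 / 7.0                    # 18-point design, sigma^2 = 7 assumed
r, d18 = table_b12_entry(lam1, 3)
print(r, round(d18, 2), error_df(5, 18, 6, r))

_, d9 = table_b12_entry(lam1 / 2.0, 3)  # 9-point design: half the non-centrality
print(round(d9, 2), error_df(9, 9, 6, r), error_df(10, 9, 6, r))
```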

Heteroscedastic Errors

If plots and/or tests show that the error variances $\sigma_i^2$ depend on i, there are several standard approaches to fixing the problem, depending on the nature of the dependence.

Weighted Least Squares

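If

$$Y_i = \sum_j x_{ij}\beta_j + \epsilon_i$$

and

$$\text{Var}(\epsilon_i) = \sigma_i^2 = \frac{\sigma^2}{w_i}$$

and the errors are independent with normal distributions, then the likelihood is

$$L(\beta, \sigma) = \prod_{i=1}^n \frac{\sqrt{w_i}}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{w_i\left(Y_i - \sum_j x_{ij}\beta_j\right)^2}{2\sigma^2}\right\}$$

To choose $\beta$ to maximize this likelihood we minimize the quantity

$$\sum_{i=1}^n w_i\left(Y_i - \sum_j x_{ij}\beta_j\right)^2$$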

The process is called weighted least squares.

Algebraically it is easy to see how to do the minimization. Rewrite the quantity to be minimized as

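$$\sum_{i=1}^n \left(\sqrt{w_i}\,Y_i - \sum_j \sqrt{w_i}\,x_{ij}\beta_j\right)^2$$

This is just an ordinary least squares problem with the response variable being

$$Y_i^* = \sqrt{w_i}\,Y_i$$

and the covariates being

$$x_{ij}^* = \sqrt{w_i}\,x_{ij}.$$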
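The transformation can be coded directly: scale the response and every covariate in row i (including the column of ones for the intercept) by $\sqrt{w_i}$ and run ordinary least squares with no separate intercept. A minimal standard-library Python sketch follows; `ols` and `wls_by_transformation` are illustrative names, and the normal equations are solved by plain Gauss-Jordan elimination, which is adequate only for small illustrative problems.

```python
import math


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations X^T X b = X^T y (Gauss-Jordan)."""
    p = len(x_rows[0])
    a = [[sum(r[j] * r[k] for r in x_rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(x_rows, y))] for j in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(p):
            if i != col:
                f = a[i][col] / a[col][col]
                for c in range(col, p + 1):
                    a[i][c] -= f * a[col][c]
    return [a[i][p] / a[i][i] for i in range(p)]


def wls_by_transformation(x_rows, y, w):
    """Weighted least squares: multiply row i of X and Y_i by sqrt(w_i), then do OLS."""
    s = [math.sqrt(wi) for wi in w]
    x_star = [[si * v for v in row] for si, row in zip(s, x_rows)]
    y_star = [si * yi for si, yi in zip(s, y)]
    return ols(x_star, y_star)


# Intercept-only model: the WLS estimate should be the weighted mean sum(w*y)/sum(w).
print(wls_by_transformation([[1.0], [1.0], [1.0]], [1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))
```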

The calculation can be written in matrix form. If $W^{1/2}$ is a diagonal matrix with $\sqrt{w_i}$ in the ith diagonal position then put $Y^* = W^{1/2}Y$ and $X^* = W^{1/2}X$. Then

$$Y = X\beta + \epsilon$$

becomes

$$Y^* = X^*\beta + \epsilon^*, \qquad \epsilon^* = W^{1/2}\epsilon.$$

If $\epsilon$ has mean 0, independent entries and $\text{Var}(\epsilon_i) = \sigma^2/w_i$, then $\epsilon^*$ has mean 0, independent entries $\epsilon_i^* = \sqrt{w_i}\,\epsilon_i$ and $\text{Var}(\epsilon_i^*) = \sigma^2$, so that ordinary multiple regression theory applies. The estimate of $\beta$ is

$$\hat\beta = (X^{*T}X^*)^{-1}X^{*T}Y^* = (X^TWX)^{-1}X^TWY$$

where now $W$ is a diagonal matrix with $w_i$ on the diagonal. This estimate is unbiased and has variance-covariance matrix

$$\text{Var}(\hat\beta) = \sigma^2\,(X^TWX)^{-1}.$$
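The closed form and its variance-covariance matrix can also be computed directly. The sketch below (standard-library Python, illustrative names, with $\sigma^2$ supplied by the user rather than estimated) forms $X^TWX$ and $X^TWY$, inverts $X^TWX$ by Gauss-Jordan elimination, and returns $\hat\beta$ together with $\sigma^2(X^TWX)^{-1}$. For an intercept-only model these reduce to the weighted mean and $\sigma^2/\sum_i w_i$.

```python
def invert(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]  # scale pivot row so the pivot is 1
        for i in range(n):
            if i != col:
                f = m[i][col]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[col])]
    return [row[n:] for row in m]


def wls_closed_form(x_rows, y, w, sigma2):
    """Return (beta_hat, cov) with beta_hat = (X'WX)^{-1} X'WY and cov = sigma2 * (X'WX)^{-1}."""
    p = len(x_rows[0])
    xtwx = [[sum(wi * r[j] * r[k] for r, wi in zip(x_rows, w)) for k in range(p)] for j in range(p)]
    xtwy = [sum(wi * r[j] * yi for r, yi, wi in zip(x_rows, y, w)) for j in range(p)]
    inv = invert(xtwx)
    beta = [sum(inv[j][k] * xtwy[k] for k in range(p)) for j in range(p)]
    cov = [[sigma2 * inv[j][k] for k in range(p)] for j in range(p)]
    return beta, cov


# Intercept-only check: weighted mean 9/4 and variance sigma2 / sum(w) = 4/4.
beta, cov = wls_closed_form([[1.0], [1.0], [1.0]], [1.0, 2.0, 3.0], [1.0, 1.0, 2.0], sigma2=4.0)
print(beta, cov)  # prints [2.25] [[1.0]]
```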

Example

It is possible to do weighted least squares in SAS fairly easily. As an example, we consider the SENIC data set, taking the variance of RISK to be proportional to 1/CENSUS. (Motivation: RISK is an estimated proportion, and the variance of a Binomial proportion is inversely proportional to the sample size.) This makes the weight just CENSUS.

proc reg  data=scenic;
  model Risk = Culture Stay Nratio Chest Facil;
  /* w_i = CENSUS, i.e. Var(RISK) taken proportional to 1/CENSUS */
  weight Census;
run;

EDITED OUTPUT

Dependent Variable: RISK                                               
                        Analysis of Variance
                        Sum of         Mean
  Source       DF      Squares       Square      F Value       Prob>F
  Model         5  12876.94280   2575.38856       17.819       0.0001
  Error       107  15464.46721    144.52773
  C Total     112  28341.41001
      Root MSE      12.02197     R-square       0.4544
      Dep Mean       4.76215     Adj R-sq       0.4289
      C.V.         252.44833
                              Parameter Estimates
                    Parameter      Standard    T for H0:               
   Variable  DF      Estimate         Error   Parameter=0    Prob > |T|
   INTERCEP   1      0.468108    0.62393433         0.750        0.4547
   CULTURE    1      0.030005    0.00891714         3.365        0.0011
   STAY       1      0.237420    0.04444810         5.342        0.0001
   NRATIO     1      0.623850    0.34803271         1.793        0.0759
   CHEST      1      0.003547    0.00444160         0.799        0.4263
   FACIL      1      0.008854    0.00603368         1.467        0.1452

EDITED OUTPUT FOR UNWEIGHTED CASE
Dependent Variable: RISK                                               
                           Analysis of Variance
                              Sum of         Mean
  Source       DF      Squares       Square      F Value       Prob>F
  Model         5    108.32717     21.66543       24.913       0.0001
  Error       107     93.05266      0.86965
  C Total     112    201.37982
      Root MSE       0.93255     R-square       0.5379
      Dep Mean       4.35487     Adj R-sq       0.5163
      C.V.          21.41399
                        Parameter Estimates
                 Parameter      Standard    T for H0:               
  Variable  DF      Estimate         Error   Parameter=0    Prob > |T|
  INTERCEP   1     -0.768043    0.61022741        -1.259        0.2109
  CULTURE    1      0.043189    0.00984976         4.385        0.0001
  STAY       1      0.233926    0.05741114         4.075        0.0001
  NRATIO     1      0.672403    0.29931440         2.246        0.0267
  CHEST      1      0.009179    0.00540681         1.698        0.0925
  FACIL      1      0.018439    0.00629673         2.928        0.0042





Richard Lockhart
Fri Mar 21 10:49:40 PST 1997