Philosophia Vol. 3 Nos. 2-3 Pp. 167-178 April-July 1973
This paper is a slightly revised version of a paper read (May 31, 1972) before the Canadian Society for the Study of the History and Philosophy of Science. The Society's meeting was held under the auspices of the Learned Societies, at its Conference at McGill University, Montreal, Quebec.
A NEW PARADOX IN PROBABILITY THEORY
INTRODUCTION: THE PARADOX STATED IN QUALITATIVE TERMS
I wish to draw attention to a new, simply stateable, but difficult paradox in probability theory.
For a finite universe of discourse, if
Where, if at all, has this argument gone astray? If the result is unacceptable, some premise or premises in the preceding argument must be given up. All, however, seem to be fairly well-established truths of formal logic, its standard semantics, and probability theory.
In this paper various solutions to the paradox are examined. I shall use Carnap's system of confirmation as a point of reference because it embodies this paradox, and moreover does so in a quantitative fashion, assigning specific numbers to the probabilities in question. The paradox, however, it should already be clear, is not unique to Carnap's system but seems to be indigenous to much fairly recent probability theory.
THE PARADOX DERIVED QUANTITATIVELY
Carnap's early attempts circa 1950 () to construct a quantitative measure, the system c*, of the degree of confirmation or logical probability obtaining between any two propositions are justifiably admired for their ambition and verve. In subsequent years certain well-known flaws and alleged flaws in that theory have been pointed out. We mention three, each of which in its own way bears on the problem at hand.
One is the fact that in (viz. an artificial language consisting of an infinite number of individual constants and a finite number of monadic predicates), the probability of any universally general proposition on finite evidence is zero. Two is the fact, emphasized by Salmon (), that the system is linguistically variant, that is, the probability of a given hypothesis on unchanging evidence varies from language to language as the number of predicates in a 'family' changes. For example, for a fixed population of individuals, the probability that all apples are red on the evidence that one particular apple is red decreases monotonically as more predicates are added to the 'color-family' in successive languages. And three is the fact that this early system of Carnap's embodies perfectly the so-called "Raven Paradox" discovered by Hempel (, chapter 1): insofar as
These features of the system are, as remarked, now well known. Nonetheless, I for one would not consider the first two of them to be flaws of the system. Rather, I think they graphically point out the differences between a measure of logical probability and a measure of a posteriori probability. These first two corollaries of the system, at least, seem to me to be precisely right, and if they are incompatible with our pre-analytic beliefs or expectations in this matter of partial entailment, then so much the worse for our pre-analytic beliefs: they ought to be revised accordingly. As regards the conceptual readjustments needed to accommodate Hempel's Paradox, I am less sure where to make them: whether to readjust our concept of just what a confirming instance is, or whether to preserve a 'near-Nicodian' view of confirming instances and instead condemn Carnap's (and others') systems of confirmation.
To these three peculiarities, whatever we may say about them, we must now add another. This newly uncovered feature, like the one Salmon calls our attention to, involves a curious systematic variation in probability-values from language to language. But where Salmon's peculiarity concerned the change in values as a function of the number and kind of predicates, this new one involves a change brought about as a function of the number of individuals.
Consider first an exceedingly simple language consisting of two predicates, "R" (raven) and "B" (black) and one individual constant, "a". What, in this language, is the a priori or absolute logical probability that all ravens are black? We wish to calculate the functor,
c*[(x)(Rx ⊃ Bx), t]. [Note 1]
The algorithmic aspects of the system are now standard fare in textbooks in probability theory, and an application of their methods readily yields the result that
c*[(x)(Rx ⊃ Bx), t] = ¾ or 75%.
Concomitantly for this particular language, the c*-value of the corresponding I-proposition (viz.,
Consider now increasing our model by one individual constant. It is again an easy exercise to calculate the absolute logical probability of the A and I propositions. We find these results:
c*[(x)(Rx ⊃ Bx), t] = 60%.
c*[(∃x)(Rx & Bx), t] = 40%.
We already know from our discussion of the first of the three features recalled above that in ,
In short, then, we have again derived the paradox, but this time in quantitative terms: for the system c*, in a world of one individual and in the absence of any contingent evidence, it is three times as likely that all ravens are black (75%) as that some ravens are black (25%); and in a world of two individuals the former hypothesis is half again as likely (60%) as the latter (40%). [Note 3]
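The figures just cited can be checked by brute force. What follows is a minimal sketch in Python, assuming the standard definition of Carnap's m*: every structure description receives equal weight, that weight is shared equally among the state descriptions belonging to it, and c*[h, t] is the total m* of the state descriptions in which h holds. The function names are illustrative only and are not Carnap's notation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, factorial

# Q-predicates of a language with "R" (raven) and "B" (black),
# each written as (is_raven, is_black).
Q_PREDICATES = [(True, True), (True, False), (False, True), (False, False)]


def state_descriptions(n):
    """Every assignment of one Q-predicate to each of a1, ..., an."""
    return product(Q_PREDICATES, repeat=n)


def m_star(state):
    """m* of a state description: 1 / (number of structure descriptions),
    shared equally among the state descriptions isomorphic to this one."""
    n, k = len(state), len(Q_PREDICATES)
    structures = comb(n + k - 1, k - 1)
    isomorphic = factorial(n)  # multinomial n! / (n1! n2! ... nk!)
    for count in Counter(state).values():
        isomorphic //= factorial(count)
    return Fraction(1, structures * isomorphic)


def c_star(hypothesis, n):
    """Absolute value c*[h, t]: total m* of the state descriptions where h holds."""
    return sum((m_star(s) for s in state_descriptions(n) if hypothesis(s)),
               Fraction(0))


def all_ravens_black(state):   # (x)(Rx ⊃ Bx)
    return all(not r or b for r, b in state)


def some_raven_black(state):   # (∃x)(Rx & Bx)
    return any(r and b for r, b in state)


for n in range(1, 6):
    print(n, c_star(all_ravens_black, n), c_star(some_raven_black, n))
```

For n = 1 and n = 2 the sketch reproduces ¾ against ¼ and 3/5 against 2/5. Run a little further, it shows that on these assumptions the values are 3/(n + 3) and n/(n + 3): the two hypotheses are equiprobable at n = 3, the all-proposition falls below the some-proposition from n = 4 on, and its value tends to zero as n grows.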
And as if these results were not sufficient cause for conceptual discomfort, still another, albeit related, paradox can be derived. The substitution of a negated term for an unnegated one in any hypothesis will leave the absolute logical probability of the resulting proposition unchanged from the original one. Thus, for example, in ,
ATTEMPTED SOLUTIONS OF THE PARADOX
How, then, shall we accommodate these various significant departures from our intuitive expectations regarding all- and some-propositions? Our choices seem to be limited to two. We can regard these paradoxes as veridical ones which merely reveal the unexpected but necessary consequences of relatively immutable laws of logic and probability theory. [Note 4] Or, eschewing this choice, we can regard these paradoxes as being falsidical ones revealing a fatal flaw somewhere within the theory itself.
My own predilection is to favor the latter view, to regard these paradoxes as falsidical ones. But before I proceed to examine ways one might try to modify the theory in order to avoid these paradoxes, I would like to examine briefly the argument which might be adduced in support of the opposite point of view, that of regarding the paradoxes as being veridical.
If the paradoxes are to be regarded as veridical, we shall have to argue that the conflict that arises between the results derived and the presupposition that all-propositions are in every case less probable than are the corresponding some-propositions is resolvable by recognizing the falsity of the presupposition. Such an argument would insist upon the correctness of the various premises we used in initially deriving the first paradox. To repeat, in a world of one individual, a, the hypothesis,
Shall we accept this argument? I leave the final choice to the reader. But in the meantime I review four attempts (and their attendant problems) to solve the problem by the other route, that of preserving the presupposition described, which, I am sure, would appear to some readers at least to be less readily dismissible than the above argument would have us believe.
Repair #1: No doubt the way that comes to mind most readily to correct the excessive probability-value deduced for the all-propositions (both the A and the E) is to revert from the Boolean rendition to a modern-day Aristotelian one, that is, one in which there is added to the hypothesis an explicit conjunct stating that the class referred to by the subject term of the hypothesis is non-empty. Obviously, if the A and E propositions are expressed as
"(x)(Rx ⊃ Bx) & (∃x)(Rx)" and "(x)(Rx ⊃ ~Bx) & (∃x)(Rx)",
then their respective probabilities can be no greater than those of the I and O propositions which they respectively entail.
For the case of a world of just one individual, each member of the pairs A and I, and E and O, will imply the other member. This is in conformity with another expectation regarding probabilities: in a world of only one individual, the probability that all ravens are black should be precisely equal to the probability that some raven is black. If we adopt this repair, then, there is no crossing of the A and I curves; they each start at the value 25% for the case of one individual and diverge thereafter, the A curve approaching the value zero and the I curve the value one.
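The effect of this repair can be sketched numerically. The Python sketch below assumes Carnap's c* for the same two-predicate language and relies on the fact that every hypothesis considered here is insensitive to which individual is which, so that its c*-value is simply the proportion of structure descriptions (each weighted equally under c*) in which it holds. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# A structure description for "R" and "B" with n individuals is a tuple of
# counts for the Q-predicates, in the order (R&B, R&~B, ~R&B, ~R&~B).
RB, R_NOT_B, NOT_R_B, NOT_R_NOT_B = range(4)


def structure_descriptions(n):
    return [c for c in product(range(n + 1), repeat=4) if sum(c) == n]


def c_star(holds, n):
    """c* of a hypothesis that does not depend on which individual is which:
    every structure description carries equal weight under c*."""
    sds = structure_descriptions(n)
    return Fraction(sum(1 for c in sds if holds(c)), len(sds))


def aristotelian_a(c):   # (x)(Rx ⊃ Bx) & (∃x)(Rx)
    return c[R_NOT_B] == 0 and c[RB] + c[R_NOT_B] > 0


def i_prop(c):           # (∃x)(Rx & Bx)
    return c[RB] > 0


for n in range(1, 9):
    print(n, c_star(aristotelian_a, n), c_star(i_prop, n))
```

On these assumptions both values are ¼ at n = 1, and the Aristotelian A-value never exceeds the I-value, as the entailment guarantees. The A-value works out to 3n/((n + 2)(n + 3)), which tends to zero, though not strictly monotonically: it rises to 3/10 at n = 2 and n = 3 before falling.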
Objection to Repair #1: The cost to our theory of logical probability of reverting to quasi-Aristotelian formulations is much remarked upon in the literature that has sprung up in response to Hempel's paradox, for which the same solution has also been offered. Scheffler (, pp. 261-263) reviews various problems with this attempted solution. Two of his most important objections are these: first, A-propositions can be cast into a variety of logically equivalent forms with non-equivalent subject terms, so that it is indeterminate which existential assumption should be made explicit and conjoined to the Boolean expression; and second, certain hypothetical reasoning explicitly eschews the existential component.
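Scheffler's first objection can be illustrated on small finite models. In the Python sketch below (function names hypothetical), the A-proposition and its contrapositive "(x)(~Bx ⊃ ~Rx)" agree in every state description, yet conjoining each with the existential assumption suggested by its own subject term, "(∃x)(Rx)" versus "(∃x)(~Bx)", yields two propositions that are not logically equivalent.

```python
from itertools import product


def states(n):
    """All state descriptions for n individuals; each is (is_raven, is_black)."""
    return product(product([True, False], repeat=2), repeat=n)


def a_raven_form(s):       # (x)(Rx ⊃ Bx)
    return all(not r or b for r, b in s)


def a_nonblack_form(s):    # (x)(~Bx ⊃ ~Rx), the contrapositive form
    return all(b or not r for r, b in s)


def with_ravens(s):        # (x)(Rx ⊃ Bx) & (∃x)(Rx)
    return a_raven_form(s) and any(r for r, _ in s)


def with_nonblacks(s):     # (x)(~Bx ⊃ ~Rx) & (∃x)(~Bx)
    return a_nonblack_form(s) and any(not b for _, b in s)


for n in (1, 2, 3):
    boolean_equivalent = all(a_raven_form(s) == a_nonblack_form(s) for s in states(n))
    disagreements = [s for s in states(n) if with_ravens(s) != with_nonblacks(s)]
    print(n, boolean_equivalent, disagreements[0])
```

The first disagreement found is a world consisting of a single black raven: there the raven-based Aristotelian reading is true, while the non-black-based one is false because nothing is non-black.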
Repair #2: Another way to preserve the presupposition in question is to restrict the application of probability theory to languages describing fairly populous worlds. We simply acknowledge the inadequacy of the system to deal with small worlds and let it go at that.
While this repair does smack of ad hocness, it is not without familial precedent in logic. One is reminded of the "virtuous circle" spoken of by Nelson Goodman (, pp. 62-66), in which formalized logic and intuitive logic successively interact to modify one another. Restrictions on the applicability of various logical operations are already countenanced in logic; e.g., contraposition is invalid for the I-proposition. One more such restriction can be tolerated.
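The precedent appealed to here can be checked mechanically. The following Python sketch (function names hypothetical; the search bound of three individuals is an arbitrary choice) confirms that contraposition preserves truth for the A-proposition in every model up to that bound, and finds a one-individual model, a single black raven, in which the I-proposition is true but its contrapositive is false.

```python
from itertools import product


def models(max_size):
    """All interpretations of R and B over domains of 1..max_size individuals;
    each individual is represented as (is_R, is_B)."""
    for size in range(1, max_size + 1):
        yield from product(product([True, False], repeat=2), repeat=size)


def some_r_are_b(m):          # I-proposition: (∃x)(Rx & Bx)
    return any(r and b for r, b in m)


def some_nonb_are_nonr(m):    # its contrapositive: (∃x)(~Bx & ~Rx)
    return any(not b and not r for r, b in m)


def all_r_are_b(m):           # A-proposition: (x)(Rx ⊃ Bx)
    return all(not r or b for r, b in m)


def all_nonb_are_nonr(m):     # its contrapositive: (x)(~Bx ⊃ ~Rx)
    return all(b or not r for r, b in m)


# Contraposition preserves truth for A in every model searched ...
print(all(all_r_are_b(m) == all_nonb_are_nonr(m) for m in models(3)))
# ... but not for I: the first model where I is true and its contrapositive false.
print(next(m for m in models(3) if some_r_are_b(m) and not some_nonb_are_nonr(m)))
```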
Objection to Repair #2: This repair is seriously disanalogous with the case to which it is compared, i.e., contraposition for the I-proposition. An argument, to be judged valid, must be such that in any possible world, if the premises are true, then so too is the conclusion. Remember that the measure of the absolute logical probability of an hypothesis can be construed as a measure of the degree to which that hypothesis is entailed by a necessary proposition. While we are prepared to allow that the degree of entailment should depend on the particular language used (again recall feature #1 above), we are not prepared to allow a system to give erroneous results for some languages, e.g. L₁ and L₂, which give rise to descriptions of small worlds. It is a violation of the very spirit of the logical enterprise to make the correctness of the measure of entailment depend on the particular world described.
Repair #3: There is, of course, no formal paradox in the above deductions. The trouble occurs in trying to outfit Carnap's system with a semantics, in particular, in interpreting expressions of the form
Objection to Repair #3: We shall in any case desire a complete semantics for the system. What translational rule shall we then associate with
Reply to Objection #3: The immediately foregoing objection to Repair #3 is too hasty. It is not the case that we would wish to say of every universally general hypothetical that it ought to be less probable than the corresponding I-proposition. Consider this hypothetical, "If anything is a witch, then it has supernatural powers." This hypothetical clearly is more probable than the corresponding I-proposition, "Some witches have supernatural powers." This hypothetical, being necessarily true, has a probability-value of one, while the I-proposition has a probability-value that we would guess to be very near zero. Thus the objection to Repair #3 does not hold up.
Revised Objection to Repair #3: The criticism is well-taken against the Objection. It does not, however, tell us how to make the needed repair; indeed all it tells us is that our statement of the Objection to Repair #3 was careless and in need of revising. Let it be granted that should a universally general hypothetical be necessary, then its probability-value in all worlds will be one. But every respectable theory of probability embodies this thesis and it has no bearing on the current problem. The crucial point concerns non-necessary universally general hypotheticals.
Perhaps the spirit of the Reply citing the case of witches is that the solution to the problem lies in looking at hypotheticals whose subject classes are (for all we know) empty. Let us try one. Consider: "If anything is a witch, then it wears a pointed hat." Do we want to say of this proposition, as the theory here being criticized would require, that it is more probable than the corresponding I-proposition in small worlds and less probable in large ones? Surely not.
Thus the Objection to Repair #3 stands. Translating
Repair #4: Obviously, more drastic methods are going to be needed than any that appear above. Let us examine the original version of the paradox rather more closely. As it is presented, the argument would seem to commit the fallacy of equivocation. [Note 5] It speaks at one point simply of "equivalence" between certain general propositions and their singular expansions. Then, at a later point, it proceeds to transfer the probability-values from each of the singular expansions back to the general statements from which they were generated. This transference of probability-values would clearly be warranted if the equivalences in question were logical (or analytic) ones, but they are not: they are material equivalences. The fallacious inference is disguised by speaking carelessly of "equivalence" tout court, rather than of "material equivalence".
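The distinction between the two kinds of equivalence can be illustrated with a small Python sketch. It assumes, for illustration only, that a model is a sequence of individuals whose first n members are named a1, ..., an: the general statement is evaluated over the whole domain, its singular expansion over the named individuals alone. On this reading the two agree in every world made up exactly of the named individuals, but come apart once the domain contains an individual with no name, which is why the equivalence is material rather than logical.

```python
from itertools import product

INDIVIDUAL_KINDS = list(product([True, False], repeat=2))  # (is_R, is_B)


def universal(model):
    """(x)(Rx ⊃ Bx), evaluated over the whole domain of the model."""
    return all(not r or b for r, b in model)


def expansion(model, n):
    """(~Ra1 ∨ Ba1) & ... & (~Ran ∨ Ban); hypothetically, a1, ..., an denote
    the first n individuals of the model."""
    return all(not r or b for r, b in model[:n])


n = 2
# Worlds consisting of exactly the named individuals: the two always agree.
print(all(universal(m) == expansion(m, n)
          for m in product(INDIVIDUAL_KINDS, repeat=n)))
# Add one unnamed individual: the first model in which they come apart.
print(next(m for m in product(INDIVIDUAL_KINDS, repeat=n + 1)
           if universal(m) != expansion(m, n)))
```

The model found is two black ravens plus an unnamed non-black raven: the expansion over a1 and a2 is true, while the general statement is false.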
But if in general it is fallacious to assign equi-probability-values to each member of a pair of materially equivalent propositions, [Note 6] it need not be improper to do so for selected pairs of such propositions. The fallacy can be repaired by viewing the argument as an enthymeme requiring an additional premise. In essence we require an axiom which would allow us in a language Ln to assign the same probability-value to a general statement as we do to its singular expansion in that language. This axiom would read [Note 7]
$$P_n(p) = P(\Sigma_n(p)),$$
where
$$\Sigma_n[(x)(\Phi x \supset \Psi x)] =_{\mathrm{df}} ({\sim}\Phi a_1 \lor \Psi a_1) \mathbin{\&} ({\sim}\Phi a_2 \lor \Psi a_2) \mathbin{\&} \cdots \mathbin{\&} ({\sim}\Phi a_n \lor \Psi a_n)$$
and
$$\Sigma_n[(\exists x)(\Phi x \mathbin{\&} \Psi x)] =_{\mathrm{df}} (\Phi a_1 \mathbin{\&} \Psi a_1) \lor (\Phi a_2 \mathbin{\&} \Psi a_2) \lor \cdots \lor (\Phi a_n \mathbin{\&} \Psi a_n).$$
Now it is clear that this axiom characterizes both Carnap's system, c*, and indeed much of our recent thinking about logical probability. But having now explicitly brought it out in the open, we can see that we are not bound to accept this axiom; indeed there is good reason to reject it. For if we discard this axiom, both the qualitative and quantitative versions of the paradox are avoided.
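The two clauses defining the expansion operator Σn can be rendered as a small string-building sketch in Python, which may help in seeing exactly which singular sentence the axiom pairs with each general one. The function names are hypothetical; the notation follows the text, with "~" for negation.

```python
def expand_universal(phi, psi, n):
    """Σn[(x)(Φx ⊃ Ψx)]: the conjunction of (~Φai ∨ Ψai) for i = 1..n."""
    return " & ".join(f"(~{phi}a{i} ∨ {psi}a{i})" for i in range(1, n + 1))


def expand_existential(phi, psi, n):
    """Σn[(∃x)(Φx & Ψx)]: the disjunction of (Φai & Ψai) for i = 1..n."""
    return " ∨ ".join(f"({phi}a{i} & {psi}a{i})" for i in range(1, n + 1))


print(expand_universal("R", "B", 2))    # (~Ra1 ∨ Ba1) & (~Ra2 ∨ Ba2)
print(expand_existential("R", "B", 2))  # (Ra1 & Ba1) ∨ (Ra2 & Ba2)
```

For n = 1 both expansions collapse to a single singular clause, which is the case in which the axiom assigns "all ravens are black" the probability of "~Ra1 ∨ Ba1".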
Reply to Repair #4: There is an intuitive obviousness to the axiom we have just exposed, and consequently we shall require fairly persuasive arguments that it is on just this point that the system should be modified. Perhaps in newer theories of probability we can come to be persuaded that this particular axiom is dispensable. We shall have to wait and see. But in the meantime, as we test new theories to see whether they successfully avoid the many traditional paradoxes, we might with profit also inquire whether they avoid the new one here revealed. Any that do not would seem to harbor a serious shortcoming.
Return/transfer to Norman Swartz's Home Page