Philosophia Vol. 3 Nos. 2-3 Pp. 167-178 April-July 1973

This paper is a slightly revised version of a paper read (May 31, 1972) before the Canadian Society for the Study of the History and Philosophy of Science. The Society's meeting was held under the auspices of the Learned Societies, at its Conference at McGill University, Montreal, Quebec.




I wish to draw attention to a new, simply stateable, but difficult paradox in probability theory.

For a finite universe of discourse, if Φ → Ψ and ~(Ψ → Φ), then P(Ψ) > P(Φ), i.e., in a non-reversible implication there is always a loss of information, that is, an increase in probability. But consider the two propositions, "All ravens are black", (i.e., "(x)(Rx ⊃ Bx)"), and "Some ravens are black" (i.e., "(∃x)(Rx & Bx)"). In a world of one individual, called "a", these two propositions are equivalent to "~Ra ∨ Ba" and "Ra & Ba" respectively. However, (Ra & Ba) → (~Ra ∨ Ba) and ~[(~Ra ∨ Ba) → (Ra & Ba)]. Consequently, in a world of one individual it is more probable that all ravens are black than that some ravens are black!
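The one-way entailment on which the argument turns can be checked mechanically. The following is only an illustrative sketch: it enumerates the four possible truth-value assignments to "Ra" and "Ba" and tests each direction of implication. The function names (all_ravens_black, some_raven_black, entails) are hypothetical labels introduced here, not notation from the paper.

```python
from itertools import product

# The four possible states of a one-individual world: truth-values of (Ra, Ba).
worlds = list(product([True, False], repeat=2))

def all_ravens_black(ra, ba):
    # Singular expansion of (x)(Rx ⊃ Bx) for the lone individual a: ~Ra ∨ Ba
    return (not ra) or ba

def some_raven_black(ra, ba):
    # Singular expansion of (∃x)(Rx & Bx) for the lone individual a: Ra & Ba
    return ra and ba

def entails(p, q):
    # p entails q when no state makes p true and q false.
    return all(q(*w) for w in worlds if p(*w))

i_entails_a = entails(some_raven_black, all_ravens_black)
a_entails_i = entails(all_ravens_black, some_raven_black)

# Number of states in which each proposition holds.
a_count = sum(1 for w in worlds if all_ravens_black(*w))
i_count = sum(1 for w in worlds if some_raven_black(*w))

print(i_entails_a, a_entails_i, a_count, i_count)
```

On this enumeration the some-proposition entails the all-proposition but not conversely, and the all-proposition holds in three of the four states against one for the some-proposition.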

Where, if at all, has this argument gone astray? If the result is unacceptable, some premise or premises in the preceding argument must be given up. All, however, seem to be fairly well-established truths of formal logic, its standard semantics, and probability theory.

In this paper various solutions to the paradox are examined. I shall use Carnap's system of confirmation as a point of reference because it embodies this paradox, and moreover does so in a quantitative fashion, assigning specific numbers to the probabilities in question. The paradox, however, it should already be clear, is not unique to Carnap's system but seems to be indigenous to much fairly recent probability theory.


Carnap's early attempt, circa 1950 ([1]), to construct a quantitative measure, the system c*, of the degree of confirmation or logical probability obtaining between any two propositions is justifiably admired for its ambition and verve. In subsequent years certain well-known flaws and alleged flaws in that theory have been pointed out. We mention three, each of which in its own way bears on the problem at hand.

One is the fact that in the infinite language (viz. an artificial language consisting of an infinite number of individual constants and a finite number of monadic predicates), the probability of any universally general proposition on finite evidence is zero. Two is the fact, emphasized by Salmon ([5]), that the system is linguistically variant, that is, the probability of a given hypothesis on unchanging evidence varies from language to language as the number of predicates in a 'family' changes. For example, for a fixed population of individuals, the probability that all apples are red on the evidence that one particular apple is red decreases monotonically as more predicates are added to the 'color-family' in successive languages. And three is the fact that this early system of Carnap's embodies perfectly the so-called "Raven Paradox" discovered by Hempel ([3], chapter 1): insofar as "(x)(Rx ⊃ Bx)" is logically equivalent to "(x)(~Bx ⊃ ~Rx)" both will receive precisely the same degree of confirmation on any evidential claim. Thus not only will "Ra & Ba" increase the degree of confirmation of "(x)(Rx ⊃ Bx)" but to an equal degree so too will "~Ra & ~Ba".

These features of the system are, as remarked, now well-known. Nonetheless, I for one would not consider the first two of these to be flaws of the system. Rather I think they graphically point out the differences between a measure of logical probability and a measure of a posteriori probability. The first two at least of these corollaries of the system seem to me to be precisely right and if they are incompatible with our pre-analytic beliefs or expectations in this matter of partial entailment then so much the worse for our pre-analytic beliefs: they ought to be revised accordingly. As regards the conceptual re-adjustments needed to accommodate Hempel's Paradox I am less sure where to make them: whether to readjust our concept of just what a confirming instance is, or whether to preserve a 'near-Nicodian' view of confirming instances and instead condemn Carnap's (and others') systems of confirmation.

To these three peculiarities, whatever we may say about them, we must now add another. This newly uncovered feature, like the one Salmon calls our attention to, involves a curious systematic variation in probability-values from language to language. But where Salmon's concerned the change in values as a function of the number and kind of predicates, this new peculiarity involves a change brought about as a function of the number of individuals.

Consider first an exceedingly simple language consisting of two predicates, "R" (raven) and "B" (black) and one individual constant, "a". What, in this language, is the a priori or absolute logical probability that all ravens are black? We wish to calculate the functor,
c*[(x)(Rx ⊃ Bx), t].  [Note 1]
The algorithmic aspects of the system are now standard fare in textbooks in probability theory, and an application of their methods readily yields the result that
c*[(x)(Rx ⊃ Bx), t] = ¾ or 75%.
Concomitantly, for this particular language, the c*-value of the corresponding I-proposition (viz., (∃x)(Rx & Bx)) is 25%.

Consider now increasing our model by one individual constant. It is again an easy exercise to calculate the absolute logical probability of the A and I propositions. We find these results:
c*[(x)(Rx ⊃ Bx), t] = 60%.
c*[(∃x)(Rx & Bx), t] = 40%.
We already know from our discussion of the first of the three features recalled above that in the language with infinitely many individual constants, c*[(x)(Rx ⊃ Bx), t] = 0 and derivatively that c*[(∃x)(Rx & Bx), t] = 1. The c*-values of the absolute logical probabilities of the A and the I propositions asymptotically approach 0 and 1 respectively as we continue to increase the number of individual constants from language to language. What has been hitherto overlooked is where these two curves originate. The A-curve starts at a value greater than 50% and falls, while the I-curve starts at a value less than 50% and rises. There must then be a point of intersection and it occurs for some wholly arbitrary number of individual constants. [Note 2]
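The figures quoted above can be checked by brute force. What follows is a minimal sketch rather than Carnap's own algorithm: it assumes the usual reading of m*, on which every structure description receives equal weight and that weight is divided equally among the state descriptions belonging to it. The names Q, m_star, c_star, a_prop and i_prop are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

# The four Q-predicates for primitives R and B, each coded as (is_raven, is_black).
Q = [(True, True), (True, False), (False, True), (False, False)]

def m_star(state):
    """Assumed m* weight of one state description (a tuple of Q-predicates)."""
    n = len(state)
    # Number of structure descriptions: multisets of size n drawn from 4 Q-predicates.
    n_structures = comb(n + len(Q) - 1, len(Q) - 1)
    # Number of state descriptions sharing this state's structure (a multinomial).
    size = factorial(n)
    for q in Q:
        size //= factorial(state.count(q))
    return Fraction(1, n_structures * size)

def c_star(hypothesis, n):
    """Absolute c*-value of a hypothesis in a language with n individual constants."""
    return sum(m_star(s) for s in product(Q, repeat=n) if hypothesis(s))

def a_prop(state):
    # (x)(Rx ⊃ Bx): every raven in the state is black.
    return all(black for raven, black in state if raven)

def i_prop(state):
    # (∃x)(Rx & Bx): at least one individual is a black raven.
    return any(raven and black for raven, black in state)

for n in range(1, 5):
    print(n, c_star(a_prop, n), c_star(i_prop, n))
```

Under these assumptions the sketch returns 3/4 and 1/4 for one individual and 3/5 and 2/5 for two, in agreement with the values in the text; on the same assumptions the two curves meet at three individuals, where both values are 1/2.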

In short, then, we have again derived the paradox but this time in quantitative terms: for the system, c*, in a world of one individual in the absence of any contingent evidence it is three times as likely that all ravens are black (75%) as that some ravens are black (25%); and in a world of two individuals the former hypothesis is half again as likely (60%) as the latter (40%). [Note 3]

And as if these results were not sufficient cause for conceptual discomfort, still another, albeit related, paradox can be derived. The substitution of a negated term for an unnegated one in any hypothesis will leave the absolute logical probability of the resulting proposition unchanged from the original one. Thus, for example, in the language with infinitely many individual constants, c*[(x)(Rx ⊃ Bx), t] = c*[(x)(Rx ⊃ ~Bx), t] = 0, and c*[(∃x)(Rx & Bx), t] = c*[(∃x)(Rx & ~Bx), t] = 1. This preservation of c*-values under this particular substitution leads immediately to the result that in the first language specified above (viz. one individual constant), c*[(x)(Rx ⊃ Bx), t] = c*[(x)(Rx ⊃ ~Bx), t] = 75%. One would have thought that if the probability that all ravens are black is greater than 50% (which it is in this case), then the probability that no ravens are black ought to be less than 50% (which it is not in this case).
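The equality of the A- and E-values in the one-individual language can be seen in a small sketch. It assumes that, with a single individual, c* weights the four state descriptions equally (each structure description then contains exactly one state description). The helper prob and the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# One-individual language: a state description fixes the truth-values of (Ra, Ba).
states = list(product([True, False], repeat=2))
weight = Fraction(1, len(states))  # assumed equal weight per state description

def prob(hypothesis):
    # Sum the weights of the state descriptions in which the hypothesis holds.
    return sum(weight for ra, ba in states if hypothesis(ra, ba))

p_a = prob(lambda ra, ba: (not ra) or ba)          # (x)(Rx ⊃ Bx)
p_e = prob(lambda ra, ba: (not ra) or (not ba))    # (x)(Rx ⊃ ~Bx)
p_both = prob(lambda ra, ba: not ra)               # A and E together: a is no raven

print(p_a, p_e, p_both)
```

On these assumptions both hypotheses receive 3/4. The sketch also shows how that is arithmetically possible: the two hypotheses are jointly satisfied in the two states in which a is not a raven, which together carry weight 1/2.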


How, then, shall we accommodate these various significant departures from our intuitive expectations regarding all- and some-propositions? Our choices seem to be limited to two. We can regard these paradoxes as veridical ones which merely reveal the unexpected but necessary consequences of relatively immutable laws of logic and probability theory. [Note 4]  Or, eschewing this choice, we can regard these paradoxes as being falsidical ones revealing a fatal flaw somewhere within the theory itself.

My own predilection is to favor the latter view, to regard these paradoxes as falsidical ones. But before I proceed to examine ways one might try to modify the theory in order to avoid these paradoxes, I would like to examine briefly the argument which might be adduced in support of the opposite point of view, that of regarding the paradoxes as being veridical.

If the paradoxes are to be regarded as veridical, we shall have to argue that the conflict that arises between the results derived and the presupposition that all-propositions are in every case less probable than are the corresponding some-propositions is resolvable by recognizing the falsity of the presupposition. Such an argument would insist upon the correctness of the various premises we used in initially deriving the first paradox. To repeat, in a world of one individual, a, the hypothesis, "(x)(Rx ⊃ Bx)", is equivalent to "~Ra ∨ Ba"; similarly, "(∃x)(Rx & Bx)" is equivalent to "Ra & Ba". But since "Ra & Ba" entails "~Ra ∨ Ba", and not conversely, it is not merely no cause for concern but a logical necessity that in this case the all-proposition should be more probable than the some-proposition. And since we know that for populous worlds the oddity is repaired (i.e. that the probability of some-propositions does exceed that of all-propositions), it should be a matter of no wonderment that the values must 'cross-over' for some seemingly arbitrary number of individuals. The problematic presupposition which gives rise to the seeming difficulties has been conditioned by our habit of thinking primarily about populous worlds. But this 'law' which we discover to hold for large worlds does not have unrestricted applicability; it is in the end no more than a circumscribed rule of thumb. Extrapolation, whether to the very large or to the very small, is logically a species of induction with all the latter's potential for failure. And here in these paradoxes we have discovered just such an instance of failure of extrapolation. The solution is simply to recognize the falsity of the unrestricted claim that universal propositions are in every case less probable than the corresponding particular propositions.

Shall we accept this argument? I leave the final choice to the reader. But in the meantime I review four attempts (and their attendant problems) to solve the problem by the other route, that of preserving the described presupposition which I am sure, to some readers at least, would appear to be not quite so readily dismissible as the above argument would have us believe.

Repair #1: No doubt the way that comes to mind most readily to correct the excessive probability-value deduced for the all-propositions (both the A and the E), is to revert from the Boolean rendition to a modern-day Aristotelian one, that is, one in which there is added to the hypothesis an explicit conjunct stating that the class referred to by the subject term of the hypothesis is non-empty. Obviously, if the A and E propositions are expressed as,
"(x)(Rx ⊃ Bx) & (∃x)(Rx)" and "(x)(Rx ⊃ ~Bx) & (∃x)(Rx)",
then their respective probabilities can be no greater than those of the I and O propositions which they respectively entail.

For the case of a world of just one individual, each member of the pairs, A and I, and E and O, will imply the other member. This is in conformity with another expectation regarding probabilities: in a world of only one individual the probability that all ravens are black should be precisely equal to the probability that some raven is black. On this repair, then, there is no crossing of the A and I curves; they each start at the value 25% for the case of one individual and diverge thereafter, the A curve approaching the value zero and the I curve, the value one.
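The behaviour of the two curves under Repair #1 can be illustrated numerically. The sketch below assumes, as before, that c* gives every structure description equal weight; since the hypotheses in question depend only on how many individuals fall under each Q-predicate, it counts structure descriptions directly. The names structures, c_star, a_existential and i_prop are hypothetical.

```python
from fractions import Fraction
from itertools import product

def structures(n):
    # Each structure description is a count vector (R&B, R&~B, ~R&B, ~R&~B) summing to n.
    return [k for k in product(range(n + 1), repeat=4) if sum(k) == n]

def c_star(hypothesis, n):
    # Assumed: equal weight per structure description, so the value is a ratio of counts.
    all_structures = structures(n)
    favourable = sum(1 for k in all_structures if hypothesis(k))
    return Fraction(favourable, len(all_structures))

def a_existential(k):
    # (x)(Rx ⊃ Bx) & (∃x)(Rx): no non-black ravens, and at least one raven.
    black_ravens, nonblack_ravens, _, _ = k
    return nonblack_ravens == 0 and black_ravens > 0

def i_prop(k):
    # (∃x)(Rx & Bx): at least one black raven.
    return k[0] > 0

for n in range(1, 6):
    print(n, c_star(a_existential, n), c_star(i_prop, n))
```

On these assumptions both values are 1/4 for one individual; for two individuals the amended A-proposition receives 3/10 against 2/5 for the I-proposition, and the amended A-value does not exceed the I-value for any of the sizes printed.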

Objection to Repair #1: The cost involved in our theory of logical probability in reverting to quasi-Aristotelian formulations is much remarked upon in the literature which has sprung up in response to Hempel's paradox, for which the same solution has also been offered. Scheffler ([6], pp. 261-263) reviews various problems with this attempted solution. Two of his most important objections are these: first, A-propositions can be cast into a variety of logically equivalent forms with non-equivalent subject terms, so that it is indeterminate which existential assumption should be made explicit and conjoined to the Boolean expression; and second, certain hypothetical reasoning explicitly eschews the existential component.

Repair #2: Another way to preserve the presupposition in question is to resolve to restrict the application of probability theory to languages describing fairly populous worlds. We simply acknowledge the inadequacy of the system to deal with small worlds and let it go at that.

While this repair does smack of ad hocness, it is not without familial precedent in logic. One is reminded of the "virtuous circle" spoken of by Nelson Goodman ([2], pp. 62-66) in which formalized logic and intuitive logic successively interact to modify one another. Restrictions in applicability of various logical operations are already countenanced in logic, e.g., contraposition is invalid for the I-proposition; this one more can be tolerated.

Objection to Repair #2: This repair is seriously disanalogous with the case to which it is compared, i.e., contraposition for the I-proposition. An argument to be judged valid must be such that in any possible world, if the premises are true, then so too will the conclusion. Remember that the measure of the absolute logical probability of an hypothesis can be construed as a measure of the degree to which that hypothesis is entailed by a necessary proposition. While we are prepared to allow that the degree of entailment should depend on the particular language used (again recall feature #1 above) we are not prepared to allow a system to give erroneous results for some languages, e.g. L1 and L2 which give rise to descriptions of small worlds. It is a violation of the very spirit of the logical enterprise to make the correctness of the measure of entailment depend on the particular world described.

Repair #3: There is, of course, no formal paradox in the above deductions. The trouble occurs in trying to outfit Carnap's system with a semantics, in particular, in interpreting expressions of the form "(x)(Φx ⊃ Ψx)", as "All Φ's are Ψ's". The paradox is forestalled if we refrain from making this particular semantic assignment.

Objection to Repair #3: We shall in any case desire a complete semantics for the system. What translational rule shall we then associate with "(x)(Φx ⊃ Ψx)"? The next most obvious choice is the explicit hypothetical, "If anything is a Φ, then it is a Ψ". Unfortunately this manoeuvre suppresses the paradox at one point only to have it re-emerge at another. Consider the hypothetical proposition, "If anything is a raven, then it is black." Even of this proposition, it seems intuitively clear that it, too, just like the proposition, "All ravens are black," ought in every case to be less probable than the corresponding I-proposition, "Some ravens are black." The suggestion that we render "(x)(Rx ⊃ Bx)" in English as a hypothetical rather than as a categorical proposition does not, it seems to me, do anything to obviate the paradox.

Reply to Objection #3: The immediately foregoing objection to Repair #3 is too hasty. It is not the case that we would wish to say of every universally general hypothetical that it ought to be less probable than the corresponding I-proposition. Consider this hypothetical, "If anything is a witch, then it has supernatural powers." This hypothetical clearly is more probable than the corresponding I-proposition, "Some witches have supernatural powers." This hypothetical, being necessarily true, has a probability-value of one, while the I-proposition has a probability-value (we would guess to be) very near zero. Thus the objection to Repair #3 does not hold up.

Revised Objection to Repair #3: The criticism is well-taken against the Objection. It does not, however, tell us how to make the needed repair; indeed all it tells us is that our statement of the Objection to Repair #3 was careless and in need of revising. Let it be granted that should a universally general hypothetical be necessary, then its probability-value in all worlds will be one. But every respectable theory of probability embodies this thesis and it has no bearing on the current problem. The crucial point concerns non-necessary universally general hypotheticals.

Perhaps the spirit of the Reply citing the case of witches is that the solution to the problem lies in looking at hypotheticals whose subject classes are (for all we know) empty. Let us try one. Consider: "If anything is a witch, then it wears a pointed hat." Do we want to say of this proposition, as we would be required by the theory here being criticized, that it is more probable than the corresponding I-proposition in small worlds and less in large? Surely not.

Thus the Objection to Repair #3 stands. Translating "(x)(Φx ⊃ Ψx)" as a hypothetical, even one with no existential commitment, does not solve the paradox.

Repair #4: Obviously more drastic methods are going to be needed than any that appear above. Let us examine the original version of the paradox rather more closely. As it is presented, the argument would seem to commit the fallacy of equivocation. [Note 5]  It speaks at one point simply of "equivalence" between certain general propositions and their singular expansions. And then at a later point it proceeds to transfer the probability-values from each of the singular expansions back to the general statements from which they were generated. This transference of probability-values would clearly be warranted if the equivalences in question were logical (or analytic) ones, but they are not: they are material equivalences. The fallacious inference is being disguised by speaking carelessly of "equivalence" tout court, rather than of "material equivalence".

But if in general it is fallacious to assign equi-probability-values to each member of a pair of materially equivalent propositions, [Note 6]  it need not be improper to do so for selected pairs of such propositions. The fallacy can be repaired by viewing the argument as an enthymeme requiring an additional premise. In essence we require an axiom which would allow us in a language Ln to assign the same probability-value to a general statement as we do to its singular expansion in that language. This axiom would read [Note 7]
Pn(p) = P(∑n(p)),
where
∑n[(x)(Φx ⊃ Ψx)] =df (~Φa1 ∨ Ψa1) & (~Φa2 ∨ Ψa2) & ... & (~Φan ∨ Ψan)
and
∑n[(∃x)(Φx & Ψx)] =df (Φa1 & Ψa1) ∨ (Φa2 & Ψa2) ∨ ... ∨ (Φan & Ψan).
Now it is clear that this axiom characterizes both Carnap's system, c*, and indeed much of our recent thinking about logical probability. But having now explicitly brought it out in the open, we can see that we are not bound to accept this axiom; indeed there is good reason to reject it. For if we discard this axiom, both the qualitative and quantitative versions of the paradox are avoided.

Reply to Repair #4: There is an intuitive obviousness to the axiom we have just exposed, and consequently we shall require fairly persuasive arguments that it is on just this point that the system should be modified. Perhaps in newer theories of probability we can come to be persuaded that this particular axiom is dispensable. We shall have to wait to see. But in the meantime, as we test new theories to see whether they successfully avoid the many traditional paradoxes, we might with profit also inquire whether they avoid the new one here revealed. Any that do not would seem to harbor a serious shortcoming.


  1. c* is a functor defined on an ordered pair of propositions, the first of which is usually referred to as the "hypothesis" and the second, the "evidence". This functor is intended to be a measure of the degree to which the second proposition entails, or probabilifies, the first; it is a measure of the relative probability of the first proposition given the second. However, a measure of the absolute probability of a proposition can be calculated using this functor by simply letting the second member of the ordered pair be any necessarily-true proposition whatever (symbolized indiscriminately by "t"). The absolute probability of a proposition is its probability in the absence of any contingent evidence.

  2. Topologically similar results also hold for Carnap's earlier system, c†, where for the first language described above, the A- and I-values are 75% and 25% respectively, and for the second language, 56.25% and 43.75%.
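The c†-values in this note can be reproduced with a short sketch. It assumes that c† weights every state description equally, so that a value is simply the proportion of state descriptions in which the hypothesis holds. The names Q, c_dagger, a_prop and i_prop are illustrative only.

```python
from fractions import Fraction
from itertools import product

# The four Q-predicates for primitives R and B, each coded as (is_raven, is_black).
Q = [(True, True), (True, False), (False, True), (False, False)]

def c_dagger(hypothesis, n):
    # Assumed: equal weight for every state description of the n-individual language.
    states = list(product(Q, repeat=n))
    favourable = sum(1 for s in states if hypothesis(s))
    return Fraction(favourable, len(states))

def a_prop(state):
    # (x)(Rx ⊃ Bx): every raven in the state is black.
    return all(black for raven, black in state if raven)

def i_prop(state):
    # (∃x)(Rx & Bx): at least one individual is a black raven.
    return any(raven and black for raven, black in state)

for n in range(1, 4):
    print(n, float(c_dagger(a_prop, n)), float(c_dagger(i_prop, n)))
```

On this assumption the values for one and two individuals are 3/4, 1/4 and 9/16, 7/16, matching the percentages given in the note; by three individuals the A-value (27/64) has fallen below the I-value (37/64).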

  3. Note that in this system of quantitative probability, the paradox persists into a world of two individuals even though the (particular) I-proposition ceases to entail the (universal) A-proposition. We have here derived the paradox in an instance where it is underivable on qualitative considerations alone.

  4. For a detailed discussion of the distinction between veridical and falsidical paradoxes see [4].

  5. I owe this point to my colleague, R.D. Bradley.

  6. It is easy to see why this is a fallacy. Like-(truth)-valued propositions need not be equiprobable. Material equivalence has to do with truth-value in this (actual) world; equi-probability has to do with the number of possible worlds in which the propositions hold.

  7. The "P" which occurs to the right hand side of the "=" need not be subscripted. Since its argument is a quantifier-free proposition, its value is not to be a function of n.

[1] CARNAP, R. Logical Foundations of Probability, Chicago, 1950 (Second edition, 1962).
[2] GOODMAN, N. Fact, Fiction, and Forecast, Indianapolis, 1965.
[3] HEMPEL, C. Aspects of Scientific Explanation, New York, 1965.
[4] QUINE, W.V. "The Ways of Paradox", in The Ways of Paradox, New York, 1966, pp. 3-20.
[5] SALMON, W. "Vindication of Induction", in Current Issues in the Philosophy of Science, ed. by H. Feigl and G. Maxwell, New York, 1961, pp. 245-256.
[6] SCHEFFLER, I. The Anatomy of Inquiry, New York, 1963.
