Philosophia Vol. 3 Nos. 2-3 Pp. 167-178 April-July 1973
This paper is a slightly revised version of a paper read (May 31, 1972) before
the Canadian Society for the Study of the History and Philosophy of Science. The
Society's meeting was held under the auspices of the Learned Societies, at its
Conference at McGill University, Montreal, Quebec.
ABSOLUTE PROBABILITY IN SMALL WORLDS:
A NEW PARADOX IN PROBABILITY THEORY
NORMAN SWARTZ
INTRODUCTION: THE PARADOX STATED IN QUALITATIVE TERMS
I wish to draw attention to a new, simply stateable, but difficult paradox in probability theory. For a finite universe of discourse, if more probable that all ravens are black than that
some ravens are black!
Where, if at all, has this argument gone astray? If the result is unacceptable, some premise or premises in the preceding argument must be given up. All, however, seem to be fairly well-established truths of formal logic, its standard semantics, and probability theory.

In this paper various solutions to the paradox are examined. I shall use Carnap's system of confirmation as a point of reference because it embodies this paradox, and moreover does so in a quantitative fashion, assigning specific numbers to the probabilities in question. The paradox, however, it should already be clear, is not unique to Carnap's system but seems to be indigenous to much fairly recent probability theory.

THE PARADOX DERIVED QUANTITATIVELY
Carnap's early attempt, circa 1950 ([1]), to construct a quantitative measure, the
system c*, of the degree of confirmation or logical probability
obtaining between any two propositions is justifiably well-admired for
its ambition and verve. In subsequent years certain well-known flaws and
alleged flaws in that theory have been pointed out. We mention three,
each of which in its own way bears on the problem at hand.
One is the fact that in (viz. an artificial language consisting of an infinite number of individual constants and a finite number of monadic predicates), the probability of any universally general proposition on finite evidence is zero. Two is the fact, emphasized by Salmon ([5]), that the system is linguistically variant, that is, the probability of a given hypothesis on unchanging evidence varies from language to language as the number of predicates in a 'family' changes. For example, for a fixed population of individuals, the probability that all apples are red on the evidence that one particular apple is red decreases monotonically as more predicates are added to the 'color-family' in successive languages. And three is the fact that this early system of Carnap's embodies perfectly the so-called "Raven Paradox" discovered by Hempel ([3], chapter 1): insofar as

These features of the system are, as remarked, now well-known. Nonetheless, I for one would not consider the first two of these to be flaws of the system. Rather I think they graphically point out the differences between a measure of logical probability and a measure of a posteriori probability. The first two at
least of these corollaries of the system seem to me to be precisely
right and if they are incompatible with our pre-analytic beliefs or
expectations in this matter of partial entailment then so much the worse
for our pre-analytic beliefs: they ought to be revised accordingly. As
regards the conceptual re-adjustments needed to accommodate Hempel's
Paradox I am less sure where to make them: whether to readjust our
concept of just what a confirming instance is, or whether to preserve a
'near-Nicodian' view of confirming instances and instead condemn
Carnap's (and others') systems of confirmation.
To these three peculiarities, whatever we may say about them, we must now add another. This newly uncovered feature, like the one Salmon calls our attention to, involves a curious systematic variation in probability-values from language to language. But where Salmon's concerned the change in values as a function of the number and kind of predicates, this new peculiarity involves a change brought about as a function of the number of individuals.

Consider first an exceedingly simple language consisting of two predicates, "R" (raven) and "B" (black), and one individual constant, "a". What, in this language, is the a priori or absolute logical probability that all ravens are black? We wish to calculate the functor, c*[(x)(Rx ⊃ Bx), t]. [Note 1] The algorithmic aspects of the system are now standard fare in textbooks in probability theory, and an application of their methods readily yields the result that c*[(x)(Rx ⊃ Bx), t] = ¾ or 75%. Concomitantly for this particular language, the c*-value of the corresponding I-proposition (viz., (∃x)(Rx & Bx)) is ¼ or 25%.

Consider now increasing our model by one individual constant. It is again an easy exercise to calculate the absolute logical probability of the A and I propositions. We find these results: c*[(x)(Rx ⊃ Bx), t] = 60% and c*[(∃x)(Rx & Bx), t] = 40%.

We already know from our discussion of the first of the three features recalled above that in , originate. The A-curve starts at a value greater than 50% and falls,
while the I-curve starts at a value less than 50% and rises. There must
then be a point of intersection and it occurs for some wholly arbitrary
number of individual constants. [Note 2]
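
The figures quoted for the one- and two-individual languages can be checked by brute force. The following Python sketch is a reconstruction under stated assumptions, not Carnap's own procedure: it assumes that c* on tautological evidence reduces to a measure (called m_star here) that gives every structure description equal weight and divides that weight equally among its state descriptions, and that c† reduces to a measure (m_dagger) that weights every state description equally. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Each individual falls under exactly one of four Q-predicates,
# encoded here as (is_raven, is_black).
Q_PREDICATES = [(True, True), (True, False), (False, True), (False, False)]


def state_descriptions(n):
    """Every way of assigning a Q-predicate to each of n named individuals."""
    return list(product(Q_PREDICATES, repeat=n))


def m_dagger(n):
    """Assumed m-dagger: equal weight for every state description."""
    states = state_descriptions(n)
    return {s: Fraction(1, len(states)) for s in states}


def m_star(n):
    """Assumed m-star: equal weight for every structure description,
    split equally among the state descriptions that share it."""
    states = state_descriptions(n)
    # Sorting a state description forgets which individual is which,
    # leaving only how many individuals fall under each Q-predicate.
    sizes = Counter(tuple(sorted(s)) for s in states)
    return {s: Fraction(1, len(sizes) * sizes[tuple(sorted(s))]) for s in states}


def all_ravens_black(state):
    """A-proposition, Boolean reading: (x)(Rx ⊃ Bx)."""
    return all(black for raven, black in state if raven)


def some_raven_black(state):
    """I-proposition: (∃x)(Rx & Bx)."""
    return any(raven and black for raven, black in state)


def absolute_probability(measure, hypothesis):
    """Total weight of the state descriptions in which the hypothesis holds."""
    return sum((w for s, w in measure.items() if hypothesis(s)), Fraction(0))


if __name__ == "__main__":
    # Print n, the A-value and the I-value under the assumed m-star.
    for n in range(1, 5):
        a = absolute_probability(m_star(n), all_ravens_black)
        i = absolute_probability(m_star(n), some_raven_black)
        print(n, a, i)
```

On these assumptions the sketch gives 3/4 and 1/4 for one individual and 3/5 and 2/5 for two under c*, and 9/16 and 7/16 for two individuals under c† (the figures of Note 2). It also places the crossing of the two c* curves at three individuals, where both values are 1/2.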
In short, then, we have again derived the paradox but this time in quantitative terms: for the system, c*, in a world of one individual in the absence of any contingent evidence it is three times as likely that all ravens are black (75%) as that some ravens are black (25%); and in a world of two individuals the former hypothesis is half again as likely (60%) as the latter (40%). [Note 3]

And as if these results were not sufficient cause for conceptual discomfort, still another, albeit related, paradox can be derived. The substitution of a negated term for an unnegated one in any hypothesis will leave the absolute logical probability of the resulting proposition unchanged from the original one. Thus, for example, in , all ravens are black is greater than 50% (which it is
in this case), then the probability that no ravens are black ought to be
less than 50% (which it is not in this case).
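
The symmetry between the A and E propositions can be made concrete with a small counting argument. The Python sketch below again assumes (it is not taken from the text) that c* on tautological evidence weights all structure descriptions equally, so that the value of a hypothesis saying "such-and-such Q-predicates have no instances" is simply the fraction of structure descriptions that leave those cells empty. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def structure_count(n, kinds):
    """Number of multisets of size n drawn from `kinds` Q-predicates."""
    return comb(n + kinds - 1, kinds - 1)


def prob_cells_empty(n, empty_cells):
    """Assumed c* value, on tautological evidence, of the claim that
    `empty_cells` specified Q-predicates (out of four) have no instances."""
    return Fraction(structure_count(n, 4 - empty_cells), structure_count(n, 4))


def c_star_A(n):
    # "All ravens are black": the cell R & ~B is empty.
    return prob_cells_empty(n, 1)


def c_star_E(n):
    # "No ravens are black": the cell R & B is empty.
    return prob_cells_empty(n, 1)


def c_star_A_and_E(n):
    # Both hold only if there are no ravens: both raven cells are empty.
    return prob_cells_empty(n, 2)
```

Under this reading the A and E values coincide for every n, being 3/4 for one individual and 3/5 for two, and the conjunction of A and E (there being no ravens at all) receives 1/2 in the one-individual language.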
ATTEMPTED SOLUTIONS OF THE PARADOX
How, then, shall we accommodate these various significant departures from our intuitive expectations regarding all- and some-propositions? Our choices seem to be limited to two. We can regard these paradoxes as veridical ones which merely reveal the unexpected but necessary consequences of relatively immutable laws of logic and probability theory. [Note 4] Or, eschewing this choice, we can regard these paradoxes as being falsidical ones revealing a fatal flaw somewhere within the theory itself. My own predilection is to favor the latter view, to regard these paradoxes as falsidical ones. But before I proceed to examine ways one might try to modify the theory in order to avoid these paradoxes, I would like to examine briefly the argument which might be adduced in support of the opposite point of view, that of regarding the paradoxes as being veridical.

If the paradoxes are to be regarded as veridical, we shall have to argue that the conflict that arises between the results derived and the presupposition that all-propositions are in every case less probable than are the corresponding some-propositions is resolvable by recognizing the falsity of the presupposition. Such an argument would insist upon the correctness of the various premises we used in initially deriving the first paradox. To repeat, in a world of one individual, a, the hypothesis,

Shall we accept this argument? I leave the final choice to the reader. But in the meantime I review four attempts (and their attendant problems) to solve the problem by the other route, that of preserving the described presupposition which I am sure, to some readers at least, would appear to be not quite so readily dismissible as the above argument would have us believe.

Repair #1: No
doubt the way that comes to mind most readily to correct the excessive
probability-value deduced for the all-propositions (both the A and the
E), is to revert from the Boolean rendition to a modern-day Aristotelian
one, that is, one in which there is added to the hypothesis an explicit
conjunct stating that the class referred to by the subject term of the
hypothesis is non-empty. Obviously, if the A and E propositions are
expressed as,
"(x)(Rx ⊃ Bx) & (∃x)(Rx)" and "(x)(Rx ⊃ ~Bx) & (∃x)(Rx)",
then their respective probabilities can be no greater than those of the I and O propositions which they respectively entail. For the case of a world of just one individual, each member of the pairs, A and I, and E and O, will imply the other member. This is in conformity with another expectation regarding probabilities: in a world of only one individual the probability that all ravens are black should be precisely equal to the probability that some raven is black. There is, then, on adopting this repair, no crossing of the A and I curves; they each start at the value 25% for the case of one individual and diverge thereafter, the A curve approaching the value zero and the I curve, the value one.

Objection to Repair #1: The cost involved in our theory of logical
probability in reverting to quasi-Aristotelian formulations is much
remarked upon in the literature which has sprung up in response to
Hempel's paradox for which the same solution has also been offered.
Scheffler ([6], pp. 261-263) reviews various problems with this
attempted solution. Two of his most important objections are these:
first, A-propositions can be cast into a variety of logically equivalent
forms with non-equivalent subject terms so that it is indeterminate as
to which existential assumption should be made explicit and conjoined to
the Boolean expression; and second he remarks that certain hypothetical
reasoning explicitly eschews the existential component.
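
The numerical behaviour claimed for Repair #1 can be checked with a short Python sketch. It rests on the same assumption as before, which is not stated in this form in the text: that c* on tautological evidence weights all structure descriptions equally, so that a purely general hypothesis (one mentioning no individual constant) receives the fraction of structure descriptions in which it holds. The cell labels and function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# The four Q-predicates, written as cell labels.
Q_PREDICATES = ["RB", "R~B", "~RB", "~R~B"]


def structure_descriptions(n):
    """Each structure description is a multiset of n cell labels."""
    return list(combinations_with_replacement(Q_PREDICATES, n))


def c_star(n, hypothesis):
    """Assumed absolute c* value of a purely general hypothesis:
    the share of structure descriptions in which it holds."""
    structures = structure_descriptions(n)
    favourable = sum(1 for s in structures if hypothesis(s))
    return Fraction(favourable, len(structures))


def aristotelian_A(s):
    # (x)(Rx ⊃ Bx) & (∃x)(Rx): no non-black raven, and at least one raven.
    return "R~B" not in s and "RB" in s


def aristotelian_E(s):
    # (x)(Rx ⊃ ~Bx) & (∃x)(Rx): no black raven, and at least one raven.
    return "RB" not in s and "R~B" in s


def particular_I(s):
    # (∃x)(Rx & Bx)
    return "RB" in s


def particular_O(s):
    # (∃x)(Rx & ~Bx)
    return "R~B" in s
```

With one individual this gives 1/4 for both the Aristotelian A and the I proposition (and likewise for E and O), and for larger n the A value never exceeds the I value. On these assumptions the A value works out to 3n/((n+2)(n+3)), which tends to zero, while the I value n/(n+3) tends to one.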
Repair #2: Another way to preserve the presupposition in question is to resolve to
restrict the application of probability theory to languages describing
fairly populous worlds. We simply acknowledge the inadequacy of the
system to deal with small worlds and let it go at that.
While this repair does smack of ad hocness, it is not without familial precedent in logic. One is reminded of the "virtuous circle" spoken of by Nelson Goodman ([2], pp. 62-66) in which formalized logic and intuitive logic successively interact to modify one another. Restrictions in applicability of various logical operations are already countenanced in logic, e.g., contraposition is invalid for the I-proposition; this one more can be tolerated.

Objection to Repair #2: This repair is seriously
disanalogous with the case to which it is compared, i.e.,
contraposition for the I-proposition. An argument to be judged valid
must be such that in any possible world, if the premises are true, then
so too will the conclusion. Remember that the measure of the absolute
logical probability of an hypothesis can be construed as a measure of
the degree to which that hypothesis is entailed by a necessary
proposition. While we are prepared to allow that the degree of
entailment should depend on the particular language used (again recall
feature #1 above) we are not prepared to allow a system to give
erroneous results for some languages, e.g. L_{1} and L_{2}, which give rise to
descriptions of small worlds. It is a violation of the very spirit of
the logical enterprise to make the correctness of the measure of
entailment depend on the particular world described.
Repair #3: There
is, of course, no formal paradox in the above deductions. The trouble
occurs in trying to outfit Carnap's system with a semantics, in
particular, in interpreting expressions of the form

Objection to Repair #3: We
shall in any case desire a complete semantics for the system. What
translational rule shall we then associate with
hypothetical proposition, "If anything is a raven, then it is black."
Even of this proposition, it seems intuitively clear that it, too, just
like the proposition, "All ravens are black," ought in every case to be
less probable than the corresponding I-proposition, "Some ravens are
black." The suggestion that we render,

Reply to Objection #3: The
immediately foregoing objection to Repair #3 is too hasty. It is not the
case that we would wish to say of every universally general hypothetical
that it ought to be less probable than the corresponding I-proposition.
Consider this hypothetical, "If anything is a witch, then it has
supernatural powers." This hypothetical clearly is more probable than
the corresponding I-proposition, "Some witches have supernatural
powers." This hypothetical, being necessarily true, has a
probability-value of one, while the I-proposition has a
probability-value (we would guess to be) very near zero. Thus the
objection to Repair #3 does not hold up.
Revised Objection to Repair #3:
The criticism is well-taken against the Objection. It does not,
however, tell us how to make the needed repair; indeed all it tells us
is that our statement of the Objection to Repair #3 was careless and in
need of revising. Let it be granted that should a universally general
hypothetical be necessary, then its probability-value in all worlds will
be one. But every respectable theory of probability embodies this thesis
and it has no bearing on the current problem. The crucial point concerns
non-necessary universally general hypotheticals.
Perhaps the spirit of the Reply citing the case of witches is that the solution to the problem lies in looking at hypotheticals whose subject classes are (for all we know) empty. Let us try one. Consider: "If anything is a witch, then it wears a pointed hat." Do we want to say of this proposition, as we would
be required by the theory here being criticized, that it is more probable
than the corresponding I-proposition in small worlds and less in large?
Surely not.
Thus the Objection to Repair #3 stands. Translating not solve the paradox.
Repair #4: Obviously more drastic methods are
going to be needed than any that appear above. Let us examine the
original version of the paradox rather more closely. As it is presented,
the argument would seem to commit the fallacy of
equivocation. [Note 5]
It speaks at one point simply of "equivalence" between certain general
propositions and their singular expansions. And then at a later point it
proceeds to transfer the probability-values from each of the singular
expansions back to the general statements from which they were
generated. This transference of probability-values would clearly be
warranted if the equivalences in question were logical (or analytic)
ones, but they are not: they are material equivalences. The fallacious
inference is being disguised by speaking carelessly of "equivalence"
tout court, rather than of "material equivalence".
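
One way to see why the equivalence between a general statement and its singular expansion is not a logical one is to test it in worlds of different sizes. The Python sketch below is only an illustration of that reading: it treats a world as a tuple of (is_raven, is_black) pairs, and the function names are hypothetical.

```python
from itertools import product

# Each individual is described as (is_raven, is_black).
Q_PREDICATES = [(True, True), (True, False), (False, True), (False, False)]


def all_ravens_black(world):
    """(x)(Rx ⊃ Bx), evaluated over every individual in the world."""
    return all((not raven) or black for raven, black in world)


def singular_expansion(world, n):
    """(Ra1 ⊃ Ba1) & ... & (Ran ⊃ Ban): speaks only of the first n individuals."""
    return all((not raven) or black for raven, black in world[:n])


def agree_everywhere(n, domain_size):
    """Do the general statement and its n-fold expansion have the same
    truth-value in every world with `domain_size` individuals?"""
    return all(
        all_ravens_black(w) == singular_expansion(w, n)
        for w in product(Q_PREDICATES, repeat=domain_size)
    )
```

Here `agree_everywhere(2, 2)` holds, while `agree_everywhere(2, 3)` fails: a third individual that is a non-black raven falsifies the general statement and leaves the two-fold expansion true. The two propositions match in truth-value only on the supposition that the world contains exactly the named individuals.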
But if in general it
is fallacious to assign equi-probability-values to each member of a
pair of materially equivalent
propositions, [Note 6]
it need not be improper to
do so for selected pairs of such propositions. The fallacy can be
repaired by viewing the argument as an enthymeme requiring an additional
premise. In essence we require an axiom which would allow us in a
language L_{n} to assign the same probability-value to a general statement
as we do to its singular expansion in that language. This axiom would
read [Note 7]
P_{n}(∑) = P(∑_{n}),
where ∑ is any general statement and ∑_{n} is its singular expansion in L_{n}.

Now it is clear that this axiom characterizes both Carnap's system, c*, and indeed much of our recent thinking about logical probability. But having now explicitly brought it out in the open, we can see that we are not bound to accept this axiom; indeed there is good reason to reject it. For if we discard this axiom, both the qualitative and quantitative versions of the paradox are avoided.

Reply to Repair #4: There is an
intuitive obviousness to the axiom we
have just exposed, and consequently we shall require fairly persuasive
arguments that it is on just this point that the system should be
modified. Perhaps in newer theories of probability we can come to be
persuaded that this particular axiom is dispensable. We shall have to
wait to see. But in the meantime, as we test new theories to see whether
they successfully avoid the many traditional paradoxes, we might with
profit also inquire whether they avoid the new one here revealed. Any
that do not, would seem to harbor a serious shortcoming.
NOTES
1. c* is a functor defined on
an ordered pair of propositions, the first of which is usually referred
to as the "hypothesis" and the second, the "evidence". This functor is
intended to be a measure of the degree to which the second proposition
entails, or probabilifies, the first; it is a measure of the relative
probability of the first proposition given the second. However, a
measure of the absolute probability of a proposition can be calculated
using this functor by simply letting the second member of the ordered
pair be any necessarily-true proposition whatever (symbolized
indiscriminately by "t"). The absolute probability of a proposition is
its probability in the absence of any contingent evidence.
2. Topologically similar results also hold for Carnap's earlier system, c†,
where for the first language described above, the A- and I-values are
75% and 25% respectively, and for the second language, 56.25% and
43.75%.
3. Note that in this system of quantitative probability, the
paradox persists into a world of two individuals even though the
(particular) I-proposition ceases to entail the (universal)
A-proposition. We have here derived the paradox in an instance where it
is underivable on qualitative considerations alone.
4. For a detailed
discussion of the distinction between veridical and falsidical paradoxes
see [4].
5. I owe this point to my colleague, R.D. Bradley.
6. It is easy
to see why this is a fallacy. Like-(truth)-valued propositions need not
be equiprobable. Material equivalence has to do with truth-value in this
(actual) world; equi-probability has to do with the number of possible
worlds in which the propositions hold.
7. The "P" which occurs on the
right-hand side of the "=" need not be subscripted. Since its argument
is a quantifier-free proposition, its value is
*not* to be a function of *n*.
BIBLIOGRAPHY