While poking around on JSTOR (thanks, grad school!) I found an interesting article in the *Oxford Journal of Legal Studies* called “**Probability – The Logic of the Law**”. In it, Bernard Robertson and G. A. Vignaux argue that probability is, you guessed it, the logic behind legal analysis and arbitration.

So we not only have arguments that probability is the logic of science (Jaynes) and the logic of historical analysis (Tucker, Carrier); we now have an argument that probability is the logic of the legal world, too.

Here’s how Robertson and Vignaux derive Bayes’ Theorem in the article:

It has been argued that the axioms of probability do not apply in court cases, or that court cases ought not to be thought about in this way even if they do apply. Alternatively, it is argued that some special kind of probability applies in legal cases, with its own axioms and rules… with the result that conventional probability has become known in the jurisprudential world as *Pascalian*… In practice one commonly finds statements such as:

The concept of ‘probability’ in the legal sense is certainly different from the mathematical concept; indeed, it is rare to find a situation in which these two usages co-exist, although when they do, the mathematical probability has to be taken into assessment of probability in the legal sense and given its appropriate weight

This paper aims to show that this view is based upon a series of false assumptions.

The authors then go into some detail about common objections to the “mathematical” view of probability and why people think it doesn’t apply to the law:

1. Things either Happen or They Don’t; They Don’t Probably Happen

An example of this argument is provided by Jaffee: ‘Propositions are true or false; they are not “probable”’

2. A Court is Concerned not with Long Runs but with Single Instances

Thus, descriptively:

Trials do not typically involve matters analogous to flipping coins. They involve unique events, and thus there is no relative *frequency* [my emphasis] to measure

And normatively:

Application of substantive legal principles relies on, and due process considerations require, that triers must make individualistic judgements about how they think a particular event (or series of events) occurred

3. Frequency Approaches Hide Causes and Other Relevant Information which Should Be Investigated

For an extended example of this argument see Ligertwood, *Australian Evidence* (p14)

4. Evidence Must Be Interpreted

The implicit conception [in the probability debate] of ‘evidence’ is that which is plopped down on the factfinder at trial… the evidence must bear its own inferences… each bit of evidence manifests explicitly its characteristics. This assumption is false. Evidence takes on meaning for trials only through the process of being considered by a human being… the underlying experiences of each deliberator become part of the process, yet the probability debates proceed as though this were not so

5. People Actually Compare Hypotheses

Meaning is assigned to trial evidence through the incorporation of that evidence into one or more plausible stories which describe ‘what happened’ during events testified to at trial …The level of acceptance will be determined by the coverage, coherence and uniqueness of the ‘best’ story.

6. Assessment of Prior Odds ‘Appears to Fly in the Face of the Presumption of Innocence’

7. The Legal System is Not Supposed to be Subjective

Allen refers to

the desire to have disputes settled by reference to reality rather than the subjective state of mind of the decision maker

As you can see, many of the objections to probability here are the same ones that keep coming up in the frequentist vs. Bayesian debate over how to interpret probability. But following in the steps of E. T. Jaynes, Robertson and Vignaux demonstrate that probability can be derived from some basic assumptions about propositional logic.

The authors then go on to explain the different “types” of probability, a division that is probably (heh) what sows the confusion:

*A priori* probability refers to cases where there are a finite number of possible outcomes each of which is assumed to be equally probable. Probability refers to the chance of a particular outcome occurring under these conditions. Thus there is a 1 in 52 chance of drawing the King of Hearts from a pack of cards under these conditions and the axioms of probability can be used to answer questions like: ‘what is the probability of drawing a red court card?’ or ‘what is the probability of drawing a card which is (n)either red (n)or a court card?’

Empirical probability refers to some observation that has been carried out that in a series Y event X occurs in a certain proportion of cases. Thus surveys of weather, life expectancy, reliability of machinery, blood groups, will all produce figures which may then be referred to as the probability that X will occur under conditions Y.

Subjective probability refers to a judgement as to the chances of some event occurring based upon evidence. Unfortunately, Twining treats any judgement a person might make and might choose to express in terms of ‘probability’ as a ‘subjective probability’. This leads him to say that subjective probabilities ‘may or may not be Pascalian’.

[…]

This analysis of probability into different types invites the conclusion that ‘mathematical probability’ is just one type of probability, perhaps not appropriate to all circumstances…

The adoption of any of the definitions of probability other than as a measure of strength of belief can lead to an unfortunate effect known as the Mind Projection Fallacy. This is the fallacy of regarding probability as a property of objects and processes in the real world rather than a measure of our own uncertainty. [my emphasis]

An instance of this fallacy is something called the Gambler’s fallacy. Indeed, in that post of mine I pretty much wrote what I emphasized in the quote above.
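The card questions in the *a priori* definition quoted above are easy to check by counting. Here is a minimal sketch in Python; the helper names are my own, and I am assuming “court card” means Jack, Queen or King and that every card is equally likely to be drawn.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck: 13 ranks in each of 4 suits.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))


def is_red(card):
    return card[1] in ("hearts", "diamonds")


def is_court(card):
    # Assumption: "court card" means Jack, Queen or King.
    return card[0] in ("J", "Q", "K")


def probability(event):
    """A priori probability: favourable outcomes over equally likely outcomes."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))


p_king_of_hearts = probability(lambda c: c == ("K", "hearts"))
p_red_court = probability(lambda c: is_red(c) and is_court(c))
p_red_or_court = probability(lambda c: is_red(c) or is_court(c))
p_neither = probability(lambda c: not (is_red(c) or is_court(c)))

print(p_king_of_hearts)  # 1/52
print(p_red_court)       # 3/26
print(p_red_or_court)    # 8/13
print(p_neither)         # 5/13
```

Note that the “equally likely” part is doing all the work here: it is a statement about what we know about the shuffle, not about the cards themselves.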

The authors then point out something pretty obvious: that flipping a coin is subject to the laws of physics. If we knew every single factor that went into each coin toss (e.g., the strength of the flip, the density of the air, the angle at which it was flipped, how long it spins in the air, the firmness of the surface it lands on, etc.) we would know which side of the coin would end up facing up without any uncertainty.

However, we *don’t* know every factor that goes into a coin toss, or drawing cards from a deck, or marbles from a jar (including the social influences of the marble picker). So there is a practical wall of separation between epistemology and ontology; a wall between how we know what we know and the *actual nature* of what we’re observing.

The authors continue with three minimal requirements for rational analysis of competing explanations:

Desiderata:

1. If a conclusion can be reasoned out in more than one way, then every possible way should lead to the same results.

2. Equivalent states of knowledge and belief should be represented by equivalent plausibility statements. Closely approximate states should have closely approximate expressions; divergent states should have divergent expressions.

The only way consistently to achieve requirement 2 is by the use of real numbers to represent states of belief. It is an obvious requirement of rationality that if A is greater than B and B is greater than C then A must be greater than C. It will be found that any system which obeys this requirement will reduce to real numbers. Only real numbers can ensure some uniformity of meaning and some method of comparison.

3. All relevant information should be considered. None should be excluded for ideological reasons. If this requirement is not fulfilled then obviously different people could come to different conclusions if they exclude different facts from consideration.

Clearly the legal system does exclude evidence for ideological reasons. Rules about illegally obtained evidence and the various privileges constitute obvious examples. It is important therefore, that there should be some degree of consensus as to what information is to be excluded in order to prevent inconsistent results. It is also important that we are explicit about exclusions for ideological reasons and do not pretend to argue that better decisions will be made by excluding certain evidence. This pretence is one of the justifications for the hearsay rule, for example, and it is clear from these cases from a variety of jurisdictions that judges are increasingly impatient with this claim.

The next section I will try to sum up where possible:

Rules to Satisfy the Desiderata:

1. The statement ‘A and B are both true’ is equivalent to the statement ‘B and A are both true’.

2. It is certainly true that A is either true or false.

The statement ‘A and B are both true’ can be represented by the symbol ‘AB’. So proposition 1 becomes ‘AB = BA’.

This is just the commutativity of conjunction in propositional logic: P ∧ Q is equivalent to Q ∧ P.

How do we assess the plausibility of the statement AB given certain information I, symbolically P(AB | I)?

First consider the plausibility of A given I, P(A | I), then the plausibility of B given I and that A is true, P(B | A, I)… Thus in order to determine P(AB | I) the only plausibilities that need to be considered are P(A | I) and P(B | A, I). Since P(BA | I) = P(AB | I) (above)… [c]learly, P(AB | I) is a function of P(A | I) and P(B | A, I) and it can be shown that the two terms are simply to be multiplied. This is called the ‘product rule’.

And because of the product rule, and because of requirement 2 above, the numbers we should assign to our certainties of “absolutely true” and “absolutely false” are 1 and zero, respectively.

Next, since we know that absolute certainty is 1, the statement P(A or ~A), that is, the probability that A is either true or false, should be 1. From that it follows that P(A) + P(~A) = 1, so whatever value P(A) takes, P(~A) is equal to 1 minus P(A). This, the authors call the addition rule.
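To see the product rule and the addition rule at work on actual numbers, here is a small sketch. The joint plausibilities are made up by me for illustration, and the `prob` and `cond` helpers are my own names. Since `cond` is defined as a ratio, this is a consistency check on the rules rather than a derivation of them.

```python
from fractions import Fraction

# Hypothetical joint plausibilities for two propositions A and B.
# Keys are (A is true, B is true); the numbers are made up for illustration.
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}


def prob(event):
    """Total plausibility of the (a, b) cases where event(a, b) holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))


def cond(event, given):
    """Conditional plausibility P(event | given), defined as a ratio."""
    return prob(lambda a, b: event(a, b) and given(a, b)) / prob(given)


def A(a, b):
    return a


def B(a, b):
    return b


def not_A(a, b):
    return not a


def A_and_B(a, b):
    return a and b


# Product rule: P(AB) = P(A) P(B | A) = P(B) P(A | B)
print(prob(A_and_B), prob(A) * cond(B, A), prob(B) * cond(A, B))  # 3/10 3/10 3/10

# Addition rule: P(A) + P(~A) = 1
print(prob(A) + prob(not_A))  # 1
```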

We may wish to assess how plausible it is that at least one of A or B is true…

P(A or B) = P(A)P(B | A) + P(A)P(~B | A) + P(~A)P(B | ~A)

Now, the first two terms on the right hand side can be expressed as:

P(A)P(B | A) + P(A)P(~B | A) = P(A)P(B or ~B | A) = P(A, B or ~B) = P(A)

And the third term, P(~A)P(B | ~A) as P(B, ~A) by the product rule.

Hence P(A or B) = P(A) + P(B, ~A).
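The same result can be checked numerically. In this sketch the three input plausibilities are hypothetical numbers of my own choosing; the point is only that the three-term expansion, the simplified form, and a cross-check via the complement all agree.

```python
from fractions import Fraction

# Hypothetical plausibilities, made up for illustration.
p_A = Fraction(2, 5)              # P(A)
p_B_given_A = Fraction(3, 4)      # P(B | A)
p_B_given_not_A = Fraction(1, 3)  # P(B | ~A)

# Addition rule gives the complements.
p_not_A = 1 - p_A
p_not_B_given_A = 1 - p_B_given_A

# Three mutually exclusive ways for "A or B" to hold: AB, A~B, ~AB.
three_terms = (
    p_A * p_B_given_A
    + p_A * p_not_B_given_A
    + p_not_A * p_B_given_not_A
)

# Simplified form: P(A or B) = P(A) + P(B, ~A).
simplified = p_A + p_not_A * p_B_given_not_A

# Cross-check via the complement: "A or B" fails only when both are false.
via_complement = 1 - p_not_A * (1 - p_B_given_not_A)

print(three_terms, simplified, via_complement)  # 3/5 3/5 3/5
```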

This means that if we are interested in a proposition, C, which will be true if either (or both) A or B is true we can assess the probability of C from those of A and B. Thus, if the defendant is liable if either (or both) of two propositions were true then the probability that the defendant is liable is equal to the union of the probabilities of the two propositions. Courts appear to find this rule troublesome. The Supreme Court of Canada applied it correctly in *Thatcher v The Queen* but in New Zealand the Court of Appeal failed to apply it in *R v Chingell* and the High Court failed to apply it in *Stratford v MOT*.

3. If P(A | I)P(B | A, I) = P(B | I)P(A | B, I) (the product rule) then if we divide both sides of the equation by P(B | I) we get

P(B | I)P(A | B, I) / P(B | I) = P(A | I)P(B | A, I) / P(B | I)

The two P(B | I)’s on the left hand side cancel out and we have

P(A | B, I) = P(A | I)P(B | A, I) / P(B | I)

This is **Bayes’ Theorem**.
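Here is a rough sketch of what that formula looks like when you plug numbers in. The scenario and the numbers are hypothetical and mine, not the article’s, and `bayes` is just a name I picked. The denominator P(B | I) is expanded using the product and addition rules from above.

```python
from fractions import Fraction


def bayes(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B, I) = P(A | I) P(B | A, I) / P(B | I).

    P(B | I) is expanded with the product and addition rules:
    P(B | I) = P(A | I) P(B | A, I) + P(~A | I) P(B | ~A, I).
    """
    p_b = prior_a * p_b_given_a + (1 - prior_a) * p_b_given_not_a
    return prior_a * p_b_given_a / p_b


# Hypothetical numbers, not taken from the article:
# A = "the defendant was at the scene"
# B = "a witness identifies the defendant"
prior = Fraction(1, 10)            # P(A | I)
p_id_if_present = Fraction(9, 10)  # P(B | A, I)
p_id_if_absent = Fraction(1, 10)   # P(B | ~A, I)

posterior = bayes(prior, p_id_if_present, p_id_if_absent)
print(posterior)  # 1/2
```

With these made-up inputs, a fairly reliable identification moves a 1 in 10 prior to even odds. Nothing in the calculation refers to a long run of trials; it only uses the product and addition rules.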

From here, the authors begin going over objections to probability and its utility in the law; objections that are born of misconceptions about probability and its utility *outside* the law. Most of these objections, in fact, are due to a frequentist view of probability: thinking of probability as a fundamental aspect of the object or event we’re looking at instead of a description of our uncertainty. As a matter of fact, that view should be put to rest by the authors’ derivation of Bayes’ Theorem using only logic. At no point did they use frequencies or any appeal to the nature of an object.

I did read one response to this article in the same journal on JSTOR, but it basically amounted to “This would be really hard to do” and not “this is invalid and/or it doesn’t follow from the rules of logic”.