# Category Archives: Bayes

## How do you explain the mysterious beauty of this planet without referring to a supreme being?

I’m from NYC. This question would be the equivalent of me saying “how do you explain that I was born in the most awesome city on the planet without a supreme being?”

Of course, almost everyone says their home town is the best ever. Why do you think that is? I think the answer to that is the same as the answer to your question.

But let’s get a bit deeper into the assumptions behind your question. What’s your logical link from “Earth is beautiful” to “therefore a supreme being”? In other words, what makes something a good explanation?

If I were to say that something is a chair, there are qualities that chairs have in common that define them as chairs instead of beanbags: Chairs have four legs, a back support part, a part to sit on, etc. There should be some similar consistent criteria for what constitutes a good explanation, and why you think this creates the necessary link between “Beautiful Earth” and “Supreme Creator”.

If you get home late and your boyfriend/girlfriend asks why you’re late, what would be a good explanation? Why is “I got stuck in traffic” better than “I was kidnapped by aliens”? We know the former is more believable, but why?

Well, you might say something like “traffic causes people to be late more than getting kidnapped by aliens does”. And that would be correct. But I argue that this isn’t enough to separate good explanations from bad explanations, and it isn’t enough to explain why your link from “Beautiful Earth” to “Supreme Being” is a strong or weak link.

Since this isn’t a dialog, I’ll have to just explain another quality of a good explanation: Good explanations are specialized. Meaning, they explain what they intend to explain and that’s it. An explanation that can be used to explain some situation, but then can also be used to explain its polar opposite, isn’t a good explanation.

So, if instead of getting home late, you got home early, and your boyfriend/girlfriend asks why you’re early, then saying “because I got stuck in traffic” doesn’t make sense. The stuck-in-traffic explanation is specialized for only making people late. But “I got kidnapped by aliens” works just as well for making someone late as it does for making someone early. Once you invoke aliens, then anything is possible.

Let me repeat that last sentence more generally: Once you invoke [bad explanation], anything is possible.

This is a really important concept to grasp. Bad explanations, because they’re not specialized, allow for any possible outcome. And the more outcomes your explanation allows for, the less likely it is that your explanation is the correct one for any specific outcome. There’s only one explanation that can allow for any possible outcome: pure randomness.

Both qualities of good explanations I’ve enumerated here — a good explanation is more commonplace (e.g., “traffic causes people to be late more than getting kidnapped by aliens does”) and more specialized — follow directly from probability theory. So they’re not things I’ve just made up.
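To see how these two qualities fall out of probability theory, here’s a toy Bayesian comparison of the two excuses. Every number below is an illustrative assumption, not a measurement:

```python
# Illustrative numbers only: all priors and likelihoods are assumptions.
# H1 = "stuck in traffic", H2 = "kidnapped by aliens", E = "got home late".
p_h1 = 0.10          # traffic jams are commonplace
p_h2 = 1e-9          # alien kidnappings are not
p_e_given_h1 = 0.95  # traffic is specialized: it almost always makes you late
p_e_given_h2 = 0.5   # aliens "explain" late and early equally well

# Unnormalized posteriors; their ratio is what matters.
post_h1 = p_e_given_h1 * p_h1
post_h2 = p_e_given_h2 * p_h2
print(post_h1 / post_h2)  # traffic comes out hundreds of millions of times more probable
```

Notice that H2 loses twice: once for being rare (low prior) and once for being unspecialized (its likelihood is spread over every possible outcome).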

So back to the question at hand: How do you explain the mysterious beauty of this planet without referring to a supreme being? Why do you think a supreme being is a good explanation? Are supreme beings commonplace? Are supreme beings only responsible for beauty, or is anything possible for a supreme being?

I think we know the answers to those questions.

Posted by on May 14, 2018 in Bayes, Quora answers, religion

## The Monty Hall Problem Refutes Your Religion

Well, the title of this post is a bit inflammatory. So I won’t be arguing that it “refutes” your religion; I’ll be arguing that it’s weak Bayesian evidence against your religion.

So. The Monty Hall problem is an illustration of how our intuitions of probability don’t always match up with reality. In its original formulation, you’re given a choice between three doors. One door has a prize, the other two do not. If you choose one of the doors, then another door that doesn’t have a prize is shown to you. You then have the option of staying with the door you chose or switching doors.

Most people think either that it doesn’t matter whether you switch or that switching lowers your probability of winning. Neither is true: switching raises your probability of winning from 1/3 to 2/3.

Your initial probability of winning the prize is 1 out of 3. Once one of the doors is opened, the probability that you picked the correct door stays at 1 out of 3, whereas the other non-picked door now carries the remaining probability of 2 out of 3. That’s because you have to do a Bayesian update once new information — in this case, the one door revealed not to have the prize — is introduced.
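That update is easy to check by simulation. Here’s a minimal sketch (mine, not from the original post):

```python
import random

def monty_hall(switch, trials=100_000):
    """Simulate the three-door game; return the observed win rate."""
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        # The host opens a door that is neither your pick nor the prize.
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            # Switch to the only door that is neither picked nor opened.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print(monty_hall(switch=False))  # close to 1/3
print(monty_hall(switch=True))   # close to 2/3
```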

I’ve gone over this before. Yet, I want to add an additional wrinkle to the problem to make intuition fall more in line with Bayesian reasoning.

What if, instead of picking one door out of three to win the prize, it were one door out of 100? And once you’ve made your selection, 98 other doors are opened up to show that they have no prize, leaving only your choice and one other unknown door. In this case it seems more obvious that something is suspicious about the only other door that wasn’t opened up. And this intuition lines up with a Bayesian update on the same scenario:

P(H): 1 out of 100, or .01

P(~H): 99 out of 100, or .99

P(E | H): the probability that all doors besides yours and one other are opened to reveal no prize, given that you’ve picked the correct door: 100%.

P(E | ~H): the probability that all doors besides yours and one other are opened to reveal no prize, given that you’ve picked the incorrect door: 100%.

This is an easy Bayesian update to do. The two conditional probabilities, P(E | H) and P(E | ~H), are both 100%, meaning the likelihood ratio is 1 and your posterior probability is the same as your prior probability. But your selection is still 1 out of 100, and the only other remaining door has a probability of 99 out of 100 of having the prize! So in this case, Bayesian reasoning and intuition line up: there is something suspicious about the only other door that wasn’t opened.
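Spelled out in code (a direct transcription of the update just described):

```python
# The 100-door update from the post, step by step.
n = 100
p_h = 1 / n                # prior that your door is correct
p_e_given_h = 1.0          # 98 empty doors can always be opened if you're right
p_e_given_not_h = 1.0      # ...and also if you're wrong (the host avoids the prize)

# P(E) expanded over H and ~H, then Bayes' theorem:
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
posterior = p_e_given_h * p_h / p_e

print(posterior)       # 0.01: your door is unchanged
print(1 - posterior)   # 0.99: all of it sits on the other unopened door
```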

How does this relate to religion? Specifically, the religion that you grew up with?

Using Willy Wonka’s logic in the meme above, the chance that you just happened to grow up with the correct religion is pretty low. Instead of picking the correct door out of 3, or out of 100, you’ve picked a door out of thousands of religions, many of which no longer exist. They are the “opened doors” revealing no prize in the analogy.

So a Bayesian update will work the same way as it did with picking one door out of 100. Meaning, your religion is probably wrong, and you should probably switch religions. The only reason I call this weak Bayesian evidence is that there are still a few religions to choose from. But their joint probability of being correct is still higher than the chance that your family’s religion is the correct one.

Analogously, it would be like if, say, you had a choice of one door out of 10,000, and after your choice all of the doors except yours and 10 others are opened to reveal no prize. Your initial chance of having chosen the correct door is still 1 out of 10,000, but the 10 doors that remain unopened have a joint probability of 9,999 out of 10,000 of containing the prize: each of those 10 other doors individually has (approximately) a 10% chance of being the correct door, as opposed to your original selection’s probability of 1 out of 10,000.
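The same arithmetic in a few lines (assuming, as in the Monty Hall setup, that the opened doors are known to be empty):

```python
# Door counts from the example above.
n_doors = 10_000
others_left = 10  # unopened doors besides yours

p_yours = 1 / n_doors                 # 0.0001: unchanged by the reveals
p_others_joint = 1 - p_yours          # 0.9999: shared by the other unopened doors
p_each_other = p_others_joint / others_left

print(p_yours)       # 0.0001
print(p_each_other)  # roughly 0.1 per door
```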

So the Monty Hall problem is weak Bayesian evidence against your religion.

Posted by on March 5, 2018 in Bayes, religion

## Can Subjective Probability Be Expressed As A Number? What Does The CIA Say?

The Psychology of Intelligence Analysis

Amazon.com summary:

This volume pulls together and republishes, with some editing, updating, and additions, articles written during 1978-86 for internal use within the CIA Directorate of Intelligence. The information is relatively timeless and still relevant to the never-ending quest for better analysis. The articles are based on reviewing cognitive psychology literature concerning how people process information to make judgments on incomplete and ambiguous information. Richards Heuer has selected the experiments and findings that seem most relevant to intelligence analysis and most in need of communication to intelligence analysts. He then translates the technical reports into language that intelligence analysts can understand and interprets the relevance of these findings to the problems intelligence analysts face.

Money quote, chapter 12, pages 152–156:

Expression of Uncertainty

Probabilities may be expressed in two ways. Statistical probabilities are based on empirical evidence concerning relative frequencies. Most intelligence judgments deal with one-of-a-kind situations for which it is impossible to assign a statistical probability. Another approach commonly used in intelligence analysis is to make a “subjective probability” or “personal probability” judgment. Such a judgment is an expression of the analyst’s personal belief that a certain explanation or estimate is correct. It is comparable to a judgment that a horse has a three-to-one chance of winning a race.

Verbal expressions of uncertainty—such as “possible,” “probable,” “unlikely,” “may,” and “could”—are a form of subjective probability judgment, but they have long been recognized as sources of ambiguity and misunderstanding. To say that something could happen or is possible may refer to anything from a 1-percent to a 99-percent probability. To express themselves clearly, analysts must learn to routinely communicate uncertainty using the language of numerical probability or odds ratios. As explained in Chapter 2 on “Perception,” people tend to see what they expect to see, and new information is typically assimilated to existing beliefs. This is especially true when dealing with verbal expressions of uncertainty.

By themselves, these expressions have no clear meaning. They are empty shells. The reader or listener fills them with meaning through the context in which they are used and what is already in the reader’s or listener’s mind about that context. When intelligence conclusions are couched in ambiguous terms, a reader’s interpretation of the conclusions will be biased in favor of consistency with what the reader already believes. This may be one reason why many intelligence consumers say they do not learn much from intelligence reports.

It is easy to demonstrate this phenomenon in training courses for analysts. Give students a short intelligence report, have them underline all expressions of uncertainty, then have them express their understanding of the report by writing above each expression of uncertainty the numerical probability they believe was intended by the writer of the report. This is an excellent learning experience, as the differences among students in how they understand the report are typically so great as to be quite memorable.

In one experiment, an intelligence analyst was asked to substitute numerical probability estimates for the verbal qualifiers in one of his own earlier articles. The first statement was: “The cease-fire is holding but could be broken within a week.” The analyst said he meant there was about a 30-percent chance the cease-fire would be broken within a week. Another analyst who had helped this analyst prepare the article said she thought there was about an 80-percent chance that the cease-fire would be broken. Yet, when working together on the report, both analysts had believed they were in agreement about what could happen. Obviously, the analysts had not even communicated effectively with each other, let alone with the readers of their report.

Sherman Kent, the first director of CIA’s Office of National Estimates, was one of the first to recognize problems of communication caused by imprecise statements of uncertainty. Unfortunately, several decades after Kent was first jolted by how policymakers interpreted the term “serious possibility” in a national estimate, this miscommunication between analysts and policymakers, and between analysts, is still a common occurrence.

I personally recall an ongoing debate with a colleague over the bona fides of a very important source. I argued he was probably bona fide. My colleague contended that the source was probably under hostile control. After several months of periodic disagreement, I finally asked my colleague to put a number on it. He said there was at least a 51-percent chance of the source being under hostile control. I said there was at least a 51-percent chance of his being bona fide. Obviously, we agreed that there was a great deal of uncertainty. That stopped our disagreement. The problem was not a major difference of opinion, but the ambiguity of the term probable.

The table in Figure 18 shows the results of an experiment with 23 NATO military officers accustomed to reading intelligence reports. They were given a number of sentences such as: “It is highly unlikely that. . . .” All the sentences were the same except that the verbal expressions of probability changed. The officers were asked what percentage probability they would attribute to each statement if they read it in an intelligence report. Each dot in the table represents one officer’s probability assignment.

While there was broad consensus about the meaning of “better than even,” there was a wide disparity in interpretation of other probability expressions. The shaded areas in the table show the ranges proposed by Kent.

The main point is that an intelligence report may have no impact on the reader if it is couched in such ambiguous language that the reader can easily interpret it as consistent with his or her own preconceptions. This ambiguity can be especially troubling when dealing with low-probability, high-impact dangers against which policymakers may wish to make contingency plans.

Consider, for example, a report that there is little chance of a terrorist attack against the American Embassy in Cairo at this time. If the Ambassador’s preconception is that there is no more than a one-in-a-hundred chance, he may elect to not do very much. If the Ambassador’s preconception is that there may be as much as a one-in-four chance of an attack, he may decide to do quite a bit.

The term “little chance” is consistent with either of those interpretations, and there is no way to know what the report writer meant. Another potential ambiguity is the phrase “at this time.” Shortening the time frame for prediction lowers the probability, but may not decrease the need for preventive measures or contingency planning.

An event for which the timing is unpredictable may “at this time” have only a 5-percent probability of occurring during the coming month, but a 60-percent probability if the time frame is extended to one year (5 percent per month for 12 months). How can analysts express uncertainty without being unclear about how certain they are? Putting a numerical qualifier in parentheses after the phrase expressing degree of uncertainty is an appropriate means of avoiding misinterpretation. This may be an odds ratio (less than a one-in-four chance) or a percentage range (5 to 20 percent) or (less than 20 percent). Odds ratios are often preferable, as most people have a better intuitive understanding of odds than of percentages.
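One caveat on the arithmetic in that last quoted paragraph: “5 percent per month for 12 months” treats the monthly chances as simply additive. Under an independence assumption, the one-year figure would come out somewhat lower. A quick sketch of both calculations:

```python
p_month = 0.05  # monthly probability from the quote
months = 12

# The quote's arithmetic: monthly chances summed.
additive = p_month * months                  # 0.60

# Alternative: months treated as independent trials.
independent = 1 - (1 - p_month) ** months    # about 0.46

print(additive, round(independent, 2))
```

Either way, the larger point of the quote stands: the stated probability depends heavily on the time frame, so the time frame needs to be explicit too.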

I’ll probably (heh) use the ranges in the figure to do Bayesian updates in the app I’m coding.

Posted by on December 27, 2017 in Bayes

## Probability Only Exists In Your Head

As I wrote in the post I linked to above, probabilities aren’t facts about objects or phenomena that we look at or experience. If you flip a coin and it lands heads twice, the probability of it landing tails on the third flip is the same as the probability of it landing heads on that third flip.

But people who think that probability is an aspect of the coin, similar to its weight or its color, will think that the 50% probability is physically tied to the coin, so it *must* make up for the lack of tails on the next flip. As though there were a god of coin flips who has to make sure the books are balanced.

Again, this is wrong. And this next scenario I think explains why.

In a standard deck of cards, there’s a 1/52 chance of pulling any specific card, right? What if we have two people, Alice and Bob, who want to pull from the deck. Except, Alice has memorized the order of the cards in the deck and Bob hasn’t.

What is the probability of Bob drawing the Ace of Spades on the first draw? For us and for Bob, it’s 1/52. But for Alice — because she’s memorized the order of the cards — it’s virtually certain (e.g., a probability of 99.99% or of 0.000…1%) which card Bob will draw.

If 1/52 was some intrinsic aspect of the deck of cards, then how can there be two different probabilities? Obviously, because probability is a description of our uncertainty. It only exists in our minds. The reader of that thought experiment and Bob are operating under uncertainty. Alice, on the other hand, is not because she’s memorized the order of the cards.
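A minimal sketch of the deck scenario (Alice’s memorized card is an illustrative assumption):

```python
# One physical deck, two states of knowledge.
deck_size = 52

# Bob and the reader know nothing about the order:
p_for_bob = 1 / deck_size  # about 0.019

# Alice has memorized the order; suppose she knows the top card is
# the Ace of Spades (an illustrative assumption):
p_for_alice = 1.0

# Two different numbers for the same draw; the probability describes
# the observer's uncertainty, not the deck itself.
print(p_for_bob, p_for_alice)
```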

Furthermore, Bayes is all about updating on new evidence. What if there was some third actor, Chad, who mixed up the deck of cards outside of Alice’s knowledge? Now, Alice may think that the next card’s probability is either 100% or 0%, but this is not true either. Now Chad has the certainty.

If Bob draws a card that Alice doesn’t think he should draw, how can she possibly do a Bayesian update from either 0% or 100%? She would have to do the equivalent of moving faster than the speed of light in order to update; it literally takes infinitely many bits of evidence to move from 0% or 100% to some other number. Try it:

P(H | E) = P(E | H) × P(H) / P(E)

50% = ??? × 0% / 1.9%, or

50% = ??? × 100% / 1.9%
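The same dead end, in code (a sketch using 1/52 ≈ 1.9% as the chance of the drawn card if Alice is wrong; the function is mine, not from the post):

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem, with P(E) expanded over H and ~H."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# No likelihood can rescue a 0% prior, and none can dent a 100% prior:
for likelihood in (0.01, 0.5, 0.99):
    print(posterior(0.0, likelihood, 1 / 52))  # always 0.0
    print(posterior(1.0, likelihood, 1 / 52))  # always 1.0
```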

This situation can be repeated over and over again, introducing new characters manipulating the deck outside of other people’s knowledge. And this demonstrates that not only is probability subjective and in your head, but that a Bayesian probability of 0% or 100% is not a probability at all because those numbers cannot be updated.

Posted by on September 14, 2017 in Bayes

## Simpson’s Paradox And The Positive/Negative Effect of Religious Belief

While not necessarily related to Bayes Theorem, something like this has been popping up in my mind whenever I read news stories dealing with statistics so I thought I would make a post about it.

In simplest terms, aggregate data might have different statistical properties than subsets of the aggregate data. As a matter of fact, the aggregate data might show the completely opposite effect when looked at in subsets.

An intuitive example of this is weather. You can average temperature over the course of a year, or over the course of six months. It might be that temperature over the whole year has a slightly positive slope, yet temperature from June to December has a negative slope.

This seems obvious. But what if you’re dealing with something that’s not so obvious?

The example Wikipedia gives that I think is non-controversial is kidney stone treatment. Say you have two treatments, A and B, each used on both small and large kidney stones.

Treatment A is effective on 81 out of 87 (93%) small kidney stones, while Treatment B is effective on 234 out of 270 (87%). For large kidney stones, Treatment A is effective 73% (192/263) of the time and Treatment B is effective 69% (55/80) of the time.

Clearly, Treatment A is what you should use for both small and large kidney stones. But what happens when we aggregate over both small and large kidney stones? Treatment A is (81 + 192)/(87 + 263) = 273/350 (78%) while Treatment B is (234 + 55)/(270 + 80) = 289/350 (83%). Now it turns out that Treatment B is better than Treatment A!
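Those numbers are easy to verify (a sketch using the quoted counts; the variable names are mine):

```python
# Kidney-stone counts from the Wikipedia example, as (successes, patients).
a_small, a_large = (81, 87), (192, 263)
b_small, b_large = (234, 270), (55, 80)

def rate(successes, patients):
    return successes / patients

print(rate(*a_small) > rate(*b_small))  # True: A wins on small stones
print(rate(*a_large) > rate(*b_large))  # True: A wins on large stones

# Pool the groups by adding counts, not by adding fractions:
a_all = (a_small[0] + a_large[0], a_small[1] + a_large[1])  # (273, 350)
b_all = (b_small[0] + b_large[0], b_small[1] + b_large[1])  # (289, 350)
print(rate(*a_all) < rate(*b_all))      # True: B wins in aggregate
```

The reversal happens because Treatment A was disproportionately tried on the harder, large stones, dragging its pooled rate down.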

Therein lies Simpson’s Paradox. What happens when we have something controversial? Wikipedia also has the example of apparent sexism in graduate school admissions (it seems no one has tried to account for this paradox when discussing modern controversies like the gender wage gap). But this is mainly a religion blog: so what about whether religion is good or bad for people or society?

Very religious Americans […] have high overall wellbeing, leading healthier lives, and are less likely to have ever been diagnosed with depression… These positive associations between religious engagement and the good life are reversed when comparing more versus less religious places rather than individuals…

Gallup World Poll data from 152 countries [show] a striking negative correlation between these countries’ population percentages declaring that religion is “important in your daily life” and their average life satisfaction score…

Across US states, religious attendance rates predict modestly lower emotional well-being…

Epidemiological studies reveal that religious engagement predicted longer life expectancy…

Across states, religious engagement predicts shorter life expectancy…

Across states religious engagement predicts higher crime rates. But across individuals, it predicts lower crime rates…

If you want to make religion look good, cite individual data. If you want to make it look bad, cite aggregate data…

Stunning individual versus aggregate paradoxes appear in other realms as well. Low-income states and high-income individuals have [recently] voted Republican…

Liberal countries and conservative individuals express greater well-being…

Highly religious states, and less religious individuals, do more Google “sex” searching…

One might wonder if the religiosity-happiness association is mediated by income — which has some association with happiness. But though richer people are happier than poor people, religiously engaged individuals tend to have lower incomes — despite which, they express greater happiness.

This is from a conference paper. I’m not actually sure if this is an example of Simpson’s Paradox, but the larger point remains. Breaking up data along different axes might yield paradoxical results. As the author says, if you want to make religion look bad, cite aggregate data. If you want to make religion look good, cite individual data.

But which statistic should one use? The aggregate data or the individual data? They’re both true, for lack of a better word, so it’s not like one is “lying”. I would tend to lean towards using the aggregate data if forced to choose. But there’s no harm in looking at both. And if both paint the same picture that just means that you have a more complete view of the phenomenon at hand.

Posted by on June 26, 2017 in Bayes, economics/sociology, religion

## Probability: The Logic of the Law

While poking around on JSTOR (thanks, grad school!) I found an interesting article in the Oxford Journal of Legal Studies called “Probability – The Logic of the Law“. In it, Bernard Robertson and G. A. Vignaux argue that probability is, you guessed it, the logic behind legal analysis and arbitration.

So not only do we have arguments that probability is the logic of science (Jaynes) and the logic of historical analysis (Tucker, Carrier); we now have an argument that probability is the logic of the legal world, too.

Here’s how Robertson and Vignaux derive Bayes Theorem in the article:

It has been argued that the axioms of probability do not apply in court cases, or that court cases ought not to be thought about in this way even if they do apply. Alternatively, it is argued that some special kind of probability applies in legal cases, with its own axioms and rules… with the result that conventional probability has become known in the jurisprudential world as Pascalian… In practice one commonly finds statements such as:

The concept of ‘probability’ in the legal sense is certainly different from the mathematical concept; indeed, it is rare to find a situation in which these two usages co-exist, although when they do, the mathematical probability has to be taken into the assessment of probability in the legal sense and given its appropriate weight

This paper aims to show that this view is based upon a series of false assumptions.

The authors then go into some detail about common objections to the “mathematical” view of probability and why people think it doesn’t apply to the law:

1. Things either Happen or They Don’t; They Don’t Probably Happen

An example of this argument is provided by Jaffee: ‘Propositions are true or false; they are not “probable”

2. A Court is Concerned not with Long Runs but with Single Instances

Thus, descriptively:

Trials do not typically involve matters analogous to flipping coins. They involve unique events, and thus there is no relative frequency [my emphasis] to measure

And normatively:

Application of substantive legal principles relies on, and due process considerations require, that triers must make individualistic judgements about how they think a particular event (or series of events) occurred

3. Frequency Approaches Hide Causes and Other Relevant Information which Should Be Investigated

For an extended example of this argument see Ligertwood Australian Evidence (p14)

4. Evidence Must Be Interpreted

The implicit conception [in the probability debate] of ‘evidence’ is that which is plopped down on the factfinder at trial… the evidence must bear its own inferences… each bit of evidence manifests explicitly its characteristics. This assumption is false. Evidence takes on meaning for trials only through the process of being considered by a human being… the underlying experiences of each deliberator become part of the process, yet the probability debates proceed as though this were not so

5. People Actually Compare Hypotheses

Meaning is assigned to trial evidence through the incorporation of that evidence into one or more plausible stories which describe ‘what happened’ during events testified to at trial …The level of acceptance will be determined by the coverage, coherence and uniqueness of the ‘best’ story.

6. Assessment of Prior Odds ‘Appears to Fly in the Face of the Presumption of Innocence’

7. The Legal System is Not Supposed to be Subjective

Allen refers to

the desire to have disputes settled by reference to reality rather than the subjective state of mind of the decision maker

As you can see, a lot of the objections to probability here are continually raised in the frequentist vs. Bayesian interpretation of probability. But following in the steps of E. T. Jaynes, Robertson and Vignaux demonstrate that probability can be derived from some basic assumptions about propositional logic.

The authors then go on to explain the different “types” of probability, which is probably (heh) sowing confusion:

A priori probability refers to cases where there are a finite number of possible outcomes each of which is assumed to be equally probable. Probability refers to the chance of a particular outcome occurring under these conditions. Thus there is a 1 in 52 chance of drawing the King of Hearts from a pack of cards under these conditions and the axioms of probability can be used to answer questions like: ‘what is the probability of drawing a red court card?’ or ‘what is the probability of drawing a card which is (n)either red (n)or a court card?’

Empirical probability refers to some observation that has been carried out that in a series Y event X occurs in a certain proportion of cases. Thus surveys of weather, life expectancy, reliability of machinery, blood groups, will all produce figures which may then be referred to as the probability that X will occur under conditions Y.

Subjective probability refers to a judgement as to the chances of some event occurring based upon evidence. Unfortunately, Twining treats any judgement a person might make and might choose to express in terms of ‘probability’ as a ‘subjective probability’. This leads him to say that subjective probabilities ‘may or may not be Pascalian’.

[…]

This analysis of probability into different types invites the conclusion that ‘mathematical probability’ is just one type of probability, perhaps not appropriate to all circumstances… The adoption of any of the definitions of probability other than as a measure of strength of belief can lead to an unfortunate effect known as the Mind Projection Fallacy. This is the fallacy of regarding probability as a property of objects and processes in the real world rather than a measure of our own uncertainty. [my emphasis]

An instance of this fallacy is something called the Gambler’s fallacy. Indeed, in that post of mine I pretty much wrote what I emphasized in the quote above.

The authors then point out something pretty obvious: That flipping a coin is subject to the laws of physics. If we knew every single factor that went into each coin toss (e.g., strength of the flip, density of the air, the angle in which it was flipped, how long it spins in the air, the firmness of the surface it is landing on, etc.) we would know which side of the coin would be facing up without any uncertainty.

However, we don’t know every factor that goes into a coin toss, or drawing cards from a deck, or marbles from a jar (including the social influences of the marble picker). So there is a practical wall of separation between epistemology and ontology; a wall between how we know what we know and the actual nature of what we’re observing.

The authors continue with three minimal requirements for rational analysis of competing explanations:

Desiderata:

1. If a conclusion can be reasoned out in more than one way, then every possible way should lead to the same results.

2. Equivalent states of knowledge and belief should be represented by equivalent plausibility statements. Closely approximate states should have closely approximate expressions; divergent states should have divergent expressions.

The only way consistently to achieve requirement 2 is by the use of real numbers to represent states of belief. It is an obvious requirement of rationality that if A is greater than B and B is greater than C then A must be greater than C. It will be found that any system which obeys this requirement will reduce to real numbers. Only real numbers can ensure some uniformity of meaning and some method of comparison.

3. All relevant information should be considered. None should be excluded for ideological reasons. If this requirement is not fulfilled then obviously different people could come to different conclusions if they exclude different facts from consideration.

Clearly the legal system does exclude evidence for ideological reasons. Rules about illegally obtained evidence and the various privileges constitute obvious examples. It is important therefore, that there should be some degree of consensus as to what information is to be excluded in order to prevent inconsistent results. It is also important that we are explicit about exclusions for ideological reasons and do not pretend to argue that better decisions will be made by excluding certain evidence. This pretence is one of the justifications for the hearsay rule, for example, and it is clear from these cases from a variety of jurisdictions that judges are increasingly impatient with this claim.

The next section I will try to sum up where possible:

Rules to Satisfy the Desiderata:

1. The statement ‘A and B are both true’ is equivalent to the statement ‘B and A are both true’.

2. It is certainly true that A is either true or false.

The statement ‘A and B are both true’ can be represented by the symbol ‘AB’. So proposition 1 becomes ‘AB = BA’.

This is the basic rule for conjunction in propositional logic. P ^ Q is equivalent to Q ^ P.

How do we assess the plausibility of the statement AB given certain information I, symbolically P(AB | I)?

First consider the plausibility of A given I, P(A | I), then the plausibility of B given I and that A is true, P(B | A, I)… Thus in order to determine P(AB | I) the only plausibilities that need to be considered are P(A | I) and P(B | A, I). Since P(BA | I) = P(AB | I) (above)… [c]learly, P(AB | I) is a function of P(A | I) and P(B | A, I) and it can be shown that the two terms are simply to be multiplied. This is called the ‘product rule’.
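A quick numerical sketch of the product rule, using a made-up joint distribution over the four A/B cases (none of these numbers come from the article; they are purely illustrative):

```python
# Illustrative plausibilities for the four A/B cases, summing to 1.
joint = {("A", "B"): 0.30, ("A", "~B"): 0.30,
         ("~A", "B"): 0.15, ("~A", "~B"): 0.25}

p_a = joint[("A", "B")] + joint[("A", "~B")]   # P(A | I)  = 0.60
p_b = joint[("A", "B")] + joint[("~A", "B")]   # P(B | I)  = 0.45
p_b_given_a = joint[("A", "B")] / p_a          # P(B | A, I)
p_a_given_b = joint[("A", "B")] / p_b          # P(A | B, I)

# Product rule: both orderings recover the same P(AB | I),
# since P(AB | I) = P(BA | I).
assert abs(p_a * p_b_given_a - joint[("A", "B")]) < 1e-12
assert abs(p_b * p_a_given_b - joint[("A", "B")]) < 1e-12
```

Either decomposition multiplies out to the same joint plausibility, which is exactly the commutativity point (AB = BA) made above.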

And because of the product rule, and because of requirement 2 above, the numbers we should assign to our certainties of “absolutely true” and “absolutely false” are 1 and zero, respectively.

Next, since we know that absolute certainty is 1, the probability that A is either true or false, P(A or ~A), should be 1. From that it follows that P(A) + P(~A) = 1, so however much P(A) increases, P(~A) is equal to 1 minus P(A). This, the authors call the addition rule.
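As a quick numeric check of the addition rule (the value chosen is purely illustrative):

```python
p_a = 0.73
p_not_a = 1 - p_a  # addition rule: P(~A) = 1 - P(A)

# P(A) + P(~A) must equal 1, no matter what P(A) is.
assert abs(p_a + p_not_a - 1.0) < 1e-12
```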

We may wish to assess how plausible it is that at least one of A or B is true…

P(A or B) = P(A)P(B | A) + P(A)P(~B | A) + P(~A)P(B | ~A)

Now, the first two terms on the right hand side can be expressed as:

P(A)P(B | A) + P(A)P(~B | A) = P(A)P(B or ~B | A) = P(A, B or ~B) = P(A)

And the third term, P(~A)P(B | ~A), can be expressed as P(B, ~A) by the product rule.

Hence P(A or B) = P(A) + P(B, ~A).

This means that if we are interested in a proposition, C, which will be true if either (or both) of A or B is true, we can assess the probability of C from those of A and B. Thus, if the defendant is liable if either (or both) of two propositions is true, then the probability that the defendant is liable is the probability of the disjunction of the two propositions. Courts appear to find this rule troublesome. The Supreme Court of Canada applied it correctly in Thatcher v The Queen, but in New Zealand the Court of Appeal failed to apply it in R v Chingell and the High Court failed to apply it in Stratford v MOT.
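The derivation of P(A or B) above can be checked with illustrative numbers (again, the figures are made up, not taken from the article):

```python
# Made-up inputs: P(A | I), P(B | A, I), P(B | ~A, I).
p_a, p_b_given_a, p_b_given_not_a = 0.4, 0.5, 0.25

p_not_a = 1 - p_a                      # addition rule
p_ab = p_a * p_b_given_a               # P(A, B)   by the product rule
p_a_not_b = p_a * (1 - p_b_given_a)    # P(A, ~B)
p_b_not_a = p_not_a * p_b_given_not_a  # P(B, ~A)

# Three-term expansion from the text:
# P(A or B) = P(A)P(B|A) + P(A)P(~B|A) + P(~A)P(B|~A)
p_a_or_b = p_ab + p_a_not_b + p_b_not_a

# Simplified form: P(A or B) = P(A) + P(B, ~A)
assert abs(p_a_or_b - (p_a + p_b_not_a)) < 1e-12
```

Both forms give the same number, which is the point: the long expansion collapses to P(A) + P(B, ~A).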

3. Since P(A | I)P(B | A, I) = P(B | I)P(A | B, I) (the product rule), dividing both sides of the equation by P(B | I) gives

P(B | I)P(A | B, I) / P(B | I) = P(A | I)P(B | A, I) / P(B | I)

The two P(B | I)’s on the left hand side cancel out and we have

P(A | B, I) = P(A | I)P(B | A, I) / P(B | I)

This is Bayes’ Theorem.
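To see the theorem in action, here is a sketch with hypothetical figures (H is some hypothesis, E some piece of evidence; none of these numbers come from the article):

```python
p_h = 0.10              # prior, P(H | I)
p_e_given_h = 0.90      # P(E | H, I)
p_e_given_not_h = 0.05  # P(E | ~H, I)

# P(E | I) via the product and addition rules:
# P(E) = P(H)P(E|H) + P(~H)P(E|~H)
p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h

# Bayes' Theorem: P(H | E, I) = P(H | I) P(E | H, I) / P(E | I)
p_h_given_e = p_h * p_e_given_h / p_e
print(round(p_h_given_e, 3))  # 0.667
```

Even with a low prior, evidence that is much more probable under H than under ~H pushes the plausibility of H up substantially, and the whole calculation uses nothing beyond the product and addition rules derived above.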

From here, the authors go over objections to probability and its utility in the law; objections that are born of misconceptions about probability and its utility outside the law. Most of these objections, in fact, stem from a frequentist view of probability: thinking of probability as a fundamental aspect of the object or event we’re looking at instead of a description of our uncertainty. As a matter of fact, that view should be put to rest by the authors’ demonstration of deriving Bayes’ Theorem using only logic. At no point did they use frequencies or any appeal to the nature of an object.

I did read one response to this article in the same publication on JSTOR, but it basically amounted to “this would be really hard to do” rather than “this is invalid and/or it doesn’t follow from the rules of logic”.

Comments Off on Probability: The Logic of the Law

Posted by on June 16, 2015 in Bayes

## The Rules Of Logic Are Just Probability Theory Without Uncertainty

So it looks like for the summer I won’t be having any grad courses. Which means I can go back to blogging a bit and commenting on the multitude of things I find dealing with religion and/or rationality that I come across on the web. Maybe even finish reading some books I’ve bought and blogging about them too!

One thing I read on Quora sits at the intersection of religion and rationality: using Bayes’ Theorem in history. Unfortunately this won’t be a post praising the argument; rather, it’ll be one explaining the author’s failure of rationality:

To begin with, it’s illustrative to note who uses Bayes Theorem to analyse history and who does not. In the first category we have William Lane Craig, the conservative Christian apologist, who uses Bayes Theorem to “prove” that Jesus actually did rise from the dead. And we also have Richard Carrier, the anti-Christian activist, who uses Bayes Theorem to “prove” that Jesus didn’t exist at all. Right away, a curious observer would find themselves wondering how, if this Theorem is the wonderful instrument of historical objectivity both Craig and Carrier claim it to be, two people can apply it and come to two completely contradictory historical conclusions. After all, if Jesus didn’t exist, he didn’t do anything at all, let alone something as remarkable as rise from the dead. So both Carrier and Craig can’t both be right. Yet they both use Bayes Theorem to “prove” historical things. Something does not make sense here.

Yes something doesn’t make sense here, and one can tell what that is by inference from the title of this current blog post.

As I wrote above, logic is just probability without the attendant uncertainty. Which should sorta be uncontroversial since logic and math are highly interconnected, just like math and probability are interconnected. I’m also not the first to point this out; I first read this connection in Jaynes.

But let me offer a couple of demonstrations. How about the basic syllogism with a conjunction as the major premise:

```
1. P ^ Q    (true)
2. P        (true)
Therefore Q
```

If I give a probability value to the major and minor premise, we can find out what conclusion follows:

```
1. P ^ Q    (100%)
2. P        (100%)
Therefore Q (100%)
```

This follows both logically and mathematically/probabilistically: if P(P ^ Q) is 1, and P(P) is 1, then P(Q) must also be 1. So the answer is the same for both the formal logic formulation and the probabilistic formulation. Another example, using the same format:

```
1. ~(P ^ Q)
2. P        (true)
Therefore ~Q
```

So if you can’t understand the fancy symbols, this reads that if you have a conjunction P and Q that is false, and you also know that P is true, then it follows necessarily that Q is false. The same conclusion will follow if we substitute probabilities:

```
1. P ^ Q    (0%)
2. P        (100%)
Therefore Q (0%)
```

This reads: if the probability of P and Q together is 0%, and we know that P is 100%, then it must mean that Q is 0%. It’s a straightforward algebraic solve-for-x deal. The conjunction in the major premise can be converted into a disjunction using De Morgan’s law:

```
1. ~P v ~Q     (true)
2. P           (true)
3. Therefore ~Q
```

Does using probability yield the same conclusion?

```
1. ~P v ~Q     (100%)
2. P           (100%)
3. Therefore ~Q (100%)
```

Since this is a disjunction, we are no longer using multiplication to find the answer: the sum rule applies rather than the product rule. With P certain, P(~P) is 0, so the only way the disjunction ~P v ~Q can be certain is for ~Q to be certain.
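That arithmetic can be sketched with the general sum rule, P(~P v ~Q) = P(~P) + P(~Q) - P(~P ^ ~Q) (a sketch; the premises’ certainties are the only inputs):

```python
p_disj = 1.0  # premise 1: P(~P v ~Q) = 1
p_p = 1.0     # premise 2: P(P) = 1

p_not_p = 1 - p_p        # addition rule: P(~P) = 0
p_not_p_and_not_q = 0.0  # a conjunction can't exceed P(~P), which is 0

# Solve the sum rule for P(~Q):
# P(~Q) = P(~P v ~Q) - P(~P) + P(~P ^ ~Q)
p_not_q = p_disj - p_not_p + p_not_p_and_not_q
assert p_not_q == 1.0  # ~Q is certain, matching the logical conclusion
```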

The point of all this is that the underlying mechanisms are the same: conjunctions in propositional logic have the same “mechanism” for finding conclusions that math/probability does. The main difference between logic and probability is that logic is binary (yes/no) whereas probability is graded. If we know that A is greater than B, and B is greater than C, then A must be greater than C; the shortcut for those sorts of comparisons is using numbers. And more relevantly, if history is about comparing explanations, which is a matter of degrees of uncertainty, then the only clear way to do so is by using numbers: probability.

So let’s substitute “Bayes theorem” with “propositional logic” in the original quote and see if this still makes sense:

To begin with, it’s illustrative to note who uses [modus tollens] to analyse history and who does not. In the first category we have William Lane Craig, the conservative Christian apologist, who uses [modus tollens] to “prove” that Jesus actually did rise from the dead. And we also have Richard Carrier, the anti-Christian activist, who uses [modus tollens] to “prove” that Jesus didn’t exist at all. Right away, a curious observer would find themselves wondering how, if this [propositional logic] is the wonderful instrument of historical objectivity both Craig and Carrier claim it to be, two people can apply it and come to two completely contradictory historical conclusions. After all, if Jesus didn’t exist, he didn’t do anything at all, let alone something as remarkable as rise from the dead. So both Carrier and Craig can’t both be right. Yet they both use [modus tollens] to “prove” historical things. Something does not make sense here.

And there we have it. It is indeed true that both Carrier and Craig have attempted to use propositional logic to defend their cases. This must mean that historians need to do away with using formal rules of logical inference because they can lead to different, contradictory conclusions. Clearly, this now means that the whole gamut of logical fallacies is now in play to argue anything one wants in historical analysis!

This reminds me of how Creationists and other anti-science types think that the scientific enterprise is wholly corrupt because sometimes the scientific method produces two contradictory studies.

But yes. Both probability and logic (and science) follow the GIGO rule: Garbage in, garbage out. We can’t argue against a tool just because it follows GIGO.

Comments Off on The Rules Of Logic Are Just Probability Theory Without Uncertainty

Posted by on June 5, 2015 in Bayes
