
Category Archives: Bayes

Probability: The Logic of the Law


While poking around on JSTOR (thanks, grad school!) I found an interesting article in the Oxford Journal of Legal Studies called “Probability – The Logic of the Law”. In it, Bernard Robertson and G. A. Vignaux argue that probability is, you guessed it, the logic behind legal analysis and arbitration.

So we now have arguments that probability is not only the logic of science (Jaynes) and the logic of historical analysis (Tucker, Carrier), but the logic of the legal world, too.

Here’s how Robertson and Vignaux set up their derivation of Bayes Theorem in the article:

It has been argued that the axioms of probability do not apply in court cases, or that court cases ought not to be thought about in this way even if they do apply. Alternatively, it is argued that some special kind of probability applies in legal cases, with its own axioms and rules… with the result that conventional probability has become known in the jurisprudential world as ‘Pascalian’… In practice one commonly finds statements such as:

The concept of ‘probability’ in the legal sense is certainly different from the mathematical concept; indeed, it is rare to find a situation in which these two usages co-exist, although when they do, the mathematical probability has to be taken into the assessment of probability in the legal sense and given its appropriate weight.

This paper aims to show that this view is based upon a series of false assumptions.

The authors then go into some detail about common objections to the “mathematical” view of probability and why people think it doesn’t apply to the law:

1. Things either Happen or They Don’t; They Don’t Probably Happen

An example of this argument is provided by Jaffee: ‘Propositions are true or false; they are not “probable”’.

2. A Court is Concerned not with Long Runs but with Single Instances

Thus, descriptively:

Trials do not typically involve matters analogous to flipping coins. They involve unique events, and thus there is no relative frequency [my emphasis] to measure

And normatively:

Application of substantive legal principles relies on, and due process considerations require, that triers must make individualistic judgements about how they think a particular event (or series of events) occurred

3. Frequency Approaches Hide Causes and Other Relevant Information which Should Be Investigated

For an extended example of this argument see Ligertwood Australian Evidence (p14)

4. Evidence Must Be Interpreted

The implicit conception [in the probability debate] of ‘evidence’ is that which is plopped down on the factfinder at trial… the evidence must bear its own inferences… each bit of evidence manifests explicitly its characteristics. This assumption is false. Evidence takes on meaning for trials only through the process of being considered by a human being… the underlying experiences of each deliberator become part of the process, yet the probability debates proceed as though this were not so

5. People Actually Compare Hypotheses

Meaning is assigned to trial evidence through the incorporation of that evidence into one or more plausible stories which describe ‘what happened’ during events testified to at trial …The level of acceptance will be determined by the coverage, coherence and uniqueness of the ‘best’ story.

6. Assessment of Prior Odds ‘Appears to Fly in the Face of the Presumption of Innocence’

7. The Legal System is Not Supposed to be Subjective

Allen refers to

the desire to have disputes settled by reference to reality rather than the subjective state of mind of the decision maker

As you can see, many of the objections to probability here are the same ones continually raised in the frequentist vs. Bayesian debate over the interpretation of probability. But following in the steps of E. T. Jaynes, Robertson and Vignaux demonstrate that probability can be derived from some basic assumptions about propositional logic.

The authors then go on to explain the different “types” of probability, a typology which is probably (heh) what sows the confusion:

A priori probability refers to cases where there are a finite number of possible outcomes each of which is assumed to be equally probable. Probability refers to the chance of a particular outcome occurring under these conditions. Thus there is a 1 in 52 chance of drawing the King of Hearts from a pack of cards under these conditions and the axioms of probability can be used to answer questions like: ‘what is the probability of drawing a red court card?’ or ‘what is the probability of drawing a card which is (n)either red (n)or a court card?’

Empirical probability refers to some observation that has been carried out that in a series Y event X occurs in a certain proportion of cases. Thus surveys of weather, life expectancy, reliability of machinery, blood groups, will all produce figures which may then be referred to as the probability that X will occur under conditions Y.

Subjective probability refers to a judgement as to the chances of some event occurring based upon evidence. Unfortunately, Twining treats any judgement a person might make and might choose to express in terms of ‘probability’ as a ‘subjective probability’. This leads him to say that subjective probabilities ‘may or may not be Pascalian’.

[…]

This analysis of probability into different types invites the conclusion that ‘mathematical probability’ is just one type of probability, perhaps not appropriate to all circumstances… The adoption of any of the definitions of probability other than as a measure of strength of belief can lead to an unfortunate effect known as the Mind Projection Fallacy. This is the fallacy of regarding probability as a property of objects and processes in the real world rather than a measure of our own uncertainty. [my emphasis]

An instance of this fallacy is something called the Gambler’s fallacy. Indeed, in that post of mine I pretty much wrote what I emphasized in the quote above.

The authors then point out something pretty obvious: flipping a coin is subject to the laws of physics. If we knew every single factor that went into each coin toss (e.g., the strength of the flip, the density of the air, the angle at which it was flipped, how long it spins in the air, the firmness of the surface it lands on, etc.) we would know which side of the coin would be facing up without any uncertainty.

However, we don’t know every factor that goes into a coin toss, or drawing cards from a deck, or marbles from a jar (including the social influences of the marble picker). So there is a practical wall of separation between epistemology and ontology; a wall between how we know what we know and the actual nature of what we’re observing.

The authors continue with three minimal requirements for rational analysis of competing explanations:

Desiderata:

1. If a conclusion can be reasoned out in more than one way, then every possible way should lead to the same results.

2. Equivalent states of knowledge and belief should be represented by equivalent plausibility statements. Closely approximate states should have closely approximate expressions; divergent states should have divergent expressions.

The only way consistently to achieve requirement 2 is by the use of real numbers to represent states of belief. It is an obvious requirement of rationality that if A is greater than B and B is greater than C then A must be greater than C. It will be found that any system which obeys this requirement will reduce to real numbers. Only real numbers can ensure some uniformity of meaning and some method of comparison.

3. All relevant information should be considered. None should be excluded for ideological reasons. If this requirement is not fulfilled then obviously different people could come to different conclusions if they exclude different facts from consideration.

Clearly the legal system does exclude evidence for ideological reasons. Rules about illegally obtained evidence and the various privileges constitute obvious examples. It is important therefore, that there should be some degree of consensus as to what information is to be excluded in order to prevent inconsistent results. It is also important that we are explicit about exclusions for ideological reasons and do not pretend to argue that better decisions will be made by excluding certain evidence. This pretence is one of the justifications for the hearsay rule, for example, and it is clear from these cases from a variety of jurisdictions that judges are increasingly impatient with this claim.

I will try to sum up the next section where possible:

Rules to Satisfy the Desiderata:

1. The statement ‘A and B are both true’ is equivalent to the statement ‘B and A are both true’.

2. It is certainly true that A is either true or false.

The statement ‘A and B are both true’ can be represented by the symbol ‘AB’. So proposition 1 becomes ‘AB = BA’.

This is the basic rule for conjunction in propositional logic. P ^ Q is equivalent to Q ^ P.

How do we assess the plausibility of the statement AB given certain information I, symbolically P(AB | I)?

First consider the plausibility of A given I, P(A | I), then the plausibility of B given I and that A is true, P(B | A, I)… Thus in order to determine P(AB | I) the only plausibilities that need to be considered are P(A | I) and P(B | A, I). Since P(BA | I) = P(AB | I) (above)… [c]learly, P(AB | I) is a function of P(A | I) and P(B | A, I) and it can be shown that the two terms are simply to be multiplied. This is called the ‘product rule’.

And because of the product rule, and because of requirement 2 above, the numbers we should assign to our certainties of “absolutely true” and “absolutely false” are 1 and zero, respectively.

Next, since we know that absolute certainty is 1, the statement P(A or ~A) — that is, the probability that A is either true or false — should be 1. And from that it follows that P(A) + P(~A) = 1: however much P(A) increases, P(~A) is equal to 1 minus P(A). This, the authors call the addition rule.
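
To make the product and addition rules concrete, here is a minimal sketch in Python (all numbers invented for illustration) that checks both rules on a toy example:

# Toy numbers, purely for illustration
p_a = 0.6               # P(A | I)
p_b_given_a = 0.5       # P(B | A, I)
p_b_given_not_a = 0.25  # P(B | ~A, I)

# Product rule: P(AB | I) = P(A | I) * P(B | A, I)
p_ab = p_a * p_b_given_a

# Addition rule: P(~A | I) = 1 - P(A | I)
p_not_a = 1 - p_a

# Symmetry check: P(AB | I) should equal P(B | I) * P(A | B, I)
p_b = p_a * p_b_given_a + p_not_a * p_b_given_not_a
p_a_given_b = p_ab / p_b
assert abs(p_ab - p_b * p_a_given_b) < 1e-12  # AB = BA, as required

print(p_ab, p_not_a)  # 0.3 0.4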

We may wish to assess how plausible it is that at least one of A or B is true…

P(A or B) = P(A)P(B | A) + P(A)P(~B | A) + P(~A)P(B | ~A)

Now, the first two terms on the right hand side can be expressed as:

P(A)P(B | A) + P(A)P(~B | A) = P(A)P(B or ~B | A) = P(A, B or ~B) = P(A)

And the third term, P(~A)P(B | ~A), can be written as P(B, ~A) by the product rule.

Hence P(A or B) = P(A) + P(B, ~A).

This means that if we are interested in a proposition, C, which will be true if either (or both) of A or B is true, we can assess the probability of C from those of A and B. Thus, if the defendant is liable if either (or both) of two propositions were true then the probability that the defendant is liable is equal to the union of the probabilities of the two propositions. Courts appear to find this rule troublesome. The Supreme Court of Canada applied it correctly in Thatcher v The Queen but in New Zealand the Court of Appeal failed to apply it in R v Chignell and the High Court failed to apply it in Stratford v MOT.
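
As a sketch of the arithmetic the courts stumble over, suppose (with invented numbers) that either of two propositions establishes liability:

# Invented numbers: the defendant is liable if A or B (or both) is true
p_a = 0.6              # P(A)
p_b_given_not_a = 0.5  # P(B | ~A)

# P(A or B) = P(A) + P(B, ~A) = P(A) + P(~A) * P(B | ~A)
p_liable = p_a + (1 - p_a) * p_b_given_not_a
print(p_liable)  # 0.8 -- higher than P(A) alone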

3. If P(A | I)P(B | A, I) = P(B | I)P(A | B, I) (the product rule) then if we divide both sides of the equation by P(B | I) we get

P(B | I)P(A | B, I) / P(B | I) = P(A | I)P(B | A, I) / P(B | I)

The two P(B | I)’s on the left hand side cancel out and we have

P(A | B, I) = P(A | I)P(B | A, I) / P(B | I)

This is Bayes’ Theorem.

Cue Final Fantasy fanfare!
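
For the symbol-averse, the final formula translates directly into code. Here is a minimal sketch in Python, with P(B | I) expanded via the sum rule; the example numbers are invented:

def bayes(p_a, p_b_given_a, p_b_given_not_a):
    # P(A | B, I) = P(A | I) * P(B | A, I) / P(B | I),
    # where P(B | I) = P(A)P(B | A) + P(~A)P(B | ~A)
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    return p_a * p_b_given_a / p_b

# A 10% prior, and evidence three times likelier if A is true
print(bayes(0.10, 0.75, 0.25))  # ~0.25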

From here, the authors go over objections to probability and its utility in the law; objections that are borne of misconceptions about probability and its utility outside the law. Most of these objections, in fact, are due to a frequentist view of probability: thinking of probability as a fundamental aspect of the object or event we’re looking at instead of a description of our uncertainty. That view should be put to rest by the authors’ demonstration, which derives Bayes Theorem from logic alone. At no point did they use frequencies or any appeal to the nature of an object.

I did read one response to this article in the same journal on JSTOR, but it amounted to basically “this would be really hard to do” and not “this is invalid and/or it doesn’t follow from the rules of logic”.

 

Posted on June 16, 2015 in Bayes

 

The Rules Of Logic Are Just Probability Theory Without Uncertainty


So it looks like for the summer I won’t be having any grad courses. Which means I can go back to blogging a bit and commenting on the multitude of things I find dealing with religion and/or rationality that I come across on the web. Maybe even finish reading some books I’ve bought and blogging about them too!

One thing I read on Quora is an intersection of religion and rationality: Using Bayes Theorem in history. Unfortunately this won’t be a post praising the argument; rather, it’ll be one explaining the author’s fail at rationality:

To begin with, it’s illustrative to note who uses Bayes Theorem to analyse history and who does not. In the first category we have William Lane Craig, the conservative Christian apologist, who uses Bayes Theorem to “prove” that Jesus actually did rise from the dead. And we also have Richard Carrier, the anti-Christian activist, who uses Bayes Theorem to “prove” that Jesus didn’t exist at all. Right away, a curious observer would find themselves wondering how, if this Theorem is the wonderful instrument of historical objectivity both Craig and Carrier claim it to be, two people can apply it and come to two completely contradictory historical conclusions. After all, if Jesus didn’t exist, he didn’t do anything at all, let alone something as remarkable as rise from the dead. So both Carrier and Craig can’t both be right. Yet they both use Bayes Theorem to “prove” historical things. Something does not make sense here.

Yes, something doesn’t make sense here, and one can tell what it is by inference from the title of this blog post.

As I wrote above, logic is just probability without the attendant uncertainty. Which should sorta be uncontroversial since logic and math are highly interconnected, just like math and probability are interconnected. I’m also not the first to point this out; I first read this connection in Jaynes.

But let me offer a couple of demonstrations. How about the basic syllogism with a conjunction as the major premise:


1. P ^ Q (true)
2. P (true)
Therefore Q

If I give a probability value to the major and minor premise, we can find out what conclusion follows:


1. P ^ Q (100%)
2. P (100%)
Therefore Q (100%)

This follows both logically and mathematically / probabilistically. If P * Q is 1, and P is 1, then Q must also be 1. So the answer is the same for both the formal logic formulation and the probabilistic formulation. Another example, using the same format:


1. ~(P ^ Q)
2. P (true)
Therefore ~Q

If you can’t read the fancy symbols, this says that if the conjunction of P and Q is false, and you also know that P is true, then it follows necessarily that Q is false. The same conclusion follows if we substitute probabilities:


1. P ^ Q (0%)
2. P (100%)
Therefore Q (0%)

This reads: if the probability of P and Q together is 0%, and we know that P is 100%, then it must mean that Q is 0%. It’s a straightforward algebraic solve-for-x deal. The conjunction in the major premise of this case can be converted into a disjunction using De Morgan’s law:


1. ~P v ~Q (true)
2. P (true)
3. Therefore ~Q

Does using probability yield the same conclusion?


1. ~P v ~Q (100%)
2. P (100%)
3. Therefore ~Q (100%)

Since this is a disjunction, we are no longer using multiplication to find the answer.

The point with this is that the underlying mechanisms are the same: conjunctions in propositional logic have the same “mechanism” for finding conclusions that math/probability do. The main difference between logic and probability is that logic is binary (yes/no) whereas probability is comparative. If we know that A is greater than B, and B is greater than C, then A must be greater than C. The shortcut for those sorts of comparisons is using numbers. And more relevantly, if history is about comparing explanations — which is a measure of uncertainty — the only clear way to do so is by using numbers: Probability.
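
Since these syllogisms only involve certainties of 0% and 100%, we can brute-force them. Here is a minimal sketch in Python that enumerates every truth assignment consistent with the premises and reports the possible values of Q:

from itertools import product

def possible_q(premises):
    # Values of Q consistent with the premises, over all truth assignments
    return {q for p, q in product([False, True], repeat=2) if premises(p, q)}

# 1. P ^ Q (true), P (true)     =>  Q must be true
print(possible_q(lambda p, q: (p and q) and p))         # {True}

# 2. ~(P ^ Q) (true), P (true)  =>  Q must be false
print(possible_q(lambda p, q: not (p and q) and p))     # {False}

# 3. ~P v ~Q (true), P (true)   =>  Q must be false (De Morgan of 2)
print(possible_q(lambda p, q: (not p or not q) and p))  # {False}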

So let’s substitute “Bayes theorem” with “propositional logic” in the original quote and see if this still makes sense:

To begin with, it’s illustrative to note who uses [modus tollens] to analyse history and who does not. In the first category we have William Lane Craig, the conservative Christian apologist, who uses [modus tollens] to “prove” that Jesus actually did rise from the dead. And we also have Richard Carrier, the anti-Christian activist, who uses [modus tollens] to “prove” that Jesus didn’t exist at all. Right away, a curious observer would find themselves wondering how, if this [propositional logic] is the wonderful instrument of historical objectivity both Craig and Carrier claim it to be, two people can apply it and come to two completely contradictory historical conclusions. After all, if Jesus didn’t exist, he didn’t do anything at all, let alone something as remarkable as rise from the dead. So both Carrier and Craig can’t both be right. Yet they both use [modus tollens] to “prove” historical things. Something does not make sense here.

And there we have it. It is indeed true that both Carrier and Craig have attempted to use propositional logic to defend their cases. This must mean that historians need to do away with using formal rules of logical inference because they can lead to different, contradictory conclusions. Clearly, the whole gamut of logical fallacies is now in play to argue anything one wants in historical analysis!

This reminds me of how Creationists and other anti-science types think that the scientific enterprise is wholly corrupt because sometimes the scientific method produces two contradictory studies.

But yes. Both probability and logic (and science) follow the GIGO rule: Garbage in, garbage out. We can’t argue against a tool just because it follows GIGO.

 

Posted on June 5, 2015 in Bayes

 

What are the odds that Jesus rose or Moses parted the waves? Even with the best witnesses, vanishingly small

I claim no great originality for my argument. I’m borrowing from the great Scottish philosopher David Hume, particularly Section 10 of his magnificent Enquiry Concerning Human Understanding (1748). If there is any novelty in my presentation, it owes to the marriage of Hume’s ideas with a famous theorem in probability theory proposed by the Reverend Thomas Bayes in ‘An Essay towards solving a Problem in the Doctrine of Chances’ (1763). The technical details, fortunately, can be put to the side for our purposes.

Read more at Aeon.

 

Posted on December 5, 2014 in Bayes, rationality

 

“There’s No Evidence For The Existence of God”


I used to think that the title-quote of this blog post was a good rejoinder when people asked me why I didn’t believe in any sort of god. Nowadays, I sort of grimace a little when I hear atheists use that phrase. Because now I consider myself a Bayesian. And for Bayesians, “no evidence” means something a lot different than how other people use “no evidence”.

As a Bayesian, if I say there is evidence for some hypothesis, then this means that P(H | E) > P(H). If I say there is evidence against some hypothesis, then this means that P(H | E) < P(H). Most importantly, as a Bayesian, I don't just update once; I update on multiple pieces of evidence to arrive at a provisional posterior probability about some claim. And it’s provisional because there’s always new evidence to discover. In this sense, and in my opinion, agnosticism is probably the closest mainstream or Traditional Rationality analog to being a Bayesian.

But what could it mean if I say there is no evidence for some claim? And does this apply to the concept of god?

Let’s compare two conditional probabilities: The probability of having some datum given that god exists and the probability of having some datum given the nonexistence of god. P(D | G) and P(D | ~G). So, assuming god exists, what would the most basic evidence be, and would this be more or less likely given the nonexistence of god?

Some axioms of probability to remind you of: P(E | H) + P(~E | H) = 100%. That is, the probability of the evidence given that the hypothesis is true, plus the probability of not having that evidence (or of having some other evidence) given that the hypothesis is true, must exhaust all possibilities; they add up to 100%. This is how you know that you have a 1/6 chance of rolling a 4 given a fair die: P(Roll 4 | Fair Die) + P(Roll Other Number | Fair Die) = 100%.

Given that, most simplistically, stories about the existence of god are more likely than no stories about god given that god exists. Meaning that P(D | G) > P(~D | G). And the opposite for the alternative: stories about the existence of god are less likely given that no god exists than no stories about god given that god doesn’t exist. Meaning, also, that P(D | ~G) < P(~D | ~G). To say this another way: if god did exist, we would expect more stories about him than if god didn’t exist: P(D | G) > P(D | ~G). Think about it. There are more non-stories of things that don’t exist than there are stories of things that don’t exist. Sure, there are stories of unicorns, and unicorns don’t exist. But what about the trillions of things that don’t exist that we concurrently don’t have stories of? They are legion.

Basically, anecdotes about the existence of god are evidence that god exists. I go over this in the post Logical Fallacies as Weak Bayesian Evidence: Argument from Anecdote. This all might seem a bit counterintuitive, but relying on intuition to make decisions is just another way of saying that the decision conforms to your biases. Which is usually not a good thing.

So what does no evidence look like? To me, it would be a datum whose conditional probability is equal under the hypothesis and under all alternatives; one where the Bayes factor is 1. In other words, the evidence exists independently of the hypothesis.
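
Here is a minimal sketch of all three cases in Python, with invented likelihoods: a datum that favors the hypothesis, one that counts against it, and one with a Bayes factor of 1, which leaves the prior untouched:

def update(prior, p_e_given_h, p_e_given_not_h):
    # Posterior P(H | E) from the prior and the two likelihoods
    bayes_factor = p_e_given_h / p_e_given_not_h
    posterior_odds = (prior / (1 - prior)) * bayes_factor
    return posterior_odds / (1 + posterior_odds)

prior = 0.5
print(update(prior, 0.8, 0.2))  # 0.8 -- evidence for H
print(update(prior, 0.2, 0.8))  # 0.2 -- evidence against H
print(update(prior, 0.5, 0.5))  # 0.5 -- Bayes factor 1: no evidence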

This all being said, I think there is evidence for the existence of god. I actually concede a little bit of relatively strong evidence for the existence of god. But there is so much more evidence against the existence of god, because god, as defined by laypeople and sophisticated theologians alike, is unfalsifiable. For most data besides morality, invoking god is the equivalent of expecting to roll a 3 on a trillion^trillion-sided die, compared to the probability of rolling a 3 given a normal die. This is what happens when one conceives of an all-powerful god: there’s nothing an all-powerful god can’t explain.

So yes, there is evidence for the existence of god. But it is underwhelming in comparison to the orders upon orders of magnitude of the evidence — Bayesian evidence — against the existence of god.

 

Grad School

20140430-110636.jpg

So I’m starting grad school for computer science in about a month. This is on top of having a normal 9 – 5 (well, 8:30 – 6) job. Meaning that in a little while I’ll probably have less time for blogging; at least, blogging anything with more than some passing thoughts and/or cool articles I find about religion.

Since I’m continuing my compsci schooling towards an M.S. I thought I’d try brushing up on my programming besides the meager tasks that I do for work (right now I’m more of a “software engineer”, meaning I mainly concentrate on the process aspect of software development with some coding on the side if required) so I’m writing a Java app that is — you guessed it — computing Bayes Theorem! I’m going to add it as an executable to my static website where I’m going to be doing some other web dev for a page dedicated to how probability theory is the logic of science. The page isn’t up yet, but it’ll get there eventually.

It was actually really simple to write the backend code for BT, but one neat little thing I discovered while coding for it, ironing out all of the nooks and crannies of BT, was combining likelihood ratios/Bayes factors. Here it is, better described over at Overcoming Bias:

You think A is 80% likely; my initial impression is that it’s 60% likely. After you and I talk, maybe we both should think 70%. “Average your starting beliefs”, or perhaps “do a weighted average, weighted by expertise” is a common heuristic.

But sometimes, not only is the best combination not the average, it’s more extreme than either original belief.

Let’s say Jane and James are trying to determine whether a particular coin is fair. They both think there’s an 80% chance the coin is fair. They also know that if the coin is unfair, it is the sort that comes up heads 75% of the time.

Jane flips the coin five times, performs a perfect Bayesian update, and concludes there’s a 65% chance the coin is unfair. James flips the coin five times, performs a perfect Bayesian update, and concludes there’s a 39% chance the coin is unfair. The averaging heuristic would suggest that the correct answer is between 65% and 39%. But a perfect Bayesian, hearing both Jane’s and James’s estimates – knowing their priors, and deducing what evidence they must have seen – would infer that the coin was 83% likely to be unfair.

That is because a perfect Bayesian would be combining their data, not simply taking an average of their posteriors. Which makes more sense if you think about it. If one group of people concluded that the world was round and another group of people thought the world was flat, it wouldn’t make sense to take an average of the two conclusions and say that the world must be shaped like a calzone. You would want the data that they used to arrive at their conclusions and update on that. Taking an average of the two is a social solution — meant to save people’s egos — not one that’s actually attempting to get at a more accurate model of the world.

It seems like combining likelihood ratios is actually pretty straightforward. Think about how the probabilities of independent events combine (the territory of the conjunction fallacy): the probability of X% combined with the probability of Y% isn’t X% + Y%, or the average of X% and Y%, but rather X% * Y%. Combining likelihood ratios follows the same logic: they multiply.

Again, from OB:

James, to end up with a 39% posterior on the coin being heads-weighted, must have seen four heads and one tail:

P(four heads and one tail | heads-weighted) = (0.75^4 * 0.25^1) = 0.079. P(four heads and one tail | fair) = 0.031. P(heads-weighted | four heads and one tail) = (0.2 * 0.079)/(0.2 * 0.079 + 0.8 * 0.031) = 0.39, which is the posterior belief James reports.

Jane must similarly have seen five heads and zero tails.

Plugging the total nine heads and one tail into Bayes’ theorem:

P(heads-weighted | nine heads and a tail) = ( 0.2 * (0.75^9 * 0.25^1) ) / ( 0.2 * (0.75^9 * 0.25^1) + 0.8 * (0.5^9 * 0.5^1) ) = 0.83, giving us a posterior belief of 83% that the coin is heads-weighted.

So what I call the success rate — P(E | H) — is represented here as P(four heads and one tail | heads-weighted). P(E | ~H), the likelihood under the alternative hypothesis, is P(four heads and one tail | fair). P(E | H) / P(E | ~H) = 0.079 / 0.031 = 2.531 for James’ likelihood ratio. Jane’s numbers are P(E | H) / P(E | ~H) = 0.237 / 0.031 = 7.593. The combined likelihood ratio is 19.221, which is how much evidence is needed to move the prior from 20% to 83%; that combined ratio also happens to be the two individual likelihood ratios multiplied together, 2.531 * 7.593.
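
Here is a minimal sketch of that computation in Python; note that multiplying Jane’s and James’s likelihood ratios reproduces the posterior a perfect Bayesian would get from pooling their raw flips:

def posterior(prior, likelihood_ratio):
    # Convert a prior and a likelihood ratio into a posterior probability
    odds = (prior / (1 - prior)) * likelihood_ratio
    return odds / (1 + odds)

def lr(heads, tails):
    # P(data | weighted to 75% heads) / P(data | fair)
    return (0.75**heads * 0.25**tails) / 0.5**(heads + tails)

p_unfair = 0.2       # shared prior that the coin is heads-weighted
lr_james = lr(4, 1)  # ~2.531
lr_jane = lr(5, 0)   # ~7.594

print(posterior(p_unfair, lr_james))            # ~0.39
print(posterior(p_unfair, lr_jane))             # ~0.65
print(posterior(p_unfair, lr_james * lr_jane))  # ~0.83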

Something like this is very handy if you have two people with disparate priors. Two people can start with different priors, but as long as they’re updating on the same evidence, their posteriors will eventually converge. Combining likelihood ratios ensures that both parties are updating on the same evidence, since the likelihood ratio is what determines how much your prior moves.

 

Posted on April 30, 2014 in Bayes

 

“Real Life Isn’t A Probability Game”


(Magic: A probability game)

So on the Facebook I had a conversation with someone regarding what constitutes a good explanation. The person rejoined with something like “real life isn’t a probability game” so I gave up the thread at that.

But why would someone think that the laws of probability don’t apply to everyday situations, or to making sense of the world? We intuitively use probability to make many of our decisions during the day. The decision not to walk down a dark alley in a bad neighborhood at 3am is one based on probability. There is neither a zero percent chance that you’ll get robbed nor a 100% chance that you will. The decision to fly in a plane, get in a car, check the weather, type on a keyboard… these are all decisions based on weighing the odds.

There are a few posts on Less Wrong about how probability follows from logic, and also a few books. But I think those are a bit too high level for the type of person who would think that probability doesn’t apply for making sense of the world. So something more intuitive and obvious is needed.

Let’s see if I can try this “probability follows from logic” thing with as few assumptions as possible.

Assumptions: A = A. A != ~A. Meaning that if I say the word “ball”, you know I mean “ball” and not “not-ball”. This is a fundamental assumption for normal human communication to be possible. You’ve already made this assumption in order to comprehend this post!

A normal argument:

Premise 1: If A Then B
Premise 2: A
Conclusion: B

This is also pretty straightforward. The word to describe this inference, if you want to run someone over with fancy Latin words, is modus ponens.

Another type of inference:

P1: If A Then B
P2: not-B
C: not-A

Not as straightforwardly intuitive as modus ponens. This one is called modus tollens. Which means that if we have a material conditional where the consequent is false, then asserting that the antecedent is also false follows logically.

These types of inferences color a lot of our explanatory language. If my computer is on, then I can type this blog post. My computer is on, so a reasonable conclusion — based on the premise that if my computer is on then I can type this post — is that I can type this post. It seems very redundant because we should know this sort of stuff already.

Now for the challenging part.

P1: A does not equal C
P2: If A Then B
P3: If C Then B
P4: B
C: …?

In this case, if you write as the conclusion A or C, that is a logical fallacy called affirming the consequent. Due to the logic of truth tables, you can’t conclude the antecedent of a material conditional if only given the consequent as a premise.

What if you want to find out the true antecedent for B? You could just write the conclusion as A or C, but you also have to take into account that there might be other causes for B besides A and C; that would be your unknown ???. So it would follow logically that the conclusion could be A or C or ???. But that doesn’t really help, does it? The conclusion could always be written as ??? and we would be “correct”.

Or say you were doing a code review of a program that had two separate conditionals:

if (A == true) { B(); }
if (C == true) { B(); }

You run the program and B executes. Which condition in the code was satisfied to run B? Was it A or C? Or some other unknown state? Assuming the same argument as above (A is not equal to C), if it were me, I would look for other evidence that also followed from A or C being run. But we also don’t know what other processes or code could produce B, so those have to be included in the possibilities.

To make this easier, let’s say that instead of a program, we’re dealing with something in real life. The prototypical example is wet grass. If it rains, then the grass is wet. If the sprinklers turn on, then the grass is wet. When I say “rain” I don’t secretly mean “sprinklers” or vice versa; rain is not equal to sprinklers (though it could rain and the sprinklers turn on, but for simplicity’s sake let’s assume that doesn’t happen). This also follows the same rules of logic as above. Just because the grass is wet doesn’t mean that it rained nor that the sprinklers turned on; that is the same affirming the consequent fallacy because some unknown other causes might make the grass wet.

So let’s say we ran this code 100 times, or in real life checked the grass 100 days in a row. On 10 of those 100 checks (i.e. runs of the program), the grass was wet (B executed). Of those 10 times, 2 of the wettings were due to the sprinklers, 5 were due to rain, and the remaining 3 were due to neither rain nor sprinklers but to some unknown cause.

What makes things easy in this case is the assumption that every time it rains then the grass is wet and every time the sprinklers turn on the grass is wet. But what if that wasn’t the case? It follows that the grass being wet given that it rained or the sprinklers turned on were true would be some sort of fraction, depending on how tightly coupled rain/sprinklers were to wet grass. So to run with this thought for a moment, let’s say that 5 out of 6 times that it rained, the grass was wet and 2 out of 4 times that the sprinklers turned on, the grass was wet. Maybe the reason for the inconsistencies is due to a really arid climate that dries out the grass before it’s checked. Maybe the rain/sprinklers didn’t last long enough to really make the grass wet. Who knows.

Now, we have a bunch of fractions floating around. We need to keep track of them.

10 out of 100 days the grass was wet.
of those 10 times:
2 out of 10 times the grass was wet was due to sprinklers
5 out of 10 times the grass was wet was due to rain
3 out of 10 times the grass was wet was due to some other reason

What about the total times that the sprinklers turned on out of those 100 days, and how many times it rained during those days? Those are also numbers we need to be aware of. In this hypothetical, it rained 6 times total and the sprinklers turned on 4 times total. So more fractions:

6 out of 100 days it rained
4 out of 100 days the sprinklers turned on

It might be better to start thinking in terms of probability, right? So what is the probability that the grass is wet given that it rained, Pr(grass wet | rained)? It rained 6 times, and on 5 of those 6 occasions the grass was wet: 5 out of 6, or about 83%. Be careful not to confuse this with the reverse conditional, Pr(rained | grass wet): the grass was wet 10 times, and of those 10 times, 5 were due to rain, so Pr(rained | grass wet) is 5 out of 10, or 50%.

So let’s back up a bit. We are starting to get into the laws of probability. As an example, what is the probability of flipping heads twice in a row? This would be 50% * 50%, which is 25%. Meaning that if you repeated a pair of flips 100 times, about 25 of the pairs would be heads-heads. This is a straight multiplication because the flips are independent of each other; the outcome of the first flip doesn’t influence the outcome of the second. However, if the events are dependent, then a different multiplication rule applies.

Let’s say that I pick a card from a deck, a queen. What is the probability of picking a queen? 4 out of 52. If I then ask what the probability is of picking the jack of spades next, my first choice obviously affects my second choice. The probability of picking the jack of spades is no longer 1 out of 52 but 1 out of 51, since the queen has already been removed. This is dependence. It then becomes Pr(Queen) * Pr(Jack of Spades | Queen). This is 4/52 * 1/51, which is 4 out of 2652. Meaning that the sequence “picked a queen” and then “jack of spades” is really unlikely, since we are basing “jack of spades” on the previous condition of “picked a queen”. They are dependent.

It just so happens that Pr(Queen)*Pr(Jack of Spades | Queen) = Pr(Jack of Spades)*Pr(Queen | Jack of Spades). Try it out:

4/52 * 1/51 = 1/52 * 4/51

This “Pr(x) * Pr(y | x)” pattern also applies to the coin flips. Since the coin flips are independent, Pr(heads) is equal to Pr(heads | heads), so the probability of flipping heads twice in a row is Pr(heads) * Pr(heads | heads). It is still 25%.
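
A quick sketch in Python, using exact fractions, verifies both the independent and the dependent cases, including the symmetry Pr(Queen)*Pr(Jack of Spades | Queen) = Pr(Jack of Spades)*Pr(Queen | Jack of Spades):

from fractions import Fraction

# Independent: two heads in a row
p_heads = Fraction(1, 2)
print(p_heads * p_heads)  # 1/4, i.e. 25%

# Dependent: a queen, then the jack of spades
p_q = Fraction(4, 52)           # Pr(Queen)
p_js_given_q = Fraction(1, 51)  # Pr(Jack of Spades | Queen)
p_js = Fraction(1, 52)          # Pr(Jack of Spades)
p_q_given_js = Fraction(4, 51)  # Pr(Queen | Jack of Spades)

assert p_q * p_js_given_q == p_js * p_q_given_js
print(p_q * p_js_given_q)  # 1/663, i.e. 4 out of 2652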

Back to the task at hand. We know that Pr(grass wet | rained) is 5/6. We also know that the probability of it raining at all, Pr(rained), is 6 out of 100, or 6%. This means we have Pr(rained)*Pr(grass wet | rained) = Pr(grass wet)*Pr(rained | grass wet). So the next time we find the grass wet and we want to find out Pr(rained | grass wet), we have all of the info we need to calculate it: we know what Pr(rained) is, what Pr(grass wet | rained) is, and what Pr(grass wet) is. To solve for it, we divide both sides of the equation Pr(rained)*Pr(grass wet | rained) = Pr(grass wet)*Pr(rained | grass wet) by Pr(grass wet). What does that formula become?

BAYES THEOREM! Pr(rained | grass wet) = Pr(grass wet | rained)*Pr(rained) / Pr(grass wet).
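
Plugging in our made-up survey numbers as a quick check:

p_rained = 6 / 100          # Pr(rained)
p_wet_given_rained = 5 / 6  # Pr(grass wet | rained)
p_wet = 10 / 100            # Pr(grass wet)

p_rained_given_wet = p_wet_given_rained * p_rained / p_wet
print(p_rained_given_wet)  # ~0.5: rain accounts for 5 of the 10 wet days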

We can also use the same formula if we want to find out Pr(sprinklers on | grass wet), substituting what we have for Pr(rained) with Pr(sprinklers on). And once we know that Bayes Theorem applies in figuring out whether it rained or if the sprinklers turned on, we know that all other probabilistic logic — like extraordinary claims require extraordinary evidence, precision / falsifiability, absence of evidence IS evidence of absence, and independence — also apply.

We can go back to our original plain logic inferences and see if probabilistic logic gives us the same answers. Let’s try modus ponens first.

P1: Pr(B | A) = 100% (“probability B given A” is the equivalent of saying “If A Then B”)
P2: Pr(A) = 100%
C: Pr(B) = Pr(B | A)*Pr(A) + Pr(B | ~A)*Pr(~A) = 100% (the second term vanishes, since Pr(~A) = 0%)

Modus tollens:

P1: Pr(B | A) = 100%
P2: Pr(B) = 0%
C: Pr(A) = Pr(A | B)*Pr(B) / Pr(B | A) = 0%.

Let’s see how fallacious affirming the consequent is:

P1: Pr(B | A) = 100%
P2: Pr(B | C) = 100%
P3: Pr(B) = 100%
C:Pr(A | B) = Pr(B | A)*Pr(A) / Pr(B)
C:Pr(C | B) = Pr(B | C)*Pr(C) / Pr(B)

As you can see, since we don’t know what the probability of A or C is, we can’t conclude anything about A or C given that B happened. If Pr(A) were 100% then B would also be 100% and that would be modus ponens. Anything less than 100% and we can’t safely conclude A to the exclusion of C or any other explanation, just as affirming the consequent says.

What we should be aware of, however, is that modus ponens itself only works if we have 100% certainty in our premises. If Pr(B | A) were 100% and Pr(A) were 50%, then modus ponens fails, since our conclusion can only be as strong as our weakest premise; every additional uncertain premise can only drag the conclusion’s probability down. Hence Occam’s Razor.

Notice that the logical fallacy of affirming the consequent can be viewed as weak Bayesian evidence: if Pr(B | A) > Pr(B | ~A), then observing B makes A more probable than it was before, which means affirming the consequent — B — might be weak or strong Bayesian evidence for A.

So real life is certainly a probability game; much more of a probability game than a logic game. True, we can’t function in society without adhering to a system of logic: as I said, you assumed logic just to comprehend the words in this post. But logic only works if we have 100% certainty in our premises, and we live in the real world, where 100% certainty in anything isn’t reasonable. We live in a world of uncertainty, which makes the laws of probability more relevant than logic, even though we still need logic to comprehend basic communication and make sense of our everyday lives.

 

Posted on February 24, 2014 in Bayes

 

Falsifiability and Characters In A Story


Another interesting study intersecting psychology with falsifiability:

You have four cards. Flip whichever cards you must to verify the rule If a card has a vowel on one side then it has an even number on the other:

E C 4 5

People do notoriously badly at this game. It’s called the “Wason selection task”. It was mentioned in the sequences a few times. But it turns out, people are much better at this version:

There are four people drinking at a bar. A police officer busts in and needs to verify the rule If a person is drinking alcohol, they must be at least 21. Which of the following must they investigate further?

Beer-drinker, Coke-drinker, 25-year-old, 16-year-old

These problems are logically identical. However, in the abstract version most people wrongly suggest flipping the 4, while in the bar version few people make the analogous mistake of checking what the 25-year-old is drinking.

More generally, it seems that people can do very well on the Wason selection task if it’s framed in such a way that people are looking for cheaters. (Eliminating the police officer from the above story is sufficient to reduce performance.)
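
For what it’s worth, the abstract version can itself be brute-forced. A minimal sketch in Python: a card must be flipped exactly when some hidden face could falsify the rule, i.e. when the visible face is a vowel or an odd number:

vowels = set("AEIOU")

def must_flip(face):
    # A visible vowel could hide an odd number, and a visible odd
    # number could hide a vowel; consonants and even numbers are safe.
    is_vowel = face in vowels
    is_odd = face.isdigit() and int(face) % 2 == 1
    return is_vowel or is_odd

print([card for card in ["E", "C", "4", "5"] if must_flip(card)])  # ['E', '5']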

So it seems we have an intuitive understanding of falsifiability if we move from abstract concepts to characters in a story, actively looking for cheaters.

I’m trying to think of other examples where I can use characters/cheaters to explain how falsifiability (otherwise called precision) works, but I suspect people would choose based on representativeness.

So instead of using marbles or dice for an example of falsifiability, I think I can use police and a lineup of usual suspects as an example.

Let’s say that someone was murdered. The police line up the usual suspects of mob hitmen. In this case, we have four suspects.

Nate has a tendency to keep it simple: he always uses a 9mm gun to shoot his victims. Jerry likes to either strangle or poison his victims, preferring to keep things from getting messy. Bob is known to use any means he can to kill his victims: guns, knives, poisoning, arson, strangling, bombs, throwing people out of airplanes, etc. Dan either strangles or shoots victims and doesn’t bother with any other methods.

The murder victim was found strangled. So, based on this information, and keeping things simple for this thought experiment, which person likely did it? We can rule out Nate, since he never strangles victims. Bob, Dan, and Jerry all strangle, but since Bob is all over the place with his methods, he is the least likely of the three to have strangled someone in this case. Dan and Jerry are equally likely to have done it.

Obviously, you would have to also take into account prior probabilities, like how often each person actually kills someone for the mob if this were a real situation. But with all else equal, Nate is the least likely to have done this hit, followed by Bob, and then Dan and Jerry tie for most likely.
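
Here is a minimal sketch of that likelihood reasoning in Python. The method counts are my own assumptions read off the descriptions above (I count seven methods for Bob and ignore his “etc.”), with equal priors across the four suspects:

# Assumed repertoires, read off the descriptions above
methods = {
    "Nate": ["shoot"],
    "Jerry": ["strangle", "poison"],
    "Bob": ["shoot", "stab", "poison", "arson", "strangle", "bomb", "airplane"],
    "Dan": ["strangle", "shoot"],
}

# Likelihood of a strangling, assuming each hitman picks a method uniformly
likelihood = {name: (1 / len(m) if "strangle" in m else 0.0)
              for name, m in methods.items()}

# Equal priors, so posteriors are just normalized likelihoods
total = sum(likelihood.values())
posterior = {name: p / total for name, p in likelihood.items()}
print(posterior)  # Nate 0.0, Bob ~0.125, Jerry and Dan ~0.4375 each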

Of course, it remains to be seen whether people actually do better at eliminating Nate, placing Bob second to last, and focusing on Dan and Jerry. I’ll have to ask some friends or something. But one thing that might hinder it is representativeness; they might see Bob’s penchant for variety in his choice of murder methods and implicitly use that large number as a substitute for prior probability. In other words, they might see the number of methods and substitute it for the number of murders. In that view, Nate has only killed one person, Dan and Jerry have killed two, and Bob has killed a multitude… even though the number of people each suspect has murdered is never given in this thought experiment.

 

Posted on January 27, 2014 in Bayes, cognitive science, rationality

 
 