
Category Archives: Bayes

What Are The Strongest Arguments For Atheism?

The strongest arguments for atheism aren’t really related to atheism, but about how the human mind works and what makes something a good explanation.

We know why we have particular emotions. Anger protects us from harm, and thus from dying. Feelings of friendship and bonding with others give us access to resources and mates; it's very hard to live alone without others having taught you how to do so or having built the infrastructure that allows it. Many other animals have these same emotions, and for the same reasons: to avoid dying and/or to perpetuate their genes. The ones that don't have these emotions usually die pretty quickly.

Why would a god have these emotions? There’s no underlying reason for a god to love or to get angry. A god that loves makes about as much sense as a god with a penis.

And then there are the reasons why we believe things in the first place. How are our beliefs formed? For many of the things we believe, we do so because of our feeling of certainty:

A newspaper is better than a magazine. A seashore is a better place than the street. At first it is better to run than to walk. You may have to try several times. It takes some skill, but it is easy to learn. Even young children can enjoy it. Once successful, complications are minimal. Birds seldom get too close. Rain, however, soaks in very fast. Too many people doing the same thing can also cause problems. One needs lots of room. If there are no complications, it can be very peaceful. A rock will serve as an anchor. If things break loose from it, however, you will not get a second chance.

Is this paragraph comprehensible or meaningless? Feel your mind sort through potential explanations. Now watch what happens with the presentation of a single word: kite. As you reread the paragraph, feel the prior discomfort of something amiss shifting to a pleasing sense of rightness. Everything fits; every sentence works and has meaning. Reread the paragraph again; it is impossible to regain the sense of not understanding. In an instant, without due conscious deliberation, the paragraph has been irreversibly infused with a feeling of knowing.

Try to imagine other interpretations for the paragraph. Suppose I tell you that this is a collaborative poem written by a third-grade class, or a collage of strung-together fortune cookie quotes. Your mind balks. The presence of this feeling of knowing makes contemplating alternatives physically difficult.

Did you get the same inability to explain the paragraph using some other concept? Take note of that: You really don’t have any control over how certain you feel about things. Just like other emotions, the feeling of certainty is generated unconsciously. The next obvious question would be “What sort of brain algorithm generates your feeling of certainty?” More on that below.

Experience teaches us what stimuli make us angry, or jealous, or happy, sad, etc. Sometimes, the feeling is unwarranted and using our self-reflection we can determine that feeling angry about a particular situation isn’t justified. What’s dangerous is this: Our feeling of certainty feels good. At least, it’s much more pleasant than the feeling of uncertainty. And in that sense, we generally never stop to reflect on why our feeling of certainty might not be correct. Unlike with, say, jealousy.

The rabbit hole of why we believe what we do goes a lot further than this. Books like Thinking, Fast and Slow about our cognitive biases go into a lot of this. The major premise of that book is that we have two types of thought engines. A “fast” engine (System 1) and a “slow” engine (System 2). These two engines are good at different tasks: the fast one is good at recognizing faces or voices, the slow one is good at math. The fast one is good at social interaction, the slow one is good for abstract/impersonal concepts.

Generally, the fast engine is the one that's in charge, and it's responsible for telling the slow engine to start up (the fast one is also the one responsible for the feeling of certainty). The problem is, the fast engine has to be trained on when to handle a task itself and when to hand a problem over to the slow engine, and it's not very good at making that call intuitively. For many of us, a problem will already have been answered by the fast engine, and the slow engine only gets called in when that answer is challenged: to defend the fast one's conclusion. And a lot of the time, the fast one's conclusion will be in service of some social goal: status, friendship, not ending up dead, and so on.

Our brains are actually more complicated, or modular, than the System 1 and System 2 way of explaining it. There actually seem to be multiple modules in our brains, and the ones that use information don’t explain their “reasoning” to the ones that talk to the outside world. Our brains are more like Congress, with some congresspeople acting on behalf of the overall “fast” engine or “slow” engine. The you that you feel is “you”, speaking to the outside world, is more like the press secretary for Congress.

There are a few experiments showing that when communication is physically severed between the two halves of the brain, each side of the brain gets different information. Yet the part of the brain that does the speaking might not be the part that has the information. So you end up with rationalizations, like a split-brain patient grabbing a shovel with their left hand (because their left visual field, which feeds the right hemisphere, was shown snow) while their right visual field sees a chicken. When asked to explain why they grabbed the shovel, they (well, the side of the brain that only sees the chicken) make up an explanation, like the shovel is used to scoop up chicken poop! That press secretary, pretty quick on his feet.

But this doesn’t just happen with split brain patients. It seems to happen a lot more than we think, in our normal, everyday brains.

So for example, there was one experiment where people were asked to pick their favorite pair of jeans out of four (unbeknownst to them) identical pairs. A good portion of people picked the jeans on the right, since they looked at the jeans from left to right. But they were unaware that that was their decision algorithm, and they rationalized their decision by saying they liked the fabric or the length or some other non-discriminating fact about the jeans. Liking the fabric of one pair more than the others was demonstrably false, since the jeans were identical, yet that was the reason they gave. There's still no reliable communication across the aisle in your fully functioning brain, so the press secretary still has to come up with a good, socially acceptable story about Congress' decision for the general public's consumption. The part of our brain that 'reasons' and explains our actions neither makes decisions nor is even privy to the real cause of our actions.

The tl;dr version is this: Our brains are good at social goals. And unless we've been trained otherwise, they're not so good at forming true beliefs about the non-social world. If we had some machine that was designed to analyze electromagnetic radiation as seen in space and we pointed that machine at its own circuitry, it would interpret everything about itself as cosmic rays. Similarly, we have a machine (our brain) that interprets everything through the lens of social interaction, and when we point it at the universe, it interprets everything in the universe as some manifestation of social rules.

And this is what happens. Our default is to treat a lot of non-social things as social. It’s why things like animism and magical thinking are prevalent. It’s why we call planets “planets” (Greek for wanderer), the Milky Way a galaxy (gala is Greek for milk. In our case, Hera’s milk). If someone “thinks really hard” about a problem, they’re more than likely using the tools meant for social problems, not the tools meant for solving non-social questions.

So if we don’t have control over our feeling of certainty, what’s a System 2 way of making sure that we have correct beliefs about non-social things? How can we be sure that we aren’t just defending a belief that we initially arrived at unconsciously? Or, more generally, what are some unbiased traits that good explanations share? What makes something a bad explanation? Since we’re operating under uncertainty (since we can’t trust our feeling of certainty), we have to use methods for explaining our uncertainty logically and consistently.

Let’s look at the Linda problem:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

  1. Linda is a bank teller.
  2. Linda is a bank teller and is active in the feminist movement.

What does this have to do with good explanations? Most people will say that it's more likely that Linda is a bank teller and is active in the feminist movement. While that might seem true socially (i.e., being a feminist and a bank teller seems to tell a better story about Linda), it's mathematically impossible: the population of people who are bank tellers is larger than the population of people who are bank tellers and also in the feminist movement. That's why this is called the conjunction fallacy. The probability of the conjunction of A and B can never be larger than the probability of A alone (or of B alone).

All else being equal, a good explanation has fewer unnecessary conjunctions than bad ones. Taken further, an explanation with two known facts as conjunctions is more likely than an explanation with one known fact and one unknown fact. The more unknown facts you use to support your explanation, the less likely it is. This is generally called Occam’s Razor.

So for example, a noise at night. A tree hitting your window at night has fewer assumptions that need to be true than an alien invasion in your house. Trees hitting windows only require things that we already know to be true. Alien invasions require a lot more things to be happening in the world that we don’t know to be true (e.g., the possibility of intelligent alien life, of interstellar/intergalactic travel) than just trees and wind. That goes into the next thing that good explanations have.

Good explanations are more commonplace (more mundane) than bad ones. If you’re walking down the street and hear hooves clicking on the street, it’s probably a deer or a horse. Not a zebra or a cow. Or a hooved alien from Jupiter. The corollary for commonplace is that extraordinary claims require extraordinary evidence. Hearing hooves isn’t unlikely enough to suggest that what you hear is a hooved alien from Jupiter. You need evidence that’s a lot less likely to happen than that.

Another facet of good explanations is that they explain only what they intend to explain and very little else. It’s the difference between using bug spray to kill a spider over setting fire to your house to kill it; good explanations are precise in what they explain.

As an example, let’s say you’re a student at uni. You know one of your TAs, Anna, only uses red ink when grading papers. But the other TA, Jill, uses a variety of colored ink (red, blue, green, black, orange, purple, etc.) to grade the papers. You get your grade on a quiz back one day and it’s a grade you disagree with. The ink on it is red. Based on only this information (e.g., assume they’ve graded equal amounts of papers at this point and they have similar handwriting), which TA was more likely to have graded your paper? It’s certainly possible that Jill did, after all, she has been known to use red, but it’s more likely that Anna was the one who graded it since she only uses red. The lesson here is that, the more possible things your explanation can explain, the less likely it is to explain a particular instance.

Now notice the words I’m using: Likely, probably, possible. I’m not reinventing the wheel by saying that we need a logical framework for dealing with uncertainty, and one has already been created: Probability theory. For the conjunction fallacy, this works. The conjunction of 90% and 50%, or 90% * 50% is less than both 90% and 50% (it’s 45%). Commonplace is another way of saying prior probability. And when we talk about prior probability, we’re usually talking about Bayes theorem.

Now, Pr(Claim | Evidence) reads "the probability of the claim given the evidence". The short form of Bayes' Theorem (BT) is Pr(Claim | Evidence) = Pr(Evidence | Claim) * Pr(Claim) / Pr(Evidence). An extraordinary claim, that is, a claim with a low prior probability, needs correspondingly low-probability (extraordinary) evidence. And if you have some equation that is 100 * 4 / 5, the result will be a lot closer to 100 than it is to 4 or 5.

BT also explains why Anna was more likely to have graded the paper than Jill. Let's say Anna is represented by a die with 1s on all sides, and Jill by a normal six-sided die (it's the reason I picked six colors for Jill above). Let's further say you have a jar filled with equal numbers of the normal dice and the all-1s dice; the jar is 50/50 of each. You're blindfolded, told to pull a die from the jar, and you roll it. You're told that you rolled a 1. What's the probability that you grabbed an Anna die (1s on all sides) versus a Jill die (normal 1 through 6)? The probability of rolling a 1 given the all-1s die is 100%. The probability of rolling a 1 given the normal 1 through 6 die is 1/6, or around 16%.

For this we use the long form of BT: Pr(Anna | One) = Pr(One | Anna) * Pr(Anna) / ( [ Pr(One | Anna) * Pr(Anna) ] + [ Pr(One | Jill) * Pr(Jill) ] ). What we end up with is around an 86% chance that you grabbed the Anna die. If you follow this, you can tell that the more possible numbers the Jill die has, the less likely it is that it can account for rolling a 1. Another way of phrasing "precision" is that there's a punishment for spreading yourself too thin, for trying to hedge all bets, when trying to explain something.
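To make that arithmetic concrete, here's a minimal sketch in Python (the variable names are mine; the numbers are the ones from the example above):

```python
# Posterior probability that the die you pulled was the "Anna" die
# (1s on every face), given that you rolled a 1.

p_anna = 0.5               # prior: the jar is half Anna dice, half Jill dice
p_jill = 0.5
p_one_given_anna = 1.0     # the all-1s die always rolls a 1
p_one_given_jill = 1 / 6   # a normal die rolls a 1 one time in six

# Bayes' theorem, long form: divide by the total probability of the evidence.
p_one = p_one_given_anna * p_anna + p_one_given_jill * p_jill
p_anna_given_one = p_one_given_anna * p_anna / p_one

print(round(p_anna_given_one, 3))  # ~0.857, i.e. about an 86% chance
```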

So, tl;dr the qualities of good explanations are that they are on the likelier side of Occam’s Razor, are mundane, and precise. There are others, but this is probably (heh) getting too long.


Notice that I hardly ever mentioned god or atheism in these sections, especially the second part. That's because I think the strongest arguments for atheism aren't about atheism per se, but are in general strong arguments for good thinking. They take into account our imperfections as human beings, especially in regards to how people think and act, and attempt to account for those failings. It seems to me that god(s) are what happen when social brains try to explain a fundamentally impersonal universe. And when that happens, those personal explanations for impersonal events tend to fail the logic of dealing with uncertainty.


How do you explain the mysterious beauty of this planet without referring to a supreme being?

I’m from NYC. This question would be the equivalent of me saying “how do you explain that I was born in the most awesome city on the planet without a supreme being?”

Of course, almost everyone says their home town is the best ever. Why do you think that is? I think the answer to that is the same as the answer to your question.

But let’s get a bit deeper into the assumptions behind your question. What’s your logical link from “Earth is beautiful” to “therefore a supreme being”? In other words, what makes something a good explanation?

If I were to say that something is a chair, there are qualities that chairs have in common that define them as chairs instead of beanbags: Chairs have four legs, a back support part, a part to sit on, etc. There should be some similar consistent criteria for what constitutes a good explanation, and why you think this creates the necessary link between “Beautiful Earth” and “Supreme Creator”.

If you get home late and your boyfriend/girlfriend asks why you’re late, what would be a good explanation? Why is “I got stuck in traffic” better than “I was kidnapped by aliens”? We know the former is more believable, but why?

Well, you might say something like “traffic causes people to be late more than getting kidnapped by aliens does”. And that would be correct. But I argue that this isn’t enough to separate good explanations from bad explanations, and it isn’t enough to explain why your link from “Beautiful Earth” to “Supreme Being” is a strong or weak link.

Since this isn’t a dialog, I’ll have to just explain another quality of a good explanation: Good explanations are specialized. Meaning, they explain what they intend to explain and that’s it. An explanation that can be used to explain some situation, but then can also be used to explain its polar opposite, isn’t a good explanation.

So, if instead of getting home late, you got home early, and your boyfriend/girlfriend asks why you’re early, then saying “because I got stuck in traffic” doesn’t make sense. The stuck-in-traffic explanation is specialized for only making people late. But “I got kidnapped by aliens” works just as well for making someone late as it does for making someone early. Once you invoke aliens, then anything is possible.


Let me repeat that last sentence more generally: Once you invoke [bad explanation], anything is possible.

This is a real important concept to grasp. Bad explanations, because they’re not specialized, allow for any possible outcome. And the more possibilities your explanation allows, the less likely it is that your explanation is responsible for a specific problem. There’s only one explanation that can allow for any possible outcome: Pure randomness.

Both qualities of good explanations I’ve enumerated here — a good explanation is more commonplace (e.g., “traffic causes people to be late more than getting kidnapped by aliens does”) and more specialized — follow directly from probability theory. So they’re not things I’ve just made up.

So back to the question at hand: How do you explain the mysterious beauty of this planet without referring to a supreme being? Why do you think a supreme being is a good explanation? Are supreme beings commonplace? Are supreme beings only responsible for beauty, or is anything possible for a supreme being?

I think we know the answers to those questions.

 
Comments Off on How do you explain the mysterious beauty of this planet without referring to a supreme being?

Posted by on May 14, 2018 in Bayes, Quora answers, religion

 

The Monty Hall Problem Refutes Your Religion

Well, the title of this post is a bit inflammatory. So I won't be arguing that it "refutes" your religion; I'll be arguing, more modestly, that it's weak Bayesian evidence against your religion.

So. The Monty Hall problem is an illustration of how our intuitions about probability don't always match up with reality. In its original formulation, you're given a choice between three doors. One door has a prize; the other two do not. After you choose one of the doors, another door that doesn't have the prize is opened and shown to you. You then have the option of staying with the door you chose or switching to the other unopened door.

Most people think that it either doesn’t matter whether you switch or that switching lowers your probability of winning. Neither of those is true!

Your initial probability of winning the prize is 1 out of 3. Once one of the doors is opened, the probability that you had picked the correct door stays at 1 out of 3 whereas the other non-picked door now contains the remaining probability of 2 out of 3. Because you have to do a Bayesian update once new information — in this case, the one door revealed to not have the prize — is introduced.

I’ve gone over this before. Yet, I want to add an additional wrinkle to the problem to make intuition fall more in line with Bayesian reasoning.

What if, instead of picking one door out of three to win the prize, it were one door out of 100? And once you've made your selection, 98 other doors are opened up to show that they have no prize, leaving only your choice and one other unopened door. In this case it seems more obvious that something is suspicious about the only other door that wasn't opened. And this intuition lines up with a Bayesian update using the same scenario:

P(H): 1 out of 100 or .01

P(~H): 99 out of 100, or .99

P(E | H): Probability of all other doors besides yours and one other being opened to reveal no prize given that you’ve picked the correct door: 100%.

P(E | ~H): Probability of all other doors besides yours and one other being opened to reveal no prize given that you’ve picked the incorrect door is 100%.

This is an easy Bayesian update to do. Both conditional probabilities, P(E | H) and P(E | ~H) are both 100%. Meaning the likelihood ratio is 1, and your posterior probability is the same as your prior probability. But now your selection is still 1 out of 100 and the only other remaining door has a probability of 99 out of 100 of having a prize! So in this case, both Bayesian reasoning and intuition line up: There is something suspicious about the only other door that wasn’t opened.
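Here's a minimal sketch of that update in Python (mirroring the 100-door setup above, under the usual assumption that the host always opens prize-less doors):

```python
# 100-door Monty Hall: you pick one door, the host opens 98 empty doors.
n_doors = 100

p_h = 1 / n_doors          # prior: your door hides the prize
p_not_h = 1 - p_h

# Under the usual rules the host always opens 98 prize-less doors,
# whichever door you picked, so the evidence is certain either way.
p_e_given_h = 1.0
p_e_given_not_h = 1.0

p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h
posterior_yours = p_e_given_h * p_h / p_e      # unchanged: 1/100
posterior_other = 1 - posterior_yours          # the one unopened door: 99/100

print(posterior_yours, posterior_other)
```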

How does this relate to religion? Specifically, the religion that you grew up with?

Using Willy Wonka’s logic in the meme above, the chance that you just happened to grow up with the correct religion is pretty low. Instead of the chance of picking the correct door out of 3, or out of 100, you’ve picked a door out of thousands of religions; many of which no longer exist. They are “opened doors” revealing no prize in the analogy.

So a Bayesian update works the same way as it did with picking one door out of 100. Meaning, your religion is probably wrong, and you should probably switch religions. The only reason I say this is weak Bayesian evidence is that there are still a few religions to choose from. But their joint probability of being correct is still higher than the single chance that your family religion is the correct one.

Analogously, it would be like having a choice of one door out of 10,000, and after your choice all but 10 of the remaining doors are opened to reveal no prize. Your initial chance of having chosen the correct door is still 1 out of 10,000, but the 10 doors left unopened have a joint probability of 9,999 out of 10,000 of hiding the prize: each of those 10 doors individually has (approximately) a 10% chance of being the correct door, as opposed to your original selection's probability of 1 out of 10,000.

So the Monty Hall problem is weak Bayesian evidence against your religion.

 
2 Comments

Posted by on March 5, 2018 in Bayes, religion

 

Can Subjective Probability Be Expressed As A Number? What Does The CIA Say?

The Psychology of Intelligence Analysis

Amazon.com summary:

This volume pulls together and republishes, with some editing, updating, and additions, articles written during 1978-86 for internal use within the CIA Directorate of Intelligence. The information is relatively timeless and still relevant to the never-ending quest for better analysis. The articles are based on reviewing cognitive psychology literature concerning how people process information to make judgments on incomplete and ambiguous information. Richards J. Heuer has selected the experiments and findings that seem most relevant to intelligence analysis and most in need of communication to intelligence analysts. He then translates the technical reports into language that intelligence analysts can understand and interprets the relevance of these findings to the problems intelligence analysts face.

Money quote, chapter 12 pages 152 – 156

Expression of Uncertainty

Probabilities may be expressed in two ways. Statistical probabilities are based on empirical evidence concerning relative frequencies. Most intelligence judgments deal with one-of-a-kind situations for which it is impossible to assign a statistical probability. Another approach commonly used in intelligence analysis is to make a “subjective probability” or “personal probability” judgment. Such a judgment is an expression of the analyst’s personal belief that a certain explanation or estimate is correct. It is comparable to a judgment that a horse has a three-to-one chance of winning a race.

Verbal expressions of uncertainty—such as “possible,” “probable,” “unlikely,” “may,” and “could”—are a form of subjective probability judgment, but they have long been recognized as sources of ambiguity and misunderstanding. To say that something could happen or is possible may refer to anything from a 1-percent to a 99-percent probability. To express themselves clearly, analysts must learn to routinely communicate uncertainty using the language of numerical probability or odds ratios. As explained in Chapter 2 on “Perception,” people tend to see what they expect to see, and new information is typically assimilated to existing beliefs. This is especially true when dealing with verbal expressions of uncertainty.

By themselves, these expressions have no clear meaning. They are empty shells. The reader or listener fills them with meaning through the context in which they are used and what is already in the reader’s or listener’s mind about that context. When intelligence conclusions are couched in ambiguous terms, a reader’s interpretation of the conclusions will be biased in favor of consistency with what the reader already believes. This may be one reason why many intelligence consumers say they do not learn much from intelligence reports.

It is easy to demonstrate this phenomenon in training courses for analysts. Give students a short intelligence report, have them underline all expressions of uncertainty, then have them express their understanding of the report by writing above each expression of uncertainty the numerical probability they believe was intended by the writer of the report. This is an excellent learning experience, as the differences among students in how they understand the report are typically so great as to be quite memorable.

In one experiment, an intelligence analyst was asked to substitute numerical probability estimates for the verbal qualifiers in one of his own earlier articles. The first statement was: “The cease-fire is holding but could be broken within a week.” The analyst said he meant there was about a 30-percent chance the cease-fire would be broken within a week. Another analyst who had helped this analyst prepare the article said she thought there was about an 80-percent chance that the cease-fire would be broken. Yet, when working together on the report, both analysts had believed they were in agreement about what could happen. Obviously, the analysts had not even communicated effectively with each other, let alone with the readers of their report.

Sherman Kent, the first director of CIA’s Office of National Estimates, was one of the first to recognize problems of communication caused by imprecise statements of uncertainty. Unfortunately, several decades after Kent was first jolted by how policymakers interpreted the term “serious possibility” in a national estimate, this miscommunication between analysts and policymakers, and between analysts, is still a common occurrence.

I personally recall an ongoing debate with a colleague over the bona fides of a very important source. I argued he was probably bona fide. My colleague contended that the source was probably under hostile control. After several months of periodic disagreement, I finally asked my colleague to put a number on it. He said there was at least a 51-percent chance of the source being under hostile control. I said there was at least a 51-percent chance of his being bona fide. Obviously, we agreed that there was a great deal of uncertainty. That stopped our disagreement. The problem was not a major difference of opinion, but the ambiguity of the term probable.

The table in Figure 18 shows the results of an experiment with 23 NATO military officers accustomed to reading intelligence reports. They were given a number of sentences such as: “It is highly unlikely that. . . .” All the sentences were the same except that the verbal expressions of probability changed. The officers were asked what percentage probability they would attribute to each statement if they read it in an intelligence report. Each dot in the table represents one officer’s probability assignment.

While there was broad consensus about the meaning of “better than even,” there was a wide disparity in interpretation of other probability expressions. The shaded areas in the table show the ranges proposed by Kent.

The main point is that an intelligence report may have no impact on the reader if it is couched in such ambiguous language that the reader can easily interpret it as consistent with his or her own preconceptions. This ambiguity can be especially troubling when dealing with low-probability, high-impact dangers against which policymakers may wish to make contingency plans.

Consider, for example, a report that there is little chance of a terrorist attack against the American Embassy in Cairo at this time. If the Ambassador’s preconception is that there is no more than a one-in-a-hundred chance, he may elect to not do very much. If the Ambassador’s preconception is that there may be as much as a one-in-four chance of an attack, he may decide to do quite a bit.

The term “little chance” is consistent with either of those interpretations, and there is no way to know what the report writer meant. Another potential ambiguity is the phrase “at this time.” Shortening the time frame for prediction lowers the probability, but may not decrease the need for preventive measures or contingency planning.

An event for which the timing is unpredictable may “at this time” have only a 5-percent probability of occurring during the coming month, but a 60-percent probability if the time frame is extended to one year (5 percent per month for 12 months). How can analysts express uncertainty without being unclear about how certain they are? Putting a numerical qualifier in parentheses after the phrase expressing degree of uncertainty is an appropriate means of avoiding misinterpretation. This may be an odds ratio (less than a one-in-four chance) or a percentage range (5 to 20 percent) or (less than 20 percent). Odds ratios are often preferable, as most people have a better intuitive understanding of odds than of percentages.

I’ll probably (heh) use the ranges in the figure to do Bayesian updates in the app I’m coding.

 
2 Comments

Posted by on December 27, 2017 in Bayes

 

Probability Only Exists In Your Head

I’ve already written about this before but I’ve thought of another way of explaining this.

As I wrote in that post that I linked to above, probabilities aren’t facts about objects or phenomena that we look at or experience. If you flip a coin and it lands heads twice, the probability of it landing tails on the third flip is the same as the probability of it landing heads on that third flip.

But people who think that probability is an aspect of the coin, like its weight or its color, will think that the 50% probability is physically tied to the coin, so it *must* make up for the lack of tails on the next flip. As though there were a god of coin flips who has to make sure the books are balanced.

Again, this is wrong. And this next scenario I think explains why.

In a standard deck of cards, there’s a 1/52 chance of pulling any specific card, right? What if we have two people, Alice and Bob, who want to pull from the deck. Except, Alice has memorized the order of the cards in the deck and Bob hasn’t.

What is the probability of Bob drawing the Ace of Spades on the first draw? For us and for Bob, it's 1/52. But for Alice, because she's memorized the order of the cards, it's virtually certain which card Bob will draw: for her the probability is essentially 100% for the card she knows is on top and essentially 0% for every other card.

If 1/52 were some intrinsic aspect of the deck of cards, then how could there be two different probabilities? Obviously, it's because probability is a description of our uncertainty: it only exists in our minds. The reader of that thought experiment and Bob are operating under uncertainty. Alice, on the other hand, is not, because she's memorized the order of the cards.

Furthermore, Bayes is all about updating on new evidence. What if there was some third actor, Chad, who mixed up the deck of cards outside of Alice’s knowledge? Now, Alice may think that the next card’s probability is either 100% or 0%, but this is not true either. Now Chad has the certainty.

If Bob draws a card that Alice doesn’t think he should draw, how can she possibly do a Bayesian update on either 0% or 100%? She has to do the equivalent of moving faster than the speed of light in order to update; it literally takes infinite bits of data in order to update from 0% or 100% to some other number. Try it:

P(H | E) = P(E | H) * P(H) / P(E)

50% = ??? * 0% / 1.9%, or

50% = ??? * 100% / 1.9%
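A quick sketch of why those extremes are stuck (Python; the likelihoods here are arbitrary illustrative numbers, chosen to represent very surprising evidence):

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem with the evidence term expanded by total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# However surprising the evidence, a prior of 0 or 1 never moves.
for prior in (0.0, 1.0):
    print(prior, posterior(prior, p_e_given_h=0.0001, p_e_given_not_h=0.9999))
# prints 0.0 -> 0.0 and 1.0 -> 1.0
```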

This situation can be repeated over and over again, introducing new characters manipulating the deck outside of other people’s knowledge. And this demonstrates that not only is probability subjective and in your head, but that a Bayesian probability of 0% or 100% is not a probability at all because those numbers cannot be updated.

 
Comments Off on Probability Only Exists In Your Head

Posted by on September 14, 2017 in Bayes

 

Simpson’s Paradox And The Positive/Negative Effect of Religious Belief

While not necessarily related to Bayes Theorem, something like this has been popping up in my mind whenever I read news stories dealing with statistics so I thought I would make a post about it.

In simplest terms, aggregate data might have different statistical properties than subsets of the aggregate data. As a matter of fact, the aggregate data might show the completely opposite effect when looked at in subsets.

An intuitive example of this is weather. You can average the temperature over the course of the year, or you could find the average of temperature over the course of six months. It might be that temperature over the course of the year has a slightly positive upward slope, yet temperature from June to December has a negative slope.

This seems obvious. But what if you’re dealing with something that’s not so obvious?

The example Wikipedia gives that I think is a non-controversial example is kidney stone treatment. Say you have Treatment A for either large or small kidney stones and Treatment B for large or small kidney stones.

Treatment A is effective on 81 out of 87 (93%) small kidney stones, while Treatment B is effective on 87% (234/270) of small kidney stones. For large kidney stones, Treatment A is effective 73% (192/263) of the time and Treatment B is effective 69% (55/80) of the time.

Clearly, Treatment A is what you should use for both small and large kidney stones. But what happens when we aggregate over both small and large kidney stones? Treatment A is (81 + 192) / (87 + 263) = 273/350 (78%) while Treatment B is (234 + 55) / (270 + 80) = 289/350 (83%). Now it turns out that Treatment B is better than Treatment A!
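A minimal sketch of that flip in Python, using the counts quoted above:

```python
# Kidney stone treatment counts (successes, patients) from the classic example.
treatment_a = {"small": (81, 87),   "large": (192, 263)}
treatment_b = {"small": (234, 270), "large": (55, 80)}

def rate(successes, total):
    return successes / total

for size in ("small", "large"):
    print(size,
          round(rate(*treatment_a[size]), 2),   # A wins in each subgroup
          round(rate(*treatment_b[size]), 2))

# Aggregating flips the comparison: B looks better overall.
agg_a = rate(81 + 192, 87 + 263)   # 273/350 ~ 0.78
agg_b = rate(234 + 55, 270 + 80)   # 289/350 ~ 0.83
print("overall", round(agg_a, 2), round(agg_b, 2))
```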

Therein lies Simpson’s Paradox. What happens when we have something controversial? Wikipedia also has the example of apparent sexism in graduate school admissions (which it still seems like no one has tried to account for this paradox when talking about modern controversies like the gender wage gap). But this is mainly a religion blog: So what about whether religion is good or bad for people or society?

Very religious Americans […] have high overall wellbeing, lead healthier lives, and are less likely to have ever been diagnosed with depression… These positive associations between religious engagement and the good life are reversed when comparing more versus less religious places rather than individuals…

Gallup World Poll data from 152 countries [show] a striking negative correlation between these countries’ population percentages declaring that religion is “important in your daily life” and their average life satisfaction score…

Across US states, religious attendance rates predict modestly lower emotional well-being…

Epidemiological studies reveal that religious engagement predicted longer life expectancy…

Across states, religious engagement predicts shorter life expectancy…

Across states religious engagement predicts higher crime rates. But across individuals, it predicts lower crime rates…

If you want to make religion look good, cite individual data. If you want to make it look bad, cite aggregate data…

Stunning individual versus aggregate paradoxes appear in other realms as well. Low-income states and high-income individuals have [recently] voted Republican…

Liberal countries and conservative individuals express greater well-being…

Highly religious states, and less religious individuals, do more Google “sex” searching…

One might wonder if the religiosity-happiness association is mediated by income — which has some association with happiness. But though richer people are happier than poor people, religiously engaged individuals tend to have lower incomes — despite which, they express greater happiness.

This is from a conference paper. I’m not actually sure if this is an example of Simpson’s Paradox, but the larger point remains. Breaking up data along different axes might yield paradoxical results. As the author says, if you want to make religion look bad, cite aggregate data. If you want to make religion look good, cite individual data.

But which statistic should one use? The aggregate data or the individual data? They’re both true, for lack of a better word, so it’s not like one is “lying”. I would tend to lean towards using the aggregate data if forced to choose. But there’s no harm in looking at both. And if both paint the same picture that just means that you have a more complete view of the phenomenon at hand.

 
Comments Off on Simpson’s Paradox And The Positive/Negative Effect of Religious Belief

Posted by on June 26, 2017 in Bayes, economics/sociology, religion

 

Probability: The Logic of the Law


While poking around on JSTOR (thanks, grad school!) I found an interesting article in the Oxford Journal of Legal Studies called “Probability – The Logic of the Law“. In it, Bernard Robertson and G. A. Vignaux argue that probability is, you guessed it, the logic behind legal analysis and arbitration.

So not only do we have arguments in favor of probability being the logic of science (Jaynes), probability being the logic of historical analysis (Tucker, Carrier), but we now have an argument that probability is the logic of the legal world, too.

Here’s how Robertson and Vignaux derive Bayes Theorem in the article:

It has been argued that the axioms of probability do not apply in court cases, or that court cases ought not to be thought about in this way even if they do apply. Alternatively, it is argued that some special kind of probability applies in legal cases, with its own axioms and rules… with the result that conventional probability has become known in the jurisprudential world as Pascalian… In practice one commonly finds statements such as:

The concept of ‘probability’ in the legal sense is certainly different from the mathematical concept; indeed, it is rare to find a situation in which these two usages co-exist, although when they do, the mathematical probability has to be taken into assessment of probability in the legal sense and given its appropriate weight

This paper aims to show that this view is based upon a series of false assumptions.

The authors then go into some detail about common objections to the “mathematical” view of probability and why people think it doesn’t apply to the law:

1. Things either Happen or They Don’t; They Don’t Probably Happen

An example of this argument is provided by Jaffee: 'Propositions are true or false; they are not "probable".'

2. A Court is Concerned not with Long Runs but with Single Instances

Thus, descriptively:

Trials do not typically involve matters analogous to flipping coins. They involve unique events, and thus there is no relative frequency [my emphasis] to measure

And normatively:

Application of substantive legal principles relies on, and due process considerations require, that triers must make individualistic judgements about how they think a particular event (or series of events) occurred

3. Frequency Approaches Hide Causes and Other Relevant Information which Should Be Investigated

For an extended example of this argument see Ligertwood Australian Evidence (p14)

4. Evidence Must Be Interpreted

The implicit conception [in the probability debate] of ‘evidence’ is that which is plopped down on the factfinder at trial… the evidence must bear its own inferences… each bit of evidence manifests explicitly its characteristics. This assumption is false. Evidence takes on meaning for trials only through the process of being considered by a human being… the underlying experiences of each deliberator become part of the process, yet the probability debates proceed as though this were not so

5. People Actually Compare Hypotheses

Meaning is assigned to trial evidence through the incorporation of that evidence into one or more plausible stories which describe ‘what happened’ during events testified to at trial …The level of acceptance will be determined by the coverage, coherence and uniqueness of the ‘best’ story.

6. Assessment of Prior Odds ‘Appears to Fly in the Face of the Presumption of Innocence’

7. The Legal System is Not Supposed to be Subjective

Allen refers to

the desire to have disputes settled by reference to reality rather than the subjective state of mind of the decision maker

As you can see, a lot of the objections to probability here are the same ones continually raised in the frequentist vs. Bayesian debate over the interpretation of probability. But following in the steps of E. T. Jaynes, Robertson and Vignaux demonstrate that probability can be derived from some basic assumptions about propositional logic.

The authors then go on to explain the different “types” of probability, which is probably (heh) sowing confusion:

A priori probability refers to cases where there are a finite number of possible outcomes each of which is assumed to be equally probable. Probability refers to the chance of a particular outcome occurring under these conditions. Thus there is a 1 in 52 chance of drawing the King of Hearts from a pack of cards under these conditions and the axioms of probability can be used to answer questions like: ‘what is the probability of drawing a red court card?’ or ‘what is the probability of drawing a card which is (n)either red (n)or a court card?’

Empirical probability refers to some observation that has been carried out that in a series Y event X occurs in a certain proportion of cases. Thus surveys of weather, life expectancy, reliability of machinery, blood groups, will all produce figures which may then be referred to as the probability that X will occur under conditions Y.

Subjective probability refers to a judgement as to the chances of some event occurring based upon evidence. Unfortunately, Twining treats any judgement a person might make and might choose to express in terms of ‘probability’ as a ‘subjective probability’. This leads him to say that subjective probabilities ‘may or may not be Pascalian’.

[…]

This analysis of probability into different types invites the conclusion that ‘mathematical probability’ is just one type of probability, perhaps not appropriate to all circumstances… The adoption of any of the definitions of probability other than as a measure of strength of belief can lead to an unfortunate effect known as the Mind Projection Fallacy. This is the fallacy of regarding probability as a property of objects and processes in the real world rather than a measure of our own uncertainty. [my emphasis]

An instance of this fallacy is something called the Gambler’s fallacy. Indeed, in that post of mine I pretty much wrote what I emphasized in the quote above.

The authors then point out something pretty obvious: That flipping a coin is subject to the laws of physics. If we knew every single factor that went into each coin toss (e.g., strength of the flip, density of the air, the angle in which it was flipped, how long it spins in the air, the firmness of the surface it is landing on, etc.) we would know which side of the coin would be facing up without any uncertainty.

However, we don’t know every factor that goes into a coin toss, or drawing cards from a deck, or marbles from a jar (including the social influences of the marble picker). So there is a practical wall of separation between epistemology and ontology; a wall between how we know what we know and the actual nature of what we’re observing.

The authors continue with three minimal requirements for rational analysis of competing explanations:

Desiderata:

1. If a conclusion can be reasoned out in more than one way, then every possible way should lead to the same results.

2. Equivalent states of knowledge and belief should be represented by equivalent plausibility statements. Closely approximate states should have closely approximate expressions; divergent states should have divergent expressions.

The only way consistently to achieve requirement 2 is by the use of real numbers to represent states of belief. It is an obvious requirement of rationality that if A is greater than B and B is greater than C then A must be greater than C. It will be found that any system which obeys this requirement will reduce to real numbers. Only real numbers can ensure some uniformity of meaning and some method of comparison.

3. All relevant information should be considered. None should be excluded for ideological reasons. If this requirement is not fulfilled then obviously different people could come to different conclusions if they exclude different facts from consideration.

Clearly the legal system does exclude evidence for ideological reasons. Rules about illegally obtained evidence and the various privileges constitute obvious examples. It is important therefore, that there should be some degree of consensus as to what information is to be excluded in order to prevent inconsistent results. It is also important that we are explicit about exclusions for ideological reasons and do not pretend to argue that better decisions will be made by excluding certain evidence. This pretence is one of the justifications for the hearsay rule, for example, and it is clear from these cases from a variety of jurisdictions that judges are increasingly impatient with this claim.

The next section I will try to sum up where possible:

Rules to Satisfy the Desiderata:

1. The statement ‘A and B are both true’ is equivalent to the statement ‘B and A are both true’.

2. It is certainly true that A is either true or false.

The statment ‘A and B are both true’ can be represented by the symbol ‘AB’. So proposition 1 becomes ‘AB = BA’.

This is the basic rule for conjunction in propositional logic. P ^ Q is equivalent to Q ^ P.

How do we assess the plausibility of the statement AB given certain information I, symbolically P(AB | I)?

First consider the plausibility of A given I, P(A | I), then the plausibility of B given I and that A is true, P(B | A, I)… Thus in order to determine P(AB | I) the only plausibilities that need to be considered are P(A | I) and P(B | A, I). Since P(BA | I) = P(AB | I) (above)… [c]learly, P(AB | I) is a function of P(A | I) and P(B | A, I), and it can be shown that the two terms are simply to be multiplied. This is called the 'product rule'.

And because of the product rule, and because of requirement 2 above, the numbers we should assign to our certainties of “absolutely true” and “absolutely false” are 1 and zero, respectively.

Next, since we know that absolute certainty is 1, the probability of the statement 'A or ~A', that is, the probability that A is either true or false, should be 1. And from that it follows that if P(A) + P(~A) = 1, then however much P(A) increases, P(~A) equals 1 minus P(A). This the authors call the addition rule.

We may wish to assess how plausible it is that at least one of A or B is true…

P(A or B) = P(A)P(B | A) + P(A)P(~B | A) + P(~A)P(B | ~A)

Now, the first two terms on the right hand side can be expressed as:

P(A)P(B | A) + P(A)P(~B | A) = P(A)P(B or ~B | A) = P(A, B or ~B) = P(A)

And the third term, P(~A)P(B | ~A), can be written as P(B, ~A) by the product rule.

Hence P(A or B) = P(A) + P(B, ~A).

This means that if we are interested in a proposition, C, which will be true if either (or both) A or B is true we can assess the probability of C from those of A and B. Thus, if the defendant is liable if either (or both) of two propositions were true then the probability that the defendant is liable is equal to the union of the probabilities of the two propositions. Courts appear to find this rule troublesome. The Supreme Court of Canada applied it correctly in Thatcher v The Queen but in New Zealand the Court of Appeal failed to apply it in R v Chingell and the High Court failed to apply it in Stratford v MOT.

3. If P(A | I)P(B | A, I) = P(B | I)P(A | B, I) (the product rule) then if we divide both sides of the equation by P(B | I) we get

P(B | I)P(A | B, I) / P(B | I) = P(A | I)P(B | A, I) / P(B | I)

The two P(B | I)’s on the left hand side cancel out and we have

P(A | B, I) = P(A | I)P(B | A, I) / P(B | I)

This is Bayes’ Theorem.

Cue Final Fantasy fanfare!
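Since the derivation is just arithmetic, here is a small numeric sanity check of the product rule, the addition rule, and the derived form of Bayes' Theorem (Python; the joint probabilities are made-up numbers, chosen only so that everything sums to 1):

```python
# A toy joint distribution over two propositions A and B.
# (Arbitrary numbers; they just need to sum to 1.)
p = {("A", "B"): 0.20, ("A", "~B"): 0.30,
     ("~A", "B"): 0.10, ("~A", "~B"): 0.40}

p_a = p[("A", "B")] + p[("A", "~B")]
p_b = p[("A", "B")] + p[("~A", "B")]
p_b_given_a = p[("A", "B")] / p_a          # product rule, rearranged
p_a_given_b = p[("A", "B")] / p_b

# Sum rule: P(A or B) = P(A) + P(B, ~A)
p_a_or_b = p_a + p[("~A", "B")]
assert abs(p_a_or_b - (1 - p[("~A", "~B")])) < 1e-12

# Bayes' theorem: P(A | B) = P(A) * P(B | A) / P(B)
assert abs(p_a_given_b - p_a * p_b_given_a / p_b) < 1e-12
print(p_a_or_b, p_a_given_b)
```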

From here, the authors begin going over objections to probability and its utility in the law; objections that are born of misconceptions about probability and its utility outside the law. Most of these objections, in fact, are due to a frequentist view of probability: thinking of probability as a fundamental aspect of the object or event we're looking at instead of a description of our uncertainty. As a matter of fact, that view should be put to rest by the authors' demonstration that Bayes' Theorem can be derived using logic alone. At no point did they use frequencies or any appeal to the nature of an object.

I did read one response to this article in the same publication in JSTOR, but it amounted to basically “This would be really hard to do” and not “this is invalid and/or it doesn’t follow from the rules of logic”.

 
Comments Off on Probability: The Logic of the Law

Posted by on June 16, 2015 in Bayes

 
 