So I’m starting grad school for computer science in about a month. This is on top of having a normal 9 – 5 (well, 8:30 – 6) job, which means that in a little while I’ll probably have less time for blogging, or at least for blogging anything more than passing thoughts and/or cool articles I find about religion.
Since I’m continuing my compsci schooling towards an M.S., I thought I’d brush up on my programming beyond the meager tasks I do for work. (Right now I’m more of a “software engineer”, meaning I mainly concentrate on the process side of software development, with some coding on the side if required.) So I’m writing a Java app that computes, you guessed it, Bayes’ Theorem! I’m going to add it as an executable to my static website, where I’ll also be doing some other web dev for a page dedicated to how probability theory is the logic of science. The page isn’t up yet, but it’ll get there eventually.
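For reference, the core of a Bayes’ theorem backend can be tiny. What follows is a minimal sketch, not the actual app’s code: the class name `BayesCalculator` and the method `posterior` are hypothetical, and the inputs are the prior P(H), the success rate P(E | H), and the alternative P(E | ~H).

```java
// Hypothetical sketch of a Bayes' theorem backend; the names BayesCalculator and
// posterior are illustrative, not taken from the actual app.
public class BayesCalculator {

    /**
     * Returns P(H | E) from the prior P(H), the success rate P(E | H),
     * and the alternative P(E | ~H).
     */
    static double posterior(double prior, double successRate, double alternativeRate) {
        if (prior < 0 || prior > 1) {
            throw new IllegalArgumentException("prior must be between 0 and 1");
        }
        double numerator = prior * successRate;
        // Total probability of the evidence: P(E) = P(H)P(E|H) + P(~H)P(E|~H)
        double evidence = numerator + (1 - prior) * alternativeRate;
        if (evidence == 0) {
            throw new IllegalArgumentException("evidence has zero probability under both hypotheses");
        }
        return numerator / evidence;
    }

    public static void main(String[] args) {
        // James's numbers from the coin example below: prior 0.2,
        // P(E | H) = 0.75^4 * 0.25 and P(E | ~H) = 0.5^5
        double p = posterior(0.2, Math.pow(0.75, 4) * 0.25, Math.pow(0.5, 5));
        System.out.println(Math.round(p * 100) + "%"); // prints 39%
    }
}
```

The guard on `evidence` matters because a zero-probability observation under both hypotheses would otherwise silently return `NaN`.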
Writing the backend code for BT was actually really simple, but one neat little thing I discovered while ironing out all of its nooks and crannies was how to combine likelihood ratios (Bayes factors). Here it is, better described over at Overcoming Bias:
You think A is 80% likely; my initial impression is that it’s 60% likely. After you and I talk, maybe we both should think 70%. “Average your starting beliefs”, or perhaps “do a weighted average, weighted by expertise” is a common heuristic.
But sometimes, not only is the best combination not the average, it’s more extreme than either original belief.
Let’s say Jane and James are trying to determine whether a particular coin is fair. They both think there’s an 80% chance the coin is fair. They also know that if the coin is unfair, it is the sort that comes up heads 75% of the time.
Jane flips the coin five times, performs a perfect Bayesian update, and concludes there’s a 65% chance the coin is unfair. James flips the coin five times, performs a perfect Bayesian update, and concludes there’s a 39% chance the coin is unfair. The averaging heuristic would suggest that the correct answer is between 65% and 39%. But a perfect Bayesian, hearing both Jane’s and James’s estimates – knowing their priors, and deducing what evidence they must have seen – would infer that the coin was 83% likely to be unfair.
That is because a perfect Bayesian would combine their data, not simply average their posteriors, which makes more sense if you think about it. If one group of people concluded that the world was round and another group concluded that it was flat, it wouldn’t make sense to average the two conclusions and say that the world must be shaped like a calzone. You would want the data they used to arrive at their conclusions, and you would update on that. Averaging the two is a social solution, meant to save people’s egos, not an attempt to get at a more accurate model of the world.
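To make the difference concrete, here is a small sketch that reproduces the Overcoming Bias numbers both ways: averaging Jane’s and James’s posteriors versus pooling their flips and updating once. The class and method names are made up for illustration; the prior (0.2) and the 75% heads rate come from the quoted example.

```java
// Hypothetical sketch comparing "average the posteriors" with "pool the data",
// using the coin example: prior 0.2 that the coin is heads-weighted, and a
// heads-weighted coin lands heads 75% of the time.
public class AverageVsPool {
    static final double PRIOR = 0.2;
    static final double P_HEADS_WEIGHTED = 0.75;
    static final double P_HEADS_FAIR = 0.5;

    /** Posterior that the coin is heads-weighted after a given number of heads and tails. */
    static double posterior(int heads, int tails) {
        double likeWeighted = Math.pow(P_HEADS_WEIGHTED, heads) * Math.pow(1 - P_HEADS_WEIGHTED, tails);
        double likeFair = Math.pow(P_HEADS_FAIR, heads) * Math.pow(1 - P_HEADS_FAIR, tails);
        return PRIOR * likeWeighted / (PRIOR * likeWeighted + (1 - PRIOR) * likeFair);
    }

    public static void main(String[] args) {
        double jane = posterior(5, 0);            // Jane: five heads
        double james = posterior(4, 1);           // James: four heads, one tail
        double averaged = (jane + james) / 2;     // the social heuristic
        double pooled = posterior(5 + 4, 0 + 1);  // one update on all ten flips
        System.out.println("averaged: " + Math.round(averaged * 100) + "%"); // prints averaged: 52%
        System.out.println("pooled:   " + Math.round(pooled * 100) + "%");   // prints pooled:   83%
    }
}
```

Averaging lands at 52%, between the two reports, while pooling the actual evidence gives 83%, more extreme than either.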
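It turns out that combining likelihood ratios is pretty straightforward. Think about the conjunction fallacy: the probability of two independent events with probabilities X% and Y% both happening isn’t X% + Y%, or the average of X% and Y%, but X% * Y%. Combining likelihood ratios follows the same logic: when two bodies of evidence are independent given each hypothesis (like two separate sets of coin flips), their likelihood ratios multiply.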
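For the coin, that multiplication happens one flip at a time. Each head has a likelihood ratio of 0.75 / 0.5 = 1.5 in favor of heads-weighted, and each tail has 0.25 / 0.5 = 0.5, so a whole sequence’s ratio is just the product. The names below are illustrative only.

```java
// Sketch: with independent flips, the likelihood ratio of a whole sequence is the
// product of the per-flip likelihood ratios. Names here are illustrative only.
public class PerFlipRatios {
    // P(heads | heads-weighted) / P(heads | fair) = 0.75 / 0.5
    static final double HEADS_RATIO = 0.75 / 0.5;
    // P(tails | heads-weighted) / P(tails | fair) = 0.25 / 0.5
    static final double TAILS_RATIO = 0.25 / 0.5;

    /** Multiplies in one per-flip ratio for every observed flip. */
    static double sequenceRatio(int heads, int tails) {
        double ratio = 1.0;
        for (int i = 0; i < heads; i++) ratio *= HEADS_RATIO;
        for (int i = 0; i < tails; i++) ratio *= TAILS_RATIO;
        return ratio;
    }

    public static void main(String[] args) {
        System.out.println(sequenceRatio(4, 1)); // prints 2.53125 (James: 1.5^4 * 0.5)
        System.out.println(sequenceRatio(5, 0)); // prints 7.59375 (Jane: 1.5^5)
    }
}
```

Because 1.5 and 0.5 are exactly representable in binary, these products come out exact rather than approximately.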
Again, from OB:
James, to end up with a 39% posterior on the coin being heads-weighted, must have seen four heads and one tail:
P(four heads and one tail | heads-weighted) = (0.75^4 * 0.25^1) = 0.079. P(four heads and one tail | fair) = 0.031. P(heads-weighted | four heads and one tail) = (0.2 * 0.079)/(0.2 * 0.079 + 0.8 * 0.031) = 0.39, which is the posterior belief James reports.
Jane must similarly have seen five heads and zero tails.
Plugging the total nine heads and one tail into Bayes’ theorem:
P(heads-weighted | nine heads and a tail) = ( 0.2 * (0.75^9 * 0.25^1) ) / ( 0.2 * (0.75^9 * 0.25^1) + 0.8 * (0.5^9 * 0.5^1) ) = 0.83, giving us a posterior belief of 83% that the coin is heads-weighted.
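The “deducing what evidence they must have seen” step can also be sketched in code: given the shared prior, the number of flips, and a reported posterior, try every possible head count and keep the one whose posterior rounds to the report. This assumes, as in the example, that each person flipped exactly five times and reported a whole percentage; `inferHeads` is a hypothetical helper. The order of the flips doesn’t matter, since any binomial coefficient would cancel between numerator and denominator.

```java
// Hypothetical sketch: recover how many heads someone saw from their reported
// posterior. Assumes a known prior, a known number of flips, and a report
// rounded to a whole percentage, as in the coin example.
public class InferEvidence {
    static double posterior(double prior, int heads, int tails) {
        double likeWeighted = Math.pow(0.75, heads) * Math.pow(0.25, tails);
        double likeFair = Math.pow(0.5, heads + tails);
        return prior * likeWeighted / (prior * likeWeighted + (1 - prior) * likeFair);
    }

    /** Returns the head count (0..flips) whose posterior rounds to reportedPercent, or -1 if none does. */
    static int inferHeads(double prior, int flips, int reportedPercent) {
        for (int heads = 0; heads <= flips; heads++) {
            long percent = Math.round(posterior(prior, heads, flips - heads) * 100);
            if (percent == reportedPercent) {
                return heads;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(inferHeads(0.2, 5, 39)); // prints 4 (James)
        System.out.println(inferHeads(0.2, 5, 65)); // prints 5 (Jane)
    }
}
```

With five flips each head count gives a distinct rounded posterior (1%, 2%, 7%, 17%, 39%, 65%), so the search finds a unique answer here; with coarser reports or more flips it might not.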
What I call the success rate, P(E | H), is represented here as P(four heads and one tail | heads-weighted). P(E | ~H), the alternative hypothesis, is P(four heads and one tail | fair). James’ likelihood ratio is P(E | H) / P(E | ~H) = 0.0791 / 0.03125 ≈ 2.531. Jane’s is P(E | H) / P(E | ~H) = 0.2373 / 0.03125 ≈ 7.594. The combined likelihood ratio is about 19.222, which is how much evidence is needed to move the prior from 20% to 83%. That combined likelihood ratio is exactly the other two likelihood ratios multiplied together, 2.531 * 7.594, because the two sets of flips are independent.
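The cleanest way to see why multiplying is enough is the odds form of Bayes’ theorem: posterior odds = prior odds * likelihood ratio. A sketch, with hypothetical names, assuming the two bodies of evidence are independent given each hypothesis:

```java
// Sketch of combining likelihood ratios in odds form:
// posterior odds = prior odds * LR1 * LR2 * ...
// Valid when the bodies of evidence are independent given each hypothesis.
public class CombineRatios {
    static double toOdds(double p) { return p / (1 - p); }
    static double toProbability(double odds) { return odds / (1 + odds); }

    /** Converts the prior to odds, multiplies in every likelihood ratio, converts back. */
    static double update(double prior, double... likelihoodRatios) {
        double odds = toOdds(prior);
        for (double lr : likelihoodRatios) {
            odds *= lr;
        }
        return toProbability(odds);
    }

    public static void main(String[] args) {
        double james = Math.pow(0.75, 4) * 0.25 / Math.pow(0.5, 5); // about 2.531
        double jane = Math.pow(0.75, 5) / Math.pow(0.5, 5);          // about 7.594
        System.out.println("combined ratio: " + james * jane);        // about 19.22
        System.out.println(Math.round(update(0.2, james, jane) * 100) + "%"); // prints 83%
    }
}
```

Updating on James’ ratio and then Jane’s, or on their product in one step, gives the same answer; the order of the evidence doesn’t matter either.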
Something like this is very handy when two people have disparate priors. Two people can start with different priors, but as long as they update on the same evidence, their posteriors will eventually converge. Combining likelihood ratios ensures that both parties are updating on the same evidence, since the likelihood ratio is what determines how far each prior moves.
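A quick simulation illustrates the convergence. The two priors (0.2 and 0.8) are picked arbitrarily for illustration, and, as a simplifying assumption, every observation is a head on the example coin, so each one multiplies the odds by 1.5. The class name is hypothetical.

```java
// Sketch: two people with very different priors update on the same evidence.
// Assumes every observation is a head on the example coin, so each flip
// multiplies the odds of "heads-weighted" by 0.75 / 0.5 = 1.5.
public class Convergence {
    /** One odds-form update; p must be strictly between 0 and 1. */
    static double update(double p, double likelihoodRatio) {
        double odds = p / (1 - p) * likelihoodRatio;
        return odds / (1 + odds);
    }

    public static void main(String[] args) {
        double skeptic = 0.2;
        double believer = 0.8;
        for (int flip = 1; flip <= 20; flip++) {
            skeptic = update(skeptic, 1.5);
            believer = update(believer, 1.5);
            if (flip % 5 == 0) {
                System.out.printf("after %2d heads: %.3f vs %.3f%n", flip, skeptic, believer);
            }
        }
    }
}
```

One nuance: in odds terms the two stay a constant factor of 16 apart (0.25 vs 4 to start), because they multiply by the same ratios. It is on the probability scale that the gap shrinks, since both get pushed toward 1 as the evidence accumulates.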