Solomonoff Induction is a more precise, mathematical formulation of the epistemic heuristic Occam’s Razor. While my post on the subject is also mathematical, and is influenced by the conjunction fallacy, Occam’s Razor itself is just a vague rule of thumb. Solomonoff Induction is an attempt to make Occam’s Razor more rigorous.
The way I understand it, imagine writing a computer program that knew every single hypothesis imaginable that could be used to explain some data, with each hypothesis written as code and represented as a binary string. The computer would select the hypothesis with the least amount of code as its main candidate. Or, more precisely, shorter programs get higher prior probability, so the shortest one that fits the data gets the most weight.
The problem would be actually having a computer that held every hypothesis imaginable and compared them all by complexity of code (more complex code = more bits needed to represent it). In fact, full Solomonoff Induction is uncomputable, so in practice it can only be approximated.
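To make the "shorter code gets more weight" idea concrete, here is a minimal sketch in Python. It is a toy, not Solomonoff Induction proper: the three hypotheses and their bit strings are made up for illustration, and the real thing ranges over every program for a universal machine rather than a hand-picked list. The only idea it keeps is that a program of length n bits gets a raw weight of 2^-n.

```python
from fractions import Fraction


def length_prior(programs):
    """Give each program a weight of 2^-(length in bits), then normalize.

    `programs` maps a hypothesis name to its code as a string of 0s and 1s.
    """
    raw = {name: Fraction(1, 2 ** len(bits)) for name, bits in programs.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


# Hypothetical hypotheses with made-up encodings (assumptions for illustration).
toy_programs = {
    "short": "1011",
    "medium": "101100",
    "long": "10110010",
}

prior = length_prior(toy_programs)
best = max(prior, key=prior.get)  # the shortest program gets the highest prior
```

Every extra bit halves a program's raw weight, which is why the shortest hypothesis that still explains the data ends up dominating.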
As an example, let’s say we have two pieces of code: one that represents Newton’s laws of motion and another that represents Einstein’s theory of relativity. At first glance, the program that computes Newton’s laws might seem simpler than Einstein’s (more people know Newton’s formulas than know Einstein’s formulas), but Newton’s code doesn’t account for weird things like Mercury’s orbit. So in actuality, Newton’s code would get bloated from all of the ad hoc code meant to account for things like Mercury’s orbit that can’t be computed using the baseline Newtonian formulas. And in the end, Newton’s program would be larger than Einstein’s due to that extra code; maybe Newton.dll would be 100 MB and Einstein.dll would be 80 MB.
Thus, by my understanding of Solomonoff Induction, a sufficiently advanced artificial intelligence would use Einstein.dll as its main prior when attempting to explain some gravitational phenomenon. At least until a smaller program is written that accounts for all of the things Einstein.dll accounts for plus things that it doesn’t (e.g., quantum gravity).
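The bookkeeping behind that comparison can be sketched in a few lines of Python. All of the numbers here are hypothetical: the split of Newton.dll into a 60 MB core plus 25 MB and 15 MB of ad hoc patches is an assumption chosen only to add up to the 100 MB figure above, and 1 MB is taken to be 10^6 bytes.

```python
def total_size_mb(core_mb, patch_sizes_mb):
    """Total program size: the core theory plus every ad hoc patch."""
    return core_mb + sum(patch_sizes_mb)


# Hypothetical sizes in MB (assumptions for illustration only).
newton_mb = total_size_mb(core_mb=60, patch_sizes_mb=[25, 15])  # core + patches
einstein_mb = total_size_mb(core_mb=80, patch_sizes_mb=[])      # no patches needed

sizes = {"Newton.dll": newton_mb, "Einstein.dll": einstein_mb}
preferred = min(sizes, key=sizes.get)

# Under a 2^-length prior, the prior odds favoring the smaller program are
# 2 raised to the size difference in bits (assuming 1 MB = 10**6 bytes).
log2_prior_odds = (newton_mb - einstein_mb) * 8 * 10**6
```

The point of counting the patches is that the simpler-looking core does not win on its own; what matters is the size of the whole program needed to reproduce the data.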
Now imagine comparing disparate hypotheses, like a computer program that models the atmosphere to predict when and where hurricanes will strike, and a computer program that models an angry god to predict when and where hurricanes will strike. I’m willing to bet that the code used to model non-supernatural weather would be smaller than the code used to model a supernatural being’s motivations (I’m relatively certain that hurricanes forming isn’t more complex than the bio-chemical and social processes that produce anger in living beings, not to mention angry beings that have no physical body… though this is intuitively backwards; and it is backwards precisely because we think primarily in social ways). Or, more pointedly, imagine comparing the code used to model a supernatural Jesus coming back from the dead and the code used to model the story being invented by people who are the modern equivalent of people from a small, backwards village in Africa colonized by the British.
Well… this is all fine and dandy, but most people aren’t going to comprehend this intuitively, since there isn’t a reference to things that they already know about. But Solomonoff Induction makes sense to me (well, at least how I’ve laid it out above), because I’ve actually written code that uses the math behind special/general relativity and I can see how at first glance it would look more complicated than Newton’s laws of motion. But adding that extra code to account for things that can’t be explained via Newton’s laws would be bad programming practice. I would certainly prefer code that had a one-stop algorithm that computed things instead of an algorithm plus some hand-jammed code added to it because the original algorithm wasn’t robust enough.
So back to Occam’s Razor, this time with a more intuitive explanation. I think Occam’s Razor can be summed up using the English metaphor “a chain is only as strong as its weakest link”. Imagine choosing from a variety of chains to hook up a disabled car to the back of a pickup truck. Given that the chain is only as strong as its weakest link, you would want to pick the chain that has the lowest chance of having a weak link, and thus the lowest chance of the chain breaking and the car careening off somewhere on the highway.
A short chain might have an extremely weak link in it, yet a longer chain might have a bunch of slightly weak links in it. Which chain do you use then? Whatever methodology you use to ensure that you pick the right chain would be Occam’s Razor; you could even go about removing the weak link altogether and go with the strongest part of the chain.
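One possible methodology, sketched in Python, is to treat each link as a claim that has to hold and multiply the probabilities together, which is the same arithmetic behind the conjunction fallacy. The link strengths below are made-up numbers, and the sketch assumes links fail independently of one another, which real chains (and real hypotheses) may not do.

```python
import math


def chance_chain_holds(link_strengths):
    """Probability that every link holds, assuming links fail independently."""
    return math.prod(link_strengths)


# Hypothetical per-link probabilities of holding (assumptions for illustration).
short_chain = [0.99, 0.99, 0.70]  # few links, one of them extremely weak
long_chain = [0.97] * 10          # many links, each only slightly weak

short_holds = chance_chain_holds(short_chain)
long_holds = chance_chain_holds(long_chain)
better = "long" if long_holds > short_holds else "short"
```

With these particular numbers the longer chain comes out ahead, so shorter is not automatically better. What counts is the combined chance that every link holds, and each additional link can only lower it or leave it the same.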