By now you’ve all heard the great news: Monkey has proved it is possible to beat Spider Solitaire at the 4-suit level! All he needs to do is build in-suit at every opportunity and never break a suited connector. The bad news is the win rate is extremely low. Of course if the Monkey wins \$3000 from the Eagle thanks to a single victory then we’re not complaining – but can we do even better?

First I wanna differentiate between a random hand and a custom hand. Recall that I discussed a number of YouTube videos where random players show off their awesome skills by beating a hand of 4-suit spider sans 85,78,68,79, but the unmentioned caveat was that the hand was ridiculously easy.

If I pick a ridiculously easy hand then Monkey wins 6 games in 500, or 1.2 percent of the time. If I pick a random hand then Monkey wins a <sarcasm> whopping </sarcasm> zero games in 500.

It turned out my simulated annealing/stochastic hill climbing/stochastic gradient descent/something else thingy was a red herring. It worked the first time, but later experimentation with the random number generator showed that “always Level 3” was the best way to go in the long run. But at least it made for a good story lol 😊

## Guaranteed Minimum Evaluation Score

Good players (or avid readers of this blog or both!) will be aware of the concept of minimum guaranteed turn-overs. Basically we can calculate the minimum number of cards we are guaranteed to turn over even if the worst possible cards showed up. Of course we can do something similar for e.g. the number of suited connectors or some other “function” of the position. For instance, we might assign 10 points for flipping a card, 1 point for a suited connector and 100 points for removing a complete suit. And of course experienced players know that different programs will have different scoring systems. The important point is that we can (i) evaluate a given position (ii) compute a guaranteed minimum score even if the worst possible cards turned up.

This is an example of “looking ahead” since we are considering the consequences of a sequence of moves given the available information (i.e. face-up cards), which is obviously better than making a single move just because it’s an in-suit build.

Here is a simple example: suppose the evaluation function awards 10 points for exposing a card and 1 point for any suited connector, with no penalty per move. Assuming best play in the start position below, what is your Guaranteed Minimum Evaluation Score (GMES) even if you turned over the worst possible cards?

I hope you answered 54 (5 guaranteed turnovers at 10 points each, plus 4 in-suit builds at 1 point each). Note that we are lucky in the sense that we can build in-suit 4 times and off-suit once. Normally it’s the other way around.
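The arithmetic behind these numbers fits in a tiny evaluation function. This is a sketch only. The weights are the ones from the example (10 per exposed card, 1 per suited connector, no move penalty), plus the 100-point bonus for a completed suit mentioned earlier. The function name and parameters are my own and do not come from any real Spider program. The calls reproduce the 54, 53 and 42 scores discussed in this section.

```python
def evaluate(turnovers: int, suited_connectors: int, suits_removed: int = 0,
             card_weight: int = 10, connector_weight: int = 1,
             suit_weight: int = 100) -> int:
    """Score a position from a few counts (hypothetical weights, see text)."""
    return (card_weight * turnovers
            + connector_weight * suited_connectors
            + suit_weight * suits_removed)


# Best play: 5 guaranteed turnovers and 4 in-suit builds.
print(evaluate(turnovers=5, suited_connectors=4))  # 54
# A move that costs exactly one suited connector.
print(evaluate(turnovers=5, suited_connectors=3))  # 53
# 0d-9h: one turnover and two in-suit builds lost.
print(evaluate(turnovers=4, suited_connectors=2))  # 42
```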

A good first move is to build in-suit with the Jack-Ten of diamonds. The Ten is on one of six columns with equal-fewest cards and we also have a spare off-suit Jack. But we can make many other moves and still reach the same GMES of 54. You could start with the Three-Two of Clubs or even the strange-looking 9h-8d. But obviously the sensible option (for humans at least!) is Jd-0d.

The position you should be visualising is the one below, where a happy star represents any face-up card. We can easily check with a single glance that there are 5 turnovers and 4 in-suit builds, with a total evaluation score of 54.

If we want to train our Ninja Monkey to compute the best GMES we can simply iterate random sequences of moves from a given position. But we must be careful not to expose any new cards (e.g. if we shift the Ten of diamonds we will not flip the card underneath). Assuming we have enough iterations, Ninja Monkey will deduce e.g. shifting the Jack of diamonds onto the Club Queen is not a good move since it reduces the GMES from 54 to 53. Even worse would be 0d-9h, which reduces the GMES to 42. We lose one guaranteed turn-over and two in-suit builds with one bad move!
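Here is a minimal sketch of the random-iteration idea, written against an abstract game interface. Everything in it is hypothetical. `legal_moves`, `apply_move` and `gmes` stand in for a real Spider engine, and `apply_move` is assumed never to flip a face-down card, as required above. The toy usage at the end only checks that the search finds the best reachable score. It does not model Spider itself.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

State = TypeVar("State")
Move = TypeVar("Move")


def best_random_sequence(
    start: State,
    legal_moves: Callable[[State], Sequence[Move]],
    apply_move: Callable[[State, Move], State],
    gmes: Callable[[State], int],
    iterations: int = 1000,
    max_len: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Move], int]:
    """Try many random move sequences and return one reaching the highest GMES seen.

    Assumes apply_move never reveals a hidden card, so gmes() only uses
    information that is already face up.
    """
    rng = rng or random.Random()
    best_seq: List[Move] = []
    best_score = gmes(start)
    for _ in range(iterations):
        state, seq = start, []
        for _ in range(max_len):
            moves = legal_moves(state)
            if not moves:
                break
            move = rng.choice(moves)
            state = apply_move(state, move)
            seq.append(move)
            score = gmes(state)
            if score > best_score:  # keep the prefix that first beat the record
                best_score, best_seq = score, list(seq)
    return best_seq, best_score


# Toy check: states are integers, moves are +1/-1, and the "GMES" peaks at state 3.
seq, score = best_random_sequence(
    0, lambda s: [1, -1], lambda s, m: s + m, lambda s: -abs(s - 3),
    iterations=500, rng=random.Random(1))
print(score, sum(seq))  # 0 3
```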

An example of a “good enough sequence” is: Jc-0d, 9d-8d, 3c-2d, 0d-98d, Jd-098d, Qc-Jc. Note that the first move was unnecessary, but at least we got the correct GMES of 54.

## Now Get Off Yer 65,82,83,79 And Start Procrastinating!

A lazy programmer would be satisfied with the above algorithm, but of course it is possible to do even better. Note that there is no reason to commit ourselves to every move in the above sequence without examining the newly turned-over cards. After starting with Jc-0d, we can turn over a card, and we still have the choice of 9d-8d, 3c-2d, 0d-98d, Jd-098d, Qc-Jc, or the new card may give us an even better option. In effect we are procrastinating. In summary, Ninja Monkey will find a sequence with the best possible GMES and execute it, but will terminate the sequence prematurely as soon as he turns over a new card.

How do we tell if it’s time to deal a new row of cards? Simples, as the Meerkat would say. If Ninja Monkey cannot increase the GMES then it is time to deal a new row of cards. If the stock is empty then either the game is won or the Monkey concedes defeat.
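The decision rules from the last two paragraphs can be sketched as one small function. This is hypothetical. `reveals_card` is assumed to tell us, from the current layout, whether a move uncovers a face-down card. The string results `"play"`, `"deal"` and `"stop"` are my own labels.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Move = TypeVar("Move")


def plan_next_moves(
    best_seq: Sequence[Move],
    best_gmes: int,
    current_gmes: int,
    reveals_card: Callable[[Move], bool],
    stock_empty: bool,
) -> Tuple[str, List[Move]]:
    """Decide Ninja Monkey's next action (a sketch of the rules in the text).

    - If no sequence raises the GMES, deal a new row, or stop (game won
      or conceded) when the stock is empty.
    - Otherwise play the best sequence only up to and including the first
      move that turns over a card, then re-plan with the new information.
    """
    if best_gmes <= current_gmes:
        return ("stop" if stock_empty else "deal"), []
    to_play: List[Move] = []
    for move in best_seq:
        to_play.append(move)
        if reveals_card(move):
            break  # procrastinate: look at the new card before committing further
    return "play", to_play


seq = ["Jc-0d", "9d-8d", "3c-2d", "0d-98d", "Jd-098d", "Qc-Jc"]
# Hypothetical: in this layout Jc-0d is the move that uncovers a card.
plan = plan_next_moves(seq, 54, 0, lambda m: m == "Jc-0d", stock_empty=False)
print(plan)  # ('play', ['Jc-0d'])
```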

With the new algorithm, Monkey will beat an easy hand around half the time and a random hand 5 percent of the time. Random hand statistics are shown in the graphic below.

Let us say that a Spider hand is ImportNumbPieAsNP-Hard if Ninja Monkey cannot obtain one victory in 50 tries. By this definition roughly half the hands are ImportNumbPieAsNP-Hard. At the other end of the spectrum there was also one very easy hand with 21 wins in 50 attempts. One implication of all this is that if a Spider Solitaire server deals 40 ImportNumbPieAsNP-Hard games in a row, then there’s a pretty good chance it’s biased.
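To put a rough number on that claim, assume each dealt hand is ImportNumbPieAsNP-Hard with probability about 0.5, independently of the others. That assumption is based only on the “roughly half” estimate above. The chance of 40 in a row is then tiny:

```python
p_hard = 0.5  # rough fraction of random hands Ninja Monkey never wins in 50 tries
streak = 40
p_streak = p_hard ** streak  # assumes hands are dealt independently
print(f"{p_streak:.2e}")  # 9.09e-13
```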

Unfortunately all this comes at a price. Despite Ninja Monkey’s extremely fast metabolism, this takes a very long time to run.

# Spider Solitaire September 2019 experiment complete

My Spider Solitaire experiment for September is complete.

Earlier I promised a friend that I would play the 4-suit daily challenges on my iPhone. For each game I estimated the probability that a monkey playing random moves would win at the 1-suit level. I said an inversion occurred if, for any two games, the later game was harder than the earlier one (i.e. the estimated win rate for the monkey was lower).

I got a p-value of 0.0571, so the null hypothesis barely survived. (Nevertheless, I do not regret the experiment: my data for July pretty much forced me to hypothesize the program was biased, because I did get p < 0.05, but only just.)
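The post doesn’t say exactly how the p-value was computed. One plausible way to test an inversion count is a one-sided permutation test. You count inversions in the actual date order, then see how often a random reordering of the same win-rate estimates gives at least as many. This is a minimal sketch under that assumption. `rates` is assumed to hold the monkey’s estimated win rates in date order, and the numbers below are made up.

```python
import random
from typing import Optional, Sequence


def count_inversions(rates: Sequence[float]) -> int:
    """Pairs i < j where game j is harder (lower monkey win rate) than game i."""
    n = len(rates)
    return sum(1 for i in range(n) for j in range(i + 1, n) if rates[j] < rates[i])


def permutation_p_value(rates: Sequence[float], trials: int = 10_000,
                        rng: Optional[random.Random] = None) -> float:
    """One-sided p-value: chance a random order has at least as many inversions."""
    rng = rng or random.Random()
    observed = count_inversions(rates)
    shuffled = list(rates)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if count_inversions(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Made-up example: a steady downward trend should give a small p-value.
rates = [0.9, 0.8, 0.85, 0.7, 0.6, 0.65, 0.5, 0.4]
print(count_inversions(rates))  # 26
print(permutation_p_value(rates, rng=random.Random(0)))
```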

Due to time constraints, I do not wish to further test my iPhone Spider Solitaire. I won’t be surprised if the random number generator is rigged, but it’s not worth my time to prove this. (If you are interested, I recommend you test more than one month’s worth of games. Dates are sorted with day as the primary key and month as the secondary key, so for instance March 17 < February 23 even though February occurs before March.)
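If you do test more than one month, it helps to rebuild the order the app uses instead of calendar order. Here is a small illustration using the (day, month) sort key described above. The dates are arbitrary, and the year plays no part in the ordering.

```python
from datetime import date

games = [date(2019, 2, 23), date(2019, 3, 17), date(2019, 3, 1)]
# Day of month is the primary key, month the secondary key.
in_app_order = sorted(games, key=lambda d: (d.day, d.month))
print([d.strftime("%b %d") for d in in_app_order])  # ['Mar 01', 'Mar 17', 'Feb 23']
```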

In the diagram below the downward trend is not obvious, but I suspect there were too many “near-perfect scores” at the beginning and not enough near the end. It is also interesting that the result was very close in the sense that “changing one bad result” after the fact would have been enough to push the p-value below 0.05. The decision to accept/reject the null hypothesis was too close to call until the very last day of this month.

Note: for my Spider Solitaire paper in Parabola, I tested a different Spider server and the downward trend was much more obvious.

That’s it for now, till next time 🙂

# The World’s Worst Math Teacher (another short story)

“Another one of life’s disappointments.”

“Marking assignments, the bane of every teacher,” growls Ms. Spider, as she angrily scrawls the word “DREADFUL” on a sheet of paper. “This goose just divided by zero.”

I’ve always enjoyed math, but I am all too aware that it represents a bugaboo for many ordinary folk. Not everybody can have a higher-than-average IQ and not everybody can play piano and solve Rubik’s Cube at the same time. I agree we have to Make Math Great Again.

“I s’pose I could improve my presentation skills or learn Statistics 101,” admits Ms. Spider.

“I confess I never studied stats at uni,” I respond. “I had to pick it up all by myself.”

“Learning stats 101 sounds too much like work. Surely there must be a better way.”

“You could make the exams and homework easier,” I suggest.

“We can’t make it too easy,” responds Ms. Spider. “I’m sure the good students wouldn’t mind an extra challenge or two.”

I steal a glance at the goose’s assignment. Yes, the goose is below average, but one of the assignment questions is badly worded. Another question has kilometres as a typo for metres, and I have to suppress a chuckle. I can see why some of Ms. Spider’s students call her the WWMT.

“Actually,” says Ms. Spider, “I was toying with a more radical solution.”

“Which is?”

“We could give different exams to different students.”

“What a revolutionary idea!” I exclaim. “Nobody has ever thought of this before!”

“From each according to his abilities …”

“From each according to his needs,” we chant in unison.

I am impressed: this Spider is clearly well-educated, not just in mathematics. She knows her clichés and sayings.

“Does that mean,” I ask, “if an awesome student correctly answers 40 assignment questions in a row then he will get a very difficult exam?”

“Exactly.”

“Hang on, what if an awesome student deliberately flunks the assignments …”

“Well … we could give the exam less weight than assignments,” the Spider responds somewhat nervously. “Then there is no advantage to tanking the assignments.”

“That’s Dandy!”

“For this to work,” continues Ms. Spider, “we have to come up with some way of measuring the difficulty of certain questions.”

“I understand.”

I mull over this for a while. We all know that students can be graded according to some chosen system. For instance, a math student can be Outstanding, Exceeds Expectations, Acceptable, Poor, Dreadful or Troll. But how can we grade certain questions?

The Spider writes two math questions on a sheet of paper:

“Which of these problems is harder?” asks Ms. Spider.

“I think both are equally easy. After all, I participated in the International Mathematical Olympiad many years ago.”

Somehow, I think that was not the answer Ms. Spider expected.

Behind us, a monkey, eagle, mouse, elephant, lion and jackal are enjoying some Texas Hold’em. As usual, the monkey has squandered all his chips early, and the Eagle is schooling the rest of the field, having accumulated more than half the chips in play. The Spider eyes them warily: clearly they should not be privy to our discussion.

“You see,” says Ms. Spider. “Sometimes I find it hard to judge the difficulty of a single question. For instance, I expect problem X to be easier than Y, but for some reason the reverse holds when I mark the assignments.”

I mull over Ms. Spider’s words. I am not really in a position to judge, given I have never marked any student assignments.

“I have an idea,” says Ms. Spider. “Let’s draw a table.”

“For simplicity,” says Ms. Spider. “Let’s assume each question is either marked correct or not correct, hence there are no partial marks. I use blank instead of 0 for ease of reading. Sam is an awesome student since she answered most questions correctly. Owen is a Stupid student because he only scored 2 out of 9. Each individual is represented by a single row.”

“Okay.”

“But there is no reason we can’t do the same with columns, if you’ll pardon the double negative. For instance, only six people solved problem 8 but nine solved problem 9. Therefore problem 8 is harder than problem 9 …”

“So even if you don’t understand the questions themselves, you can still say things like Debbie is better than Anna.”

“Exactly,” replies Ms. Spider.

“With 18 students and 9 problems, you don’t have a lot of data.”

It’s a stupid observation, I know – but I am only trying to buy time as I try to digest her ideas.

“Well, the same logic applies if we had 1800 students and 900 problems.”

“I think I understand,” I say. “It’s like some kind of Mechanical Turk. Previous students have tried these questions (and of course you don’t have to pay them to do these exams!), so you can work out which questions are easy or hard.”

“Wasn’t the Mechanical Turk some kind of fake chess-playing machine by Wolfgang von Kempelen? What a disgraceful idea! I would never try to cheat chess players like that”.

Okay, didn’t see that one coming. We need to agree on a definition of Mechanical Turk.

“Do you think your students will eventually find out their exam papers are different?”

“That shouldn’t be an issue,” says Ms. Spider, as she squirms in her seat. “If a poor student finds out, he has no reason to complain. If a good student finds out then deep down in his heart he already knows he is better than the poor student, so the exam result doesn’t matter.”

Somehow I think her logic is very, very unsatisfactory. But I do know that many of the greatest insights come precisely from those who are willing to suggest ideas that sound utterly outrageous. For instance Rivest, Shamir and Adleman are your average computer scientists, but with a bit of luck they might one day become famous, known to every student of cryptography. So I should cut her some slack.

In fact, I am more than looking forward to the results of her revolutionary teaching methods. After all, I’m not the teacher and I don’t set the exams. I was especially careful not to suggest any drastic ideas of my own. If the radioactive 83,72,73,84 hits the fan and grows to fill the size of the entire house then I am more than happy to watch, knowing my 65,82,83,69 is fully covered.

Bring. It. On.

# The WHAT, HOW and WHY of biased Spider Solitaire servers

In my paper on Spider Solitaire, I presented evidence that Opaque Solitaire(*) was biased: if a player won too many games, the random number generator would favour difficult hands for future games. However, I didn’t discuss why or how a software designer would do this. These are definitely valid questions, and they will be the topic of this post.

(*) Opaque Solitaire is not the real name of the server.

Steve Brown, the author of Spider Solitaire Winning Strategies, gives good reasons why most Spider Solitaire implementations will not rig deals if the player wins too much.

Steve Brown’s argument boils down to three main points.

• The WHAT: claims of bias on internet forums cannot be substantiated with any evidence.
• The HOW: It is not clear how to rig the random number generator without significant effort
• The WHY: Why would a software developer 80,74,83,83 off its users?

However, for the Opaque Solitaire program in question I can (partly) refute Steve’s argument.

## The WHAT

Obviously I have solid evidence Opaque Solitaire is biased: my paper is based on statistical hypothesis testing. I obtained a “magic number” that happened to be statistically significant at the alpha = 0.05 level. If you want more details, you know what to do 😊 The more interesting point of this post concerns the HOW. Suppose you wanted to design a Spider program that can adjust the difficulty level according to the player’s skill.

## The HOW

Let’s say you compiled a dataset of 100,000 hands (initial game states), and you wish to estimate the difficulty level of each hand. For simplicity assume difficulty is represented as an equity between 0 and 1. For instance, an equity of 0.58 means you are willing to bet 58 cents if a win pays \$1. Hence higher equity means the game is easier to win. Getting an expert human player to test 100,000 hands is clearly impractical. One can write a program to automatically assess the equity of each hand, but that runs into trouble too. For instance, the ranks of the first 10 cards showing are a poor indicator of equity, since it is quite possible to overcome a poor start (or, conversely, for a good start to sour).

But why not get the players themselves to estimate the equity for you? Consider the following hypothetical table:

There are 18 players and 9 games. For each game and player, the result is either WIN (1), LOSS (0) or NOT PLAYED (blank), and I have color-coded the results for ease of reading (any subliminal messages or easter eggs you find are purely coincidental!). Most of the games have been played, so only a few data points are missing. For instance, we can deduce that Sam is a pretty good player since he won 7 games out of 9, whereas Owen and Fiona are much worse. Similarly, we can look at individual columns and say e.g. Games 3 and 8 are equally easy or hard since each has 6 wins, 11 losses and 1 not played. Games 6 and 9 are easier since more players beat them. We can therefore decide Game 9 is suitable for Owen, because Owen is a poor player and wants to play easier hands. But we would not assign Game 5 to Isabella, since that is relatively hard. One can think of this table as a Mechanical Turk, except the crowdworkers don’t find the tasks very onerous because for some reason they actually enjoy playing Spider Solitaire 😊
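The row and column counting can be sketched in a few lines. The results below are a small made-up table, not the 18-player table above: 1 = WIN, 0 = LOSS and `None` = NOT PLAYED. The “closest win rate” rule for picking a hand is also an illustrative assumption. It is one simple way to give strong players hard hands and weak players easy ones.

```python
from typing import Dict, List, Optional

# Hypothetical results: results[player][game] is 1 (win), 0 (loss) or None (not played).
results: Dict[str, List[Optional[int]]] = {
    "Sam": [1, 1, 1, 0, 1],
    "Debbie": [1, 1, 0, 1, None],
    "Anna": [1, 0, 0, 1, 0],
    "Owen": [0, 0, 0, 1, None],
}


def win_rate(outcomes: List[Optional[int]]) -> float:
    """Fraction of played games that were won (None entries are ignored)."""
    played = [x for x in outcomes if x is not None]
    return sum(played) / len(played) if played else float("nan")


# Rows rate the players; columns estimate each game's equity.
player_skill = {name: win_rate(row) for name, row in results.items()}
n_games = len(next(iter(results.values())))
game_equity = [win_rate([row[g] for row in results.values()]) for g in range(n_games)]


def suggest_game(player: str) -> int:
    """Made-up rule: pick the game whose equity is closest to 1 - player's win rate."""
    target = 1 - player_skill[player]
    return min(range(n_games), key=lambda g: abs(game_equity[g] - target))


print(player_skill)  # {'Sam': 0.8, 'Debbie': 0.75, 'Anna': 0.4, 'Owen': 0.25}
print(game_equity)  # [0.75, 0.5, 0.25, 0.75, 0.5]
print(suggest_game("Owen"), suggest_game("Sam"))  # 0 2
```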

Note that implementing this does not require us to know much about Spider Solitaire strategy. The results of the table speak for themselves. For instance Debbie “dominates” Anna because whenever Anna won a particular hand, so did Debbie. But the reverse is false. Hence we know Debbie is a better player than Anna.
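The dominance test can be written down directly. This sketch assumes we compare only the games both players have actually played. It treats “dominates” as: never lost a game the other player won, and won at least one game the other lost. The two result lists are made up.

```python
from typing import List, Optional


def dominates(a: List[Optional[int]], b: List[Optional[int]]) -> bool:
    """True if player a dominates player b (1 = win, 0 = loss, None = not played).

    Only games both players played are compared: whenever b won, a also won,
    and a won at least one game that b lost.
    """
    both = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    never_worse = all(x >= y for x, y in both)
    sometimes_better = any(x > y for x, y in both)
    return never_worse and sometimes_better


debbie = [1, 1, 0, 1, None]  # made-up results, not the table above
anna = [1, 0, 0, 1, 0]
print(dominates(debbie, anna), dominates(anna, debbie))  # True False
```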

Obviously a small number of data points is not reliable, but it’s not hard to imagine a similar table for a much larger number of players and games. Note that it is not necessary for every player to play every hand for this to work. Anyways, you get the gist. Assuming your Spider Solitaire program is online and you are able to store cookies, you can keep tabs on which players are better than others and which hands are easy or hard. Hence you can assign hands of the “correct difficulty” to different players. There might even be a Ph.D. or two in this 😊

I’m not saying this is the best method to rate the difficulty of individual spider hands, but this is one way to do it.

## The WHY

As for the WHY, my best guess is the developer(s) of Opaque Solitaire wish to challenge a good player and not bore him with too many easy hands. Unfortunately statistical testing can only say the data is fishy, and cannot answer why someone would “make the data fishy”, if you will.

I’ve seen forums where players accuse Microsoft Hearts of cheating. Some players claim that MS must compensate for the AI’s poor strategy by collusion. Others say MS does this because the software designers have good intentions but don’t understand the expectations of players. I agree Joe Bloggs probably knows nothing about Statistics and he is probably on tilt after losing three hands in a row. But when Jane Citizen accuses the same program of reneging or playing a card known to be held by someone else then you know you’ve got issues. I haven’t played much Microsoft Hearts but I’m siding with the chumps. For the same reasons, I would not lightly diss anyone who complains about rigged games, Spider Solitaire or otherwise (NOTE: the MS Hearts forums may be out of date and the software may have changed for the better, but you get the gist). Since my paper was successfully published, I believe the onus of proof should rest on software developers: they should make it easy to test if a program is biased or not.

## Summary

In summary, I believe most Spider programs are kosher, but not Opaque Solitaire. One word of warning: I do not have conclusive evidence that Opaque Solitaire deliberately rigs the cards, because that’s the nature of Statistics: hypothesis testing alone does not equal conclusive proof. If your p-value is less than 0.05, you might still have obtained “unlucky data” that leads you to the wrong conclusion. Still, p < 0.05 can be used to justify further testing, and the point of my paper was to show that the bias exists and can be quantified using statistics.

# Monkey wins Man vs Wild

And it’s over! Spider GM played well to win 7 games, but monkey went one better with 8 victories. Most of the games were easy, but there were a few exceptions. Game 3 was just horrible: the final round was 833A5A8jk4, which is close to an instant loss (assuming no lucky suited connector such as a 3 landing on a 4 of the same suit). And that was not the only problem. Both human and monkey “agreed” on every game (i.e. both win or both lose) except game 4. Spider GM never found an empty column since the tactics were all wrong. Even so, 64% by the monkey was not the most convincing of victories. The conclusion is that the monkey’s win rate should have some correlation with the results of an expert playing 4-suit solitaire. In other words, the effects of making random moves and playing at the one-suit level pretty much cancel each other out.

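| Game | Spider GM | Monkey |
|---|---|---|
| 1 | win | 89% |
| 2 | win | 64% |
| 3 | lose | 26% |
| 4 | lose | 64% |
| 5 | win | 67% |
| 6 | win | 72% |
| 7 | win | 79% |
| 8 | win | 55% |
| 9 | win | 62% |
| 10 | lose | 36% |
| Points | 7 | eight |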

Of course 10 trials is not a lot of data, and perhaps I need more games to say for sure. Also, lesser players may find similar results, but the threshold should be e.g. 30% not 50%.
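Here is a quick way to recount the match at different thresholds. It assumes the monkey scores a point whenever its estimated win rate is above the threshold, which reproduces the 7 and eight totals in the table at 50%. The data is copied from the table above, and 30% is the lower threshold suggested for lesser players.

```python
# (game, Spider GM result, monkey's estimated win rate), from the table above
games = [
    (1, "win", 0.89), (2, "win", 0.64), (3, "lose", 0.26), (4, "lose", 0.64),
    (5, "win", 0.67), (6, "win", 0.72), (7, "win", 0.79), (8, "win", 0.55),
    (9, "win", 0.62), (10, "lose", 0.36),
]


def score(threshold: float = 0.5):
    """Return (Spider GM points, monkey points, games where they disagree)."""
    gm_points = sum(1 for _, result, _ in games if result == "win")
    monkey_points = sum(1 for _, _, rate in games if rate > threshold)
    disagree = [g for g, result, rate in games
                if (result == "win") != (rate > threshold)]
    return gm_points, monkey_points, disagree


print(score(0.5))  # (7, 8, [4])
print(score(0.3))  # (7, 9, [4, 10])
```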

Congrats to Monkey for a well-earned win and commiserations to Spider GM.

BTW If anyone knows how to change the word “eight” in the above spreadsheet into a number 8 without the unwanted percentage sign, please let me know 🙂

# Graphical Illustration of Random Walks

Once upon a time, Odysseus was trapped in a maze and wanted to find his way back to Ithaca. Hang on, is that Odysseus and the Sirens or Theseus and the Minotaur? Whatevs, I was never any good with Greek Mythology. Without loss of generality, assume it really is Odysseus and the Sirens.

Let us denote a maze as a finite set of nodes, where O represents Odysseus’ starting position, I represents Ithaca and S represents a Siren. Odysseus wins if he reaches Ithaca and loses if he reaches a Siren. We will also assume that a legal move consists of Odysseus moving from one node to another adjacent node in a downward direction. Any “terminal” node is marked either I or S. This implies that “draws” are impossible i.e. the game cannot continue indefinitely without a result. Example mazes are shown below:

Clearly, if Odysseus knows the entire maze layout then he is guaranteed to win (if at least one winning path exists) or lose (if no such path exists). But if the maze layout is not known then Odysseus must guess which path to take. For instance, Odysseus might approach a dozen Sirens if they are singing Benjamin Britten’s Missa Brevis or get the 70-95-67-75 outta here if they are improvising karaoke with a Richard Clayderman song. Note that I didn’t claim that this was a good strategy.

The simplest strategy is a random-walk strategy. At any node we can compute the number of legal moves available to Odysseus and stipulate that each move occurs with equal probability. One can easily see that Odysseus should probably win in the left maze and probably lose in the right maze. Calculating the exact probabilities is left as the proverbial exercise for the reader.
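For anyone who wants to try that exercise mechanically, the exact random-walk probability on a downward-only (acyclic) maze follows from a short recursion. A terminal node scores 1 or 0, and any other node averages over its children. The maze below is hypothetical, not one of the pictured ones. The naming convention (terminal nodes start with “I” or “S”, other nodes don’t) is my own assumption.

```python
from functools import lru_cache
from typing import Dict, List

# Each node lists the nodes reachable in one downward move.
# Terminal node names start with "I" (Ithaca, a win) or "S" (Siren, a loss);
# non-terminal node names must not start with either letter.
Maze = Dict[str, List[str]]


def win_probability(maze: Maze, start: str = "O") -> float:
    """Exact chance that a uniform random walk from `start` reaches Ithaca.

    Assumes the maze is acyclic (moves only go downward), so the recursion ends.
    """

    @lru_cache(maxsize=None)
    def p(node: str) -> float:
        if node.startswith("I"):
            return 1.0
        if node.startswith("S"):
            return 0.0
        moves = maze[node]
        return sum(p(nxt) for nxt in moves) / len(moves)

    return p(start)


maze = {"O": ["a", "b"], "a": ["I1", "S1"], "b": ["S2"]}
print(win_probability(maze))  # 0.25
```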

In Spider Solitaire we can (in theory at least) enumerate the set of all legal game states reachable from a given start position and it will look like the Odysseus-Siren maze except it will obviously have a ridiculous number of nodes and links. We can say that a game is “easy” if it resembles the left maze and “hard” if it resembles the right. If we can compute (or estimate) exact probabilities of winning then we can have a quantitative measure of difficulty (probability of winning) rather than a categorical measure (easy/medium/hard). Obviously this will not be appropriate for human players since humans understand basic Spider Solitaire strategy (such as obtaining empty columns). But it might be appropriate for monkeys playing at the 1-suit level. At least we have a ball-park estimate of how easy or difficult a given hand should be.

The diagram below (left) is an example endgame where an expert player has removed six suits and only needs two more moves to win. For purposes of this example we will assume that (i) empty columns cannot be used and (ii) if the player fails to win the game in two moves then it’s an automatic loss. One can verify the Odysseus-Siren maze looks like the figure below (right). The probability of winning in two moves with random play is 50% (since only the first move is relevant and there are 2 good moves out of 4).

As an aside, one should note in practical play, game states in Spider Solitaire are not explicitly labelled “LOSE” since a player can resign even if there are one or more legal moves. But one can easily tweak things e.g. by imposing a move limit and saying that a game is automatically lost if we cannot win in N moves or less. Obviously in the above example it is impossible to reach an unwinnable state even if we tried. Hence a random move algorithm is practically certain to win if the move limit was (say) 50 instead of 2.
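When the full game tree is too big to enumerate, the same idea can be estimated by Monte Carlo with a move limit. This sketch uses a generic interface (`legal_moves`, `apply_move`, `is_win` are hypothetical stand-ins for a Spider engine). The toy position only loosely mimics the endgame above: 2 of the 4 first moves set up a win, and the other 2 waste a move and return to the start.

```python
import random
from typing import Any, Callable, Optional, Sequence


def estimate_win_probability(
    start: Any,
    legal_moves: Callable[[Any], Sequence[Any]],
    apply_move: Callable[[Any, Any], Any],
    is_win: Callable[[Any], bool],
    max_moves: int,
    trials: int = 10_000,
    rng: Optional[random.Random] = None,
) -> float:
    """Monte Carlo estimate of P(random play wins within max_moves moves).

    A trial counts as a loss if it is not won within the move limit, or if
    a position with no legal moves is reached first.
    """
    rng = rng or random.Random()
    wins = 0
    for _ in range(trials):
        state = start
        for _ in range(max_moves):
            if is_win(state):
                break
            moves = legal_moves(state)
            if not moves:
                break
            state = apply_move(state, rng.choice(moves))
        if is_win(state):
            wins += 1
    return wins / trials


# Hypothetical toy position: a move is simply the name of the next state.
graph = {
    "start": ["good1", "good2", "bad1", "bad2"],
    "good1": ["won"], "good2": ["won"],
    "bad1": ["start"], "bad2": ["start"],
    "won": [],
}
rng = random.Random(42)
interface = (lambda s: graph[s], lambda s, m: m, lambda s: s == "won")
p2 = estimate_win_probability("start", *interface, max_moves=2, rng=rng)
p50 = estimate_win_probability("start", *interface, max_moves=50, rng=rng)
print(round(p2, 2), round(p50, 2))  # roughly 0.5 and 1.0
```

With a 2-move limit the estimate sits near 50%. With a 50-move limit a wasted move no longer matters, which matches the point above about raising the move limit.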

Let us suppose that Joe Bloggs plays a game of Spider Solitaire without undo and loses. He then proceeds to replay the same game but with unlimited 85,78,68,79. Joe Bloggs is able to win (and hence record the identity of every unseen card). JB can then estimate the difficulty of the hand using the random walk method described above. If the probability of winning is 80% then there is a fair chance JB misplayed when playing without undo. If the probability is only 20% then chances are JB was destined to lose even with expert play. This could give Joe Bloggs a better estimate of his playing strength than simple win-rate.

# Is Spider Solitaire Out To Get You?

To kick things off, here is a simple number puzzle:

Traditional expert advice says if you are having a 76,79,83,73,78,71 streak then it’s probably time to take a break from the game and return when you are in a better frame of mind. Of course every man, dog and millipede on the planet knows the best revenge is PROVING THE GAME IS RIGGED. But that’s easier said than done, if you’ll excuse the cliché.

To prove the game is rigged, people use one of two methods: the Scientific Method or the Internet Forum Method. The IFM is easier to understand, more convenient and more likely to produce the desired results, but today I prefer to discuss the SM.

I’ve seen a forum or two where card players think Microsoft Hearts is rigged. For those in the dark, Hearts is a trick-taking game – or more correctly a trick avoidance game where the aim is to avoid taking penalty cards. The rules will not be discussed in detail and are easily found online. Some players have accused MS programmers of rigging the cards if they win too often, to compensate for the poor playing strength of the three AI players (to be fair Microsoft isn’t the only one accused of cheating, and it is also possible things have moved on and the forums are seriously out of date).

In the Scientific Method we need statistical hypothesis testing. I must admit I never formally studied statistics in my uni days, and I only began to truly appreciate what it all meant when I was playing Spider Solitaire, for research purposes.

Roughly speaking, to prove a game is biased we wanna design a test that is

(1) Pre-determined

(2) Well-defined

(3) Measurable

Because Hearts is better known than Spider, I will discuss the former game. Of course similar considerations would apply to Four-Suit spider.

Let us suppose that Joe Bloggs plays a number of hands of Hearts against three AI players. A “game” is a series of hands until one player gets 100 penalty points or more, and the winner is the player with the fewest points. A win occurs if and only if Joe Bloggs has the fewest penalty points after the game. If we ignore ties and all four players were equally strong, then Joe Bloggs should win 25% of the time. But since the AI players are weak, Joe Bloggs expects to win 50% of the time. However, to simplify matters, we assume Joe Bloggs is only interested in individual hands. Joe gets the following results:

11010 01101 00010 00101

where 1 represents a “good” hand and 0 is a “bad” hand. Let us assume for the moment that we have a sound definition of what makes a hand good or bad. JB is careful to define good/bad hands (which are player-independent) rather than use wins or losses – to remove the possibility that Joe is falsifying his results, intentionally or otherwise (e.g. by playing on tilt after a 76,79,83,83).

“Hmmm” … thinks Joe. “I started with 60% good after 10 hands but the next 10 hands had only 30% good. Therefore something is wrong.”
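Joe’s arithmetic is easy to check by splitting his results string in half and counting the 1s (good hands):

```python
results = "11010 01101 00010 00101".replace(" ", "")
first, second = results[:10], results[10:]
for label, half in (("first 10", first), ("next 10", second)):
    print(label, f"{half.count('1') / len(half):.0%} good")
# first 10 60% good
# next 10 30% good
```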

But why did JB stop at 20 hands? Did he sign a contract saying he will play exactly 20 hands in front of three senior witnesses working at Microsoft, or did he feel like stopping now because that would conveniently corroborate his gut feel the program is rigged? I guess only Joe Bloggs would know the answer to that question.

If Joe had stopped at 10 hands then he wouldn’t have noticed any difference between the first and second halves of his results (both halves would be 60% good). If Joe had stopped after 50 hands then maybe his luck would even out and Joe would realise the software is not rigged after all. Or perhaps his luck would deteriorate even further. The important point is of course that JB must pre-determine the number of hands before playing, otherwise his results are invalid.

Suppose we define a hand as bad if it is impossible to avoid taking Ukula(*), assuming all opponents gang up on JB and the hand is played double-dummy (for simplicity we ignore the possibility of shooting the moon). In theory this is well-defined, but it is not trivial to determine whether a given hand allows JB to avoid taking Ukula when played double-dummy. So we need a different definition of good/bad hands.

(*) Ukula is the name of a well-mannered Pitjantjatjara-speaking woman who I befriended many years ago. Other players will probably have a different and more colorful name for the card that is worth 13 penalty points.

Joe Bloggs knows that many of his friends like to complain about the 2,3,4,5 of hearts being in separate hands. Assuming hearts are not broken, it is mathematically impossible for the 5 of Hearts to win a trick unless (i) somebody is void or (ii) the 2,3,4,5 are in different hands. The probability of event (ii) is

$$\frac{52}{52} \times \frac{39}{51} \times \frac{26}{50} \times \frac{13}{49} \approx 0.1055$$

Let us ignore heart voids and say a hand is bad if the 2,3,4,5 of hearts are in different hands, else it is good.

But wait! In Hearts, most hands begin with each player passing three unwanted cards to an opponent before play begins. Of course players will not pass random cards, so to simplify things we only consider hands with no passing. Since only one hand in four has no passing, we need to play four times as many hands to get the required data.

Assuming that the random number generator is truly random we can say this experiment is “measurable” in the sense that we can obtain a probability of certain events. For instance, if we played 2 hands the probability that both are bad is 0.0111 and the probability that at least one is bad is 0.1999.

More complex calculations are possible. For instance, if 20 hands are played then the chances of exactly 5 bad hands is 0.0381 and the chances of at least 5 bad hands is 0.0525.
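These figures follow from treating each no-passing hand as an independent trial with bad-hand probability p ≈ 0.1055, i.e. a binomial model. That independence is exactly what a fair random number generator should give. Here is a short check that reproduces the numbers above:

```python
from fractions import Fraction
from math import comb

# Probability that the 2, 3, 4 and 5 of hearts all land in different hands.
p_bad = Fraction(52, 52) * Fraction(39, 51) * Fraction(26, 50) * Fraction(13, 49)
p = float(p_bad)
q = 1 - p


def prob_exactly(k: int, n: int) -> float:
    """Binomial probability of exactly k bad hands in n independent hands."""
    return comb(n, k) * p**k * q**(n - k)


def prob_at_least(k: int, n: int) -> float:
    """Probability of k or more bad hands in n independent hands."""
    return sum(prob_exactly(i, n) for i in range(k, n + 1))


print(f"{p:.4f}")  # 0.1055
print(f"{p * p:.4f}", f"{1 - q * q:.4f}")  # 0.0111 0.1999
print(f"{prob_exactly(5, 20):.4f}", f"{prob_at_least(5, 20):.4f}")  # 0.0381 0.0525
print(prob_at_least(5, 20) < 0.05)  # False: not significant at alpha = 0.05
```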

What we are actually measuring is called a p-value. Assuming the null hypothesis is true, the p-value is the probability that we would have observed “our actual result” (or, more usually, a result “at least as bad as our actual result”). If this p-value is less than 0.05 then we say the result is statistically significant at the alpha = 0.05 level. If it was less than 0.01 then it would be statistically significant at the alpha = 0.01 level. Of course alpha must be pre-determined, otherwise we are back to the first problem of a test that is not pre-determined.

Our final test would be something like the following:

One final warning: A p-value less than alpha does not imply conclusive evidence. For instance, we may have been very lucky/unlucky and the Random Number Generator gods gave us evidence the game is rigged when in fact it wasn’t rigged after all. But it may enable us to justify further testing – which may then lead to conclusive evidence.

As a chess analogy: suppose Wile E. Cheetah beats three grandmasters in consecutive games. The organisers suspect cheating because there is too much correlation with the moves of top chess engines. They perform complex calculations in their heads and find p < 0.05. The organisers then force Wile E. Cheetah to play the next few rounds without his “lucky EMCA headphones” (i.e. further testing). Sure enough, W. E. C. learns the hard way that 1 e4 c6 2 d4 d5 3 Nc3 dxe4 4 Nxe4 Nd7 5 Bc4 Ngf6 6 Ng5 e6 7 Qe2 h6?? is not the main line in the Caro-Kann and confesses to everything.

Yes, incidents like these have happened in top-level chess. Googling examples is left as an exercise for the reader.

So there you have it. To derive the final test, we needed to have some knowledge of the game itself (Hearts or Spider) and some basic statistical theory (e.g. hypothesis testing), and we needed to take some care to make sure our experiment is sound. After all it’s hard to prove something is rigged if your experiment is itself rigged!

DISCLAIMER: I have not studied stats formally at uni, so I won’t be surprised if someone can explain hypothesis testing much better than I did here. If you aced statistics at university or high school and found reading this was way beneath your dignity, then congrats; perhaps you should start writing your own blog and I should be the one learning from you 😊

Congrats on reaching the end of this post and fully comprehending every word! Here is your reward:

# Evaluating a Start Position

We now consider the following question: how can we evaluate a starting position? That is, given an initial game state with 10 exposed cards, how do we determine whether our chances of winning are good, average or poor? Can we quantify our winning chances as a percentage (e.g. 58%)?

NOTE: evaluating a start position is useful since most Spider Solitaire implementations allow the player to abandon a game without counting it as a loss. But if you are serious about improving your game, I strongly recommend you never abandon games with a poor initial state.

A first thought may be to look for “features” of a game state. For instance, suppose we are watching some top-quality chess at an unhealthy time of the day. We might notice that

• White has an extra pawn
• The position is an endgame: both sides want to activate their king without fear of suddenly being mated.
• Black’s rook and king are more active than their opposite numbers
• Both sides have vulnerable pawns

Bear in mind we are only identifying individual features at this early stage. Eventually we may wish to formulate an overall assessment by combining these features somehow, but that comes later.

QUESTION: What are plausible features to use in an opening game state in Spider Solitaire?

Avid readers of this blog (yes, you!) would immediately identify “guaranteed turnovers” as a possible feature. In the above diagram, you should be able to quickly identify 5 turnovers. Of course every man, dog and millipede on the planet knows that building in-suit is even more desirable. In this case we have Q-J in spades and 2-3 in clubs. Therefore we have 2 guaranteed suited turnovers (and hence 3 off-suit turnovers).
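Here is one way the two turnover counts might be computed. It is a sketch under simplifying assumptions, not necessarily the definition used by any particular program: cards are hypothetical `(rank, suit)` tuples with Ace = 1 and King = 13, there are no empty columns, and every newly flipped card is assumed to be useless. Under those assumptions each card of rank r can be moved onto a different card of rank r + 1 (moving the higher cards first), so the guaranteed count is the sum over adjacent ranks of min(#r, #(r+1)); restricting the matching to a single suit gives the suited count.

```python
from collections import Counter

# Hypothetical representation: a card is (rank, suit), Ace = 1 ... King = 13.

def guaranteed_turnovers(up_cards):
    """Worst-case flips from the face-up cards (no empty columns,
    every newly exposed card assumed useless)."""
    counts = Counter(rank for rank, _ in up_cards)
    # Each rank-r card needs its own rank-(r+1) target; Kings have nowhere to go.
    return sum(min(counts[r], counts[r + 1]) for r in range(1, 13))

def guaranteed_suited_turnovers(up_cards):
    """Same idea, but a move only counts if the target has the same suit."""
    counts = Counter(up_cards)
    suits = {suit for _, suit in up_cards}
    return sum(min(counts[(r, s)], counts[(r + 1, s)]) for s in suits for r in range(1, 13))

# A made-up layout (not the one in the diagram): Q-J of spades and 2-3 of clubs are suited.
layout = [(12, "S"), (11, "S"), (3, "C"), (2, "C"), (7, "H"),
          (6, "D"), (9, "S"), (9, "H"), (1, "D"), (13, "C")]
print(guaranteed_turnovers(layout))         # 5
print(guaranteed_suited_turnovers(layout))  # 2
```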

Finally, we can look at rank multiplicity. All players know that having too much of one rank can be a problem, especially when the adjacent rank is in short supply. You don’t need a Ph.D. in economics to work out things are less than ideal when the Spider Solitaire gods have supplied five Jacks on the opening deal and there is little demand for them. For simplicity, let us define the rank multiplicity as the number of cards of the most frequent rank. For instance, the above diagram has a rank multiplicity of 2 since we have two Threes/Deuces and no rank appears more than twice. In summary:

• We have 5 guaranteed turnovers
• We have 2 guaranteed suited turnovers
• The rank multiplicity is 2.
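Rank multiplicity is even easier to compute. A sketch, again using hypothetical `(rank, suit)` tuples with Ace = 1 and King = 13:

```python
from collections import Counter

def rank_multiplicity(up_cards):
    """Number of cards of the most frequent rank among the face-up cards."""
    return max(Counter(rank for rank, _ in up_cards).values())

# Made-up examples: one rank appearing twice, and the dreaded five Jacks.
print(rank_multiplicity([(12, "S"), (11, "S"), (3, "C"), (2, "C"), (7, "H"),
                         (6, "D"), (9, "S"), (9, "H"), (1, "D"), (13, "C")]))  # 2
print(rank_multiplicity([(11, "S"), (11, "H"), (11, "D"), (11, "C"), (11, "S"),
                         (5, "H"), (4, "D"), (8, "C"), (2, "S"), (10, "H")]))  # 5
```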

There may be other features to consider, but we’ll keep things simple for now.

Are these values good, bad, or average? It turns out one can use simulation to answer this question. For instance, if I had nothing better to do, I could play 10 million games of Spider and find that the number of guaranteed turnovers is 3.97 on average.

Of course the lazy solution is to write a computer program to do the simulation for me. The program can simply deal 10 cards, do the relevant calculations and then abandon the game. An even lazier solution is to copy the results from Steve Brown’s excellent book Spider Solitaire Winning Strategies. He got the following results:

Looking at these graphs, I would immediately dismiss rank multiplicity as a useful feature (the entry for 5 is non-zero but is too small to be visible). After all, more than 90% of games will have a value of 2 or 3! It is true that one can tweak rank multiplicity somehow (e.g. giving more weight to Aces and Kings, which are the bugbears of most players), but I wanted to keep things simple for the time being. The important point is that these quantities are easily obtained via simulation.
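The “lazy solution” might look something like the Python sketch below. It assumes that the 10 face-up cards of a fresh deal are a uniformly random 10-card sample from the 104-card, two-deck pack, and it counts guaranteed turnovers under a simplified definition (each card of rank r needs its own card of rank r + 1, there are no empty columns, and every flipped card is assumed useless). The card representation and function names are hypothetical. If this definition matches the one behind the 3.97 figure, the simulated average should come out close to it; the sketch has not been checked against Steve Brown’s figures.

```python
import random
from collections import Counter

# Four-suit Spider uses two standard decks: every (rank, suit) appears twice.
# Cards are hypothetical (rank, suit) tuples, Ace = 1 ... King = 13.
DECK = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"] * 2

def guaranteed_turnovers(up_cards):
    """Simplified worst-case flips: each rank-r card needs its own rank-(r+1) target."""
    counts = Counter(rank for rank, _ in up_cards)
    return sum(min(counts[r], counts[r + 1]) for r in range(1, 13))

def rank_multiplicity(up_cards):
    """Number of cards of the most frequent rank."""
    return max(Counter(rank for rank, _ in up_cards).values())

def simulate(trials, seed=0):
    """Deal `trials` opening layouts; return mean turnovers and a multiplicity histogram."""
    rng = random.Random(seed)
    total_turnovers = 0
    multiplicity_hist = Counter()
    for _ in range(trials):
        up_cards = rng.sample(DECK, 10)  # the 10 face-up cards, then "abandon the game"
        total_turnovers += guaranteed_turnovers(up_cards)
        multiplicity_hist[rank_multiplicity(up_cards)] += 1
    return total_turnovers / trials, multiplicity_hist

TRIALS = 20_000
mean_turnovers, hist = simulate(TRIALS)
print(f"average guaranteed turnovers: {mean_turnovers:.2f}")
for value in sorted(hist):
    print(f"rank multiplicity {value}: {hist[value] / TRIALS:.1%}")
```

The histogram also gives a quick way to check the claim that more than 90% of deals have a rank multiplicity of 2 or 3; raising `TRIALS` tightens both estimates at the cost of run time.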

Suited turnovers are nice, but I think it’s more important to have many turnovers at the start of the game. In other words, quantity is more important than quality. In the above example, we have 5 guaranteed turnovers and 2 suited, both of which are above average. Hence if given a choice, I would take this position over a random position.

If you are a beginner, I would estimate that: