One for the Math Geeks (alternative version)

Simon: Seven of Clubs

Julie: Four of Diamonds

Julie: Jack of Clubs

Simon: Three of Diamonds

Webb: That’s Spiderwang!

Audience: (canned laughter)

Julie sighed audibly. This was the 100th game she had lost in a row. On 57 occasions Simon called a card and it was Spiderwang. On the other 43 occasions, Julie called a losing card and the host declared it was Not Spiderwang. Yes, Julie could rot13(xvpx Fvzba’f nefr) at regular Spider Solitaire and a game with more dependency on luck was required to prevent things becoming boring. But a game show with unexplained rules was taking things too far. Julie had to reluctantly consider the possibility that Simon might be (gasp!) cheating with the cooperation of the host.

Julie discussed her predicament with her trusted group of Very Smart Friends. They eventually agreed on a plan. On the next few episodes her VSF would record the footage, take meticulous notes and work out the pattern. Once the pattern was figured out Julie could at least hope to fight on equal terms.

After burning much midnight oil, a pattern was found. Webb would declare Spiderwang whenever every card of a single suit was named at least once. Webb would declare Not Spiderwang if any single card was named for the third time.

Eventually some Math Ph.D.s decided to join in the fun. They worked out that if nobody named the same card thrice and the probability of a new card being named was double the probability of a card already named once, then Spiderwang should occur after 66.5 cards on average. The median was 67.0. The theoretical minimum was obviously 13 cards, and if e.g. no Kings were named within the first 96 cards we would achieve the maximum of 97 cards. On the last hand, Simon achieved Spiderwang after 60 cards, which corresponded to the 23rd percentile.

On the next day Julie is pleased when the host announces that Simon is going first. Her group of Very Smart Friends have done the math and with best play she would win her first game after 90 cards.

Julie takes out pen and paper for round 1. Curiously, neither Simon nor Webb objects to her taking copious notes during the game. Then again, nothing on this game show makes much sense to her anyway.

Julie: Four of Diamonds

Simon: Queen of Clubs

Rot13(jung gur shpx) – why would Simon throw away round 1 just like that? Of course, it takes less than three nanoseconds for the host to confirm her worst fears.

Webb: That’s Wangerspy!

Audience: (canned laughter)

One for the Math Geeks

After the world’s most intense game of Spider SOLITAIRE played by Spider GM, two International Masters and three coloured blobs from Among Us, followed by some rather detailed post-mortem analysis, I think now is a good time for some light relief with some rot13(zngurzngvpny znfgheongvba). More specifically, we answer the following question: if cards are dealt one at a time from a perfectly shuffled deck, how long do we have to wait until a full suit appears?

The relevance to Spider Solitaire players should be pretty clear: we have absolutely no control over what cards turn up. If, for example, there are no Queens in the first 20 cards then the chances of the next turnover being a Queen are exactly 8 in 84 – no amount of skilled or unskilled play can fight the basic laws of probability. All you can do is hope to mitigate the effects of not getting any Queens in the first 20 cards with skilful play. Similar considerations apply if you’re in the middle-game desperately waiting on the Three of Diamonds to complete a full suit.

If we insist on shifting the odds, we can commit the cardinal sin of Spider Solitaire by using rot13(haqb). Alternatively, we can somehow fail to obtain any turnovers in the tableau for the remainder of the game – in either case, the mere thought is too horrible to contemplate.

There is one minor complication: when dealing from the stock, 10 cards appear simultaneously (which I mentioned more than once during the post-mortem of the Among Us game). Here we will assume they appear sequentially. In the example below, a row of 10 cards has just been dealt. There are 61 cards visible, and by the time the Three of Diamonds is dealt we have seen every card in Diamonds. In other words, we needed 60 cards to obtain a full suit.

Clearly, the minimum number of cards required is 13, and the maximum is 97 (if e.g. we get no Kings until the last eight cards, then the first 96 cards contain no complete suit and the 97th card completes one). During the Among Us game I thought we were fairly lucky to see a full suit after 60 cards. From experience I would expect we need more cards visible before the chances of seeing a full suit are 50%. However, I never bothered to quantify this. Fortunately, it is relatively straightforward to run a computer simulation. We might simulate X hands and find that e.g. Y cards, not 60, are needed before there is an even chance of finding a complete suit, and that 60 cards corresponds to the Zth percentile – then we would know where we stand in terms of how lucky we were.

In this case, I ran 5000 iterations and found that 60 or fewer cards were needed to achieve a complete suit 1176 times. This corresponds to the 22nd or 23rd percentile. The mean and median were approximately 66.5 and 67.0 respectively.
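For anyone who wants to verify these numbers at home, here is a minimal sketch of such a simulation (the function and variable names are my own invention; the exact figures will wobble slightly from run to run):

```python
import random
from statistics import mean, median

# Deal a shuffled two-deck Spider pack one card at a time and count how many
# cards appear before every rank of some suit has been seen at least once.
def cards_until_full_suit(rng):
    deck = [(suit, rank) for suit in range(4) for rank in range(13)] * 2
    rng.shuffle(deck)
    ranks_seen = [set(), set(), set(), set()]
    for count, (suit, rank) in enumerate(deck, start=1):
        ranks_seen[suit].add(rank)
        if len(ranks_seen[suit]) == 13:   # a full suit has appeared
            return count
    raise AssertionError("unreachable: a full suit must appear by card 97")

rng = random.Random(2021)
results = [cards_until_full_suit(rng) for _ in range(5000)]
print(mean(results), median(results))                # close to 66.5 and 67
print(sum(r <= 60 for r in results) / len(results))  # roughly 0.23
```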

As I suspected, we did get lucky with the Diamond suit, but nowhere near enough to justify improvising a rap song with the phrase “statistically significant” appearing once every ten seconds. To put this in Dungeons & Dragons terms, if this game of Spider Solitaire were a character, then we would have rolled better-than-average initial stats for complete-suits, but nobody in their right mind would accuse the dice of being rigged. Presumably we would have lousy initial stats for other abilities such as turnovers or shortages of particular ranks etc (I did have some misgivings about our winning chances during this hand therefore something had to be lousy), but unfortunately there is only so far one can go with the Dungeons & Dragons analogy.

Without the ability to remove Diamonds, I expect we would have been in more trouble than Ian Nepomniachtchi’s Bishop getting harassed by both Black rooks after capturing a poisoned pawn in Game 9 of the World Chess Championship, since no other suit is close to completion. I’m not sure of the best way to test this conjecture via simulation, so if you have any clever ideas, please leave a comment 😊

By now you’ve all heard the great news: Monkey has proved it is possible to beat Spider Solitaire at the 4-suit level! All he needs to do is build in-suit at every opportunity and never break a suited connector. The bad news is the win rate is extremely low. Of course if the Monkey wins \$3000 from the Eagle thanks to a single victory then we’re not complaining – but can we do even better?

First I wanna differentiate between a random hand and a custom hand. Recall that I discussed a number of YouTube videos where random players show off their awesome skills by beating a hand of 4-suit Spider sans 85,78,68,79, but the unmentioned caveat was that the hand was ridiculously easy.

If I pick a ridiculously easy hand then Monkey wins 6 games in 500, or 1.2 percent of the time. If I pick a random hand then Monkey wins a <sarcasm> whopping </sarcasm> zero games in 500.

It turned out my simulated annealing/stochastic hill climbing/stochastic gradient descent/something else thingy was a red herring. It worked the first time, but later experimentation with the random number generator showed that “always Level 3” was the best way to go in the long run. But at least it made for a good story lol 😊

Guaranteed Minimum Evaluation Score

Good players (or avid readers of this blog, or both!) will be aware of the concept of guaranteed minimum turnovers. Basically we can calculate the minimum number of cards we are guaranteed to turn over even if the worst possible cards showed up. Of course we can do something similar for e.g. the number of suited connectors or some other “function” of the position. For instance, we might assign 10 points for flipping a card, 1 point for a suited connector and 100 points for removing a complete suit. And of course experienced players know that different programs will have different scoring systems. The important point is that we can (i) evaluate a given position and (ii) compute a guaranteed minimum score even if the worst possible cards turned up.

This is an example of “looking ahead” since we are considering the consequences of a single move given the available information (i.e. face-up cards), which is obviously better than making a single move because it’s an in-suit build.

Here is a simple example: suppose the evaluation function is 10 points for exposing a card, 1 point for any suited connector, and no penalty for making a move. Assuming best play in the start position below, what is your Guaranteed Minimum Evaluation Score (GMES) even if you turned over the worst possible cards?

I hope you answered 54. Note that we are lucky in the sense that we can build in-suit four times and off-suit once. Normally it’s the other way around.

A good first move is to build in-suit with the Jack-Ten of Diamonds. The Ten is in one of six columns with equal-fewest cards and we also have a spare off-suit Jack. But we can make many other moves and still reach the same GMES of 54. You could start with the Three-Two of Clubs or even the strange-looking 9h-8d. But obviously the sensible option (for humans at least!) is Jd-0d.

The position you should be visualising is the one below, where a happy star represents any face-up card. We can easily check with a single glance that there are 5 turnovers and 4 in-suit builds, with a total evaluation score of 54.
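The scoring arithmetic above can be written out as a one-line function (a sketch; the function name is mine, and the weights are just the example values from this post):

```python
# Example weights from this post: 10 per turnover, 1 per suited connector,
# 100 per complete suit removed. Real programs use different schemes.
def evaluation_score(turnovers, suited_connectors, suits_removed):
    return 10 * turnovers + 1 * suited_connectors + 100 * suits_removed

print(evaluation_score(5, 4, 0))  # 54, matching the position above
```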

If we want to train our Ninja Monkey to compute the best GMES we can simply iterate random sequences of moves from a given position. But we must be careful not to expose any new cards (e.g. if we shift the Ten of Diamonds we will not flip the card underneath). Assuming we have enough iterations, Ninja Monkey will deduce e.g. shifting the Jack of Diamonds onto the Club Queen is not a good move since it reduces the GMES from 54 to 53. Even worse would be 0d-9h, which reduces the GMES to 42. We lose one guaranteed turnover and two in-suit builds with one bad move!

An example of a “good enough sequence” is: Jc-0d, 9d-8d, 3c-2d, 0d-98d, Jd-098d, Qc-Jc. Note that the first move was unnecessary but at least we got the correct GMES of 54.

Now Get Off Yer 65,82,83,79 And Start Procrastinating!

A lazy programmer would be satisfied with the above algorithm but of course it is possible to do even better than that. Note that there is no reason to commit ourselves to every move in the above sequence without examining the newly turned-over cards. After starting with Jc-0d, we can turn over a card, and we always have the choice of 9d-8d, 3c-2d, 0d-98d, Jd-098d, Qc-Jc or the new card may give us an even better option. In effect we are procrastinating. In summary, Ninja Monkey will find a sequence with the best possible GMES and execute it, terminating the sequence prematurely if he turns over at least one new card.

How do we tell if it’s time to deal a new row of cards? Simples, as the Meerkat would say. If Ninja Monkey cannot increase the GMES then it is time to deal a new row of cards. If the stock is empty then either the game is won or the Monkey concedes defeat.

With the new algorithm, Monkey will beat an easy hand around half the time or a random hand 5 percent of the time. Random-hand statistics are shown in the graphic below.

Let us say that a Spider hand is ImportNumbPieAsNP-Hard if Ninja Monkey cannot obtain one victory in 50 tries. In this case roughly half the hands are ImportNumbPieAsNP-Hard. At the other end of the spectrum there was also one very easy hand with 21 wins in 50 attempts. One implication of all this is if a Spider Solitaire server gives 40 ImportNumbPieAsNP-Hard games in a row then there’s a pretty good chance it’s biased.
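As a quick sanity check of that last claim (assuming, per the above, that roughly half of random hands are ImportNumbPieAsNP-Hard and that hands are dealt independently):

```python
# If each hand is independently ImportNumbPieAsNP-Hard with probability 0.5,
# then 40 such hands in a row is vanishingly unlikely under a fair deal.
p = 0.5 ** 40
print(p)  # about 9.1e-13, i.e. less than one in a trillion
```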

Unfortunately all this comes at a price. Despite Ninja Monkey’s extremely fast metabolism, this takes a very long time to run.

Bring it on

One of my friends has challenged me to repeat my Ninja Monkey experiment for September 2019. Those who follow my Facebook page will know the good goss, but here it is in a nutshell:

• I have Spider Solitaire on my iPhone.
• This software comes with “daily challenges”, where a fresh deal is provided for every day. The date acts like the “seed” of a random number generator – thus if you enter the same seed of e.g. 15 August 2019 fifty times, you will play the same hand fifty times. NB: Players are restricted to any day between the 1st of the previous month and today.
• However, I believe the RNG is not uniform: if we choose any month, then games corresponding to later days are harder than games corresponding to earlier days.
• This can be tested via statistics: choose 2 days of the same month at random and let Ninja Monkey compute the “equity” of the games corresponding to both days. If the deals were unbiased, the probability that the later game is harder than the earlier game should be 50%, but my actual probability was significantly higher (p-value < 0.05) when I tested the games of July 2019.
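The test behind that last bullet point can be sketched as a one-sided sign (binomial) test. The counts below (18 “later game harder” results out of 25 pairs) are made-up placeholders for illustration, not my actual July data:

```python
from math import comb

# One-sided binomial test: under the null hypothesis the later game is harder
# with probability 0.5, so we compute P(X >= k) for X ~ Binomial(n, 0.5).
def binom_tail(k, n, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Placeholder counts, NOT the real data: 18 of 25 pairs had the later game harder.
print(binom_tail(18, 25))  # about 0.022, significant at alpha = 0.05
```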

I am now going to repeat the same experiment for September, and I am laying my friend 4:1 odds in his favour that I will not get the same result as my July data.  My friend told me that twenty 66,85,76,76,83,72,73,84 tests should give one significant result (as if I didn’t already know!). But I was definitely not cherry picking. Even if I am wrong, I will have no regrets. To put it in Poker terms, my experimental results force me to call all the way to the river and if I am beat … then I am beat. We are using happy stars as currency, so it’s not as though my credit card details are at stake.

Bring. It. On.

Monkey wins Man vs Wild

And it’s over! Spider GM played well to win 8 games, but monkey went one better with 9 victories. Most of the games were easy, but there were a few exceptions. Game 3 was just horrible: the final round was 833A5A8jk4 which is close to an instant loss (assuming no lucky suited connector such as a 3 landing on a 4 of the same suit). And that was not the only problem. Both human and monkey “agreed” on every game (i.e. both win or both lose) except game 4. Spider GM never found an empty column since the tactics were all wrong. Even so, 64% by the monkey was not the most convincing of victories. The conclusion is that the monkey’s win rate should have some correlation with an expert playing 4-suit solitaire. In other words, the effects of making random moves and playing at the one-suit level pretty much cancel each other out.

Game    SpiderGM    Monkey
1       win         89%
2       win         64%
3       lose        26%
4       lose        64%
5       win         67%
6       win         72%
7       win         79%
8       win         55%
9       win         62%
10      loss        36%
points  7           eight

Of course 10 trials is not a lot of data, and perhaps I need more games to say for sure. Also, lesser players may find similar results, but the threshold should be e.g. 30% not 50%.

Congrats to Monkey for a well-earned win and commiserations to Spider GM.

BTW If anyone knows how to change the word “eight” in the above spreadsheet into a number 8 without the unwanted percentage sign, please let me know 🙂

Blue Screen of Death: part 3 of trilogy

Here we are in the dreaded Blue Screen Of Death. It is reputed to be the hardest version of the game, and even the Spider GM loses over half the time. And it’s got an ominous picture of a Spider web in the background. It ain’t called the BSOD for nothing if you excuse the terrible cliché. And we’re not aiming just to escape: we’re here to destroy the thing.

I am in the ninth column. Scanning the tableau I find only one other card of adjacent rank: the Jack of Spades. Perhaps he knows something I don’t (not likely since my rank is higher but you never know). I know from sad experience that the occasional victory is not enough to destroy the BSOD.

“Well, we do have The Prophecy,” says the Jack of Spades.

“A Prophecy?!?!?” I ask. “I didn’t know we had one. How do you know-”

“I had a strange dream. After we exited the White Screen in Part Two of this trilogy I heard voices.”

“What did they say?” I ask impatiently.

It is clear all the other cards are paying full attention. Otherwise the Four of Hearts would have already moved onto the Five of the same suit by now.

“Now if you are to break the Spider’s curse,” the Jack of Spades intones solemnly, “the value must be five percent or worse.”

“But – but that sounds lame.”

“True. But it’s the only prophecy we’ve got,” replies the Jack. “Or more precisely, it was the only thing I heard that vaguely sounds like some kind of prophecy.”

“Humph.”

I repeat the prophecy in my head several times, trying to make sense of it but to no avail.

“Hang on,” I say. “This prophecy is written in iambic pentameter.”

“You’re right,” says the Jack of Spades. “I wonder if that has any signif-“

“But the Ninja Monkey has reproduced the entire works of Shakespeare, so surely he knows a thing or three about iambic pentameter.”

The monkey enters, as if on cue.

“Good point,” says the Jack of Spades.  “Maybe he can help us decipher Monkey the want prophecy to play! Monkey want to play! Play play play play play!”.

Ninja Monkey has taken maximum weirdification to the next level. This time he has brought along a friend named Ninja Mouse. Ninja Mouse controls a small arrow on the screen using his telekinetic powers, while the monkey caresses the mouse’s fur (presumably to keep him warm in this miserable cold weather). Spider GM clicks his fingers and the fun immediately begins.

And I must say, the monkey and mouse are 75,73,67,75,75,78,71 65,82,83,69. They have all of us sorted into the foundations in less than two seconds. They start another hand and again we’re sitting pretty in the foundations in less than two seconds.

“It’s all too good to be true.”

Now it’s my turn to hear voices – but to be fair, I should probably pay more attention to whatever’s inside the back of my head. Then I notice with horror the Monkey and Mouse aren’t playing properly. We’re neatly arranged in complete runs from Ace to King, but in different suits. They are pretending the game is one-suited. Clearly the Spider GM has given them the wrong instructions. And as luck would have it, the Spider GM is nowhere to be found.

“Ah, there it is, the 70,85,67,75,69,78,73,78,71.”

Hang on, I got distracted. I was supposed to be figuring out something. What was it again? Oh that’s right: the value must be five percent or worse. But what is this value? Perhaps every time we win a game (preferably Four-Suited sans 85,78,68,79) the Spider’s value decreases. The value could be a company’s market capitalisation. If that ever goes below 5 percent of GDP then we’re quids in … okay I’m clutching at straws if you pardon the cliché, but it’s hard to find a better explanation when the arrow is moving us from one column to the other at a million miles an hour (another terrible cliché I know). What I do know is that Monkey’s strategy is not improving, even with the Mouse’s help. If anything, it seems to be getting worse.

Then I pick up on something unusual. I started 100 consecutive games in exactly the same position: top of column Four after the third round of cards is dealt from the stock. It occurs to me that Spider GM has done this on purpose, and has given the monkey some special instructions. I’m not exactly in the mood to try to work out what the Grand Master is aiming for. At least the Monkey and Mouse play with their usual dexterity and I won’t have to wait too long until this saga is over. They complete 4000 games in a mere 2 hours.

Uh oh.  This can’t be good.

“Monkey is sad. Monkey is losing”.

Oh well, at least Monkey has fixed his bad habit of talking too fast and interrupting everybody else before they finish their sentence.

“Please don’t cry,” I say. “I want to help you – I can teach you how to play well at Four-Suit solitaire. I beg you, please slow down and only move a card when I tell you to.”

Spider GM kneels down and gives him a hug. Awwww … Unfortunately it looks like I won’t be teaching the Monkey how to kick 65,82,83,69 at Four Suit Spider any time soon. Oh well, so much for becoming the hero and saving the world.

“What happened?” I ask the Grand Master.

“Something is wrong with the Blue Screen of Death,” he replies. “The longer you play with it … the more difficult it is to win.”

I sense the Grand Master is also distraught, as he struggles to choose his words carefully.

“What do you mean?” I ask.

“Let me explain. Ninja Monkey played 40 hands and pretended the game was 1-suit, not 4-suit. He repeated each hand 100 times-“

“I gathered that,” I say. “I wasn’t born three days before yesterday.”

“So on each hand,” continues Spider GM, “we can estimate the chances of winning to be such-and-such percent.”

I nod in agreement.

“Would you care to pick two numbers between 1 and 40?” asks the GM.

“3 and 15”.

“Monkey won Game Three 88% of the time and Game Fifteen 45% of the time. That’s an inversion because Game Fifteen is harder than Game Three. Pick again.”

“Okay, 5 and 32”

“That’s another inversion. Monkey won Game Five 47% of the time and Game Thirty-Two 11% of the time.”

“Hang on,” I say. “If the games are sorted by increasing winning percentage then there would be no inversions.”

“Correct, and if they are sorted in reverse order then there would be 780 inversions”.

I do a quick calculation: 40 * 39 / 2 = 780.

“Also correct,” I reply. “On average there should be 390 inversions.”

“Ninja Monkey says he got 468 inversions. That’s a lot bigger than 390.”

“Maybe that was bad luck,” I reply. “Besides, how do we know whether a difference of 78 is reasonable or not?”

“But this is where statistics comes in. We can show that if the game were not biased then 468 or more inversions in 40 hands would occur less than once per 20 trials.”

I’m starting to get lost. At least the GM is talking English and is not saying something stupid like “Import Numb Pie As N. P.”

“Can you explain in more detail?” I ask.

Spider GM clicks his fingers. Ninja Monkey gets a large sheet of paper and cuts out 40 rectangles, numbered from 1 to 40. He then shuffles and deals them in a row. He counts 370 inversions and writes the number 370 on another sheet of paper. He then shuffles again and deals a new permutation of numbers 1 to 40. This time he gets 423 inversions, higher than the expected value of 390. He writes the number 423 next to 370. He rinses and repeats for one hundred thousand trials. Then he draws a pretty graph. Needless to say, all this in less than 25 seconds after Spider GM clicked his fingers. Only for 3558 trials is the number of inversions 468 or greater.

“You see,” continues Spider GM, “the null hypothesis says each hand occurs with equal probability. Assuming this is true, the chances of getting 468 or more inversions is 0.036. In statistical language we call 0.036 the p-value, which happens to be less than 0.05. Therefore 468 or more inversions is significant at the alpha equals 0.05 level.”

“What does that mean in layman’s terms?” I ask.

“It means we have evidence that The Game Is Rigged,” replies the Spider GM, adding a triumphant emphasis on the last four words.

“Hang on, what’s so special about 0.05? Why not 0.01 or some other number?”

Spider GM stands up and points to the exit. As Monkey and Mouse immediately scurry away stage left, the Grand Master clenches his fist and glares at the Blue Screen Of Death. I have barely enough time to convert 0.05 into a percentage and register that the prophecy has just been fulfilled.

Spider GM is a tower of strength as he slams his right fist into the computer screen.

He is a demigod who can do no wrong, and who gives a flying 70,85,67,75 if his right hand is bleeding profusely? Everything turns into a blur and I feel like we are being time-warped to somewhere new (why does this happen in every part of the trilogy?). But at least the curse is broken, so hopefully this can only be for the better.

Fast backward to 1965. No Limit Texas Holdem is the hottest game in the animal jungle, and personal computers and mobile devices haven’t been invented yet. A group of monkeys cheerfully shove large stacks of chips across the table without apparent rhyme, iambic pentameter, or reason. I guess I can try to reverse engineer the rules of Texas Holdem as the dealer turns over each card, but after listening to the Grand Master explaining Statistics 101 I am mentally exhausted. I surrender to their unbridled joy and delightful unconcern as a chimpanzee bangs away at the piano. The music sounds absolutely atrocious but nobody cares when everybody has a jolly good time. The Spider GM has no trouble maintaining law and order during poker nights. Nobody throws a tantrum after a bad beat and all the monkeys handle the cards and chips with the utmost care. Everybody lives happily ever after – except the guy(s) who designed the Blue Screen Of Death.

THE END

Yes, this really is a true story. The monkey correctly reproducing all of Shakespeare’s works, the mouse’s telekinetic powers and cards moving by themselves and speaking perfect English are all true. Even the part where I smash the computer screen with my right fist is true. And so is all the math and statistics. Okay, so I lied about time travelling back to 1965 but everything else is kosher.
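Speaking of the math: the monkey’s inversion experiment from the story is easy to reproduce. Here is a minimal sketch, using 10,000 shuffles instead of 100,000 to keep the runtime down (the estimate will hover around the story’s p-value of 0.036):

```python
import random

# Count inversions: pairs (i, j) with i < j but seq[i] > seq[j].
def count_inversions(seq):
    return sum(seq[i] > seq[j]
               for i in range(len(seq))
               for j in range(i + 1, len(seq)))

rng = random.Random(0)
hands = list(range(1, 41))   # 40 hands, as in the story
trials = 10_000
extreme = 0
for _ in range(trials):
    rng.shuffle(hands)
    if count_inversions(hands) >= 468:
        extreme += 1

p_hat = extreme / trials
print(p_hat)  # near the story's p-value of 0.036
```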

If you still don’t believe any of this please check out my paper on Spider Solitaire.

I will however admit to never winning anything in a short story competition in my entire life. Not even an Honorable Mention. So if anyone can give me tips on improving my writing then please leave a comment or three 🙂

Is Spider Solitaire Out To Get You?

To start things off, here is a simple number puzzle:

Traditional expert advice says if you are having a 76,79,83,73,78,71 streak then it’s probably time to take a break from the game and return when you are in a better frame of mind. Of course, every man, dog and millipede on the planet knows the best revenge is PROVING THE GAME IS RIGGED. But that’s easier said than done if you excuse the cliché.

To prove the game is rigged, people use one of two methods: the Scientific Method or the Internet Forum Method. The IFM is easier to understand, more convenient and more likely to produce the desired results, but today I prefer to discuss the SM.

I’ve seen a forum or two where card players think Microsoft Hearts is rigged. For those in the dark, Hearts is a trick-taking game – or more correctly a trick avoidance game where the aim is to avoid taking penalty cards. The rules will not be discussed in detail and are easily found online. Some players have accused MS programmers of rigging the cards if they win too often, to compensate for the poor playing strength of the three AI players (to be fair Microsoft isn’t the only one accused of cheating, and it is also possible things have moved on and the forums are seriously out of date).

In the Scientific Method we need statistical hypothesis testing. I must admit I have never studied statistics formally in my uni days, and I only began to truly appreciate what it all meant when I was playing Spider Solitaire – for research purposes.

Roughly speaking, to prove a game is biased we wanna design a test that is

(1) Pre-determined

(2) Well-defined

(3) Measurable

Because Hearts is better known than Spider, I will discuss the former game. Of course similar considerations would apply to Four-Suit spider.

Let us suppose that Joe Bloggs plays a number of hands of Hearts against three AI players. A “game” is a series of hands until one player gets 100 penalty points or more, and the winner is the player with fewest points. A win occurs if and only if Joe Bloggs has the fewest penalty points after the game. If we ignore ties then Joe Bloggs should win 25% of the time. But since the AI players are weak, Joe Bloggs expects to win 50% of the time. However, to simplify matters, we assume Joe Bloggs is only interested in individual hands. Joe gets the following results:

11010 01101 00010 00101

where 1 represents a “good” hand and 0 is a “bad” hand. Let us assume for the moment that we have a sound definition of what makes a hand good or bad. JB is careful to define good/bad hands (which are player-independent) rather than use wins or losses – to remove the possibility that Joe is falsifying his results, intentionally or otherwise (e.g. by playing on tilt after a 76,79,83,83).

“Hmmm” … thinks Joe. “I started with 60% good after 10 hands but the next 10 hands had only 30% good. Therefore something is wrong.”

But why did JB stop at 20 hands? Did he sign a contract saying he will play exactly 20 hands in front of three senior witnesses working at Microsoft, or did he feel like stopping now because that would conveniently corroborate his gut feel the program is rigged? I guess only Joe Bloggs would know the answer to that question.

If Joe had stopped at 10 hands then he wouldn’t have noticed any difference between the first and second halves of his results (they would both be 60% good games). If Joe had stopped after 50 hands then maybe his luck would even out and Joe would realise the software is not rigged after all. Or perhaps his luck would deteriorate even further. The important point, of course, is that JB must pre-determine the number of hands before playing, otherwise his results are invalid.

Now for the definition of good and bad hands. Suppose we define a hand as bad if it is impossible to avoid taking Ukula(*), assuming all opponents gang up on JB and the hand is played double-dummy (for simplicity we ignore the possibility of shooting the moon). In theory this is well-defined, but it is not trivial to determine whether a given hand allows JB to avoid taking Ukula when played double-dummy. So we need a different definition of good/bad hands.

(*) Ukula is the name of a well-mannered Pitjantjatjara-speaking woman who I befriended many years ago. Other players will probably have a different and more colorful name for the card that is worth 13 penalty points.

Joe Bloggs knows that many of his friends like to complain about the 2,3,4,5 of hearts being in separate hands. Assuming the Hearts are not broken, it is mathematically impossible for the 5 of Hearts to win a trick unless (i) somebody is void or (ii) the 2,3,4,5 are in different hands. The chance of event (ii) happening is

52/52 * 39/51 * 26/50 * 13/49 = 0.1055

Let us ignore heart voids and say a hand is bad if the 2,3,4,5 of hearts are in different hands, else it is good.

But wait! In Hearts, most hands begin with players passing three unwanted cards to an opponent before play begins. Of course players will not pass random cards, so to simplify things we only consider hands with no passing. This implies we need to play four times as many games to get the required data.

Assuming that the random number generator is truly random we can say this experiment is “measurable” in the sense that we can obtain a probability of certain events. For instance, if we played 2 hands the probability that both are bad is 0.0111 and the probability that at least one is bad is 0.1999.

More complex calculations are possible. For instance, if 20 hands are played then the chances of exactly 5 bad hands is 0.0381 and the chances of at least 5 bad hands is 0.0525.
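These figures are straightforward to reproduce. A minimal sketch (`binom_pmf` is my own helper, not a library function):

```python
from math import comb

# Probability the 2, 3, 4 and 5 of hearts land in four different hands.
p_bad = 52/52 * 39/51 * 26/50 * 13/49
print(round(p_bad, 4))  # 0.1055

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two hands: both bad, and at least one bad.
print(round(p_bad**2, 4), round(1 - (1 - p_bad)**2, 4))  # 0.0111 0.1999

# Twenty hands: exactly 5 bad, and at least 5 bad.
exactly_5 = binom_pmf(5, 20, p_bad)
at_least_5 = sum(binom_pmf(k, 20, p_bad) for k in range(5, 21))
print(round(exactly_5, 4), round(at_least_5, 4))  # 0.0381 0.0525
```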

What we are actually measuring is called a p-value. Assuming the null hypothesis is true, the probability that we would have observed “our actual result” (or sometimes “a result at least as bad as our actual result”) is the p-value. If this p-value is less than 0.05 then we say the result is statistically significant at the alpha = 0.05 level. If it were less than 0.01 then it would be statistically significant at the alpha = 0.01 level. Of course alpha must be pre-determined, otherwise we are back to the first problem of a test that is not pre-determined.

Our final test would be something like the following:

One final warning: A p-value less than alpha does not imply conclusive evidence. For instance, we may have been very lucky/unlucky and the Random Number Generator gods gave us evidence the game is rigged when in fact it wasn’t rigged after all. But it may enable us to justify further testing – which may then lead to conclusive evidence.

As a chess analogy: suppose Wile E. Cheetah beats three grandmasters in consecutive games. The organisers suspect cheating because there is too much correlation with the moves of top chess engines. They perform complex calculations in their heads and find p < 0.05. The organisers then force Wile E. Cheetah to play the next few rounds without his “lucky EMCA headphones” (i.e. further testing). Sure enough W. E. C. learns the hard way that 1 e4 c6 2 d4 d5 3 Nc3 dxe4 4 Nxe4 Nd7 5 Bc4 Ngf6 6 Ng5 e6 7 Qe2 h6?? is not the main line in the Caro-Kann and confesses to everything.

Yes, incidents like these have happened in top-level chess. Googling examples is left as an exercise for the reader.

So there you have it. To derive the final test, we needed to have some knowledge of the game itself (Hearts or Spider) and some basic statistical theory (e.g. hypothesis testing), and we needed to take some care to make sure our experiment is sound. After all it’s hard to prove something is rigged if your experiment is itself rigged!

DISCLAIMER: I have not studied stats formally at uni, so I won’t be surprised if someone can explain hypothesis testing much better than I did here. If you aced statistics at University or High School and found reading this was way beneath your dignity then congrats, perhaps you should start writing your own blog and I should be the one learning from you 😊

Congrats on reaching the end of this post and fully comprehending every word! Here is your reward:

Introduction to Artificial Stupidity

In a previous post, we looked at various features of a starting position such as the number of guaranteed turnovers and guaranteed suited turnovers. This means we can study two different starting positions and say that one is perhaps better than the other. However, ultimately we are interested in our chances of winning. If, for instance, my Dad were really awful at Spider and lost every game regardless of how good the start position is, the number of guaranteed turnovers wouldn’t be particularly relevant.

Let us consider the following question: what are the chances of victory given we have N guaranteed turnovers at the start of the game? Obviously we would expect the more turnovers we start with, the greater our chances of winning.

I guess the obvious thing to do would be to play 1 million games of Spider Solitaire on my PC, record the number of guaranteed turnovers at the start, play each game to the best of my ability without 85,78,68,79 and record the result (either win or loss). After all, I’m addicted to Spider Solitaire, my blog is about the Spider Solitaire, the whole Spider Solitaire and nothing but the Spider Solitaire, and I consider myself to be the world’s greatest expert on Spider Solitaire. Unfortunately, playing 1 million games is time-consuming even by my standards. Perhaps I could program an AI to play the games for me. After all, I have a math Ph. D., I have published a number of math-related papers, I have a steady job, and I know a programming language or three. Last year, I created a Flappy Bird cover of the famous Wintergatan Marble Machine using Matlab and Audacity … Uh oh, I’ve just realised that designing an algorithm to play Spider well is not so trivial. So perhaps we could compromise by designing an AS, where S stands for Stupidity.

ENTER MY FRIEND, NINJA MONKEY

Fortunately I have an imaginary friend called Ninja Monkey (not to be confused with Monkey Magic) who is fluent in many languages such as Python-tongue (the language of pythons as well as other magical serpentine creatures), Javanese and C-plus-plus-speak. Thanks to an amazingly fast metabolism, Ninja Monkey is able to play 100 games of Spider Solitaire inside three minutes. On the down-side, his random move strategy is not very good, and he is yet to realise his dream of winning a single game of Spider Solitaire at the Four-Suit level. Nevertheless, he is able to win more than half the time if he pretends each game is played at the one-suit level. Not to mention that he is cute and friendly, and willing to give me a hug when I’m feeling down 😊

If you think about it, this gives us a way to estimate our chances of winning a game given a certain number of guaranteed turnovers. Given a large number of games, Ninja Monkey can record the number of guaranteed turnovers for a starting hand, play random moves pretending all cards are the same suit, and report his result (win or loss). For instance, suppose that he plays 1000 games, 100 of which start with three turnovers. If NM wins 37 of those and loses the remaining 63, then we can reason that our winning chances are 37% given 3 turnovers. It’s likely not the most accurate estimate, but I guess we gotta start somewhere, if you’ll excuse the cliché!
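The tallying procedure is simple enough to sketch. In the snippet below, `deal`, `count_turnovers` and `play_randomly` are hypothetical stand-ins for the real game engine, not an actual API:

```python
from collections import defaultdict

def estimate_win_rates(games, deal, count_turnovers, play_randomly):
    """Bucket results by number of guaranteed turnovers and return the
    win rate per bucket. The three callables are stand-ins for a real
    Spider Solitaire engine."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for _ in range(games):
        hand = deal()
        t = count_turnovers(hand)
        total[t] += 1
        wins[t] += play_randomly(hand)  # engine returns 1 for a win, 0 for a loss
    return {t: wins[t] / total[t] for t in total}
```

With 1000 games split across the buckets, each win rate is just wins divided by games in that bucket, e.g. 37/100 = 37% for the 3-turnover bucket.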

With the help of my Ninja Monkey, I collated the following results

We immediately notice that the win ratio indeed increases as we increase our guaranteed turnovers. That is, if we ignore the division by zero error in column 9. That’s not so surprising when you think about it. After all, 9 turnovers implies a run of ten cards such as 3456789TJQ, and the chances of starting with such a run are pretty slim. Also, there are very few games with 0 or 8 turnovers, so those stats are extremely unreliable. With 1 or 7 turnovers we don’t have many data points either, so these may be doubtful. Around 4 turnovers, the results should be pretty reliable. I could have done more than 1000 iterations, but you get the gist.

You may be justified in asking why anyone would be interested in an algorithm that just outputs random moves and can’t even beat my dad at chess.

It turns out Artificial Stupidity has some important advantages over its well-known sibling, Artificial Intelligence. As I already alluded to earlier, it’s often easier to come up with an AS than an AI. Also, an Artificial Stupidity is easier to replicate than an Artificial Intelligence, so it is easy for Joe Bloggs to design his own algorithm and confirm my figures are correct, or approximately correct. But we shouldn’t be dismissive of AI either. Beating the top human players at Go, Chess or Backgammon is no small feat. I believe that any artificial entity can be beneficial to humankind, whether it be intelligent or stupid – provided it is used for the right purpose!

Steve N Brown (the author of a Spider Solitaire book I alluded to in an earlier post) attempted to compile his own statistics for win ratio vs guaranteed turnovers. He got the following results

We immediately notice there are only 306 games instead of 1000, so it is not surprising that division by zero occurs for 8 or 9 guaranteed turnovers. Also, there is a weird drop to a 0.37 win ratio for 3 guaranteed turnovers. This suggests either that 62 data points are insufficient or that Guaranteed Turnovers is a very poor indicator of the overall chance of winning at the Four-Suit level. After all, Spider Solitaire is not a Scratch-n-win Lotto Ticket – you can’t win a prize just because you start with 10 good numbers; you’ve got to earn your prize! It is certainly possible for an expert to recover after a poor start (or conversely a beginner to 70,85,67,75 up after a good one). And of course I can’t blame Steve for not playing 1000 games.

Steve has also compiled similar stats for other features such as Suited Guaranteed Turnovers or multiplicity, but this is outside the scope of this blog post.

SUMMARY

I believe neither Ninja Monkey nor Steve Brown has a definitive answer to the question of what the win rate is as a function of the number of guaranteed turnovers (assuming expert play sans 85,78,68,79). Playing at the 1-suit level is far too large a handicap for Ninja Monkey’s results to be meaningful, and Steve has too few games to draw reliable conclusions. So perhaps I do need to design an Artificial Intelligence that can play well at the Four-Suit level. Or someone else can design one. If the latter, I would be perfectly happy to recognise I am not the world’s greatest Spider player (but I would still consider myself a GM 😊 )

Despite the negative result, I wouldn’t call this experiment an exercise in futility. The journey is more important than the destination, and hopefully this is food for thought for the interested reader, if you’ll excuse the numerous clichés. For now, it’s farewell to the Ninja Monkey. Hopefully we might meet him again in better times. Oh, and I’m still looking for a way to get in touch with Steve Brown – so if you know him (or even better, if you ARE him) please leave a comment!

Evaluating a Start Position

We now consider the following question: How can we evaluate a starting position? That is, if you are given an initial game state with 10 exposed cards how do we determine if the chances of winning are good, average or poor? Can we quantify our winning chances as a percentage (e.g. 58%)?

NOTE: evaluating a start position is useful since most Spider Solitaire implementations allow the player to abandon a game without counting it as a loss. But if you are serious about improving your game, I strongly recommend you never abandon games with a poor initial state.

A first thought may be to look for “features” of a game state. For instance suppose we are watching some top quality chess at an unhealthy time of the day. We might notice that

• White has an extra pawn
• The position is an endgame: both sides want to activate their king without fear of suddenly being mated.
• Black’s rook and king are more active than their opposite numbers
• Both sides have vulnerable pawns

Bear in mind we are only identifying individual features at this early stage. Eventually we may wish to formulate an overall assessment by combining these features somehow, but that comes later.

QUESTION: What are plausible features to use in an opening game state in Spider Solitaire?

Avid readers of this blog (yes you!) would immediately identify “guaranteed turnovers” as a possible feature. In the above diagram you should be able to quickly identify 5 turnovers. Of course every man, dog and millipede on the planet knows that building in-suit is even more desirable. In this case we have Q-J in spades and 2-3 in clubs. Therefore we have 2 guaranteed suited turnovers (and hence 3 off-suit turnovers).

Finally we can look at rank multiplicity. All players know that having too much of one rank can be a problem, especially when the adjacent rank is in short supply. You don’t need a Ph. D. in economics to work out things are less than ideal when the Spider Solitaire gods have supplied five Jacks on the opening deal and there is little demand for them. For simplicity let us define the rank multiplicity as the count of the most frequent rank. For instance the above diagram has a rank multiplicity of 2 since we have two Threes and two Deuces, and no rank appears more than twice. In summary:

• We have 5 guaranteed turnovers
• We have 2 guaranteed suited turnovers
• The rank multiplicity is 2.

There may be other features to consider, but we’ll keep things simple for now.
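Rank multiplicity, at least, takes only a couple of lines to compute. The 10-card hand below is hypothetical (the original diagram isn’t reproduced here) but is chosen to be consistent with the three bullet points above:

```python
from collections import Counter

def rank_multiplicity(hand):
    """Count of the most frequently occurring rank among the exposed cards."""
    return Counter(card[0] for card in hand).most_common(1)[0][1]

# Hypothetical opening row (rank then suit letter): 5 guaranteed turnovers,
# Q-J of Spades and 2-3 of Clubs suited, two Threes and two Deuces.
hand = ["3s", "7h", "Qs", "Js", "2c", "3c", "8d", "2d", "5h", "Kc"]
print(rank_multiplicity(hand))  # 2
```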

Are these values good, bad, or average? It turns out one can use simulation to answer this question. For instance, if I had nothing better to do, I could play 10 million games of Spider and find that the number of guaranteed turnovers is 3.97 on average.
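A Monte Carlo sketch of that simulation is below. The turnover count uses my own simplification: a card of rank r can be moved onto one card of rank r+1, and a moved card stays exposed and can still receive a card, so the total reduces to summing min(#rank r, #rank r+1) over the ranks. The exact counting rules may differ slightly from the book’s definition, so treat the output as a ballpark figure rather than gospel.

```python
import random
from collections import Counter

RANKS = list(range(13))  # 0 = Ace, ..., 12 = King
DECK = RANKS * 8         # two standard decks: 8 copies of each rank

def guaranteed_turnovers(ranks):
    """Simplified count: sum min(#rank r, #rank r+1) over all ranks,
    since each rank-(r+1) card can host exactly one rank-r card and
    moved cards remain exposed."""
    n = Counter(ranks)
    return sum(min(n[r], n[r + 1]) for r in range(12))

def average_turnovers(games=20_000, seed=1):
    rng = random.Random(seed)
    total = sum(guaranteed_turnovers(rng.sample(DECK, 10)) for _ in range(games))
    return total / games

print(average_turnovers())  # compare against the 3.97 quoted above
```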

Of course the lazy solution is to write a computer program to do the simulation for me. The program can simply deal 10 cards, do the relevant calculations and then abandon the game. An even lazier solution is to copy the results from Steve Brown’s excellent book Spider Solitaire Winning Strategies. He got the following results:

Looking at these graphs, I would immediately dismiss rank multiplicity as a useful feature (the entry for 5 is non-zero but is too small to be visible). After all more than 90% of games will have a value of 2 or 3! It is true that one can tweak rank multiplicity somehow (e.g. giving more weight to Aces and Kings which are the bugbears of most players), but I wanted to keep things simple for the time being. The important point is these quantities are easily obtained via simulation.

Suited turnovers is nice, but I think it’s more important to have many turnovers at the start of the game. In other words, quantity is more important than quality. In the above example, we have 5 guaranteed turnovers and 2 suited, both of which are above average. Hence if given a choice, I would take this position over a random position.

If you are a beginner, I would estimate that: