The WHAT HOW and WHY of biased Spider Solitaire servers

In my paper on Spider Solitaire, I presented evidence that Opaque Solitaire(*) was biased: if a player won too many games the random number generator will favour difficult hands for future games. However I didn’t discuss why or how a software designer will do this. These are definitely valid questions, and will be the topic of this post.

(*) Opaque Solitaire is not the real name of the server.

Steve Brown, the author of Spider Solitaire Winning Strategies gives good reasons why most spider solitaire implementations will not rig deals if the player wins too much.

Steve Brown’s argument boils down to three main points.

  • The WHAT: claims of bias on internet forums cannot be substantiated with any evidence.
  • The HOW: It is not clear how to rig the random number generator without significant effort
  • The WHY: Why would a software developer 80,74,83,83 off its users?


However, for the Opaque Solitaire program in question I can (partly) refute Steve’s argument.


Obviously I have solid evidence Opaque Solitaire is biased: my paper is based on statistical hypothesis testing. I obtained a “magic number” that happened to be statistically significant at the alpha = 0.05 level. If you want more details, you know what to do 😊 The more interesting point of this post concerns the HOW. Suppose you wanted to design a Spider program that can adjust the difficulty level according to the player’s skill.


Let’s say you compiled a dataset of 100,000 hands (initial game states), and you wish to estimate the difficulty level of each hand. For simplicity assume difficulty is represented as an equity between 0 and 1. For instance, an equity of 0.58 means you are willing to bet 58 cents if a win pays $1. Hence higher equity means the game is easier to win. Getting an expert human player to test 100,000 hands is clearly impractical. One can write a program to automatically assess the equity of each hand, but that runs into trouble too. For instance, the ranks of the first 10 cards showing is a poor indicator of equity since it is more than possible to overcome a poor start (or conversely a good start can sour).

But why not get the players themselves to estimate the equity for you? Consider the following hypothetical table


There are 18 players and 9 games. For each game and player, the result is either WIN (1), LOSS(0) or NOT PLAYED (blank) and I have color-coded results for ease of reading (any subliminal messages or easter eggs you find are purely coincidence!). Most of the games have been played so only a few data points are missing. For instance we can deduce that Sam is a pretty good player since he won 7 games out of 9 whereas Owen or Fiona is much worse. Similarly we can look at individual columns and say e.g. Games 3 and 8 are equally easy or hard since they have 6 wins, 11 losses and 1 not played. Game 6 or 9 is easier since more players beat it. We therefore can decide Game 9 is suitable for Owen because Owen is a poor player and wants to play easier hands. But we would not assign Game 5 to Isabella since that is relatively hard. One can think of this table as a Mechanical Turk, but the crowdworkers don’t find the tasks very onerous because for some reason they actually enjoy playing Spider Solitaire 😊

Note that implementing this does not require us to know much about Spider Solitaire strategy. The results of the table speak for themselves. For instance Debbie “dominates” Anna because whenever Anna won a particular hand, so did Debbie. But the reverse is false. Hence we know Debbie is a better player than Anna.

Obviously a small number of data points is not reliable, but it’s not hard to imagine a similar table for a much large number of players and games. Note that it is not necessary for every player to play every hand for this to work. Anyways, you get the gist. Assuming your Spider Solitaire  program is online and you are able to store cookies, you can keep tabs on which players are better than others and which hands are easy or hard. Hence you can assign the “correct difficulty” hands to different players. There might even be a Ph. D. or two in this 😊

I’m not saying this is the best method to rate the difficulty of individual spider hands, but this is one way to do it.


As for the WHY, my best guess is the developer(s) of Opaque Solitaire wish to challenge a good player and not bore him with too many easy hands. Unfortunately statistical testing can only say the data is fishy, and cannot answer why someone would “make the data fishy”, if you will.

I’ve seen forums where players accuse Microsoft Hearts of cheating. Some players claim that MS must compensate for the AI’s poor strategy by collusion. Others say MS does this because the software designers have good intentions but don’t understand the expectations of players. I agree Joe Bloggs probably knows nothing about Statistics and he is probably on tilt after losing three hands in a row. But when Jane Citizen accuses the same program of reneging or playing a card known to be held by someone else then you know you’ve got issues. I haven’t played much Microsoft Hearts but I’m siding with the chumps. For the same reasons, I would not lightly diss anyone who complains about rigged games, Spider Solitaire or otherwise (NOTE: the MS Hearts forums may be out of date and the software may have changed for the better, but you get the gist). Since my paper was successfully published, I believe the onus of proof should rest on software developers: they should make it easy to test if a program is biased or not.


In summary, I believe most Spider programs are kosher, but not Opaque Solitaire. One word of warning: I do not have conclusive evidence that Opaque Solitaire deliberately rigs the cards because that’s the nature of Statistics – hypothesis testing alone does not equal conclusive proof: if your p-value is less than 0.05 then you might have obtained “unlucky data”, forcing you to jump to the wrong conclusion. But p < 0.05 can be used to justify further testing. But the point of my paper was to show that the bias exists and can be quantified using statistics.