The World’s Worst Math Teacher (another short story)

“Another one of life’s disappointments.”

“What’s wrong?” I ask.

“Marking assignments, the bane of every teacher,” growls Ms. Spider, as she angrily scrawls the word “DREADFUL” on a sheet of paper. “This goose just divided by zero.”

I’ve always enjoyed math, but I am all too aware that it represents a bugaboo for many ordinary folk. Not everybody can have a higher-than-average IQ, and not everybody can play the piano and solve a Rubik’s Cube at the same time. I agree we have to Make Math Great Again.

“I s’pose I could improve my presentation skills or learn Statistics 101,” admits Ms. Spider.

“I confess I never studied stats at uni,” I respond. “I had to pick it up all by myself.”

“Learning stats 101 sounds too much like work. Surely there must be a better way.”

“You could make the exams and homework easier,” I suggest.

“We can’t make it too easy,” responds Ms. Spider. “I’m sure the good students wouldn’t mind an extra challenge or two.”

I steal a glance at the goose’s assignment. Yes, the goose is below average, but one of the assignment questions is badly worded. Another question has kilometres as a typo for metres, and I have to suppress a chuckle. I can see why some of Ms. Spider’s students call her the WWMT.

“Actually,” says Ms. Spider, “I was toying with a more radical solution.”

“Which is?”

“We could give different exams to different students.”

“What a revolutionary idea!” I exclaim. “Nobody has ever thought of this before!”

“From each according to his ability … ”

“… to each according to his needs,” we chant in unison.

I am impressed: this Spider is clearly well-educated, not just in mathematics. She knows her clichés and sayings.

“Does that mean,” I ask, “if an awesome student correctly answers 40 assignment questions in a row then he will get a very difficult exam?”

“Exactly.”

“Hang on, what if an awesome student deliberately flunks the assignments …”

“Well … we could give the exam less weight than assignments,” the Spider responds somewhat nervously. “Then there is no advantage to tanking the assignments.”

“That’s Dandy!”

“For this to work,” continues Ms. Spider, “we have to come up with some way of measuring the difficulty of individual questions.”

“I understand.”

I mull this over for a while. We all know that students can be graded according to some chosen system. For instance, a math student can be Outstanding, Exceeds Expectations, Acceptable, Poor, Dreadful or Troll. But how can we grade individual questions?

The Spider writes two math questions on a sheet of paper:

[Image: two math questions]

“Which of these problems is harder?” asks Ms. Spider.

“I think both are equally easy. After all, I participated in the International Mathematical Olympiad many years ago.”

Somehow, I think that was not the answer Ms. Spider expected.

Behind us, a monkey, an eagle, a mouse, an elephant, a lion and a jackal are enjoying some Texas Hold’em. As usual, the monkey has squandered all his chips early, and the eagle is schooling the rest of the field, having accumulated more than half the chips in play. The Spider eyes them warily: clearly they should not be privy to our discussion.

“You see,” says Ms. Spider. “Sometimes I find it hard to judge the difficulty of a single question. For instance, I expect problem X to be easier than Y, but for some reason the reverse holds when I mark the assignments.”

I mull over Ms. Spider’s words. I am not really in a position to judge, given I have never marked any student assignments.

“I have an idea,” says Ms. Spider. “Let’s draw a table.”

[Image: table of assignment results for 18 students and 9 problems]

“For simplicity,” says Ms. Spider, “let’s assume each question is marked either correct or incorrect, hence there are no partial marks. I use a blank instead of 0 for ease of reading. Sam is an awesome student since she answered most questions correctly. Owen is a Stupid student because he only scored 2 out of 9. Each individual is represented by a single row.”

“Okay.”

“But there is no reason we can’t do the same with columns, if you pardon the double negative. For instance, only six people solved problem 8 but nine solved problem 9. Therefore problem 8 is harder than problem 9 …”

“So even if you don’t understand the questions themselves, you can still say things like ‘Debbie is better than Anna’.”

“Exactly,” replies Ms. Spider.

“With 18 students and 9 problems, you don’t have a lot of data.”

It’s a stupid observation, I know – but I am only trying to buy time as I try to digest her ideas.

“Well, the same logic applies if we had 1800 students and 900 problems.”

“I think I understand,” I say. “It’s like some kind of Mechanical Turk. Previous students have tried these questions (and of course you don’t have to pay them to do these exams!), so you can work out which questions are easy or hard.”

“Wasn’t the Mechanical Turk some kind of fake chess-playing machine by Wolfgang von Kempelen? What a disgraceful idea! I would never try to cheat chess players like that.”

Okay, didn’t see that one coming. We need to agree on a definition of Mechanical Turk.

“Do you think your students will eventually find out their exam papers are different?”

“That shouldn’t be an issue,” says Ms. Spider, as she squirms in her seat. “If a poor student finds out, he has no reason to complain. If a good student finds out then deep down in his heart he already knows he is better than the poor student, so the exam result doesn’t matter.”

Somehow I think her logic is very, very unsatisfactory. But I do know that many of the greatest insights come precisely from those who are willing to suggest ideas that sound utterly outrageous. For instance, Rivest, Shamir and Adleman are your average computer scientists, but with a bit of luck they might one day become famous, known to every student of cryptography. So I should cut her some slack.

In fact, I am more than looking forward to the results of her revolutionary teaching methods. After all, I’m not the teacher and I don’t set the exams. I was especially careful not to suggest any drastic ideas of my own. If the radioactive 83,72,73,84 hits the fan and grows to fill the entire house then I am more than happy to watch, knowing my 65,82,83,69 is fully covered.

Bring. It. On.

The WHAT, HOW and WHY of biased Spider Solitaire servers

In my paper on Spider Solitaire, I presented evidence that Opaque Solitaire(*) was biased: if a player won too many games, the random number generator would favour difficult hands in future games. However, I didn’t discuss why or how a software designer would do this. These are definitely valid questions, and they will be the topic of this post.

(*) Opaque Solitaire is not the real name of the server.

Steve Brown, the author of Spider Solitaire Winning Strategies, gives good reasons why most Spider Solitaire implementations will not rig deals if the player wins too much.

Steve Brown’s argument boils down to three main points.

  • The WHAT: claims of bias on internet forums cannot be substantiated with any evidence.
  • The HOW: it is not clear how to rig the random number generator without significant effort.
  • The WHY: why would a software developer 80,74,83,83 off its users?

However, for the Opaque Solitaire program in question I can (partly) refute Steve’s argument.

The WHAT

Obviously I have solid evidence Opaque Solitaire is biased: my paper is based on statistical hypothesis testing. I obtained a “magic number” that was statistically significant at the alpha = 0.05 level. If you want more details, you know what to do 😊 The more interesting point of this post concerns the HOW. Suppose you wanted to design a Spider program that adjusts the difficulty level to the player’s skill.

The HOW

Let’s say you have compiled a dataset of 100,000 hands (initial game states), and you wish to estimate the difficulty of each hand. For simplicity, assume difficulty is represented as an equity between 0 and 1. For instance, an equity of 0.58 means you would be willing to bet 58 cents if a win pays $1; hence higher equity means the game is easier to win. Getting an expert human player to test 100,000 hands is clearly impractical. One could write a program to automatically assess the equity of each hand, but that runs into trouble too. For instance, the ranks of the first 10 cards showing are a poor indicator of equity, since it is more than possible to overcome a poor start (or, conversely, a good start can sour).

But why not get the players themselves to estimate the equity for you? Consider the following hypothetical table:

[Image: table of wins and losses for 18 players and 9 games]

There are 18 players and 9 games. For each game and player, the result is either WIN (1), LOSS (0) or NOT PLAYED (blank), and I have color-coded results for ease of reading (any subliminal messages or easter eggs you find are purely coincidental!). Most of the games have been played, so only a few data points are missing. For instance, we can deduce that Sam is a pretty good player since he won 7 games out of 9, whereas Owen and Fiona are much worse. Similarly, we can look at individual columns and say, for example, that Games 3 and 8 are equally easy or hard since each has 6 wins, 11 losses and 1 not played, while Games 6 and 9 are easier since more players beat them. We can therefore decide Game 9 is suitable for Owen, because Owen is a poor player and wants to play easier hands. But we would not assign Game 5 to Isabella, since that one is relatively hard. One can think of this table as a Mechanical Turk, except the crowdworkers don’t find the tasks very onerous because, for some reason, they actually enjoy playing Spider Solitaire 😊
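To make the row-and-column logic concrete, here is a minimal Python sketch of both calculations. The storage format is just one convenient choice, and the sample numbers are placeholders rather than the actual table above:

```python
from collections import defaultdict

# Hypothetical results: player -> {game: 1 (win) or 0 (loss)}.
# Unplayed games are simply absent. Placeholder numbers, not the real table.
results = {
    "Sam":    {1: 1, 2: 1, 3: 1, 4: 0, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1},
    "Owen":   {1: 0, 2: 0, 3: 0, 5: 0, 6: 1, 8: 0, 9: 1},
    "Anna":   {1: 1, 2: 0, 3: 0, 4: 0, 6: 1, 7: 0, 9: 1},
    "Debbie": {1: 1, 2: 1, 3: 0, 4: 0, 6: 1, 7: 1, 9: 1},
}

def player_skill(results):
    """Row averages: the fraction of attempted games each player won."""
    return {p: sum(r.values()) / len(r) for p, r in results.items()}

def game_equity(results):
    """Column averages: the fraction of attempts each game was won (higher = easier)."""
    attempts, wins = defaultdict(int), defaultdict(int)
    for row in results.values():
        for game, won in row.items():
            attempts[game] += 1
            wins[game] += won
    return {game: wins[game] / attempts[game] for game in sorted(attempts)}
```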

Note that implementing this does not require us to know much about Spider Solitaire strategy; the results in the table speak for themselves. For instance, Debbie “dominates” Anna because whenever Anna won a particular hand, so did Debbie, but the reverse is false. Hence we know Debbie is a better player than Anna.
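The dominance check is equally mechanical. A sketch, reusing the hypothetical `results` dictionary from the previous snippet:

```python
def dominates(a, b, results):
    """True if player `a` won every game `b` won (among games both played),
    and won at least one game that `b` lost."""
    ra, rb = results[a], results[b]
    common = set(ra) & set(rb)
    return all(ra[g] >= rb[g] for g in common) and any(ra[g] > rb[g] for g in common)

print(dominates("Debbie", "Anna", results))  # True for the sample data above
```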

Obviously a small number of data points is not reliable, but it’s not hard to imagine a similar table for a much larger number of players and games. Note that it is not necessary for every player to play every hand for this to work. Anyways, you get the gist. Assuming your Spider Solitaire program is online and you are able to store cookies, you can keep tabs on which players are better than others and which hands are easy or hard. Hence you can assign hands of the “correct difficulty” to different players. There might even be a Ph.D. or two in this 😊
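As for the assignment step, one simple rule (my assumption, not necessarily what Opaque Solitaire does) is to match each player with the hand whose equity is complementary to their skill:

```python
def pick_hand(player_skill, hand_equity):
    """Return the hand whose estimated equity best matches the player.
    The inversion is deliberate: a strong player (skill near 1) gets a
    low-equity (hard) hand, and a weak player gets an easy one."""
    target = 1.0 - player_skill
    return min(hand_equity, key=lambda h: abs(hand_equity[h] - target))

pick_hand(0.8, {"hand_17": 0.21, "hand_42": 0.74})  # -> "hand_17" (a hard hand)
```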

I’m not saying this is the best method to rate the difficulty of individual spider hands, but this is one way to do it.

The WHY

As for the WHY, my best guess is that the developer(s) of Opaque Solitaire wish to challenge a good player rather than bore him with too many easy hands. Unfortunately, statistical testing can only say the data is fishy; it cannot answer why someone would “make the data fishy”, if you will.

I’ve seen forums where players accuse Microsoft Hearts of cheating. Some players claim MS must compensate for the AI’s poor strategy by collusion. Others say MS does this because the software designers have good intentions but don’t understand the expectations of players. I agree Joe Bloggs probably knows nothing about statistics and is probably on tilt after losing three hands in a row. But when Jane Citizen accuses the same program of reneging, or of playing a card known to be held by someone else, then you know you’ve got issues. I haven’t played much Microsoft Hearts, but I’m siding with the chumps. For the same reasons, I would not lightly diss anyone who complains about rigged games, Spider Solitaire or otherwise (NOTE: the MS Hearts forums may be out of date and the software may have changed for the better, but you get the gist). Since my paper was successfully published, I believe the onus of proof should rest on software developers: they should make it easy to test whether a program is biased or not.

Summary

In summary, I believe most Spider programs are kosher, but not Opaque Solitaire. One word of warning: I do not have conclusive evidence that Opaque Solitaire deliberately rigs the cards, because that’s the nature of statistics. Hypothesis testing alone does not equal conclusive proof: if your p-value is less than 0.05, you might simply have obtained “unlucky data”, leading you to the wrong conclusion. Still, p < 0.05 can be used to justify further testing, and the point of my paper was to show that the bias exists and can be quantified using statistics.

Another short story (yay!!!!!)

Hooray! I have finally quit my day job and found something I really enjoy: teaching students how to play 4-suit Spider Solitaire well. According to legend, nobody in the Animal Kingdom has managed to beat the game at the highest difficulty level. Even the Ninja Monkey with an amazingly fast metabolism couldn’t achieve it despite 50 quintillion tries (and counting).

In my first class I have 6 students: a mouse, a lion, a jackal, an elephant, an eagle and, last but not least, the monkey.

I have already gone over the basics: empty columns are good, suited connectors are good, but aces and kings are usually not your friends. I notice the monkey is taking copious notes. He is an ideal teacher’s pet, if you excuse the lousy pun.

“Take a look at this position,” I say. “Do you think we should win with correct play?”

[Image: example Spider Solitaire position]

“I think so,” replies the mouse. “I would bet 3 dollars.”

“That means you think it’s not possible,” quips the lion.

“What do you mean?”

“Your bet is too small,” replies the lion. “If you thought this is a win, you would be betting $30, not $3.”

“I would raise it to $60 at least,” offers the jackal.

“Yeah sure,” says the elephant.

“We do have 4 suits removed and two empty columns. Sixty dollars says we win this.”

“I wouldn’t bluff if I were you,” replies the elephant. “Look at that pile of 83,72,73,84 on column 4.”

“Come on guys,” says the eagle. “I think we need to analyse this seriously. This game is about math, not people or animals.”

I turn my attention to the monkey, who has so far been silent.

“What’s your opinion?” I say, putting him on the spot.

The monkey looks embarrassed.

“This would be easier if it were one-suit.”

Everyone laughs. The monkey looks like he wants to kill himself.

“But that’s way too easy!” exclaims the mouse. “Even I would go all-in.”

“The monkey raises a good point,” I say.

Stunned silence. All the other animals look at each other, unable to believe what they heard.

“Okay,” I continue. “Let’s change the rules: the game is one-suit but we cannot remove suits to the foundations until all cards are exposed.”

“But that’s cheating!” shout all the animals in unison (except the monkey). “You’ve already moved four suits to the foundations.”

Don’t ask me how the animals manage to speak in perfect unison without proper rehearsal. I guess everyone has their own unique talents.

“For purposes of this exercise, let’s assume I changed the rules mid-game and you have to Deal With It.”

The animals discuss this for a few minutes.

“K to 3 in column 5,” says the mouse. “9 into column 7. 10 to A onto the Jack in -”

“Not so fast,” replies the eagle-eyed eagle. “There are two aces in column 4.”

“But we can create another empty column,” says the jackal. “4-3 in column 8 onto 8-7-6-5 in column 9.”

“Once we reach the 4 of Clubs, the rest should be easy. Column 2 becomes empty, and the 4 of Clubs goes into column 2, etc.”

“And it doesn’t matter what order the hidden cards are in.”

I am pleased that all my students are participating in the discussion.

“So we can win at 1-suit,” I say. “Note that if we couldn’t win at 1-suit, we can deduce it probably won’t be possible at 4-suits (unless we can quickly complete a suit).”

All the animals nod in agreement.

“Now back to the original problem,” I say. “How do we continue at the 4-suit level?”

The animals quickly discover the right plan. Once the King of Spades lands in an empty column, we can recover an empty column by moving the 8-7-6-5 onto the Nine of Spades. We then have to shift the Nine of Spades and the Ace of Clubs into two empty columns. Then we have to hope we have enough empty columns to finally shift the Ten of Hearts and reach the Four of Clubs. Of course, all this is easier said than done, if you excuse the terrible cliché.

The monkey brings out a set of cards and arranges them into the diagram position. He tries and tries but to no avail. Despite his repeated failures the other animals are amazed at the monkey’s dexterity and eidetic memory as he quickly reorders the cards into the starting position without error.

“Let me have a try,” says the eagle.

The eagle quickly discovers a truly remarkable solution to this problem, which this blog is unfortunately too small to contain. All the animals applaud loudly as he smacks the last suit onto the foundations with a satisfying thud.

“How the 70,85,67,75 did you do that?” asks the monkey.

Uh oh. I’m beginning to have doubts about the quality of the monkey’s copious notes. His strategy is still as lousy as before. He is not really a model student after all.

“Let me have a look at your notes,” I say.

I confiscate the monkey’s book and riffle through his “oodles of doodles”, none of which have any artistic merit. I am about to 83,80,65,78,75 the monkey in front of everybody and rip his doodles to shreds. But at the last minute I suddenly remember that I owe the monkey a tremendous debt. Without him I wouldn’t have been able to publish a paper in the International Journal of Arachnids, Primates and Other Predatory Species. Then I stumble upon a very strange picture:

[Image: the monkey’s doodle of a happy star]

So the monkey does have occasional flashes of brilliance when doodling after all. Okay, the happy star isn’t exactly centred properly, but it’s not a bad job for someone with only pencil and paper. I doubt that any human could produce art like that. I return the book to the monkey and bow before him.

Seeing that the monkey got off without even a warning, I guess it’s only fair that the last word belongs to Shakespeare: All’s Well That Ends Well.

Class dismissed.

I like my new job. Okay, the pay isn’t great and the diet leaves much to be desired, but at least I’m my own boss and I get to set my own hours. All the animals are easygoing and real friendly, and we can cuss and swear with impunity. And best of all, nobody smokes around here (because smoking is bad for you). 70,85,67,75 89,69,65,72!

Time to spill the beans I guess

I guess it’s time to spill the beans (though avid readers of this blog may have gathered already):

[Image: spilling the beans]

Earlier this year, my Spider Solitaire paper was published in the excellent journal Parabola, a mathematics journal aimed primarily at high school students. In this paper I showed that a particular Spider Solitaire server is biased: if a player wins too often, the cards will be stacked, making it harder to win (assuming I did not “hit” the 1 in 20 chance of incorrectly rejecting the null hypothesis). I do not know why or how the server does this, but perhaps that will be the subject of a future post 😊

What I do know is that the Spider Solitaire server in question is very badly designed. The company in question offers a number of card games, including the well-known Klondike, Freecell and the like. If you look past the beautiful graphics, sound and animations, the server has a number of “fundamental errors”, such as not knowing that almost every game in Freecell is winnable, or that every tile in Mahjong Solitaire should appear exactly 4 times. Once upon a time I played 24 games of Spider Solitaire after resetting my stats. I won 50%, had a longest winning streak of 8 and a longest losing streak of 1. Go figure: twelve isolated losses would need eleven wins just to separate them, leaving no room for an eight-game winning streak.

I kid you not.

The key observation I made was that making random moves is sufficient to beat 1-suit solitaire without undo more than half the time. Ergo, we can estimate the difficulty of a particular hand by repeated simulation. If the game is played at 4-suit, we can still estimate the difficulty of a hand by pretending it is 1-suit. All this requires that we are able to determine the identity of every unseen card in the initial game state.
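In code, the estimator might look like the sketch below. The playout engine is the hard part and is only assumed here, as a hypothetical play_random_one_suit(deal) function that plays one game with random moves and reports the result:

```python
def estimate_equity(deal, play_random_one_suit, trials=1000):
    """Estimate a hand's equity as the fraction of random-move, 1-suit playouts won.
    `play_random_one_suit(deal)` is a hypothetical simulator returning True on a win;
    it must know the identity of every unseen card in the initial game state."""
    wins = sum(play_random_one_suit(deal) for _ in range(trials))
    return wins / trials
```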

In my experiment, I bought a new computer (to remove the possibility that the computer already knew I was an experienced player). I played 40 games, because that provides a reasonable amount of data without being too onerous (I definitely want my experiment to be replicable by less experienced players). I deliberately used undo to ensure that every game was won (and also to record the identity of every unseen card). To test whether games get harder, I computed the probability that, of two randomly chosen games, the later one would be more difficult than the earlier one. I found the result to be statistically significant at the alpha = 0.05 level.
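For the curious, here is a sketch of that pairwise statistic, plus a simple permutation test for significance. The exact procedure in the paper may differ; this only illustrates the idea:

```python
import random
from itertools import combinations

def later_harder_prob(difficulty):
    """P(the later of two randomly chosen games is harder), over all ordered pairs.
    0.5 means no trend; values well above 0.5 suggest games get harder over time."""
    pairs = list(combinations(range(len(difficulty)), 2))
    return sum(difficulty[j] > difficulty[i] for i, j in pairs) / len(pairs)

def permutation_p_value(difficulty, reps=10_000):
    """One-sided p-value: the fraction of shuffles scoring at least the observed value."""
    observed = later_harder_prob(difficulty)
    shuffled = list(difficulty)
    hits = 0
    for _ in range(reps):
        random.shuffle(shuffled)
        hits += later_harder_prob(shuffled) >= observed
    return hits / reps
```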

I highly recommend Parabola for the serious mathematicians among you. The feature articles are very well written. The problems are somewhat beneath my dignity (but what do you expect, given I competed in the 1995 International Mathematical Olympiad and composed my own problem for 2016?), but I can see how they are intended to make high school students enjoy mathematics. High school teachers will definitely want in on this. Yes, I thought that Square Root of Negative Pun and 2Z or Not 2Z were a bit weak (at least with Bad Chess Puns you get to sharpen your tactics), but overall I think Parabola has much to recommend it.

[Image: Bad Chess Puns]

For me, the most pleasing aspect of this paper was how I was able to combine various “elements”, such as statistics, random walks and basic Spider Solitaire strategy, into a harmonious whole, resulting in something more awesome than my Flappy Bird cover of the Wintergatan Marble Machine. In closing, I will leave the final word to Thomas Britz, editor of Parabola: “In each their way, these items remind me of some of the many reasons for why I love mathematics: maths is elegantly useful and usefully elegant; it is beautifully surprising and surprisingly beautiful; and it provides insights into connections and connections between insights. It challenges; it entertains and it provokes much humour.”

Artificial Stupidity in Chess

You may remember that some time ago I discussed an algorithm for Spider Solitaire that is not very good: it simply outputs random moves. It turns out somebody did a much better job in the game of chess. Some dude designed no fewer than 30 Artificial Stupidities, organised a Tournament of Fools, and published a number of papers in SIGBOVIK. Ideas for weird algorithms include color preference (e.g. White prefers to play pieces onto light squares), random moves, blindfold algorithms (simulating a novice trying to play blindfold), algorithms based on mathematical constants like π and e, single player (pretending the opponent will pass), and linear interpolation between Stockfish and some other lousy algorithm (e.g. choose Stockfish’s best move with probability p and the lousy move with probability 1-p). But my favourite algorithm was the Mechanical 68,79,82,75, which proved a forced win for Black after 1 d2-d4?? a7xd4!! checkmate 🙂
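The Stockfish interpolation, at least, is easy to sketch with the python-chess library (this assumes a local Stockfish binary on your PATH; the time limit is arbitrary):

```python
import random
import chess
import chess.engine

def interpolated_move(board, engine, p):
    """With probability p play Stockfish's choice, otherwise a uniformly random legal move."""
    if random.random() < p:
        return engine.play(board, chess.engine.Limit(time=0.1)).move
    return random.choice(list(board.legal_moves))

# Let the half-Stockfish, half-random player play itself.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")
board = chess.Board()
while not board.is_game_over():
    board.push(interpolated_move(board, engine, p=0.5))
print(board.result())
engine.quit()
```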

You can watch all the fun in the video below:

[Video]

I’m not sure if these ideas will be applicable to Spider Solitaire. Color Preference is easy, since we can prefer to move red cards or black cards, and single-player is even easier given the nature of the game, but I am not aware of any equivalent of Stockfish. Mathematical constants should be easy but probably not very interesting. It may be possible to simulate a blindfold (human) player who struggles to remember every card, but I’m not sure how to do that yet. And I don’t know of a (sensible) variant of Spider Solitaire where all the red cards are replaced with chess pieces. Since Western chess has Black vs White, it may be more appropriate to use Xiangqi, which has Red vs Black pieces. Perhaps something to think about for next time.
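To show how little machinery Color Preference needs in Spider, here is a sketch. The move generator is purely hypothetical; I assume each legal move records the suit of the card being moved:

```python
import random

RED_SUITS = {"hearts", "diamonds"}

def colour_preference_move(legal_moves, prefer_red=True):
    """Pick among moves of the preferred colour when possible, else any legal move.
    `legal_moves` is a hypothetical list of (rank, suit, from_col, to_col) tuples."""
    preferred = [m for m in legal_moves if (m[1] in RED_SUITS) == prefer_red]
    return random.choice(preferred or legal_moves)
```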

Thanks to my good friend Tristrom Cooke for the heads up.

Monkey wins Man vs Wild

And it’s over! Spider GM played well to win 7 games, but the monkey went one better with 8 victories. Most of the games were easy, but there were a few exceptions. Game 3 was just horrible: the final round was 833A5A8jk4, which is close to an instant loss (assuming no lucky suited connector, such as a 3 landing on a 4 of the same suit). And that was not the only problem. Both human and monkey “agreed” on every game (i.e. both win or both lose) except game 4: Spider GM never found an empty column since the tactics were all wrong, and even so, 64% by the monkey was not the most convincing of victories. The conclusion is that the monkey’s win rate should correlate with an expert’s results at 4-suit solitaire. In other words, the effects of making random moves and playing at the one-suit level pretty much cancel each other out.

Game | Spider GM | Monkey
1 | win | 89%
2 | win | 64%
3 | loss | 26%
4 | loss | 64%
5 | win | 67%
6 | win | 72%
7 | win | 79%
8 | win | 55%
9 | win | 62%
10 | loss | 36%
Points | 7 | eight
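A quick back-of-the-envelope check on the numbers above makes the correlation visible: the monkey averaged roughly 70% on the games Spider GM won, against roughly 42% on the games he lost, with game 4 as the lone outlier:

```python
# Monkey win rates and Spider GM results (1 = win) for the ten games above.
monkey = [0.89, 0.64, 0.26, 0.64, 0.67, 0.72, 0.79, 0.55, 0.62, 0.36]
gm     = [1, 1, 0, 0, 1, 1, 1, 1, 1, 0]

wins   = [m for m, g in zip(monkey, gm) if g]
losses = [m for m, g in zip(monkey, gm) if not g]
print(f"monkey averaged {sum(wins) / len(wins):.0%} when Spider GM won")
print(f"monkey averaged {sum(losses) / len(losses):.0%} when Spider GM lost")
```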

Of course 10 trials is not a lot of data, and perhaps I need more games to say for sure. Also, lesser players may find similar results, but with a lower threshold, e.g. 30% instead of 50%.

Congrats to Monkey for a well-earned win and commiserations to Spider GM.

BTW If anyone knows how to change the word “eight” in the above spreadsheet into a number 8 without the unwanted percentage sign, please let me know 🙂