I have finally achieved awesomeness! My Spider Solitaire Sudoku puzzle has been featured on Cracking the Cryptic.
For those who are interested in the puzzle only, here is the grid: if you’ve played any Spider Solitaire the rules should be guessable – and if you get a unique solution then you know you’ve guessed correctly 😊 But if you’re interested in the back-story then please read on.
If you follow this blog regularly, you are probably aware of a paper I published some time in 2019. I showed that a particular Spider Solitaire server was biased: if you win too many games then future games will have the cards stacked against you – and one could “prove” this using Statistics.
I use quote marks because the nature of Statistical testing always implies some degree of uncertainty. For instance if you are 95% confident of a hypothesis, then there is a 5% chance you made an error. But it is commonly accepted practice. If your experiment is sound and you get a sufficiently small p-value then go ahead and publish it anyway. You may be wrong, but – to put it in Poker terms – your results pretty much force you to call all the way to the river. If you are beat then you are beat.
Of course, getting the results you want is only the first step. We all know the academic publishing model is broken. The peer review model is hopelessly flawed. At best, peer review is based on good intentions and met the demands of research scientists 30 years ago – but certainly not today. It already takes long enough to get accepted into a mediocre journal, or even the dreaded arXiv. If you’re that desperate you might be willing to spell arXiv backwards. And don’t get me started on predatory journals. I won’t describe the ills of academic publishing in all its gory detail. Someone else can probably explain it much better than I can. In my case I ended up publishing into a high school journal. Parabola from UNSW to be exact.
But at the end of the day, publishing is essentially “a way to prove or showcase your research skills”. Once you complete your thesis (or minor thesis, 3-month vacation employment, etc), publishing may or may not be a major component of your career, depending on your employment. (It is true that my Spider Solitaire paper is not relevant to my job, but that has nothing to do with the Fundamental Theorem of Calculus.)
Still, the Parabola publication wasn’t entirely satisfactory. My paper wasn’t truly a publication. It was a story. I wanted to tell a story about how a certain Spider Solitaire was broken. There is nothing intrinsically wrong with Parabola (with the possible exception of some really lame comics and puns), but try telling that to the average Joe Bloggs who has an average job, has little aptitude for mathematical puzzles and swears by Nova FM. In fact, telling this story was the original motivation for me starting this blog in the first place.
Scientists don’t have a way of getting their work recognised. They have no way of “controlling the narrative” if you will. I can publish a paper in some journal. Or I can post something on a blog and have all the scientific evidence to back it up. But how many people are going to read it, let alone believe it?
Enter Cracking the Cryptic.
You may have already guessed I am a fan of CtC (not necessarily because of this blog!). I was vaguely aware of it last year. It seemed to be massive in the UK. I tried one of the harder puzzles. Solving it was beneath my dignity – after all I scored a silver medal in the 1995 International Mathematical Olympiad. Okay I get it. There’s a pandemic going on. People are struggling in the UK. Some viewers have even commented on YouTube how watching episodes of two people solving Sudoku puzzles helped their mental health issues. I’m living in Australia not the UK. Australia really is the lucky country, so who am I to judge?
I then stumbled on this puzzle by Lucy Audrin.
This is a “Sandwich Sudoku with a twist”. Before solving the puzzle Mark briefly mentions Lucy’s website and eventually finishes the puzzle in just over 15 minutes.
You read that right. Lucy wanted to draw attention to her website. All she had to do was submit a half-decent puzzle to CtC and Mark would take care of the rest. To be fair her puzzle is more than half-decent and a good illustration of how one can keep the puzzles fresh by tweaking various rulesets (such as thermometer, anti-Knight, XV, arrows etc). If Simon and Mark only did classic Sudoku every day of the week, CtC would have finished long ago. I should also mention that Lucy can write much better stories than I can!
Great – if Lucy can draw attention to her website then perhaps I can do the exact same thing with Spider Solitaire.
This was much harder than anticipated.
It would surprise nobody if I claimed I could construct a correct Sudoku puzzle with a unique solution and Spider Solitaire theme. There was one obvious hurdle: if I submit my puzzle and it gets rejected – then good luck trying to resubmit the same puzzle a second time. I decided to play it safe by first submitting “test puzzles”.
It was a long process. Essentially I needed to “play the networking game” and gradually build up reputation. I spent a significant amount of time testing puzzles by other setters, joining the Discord server and chatting, signing up for Patreon, creating my own puzzles, etc. I submitted the above Spider Solitaire puzzle to the Discord a few months ago, but eventually realised that was not the same as submitting directly to CtC (submitting to Discord only means CtC have permission to feature it, if it gets nominated). Yes, networking really did make things a lot easier in the long run. If you play nice and do all the right things then eventually people will help you when you need them to. If Sudoku is your thing then I heartily recommend you join the Discord server. Great people, great puzzles, great jokes and cultural references. Occasionally somebody may attempt to pull off a rick-roll. What’s not to like? 😊
I emailed CtC my puzzle earlier this week and finally my luck was in.
So there you have it. If you follow my blog regularly, then I hope you enjoyed the journey as much as I did. Until next time, happy Spider Solitairing 😊
6 thoughts on “Awesomeness has been achieved!”
Wow, interesting if unfamiliar terrain. Angling and elbowing to get what a person produces noticed is something I both dislike and am not very good at. But I’m delighted you were able to pull that off, with considerable cleverness and persistence! What I missed (maybe by joining the blog late) was a link to the actual paper where you showed that Spider server was biased (I think you alluded to it in a post that I read but I missed any link). I have a Ph.D. in psycholinguistics (from 1982) so I know what P values are and have some knowledge of experimental design, etc., so it would be fun to take a look. I think I get how that Sudoku should work and it sounds like fun! I’ll see if I can figure out how to print it and give it a try.
The Spider Sudoku game
So your name is Trevor Tao. Nice to meet you! If there was some way to determine that from what was visible in your blog, I didn’t know what it is. Sounds like you have proven smartness in a great many areas. Do you have a day job? You seem to like to remain shrouded in some mystery.
Having a world-class brilliant brother is also pretty stunning, though regression towards the mean would predict a lesser level of brilliance for you. With a brother like that a person might feel inferior even if they are in the top .001% of intellects worldwide (heck, there would be 100,000 people smarter than you worldwide). I’m nowhere near that level. I might be top .1% if I’m lucky. I’m not sure that necessarily translates to your always being right about every Spider difference of opinion we have, though I am open to the possibility. 😦
First, regarding the Sudoku… I could pretty well figure out the rules just by looking at the diagram, though the negative constraints were not entirely obvious. If you imagined those rectangular boxes as indicating in-suit runs in a 4-suit game, then a 5 of spades below a 6 of diamonds would meet the constraint but not show in the same box. The video made it clear early on what you actually meant.
First observation is that it was fun! I would enjoy doing a book of puzzles constructed this way — though hopefully just a bit easier than this one.
I made one mistake early and had to start over. On my second attempt, I got the top third of the puzzle (which I could verify by watching the video). But I found some contradiction in what I had written in the middle third, meaning I had made a mistake there. Instead of persevering I gave up. One reason I gave up Sudoku many years ago was this tendency for decisions to cascade… one mistake might not be detectable until a great many more decisions had been made, with no obvious trace-back mechanism. I’m confident I could have solved it with a bit more persistence. Watching the guy do it in the video I followed what he did. Occasionally I saw an easy reason to fill in part of the puzzle that he had been missing.
I presume a computer can almost instantly solve any solvable Sudoku using brute force methods. So that’s not interesting. But some years ago I made a program that instead simulated a few specific strategies that humans use — ones that are taught in standard “how to win Sudoku” courses. I could then enter a given Sudoku and see what the program did. Understandably, for “easy” problems it typically solved the puzzle completely. For other cases it got stuck and presented you with a situation that required some more in-depth thinking. I might be able to retrieve it if desired.
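For readers curious what such a "human strategy" solver might look like, here is a minimal Python sketch. It is not Bart's program, which is not shown anywhere in this thread. It assumes classic Sudoku rules only (none of the Spider Solitaire constraints), uses 0 for an empty cell, and applies just two textbook techniques: naked singles and hidden singles. All function names are made up for illustration.

```python
from itertools import product

def peers(r, c):
    """Cells sharing a row, column, or 3x3 box with (r, c)."""
    cells = {(r, j) for j in range(9)} | {(i, c) for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    cells |= {(br + i, bc + j) for i in range(3) for j in range(3)}
    cells.discard((r, c))
    return cells

# All 27 units: 9 rows, 9 columns, 9 boxes.
UNITS = (
    [[(r, c) for c in range(9)] for r in range(9)]
    + [[(r, c) for r in range(9)] for c in range(9)]
    + [[(br + i, bc + j) for i in range(3) for j in range(3)]
       for br in (0, 3, 6) for bc in (0, 3, 6)]
)

def candidates(grid, r, c):
    """Digits not yet used by any peer of the empty cell (r, c)."""
    used = {grid[i][j] for i, j in peers(r, c)}
    return set(range(1, 10)) - used

def solve_like_a_human(grid):
    """Apply naked and hidden singles until neither makes progress.

    Returns a new grid; any remaining 0 means the solver got stuck
    and deeper reasoning (or guessing) would be needed.
    """
    grid = [row[:] for row in grid]
    progress = True
    while progress:
        progress = False
        # Naked single: a cell with exactly one candidate left.
        for r, c in product(range(9), repeat=2):
            if grid[r][c] == 0:
                cand = candidates(grid, r, c)
                if len(cand) == 1:
                    grid[r][c] = cand.pop()
                    progress = True
        # Hidden single: a digit with exactly one possible cell in a unit.
        for unit in UNITS:
            for digit in range(1, 10):
                if any(grid[r][c] == digit for r, c in unit):
                    continue
                spots = [(r, c) for r, c in unit
                         if grid[r][c] == 0 and digit in candidates(grid, r, c)]
                if len(spots) == 1:
                    r, c = spots[0]
                    grid[r][c] = digit
                    progress = True
    return grid
```

On an easy puzzle this fills the whole grid. On a harder one it returns a partly filled grid, which matches the "got stuck and presented you with a situation" behaviour described above.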
Hi Bart, thanks for your comments. Don’t feel guilty about not persevering when a Sudoku is too hard. If anything, Mark/Simon should be the guilty ones since they do cherry-picking (I assume you know what that means!) and discard the videos when they can’t solve the puzzle in a reasonable amount of time. And yes, you’re not the only one to spot easy digits that Mark and Simon are missing! Some folk on the Discord Server have written their own solvers, and it’s easy enough to google solvers that are publicly available.
The Bias Paper
Next, I took a look at the paper proving Spider gets harder. I think I followed it, and it looks sound to me. My following thoughts might well be incorrect or even dumb. But I figured you might appreciate some possibly-dumb feedback over none at all.
1. It looks like you showed that the games got harder to win as time went on, and you had a condition in which you were winning a lot of games. I don’t recall any control condition where you were NOT winning games. It would be valuable to see what it does in that case. It also might not just be looking at “win/loss” but the quality of the loss. For instance, one way to lose is to simply resign immediately. Another is to play through to a win but resign just before removing the last suit. An intermediate one might be to resign just after the last deal. Hopefully there’s a “restore factory settings” button that will reliably clear all data about past games. But that would give you a better sense of what the programmers are aiming for.
It would be interesting to see if succeeding games got easier under certain conditions of lots of losses or just stayed the same.
Without the control conditions, it didn’t look to me like you had addressed the alternative explanation that the games simply got harder over time, independent of how often a person won. This in turn could conceivably be due to a simple bug in the programming, like perhaps a faulty random number generator. You showed increasing bias with time, but not what triggered it.
2. Given the replication crisis (https://en.wikipedia.org/wiki/Replication_crisis), using a p value cutoff of .05 has an “iffy” vibe to it. I think the statisticians argue that p values aren’t the right thing to use in the first place. I might have gotten a glimmer of their arguments from time to time but they didn’t stay with me. But I did observe that if you use a P value cutoff of .001, that in practice is also significant using their more complex methods. If it were just a matter of turning the crank on a program, I would think you missed an opportunity for a more persuasive statistical result. But this did require you to play 40 games by hand (and I’m impressed that you as an actual human won all 40 of them even using undo), so I’m sympathetic with a desire to not be required to play another 80 games, or whatever it would require. I wasn’t clear if you started with 40 games and turned the crank to get your .05 significance, or whether you played 20 games, saw a P value of .25, played 30 and saw .15, and then with 40 saw .05 and stopped, but prepared to add more games until a significant result was achieved. That would be methodologically questionable and make the hair of the hard-nosed statisticians stand up on their heads.
So on the whole I would say that this is best seen as great exploratory research, done by an amateur in their spare time. But more research could answer other interesting questions and provide more solid statistical foundations.
3. Free Spider Solitaire is in fact the program that I use to play Spider. A win rate of 25% is what I had in the old days with the Microsoft Windows XP program — with my home computer from the 2010 to 2016 timeframe, stats show 368 wins and 1134 losses. I could fire up my older computer and find what the stats were on that in the 2004-2012 timeframe, but so far haven’t seen a good reason to do so. With the new program I’ve won 20 and lost 82, for a win rate of 20%. On the whole I’ll chalk this up to idiosyncratic factors and not even begin to claim it has anything to do with bias. I am less likely to have the desire and stamina and ability to analyze situations as deeply as I might have earlier. (Have I mentioned that I’m 66 years old?)
I do give up on games that look initially unpromising. My goal is to get into an interesting “middle game”, and I don’t care if I give up on games I might have won if I had slogged through things looking not so promising. So I figure my win percentage if I made it a priority would be a point or two higher. But it’s not my goal.
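The worry in point 2 about adding games until the p value dips under .05 can be illustrated with a small simulation. This is only a sketch: it does not reproduce the test used in the paper. As a stand-in it assumes a fair coin (so there is no real effect to find) and a one-sided exact binomial test, checked after 20, 30 and 40 flips. The function names, the look schedule and the seed are all arbitrary choices for illustration.

```python
import random
from math import comb

def p_value(successes, n):
    """One-sided exact binomial p-value: P(X >= successes) for a fair coin."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

def false_positive_rates(looks=(20, 30, 40), alpha=0.05, trials=2000, seed=1):
    """Compare a single test at the final look with stopping at the first
    'significant' look. The null hypothesis is true in every trial, so
    every rejection counted here is a false positive."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(looks[-1])]
        pvals = [p_value(sum(flips[:n]), n) for n in looks]
        fixed_hits += pvals[-1] < alpha                 # test once, at n = 40
        peeking_hits += any(p < alpha for p in pvals)   # test at every look
    return fixed_hits / trials, peeking_hits / trials

fixed_rate, peeking_rate = false_positive_rates()
print(f"fixed n: {fixed_rate:.3f}   with peeking: {peeking_rate:.3f}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the fixed rate. How much higher it comes out depends on the look schedule, which is why a stopping rule chosen in advance matters.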
Hi Bart, thank you very much for your comments.
I very much appreciate your feedback, and your feedback is far from being dumb. You raise some legitimate points.
I agree control conditions are interesting. I could have run a control experiment where I lost every game (but I still needed to determine the location of every card for this to work). However, my aim was merely to ask the question of whether the program is biased, and not why (or when) it is biased. In fact (to my shame) I haven’t studied Statistics formally, so you may well know more about control conditions than I do!
When players cheat at top-level chess, what typically happens is that a player’s moves are found to have a significant match with AI play i.e. sufficiently small p-value. This isn’t conclusive proof, but it’s enough to justify further action, such as asking Joe Bloggs to take off his shoes. If Joe Bloggs complains that he can’t take off his shoes due to his smelly feet then he is disqualified 😊 Similarly I can claim that “there is no conclusive proof but further investigation of Spider program XYZ is warranted”.
BTW I don’t trust the “reset factory settings” option. After some experimentation I somehow ended up with 12 wins from 24 games, with a longest winning streak of 8 and longest losing streak of 1. I have no idea how I managed to do that, and hence no idea how to replicate the bug. Unfortunately, I didn’t save the screen-shot. At least the software correctly computed my win rate was exactly 50% 😊
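A quick counting argument suggests those numbers cannot all describe one sequence of games. If the longest losing streak is 1, the 12 losses are isolated, so at least 11 separate winning streaks are needed to keep them apart. But 12 wins with one streak of 8 leaves only 4 wins for everything else. The sketch below checks this mechanically. It assumes the 24 games form a single ordered sequence, that a "streak" is a maximal run of identical results, and that all four inputs are at least 1. The function names are hypothetical.

```python
def block_counts(total, longest):
    """Possible numbers of streaks when `total` results are split into
    streaks of length 1..longest, with at least one of exactly `longest`."""
    lo = -(-total // longest)      # ceil(total / longest): fewest streaks
    hi = total - longest + 1       # one long streak, the rest of length 1
    return range(lo, hi + 1)

def stats_are_consistent(wins, losses, longest_win, longest_loss):
    """Winning and losing streaks alternate, so the two streak counts
    can differ by at most one in any real sequence of games."""
    return any(abs(w - l) <= 1
               for w in block_counts(wins, longest_win)
               for l in block_counts(losses, longest_loss))

print(stats_are_consistent(12, 12, 8, 1))   # the reported statistics
```

Under these assumptions the reported statistics come out as inconsistent, which fits the suspicion that the reset left some counters in a stale state.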
My main concern was to do with reproducibility. I wanted to write my paper in such a way that anyone with sufficient aptitude in Spider and writing software (plus enough spare time on their hands) can replicate my results. If they get low p-values then I am onto something. If they get just above 0.05 then I will fall back on my poker analogy – I was forced to call all the way to the river, and I was beat.
FWIW I googled the company (Treecardgames) and there seems to be some negative sentiment going around. A few bad reviews here and there and I won’t be surprised if many companies routinely delete unfavourable reviews.
Master T, allow me to offer my congratulations for your Achievement of Awesomeness along with my apologies for my normal lateness in giving you this back pat.
U da’ man !!! Nicely done, Sir. You ARE awesome.
Concerning the subject at hand I assume my familiar position leagues behind yourself and Esteemed Scholar Bart so I can, as usual, add nothing to the intellectual aspect of this conversation.
But permit me this. Master T you lament the difficulty of being published. You further state that, “The peer review model is hopelessly flawed”. I suspect that we could find this same comment written in the scrolls of Ancient Greece and before. Which is too bad because if we could take the “ego-outta’-it”, peer review could be a powerful and wonderful tool. If you are not acquainted already I offer what I found to be a great short read: “Longitude” by Dava Sobel.
I look forward to joining in the conversation during the next Game On. I have one in depth comment on the current hand…..Hope springs eternal in the human breast.