Mano has recently mentioned a little kerfuffle in the online chess community involving an American International Master Levy Rozman and an Indonesian cheater Dadang Subur, who was banned o chess.com shortly after a match between the two raised the suspicion of Levy Rozman and he (and possibly a lot of people who watch his twitch streams) reported him as a suspected cheater. Chess.com evaluated the situation and banned the suspected cheater, thus turning him from suspected cheater to confirmed cheater.
Chess.com guards the tools they are using to evaluate whether someone is cheating or not pretty closely so cheaters cannot learn how to circumvent them, which is understandable. It is also a bit annoying for someone who likes to make statistical analyses of their own, like me. I cannot know the tools they use, neither do I have the access to their data, but that does not stop me from speculating. And today I would like to share one of those speculations on the off-chance that there are more people who like this kind of stuff around here.
In the comment section at Mano’s, I have speculated a bit:
They have probably several criteria to look at, and here is my guess at what they are:
1. The time between moves. Experienced players can play memorized opening moves within a fraction of a second. If someone consistently has a high rating and takes a long time to make beginning moves, it is an indicator of engine use.
2. Distribution of times the moves take during a game. I have not made a proper analysis, but my guess based on looking at my own games would be that they should conform to a Weibull distribution.
3. The length of winning-losing streaks. These should probably be pretty randomly long. Consistent patterns of extremely long winning streaks and no losing streaks are a bit suspicious.
4. The win/loss ratio. The site does a fairly good job at pairing people of similar strength, so it should be about 50/50. Even when your ELO is going up. I have gained 300 ELO over half a year and I do have circa 50/50 win to lose ratio.
5. Game accuracy and consistency. It is possible, even for weak players like me, to get accuracy over 90%, or even an occasional perfect game without mistakes and blunders. But a streak of twenty nearly flawless games is unlikely, even for titled players.
6. Rating growth speed. Titled players can send in their certificate and they get assigned rating accordingly, they do not need to start at the basic rating like everyone else. For an untitled player, the faster they gain rating, the more suspicious it is.
From all these, points 1 and 2 are relatively easy to check with just a few games, so I did that. I have downloaded ten of my games, ten games from Magnus Carlsen, and twenty games from one cheater whom I have recently played. My and cheater’s games were all 10 minutes games with no time increment, Magnus Carlsen’s games were, unfortunately, ten and fifteen minutes games with 2 seconds increments, so I had to cut those at fifty moves. But for the purpose of this demonstration, it is sufficient. And why twenty games from the cheater? Because he was an intermittent cheater. He had long winning streaks of nearly perfect games and then long losing streaks of crappy ones with one occasional win by the skin of his teeth. And while it is easy to get a long losing streak of crappy games (I should know), getting one long streak of nearly perfect wins is not very plausible – unless you are Magnus Carlsen, that is.
So the first picture that I would like to share is a so-called dotplot of move times in these games.
On the x-axis are the times in seconds and each dot represents up to ten moves. With the most simple of statistical analyses, the so-called “Lookandsee analysis” one can already see some discrepancies. Both the world champion and I have a very similar distribution of times, with most times being in the range up to ten seconds, with the peak at the category 0 seconds (moves shorter than 1 sec). For the cheater, who had about the same ELO as me, it is different in both his OK games and his fraudulent ones.
In his OK games, he too made a lot of moves in fifteen seconds or less, but he was much slower, with a peak at five seconds category. That indicates the cheater was taking a lot more time than he should even for easy moves, as befits someone who is currently trying to punch way above his weight class.
In his fraudulent games, this becomes even more profound. Almost no moves are made faster than five seconds (and those are usually the first moves of the game) and most take between ten to fifteen seconds.
If the moves were adhering to a normal distribution, there would be a number of easy-to-make visualization tools and statistical tests available. Alas, they do not. I have speculated that they will have Weibull distribution, which was speculation based on the fact that they have a lower limit (0 seconds) and an upper limit (duration of the game, also 10 minutes). As it turns out, Lognormal distribution is even better fit, although Weibull did fit occasionally too.
In a probability plot, if the fit is good the dots should be distributed along with the straight diagonal line and between the curved lines of the same color, which they mostly, although not perfectly, are.
You might say that AD (Anderson-Darling) values say otherwise, and they do, they are a bit high. The p-value also is too low for a good fit for those tests where it could be calculated. But that is in part a problem with these statistical tests, which generally do not work very well with grainy data. And here we have all times rounded to 0.1 seconds, so it is very grainy at the lower end, where, coincidentally, most of the data is. I could transform the data, but it was a lot of work as it is and I am sure I am losing some readers already. So take my word for it that both Lognormal and Weibull distributions are reasonable approximations.
So as a last picture, let us look at a histogram with an overlaid best-fit lognormal curve.
I am sure that chess.com has software solutions to dig through the data of suspected cheaters and to dredge up comparisons similar to these for all the points that I have mentioned. There probably are some correlations between move time and its quality with regard to the situation on the board etc.
All in all, I do believe that when someone is banned on chess.com for cheating that they were indeed cheating. And there are things that cheaters will probably never be able to fool. The example here is, I think, one of them.
In order to cheat, either the cheater or their assistant must go through the loop of inputting the moves into a computer, waiting for the algorithm to spit out the answer and then inputting the answer to the game. This inevitably prolongs the time. So to keep the move times consistent with those of an honest player might be the most difficult, if not impossible, hurdle for these scumbags.
Andrew Dalke says
Years ago I was tangentially involved with the Rybka/Fruit controversy. Not as a chess player, but because I wanted to see how one might determine that a binary was derived from source code in a way that infringed on copyright. (My analysis was that the reverse-engineered code which claimed to show infringement in Rybka instead showed non-infringement.)
One of the techniques I learned about, for identifying if two chess programs were functionally similar, was to compare what they would do in similar situations.
That could be applied to identifying a cheater: compare a player’s moves with those of the most common chess programs given the same board position. If the different programs give different moves, and the player’s moves consistently match the moves of one of the chess programs, then that suggests the player might be cheating by using that program.
I’m surprised that a cheater can be turned up by something as simple as how long they took for their turns. A lot of this stuff has been studied for decades people working at casinos and people trying to cheat at casinos. Making sure that the time you take to make actions is consistent is well known.
I’m guessing the chess cheaters have not set out at cheating as a planned course of action the same way casino cheaters do. Casino cheaters study previous methods and how they were caught to avoid the same problems. The chess cheaters probably just sort of stumbled into the obvious methods. They are playing somewhat competitively, they have a big game coming up and the temptation to keep a program open to the side is just too much.
Marcus Ranum says
What if the cheater’s assistant input the current move, but also had multiple instances (a good use for the cloud) of the chess assist engine; use those to branch on several assumptions of the opponent’s move, so the cheater would not have to delay while the assist engine came up with an answer? This is basic optimistic branch computation, such as superscalar/superpipelined CPUs sometimes do.
@Andrew Dalke, that is done too. For example, when IM Anna Rudolf was falsely accused of cheating in over-the-board games, she was cleared in part by showing that her moves differ from those proposed by PC engines.
@JM Checking the turn time only works in rapid games online, of course, it does not work in over-the-board tournaments of classical chess, where cheating techniques are different.
@Marcus Ranum. Most cheaters on Chess.com are not professionals, they are just noobs who thought they are much more clever than they actually are. Although even grandmasters sometimes get caught. Anyhoo, what you propose might shave off some time off the loop, but it would still take several fractions of a second longer to make each move than if the player were doing the proper thing of calculating it in their head. The time it takes to communicate the move back to the player can be reduced, but never to actual zero. Except if someone had programmed an engine to play on its own of course, which would be cheating too. But then they would have to remember to program the engine properly to not make each move with robotical precision and instantly irrespective of difficulty…
Of course, no one criterion is sufficient to convince someone of cheating on its own. There are probably multiple criteria with different weighing.