03. September 2023

High DTZ values and endgame studies

Recently I made the claim that EGTBs can suggest promising starting positions for endgame studies.

I explained: "The most common EGTBs are nowadays known under the name Syzygy. Each material constellation, such as KRN vs KPPP, has two files, and one of them contains distance-to-zero (DTZ) values. Positions with high DTZ values are more promising." (See the section My View in this post.)

(When I speak of starting positions, I mean any kind of position where some play starts. For a real study, an introduction consisting of positions with low DTZ values can be added.)

My friend Hagen, who has also been a long-time tester of my Chess Suite, asked for evidence. That's one thing I want to provide in this post. But the real motivation is a worry: If judges do not understand the mechanisms of mining, sooner or later any study with EGTB material could be devalued in any tourney.

If others share this worry, then perhaps at some point it will make sense to test positions for mining indicators, just as anticipations are tested today.

For those less familiar with DTZ values, I'll attempt an introduction first.

DTZ values in detail

DTZ values apply to won and lost positions. They represent the number of plies until the next capture or the next pawn move, assuming that the winning side tries to minimize this number and the losing side tries to maximize it.
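The following Python sketch shows the minimize/maximize rule on a hypothetical toy graph. The graph is not real chess, and the position names and the TOY structure are made up for illustration. Several assumptions apply. White's move lists already contain only winning moves, as a WDL table would report them. Black's lists contain all Black moves. The graph has no cycles. Real tablebase generators use retrograde analysis instead of this recursion, because chess positions can repeat.

```python
from functools import lru_cache

# Hypothetical toy graph (NOT real chess). Every position here is won for White.
# Entry: position -> (side to move, moves); a move is (zeroing, child), where
# zeroing=True stands for a capture, pawn move or mate and ends the count.
# Assumptions: White's lists hold only winning moves, the graph is acyclic.
TOY = {
    "A": ("white", [(False, "B"), (False, "C")]),
    "B": ("black", [(True, None)]),
    "C": ("black", [(False, "D"), (False, "E")]),
    "D": ("white", [(True, None)]),
    "E": ("white", [(False, "F")]),
    "F": ("black", [(True, None)]),
}


@lru_cache(maxsize=None)
def dtz(pos: str) -> int:
    """Plies until the next zeroing move: White minimizes, Black maximizes."""
    side, moves = TOY[pos]
    values = [1 if zeroing else 1 + dtz(child) for zeroing, child in moves]
    return min(values) if side == "white" else max(values)


print(dtz("A"), dtz("C"))  # 2 3
```

In position A, White prefers the route via B (2 plies). In position C, Black picks the longer continuation via E.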

This rather mathematical formulation can be illustrated as follows. Chess engines use DTZ values to detect draws by the 50-move rule in advance. Assume that in some position both sides have already made 8 moves without a capture or a pawn move. Then the weaker side can force a draw by the 50-move rule if and only if the DTZ value of the position is greater than or equal to 84. To see where this number comes from, note that the 50-move rule is a 100-ply rule and 100 - (2x8) = 84.

From now on, for the sake of simplicity, we will assume that White has a guaranteed win in a given position. Then a high DTZ value for that position means that Black cannot avoid a loss but can force a long sequence of moves without any pawn moves or captures. The presence of such a maneuvering phase increases the likelihood that this is a suitable starting position for an endgame study.

But what does increasing the likelihood exactly mean? Or more to the point: Is this claimed increase so large that it is worth examining positions by the size of their DTZ values? That will be answered below.

Before that I would like to formulate a few properties.

Firstly, DTZ values are not strongly related to cooks or duals. A high DTZ value restricts White's options to non-captures and piece moves, but it does not force White's winning moves to be unique.

Secondly, not every study contains a position with a high DTZ value. If a pawn move or a capture happens every now and then, all positions of the study can have moderate or small DTZ values. Personally, I believe that most studies do not contain any position with an exceptionally high DTZ value.

Thirdly, capture- and pawn-move-free sequences in a solution provide lower bounds for DTZ values.

If the mainline of a study contains a capture- and pawn-move-free sequence of moves that is also free of cooks and duals (as it should be anyway), then the DTZ value of the first position in this sequence is greater than or equal to the length of the sequence.
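Reading this lower bound off a mainline amounts to counting the plies before the first zeroing move. The Python sketch below assumes the mainline is given as a list of SAN moves without move numbers, and the sample line in it is made up. The sketch judges captures, pawn moves and mates from the SAN text alone. It does not check the other precondition, namely that White's moves in the sequence are unique.

```python
from __future__ import annotations


def is_zeroing(san: str) -> bool:
    """Judged from SAN alone: captures contain 'x', pawn moves start with a
    file letter a-h, and mates ('#') also end the DTZ count."""
    return "x" in san or san[0] in "abcdefgh" or "#" in san


def leading_quiet_plies(mainline: list[str]) -> int:
    """Number of plies before the first zeroing move of a SAN mainline."""
    count = 0
    for san in mainline:
        if is_zeroing(san):
            break
        count += 1
    return count


# Made-up line for illustration: six quiet plies, then the pawn move e4.
line = ["Kd2", "Ke7", "Kc3", "Kd6", "Kb4", "Kc6", "e4"]
print(leading_quiet_plies(line))  # 6
```

If White's moves in those plies are free of duals, the DTZ of the first position is at least the returned count.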

How do we get evidence?

I wrote a program especially for this post that can be used to check large sets of positions. It's the first time I've worked with DTZ values programmatically.

The program performs the following steps:

Generate a set of positions all having the same material, i.e. all these positions belong to the same Syzygy EGTB.
Determine the frequency of each DTZ value among all generated positions and output some statistics as well as the FENs of positions with high DTZ values and the move sequences starting from them.
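The second step can be sketched in Python as follows. This is not the author's program, which is written in Java and uses the Syzygy Bridge. The sketch assumes a hypothetical callable probe_dtz(fen) that returns a signed DTZ from White's point of view: positive means White wins, 0 a draw, negative a loss. In a real setup it would wrap a Syzygy probing library for positions with White to move. python-chess, for example, offers probe_dtz on a tablebase object, but it takes a chess.Board and answers from the side to move's perspective.

```python
import heapq
from collections import Counter
from typing import Callable, Iterable


def dtz_statistics(fens: Iterable[str], probe_dtz: Callable[[str], int], top: int = 10):
    """Histogram of DTZ values of positions won for White, plus the `top`
    positions with the highest DTZ (highest first)."""
    histogram: Counter = Counter()
    best: list = []  # min-heap of (dtz, fen), holds at most `top` entries
    for fen in fens:
        dtz = probe_dtz(fen)
        if dtz <= 0:
            continue  # drawn (0) or lost (< 0) for White: not counted
        histogram[dtz] += 1
        if len(best) < top:
            heapq.heappush(best, (dtz, fen))
        elif dtz > best[0][0]:
            heapq.heapreplace(best, (dtz, fen))  # drop the current lowest
    return histogram, sorted(best, reverse=True)


# Stand-in for a real tablebase probe: made-up values, labels instead of FENs.
fake_dtz = {"p1": 1, "p2": 3, "p3": 0, "p4": -5, "p5": 3, "p6": 7}
hist, best = dtz_statistics(fake_dtz, fake_dtz.get, top=2)
print(sorted(hist.items()))  # [(1, 1), (3, 2), (7, 1)]
print(best)                  # [(7, 'p6'), (3, 'p5')]
```

The heap keeps memory small even for millions of positions. Only the histogram and the current top candidates are stored.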

The technical details of the program are explained at the end. The hardware I used is rather outdated.

Now come the examples.

KP vs KP

This endgame has about 3.7 million positions, not counting symmetrical positions. Of these, 1,612,481 are won for White. Their DTZ values are distributed as follows.

DTZ	0	1	3	5	7	9
#Pos	2	1,488,714	84,643	23,449	10,888	3,492

DTZ	11	13	15	17	19	21
#Pos	947	170	104	53	18	1

The generation took 36 seconds and getting the DTZ values took another 49 seconds.

I had a closer look at the positions with DTZ values 17, 19 and 21. The vast majority contain a pair of blocked pawns, and the white king uses the opposition to win the black pawn. That is interesting textbook material, but not the kind of study I am looking for. Therefore I excluded all positions with both pawns on the same file.
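Below is a minimal Python sketch of such a filter. It assumes the generated positions are available as FEN strings, and the author's generator may well apply the restriction differently. The sketch reads the piece-placement field of the FEN and drops positions in which a white and a black pawn stand on the same file. The two sample FENs are made up for illustration.

```python
from __future__ import annotations

FILES = "abcdefgh"


def pawn_files(fen: str, pawn: str) -> set[str]:
    """Files (a-h) holding pawns of one colour: 'P' = White, 'p' = Black."""
    files = set()
    for rank in fen.split()[0].split("/"):
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)  # a run of empty squares
            else:
                if ch == pawn:
                    files.add(FILES[col])
                col += 1
    return files


def pawns_share_file(fen: str) -> bool:
    return bool(pawn_files(fen, "P") & pawn_files(fen, "p"))


candidates = [
    "8/8/4k3/4p3/4P3/4K3/8/8 w - - 0 1",  # blocked pawns on the e-file
    "8/8/4k3/3p4/4P3/4K3/8/8 w - - 0 1",  # pawns on the d- and e-file
]
kept = [fen for fen in candidates if not pawns_share_file(fen)]
print(len(kept))  # 1
```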

This left me with 3.33 million generated positions, of which 1,446,475 are won for White. This is the new DTZ distribution:

DTZ	0	1	3	5	7	9	11	13
#Pos	2	1,352,406	71,889	14,982	5,512	1,352	313	9

Note the drastic changes in the upper range of DTZ values!

The first edition of the Encyclopedia of Chess Endings (ECE) has 19 examples, studies and game fragments, that belong to this set. How many of these can we expect among the 322 positions with DTZ=13 and DTZ=11? If they were distributed uniformly, the expected number would be 322 x 19 / 1,446,475 = 0.00423. In other words, it would be a sensation to find even one.
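The same uniform assumption can also be turned into a probability for the three exact start positions reported in the next paragraph. The Python sketch below treats the 322 high-DTZ positions as a random draw without replacement from the 1,446,475 won positions, 19 of which are ECE examples. That is a hypergeometric model, chosen here for illustration. Under it, three or more hits have a probability of roughly 10^-8.

```python
from math import comb


def expected_hits(selected: int, special: int, total: int) -> float:
    """Expected number of special positions among `selected` random ones."""
    return selected * special / total


def prob_at_least(k: int, selected: int, special: int, total: int) -> float:
    """Hypergeometric tail P(X >= k): draw `selected` of `total` positions
    without replacement, `special` of which count as hits."""
    favourable = sum(
        comb(special, i) * comb(total - special, selected - i)
        for i in range(k, min(special, selected) + 1)
    )
    return favourable / comb(total, selected)  # exact big-int division


print(round(expected_hits(322, 19, 1_446_475), 5))  # 0.00423
print(prob_at_least(3, 322, 19, 1_446_475))
```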

But already among the 9 positions with DTZ=13 we find the start positions of Bianchetti's study of 1925 and Adamson's study of 1915. With DTZ=11 follows the start position of Moravec's study of 1952. And these are only the cases where the exact start positions were found. I saw many variations of a maneuver that appears in a Grigoriev study of 1931 (after an introduction containing pawn moves). And most probably there are more cases like this.

Can we call it a success? Certainly.

One more observation. The solutions of Bianchetti's and Adamson's studies contain long sequences of non-zeroing moves, but the solution of Moravec's study does not. The latter is not uncommon (see below).

Single examples

I could try to continue with KPP vs KP in the same way and then maybe even with KPP vs KPP. But the number of positions to be generated grows too fast. Instead I go the other way round. I choose a study in which the material doesn't change for a long time (pawn moves are okay, though). Then I create a large but manageable set of positions that includes the study's starting position and see where the DTZ value of the starting position lies in the distribution of DTZ values. If this value is at the top, the proposed method would have found the starting position of the study!
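"Where the DTZ value lies in the distribution" can be expressed as the number of generated positions whose DTZ is at least as high. This is how many positions one would have to look at when scanning from the top. The Python sketch below assumes the histogram is a dict from DTZ value to count. It is checked against the Example 1 counts given in this post, which yield the 243 positions mentioned there.

```python
from __future__ import annotations


def positions_at_or_above(histogram: dict[int, int], dtz: int) -> int:
    """Number of positions whose DTZ is at least `dtz`."""
    return sum(count for value, count in histogram.items() if value >= dtz)


# DTZ histogram of the won positions in Example 1 (counts from this post).
example1 = {1: 230_928, 3: 35_136, 5: 31_708, 7: 18_068, 9: 8_132, 11: 3_099,
            13: 1_333, 15: 612, 17: 456, 19: 201, 21: 42}
n = positions_at_or_above(example1, 19)
print(n, f"{n / sum(example1.values()):.4%}")  # 243 0.0737%
```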

I did this successfully for a number of contemporary studies until I realized that by naming the composers, malicious people could insinuate that the composers found their studies in exactly this way. That's why I'm now showing the examples in anonymized form.

Example 1

A KPP vs KPP study without passed pawns; all four pawns have room to move.
The solution starts with 9 non-zeroing moves, hence the DTZ value is at least 9.

The generator was restricted to all positions with pawns on the same files as in the study. 655,151 positions were generated.
329,715 are won, 143,324 drawn and 182,112 are lost from White's perspective.
The program needed 9 seconds.

The frequencies of the DTZ values of the won positions are as follows:

DTZ 1: 230,928
DTZ 3: 35,136
DTZ 5: 31,708
DTZ 7: 18,068
DTZ 9: 8,132
DTZ 11: 3,099
DTZ 13: 1,333
DTZ 15: 612
DTZ 17: 456
DTZ 19: 201
DTZ 21: 42

The DTZ value of the study's start position is 19. Would you find this study if you had to examine up to 243 positions? Well, I don't have to examine them all by hand. Positions that lead to cooks or duals can be sorted out automatically, so the human work is small. When I applied this procedure to the 243 positions with DTZ=21 or DTZ=19, three positions clearly came out on top: the start position of the study and two slight modifications of it.
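The post does not spell out how positions with cooks or duals are sorted out. The Python sketch below shows one possible, hypothetical criterion. For every candidate, record how many White moves keep the win at each White turn along its DTZ-optimal line, for example from WDL probes of all legal moves. Then drop candidates where any turn offers more than one winning move, and list the rest by DTZ. The Candidate class, its winning_moves field and the sample data are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    fen: str
    dtz: int
    # Hypothetical data: number of winning White moves at each White turn
    # along the candidate's DTZ-optimal line.
    winning_moves: list[int]

    def alternatives(self) -> int:
        """White turns on the line where more than one move wins."""
        return sum(1 for n in self.winning_moves if n > 1)


def shortlist(candidates: list[Candidate]) -> list[Candidate]:
    """Candidates with a unique winning White move at every turn,
    highest DTZ first."""
    clean = [c for c in candidates if c.alternatives() == 0]
    return sorted(clean, key=lambda c: c.dtz, reverse=True)


pool = [
    Candidate("pos-a", 21, [1, 1, 2, 1]),  # a dual on White's third turn
    Candidate("pos-b", 19, [1, 1, 1, 1]),
    Candidate("pos-c", 21, [1, 1, 1, 1, 1]),
]
print([c.fen for c in shortlist(pool)])  # ['pos-c', 'pos-b']
```

This crude filter treats every alternative winning move alike. Judging whether a dual is harmless, for example a mere loss of time, is left to the human.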

Example 2

We leave the pawn endgames.

A study with three light pieces and one pawn.
The white king was fixed to one square and the pawn was restricted to two squares.
4.87 million positions were generated.
4,680,092 are won, the others are drawn.
The computation took 23 minutes.

DTZ 1, 2, 3, ..., 23: 3,770,255, ..., 14,238
DTZ 25: 6,822
DTZ 27: 3,944
DTZ 29: 2,197
DTZ 31: 830
DTZ 33: 197
DTZ 35: 89
DTZ 37: 8
DTZ 39: 6
DTZ 41: 8
DTZ 43: 1

The DTZ of the study's start position is 35, but the solution's mainline guarantees only DTZ >= 19.

Definitely another easy find!

Example 3

And what about my own studies? Well, I don't have more than a handful of studies, and yet there's one that could be found in minutes! More precisely, it's not the starting position, but the position after Black's second move. Luckily it's one of the knight tour studies, which are more for entertainment anyway.

Example 4

There are material constellations which are less suitable for the approach. Consider KQ vs KRP, where the black pawn is on c7. In this situation the material already determines the character of play: In order to win, White has to prevent Black from building a fortress. Black is not interested in moving the pawn, and any capture ends the play. So all moves count for the DTZ.

With this material the DTZ values go up to 151. But since the queen has so many options, cooks and duals are extremely frequent, too. My success rate here was about five times lower, but it was still worth it.

Special Offer

If you have your own study that you would like to have checked, just write to me.

The technical side of the program

The program was written in a single day. But I did not start from scratch: as part of my endgame research, I wrote several position generators years ago, so I could reuse existing software here.

Getting the millions of DTZ values from the EGTBs within a program was the real problem for me and I had not done this before. When I posted the basic idea, I wrote:

"You don't even have to be a particular smart programmer to do all this, since access libraries to Syzygy EGTBs exist for different programming languages and are freely available."

I decided to use the Syzygy Bridge by Laurens Winkelhagen, which gives me access in Java.

I needed about 3 hours to get the bridge running within my program. The bridge is a few years old, and I learned the hard way that it does not support 7-man EGTBs. A very similar alternative can be found within the Bagatur engine software. Syzygy support is relatively new there, and access to 7-man EGTBs works. I got it running too, but some minor differences in the interface made me stick with the Syzygy Bridge.

Then I needed two hours for data conversion (generator to bridge) and for writing statistics, FENs and the maximal DTZ sequences as PGN game fragments. Then I could begin testing examples.
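The author's Java code for this output step is not shown. The Python sketch below covers only the last part and assumes the maximal DTZ line is already available as a list of SAN moves. A PGN fragment that starts from a FEN needs the SetUp and FEN tags. Its move numbers come from the FEN's side-to-move and fullmove fields, and a fragment starting with a Black move begins with "N...". The function name pgn_fragment is made up. The unknown Seven Tag Roster values are filled with "?" placeholders, and the sample position and moves are invented.

```python
from __future__ import annotations


def pgn_fragment(fen: str, san_moves: list[str], result: str = "*") -> str:
    """PGN game fragment starting from `fen` (uses the SetUp/FEN tags)."""
    fields = fen.split()
    white_to_move = fields[1] == "w"
    move_no = int(fields[5])  # fullmove number field of the FEN
    tags = {"Event": "?", "Site": "?", "Date": "????.??.??", "Round": "?",
            "White": "?", "Black": "?", "Result": result,
            "SetUp": "1", "FEN": fen}
    tokens = []
    for i, san in enumerate(san_moves):
        if white_to_move:
            tokens.append(f"{move_no}.")
        elif i == 0:
            tokens.append(f"{move_no}...")  # fragment starts with Black's move
        tokens.append(san)
        if not white_to_move:
            move_no += 1
        white_to_move = not white_to_move
    tokens.append(result)
    header = "\n".join(f'[{key} "{value}"]' for key, value in tags.items())
    return header + "\n\n" + " ".join(tokens) + "\n"


print(pgn_fragment("8/8/4k3/3p4/4P3/4K3/8/8 b - - 0 5", ["Kd6", "exd5", "Kxd5"]))
```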

For online access to Syzygy EGTBs for individual positions, there is the page https://syzygy-tables.info, which also offers a download of a maximal DTZ sequence as PGN starting from the current position. I noticed that the values supplied by the Syzygy Bridge and by this page sometimes differ by one.

Furthermore, this page offers extremal examples for every EGTB. On the subpage https://syzygy-tables.info/endgames you can even find a PGN database with all the examples. There are some mistakes, such as illegal positions and switched results, but you might find it interesting to browse some of these fragments.

12th April 2024 (first published as a blog post)

Cursed DTZ

My friend Hagen follows my steps closely and occasionally sends feedback. When I showed him my study for the Bilokin 85 MT, he showed little interest. I was all the more surprised when Hagen sent me an email about the tourney's award. He did not enjoy it and gave me a number of examples, but only the very last one is of interest here.

"What is a study worth if the final position has a DTZ value of 300+? This is absolute crap! Why doesn't the judge throw out this nonsense? In addition, this study is also incorrect because Black cooperates so that White has unique moves. Your *friend* LMG is making fun of you all.

By the way, I noticed on this occasion that your DTZ article contains some errors."

I'll leave the first sentences to others. This post is solely about the last sentence.

Strictly speaking, Hagen is correct in one point. I simplified my explanation in the earlier article and did not foresee that the extreme cases I left out could be relevant for endgame studies. Let's see the details.

The whole truth about DTZ values

DTZ values were primarily made for chess engines. And chess engines can be configured to respect or to ignore the 50-moves rule. Some basics are given in the "About" paragraph here. And if you follow the very first link you will get some detailed information on the meaning of the values which I discuss now.

You will see that a DTZ value n with 100 >= n >= 1 means the position is winning, and a zeroing move or checkmate can be forced in n or n + 1 half-moves. This is close to the definition I gave in the article (the n/n+1 detail is a small difference, which is also explained below (*)). A major difference arises when n > 100:

"A DTZ value n > 100 means the position is winning, but drawn under the 50-move rule. A zeroing move or checkmate can be forced in n or n + 1 half-moves, or in n - 100 or n + 1 - 100 half-moves if a later phase is responsible for the draw."

At that time I ignored this n-100/n+1-100 stuff, because the aim of the article - my program - doesn't deal with such later phases.

Let's look at Hagen's example. This study entered tablebase territory after Black's 9th move. The DTZ value there is 127. In particular, this tells us that it is a cursed win, i.e. the position is winning but drawn under the 50-move rule.

Fifteen moves later White captures a pawn and the study ends. The final position has a DTZ value of 386! And it is the next phase that causes this number. This means that both sides have to make about (386-100)/2 = 143 moves until checkmate or a capture occurs (there are no pawns). So one could rightly say that this study ended a bit early.
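The quoted rule can be written as a small Python function. The sketch only restates the website's description quoted above, and the function name dtz_readings is made up. For n > 100 it returns both readings, because the number alone does not say which phase is responsible. The second reading reproduces the (386-100)/2 = 143 estimate.

```python
def dtz_readings(n: int) -> dict:
    """Read a positive (winning) DTZ value n as described in the quote."""
    if n < 1:
        raise ValueError("expects the DTZ value of a won position (n >= 1)")
    if n <= 100:
        return {"status": "win", "plies": [(n, n + 1)]}
    # Cursed win: the zeroing move or mate comes after n or n + 1 plies, or,
    # if a later phase causes the draw, after n - 100 or n + 1 - 100 plies.
    return {"status": "cursed win", "plies": [(n, n + 1), (n - 100, n + 1 - 100)]}


r = dtz_readings(386)
print(r["status"], r["plies"])  # cursed win [(386, 387), (286, 287)]
print(r["plies"][1][0] // 2)    # 143
```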

On the other hand, it is a RB vs NN ending, which is known for this feature ever since the corresponding tablebases were created. Crap? Nonsense? You decide!

I seize the opportunity to point out another problem: Not all Syzygy tablebases are the same (*).

This is another piece of information from the website mentioned above. Some tablebases contain only rounded numbers of moves instead of exact numbers of plies. So, strictly speaking, there are different kinds of DTZ values. This rounding is probably the reason for the phrasing "n or n+1" and also a possible explanation for the occasional differences by one that I mentioned at the end of the article above.