High DTZ values and endgame studies

Recently I made the claim that EGTBs can suggest promising starting positions for endgame studies.

I explained: "The most common EGTBs are nowadays known under the name Syzygy. Each material constellation, such as KRN vs KPPP, has two files, and one of them contains distance-to-zero (DTZ) values. Positions with high DTZ values are more promising." (See the section My View in this post.)

(By starting position I mean any position from which some play begins. For a real study, an introduction consisting of positions with low DTZ values can be added.)

My friend Hagen, who has also been a long-time tester of my Chess Suite, asked for evidence. That's one thing I want to provide in this post. But the real motivation is a worry: If judges do not understand the mechanisms of mining, sooner or later any study with EGTB material could be devalued in any tourney.

If others share this worry, then perhaps at some point it will make sense to test positions for mining indicators, just as anticipations are tested today.

For those less familiar with DTZ values, I'll attempt an introduction first.

DTZ values in detail

DTZ values apply to won and lost positions. They represent the number of plies until the next capture or pawn move, assuming that the winning side tries to minimize this number and the losing side tries to maximize it.

This rather mathematical formulation can be illustrated as follows. Chess engines use DTZ values to detect draws by the 50-move rule in advance. Assume that in some position both sides have already made 8 moves without a capture or a pawn move. Then the weaker side can force a draw by the 50-move rule if and only if the DTZ value of the position is greater than or equal to 84. If these numbers are unclear, note that the 50-move rule is a 100-ply rule and 100 - (2 x 8) = 84.
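The arithmetic can be written down as a small check. This is only a sketch of the rule as stated above, in Python. The function name and its parameters are made up for illustration, and real probing code has to handle rounding details of the tables that are ignored here.

```python
def weaker_side_can_force_draw(dtz: int, moves_without_zeroing: int) -> bool:
    """Apply the rule of thumb from the text.

    dtz: DTZ value of the current position, in plies.
    moves_without_zeroing: moves already made by EACH side
        without a capture or a pawn move.
    """
    # The 50-move rule is a 100-ply rule, so this many plies are left.
    plies_left = 100 - 2 * moves_without_zeroing
    return dtz >= plies_left


print(weaker_side_can_force_draw(84, 8))  # True: the threshold case from the text
print(weaker_side_can_force_draw(83, 8))  # False: the zeroing move comes in time
```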

From now on, for the sake of simplicity, we will assume that White has a guaranteed win in a given position. Then a high DTZ value of that position means that Black cannot avoid a loss but can force a long sequence of moves without any pawn moves or captures. The presence of such a maneuvering phase increases the likelihood that this is a suitable starting position of an endgame study.

But what does increasing the likelihood exactly mean? Or more to the point: Is this claimed increase so large that it is worth examining positions by the size of their DTZ values? That will be answered below.

Before that I would like to formulate a few properties.

Firstly, DTZ values are not strongly related to cooks or duals. DTZ values restrict White's options to non-captures and piece moves, but they do not force uniqueness of White's winning moves.

Secondly, not every study contains a position with a high DTZ value. If a pawn move or a capture occurs every now and then, all positions of the study can have moderate or small DTZ values. Personally, I believe that most studies do not contain any position with an exceptionally high DTZ value.

Thirdly, capture- and pawn-move-free sequences in a solution provide lower bounds for DTZ values.

If the mainline of a study contains a capture- and pawn-move-free sequence of moves that is also free of cooks and duals (as it should be anyway), then the DTZ value of the first position in this sequence is greater than or equal to the length of the sequence.
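Such a lower bound can be read off a solution mechanically. The following Python sketch is not part of the program described in this post. It assumes the mainline is given as a list of moves in standard algebraic notation (SAN) and counts the length in plies; the sample mainline is invented purely to show the mechanics.

```python
def is_zeroing(san: str) -> bool:
    """True if a SAN move is a capture or a pawn move."""
    # Captures contain 'x'; pawn moves start with a lowercase file letter.
    return "x" in san or san[0] in "abcdefgh"


def longest_non_zeroing_run(san_moves):
    """Return (start_index, length) of the longest run of non-zeroing plies."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, san in enumerate(san_moves):
        if is_zeroing(san):
            # A capture or pawn move ends the current run.
            run_len = 0
            run_start = i + 1
        else:
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
    return best_start, best_len


# Hypothetical mainline, not taken from any real study.
mainline = ["Kb2", "Kd7", "Kc3", "Ke6", "Kd4", "f5", "Ke3", "Kxe5"]
start, length = longest_non_zeroing_run(mainline)
print(start, length)  # 0 5
```

If the run is free of cooks and duals, the position before ply `start` has a DTZ value of at least `length`.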

How do we get evidence?

I wrote a program especially for this post that can be used to check large sets of positions. It's the first time I've worked with DTZ values programmatically.

The program performs the following steps:

Generate a set of positions all having the same material, i.e. all these positions belong to the same Syzygy EGTB.
Determine the frequency of all DTZ values of all generated positions and output some statistics as well as the FENs of positions with high DTZ values and the move sequences starting therewith.
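The second step can be sketched as follows. This is a Python illustration, not the Java program used for this post. The function `probe_dtz` is a stand-in for whatever tablebase access library is used and is an assumption here, as are the parameter names; the demo uses a made-up lookup table instead of real Syzygy files.

```python
from collections import Counter


def dtz_statistics(fens, probe_dtz, keep_from):
    """Count DTZ values and collect the positions with high values.

    fens: iterable of position strings.
    probe_dtz: callable returning the DTZ value of a position,
        or None if the position is not won (hypothetical interface).
    keep_from: positions with DTZ >= keep_from are kept for inspection.
    """
    histogram = Counter()
    candidates = []
    for fen in fens:
        dtz = probe_dtz(fen)
        if dtz is None:
            continue  # drawn or lost positions are not counted
        histogram[dtz] += 1
        if dtz >= keep_from:
            candidates.append((dtz, fen))
    candidates.sort(reverse=True)  # highest DTZ first
    return histogram, candidates


# Made-up stand-in for a tablebase: position label -> DTZ (None = not won).
fake_table = {"pos-a": 1, "pos-b": 1, "pos-c": 13, "pos-d": None, "pos-e": 11}
histogram, candidates = dtz_statistics(fake_table, fake_table.get, keep_from=11)
print(sorted(histogram.items()))  # [(1, 2), (11, 1), (13, 1)]
print(candidates)                 # [(13, 'pos-c'), (11, 'pos-e')]
```

Keeping only the positions above a threshold avoids storing millions of FENs in memory.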

The technical details of the program will be explained at the end. The hardware used can be described as rather outdated.

Now come the examples.

KP vs KP

This endgame has about 3.7 million positions, not counting symmetrical positions. 1,612,481 positions are won for White. Their DTZ values are distributed as follows.

DTZ	0	1	3	5	7	9
#Pos	2	1,488,714	84,643	23,449	10,888	3,492

DTZ	11	13	15	17	19	21
#Pos	947	170	104	53	18	1

The generation took 36 seconds and getting the DTZ values took another 49 seconds.

I took a closer look at the positions with DTZ values 17, 19 and 21. The vast majority contain a pair of blocked pawns, and the white king uses the opposition to win the black pawn. That is interesting textbook material, but these are not the studies I am looking for. Therefore I excluded all positions with both pawns on the same file.
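Such a filter is easy to express on FEN strings. The following Python sketch only illustrates the idea; the generator used for this post may apply the restriction in a different way, and the function names and the two sample positions are invented.

```python
def pawn_files(fen: str):
    """Return the files (0 = a, ..., 7 = h) of all pawns in a FEN string."""
    files = []
    board_field = fen.split()[0]
    for rank in board_field.split("/"):
        file_index = 0
        for ch in rank:
            if ch.isdigit():
                file_index += int(ch)  # a digit stands for empty squares
            else:
                if ch in "Pp":
                    files.append(file_index)
                file_index += 1
    return files


def pawns_share_file(fen: str) -> bool:
    """True if at least two pawns stand on the same file."""
    files = pawn_files(fen)
    return len(set(files)) < len(files)


# Two made-up KP vs KP positions.
blocked = "8/8/4k3/4p3/8/4P3/4K3/8 w - - 0 1"
adjacent = "8/8/4k3/4p3/8/3P4/4K3/8 w - - 0 1"
print(pawns_share_file(blocked))   # True: both pawns on the e-file
print(pawns_share_file(adjacent))  # False: pawns on the d- and e-files
```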

This left me with 3.33 million generated positions, of which 1,446,475 are won for White. And this is the new DTZ distribution.

DTZ	0	1	3	5	7	9	11	13
#Pos	2	1,352,406	71,889	14,982	5,512	1,352	313	9

Note the drastic changes in the upper range of DTZ values!

The first edition of the Encyclopedia of Chess Endings (ECE) has 19 examples, both studies and game fragments, that belong to this set. How many of these can we expect among the 322 positions with DTZ=13 and DTZ=11? If they were equally distributed, the expected number would be 322 x 19 / 1,446,475 = 0.00423. In other words, it would be a sensation to find one at all.
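For readers who want to check the estimate, here is the calculation in Python. The numbers are the ones given above. The second value, the probability of at least one hit when 322 positions are drawn at random without replacement, is an addition for illustration and rests on the same assumption of equal distribution.

```python
N = 1_446_475  # won positions in the filtered set
K = 19         # ECE examples belonging to this set
n = 322        # positions with DTZ = 11 or DTZ = 13

# Expected number of ECE positions among the n positions.
expected_hits = n * K / N

# Probability of drawing no ECE position at all (without replacement).
p_none = 1.0
for i in range(n):
    p_none *= (N - K - i) / (N - i)
p_at_least_one = 1.0 - p_none

print(f"expected hits: {expected_hits:.5f}")  # expected hits: 0.00423
print(f"P(at least one hit): {p_at_least_one:.5f}")
```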

But already among the 9 positions with DTZ=13 we find the start positions of Bianchetti's study of 1925 and Adamson's study of 1915. With DTZ=11 follows the start position of Moravec's study of 1952. And these are only the cases where the exact start positions were found. I saw many variations of a maneuver that appears in a Grigoriev study of 1931 (after an introduction containing pawn moves). And most probably there are more cases like this.

Can we call it a success? Certainly.

One more observation. The solutions of Bianchetti's and Adamson's studies contain long sequences of non-zeroing moves, but the solution of Moravec's study does not. The latter is not uncommon (see below).

Single examples

I could try to continue with KPP vs KP in the same way and then maybe even with KPP vs KPP. But the number of positions to be generated is growing too fast. Instead I go the other way around. I choose a study where the material doesn't change for a long time (pawn moves are okay though) and then I create a large but manageable set of positions that includes the study's starting position and see where the DTZ value of the starting position is located in the distribution of the DTZ values. If this value is at the top, we would have found the starting position of the study with the proposed method!

I did this successfully for a number of contemporary studies until I realized that by naming the composers, malicious people could insinuate that the composers found their studies in exactly this way. That's why I'm now showing the examples in anonymized form.

Example 1

A KPP vs KPP study without passed pawns, all four pawns have space to move.
The solution starts with 9 non-zeroing moves, hence the DTZ value is at least 9.

The generator was restricted to all positions with pawns on the same files as in the study. 655,151 positions were generated.
329,715 are won, 143,324 drawn and 182,112 are lost from White's perspective.
The program needed 9 seconds.

The frequencies of the DTZ values of the won positions are as follows:

DTZ 1: 230,928
DTZ 3: 35,136
DTZ 5: 31,708
DTZ 7: 18,068
DTZ 9: 8,132
DTZ 11: 3,099
DTZ 13: 1,333
DTZ 15: 612
DTZ 17: 456
DTZ 19: 201
DTZ 21: 42

The DTZ value of the study's start position is 19. Would you find this study if you had to examine up to 243 positions? Well, I don't have to. Positions that lead to cooks or duals can be sorted out automatically, so the human work is small. After applying this procedure to the 243 positions with DTZ=21 or DTZ=19, three positions came out clearly on top: the start position of the study and two slight modifications of it.
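The number 243 follows directly from the frequency list. A small Python sketch, using the figures of Example 1 and a helper function invented for this illustration, shows where a given DTZ value sits in the distribution.

```python
# DTZ frequencies of the won positions in Example 1 (from the list above).
won_histogram = {
    1: 230_928, 3: 35_136, 5: 31_708, 7: 18_068, 9: 8_132, 11: 3_099,
    13: 1_333, 15: 612, 17: 456, 19: 201, 21: 42,
}


def positions_at_or_above(histogram, dtz):
    """Number of positions whose DTZ value is at least dtz."""
    return sum(count for value, count in histogram.items() if value >= dtz)


total = sum(won_histogram.values())
to_examine = positions_at_or_above(won_histogram, 19)
print(total)       # 329715
print(to_examine)  # 243
print(f"{to_examine / total:.4%} of the won positions")
```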

Example 2

We leave the pawn endgames.

A study with 3 minor pieces and one pawn.
The white king was fixed to one square and the pawn had only two squares.
4.87 million positions were generated.
4,680,092 are won, the others are drawn.
The computation time was 23 minutes.

DTZ 1, 2, 3, ..., 23: 3,770,255, ... , 14,238.
DTZ 25, 27 and 29: 6,822, 3,944 and 2,197
DTZ 31: 830
DTZ 33: 197
DTZ 35: 89
DTZ 37: 8
DTZ 39: 6
DTZ 41: 8
DTZ 43: 1

The DTZ of the study's start position is 35, but the solution's mainline guarantees only DTZ >= 19.

Definitely another easy find!

Example 3

And what about my own studies? Well, I don't have more than a handful of studies, and yet there's one that could be found in minutes! More precisely, it's not the starting position, but the position after Black's second move. Luckily it's one of the knight tour studies, which are more for entertainment anyway.

Example 4

There are material constellations which are less suitable for the approach. Consider KQ vs KRP, where the black pawn is on c7. In this situation the material already determines the character of play: In order to win, White has to prevent Black from building a fortress. Black is not interested in moving the pawn, and any capture ends the play. So all moves count for the DTZ.

With this material the DTZ values are as large as 151. But since the queen has so many options, cooks and duals are extremely frequent, too. My success rate here was five times lower, but it was still worth it.

Special Offer

If you have your own study that you would like to have checked, just write to me.

The technical side of the program

The program was written in a single day. But I did not start from scratch. As part of my endgame research, I wrote several position generators years ago. So I can use existing software here.

Getting the millions of DTZ values from the EGTBs within a program was the real problem for me and I had not done this before. When I posted the basic idea, I wrote:

"You don't even have to be a particular smart programmer to do all this, since access libraries to Syzygy EGTBs exist for different programming languages and are freely available."

I decided to use the Syzygy Bridge by Laurens Winkelhagen, which gives me access in Java.

I needed about 3 hours to get the bridge running within my program. The bridge is a few years old and I learned the hard way that it does not support 7-man EGTBs. A very similar alternative can be found within the Bagatur engine software. Syzygy support is relatively new there and access to 7-man EGTBs works. I got it running too, but some minor differences in the interface made me stick with the Syzygy Bridge. (Addendum December 3, 2023: There are some problems with the Bagatur Syzygy support, too. That's why I ended up building my own bridge on December 1, 2023.)

Then I needed two hours for data conversion (generator to bridge) and for writing statistics, FENs and the maximal DTZ sequences as PGN game fragments. Then testing examples could begin.
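Writing a game fragment as PGN mainly means adding the SetUp and FEN tags and numbering the moves correctly. The following Python sketch is not the code of my program. The function name is invented, only a minimal set of tags is written (strict PGN readers may expect the full seven-tag roster), and the sample position and moves are made up.

```python
def fragment_to_pgn(fen: str, san_moves, result: str = "*") -> str:
    """Format a move sequence starting from a FEN position as PGN text."""
    fields = fen.split()
    white_to_move = fields[1] == "w"
    move_number = int(fields[5])  # fullmove number from the FEN
    headers = ['[SetUp "1"]', f'[FEN "{fen}"]', f'[Result "{result}"]']
    tokens = []
    for san in san_moves:
        if white_to_move:
            tokens.append(f"{move_number}. {san}")
        else:
            if not tokens:
                # A fragment that starts with a black move needs "N...".
                tokens.append(f"{move_number}... {san}")
            else:
                tokens.append(san)
            move_number += 1
        white_to_move = not white_to_move
    tokens.append(result)
    return "\n".join(headers) + "\n\n" + " ".join(tokens) + "\n"


# Made-up position and moves, only to show the output format.
pgn = fragment_to_pgn("8/8/4k3/4p3/8/3P4/4K3/8 w - - 0 1", ["Ke3", "Kd5"])
print(pgn)
```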

For online access to Syzygy EGTBs of individual positions, there is the page https://syzygy-tables.info, which also offers a download of a maximal DTZ sequence as PGN starting from the current position. I noticed that the values supplied by the Syzygy Bridge and this page sometimes differ by one.

Furthermore, this page also offers extremal examples for every EGTB. On the subpage https://syzygy-tables.info/endgames you can even find a PGN database with all the examples. There are some mistakes like illegal positions and switched results, but you might find it interesting to browse through some of these fragments.