/*
  Stockfish, a UCI chess playing engine derived from Glaurung 2.1
Add NNUE evaluation
This patch ports the efficiently updatable neural network (NNUE) evaluation to Stockfish.
Both the NNUE and the classical evaluations are available, and can be used to
assign a value to a position that is later used in alpha-beta (PVS) search to find the
best move. The classical evaluation computes this value as a function of various chess
concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation
computes this value with a neural network based on basic inputs. The network is optimized
and trained on the evaluations of millions of positions at moderate search depth.
The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward.
It can be evaluated efficiently on CPUs, and exploits the fact that only parts
of the neural network need to be updated after a typical chess move.
[The nodchip repository](https://github.com/nodchip/Stockfish) provides additional
tools to train and develop the NNUE networks.
This patch is the result of contributions of various authors, from various communities,
including: nodchip, ynasu87, yaneurao (initial port and NNUE authors), domschl, FireFather,
rqs, xXH4CKST3RXx, tttak, zz4032, joergoster, mstembera, nguyenpham, erbsenzaehler,
dorzechowski, and vondele.
This new evaluation needed various changes to fishtest and the corresponding infrastructure,
for which tomtor, ppigazzini, noobpwnftw, daylen, and vondele are gratefully acknowledged.
The first networks have been provided by gekkehenker and sergiovieri, with the latter
net (nn-97f742aaefcd.nnue) being the current default.
The evaluation function can be selected at run time with the `Use NNUE` (true/false) UCI option,
provided the `EvalFile` option points to the network file (depending on the GUI, with full path).
The performance of the NNUE evaluation relative to the classical evaluation depends somewhat on
the hardware and is expected to improve quickly; the gain currently exceeds 80 Elo on fishtest:
60000 @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c
ELO: 92.77 +-2.1 (95%) LOS: 100.0%
Total: 60000 W: 24193 L: 8543 D: 27264
Ptnml(0-2): 609, 3850, 9708, 10948, 4885
40000 @ 20+0.2 th 8
https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58
ELO: 89.47 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 12756 L: 2677 D: 24567
Ptnml(0-2): 74, 1583, 8550, 7776, 2017
At the same time, the impact on the classical evaluation remains minimal, causing no significant
regression:
sprt @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b
LLR: 2.94 (-2.94,2.94) {-6.00,-4.00}
Total: 34936 W: 6502 L: 6825 D: 21609
Ptnml(0-2): 571, 4082, 8434, 3861, 520
sprt @ 60+0.6 th 1
https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d
LLR: 2.93 (-2.94,2.94) {-6.00,-4.00}
Total: 10088 W: 1232 L: 1265 D: 7591
Ptnml(0-2): 49, 914, 3170, 843, 68
The needed networks can be found at https://tests.stockfishchess.org/nns
It is recommended to use the default one as indicated by the `EvalFile` UCI option.
Guidelines for testing new nets can be found at
https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests
Integration has been discussed in various issues:
https://github.com/official-stockfish/Stockfish/issues/2823
https://github.com/official-stockfish/Stockfish/issues/2728
The integration branch will be closed after the merge:
https://github.com/official-stockfish/Stockfish/pull/2825
https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip
closes https://github.com/official-stockfish/Stockfish/pull/2912
This will be an exciting time for computer chess, looking forward to seeing the evolution of
this approach.
Bench: 4746616
2020-08-05 09:11:15 -06:00
  Copyright (C) 2004-2020 The Stockfish developers (see AUTHORS file)

  Stockfish is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  Stockfish is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <cstring>   // For std::memset
#include <iostream>
#include <thread>

#include "bitboard.h"
#include "misc.h"
#include "thread.h"
#include "tt.h"
#include "uci.h"

TranspositionTable TT; // Our global transposition table

/// TTEntry::save() populates the TTEntry with a new node's data, possibly
/// overwriting an old position. Update is not atomic and can be racy.

void TTEntry::save(Key k, Value v, bool pv, Bound b, Depth d, Move m, Value ev) {

  // Preserve any existing move for the same position
Use 128 bit multiply for TT index
Remove super cluster stuff from TT and just use a 128 bit multiply.
STC https://tests.stockfishchess.org/tests/view/5ee719b3aae8aec816ab7548
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 12736 W: 2502 L: 2333 D: 7901
Ptnml(0-2): 191, 1452, 2944, 1559, 222
LTC https://tests.stockfishchess.org/tests/view/5ee732d1aae8aec816ab7556
LLR: 2.93 (-2.94,2.94) {-1.50,0.50}
Total: 27584 W: 3431 L: 3350 D: 20803
Ptnml(0-2): 173, 2500, 8400, 2511, 208
Scheme back to being derived from https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
Also the default optimized version of the index calculation now uses fewer instructions.
https://godbolt.org/z/Tktxbv
Might benefit from mulx (requires -mbmi2)
closes https://github.com/official-stockfish/Stockfish/pull/2744
bench: 4320954
2020-06-15 00:35:07 -06:00
  if (m || (uint16_t)k != key16)
      move16 = (uint16_t)m;

Allow TT entries with key16==0 to be fetched
Fix the issue where a TT entry with key16==0 would always be reported
as a miss. Instead, we'll use depth8 to detect whether the TT entry is
occupied. In order to do that, we'll change DEPTH_OFFSET to -7
(depth8==0) to distinguish between an unoccupied entry and the
otherwise lowest possible depth, i.e., DEPTH_NONE (depth8==1).
To prevent a performance regression, we'll reorder the TT entry fields
by the access order of TranspositionTable::probe(). Memory in general
works fastest when accessed in sequential order. We'll also match the
store order in TTEntry::save() with the entry field order, and
re-order the 'if-or' expressions in TTEntry::save() from the cheapest
to the most expensive.
Finally, as we now have a proper TT entry occupancy test, we'll fix a
minor corner case with hashfull reporting. To reproduce:
- Use a big hash
- Either:
a. Start 31 very quick searches (this wraps generation around to 0); or
b. Force generation of the first search to 0.
- go depth infinite
Before the fix, hashfull would incorrectly report nearly full hash
immediately after the search start, since
TranspositionTable::hashfull() used to consider only the entry
generation and not whether the entry was actually occupied.
STC:
LLR: 2.95 (-2.94,2.94) {-0.25,1.25}
Total: 36848 W: 4091 L: 3898 D: 28859
Ptnml(0-2): 158, 2996, 11972, 3091, 207
https://tests.stockfishchess.org/tests/view/5f3f98d5dc02a01a0c2881f7
LTC:
LLR: 2.95 (-2.94,2.94) {0.25,1.25}
Total: 32280 W: 1828 L: 1653 D: 28799
Ptnml(0-2): 34, 1428, 13051, 1583, 44
https://tests.stockfishchess.org/tests/view/5f3fe77a87a5c3c63d8f5332
closes https://github.com/official-stockfish/Stockfish/pull/3048
Bench: 3760677
2020-08-21 03:12:39 -06:00
  // Overwrite less valuable entries (cheapest checks first)
  if (b == BOUND_EXACT
      || (uint16_t)k != key16
      || d - DEPTH_OFFSET > depth8 - 4)
  {
      assert(d > DEPTH_OFFSET);
      assert(d < 256 + DEPTH_OFFSET);

      key16     = (uint16_t)k;
      depth8    = (uint8_t)(d - DEPTH_OFFSET);
      genBound8 = (uint8_t)(TT.generation8 | uint8_t(pv) << 2 | b);
      value16   = (int16_t)v;
      eval16    = (int16_t)ev;
  }
}
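
The packing that save() performs on genBound8 and depth8 can be illustrated in isolation. The following is a minimal sketch, not the engine's code: the helper names are hypothetical, the Bound enumerator values are assumed, and DEPTH_OFFSET = -7 is taken from the commit note on key16==0 entries.

```cpp
#include <cassert>
#include <cstdint>

using std::uint8_t;

// Assumed enumerator values; only the two-bit width is implied by the source.
enum Bound : uint8_t { BOUND_NONE = 0, BOUND_UPPER = 1, BOUND_LOWER = 2, BOUND_EXACT = 3 };

// Taken from the commit note: depth8 == 0 is reserved for "entry not occupied".
constexpr int DEPTH_OFFSET = -7;

// Hypothetical helper: generation in the upper 5 bits, PV flag in bit 2,
// bound type in bits 0-1, mirroring the expression used in save().
inline uint8_t pack_gen_bound(uint8_t generation8, bool pv, Bound b) {
    assert((generation8 & 0x7) == 0);  // the low three bits must be free
    return uint8_t(generation8 | uint8_t(pv) << 2 | b);
}

inline uint8_t generation_of(uint8_t genBound8) { return uint8_t(genBound8 & 0xF8); }
inline bool    is_pv(uint8_t genBound8)         { return (genBound8 & 0x4) != 0; }
inline Bound   bound_of(uint8_t genBound8)      { return Bound(genBound8 & 0x3); }

// Depth is stored with an offset so that every legal depth maps to 1..255.
inline uint8_t encode_depth(int d) {
    assert(d > DEPTH_OFFSET && d < 256 + DEPTH_OFFSET);
    return uint8_t(d - DEPTH_OFFSET);
}

inline int decode_depth(uint8_t depth8) { return int(depth8) + DEPTH_OFFSET; }
```

Because the three fields share one byte, refreshing the generation of an entry only needs a mask with 0x7 to keep the PV flag and bound, which is what probe() does.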


/// TranspositionTable::resize() sets the size of the transposition table,
/// measured in megabytes. Transposition table consists of a power of 2 number
/// of clusters and each cluster consists of ClusterSize number of TTEntry.

void TranspositionTable::resize(size_t mbSize) {

  Threads.main()->wait_for_search_finished();

Add large page support for NNUE weights and simplify TT mem management
Use TT memory functions to allocate memory for the NNUE weights. This
should provide a small speed-up on systems where large pages are not
automatically used, including Windows and some Linux distributions.
Further, since we now have a wrapper for std::aligned_alloc(), we can
simplify the TT memory management a bit:
- We no longer need to store separate pointers to the hash table and
its underlying memory allocation.
- We also get to merge the Linux-specific and default implementations
of aligned_ttmem_alloc().
Finally, we'll enable the VirtualAlloc code path with large page
support also for Win32.
STC: https://tests.stockfishchess.org/tests/view/5f66595823a84a47b9036fba
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 14896 W: 1854 L: 1686 D: 11356
Ptnml(0-2): 65, 1224, 4742, 1312, 105
closes https://github.com/official-stockfish/Stockfish/pull/3081
No functional change.
2020-08-30 10:41:30 -06:00
  aligned_large_pages_free(table);

  clusterCount = mbSize * 1024 * 1024 / sizeof(Cluster);

  table = static_cast<Cluster*>(aligned_large_pages_alloc(clusterCount * sizeof(Cluster)));
  if (!table)
  {
      std::cerr << "Failed to allocate " << mbSize
                << "MB for transposition table." << std::endl;
      exit(EXIT_FAILURE);
  }

  clear();
}
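
Since resize() derives clusterCount by plain division, the count need not be a power of two, and the "Use 128 bit multiply for TT index" commit note says the index is then obtained with a multiply rather than a mask. A minimal sketch of that idea, following the multiply-high reduction described in the linked Lemire post, might look as follows. The function names and the 32-byte cluster size used in the usage note are assumptions for illustration, not the engine's definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// High 64 bits of the 128-bit product a * b.
inline std::uint64_t mul_hi64(std::uint64_t a, std::uint64_t b) {
#if defined(__SIZEOF_INT128__)
    return std::uint64_t((static_cast<unsigned __int128>(a) * b) >> 64);
#else
    // Portable fallback built from 32-bit halves.
    std::uint64_t aL = std::uint32_t(a), aH = a >> 32;
    std::uint64_t bL = std::uint32_t(b), bH = b >> 32;
    std::uint64_t c1 = (aL * bL) >> 32;
    std::uint64_t c2 = aH * bL + c1;
    std::uint64_t c3 = aL * bH + std::uint32_t(c2);
    return aH * bH + (c2 >> 32) + (c3 >> 32);
#endif
}

// Hypothetical helper: maps a 64-bit key to [0, clusterCount) without a modulo.
inline std::size_t cluster_index(std::uint64_t key, std::uint64_t clusterCount) {
    assert(clusterCount > 0);
    return std::size_t(mul_hi64(key, clusterCount));
}

// Hypothetical helper mirroring the clusterCount computation in resize().
inline std::size_t clusters_for_megabytes(std::size_t mbSize, std::size_t clusterBytes) {
    assert(clusterBytes > 0);
    return mbSize * 1024 * 1024 / clusterBytes;
}
```

With an assumed 32-byte cluster, `clusters_for_megabytes(16, 32)` gives 524288 clusters, and `cluster_index` spreads keys over exactly that many slots because the high half of `key * clusterCount` is always smaller than `clusterCount`.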


/// TranspositionTable::clear() initializes the entire transposition table to zero,
/// in a multi-threaded way.

void TranspositionTable::clear() {

  std::vector<std::thread> threads;

  for (size_t idx = 0; idx < Options["Threads"]; ++idx)
  {
      threads.emplace_back([this, idx]() {

          // Thread binding gives faster search on systems with a first-touch policy
          if (Options["Threads"] > 8)
              WinProcGroup::bindThisThread(idx);

          // Each thread will zero its part of the hash table
          const size_t stride = size_t(clusterCount / Options["Threads"]),
                       start  = size_t(stride * idx),
                       len    = idx != Options["Threads"] - 1 ?
                                stride : clusterCount - start;

          std::memset(&table[start], 0, len * sizeof(Cluster));
      });
  }

  for (std::thread& th : threads)
      th.join();
}
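
The stride/start/len arithmetic in clear() gives every thread an equal slice and hands the remainder to the last thread. A self-contained sketch of the same partitioning, applied to a plain byte buffer, is shown below. The names `Span`, `chunk_for` and `parallel_zero` are hypothetical, and thread binding is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

struct Span { std::size_t start, len; };

// Slice of [0, count) owned by worker idx; the last worker absorbs the
// remainder left over by the integer division.
inline Span chunk_for(std::size_t count, std::size_t threads, std::size_t idx) {
    assert(threads > 0 && idx < threads);
    const std::size_t stride = count / threads;
    const std::size_t start  = stride * idx;
    const std::size_t len    = idx != threads - 1 ? stride : count - start;
    return Span{start, len};
}

// Zero a buffer with one worker per slice, then wait for all of them.
inline void parallel_zero(std::vector<unsigned char>& buf, std::size_t threads) {
    std::vector<std::thread> workers;
    for (std::size_t idx = 0; idx < threads; ++idx)
        workers.emplace_back([&buf, threads, idx] {
            const Span s = chunk_for(buf.size(), threads, idx);
            if (s.len)
                std::memset(buf.data() + s.start, 0, s.len);
        });
    for (std::thread& w : workers)
        w.join();
}
```

For 10 elements and 3 workers the slices are [0,3), [3,6) and [6,10), so the slices never overlap and together cover the whole buffer.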


/// TranspositionTable::probe() looks up the current position in the transposition
/// table. It returns true and a pointer to the TTEntry if the position is found.
/// Otherwise, it returns false and a pointer to an empty or least valuable TTEntry
/// to be replaced later. The replace value of an entry is calculated as its depth
/// minus 8 times its relative age. TTEntry t1 is considered more valuable than
/// TTEntry t2 if its replace value is greater than that of t2.

TTEntry* TranspositionTable::probe(const Key key, bool& found) const {

  TTEntry* const tte = first_entry(key);
  const uint16_t key16 = (uint16_t)key;  // Use the low 16 bits as key inside the cluster

  for (int i = 0; i < ClusterSize; ++i)
      if (tte[i].key16 == key16 || !tte[i].depth8)
      {
          tte[i].genBound8 = uint8_t(generation8 | (tte[i].genBound8 & 0x7)); // Refresh

          return found = (bool)tte[i].depth8, &tte[i];
      }

  // Find an entry to be replaced according to the replacement strategy
  TTEntry* replace = tte;
  for (int i = 1; i < ClusterSize; ++i)
      // Due to our packed storage format for generation and its cyclic
      // nature we add 263 (256 is the modulus plus 7 to keep the unrelated
      // lowest three bits from affecting the result) to calculate the entry
      // age correctly even after generation8 overflows into the next cycle.
      if (  replace->depth8 - ((263 + generation8 - replace->genBound8) & 0xF8)
          >   tte[i].depth8 - ((263 + generation8 -   tte[i].genBound8) & 0xF8))
          replace = &tte[i];

  return found = false, replace;
}
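
The age term in the replacement test of probe() can be checked with a few numbers. The sketch below isolates that expression. `relative_age` and `replace_value` are hypothetical helper names, and the only assumption is that the generation advances in steps of 8, as the 0xF8 mask suggests.

```cpp
#include <cassert>
#include <cstdint>

// Age of an entry in generation units times 8. Adding 256 keeps the
// difference non-negative after generation8 wraps around, and adding 7
// cancels the largest possible value of the three low bits (PV flag and
// bound) before they are masked off.
inline int relative_age(std::uint8_t generation8, std::uint8_t genBound8) {
    return (263 + generation8 - genBound8) & 0xF8;
}

// Replace value as described in the probe() comment: stored depth minus
// 8 times the relative age. Lower values are replaced first.
inline int replace_value(std::uint8_t depth8, std::uint8_t generation8,
                         std::uint8_t genBound8) {
    return int(depth8) - relative_age(generation8, genBound8);
}
```

For example, with generation8 = 0x10 an entry stamped 0x17 (same generation, all low bits set) has age 0, one stamped 0x08 has age 8, and after a wraparound to generation8 = 0x00 an entry stamped 0xFB still comes out one generation old (age 8) rather than very old.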


/// TranspositionTable::hashfull() returns an approximation of the hashtable
/// occupation during a search. The hash is x permill full, as per UCI protocol.

int TranspositionTable::hashfull() const {

  int cnt = 0;
  for (int i = 0; i < 1000; ++i)
      for (int j = 0; j < ClusterSize; ++j)
          cnt += table[i].entry[j].depth8 && (table[i].entry[j].genBound8 & 0xF8) == generation8;

  return cnt / ClusterSize;
}
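
hashfull() samples only the first 1000 clusters, so dividing the number of occupied current-generation entries by ClusterSize yields a permill figure directly. A stand-alone sketch of that sampling is given below. The `Entry` and `Cluster` layouts and ClusterSize = 3 are simplified assumptions for illustration and do not reproduce the real structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-ins: only the two fields that the estimate reads.
struct Entry { std::uint8_t depth8; std::uint8_t genBound8; };

constexpr int ClusterSize = 3;  // assumed value for illustration
struct Cluster { Entry entry[ClusterSize]; };

// Counts entries in the first 1000 clusters that are occupied (depth8 != 0)
// and were written or refreshed in the current generation. Sampling exactly
// 1000 clusters means cnt / ClusterSize is already in permill.
inline int hashfull_sample(const std::vector<Cluster>& table, std::uint8_t generation8) {
    assert(table.size() >= 1000);
    int cnt = 0;
    for (int i = 0; i < 1000; ++i)
        for (int j = 0; j < ClusterSize; ++j) {
            const Entry& e = table[i].entry[j];
            cnt += e.depth8 && (e.genBound8 & 0xF8) == generation8;
        }
    return cnt / ClusterSize;
}
```

The depth8 test matters because a zeroed table has generation bits equal to 0, so without it every empty entry would count as occupied whenever the current generation is 0.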