1
0
Fork 0
stockfish/src/misc.h

135 lines
4.1 KiB
C
Raw Normal View History

2008-08-31 23:59:13 -06:00
/*
Stockfish, a UCI chess playing engine derived from Glaurung 2.1
Add NNUE evaluation This patch ports the efficiently updatable neural network (NNUE) evaluation to Stockfish. Both the NNUE and the classical evaluations are available, and can be used to assign a value to a position that is later used in alpha-beta (PVS) search to find the best move. The classical evaluation computes this value as a function of various chess concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation computes this value with a neural network based on basic inputs. The network is optimized and trained on the evalutions of millions of positions at moderate search depth. The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward. It can be evaluated efficiently on CPUs, and exploits the fact that only parts of the neural network need to be updated after a typical chess move. [The nodchip repository](https://github.com/nodchip/Stockfish) provides additional tools to train and develop the NNUE networks. This patch is the result of contributions of various authors, from various communities, including: nodchip, ynasu87, yaneurao (initial port and NNUE authors), domschl, FireFather, rqs, xXH4CKST3RXx, tttak, zz4032, joergoster, mstembera, nguyenpham, erbsenzaehler, dorzechowski, and vondele. This new evaluation needed various changes to fishtest and the corresponding infrastructure, for which tomtor, ppigazzini, noobpwnftw, daylen, and vondele are gratefully acknowledged. The first networks have been provided by gekkehenker and sergiovieri, with the latter net (nn-97f742aaefcd.nnue) being the current default. The evaluation function can be selected at run time with the `Use NNUE` (true/false) UCI option, provided the `EvalFile` option points the the network file (depending on the GUI, with full path). The performance of the NNUE evaluation relative to the classical evaluation depends somewhat on the hardware, and is expected to improve quickly, but is currently on > 80 Elo on fishtest: 60000 @ 10+0.1 th 1 https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c ELO: 92.77 +-2.1 (95%) LOS: 100.0% Total: 60000 W: 24193 L: 8543 D: 27264 Ptnml(0-2): 609, 3850, 9708, 10948, 4885 40000 @ 20+0.2 th 8 https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58 ELO: 89.47 +-2.0 (95%) LOS: 100.0% Total: 40000 W: 12756 L: 2677 D: 24567 Ptnml(0-2): 74, 1583, 8550, 7776, 2017 At the same time, the impact on the classical evaluation remains minimal, causing no significant regression: sprt @ 10+0.1 th 1 https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b LLR: 2.94 (-2.94,2.94) {-6.00,-4.00} Total: 34936 W: 6502 L: 6825 D: 21609 Ptnml(0-2): 571, 4082, 8434, 3861, 520 sprt @ 60+0.6 th 1 https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d LLR: 2.93 (-2.94,2.94) {-6.00,-4.00} Total: 10088 W: 1232 L: 1265 D: 7591 Ptnml(0-2): 49, 914, 3170, 843, 68 The needed networks can be found at https://tests.stockfishchess.org/nns It is recommended to use the default one as indicated by the `EvalFile` UCI option. Guidelines for testing new nets can be found at https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests Integration has been discussed in various issues: https://github.com/official-stockfish/Stockfish/issues/2823 https://github.com/official-stockfish/Stockfish/issues/2728 The integration branch will be closed after the merge: https://github.com/official-stockfish/Stockfish/pull/2825 https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip closes https://github.com/official-stockfish/Stockfish/pull/2912 This will be an exciting time for computer chess, looking forward to seeing the evolution of this approach. Bench: 4746616
2020-08-05 09:11:15 -06:00
Copyright (C) 2004-2020 The Stockfish developers (see AUTHORS file)
2008-08-31 23:59:13 -06:00
Stockfish is free software: you can redistribute it and/or modify
2008-08-31 23:59:13 -06:00
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Stockfish is distributed in the hope that it will be useful,
2008-08-31 23:59:13 -06:00
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
2008-08-31 23:59:13 -06:00
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef MISC_H_INCLUDED
2008-08-31 23:59:13 -06:00
#define MISC_H_INCLUDED
#include <cassert>
#include <chrono>
#include <ostream>
2008-08-31 23:59:13 -06:00
#include <string>
#include <vector>
#include "types.h"
2008-08-31 23:59:13 -06:00
const std::string engine_info(bool to_uci = false);
const std::string compiler_info();
void prefetch(void* addr);
void start_logger(const std::string& fname);
Add NNUE evaluation This patch ports the efficiently updatable neural network (NNUE) evaluation to Stockfish. Both the NNUE and the classical evaluations are available, and can be used to assign a value to a position that is later used in alpha-beta (PVS) search to find the best move. The classical evaluation computes this value as a function of various chess concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation computes this value with a neural network based on basic inputs. The network is optimized and trained on the evalutions of millions of positions at moderate search depth. The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward. It can be evaluated efficiently on CPUs, and exploits the fact that only parts of the neural network need to be updated after a typical chess move. [The nodchip repository](https://github.com/nodchip/Stockfish) provides additional tools to train and develop the NNUE networks. This patch is the result of contributions of various authors, from various communities, including: nodchip, ynasu87, yaneurao (initial port and NNUE authors), domschl, FireFather, rqs, xXH4CKST3RXx, tttak, zz4032, joergoster, mstembera, nguyenpham, erbsenzaehler, dorzechowski, and vondele. This new evaluation needed various changes to fishtest and the corresponding infrastructure, for which tomtor, ppigazzini, noobpwnftw, daylen, and vondele are gratefully acknowledged. The first networks have been provided by gekkehenker and sergiovieri, with the latter net (nn-97f742aaefcd.nnue) being the current default. The evaluation function can be selected at run time with the `Use NNUE` (true/false) UCI option, provided the `EvalFile` option points the the network file (depending on the GUI, with full path). The performance of the NNUE evaluation relative to the classical evaluation depends somewhat on the hardware, and is expected to improve quickly, but is currently on > 80 Elo on fishtest: 60000 @ 10+0.1 th 1 https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c ELO: 92.77 +-2.1 (95%) LOS: 100.0% Total: 60000 W: 24193 L: 8543 D: 27264 Ptnml(0-2): 609, 3850, 9708, 10948, 4885 40000 @ 20+0.2 th 8 https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58 ELO: 89.47 +-2.0 (95%) LOS: 100.0% Total: 40000 W: 12756 L: 2677 D: 24567 Ptnml(0-2): 74, 1583, 8550, 7776, 2017 At the same time, the impact on the classical evaluation remains minimal, causing no significant regression: sprt @ 10+0.1 th 1 https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b LLR: 2.94 (-2.94,2.94) {-6.00,-4.00} Total: 34936 W: 6502 L: 6825 D: 21609 Ptnml(0-2): 571, 4082, 8434, 3861, 520 sprt @ 60+0.6 th 1 https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d LLR: 2.93 (-2.94,2.94) {-6.00,-4.00} Total: 10088 W: 1232 L: 1265 D: 7591 Ptnml(0-2): 49, 914, 3170, 843, 68 The needed networks can be found at https://tests.stockfishchess.org/nns It is recommended to use the default one as indicated by the `EvalFile` UCI option. Guidelines for testing new nets can be found at https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests Integration has been discussed in various issues: https://github.com/official-stockfish/Stockfish/issues/2823 https://github.com/official-stockfish/Stockfish/issues/2728 The integration branch will be closed after the merge: https://github.com/official-stockfish/Stockfish/pull/2825 https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip closes https://github.com/official-stockfish/Stockfish/pull/2912 This will be an exciting time for computer chess, looking forward to seeing the evolution of this approach. Bench: 4746616
2020-08-05 09:11:15 -06:00
void* std_aligned_alloc(size_t alignment, size_t size);
void std_aligned_free(void* ptr);
void* aligned_ttmem_alloc(size_t size, void*& mem);
void aligned_ttmem_free(void* mem); // nop if mem == nullptr
void dbg_hit_on(bool b);
void dbg_hit_on(bool c, bool b);
void dbg_mean_of(int v);
void dbg_print();
typedef std::chrono::milliseconds::rep TimePoint; // A value in milliseconds
static_assert(sizeof(TimePoint) == sizeof(int64_t), "TimePoint should be 64 bits");
inline TimePoint now() {
return std::chrono::duration_cast<std::chrono::milliseconds>
(std::chrono::steady_clock::now().time_since_epoch()).count();
}
template<class Entry, int Size>
struct HashTable {
Entry* operator[](Key key) { return &table[(uint32_t)key & (Size - 1)]; }
private:
std::vector<Entry> table = std::vector<Entry>(Size); // Allocate on the heap
};
enum SyncCout { IO_LOCK, IO_UNLOCK };
std::ostream& operator<<(std::ostream&, SyncCout);
#define sync_cout std::cout << IO_LOCK
#define sync_endl std::endl << IO_UNLOCK
Simpler PRNG and faster magics search This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed by S. Vigna (2014). It is extremely simple, has a large enough period for Stockfish's needs (2^64), requires no warming-up (allowing such code to be removed), and offers slightly better randomness than MT19937. Paper: http://xorshift.di.unimi.it/ Reference source code (public domain): http://xorshift.di.unimi.it/xorshift64star.c The patch also simplifies how init_magics() searches for magics: - Old logic: seed the PRNG always with the same seed, then use optimized bit rotations to tailor the RNG sequence per rank. - New logic: seed the PRNG with an optimized seed per rank. This has two advantages: 1. Less code and less computation to perform during magics search (not ROTL). 2. More choices for random sequence tuning. The old logic only let us choose from 4096 bit rotation pairs. With the new one, we can look for the best seeds among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces the effort needed to find the magics: 64-bit SF: Old logic -> 5,783,789 rand64() calls needed to find the magics New logic -> 4,420,086 calls 32-bit SF: Old logic -> 2,175,518 calls New logic -> 1,895,955 calls In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5). Finally, when playing with strength handicap, non-determinism is achieved by setting the seed of the static RNG only once. Afterwards, there is no need to skip output values. The bench only changes because the Zobrist keys are now different (since they are random numbers straight out of the PRNG). The RNG seed has been carefully chosen so that the resulting Zobrist keys are particularly well-behaved: 1. All triplets of XORed keys are unique, implying that it would take at least 7 keys to find a 64-bit collision (test suggested by ceebo) 2. All pairs of XORed keys are unique modulo 2^32 3. The cardinality of { (key1 ^ key2) >> 48 } is as close as possible to the maximum (65536) Point 2 aims at ensuring a good distribution among the bits that determine an TT entry's cluster, likewise point 3 among the bits that form the TT entry's key16 inside a cluster. Details: Bitset card(key1^key2) ------ --------------- RKISS key16 64894 = 99.020% of theoretical maximum low18 180117 = 99.293% low32 305362 = 99.997% Xorshift64*, old seed key16 64918 = 99.057% low18 179994 = 99.225% low32 305350 = 99.993% Xorshift64*, new seed key16 65027 = 99.223% low18 181118 = 99.845% low32 305371 = 100.000% Bench: 9324905 Resolves #148
2014-12-07 17:10:57 -07:00
/// xorshift64star Pseudo-Random Number Generator
/// This class is based on original code written and dedicated
/// to the public domain by Sebastiano Vigna (2014).
/// It has the following characteristics:
///
Simpler PRNG and faster magics search This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed by S. Vigna (2014). It is extremely simple, has a large enough period for Stockfish's needs (2^64), requires no warming-up (allowing such code to be removed), and offers slightly better randomness than MT19937. Paper: http://xorshift.di.unimi.it/ Reference source code (public domain): http://xorshift.di.unimi.it/xorshift64star.c The patch also simplifies how init_magics() searches for magics: - Old logic: seed the PRNG always with the same seed, then use optimized bit rotations to tailor the RNG sequence per rank. - New logic: seed the PRNG with an optimized seed per rank. This has two advantages: 1. Less code and less computation to perform during magics search (not ROTL). 2. More choices for random sequence tuning. The old logic only let us choose from 4096 bit rotation pairs. With the new one, we can look for the best seeds among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces the effort needed to find the magics: 64-bit SF: Old logic -> 5,783,789 rand64() calls needed to find the magics New logic -> 4,420,086 calls 32-bit SF: Old logic -> 2,175,518 calls New logic -> 1,895,955 calls In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5). Finally, when playing with strength handicap, non-determinism is achieved by setting the seed of the static RNG only once. Afterwards, there is no need to skip output values. The bench only changes because the Zobrist keys are now different (since they are random numbers straight out of the PRNG). The RNG seed has been carefully chosen so that the resulting Zobrist keys are particularly well-behaved: 1. All triplets of XORed keys are unique, implying that it would take at least 7 keys to find a 64-bit collision (test suggested by ceebo) 2. All pairs of XORed keys are unique modulo 2^32 3. The cardinality of { (key1 ^ key2) >> 48 } is as close as possible to the maximum (65536) Point 2 aims at ensuring a good distribution among the bits that determine an TT entry's cluster, likewise point 3 among the bits that form the TT entry's key16 inside a cluster. Details: Bitset card(key1^key2) ------ --------------- RKISS key16 64894 = 99.020% of theoretical maximum low18 180117 = 99.293% low32 305362 = 99.997% Xorshift64*, old seed key16 64918 = 99.057% low18 179994 = 99.225% low32 305350 = 99.993% Xorshift64*, new seed key16 65027 = 99.223% low18 181118 = 99.845% low32 305371 = 100.000% Bench: 9324905 Resolves #148
2014-12-07 17:10:57 -07:00
/// - Outputs 64-bit numbers
/// - Passes Dieharder and SmallCrush test batteries
/// - Does not require warm-up, no zeroland to escape
/// - Internal state is a single 64-bit integer
/// - Period is 2^64 - 1
/// - Speed: 1.60 ns/call (Core i7 @3.40GHz)
///
Simpler PRNG and faster magics search This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed by S. Vigna (2014). It is extremely simple, has a large enough period for Stockfish's needs (2^64), requires no warming-up (allowing such code to be removed), and offers slightly better randomness than MT19937. Paper: http://xorshift.di.unimi.it/ Reference source code (public domain): http://xorshift.di.unimi.it/xorshift64star.c The patch also simplifies how init_magics() searches for magics: - Old logic: seed the PRNG always with the same seed, then use optimized bit rotations to tailor the RNG sequence per rank. - New logic: seed the PRNG with an optimized seed per rank. This has two advantages: 1. Less code and less computation to perform during magics search (not ROTL). 2. More choices for random sequence tuning. The old logic only let us choose from 4096 bit rotation pairs. With the new one, we can look for the best seeds among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces the effort needed to find the magics: 64-bit SF: Old logic -> 5,783,789 rand64() calls needed to find the magics New logic -> 4,420,086 calls 32-bit SF: Old logic -> 2,175,518 calls New logic -> 1,895,955 calls In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5). Finally, when playing with strength handicap, non-determinism is achieved by setting the seed of the static RNG only once. Afterwards, there is no need to skip output values. The bench only changes because the Zobrist keys are now different (since they are random numbers straight out of the PRNG). The RNG seed has been carefully chosen so that the resulting Zobrist keys are particularly well-behaved: 1. All triplets of XORed keys are unique, implying that it would take at least 7 keys to find a 64-bit collision (test suggested by ceebo) 2. All pairs of XORed keys are unique modulo 2^32 3. The cardinality of { (key1 ^ key2) >> 48 } is as close as possible to the maximum (65536) Point 2 aims at ensuring a good distribution among the bits that determine an TT entry's cluster, likewise point 3 among the bits that form the TT entry's key16 inside a cluster. Details: Bitset card(key1^key2) ------ --------------- RKISS key16 64894 = 99.020% of theoretical maximum low18 180117 = 99.293% low32 305362 = 99.997% Xorshift64*, old seed key16 64918 = 99.057% low18 179994 = 99.225% low32 305350 = 99.993% Xorshift64*, new seed key16 65027 = 99.223% low18 181118 = 99.845% low32 305371 = 100.000% Bench: 9324905 Resolves #148
2014-12-07 17:10:57 -07:00
/// For further analysis see
/// <http://vigna.di.unimi.it/ftp/papers/xorshift.pdf>
class PRNG {
uint64_t s;
Simpler PRNG and faster magics search This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed by S. Vigna (2014). It is extremely simple, has a large enough period for Stockfish's needs (2^64), requires no warming-up (allowing such code to be removed), and offers slightly better randomness than MT19937. Paper: http://xorshift.di.unimi.it/ Reference source code (public domain): http://xorshift.di.unimi.it/xorshift64star.c The patch also simplifies how init_magics() searches for magics: - Old logic: seed the PRNG always with the same seed, then use optimized bit rotations to tailor the RNG sequence per rank. - New logic: seed the PRNG with an optimized seed per rank. This has two advantages: 1. Less code and less computation to perform during magics search (not ROTL). 2. More choices for random sequence tuning. The old logic only let us choose from 4096 bit rotation pairs. With the new one, we can look for the best seeds among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces the effort needed to find the magics: 64-bit SF: Old logic -> 5,783,789 rand64() calls needed to find the magics New logic -> 4,420,086 calls 32-bit SF: Old logic -> 2,175,518 calls New logic -> 1,895,955 calls In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5). Finally, when playing with strength handicap, non-determinism is achieved by setting the seed of the static RNG only once. Afterwards, there is no need to skip output values. The bench only changes because the Zobrist keys are now different (since they are random numbers straight out of the PRNG). The RNG seed has been carefully chosen so that the resulting Zobrist keys are particularly well-behaved: 1. All triplets of XORed keys are unique, implying that it would take at least 7 keys to find a 64-bit collision (test suggested by ceebo) 2. All pairs of XORed keys are unique modulo 2^32 3. The cardinality of { (key1 ^ key2) >> 48 } is as close as possible to the maximum (65536) Point 2 aims at ensuring a good distribution among the bits that determine an TT entry's cluster, likewise point 3 among the bits that form the TT entry's key16 inside a cluster. Details: Bitset card(key1^key2) ------ --------------- RKISS key16 64894 = 99.020% of theoretical maximum low18 180117 = 99.293% low32 305362 = 99.997% Xorshift64*, old seed key16 64918 = 99.057% low18 179994 = 99.225% low32 305350 = 99.993% Xorshift64*, new seed key16 65027 = 99.223% low18 181118 = 99.845% low32 305371 = 100.000% Bench: 9324905 Resolves #148
2014-12-07 17:10:57 -07:00
uint64_t rand64() {
s ^= s >> 12, s ^= s << 25, s ^= s >> 27;
return s * 2685821657736338717LL;
Simpler PRNG and faster magics search This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed by S. Vigna (2014). It is extremely simple, has a large enough period for Stockfish's needs (2^64), requires no warming-up (allowing such code to be removed), and offers slightly better randomness than MT19937. Paper: http://xorshift.di.unimi.it/ Reference source code (public domain): http://xorshift.di.unimi.it/xorshift64star.c The patch also simplifies how init_magics() searches for magics: - Old logic: seed the PRNG always with the same seed, then use optimized bit rotations to tailor the RNG sequence per rank. - New logic: seed the PRNG with an optimized seed per rank. This has two advantages: 1. Less code and less computation to perform during magics search (not ROTL). 2. More choices for random sequence tuning. The old logic only let us choose from 4096 bit rotation pairs. With the new one, we can look for the best seeds among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces the effort needed to find the magics: 64-bit SF: Old logic -> 5,783,789 rand64() calls needed to find the magics New logic -> 4,420,086 calls 32-bit SF: Old logic -> 2,175,518 calls New logic -> 1,895,955 calls In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5). Finally, when playing with strength handicap, non-determinism is achieved by setting the seed of the static RNG only once. Afterwards, there is no need to skip output values. The bench only changes because the Zobrist keys are now different (since they are random numbers straight out of the PRNG). The RNG seed has been carefully chosen so that the resulting Zobrist keys are particularly well-behaved: 1. All triplets of XORed keys are unique, implying that it would take at least 7 keys to find a 64-bit collision (test suggested by ceebo) 2. All pairs of XORed keys are unique modulo 2^32 3. The cardinality of { (key1 ^ key2) >> 48 } is as close as possible to the maximum (65536) Point 2 aims at ensuring a good distribution among the bits that determine an TT entry's cluster, likewise point 3 among the bits that form the TT entry's key16 inside a cluster. Details: Bitset card(key1^key2) ------ --------------- RKISS key16 64894 = 99.020% of theoretical maximum low18 180117 = 99.293% low32 305362 = 99.997% Xorshift64*, old seed key16 64918 = 99.057% low18 179994 = 99.225% low32 305350 = 99.993% Xorshift64*, new seed key16 65027 = 99.223% low18 181118 = 99.845% low32 305371 = 100.000% Bench: 9324905 Resolves #148
2014-12-07 17:10:57 -07:00
}
public:
PRNG(uint64_t seed) : s(seed) { assert(seed); }
Simpler PRNG and faster magics search This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed by S. Vigna (2014). It is extremely simple, has a large enough period for Stockfish's needs (2^64), requires no warming-up (allowing such code to be removed), and offers slightly better randomness than MT19937. Paper: http://xorshift.di.unimi.it/ Reference source code (public domain): http://xorshift.di.unimi.it/xorshift64star.c The patch also simplifies how init_magics() searches for magics: - Old logic: seed the PRNG always with the same seed, then use optimized bit rotations to tailor the RNG sequence per rank. - New logic: seed the PRNG with an optimized seed per rank. This has two advantages: 1. Less code and less computation to perform during magics search (not ROTL). 2. More choices for random sequence tuning. The old logic only let us choose from 4096 bit rotation pairs. With the new one, we can look for the best seeds among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces the effort needed to find the magics: 64-bit SF: Old logic -> 5,783,789 rand64() calls needed to find the magics New logic -> 4,420,086 calls 32-bit SF: Old logic -> 2,175,518 calls New logic -> 1,895,955 calls In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5). Finally, when playing with strength handicap, non-determinism is achieved by setting the seed of the static RNG only once. Afterwards, there is no need to skip output values. The bench only changes because the Zobrist keys are now different (since they are random numbers straight out of the PRNG). The RNG seed has been carefully chosen so that the resulting Zobrist keys are particularly well-behaved: 1. All triplets of XORed keys are unique, implying that it would take at least 7 keys to find a 64-bit collision (test suggested by ceebo) 2. All pairs of XORed keys are unique modulo 2^32 3. The cardinality of { (key1 ^ key2) >> 48 } is as close as possible to the maximum (65536) Point 2 aims at ensuring a good distribution among the bits that determine an TT entry's cluster, likewise point 3 among the bits that form the TT entry's key16 inside a cluster. Details: Bitset card(key1^key2) ------ --------------- RKISS key16 64894 = 99.020% of theoretical maximum low18 180117 = 99.293% low32 305362 = 99.997% Xorshift64*, old seed key16 64918 = 99.057% low18 179994 = 99.225% low32 305350 = 99.993% Xorshift64*, new seed key16 65027 = 99.223% low18 181118 = 99.845% low32 305371 = 100.000% Bench: 9324905 Resolves #148
2014-12-07 17:10:57 -07:00
template<typename T> T rand() { return T(rand64()); }
/// Special generator used to fast init magic numbers.
/// Output values only have 1/8th of their bits set on average.
template<typename T> T sparse_rand()
{ return T(rand64() & rand64() & rand64()); }
Simpler PRNG and faster magics search This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed by S. Vigna (2014). It is extremely simple, has a large enough period for Stockfish's needs (2^64), requires no warming-up (allowing such code to be removed), and offers slightly better randomness than MT19937. Paper: http://xorshift.di.unimi.it/ Reference source code (public domain): http://xorshift.di.unimi.it/xorshift64star.c The patch also simplifies how init_magics() searches for magics: - Old logic: seed the PRNG always with the same seed, then use optimized bit rotations to tailor the RNG sequence per rank. - New logic: seed the PRNG with an optimized seed per rank. This has two advantages: 1. Less code and less computation to perform during magics search (not ROTL). 2. More choices for random sequence tuning. The old logic only let us choose from 4096 bit rotation pairs. With the new one, we can look for the best seeds among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces the effort needed to find the magics: 64-bit SF: Old logic -> 5,783,789 rand64() calls needed to find the magics New logic -> 4,420,086 calls 32-bit SF: Old logic -> 2,175,518 calls New logic -> 1,895,955 calls In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5). Finally, when playing with strength handicap, non-determinism is achieved by setting the seed of the static RNG only once. Afterwards, there is no need to skip output values. The bench only changes because the Zobrist keys are now different (since they are random numbers straight out of the PRNG). The RNG seed has been carefully chosen so that the resulting Zobrist keys are particularly well-behaved: 1. All triplets of XORed keys are unique, implying that it would take at least 7 keys to find a 64-bit collision (test suggested by ceebo) 2. All pairs of XORed keys are unique modulo 2^32 3. The cardinality of { (key1 ^ key2) >> 48 } is as close as possible to the maximum (65536) Point 2 aims at ensuring a good distribution among the bits that determine an TT entry's cluster, likewise point 3 among the bits that form the TT entry's key16 inside a cluster. Details: Bitset card(key1^key2) ------ --------------- RKISS key16 64894 = 99.020% of theoretical maximum low18 180117 = 99.293% low32 305362 = 99.997% Xorshift64*, old seed key16 64918 = 99.057% low18 179994 = 99.225% low32 305350 = 99.993% Xorshift64*, new seed key16 65027 = 99.223% low18 181118 = 99.845% low32 305371 = 100.000% Bench: 9324905 Resolves #148
2014-12-07 17:10:57 -07:00
};
inline uint64_t mul_hi64(uint64_t a, uint64_t b) {
#if defined(__GNUC__) && defined(IS_64BIT)
__extension__ typedef unsigned __int128 uint128;
return ((uint128)a * (uint128)b) >> 64;
#else
uint64_t aL = (uint32_t)a, aH = a >> 32;
uint64_t bL = (uint32_t)b, bH = b >> 32;
uint64_t c1 = (aL * bL) >> 32;
uint64_t c2 = aH * bL + c1;
uint64_t c3 = aL * bH + (uint32_t)c2;
return aH * bH + (c2 >> 32) + (c3 >> 32);
#endif
}
/// Under Windows it is not possible for a process to run on more than one
/// logical processor group. This usually means to be limited to use max 64
/// cores. To overcome this, some special platform specific API should be
/// called to set group affinity for each thread. Original code from Texel by
/// Peter Österlund.
namespace WinProcGroup {
void bindThisThread(size_t idx);
}
namespace CommandLine {
void init(int argc, char* argv[]);
extern std::string binaryDirectory; // path of the executable directory
extern std::string workingDirectory; // path of the working directory
}
#endif // #ifndef MISC_H_INCLUDED