Rewrite the code in SF style, simplify and
document it.
Code is now much clear and bug free (no mem-leaks and
other small issues) and is also smaller (more than
600 lines of code removed).
All the code has been rewritten but root_probe() and
root_probe_wdl() that are completely misplaced and should
be retired altogheter. For now just leave them in the
original version.
Code is fully and deeply tested for equivalency both in
functionality and in speed with hundreds of games and
test positions and is guaranteed to be 100% equivalent
to the original.
Tested with tb_dbg branch for functional equivalency on
more than 12M positions.
stockfish.exe bench 128 1 16 syzygy.epd
Position: 2016/2016
Total 12121156 Hits 0 hit rate (%) 0
Total time (ms) : 4417851
Nodes searched : 1100151204
Nodes/second : 249024
Tested with 5,000 games match against master, 1 Thread,
128 MB Hash each, tc 40+0.4, which is almost equivalent
to LTC in Fishtest on this machine. 3-, 4- and 5-men syzygy
bases on SSD, 12-moves opening book to emphasize mid- and endgame.
Score of SF-SyzygyC++ vs SF-Master: 633 - 617 - 3750 [0.502] 5000
ELO difference: 1
No functional change.
Stephane's patch removes the only usage of Position::see, where the
returned value isn't immediately compared with a value. So I replaced
this function by its optimised and more specific version see_ge. This
function also supersedes the function Position::see_sign.
bool Position::see_ge(Move m, Value v) const;
This function tests if the SEE of a move is greater or equal than a
given value. We use forward iteration on captures instread of backward
one, therefore we don't need the swapList array. Also we stop as soon
as we have enough information to obtain the result, avoiding unnecessary
calls to the min_attacker function.
Speed tests (Windows 7), 20 runs for each engine:
Test engine: mean 866648, st. dev. 5964
Base engine: mean 846751, st. dev. 22846
Speedup: 1.023
Speed test by Stephane Nicolet
Fishtest STC test:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26040 W: 4675 L: 4442 D: 16923
http://tests.stockfishchess.org/tests/view/57f648990ebc59038170fa03
No functional change.
Use the following transformations:
- to check that A is included in B, testing "(A & ~B) == 0" is faster
than "(A & B) == A"
- to remove the intersection of A and B from A, doing "A &= ~B;" is as
fast as "if (A & B) A &= ~B;" but is simpler.
Overall, the simpler patch version is 0.3% than current master.
No functional change.
Correct pinners calculation and fix bug with pinned
pieces giving check. With this patch 'pinners' only
returns sliders with exactly one defensive piece between
the slider and the attacked square (in other words, pinners
returns exact pinners).
This was a co-operation between Marco Costalba,
Stéphane Nicolet and me.
Special thanks to Ronald de Man for reporting the bug with
pinned pieces giving check, discussed here:
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/S_4E_Xs5HaE
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 132118 W: 23578 L: 23645 D: 84895
LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 36424 W: 4770 L: 4670 D: 26984
bench: 6272231
This is the most compact and neatest version
is was able to produce.
On normal builds I have a small slowdown:
normal builds base vs. simplification (gcc 4.8.1 Win7-64 i7-3770 @ 3.4GHz x86-64-modern)
Results for 20 tests for each version:
Base Test Diff
Mean 1974744 1969333 5411
StDev 11825 10281 5874
p-value: 0,178
speedup: -0,003
On pgo-builds however I measure a nice 1.1% speedup
pgo-builds base vs. simplification
Results for 20 tests for each version:
Base Test Diff
Mean 1974119 1995444 -21325
StDev 8703 5717 4623
p-value: 1
speedup: 0,011
No functional change.
Don't allow pinned pieces to attack the exchange-square as long all
pinners (this includes also potential ones) are on their original
square.
As soon a pinner moves to the exchange-square or get captured on it, we
fall back to standard SEE behaviour.
This correctly handles the majority of cases with absolute pins.
bench: 6883133
This greately simplifies usage because hides to the
search the implementation specific CheckInfo.
This is based on the work done by Marco in pull request #716,
implementing on top of it the ideas in the discussion: caching
the calls to slider_blockers() in the CheckInfo structure,
and simplifying the slider_blockers() function by removing its
first parameter.
Compared to master, bench is identical but the number of calls
to slider_blockers() during bench goes down from 22461515 to 18853422,
hopefully being a little bit faster overall.
archlinux, gcc-6
make profile-build ARCH=x86-64-bmi2
50 runs each
bench:
base = 2356320 +/- 981
test = 2403811 +/- 981
diff = 47490 +/- 1828
speedup = 0.0202
P(speedup > 0) = 1.0000
perft 6:
base = 175498484 +/- 429925
test = 183997959 +/- 429925
diff = 8499474 +/- 469401
speedup = 0.0484
P(speedup > 0) = 1.0000
perft 7 (but only 10 runs):
base = 185403228 +/- 468705
test = 188777591 +/- 468705
diff = 3374363 +/- 476687
speedup = 0.0182
P(speedup > 0) = 1.0000
$ ./pyshbench ../Stockfish/master ../Stockfish/test 20
run base test diff
...
base = 2501728 +/- 182034
test = 2532997 +/- 182034
diff = 31268 +/- 5116
speedup = 0.0125
P(speedup > 0) = 1.0000
No functional change.
Seems to give around 1% speed-up for CPUs with popcnt support.
Seems to give a very minor speed-up for CPUs without popcnt.
No functional change
Resolves#646
And passed in do_move(), this ensures maximum efficiency and
speed and at the same time unlimited move numbers.
The draw back is that to handle Position init we need to
reserve a StateInfo inside Position itself and use at
init time and when copying from another Position.
After lazy SMP we don't need anymore this gimmick and we can
get rid of this special case and always pass an external
StateInfo to Position object.
Also rewritten and simplified Position constructors.
Verified it does not regress with a 3 threads SMP test:
ELO: -0.00 +-12.7 (95%) LOS: 50.0%
Total: 1000 W: 173 L: 173 D: 654
No functional change.
Also a speedup(about 1%) on 64-bit w/o hardware popcnt
Retire Max15 and Full template parameters
(Contributed by Marco Costalba)
Now that we have just SW and HW versions, use
template default parameter to get rid of explicit
template parameters.
Retire bitcount.h and move the only defined
function to bitboard.h
No functional change
Resolves#620
Use Position::square and Position::squares instead.
This allow us to remove king_square(), simplify
endgames and to have more naming uniformity.
Moreover, this is a prerequisite step in case
in the future we decide to retire piece lists
altoghter and use pop_lsb() to loop across
pieces and serialize the moves. In this way
we just need to change definition of Position::square
to something like:
template<PieceType Pt> inline
Square Position::square(Color c) const {
return lsb(byColorBB[c]);
}
No functional change.
In Chess960 we can have legal positions with
opponent rook in A or H file and with castling
available, for instance:
4k3/pppppppp/8/8/8/8/PPPPPPPP/rR2K3 w Q - 0 1
In those cases we pick up the wrong rook when
setting castling.
Fix it by checking the color of the rook.
Bug reported by Matthew Lai.
No functional change.
Easier for tuning psq tables:
TUNE(myParameters, PSQT::init);
Also move PSQT code in a new *.cpp file, and retire the
old and hacky psqtab.h that required to be included only
once to work correctly, this is not idiomatic for a header
file.
Give wide visibility to psq tables (previously visible only
in position.cpp), this will easy the use of psq tables outside
Position, for instance in move ordering.
Finally trivial code style fixes of the latest patches.
Original patch of Lucas Braesch.
No functional change.
Currently Zobrist::castling[] are not properly zeroed
and rely on the compiler to do this at startup, but this
makes Position::init() to set different values every time
it is called!
This is a bit odd, and although not impacting normal usage,
can yield to subtle misbehaviour, very difficult to track
down, in case we happen to call it more than once for some
reason. I found this while developing tuning support and
it took me a while to track it down.
So properly init Zobrist::castling[]
No functional change.
Resolves#329
This micro-optimization only complicates the code and provides no benefit.
Removing it is even a speedup on my machine (i7-3770k, linux, gcc 4.9.1):
stat test master diff
mean 2,403,118 2,390,904 12,214
stdev 12,043 10,620 3,677
speedup 0.51%
P(speedup>0) 100.0%
No functional change.
Also remove useless StateCopySize64 optimization:
compiler uses SSE movups instruction anyhow and
does not need this trick (verified with fishbench).
No functional change.
Funny enough, gcc __builtin_prefetch() expects
already a void*, instead Windows's _mm_prefetch()
requires a char*.
The patch allows to remove ugly casts from caller
sites.
No functional change.
Results for 10 tests for each version (gcc 4.8.3 on mingw):
Base Test Diff
Mean 1502447 1507917 -5470
StDev 3119 1364 4153
p-value: 0,906
speedup: 0,004
Results for 10 tests for each version (MSVC 2013):
Base Test Diff
Mean 1400899 1403713 -2814
StDev 1273 2804 2700
p-value: 0,851
speedup: 0,002
No functional change.
It is up to material (and pawn) table look up
code to know where the per-thread tables are,
so change API to reflect this.
Also some comment fixing while there
No functional change.
This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed
by S. Vigna (2014). It is extremely simple, has a large enough period for
Stockfish's needs (2^64), requires no warming-up (allowing such code to be
removed), and offers slightly better randomness than MT19937.
Paper: http://xorshift.di.unimi.it/
Reference source code (public domain):
http://xorshift.di.unimi.it/xorshift64star.c
The patch also simplifies how init_magics() searches for magics:
- Old logic: seed the PRNG always with the same seed,
then use optimized bit rotations to tailor the RNG sequence per rank.
- New logic: seed the PRNG with an optimized seed per rank.
This has two advantages:
1. Less code and less computation to perform during magics search (not ROTL).
2. More choices for random sequence tuning. The old logic only let us choose
from 4096 bit rotation pairs. With the new one, we can look for the best seeds
among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces
the effort needed to find the magics:
64-bit SF:
Old logic -> 5,783,789 rand64() calls needed to find the magics
New logic -> 4,420,086 calls
32-bit SF:
Old logic -> 2,175,518 calls
New logic -> 1,895,955 calls
In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5).
Finally, when playing with strength handicap, non-determinism is achieved
by setting the seed of the static RNG only once. Afterwards, there is no need
to skip output values.
The bench only changes because the Zobrist keys are now different (since they
are random numbers straight out of the PRNG).
The RNG seed has been carefully chosen so that the
resulting Zobrist keys are particularly well-behaved:
1. All triplets of XORed keys are unique, implying that it
would take at least 7 keys to find a 64-bit collision
(test suggested by ceebo)
2. All pairs of XORed keys are unique modulo 2^32
3. The cardinality of { (key1 ^ key2) >> 48 } is as close
as possible to the maximum (65536)
Point 2 aims at ensuring a good distribution among the bits
that determine an TT entry's cluster, likewise point 3
among the bits that form the TT entry's key16 inside a
cluster.
Details:
Bitset card(key1^key2)
------ ---------------
RKISS
key16 64894 = 99.020% of theoretical maximum
low18 180117 = 99.293%
low32 305362 = 99.997%
Xorshift64*, old seed
key16 64918 = 99.057%
low18 179994 = 99.225%
low32 305350 = 99.993%
Xorshift64*, new seed
key16 65027 = 99.223%
low18 181118 = 99.845%
low32 305371 = 100.000%
Bench: 9324905
Resolves#148
This is a regression from 428962a
We have to cast to char here, otherwise the compiler
interprets it as an integer, and writes a number.
No functional change
Resolves#122
Objects that are only accessible at file-scope should be put in the anonymous namespace.
This is what the C++ standard recommends, rather than using static, which is really C-style and results in static linkage.
Stockfish already does this throughout the code. So let's weed out the few exceptions,
because... they have no reason to be exceptional.
No functional change.
Resolves#84
It is more idiomatic, we didn't used it
in the past because Position::pretty(Move)
had a calling argument, but now we can.
As an added benefit, we avoid a lot of string
copies in the process because now we avoid
std::ostringstream ss.
No functional change.
Where they better belong.
Also, this removes '#include <string>' from types.h, which reduces the amount of code to compile (every
translation unit includes types.h).
No functional change.
The eval already returns zero in KK, KBK, KNK (see material.cpp). The difference is:
- we lose the "TB pruning" benefit of the draw rule (ie. search goes on even if eval is zero)
- we gain some speed by removing a useless test from the hot path
STC:
LLR: 0.05 (-2.94,2.94) [-3.00,1.00]
Total: 128000 W: 21357 L: 21560 D: 85083
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 33023 W: 4613 L: 4509 D: 23901
bench 7461881
First, remove some dead code (function never called with a Move argument).
Then, remove printing of legal moves, which does not belong here. Let's keep commands orthogonal and minimal:
- the "d" command should display the board, nothing more, or less.
- "perft 1" will display the list of legal moves.
No functional change.
This apparentely silly tweak allows
to speed up the bench by almost 3%.
Not clear why, repeating with perft,
the speed up vanishes.
Suggested by Jonathan Calovski.
No functional change.
Big simplification of pawn move check.
Code has been tested with a brute force approach: for
every position reached during a bench search, the function
has been called for each combinations of Move(from, to)
and verified the result is the same of old code.
Actually this function is very critical becuase is the
one that ensures corrupted TT moves are discarded, so
to properly test it a simple bench is not enough.
Verified also speed is not changed.
No functional chnage.
Currently king has no material key associated because
it can never happen to find a legal position without
both kings, so there is no need to keep track of it.
The consequence is that a position with only the two
kings has material key set at zero and if the material
hash table is empty any entry will match and this is
wrong.
Normally bug is hidden becuase the checking for a draw
with pos.is_draw() is done earlier than evaluate() call,
so that we never check in gameplay the material key of a
position with two kings.
Nevertheless the bug is there and can be reproduced setting
at startup a position with only two kings and typing
'eval' from prompt.
The fix is very simple: add a random key also for the king.
Also fixed the condition in material.cpp to avoid asserting
when a 'just 2 kings' postion is evaluated.
No functional change.
Another SEE speed up that passed the SPRT short TC test!
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 81337 W: 15060 L: 14745 D: 51532
No functional change.
Verified there are no hidden bugs and is
actually a speed optimization:
Fixed games at 15+0.05 TC
ELO: 1.72 +-2.9 (95%) LOS: 87.5%
Total: 20000 W: 3741 L: 3642 D: 12617
No functional change
To align to same named Position function and
avoid using std::cout directly.
Also remove some stale <iostream> include while
there.
No functional change.
Don't need a struct here. Speed test shows
result is teh same. Moreover RKISS is used
mainly at startup to compute magics, so
prefer to keep it simple...RKISS ;-)
Also some assorted triviality while there.
No functional change.
Now that we use pre-increment on enums, it
make sense, for code style uniformity, to
swith to pre-increment also for native types,
although there is no speed difference.
No functional change.
ENABLE_OPERATORS_ON has incorrect definitions of
post-increment and post-decrement operators.
In particularly the returned value is the variable
already incremented/decremented, while instead they
should return the variable _before_ inc/dec.
This has no real effect because are only used in loops
and where the returned value is never used, neverthless
it is wrong. The fix would be to copy the variable to a
dummy, then inc/dec the variable, then return the dummy.
So instead, rename to pre-increment that can be implemented
without the dummy, actually the current implementation
it is already the correct pre-increment, with the only change
to return a reference (an l-value) and not a copy, so
to properly mimic the pre-increment on native integers.
Spotted by Kojirion.
No functional change.
Not a real functional change, but bench changed due to different piecelist
reordering. To verify it a temporary my canonicalize_rooks function was
written as follows. It just ensures that the rook on the "smaller" square
is listed first.
void Position::canonicalize_rooks(Color c)
{
if (pieceCount[c][ROOK] == 2)
{
Square s0 = pieceList[c][ROOK][0];
Square s1 = pieceList[c][ROOK][1];
if (s0 > s1)
{
pieceList[c][ROOK][0] = s1;
pieceList[c][ROOK][1] = s0;
index[s0] = 1;
index[s1] = 0;
}
}
}
With this both bench and the test on Chess960 positions
./stockfish bench 128 1 8 Chess960.epd file > /dev/null
Gives same result.
bench: 4424151
The new Position methods add_piece, move_piece, and remove_piece
now manage the member variables pieceList, pieceCount, and index,
and 9 blocks of code in Position that used to manipulate those
data structures by hand now call the new methods.
There is a slightly slowdown (< 1%) on Clang and on perft,
but the cleanup compensates the little speed loss.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This very speed critical code was full of clever (!)
tricks and subtle details.
So I have rewritten it in a more straithforward way
and, as very often happens, result is even faster
than original.
No functional change.
This reverts commit 3e95800814
For some reason it fails the short TC test:
LLR: -2.96 (-2.94,2.94)
Total: 20033 W: 4214 L: 4265 D: 11554
bench: 4769737
It is somewhat unbilievable but our SEE is broken !
If the first SEE move is a king capture and square is
defended then SEE continues instead of breaking.
The bug shows only on normal SEE, not see_sign() so
probing with a:
dbg_hit_on_c(slIndex==1, captured == KING);
reports just a tiny:
Total 3465656 Hits 6646 hit rate (%) 0
Bug was there since Retire seeValues[] and move PieceValue[] out of Position of 26/6/2011 (!)
although for some reason didn't show immediately, indeed the
bougous patch was a "No functional change" (!!)
bench: 4699504
It is somewhat unbilievable but our SEE is broken !
If the first SEE move is a king capture and square is
defended then SEE continues instead of breaking.
The bug shows only on normal SEE, not see_sign() so
probing with a:
dbg_hit_on_c(slIndex==1, captured == KING);
reports just a tiny:
Total 3465656 Hits 6646 hit rate (%) 0
Bug was there since 351ef5c85b of 26/6/2011 (!)
although for some reason didn't show immediately, indeed the
bougous patch was a "No functional change" (!!)
bench: 4793754
Without patch we have 333198 nps, with patch 334249.
A very small +0.3%, not a lot manily becuase this is a
side path that is taken very few times.
Anyhow idea is correct becuase first 'quick' condition
has an hit rate of about 95%.
No functional change.
Add MOVE_NONE at the tail, this allows to loop
across MoveList checking for *it != MOVE_NONE,
and because *it is used imediately after compiler
is able to reuse it.
With this small patch perft speed increased of 3%
And it is also a semplification !
No functional change.
Use an optinal argument instead of a template
parameter. Interestingly, not only is simpler,
but also faster, perhaps due to less L1 instruction
cache pressure because we don't duplicate the very
used SEE code path.
No functional change.
Rename startPosPly to gamePly and increment/decrement
the variable after each do/undo move. This adds a little
overhead in do_move() but we will need to have the
game ply during the search for the next patches now
under test.
Currently we don't increment gamePly in do_null_move()
becuase it is not needed at the moment. Could change
in the future.
As a nice side effect we can now remove an hack in
startpos_ply_counter().
No functional change.