Clang 3.5 issues warning on constructs like: abs(f1 - f2). The thing is that
f1 and f2 are enum types, and the range given (all positive) allows the
compiler to choose an unsigned type (efficiency being one reason to prefer
unsigned arithmetic). If f1 < f2 are unsigned, then f1 - f2 wraps around zero
and the abs() becomes a no-op. It's the reinterpretation of the unsigned
result (large value) as a signed int that happens to give the correct result,
thanks to 2's complement. This is all tricky and dangerous!
In the spirit of the standard, we assume nothing on the signedness of enums,
and simply calculate the rank and file distances as:
- rank_dist(r1, r2) = r1 < r2 ? r2 - r1 : r1 - r2
- file_dist(f1, f2) = f1 < f2 ? f2 - f1 : f1 - f2
this logic can in fact be applied to any enum we may use, so for better
generality and to avoid code duplication, we use a template function diff()
here.
No functional change.
Resolves#95
This area has become obscure and tricky over the course of incremental
changes that did not respect the original's consistency and clarity. Now,
it's not clear why we use MAX_PLY = 120, or why we use MAX_PLY+6, among
other things.
This patch does the following:
* ID loop: depth ranges from 1 to MAX_PLY-1, and due to TT constraint (depth
must fit into an int8_t), MAX_PLY should be 128.
* stack[]: plies now range from 0 to MAX_PLY-1, hence stack[MAX_PLY+4],
because of the extra 2+2 padding elements (for ss-2 and ss+2). Document this
better, while we're at it.
* Enforce 0 <= ply < MAX_PLY:
- stop condition is now ss->ply >= MAX_PLY and not ss->ply > MAX_PLY.
- assert(ss->ply < MAX_PLY), before using ss+1 and ss+2.
- as a result, we don't need the artificial MAX_PLY+6 range. Instead we
can use MAX_PLY+4 and it's clear why (for ss-2 and ss+2).
* fix: extract_pv_from_tt() and insert_pv_in_tt() had no reason to use
MAX_PLY_PLUS_6, because the array is indexed by plies, so the range of
available plies applies (0..MAX_PLY before, and now 0..MAX_PLY-1).
Tested with debug compile, using MAX_PLY=16, and running bench at depth 17,
using 1 and 7 threads. No assert() fired. Feel free to submit to more severe
crash-tests, if you can think of any.
No functional change.
Now that half-plies are no more used we can simplify
the code assuming that ONE_PLY is 1 and no more 2.
Verified with a SMP test:
LLR: 2.95 (-2.94,2.94) [-4.50,0.00]
Total: 8926 W: 1712 L: 1607 D: 5607
No functional change.
Where they better belong.
Also, this removes '#include <string>' from types.h, which reduces the amount of code to compile (every
translation unit includes types.h).
No functional change.
Are broken for big-endian case and
I have verified with MSVC 2013 Premium
bench is correct and there is no
miscompilation, so the main reason
to change the original code drops.
No functional change.
Book handling belongs to GUI, we kept this code
for historical reasons, but nowdays there is
really no need of this old, (mostly) unused
and especially incorrect designed functionality.
It is up to the GUI to choose the book (far easier for
the user) and to select the book parameters. In no
place, including fishtest, TCEC, rating lists, etc.
the "own book" is used, moreover currently SF is
released without any book and even if in the future we
bundle a book in the release package, it will be the GUI
that will take care of it.
This corrects a wrong design decision that Galurung
and later Stockfish inherited from what was common
practice many yeas ago.
No functional change.
Remove the optimization for Intel, is not
standard and can break at any time, moreover
our release build is not done with Intel C++
anymore so we don't need to sqeeze the extra
speed out from this compiler.
No functional change.
Intel Haswell and newer CPUs can calculate sliders
attacks using special PEXT asm instructions instead
of magic bitboards. This gives a +3% speed up.
To enable it just compile with ARCH=x86-64-bmi2
No functional change.
Retire software pext and introduce hardware
call when USE_PEXT is defined during compilation.
This is a full complete implementation of sliding
attacks using PEXT.
No functional change.
In some legal positions like this one:
R6R/3Q4/1Q4Q1/4Q3/2Q4Q/Q4Q2/Np1Q4/kB1N1KB1 b -- 0 1
We can have a very high score, in this case 30177 and 29267
for midgame and endgame respectively, and because
VALUE_INFINITE = 30001 we have an assert in interpolate()
Midgame and endgame scores are stored in 16 bit signed integers
so we can rise VALUE_INFINITE a little bit. This does not fix
the possibility of overflow in general case, just makes the
condition more difficult to trigger and anyhow better uses all
the score width.
Raising VALUE_INFINITE to 32000 seems to fix the problem for this
particular case.
No functional change.
Under some very rare case 100 plies of search
could be not enough. Increasing more could lead
to crashes due to reached stack size limit on
some platforms.
Strongly requested by Uri.
bench: 8430785
With some positions like
8/8/8/2p2K2/1pp5/br1p1b2/2p2r2/qqkqq3 w - -
The eval score is higher than VALUE_INFINITE because
is the sum of VALUE_KNOWN_WIN plus a big material
advantage. This leads to an assert. Here are the
steps to reproduce:
Compile SF with debug=yes then do
./stockfish
position fen 8/8/8/2p2K2/1pp5/br1p1b2/2p2r2/qqkqq3 w - -
go depth 1
This patch fixes the issue in this case, but do exsist
other positions for which the patch is not enough and
we will need to limit the eval score to be sure not
overflow the limit.
Note that is not possible to increase the value of
VALUE_INFINITE because should remain within int16_t
type to be stored in a TT entry.
bench: 7356053
An old idea retested at SPRT(0, 3) with 60+0.05 TC:
LLR: 2.95 (-2.94,2.94) [0.00,3.00]
Total: 98872 W: 15549 L: 15123 D: 68200
This is a very small elo increase patch so it really
stresses the limits of fishtest.
bench: 8596156
Don't need a struct here. Speed test shows
result is teh same. Moreover RKISS is used
mainly at startup to compute magics, so
prefer to keep it simple...RKISS ;-)
Also some assorted triviality while there.
No functional change.
This should be enough for any legal position, even
the handcrafted ones, like the one presented by Reuven:
1Q5R/4Q1K1/B1Q5/B4Q2/N2Q4/pQ4Q1/pn2Q3/krQ4R w - -
Where currently we crash. This reverts the patch
0049d3f337 of 8/4/2012 where stack
was shrinked due to crashes while in deep analysys.
No functional change.
ENABLE_OPERATORS_ON has incorrect definitions of
post-increment and post-decrement operators.
In particularly the returned value is the variable
already incremented/decremented, while instead they
should return the variable _before_ inc/dec.
This has no real effect because are only used in loops
and where the returned value is never used, neverthless
it is wrong. The fix would be to copy the variable to a
dummy, then inc/dec the variable, then return the dummy.
So instead, rename to pre-increment that can be implemented
without the dummy, actually the current implementation
it is already the correct pre-increment, with the only change
to return a reference (an l-value) and not a copy, so
to properly mimic the pre-increment on native integers.
Spotted by Kojirion.
No functional change.
And #ifdef instead of #if defined
This is more standard form (see for example iostream file).
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Partially revert previous patch and use
unlikey() just as code annotation.
Actually it is better to rely on a profiler for branch prediction:
http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html
"In fact, even when only one in ten thousand values is nonzero,
we're still at only roughly the break-even point"
No functional change,
This patch is the sum of:
- Grainsize of 4 instead of 8
- Removing "depth < DEPTH_ZERO"
- Change DEPTH_QS_RECAPTURES = -5 to -7
All the patches individually failed to pass SPRT but scored
around 50%.
Together they pass easily short TC:
LLR: 2.96 (-2.94,2.94)
Total: 4429 W: 964 L: 844 D: 2621
And with some difficult long TC of 60+0.05:
LLR: 2.95 (-2.94,2.94)
Total: 64133 W: 11968 L: 11532 D: 40633
bench: 4821467
But this time do not play with pointers, in
particular do not assume that size_t is an
unsigned type of the same width as pointers.
This code should be fully portable.
No functional change.
Let TT clusters (16*4=64 bytes) to hold on a singe cache line.
This avoids the need for the double prefetch.
Original patches by Lucas and Jean-Francois that has also tested
on his AMD FX:
BIG HASHTABLE
./stockfish bench 1024 1 18 > /dev/null
Before:
1437642 nps
1426519 nps
1438493 nps
After:
1474482 nps
1476375 nps
1475877 nps
SMALL HASHTABLE
./stockfish bench 128 1 18 > /dev/null
Before:
1435207 nps
1435586 nps
1433741 nps
After:
1479143 nps
1471042 nps
1472286 nps
No functional change.
Rename to insertion_sort so to avoid confusion
with std::sort, also move it to movepicker.cpp
and use the bit slower std::stable_sort in
search.cpp where it is used in not performance
critical paths.
No functional change.
Test by Joona prooves the new feature don't value 70 added lines of code.
Grand totals after 10040 games (crashes: 0) for tt_both
master_9edc7 - 6a93488_6a934: 1756 - 1688 - 6596 ELO +2 (+- 2.7)
Confirmed by test of Gary:
After 8680 games:
ELO: 0.80 +- 99%: 9.62 95%: 7.31
LOS: 65.38%
Wins: 1288 Losses: 1268 Draws: 6130
Thanks a lot to both for testing it !!!
bench 5149248
This is more complex than what I'd like but I
was unable to split in small chunks.
Here we add 2 slots to TTEntry (valueUpper and depthUpper)
so that sizeof(TTEntry) returns to the original 16 bytes
and we can pack exactly 4 entries in a 64 bytes cache line.
Now we save an upper bound score alongside a lower (exact)
score. The idea is to increase TT cut-offs rates becuase
there is now an higher probability for a node to use TT info.
This patch is highly experimental and probably needs further
steps as is hinted by an unrealistic bench number:
bench: 2022385
In almost all cases we already know in advance that
color_of() argument is different from NO_PIECE.
So avoid the check for NO_PIECE in color_of() and
test at caller site in the very few places where
this case could occur.
As a nice side effect, this patch fixes a (bogus)
warning under some versions of gcc.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Both Lucas re-test and Jean-Francois confirrm it
is a regression.
Here Jean-Francois's results after 3600 games :
Score of 96d3b1c92b vs 3b87314: 690 - 729 - 2181 [0.495] 3600
ELO: -3.86 +- 99%: 14.94 95%: 11.35
LOS: 15.03%
Wins: 690 Losses: 729 Draws: 2181 Total: 3600
Bench: 5404066
Don't prune and eventually extend check moves of type
DISCO_CHECK (pun intended, Lucas will understand :-) ).
Patch from Lucas Braesch that has also tested it:
Result: 879-661-2137, score=52.96%, LOS=100.00% (time 10"+0.1")
I have started a verification test right now.
bench: 6004966
We send to GUI multi-pv info after each cycle,
not just once at the end of the PV loop. This is
because at high depths a single root search can
be very slow and we want to update the gui as
soon as we have a new PV score.
Idea is good but implementation is broken because
sort() takes as arguments a pointer to the first
element and one past the last element.
So fix the bug and rename sort arguments to better
reflect their meaning.
Another hit by Hongzhi Cheng. Impressive!
No functional change.
When using the Makefile (as for the mingw case),
IS_64BIT and USE_BSFQ are already set with
ARCH=x86-64 and do not need to be redefined.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Are used by Position but do not belong to that class,
there is only one instance of them (that's why were
defined as static), so move to a proper namespace instead.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Calculate the attacks only for the piece to disambiguate,
not for all.
Also some reformatting while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Error is due to ambiguous overloading of operator^
because we have both the built-in operator^(int, int)
and the user defined operator^(Bitboard, Square).
This error does not trigger when using Makefile becuase,
due to luck, the user defined operator^(Bitboard, Square)
happens to be always defined _after_ opposite_colors() so
that compiler does not claim. But in case of Microsoft MSVC
we don't have a Makefile and the order of files compilation
is chosen by the compiler (in an unpredictable way). So it
could still happen that error is not detected (as in my case),
but in another case the order of compilation of the files could
be so that at some point both operator^ were defined before
opposite_colors() and this triggers the error.
The fix is much simpler than the explanation :-)
Reported by Quocvuong82.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
To be aligned with PolyGlot book castle right definitions.
This will be used by next patch.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It seems ADL lookup is broken with the STLPort library. Peter says:
The compiler is gcc 4.4.3, but I don't know how many patches they
have applied to it. I think gcc has had support for Koenig lookup
a long time. I think the problem is the type of the vector iterator.
For example, line 272 in search.cpp:
if (bookMove && count(RootMoves.begin(), RootMoves.end(), bookMove))
gives the error:
jni/stockfish/search.cpp:272: error: 'count' was not declared in this scope
Here RootMoves is:
std::vector<RootMove> RootMoves;
If std::vector<T>::iterator is implemented as T*, then Koenig lookup
would fail because RootMove* is not in namespace std.
I compile with the stlport implementation of STL, which in its vector
class has:
typedef value_type* iterator;
I'm not sure if this is allowed by the C++ standard. I did not find
anything that says the iterator type must belong to namespace std.
The consensus in this thread
http://compgroups.net/comp.lang.c++.moderated/argument-dependent-lookup/433395
is that the stlport iterator type is allowed.
Report and patch by Peter Osterlund.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Hex representation doesn't add any value in those cases.
Preserve hex representation where more self-documenting
for instance for binary masks values.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In Young Brothers Wait Concept (YBWC) available slaves are
booked by the split point master, then start to search below
the assigned split point and, once finished, return in idle
state waiting to be booked by another master.
This patch introduces "Active Reparenting" so that when a
slave finishes its job on the assigned split point, instead
of passively waiting to be booked, searches a suitable active
split point and reprents itselfs to that split point. Then
immediately starts to search below the split point in exactly
the same way of the others split point's slaves. This reduces
to zero the time waiting in idle loop and should increase
scalability especially whit many (8 or more) cores.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Shrinking from [16] to [2][2] is able to speedup
perft of start position of almost 5% !
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Shrink dimensions of the biggest stack consumers arrays.
In particular movesSearched[] can be safely shrinked
without any impact on strenght or risk of crashing.
Also MAX_PLY can be reverted to 100 with almost no impact
so to limit search recursion and hence stack allocation.
A different case is for MAX_MOVES (used by Movepicker's
moves[]), because we know that do exsist some artificial
position with about 220 legal moves, so in those cases SF
will crash. Anyhow these cases are never found in games.
An open risk remains perft, especially run above handcrafted
positions.
This patch originates from a report by Daylen that found
SF crashing on his Mac OS X 10.7.3 while in deep analysys
on the following position:
8/3Q1pk1/5p2/4r3/5K2/8/8/8 w - - 0 1
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Unfortunatly accessing thread local variable
is much slower than object data (see previous
patch log msg), so we have to revert to old code
to avoid speed regression.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Much faster then pthread_getspecific() but still a
speed regression against the original code.
Following are the nps on a bench:
Position
454165
454838
455433
tls
441046
442767
442767
ms (Win)
450521
447510
451105
ms (pthread)
422115
422115
424276
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
As it was in Glaurung times. Also rearranged order
so that byTypeBB[0] is accessed before byTypeBB[x]
to be more cache friendly. It seems there is even
a small speedup.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Result of t.time * 1000 should be a 64 bit value, not an int.
Bug reported by several users.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Introduce and use a new Time class designed after
QTime, from Qt framework. Should be a more clear and
self documented code.
As an added benefit we now use 64 bits internally to get
millisecs from system time. This avoids to wrap around
to 0 every 2^32 milliseconds, which is 49.71 days.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
There is no need to limit the maximum ply searched to
100, with deep exclusion search extensions we could
reach it even with much smaller search depths.
The only drawback is an increase in stack usage, but
is limited mainly to id_loop(), in particular the
recursive search() functions are not affected.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This allows to retire ClearMaskBB[] and use just
one SquareBB[] array to set and clear a bit.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And make CRITICAL_SECTION locks the only option for Windows.
This guarantees backward compatibility with all the Windows
versions (even XP and older) and an hassle free experience
when compiling for Windows. Tests performed by Ingo and
reported on talkchess confirm there is no speed penalty
against the most modern SRW locks:
http://www.talkchess.com/forum/viewtopic.php?t=41835&start=20
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The aim is to have shorter names without losing
readibility but, if possible, increasing it.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It was meant to build a single binary optimized
for any kind of CPU: with and without hardware POPCNT.
This is a nice idea but in practice was never used, or
people builds binary with popcnt enabled or not, mainly
according to their type of CPU. And it was also never
used in the official Jim's builds where, in case, would
be easier for a number of reasons, do build two different
versions: with and without SEE42 support.
So retire this feature and simplify the code.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is simple but somewhat tricky code that deserves
a bit of documentation. A bit of renaming while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Be consistent with the way these operators are defined
in plain C (and in C++).
Spotted by Lucas Braesch.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Triggered by a comment of Eelco on talkchess. Also
a bit of cleanup while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
They don't add any value given that the corresponding
table has global visibility anyhow.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This avoids search explosion in qsearch for some
patological cases like:
r1n1n1b1/1P1P1P1P/1N1N1N2/2RnQrRq/2pKp3/3BNQbQ/k7/4Bq2 w - - 0 1
After 9078 games 20"+0.1 QUAD:
Mod vs Orig 1413 - 1319 - 6346 ELO +3 (+- 4)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The only interesting thing is that a backward or isolated
pawn cannot be a candidate passer, so code this condition.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Warning C4146: unary minus operator applied to
unsigned type, result still unsigned.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
MSVC (and possibly other compilers) does not inline
as requested, so force it to do so.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
There was a strange "int16" type and "int64_t"
was defined twice.
Spotted by Joona.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And set them as default.
Introduce compile switch OLD_LOCKS to allow to fallback on
compatible locks supported by Windows XP and older versions.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is a redundant boiler plate, just call initialization and
resource release directly from main()
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Plus some other icc warnings popped up with new and strictier
compile options.
No functional and speed change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
To avoid nasty bugs due to silently overriding of
common operator we enable the templates on a type
by type base using partial template specialization.
No functional change, zero overhead at runtime.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>