
3940 Commits (1e76ba7cec535e431a8d7486bfd41b3efbfad68e)

Author SHA1 Message Date
VoyagerOne e27d3bb884 Use explicit logic for pruning
Also a speedup since we don't need to recalculate SEE
for extensions, as it is already known to be positive.

Results for 12 tests for each version:

        Base      Test      Diff
Mean    2132395   2191002   -58607
StDev   128058    85917     134239
p-value: 0.669
speedup: 0.027
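
The speedup line appears to be the relative change in mean bench speed. Reading the Diff column as Base minus Test, which matches the numbers above, a worked check:

```latex
\text{speedup} = \frac{\text{Test} - \text{Base}}{\text{Base}} = \frac{-\text{Diff}}{\text{Base}} = \frac{58607}{2132395} \approx 0.027
```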

Non functional change.
2016-10-18 08:53:51 +02:00
Jacques 16e1881126 Fixes for ARM compilation: take 2
The target:

Odroid U3 (http://www.hardkernel.com/main/products/prdt_info.php?g_code=g138745696275)
Debian Jessie
As listed in #550 and #638 three modifications are needed for compilation to work:

float-abi flag for GCC
If an FPU is present and supported by the installed OS, then the value passed needs to be hard.
I didn't find any better solution than using readelf to check for the availability of Tag_ABI_VFP_args, which should indicate support for the FPU. The check is only done if the arch is arm. If readelf is not present
on the system, there will be an error (/bin/sh: 1: readelf: not found), but the build will not break and will continue with the default softfp value. Outputting the error is not really acceptable, but I wanted some feedback on the
check itself.

-lpthread is needed on armv7 outside of Android
I replaced UNAME with KERNEL and OS to allow differentiating Android.

m32 flag
My understanding is that outside of Android the flag generates errors on armv7.

These modifications should only introduce changes for non-Android armv7 builds.

No functional change.
2016-10-14 08:58:07 +02:00
Marco Costalba e1f600f186 Revert "Fixes for ARM compilation"
This reverts commit a3fe80c36a.

It breaks compilation on mingw for me.
2016-10-13 08:36:30 +02:00
Jacques a3fe80c36a Fixes for ARM compilation
The target:

Odroid U3 (http://www.hardkernel.com/main/products/prdt_info.php?g_code=g138745696275)
Debian Jessie
As listed in #550 and #638 three modifications are needed for compilation to work:

float-abi flag for GCC
If an FPU is present and supported by the installed OS, then the value passed needs to be hard.
I didn't find any better solution than using readelf to check for the availability of Tag_ABI_VFP_args, which should indicate support for the FPU. The check is only done if the arch is arm. If readelf is not present
on the system, there will be an error (/bin/sh: 1: readelf: not found), but the build will not break and will continue with the default softfp value. Outputting the error is not really acceptable, but I wanted some feedback on the
check itself.

-lpthread is needed on armv7 outside of Android
I replaced UNAME with KERNEL and OS to allow differentiating Android.

m32 flag
My understanding is that outside of Android the flag generates errors on armv7.

These modifications should only introduce changes for non-Android armv7 builds.

No functional change.
2016-10-13 08:34:04 +02:00
Marco Costalba fdf3a51c68 AppVeyor: run bench after build
And show resulting bench signature.

The run is very slow because all optimizations
are disabled by default (/Od /RTC1).

No functional change.
2016-10-10 21:00:59 +02:00
Marco Costalba e61f7b1e6d Add AppVeyor integration
It is like Travis CI, but for the Windows platform.

Currently it just compiles the builds, without benching
the resulting executable.

No functional change.
2016-10-10 16:29:29 +02:00
ajithcj f799610d4b Simplify futility pruning return value
Return eval as it is while doing futility pruning.
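
Below is a minimal sketch of the change. It assumes a linear futility_margin() of 150 per ply and a depth limit of 7 plies. Both the helper and the constants are hypothetical here and are not taken from this commit:

```cpp
// Hypothetical margin: 150 per ply (assumed constant).
constexpr int futility_margin(int depth) { return 150 * depth; }

// Futility pruning at shallow depth: if the static eval still beats beta
// after subtracting the margin, the node is cut off and 'score' is set.
bool futility_prune(int eval, int beta, int depth, int& score) {
    if (depth < 7 && eval - futility_margin(depth) >= beta) {
        score = eval;  // before the patch: eval - futility_margin(depth)
        return true;
    }
    return false;
}
```

Only the returned score changes. The pruning condition itself is untouched.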

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 167687 W: 29778 L: 29904 D: 108005

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 26905 W: 3503 L: 3390 D: 20012

Bench: 5936728
2016-10-09 09:54:43 +02:00
atumanian 073eed590e Optimisation of Position::see and Position::see_sign
Stephane's patch removes the only usage of Position::see, where the
returned value isn't immediately compared with a value. So I replaced
this function by its optimised and more specific version see_ge. This
function also supersedes the function Position::see_sign.

bool Position::see_ge(Move m, Value v) const;

This function tests if the SEE of a move is greater than or equal to a
given value. We use forward iteration on captures instead of the backward
one, therefore we don't need the swapList array. Also we stop as soon
as we have enough information to obtain the result, avoiding unnecessary
calls to the min_attacker function.
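
The forward-iteration idea can be sketched on a simplified model. The model uses piece values instead of a board, attacker lists already sorted from least to most valuable, and no x-rays, pins or kings. The function name mirrors see_ge, but the signature and the model are assumptions made for illustration:

```cpp
#include <cstddef>
#include <vector>

// Simplified "SEE >= threshold" test on one square.
// capturedValue: value of the piece we capture (0 for a quiet move).
// moverValue:    value of our piece making the capture.
// ours / theirs: values of further attackers, least valuable first.
bool see_ge(int capturedValue, int moverValue,
            const std::vector<int>& ours, const std::vector<int>& theirs,
            int threshold) {
    int balance = capturedValue - threshold;
    if (balance < 0)
        return false;            // not enough even if the mover survives

    balance -= moverValue;       // pessimistic: assume the mover is lost
    if (balance >= 0)
        return true;             // enough even if the mover is recaptured

    std::size_t nextOurs = 0, nextTheirs = 0;
    bool relativeStm = true;     // true while the opponent is to move

    while (true) {
        const std::vector<int>& side = relativeStm ? theirs : ours;
        std::size_t& next = relativeStm ? nextTheirs : nextOurs;
        if (next == side.size())
            return relativeStm;  // no recapture possible: result favours the other side

        // The least valuable attacker captures and becomes the next victim.
        int attacker = side[next++];
        balance += relativeStm ? attacker : -attacker;
        relativeStm = !relativeStm;

        // Stop once the sign of balance decides the result for good.
        if (relativeStm == (balance >= 0))
            return relativeStm;
    }
}
```

The early returns are what save work. Once the answer is decided, the remaining attackers are never looked up.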

Speed tests (Windows 7), 20 runs for each engine:
Test engine: mean 866648, st. dev. 5964
Base engine: mean 846751, st. dev. 22846
Speedup: 1.023

Speed test by Stephane Nicolet

Fishtest STC test:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26040 W: 4675 L: 4442 D: 16923
http://tests.stockfishchess.org/tests/view/57f648990ebc59038170fa03

No functional change.
2016-10-08 06:38:36 +02:00
Stéphane Nicolet 1e586288ca Do not use SEE in evasion scoring
Idea by Aram Tumanian (atumanian)

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 43889 W: 7849 L: 7767 D: 28273

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 29333 W: 3809 L: 3700 D: 21824

Bench: 6421663
2016-10-06 00:00:27 +02:00
Stefano Cardanobile 0162fb83c2 Retire implicit malus for stonewalls
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 75864 W: 13466 L: 13437 D: 48961

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 99050 W: 12472 L: 12451 D: 74127

bench: 6098474
2016-10-05 09:32:08 +02:00
VoyagerOne ab26c61971 Allow inCheck pruning
This is a bit tricky because we don't want
to prune the only legal evasions, even if
they have a negative SEE. So add an assert to prevent
this subtle bug from slipping in later.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 14140 W: 2625 L: 2421 D: 9094

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 11558 W: 1555 L: 1379 D: 8624

bench: 5256717
2016-10-03 16:18:53 +02:00
Marco Costalba eccccba0ce Remove useless razoring condition
Condition is always true! For any value of the
array index! Even an out-of-bounds index, like
razor_margin[120]!!!!

No functional change.
2016-09-29 15:24:36 +02:00
HiraokaTakuya b77bae0529 Make razor_margin[4] ONE_PLY value independent
No functional change.
2016-09-29 15:20:07 +02:00
Stéphane Nicolet 7ae3c05795 Rename shift_bb() to shift()
Rename shift_bb() to shift(), and DELTA_S to SOUTH, etc.
to improve code readability, especially in evaluate.cpp
when they are used together:

    old b = shift_bb<DELTA_S>(pos.pieces(PAWN))
    new b = shift<SOUTH>(pos.pieces(PAWN))
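
For readers new to the naming, here is a self-contained sketch of a direction-templated shift on a 64-bit bitboard. The square layout (bit 0 = a1), the Direction values and the file masks are assumptions in the usual little-endian rank-file style. They are not copied from bitboard.h:

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;   // assumed layout: bit 0 = a1, bit 63 = h8

enum Direction : int {
    NORTH = 8, EAST = 1, SOUTH = -NORTH, WEST = -EAST,
    NORTH_EAST = NORTH + EAST, SOUTH_EAST = SOUTH + EAST,
    SOUTH_WEST = SOUTH + WEST, NORTH_WEST = NORTH + WEST
};

constexpr Bitboard FileABB = 0x0101010101010101ULL;
constexpr Bitboard FileHBB = FileABB << 7;

// Move every set bit one step in direction D, dropping bits that would
// otherwise wrap around the a- or h-file edge.
template<Direction D>
constexpr Bitboard shift(Bitboard b) {
    return D == NORTH      ?  b             << 8
         : D == SOUTH      ?  b             >> 8
         : D == EAST       ? (b & ~FileHBB) << 1
         : D == WEST       ? (b & ~FileABB) >> 1
         : D == NORTH_EAST ? (b & ~FileHBB) << 9
         : D == SOUTH_EAST ? (b & ~FileHBB) >> 7
         : D == NORTH_WEST ? (b & ~FileABB) << 7
         : D == SOUTH_WEST ? (b & ~FileABB) >> 9
         : 0;
}
```

With this naming, shift<SOUTH>(whitePawns) reads as "the white pawns moved one rank down".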

While there fix some small code style issues.

No functional change.
2016-09-25 10:45:10 +02:00
joergoster 351844061e Allowing singular extension in mate positions
Drop useless condition

abs(ttValue) < VALUE_KNOWN_WIN

And extend singular extension search to cases when ttValue
stores a mate score. This improves mate finding and does
not introduce any regression.

Yery tested this patch against current master on the 6500+
Chest mate suite with 200K fixed nodes:

    shortest mates found: master: 1206 patch: 1205
    any mate found: master: 1903 patch: 2003

with 1 sec time:

    shortest mates found: master: 2667 patch: 2628
    any mate found: master: 3585 patch: 3646

Verified for no regression:

STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25655 W: 4578 L: 4465 D: 16612

LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 66247 W: 8618 L: 8557 D: 49072

bench: 6335042
2016-09-24 19:56:02 +02:00
Marco Costalba 8662bdfa12 Fix crash when passing a mate/stalemate position
Both Tablebases::filter_root_moves() and
extract_ponder_from_tt() were unable to handle
a mate/stalemate position.

Spotted and reported by Dann Corbit.

Added some mate/stalemate positions to bench so
as to catch this regression early in the future.

No functional change.
2016-09-24 07:37:52 +02:00
Stéphane Nicolet 28240d375c Simplify pinners conditions in SEE()
Use the following transformations:

- to check that A is included in B, testing "(A & ~B) == 0" is faster
than "(A & B) == A"

- to remove the intersection of A and B from A, doing "A &= ~B;" is as
fast as "if (A & B) A &= ~B;" but is simpler.

Overall, the simpler patch version is 0.3% faster than current master.
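
Both identities can be checked on plain 64-bit integers. The helper names below are hypothetical:

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// "Is every bit of a also set in b?" Equivalent to (a & b) == a.
constexpr bool is_subset(Bitboard a, Bitboard b) { return (a & ~b) == 0; }

// Remove from a every bit it shares with b. No branch is needed:
// when a and b are disjoint, a is returned unchanged.
constexpr Bitboard remove_common(Bitboard a, Bitboard b) { return a & ~b; }
```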

No functional change.
2016-09-22 08:31:23 +02:00
Guenther Demetz 943ae89be1 Fix pin-aware SEE
Correct pinners calculation and fix bug with pinned
pieces giving check. With this patch 'pinners' only
returns sliders with exactly one defensive piece between
the slider and the attacked square (in other words, pinners
returns exact pinners).

This was a co-operation between Marco Costalba,
Stéphane Nicolet and me.

Special thanks to Ronald de Man for reporting the bug with
pinned pieces giving check, discussed here:
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/S_4E_Xs5HaE

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 132118 W: 23578 L: 23645 D: 84895

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 36424 W: 4770 L: 4670 D: 26984

bench: 6272231
2016-09-21 08:42:25 +02:00
Joost Vandevondele 4b0043ae7c Use fixed depth bench to make PGO builds more reproducible
Discussed on fishcooking

proposal and objdump verification:
https://groups.google.com/d/msg/fishcooking/4_ausUwMXP0/EGPsMYqOFAAJ

verified no significant speed difference between depth and time:
https://groups.google.com/d/msg/fishcooking/4_ausUwMXP0/KazW5QZmFgAJ

stockfish_time - stats:
mean = 2207232.56        std = 7079.51        std/mean = 0.003207

stockfish_depth - stats:
mean = 2201783.57        std = 6356.69        std/mean = 0.002887

No functional change
2016-09-18 08:13:34 +02:00
Marco Costalba 92f01aa2bd Fix a warning with MSVC
warning C4706: assignment within conditional expression
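
The commit does not show which line triggered the warning. As a generic illustration, C4706 is usually silenced by making the test explicit. The MoveSource class below is hypothetical:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical move source: next() returns 0 once all moves are consumed.
class MoveSource {
public:
    explicit MoveSource(std::vector<int> moves) : moves_(std::move(moves)) {}
    int next() { return pos_ < moves_.size() ? moves_[pos_++] : 0; }
private:
    std::vector<int> moves_;
    std::size_t pos_ = 0;
};

int sum_moves(MoveSource& src) {
    int total = 0;
    int m;
    // while (m = src.next())       // MSVC: C4706 assignment within conditional
    while ((m = src.next()) != 0)   // explicit test: same behaviour, no C4706
        total += m;
    return total;
}
```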

No functional change.
2016-09-17 10:14:28 +02:00
Stéphane Nicolet ea41f18e6e Swap mg and eg in internal representation of Score
Instrumentation shows that in make_score(mg, eg) calls, the mg value is
zero in 25.9% of the calls, while the eg value is zero in 36.8% of the
calls.

Swapping the mg and eg fields in the internal
representation of Score allows the compiler to optimize away the shift
in (eg << 16) + mg in more cases, thus resulting in a 0.3% speed-up
overall.
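
Here is a sketch of the packed layout after this patch, with eg in the upper 16 bits and mg in the lower 16 bits. The helper bodies follow the common two's-complement packing trick and are an assumption, not a copy of types.h:

```cpp
#include <cstdint>

// Packed middlegame/endgame score: eg in the upper 16 bits, mg in the lower 16.
enum Score : int { SCORE_ZERO };

constexpr Score make_score(int mg, int eg) {
    // When eg is a compile-time 0, the shifted term vanishes.
    return Score(int(static_cast<unsigned int>(eg) << 16) + mg);
}

// Adding 0x8000 before shifting undoes the borrow that a negative mg
// leaves in the upper half.
inline int eg_value(Score s) {
    return std::int16_t(std::uint16_t((std::uint32_t(s) + 0x8000u) >> 16));
}

inline int mg_value(Score s) {
    return std::int16_t(std::uint16_t(std::uint32_t(s)));
}
```

When eg is a compile-time zero, make_score(mg, 0) reduces to mg. This is why putting the value that is more often zero in the shifted half lets the compiler drop the shift.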

No functional change
2016-09-17 09:56:36 +02:00
Marco Costalba 057d710fc2 Fix indentation in struct FromToStats
And other little trivial stuff.

No functional change.
2016-09-17 09:51:20 +02:00
Stéphane Nicolet 01f2466f6e Retire KingDanger array
Rescales the king danger variables in evaluate_king() to
suppress the KingDanger[] array. This avoids the cost of
the memory accesses to the array and simplifies the non-linear
transformation used.

Full credits to "hxim" for the seminal idea and implementation,
see pull request #786.
https://github.com/official-stockfish/Stockfish/pull/786

Passed STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 9649 W: 1829 L: 1689 D: 6131

Passed LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 53494 W: 7254 L: 7178 D: 39062

Bench: 6116200
2016-09-16 08:30:06 +02:00
Marco Costalba 5c58d1f5cb Use per-thread counterMoveHistory
Drops a scalability bottleneck due to memory contention
of a single shared table across threads. The effect starts
to be noticeable with a high number of threads. Specifically
we have a small regression with 7 threads both at 60 and
180 seconds TC:

10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752

5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084

We have a regression because the counterMoveHistory table is
quite big and it takes time for a single thread to fill it.
Sharing the table yields a higher fill rate and better
quality of moves, and up to 7 threads the benefits of sharing
more than compensate the loss in speed due to contention.
Interestingly, even with a 3X longer TC, so with more time
for the single thread to catch up, the improvement is quite
limited and below noise level. It seems we really need a much
longer TC to saturate the table.

When we move to high threads number it's another story:

5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949

2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590

As expected, the speed-up more than compensates the lower fill
rate, and we expect that with tournament TC, where a single
thread is able to saturate the table, the difference will
be even stronger. For instance, the TCEC 9 super-final time
control will be 180 minutes + 15 seconds, and this scalability
improvement seems definitely the way to go.

So, summarizing:

GOOD:

Measured big improvement in high core scenario

Suitable for TCEC 9 superfinal (big hardware, very long TC)

Consistent and natural patch that extends to counterMoveHistory
what we already do for remaining history tables, that are all per-thread

Non functional change for the common case of a single core

Very simple (just 6 lines modified, no added ones)

BAD:

Small regression (within 2-3 ELO) with few threads and short TC
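
The design choice can be sketched as each search thread owning its own history table, instead of all threads writing to one shared instance. The table size, the update rule and the SearchThread type are hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical table, much smaller than a real counter-move history.
constexpr std::size_t kSquares = 64;
using HistoryTable = std::array<std::array<int, kSquares>, kSquares>;

struct SearchThread {
    // Owned by this thread only: no other thread writes to it, so there is
    // no contention on its cache lines. make_unique value-initializes to 0.
    std::unique_ptr<HistoryTable> counterMoveHistory = std::make_unique<HistoryTable>();

    // Hypothetical update rule: just count visits.
    void search(int iterations) {
        for (int i = 0; i < iterations; ++i)
            (*counterMoveHistory)[i % kSquares][(i * 7) % kSquares] += 1;
    }

    long total() const {
        long sum = 0;
        for (const auto& row : *counterMoveHistory)
            for (int v : row) sum += v;
        return sum;
    }
};

// Runs nThreads searches in parallel and returns each thread's table total.
std::vector<long> run_search(std::size_t nThreads, int iterations) {
    std::vector<SearchThread> threads(nThreads);
    std::vector<std::thread> workers;
    for (auto& t : threads)
        workers.emplace_back(&SearchThread::search, &t, iterations);
    for (auto& w : workers) w.join();
    std::vector<long> totals;
    for (const auto& t : threads) totals.push_back(t.total());
    return totals;
}
```

The trade-off described above is visible here. Each thread only ever sees its own updates, so a single thread fills its table more slowly than a shared one would be filled.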

bench: 5341477
2016-09-16 08:15:07 +02:00
Marco Costalba b96dd754ed Renaming in MovePicker
Rename stages and simplify a bit the code.

No functional change.
2016-09-15 09:07:49 +02:00
Marco Costalba 01ee509a5c Retire MovePicker::see_sign()
No longer used after the last patch.

No functional change.
2016-09-14 15:43:56 +02:00
VoyagerOne 95ad2b51b7 Tweak SEE margin in pruning conditions
Use 35 * depth^2 to calculate see_margin.
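
Here is a sketch of the margin as stated. It assumes that pruning happens when SEE falls below the negated margin and that depth is counted in whole plies. Both are assumptions, and the helper names are hypothetical:

```cpp
// Hypothetical helper: the margin grows with the square of the depth (in plies).
constexpr int see_margin(int depth) { return 35 * depth * depth; }

// Assumed convention: a move is pruned when its SEE is below -margin.
constexpr bool prune_by_see(int see, int depth) { return see < -see_margin(depth); }
```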

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 22636 W: 4212 L: 3990 D: 14434

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 47241 W: 6314 L: 6041 D: 34886

The Movepick SEE is now dead code, retire it.

Bench: 5341477
2016-09-14 15:38:38 +02:00
syzygy 438805aee8 Integrate next_stage() logic into next_move()
The measured bench speed-up ranges from 0.7% to 2%;
given the unreliable measurement, a reverse simplification
test was done on fishtest:

master vs patch
LLR: -2.94 (-2.94,2.94) [-3.00,1.00]
Total: 15499 W: 2685 L: 2867 D: 9947

The test result is positive: master is weaker.
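
Folding the stage transitions into next_move() is commonly done with a switch whose cases fall through. The toy version below assumes int moves, with 0 meaning "no move", and uses made-up stage names. It is not Stockfish's actual MovePicker:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stages; moves are plain ints and 0 means "no more moves".
enum Stage { TT_MOVE, CAPTURES_INIT, GOOD_CAPTURES, QUIETS_INIT, QUIETS, END };

class MovePicker {
public:
    MovePicker(int ttMove, std::vector<int> captures, std::vector<int> quiets)
        : ttMove_(ttMove), captures_(std::move(captures)), quiets_(std::move(quiets)) {}

    // Each case either returns a move or advances the stage and falls
    // through to the next case, so no separate next_stage() is needed.
    int next_move() {
        switch (stage_) {
        case TT_MOVE:
            stage_ = CAPTURES_INIT;
            if (ttMove_ != 0) return ttMove_;
            // fall through
        case CAPTURES_INIT:
            cur_ = 0;
            stage_ = GOOD_CAPTURES;
            // fall through
        case GOOD_CAPTURES:
            while (cur_ < captures_.size()) {
                int m = captures_[cur_++];
                if (m != ttMove_) return m;   // skip the already tried TT move
            }
            stage_ = QUIETS_INIT;
            // fall through
        case QUIETS_INIT:
            cur_ = 0;
            stage_ = QUIETS;
            // fall through
        case QUIETS:
            while (cur_ < quiets_.size()) {
                int m = quiets_[cur_++];
                if (m != ttMove_) return m;
            }
            stage_ = END;
            // fall through
        case END:
            break;
        }
        return 0;
    }

private:
    int ttMove_;
    std::vector<int> captures_, quiets_;
    std::size_t cur_ = 0;
    Stage stage_ = TT_MOVE;
};
```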

No functional change.
2016-09-13 07:14:09 +02:00
Guenther Demetz ace8e951d7 Simplify code for pinaware SEE
This is the most compact and neatest version
I was able to produce.

On normal builds I have a small slowdown:
normal builds base vs. simplification (gcc 4.8.1 Win7-64 i7-3770 @ 3.4GHz x86-64-modern)
Results for 20 tests for each version:

        Base      Test      Diff
Mean    1974744   1969333   5411
StDev   11825     10281     5874
p-value: 0,178
speedup: -0,003

On pgo-builds however I measure a nice 1.1% speedup

pgo-builds base vs. simplification
Results for 20 tests for each version:

        Base      Test      Diff
Mean    1974119   1995444   -21325
StDev   8703      5717      4623
p-value: 1
speedup: 0,011

No functional change.
2016-09-12 15:45:00 +02:00
Guenther Demetz 90ce24b11e Pinned aware SEE
Don't allow pinned pieces to attack the exchange square as long as all
pinners (including potential ones) are on their original
squares.
As soon as a pinner moves to the exchange square or gets captured on it, we
fall back to the standard SEE behaviour.

This correctly handles the majority of cases with absolute pins.

bench: 6883133
2016-09-12 09:31:09 +02:00
Stefano Cardanobile 4c95edddbf Reorder evaluation start
In evaluate, we start by initializing the pos.psq_score
and adding the material imbalance. After that, we check
whether a specialized eval exists and if yes we return
that value and discard whatever we have computed until now.

It sounds more logical to first probe the material entry and
return if we have a specialized eval, and only if that is
not the case initialize eval with some values. There is
no measurable speed difference on my computer.

Non functional change.
2016-09-11 07:42:12 +02:00
Marco Costalba 602d7fbb07 Use Movepick SEE value in search
This halves the calls to the costly pos.see_sign();
the speed-up is about 1-1.3%.

Non functional change.
2016-09-09 17:11:54 +02:00
Marco Costalba d909d10f33 Refactor previous patch
No functional change.
2016-09-08 06:02:42 +02:00
ajithcj 38428ada54 Prune dangerous moves at low depth
At very low depths, prune captures,
promotions and checks if SEE is negative.

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 6772 W: 1328 L: 1173 D: 4271

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 8917 W: 1270 L: 1122 D: 6525

bench: 6024713
2016-09-08 05:55:10 +02:00
Marco Costalba e340ce221c Syntactic sugar to loop across pieces
Also add some comments to the new operator~(Piece).
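
Here is a sketch of what operator~(Piece) and a piece loop can look like. It assumes the colour is stored in bit 3 of the Piece value, an encoding that is hypothetical here:

```cpp
// Assumed encoding: colour in bit 3, piece type in the low three bits.
enum Piece : int {
    NO_PIECE,
    W_PAWN = 1, W_KNIGHT, W_BISHOP, W_ROOK, W_QUEEN, W_KING,
    B_PAWN = 9, B_KNIGHT, B_BISHOP, B_ROOK, B_QUEEN, B_KING,
    PIECE_NB = 16
};

// Swap the colour of a piece: W_KNIGHT <-> B_KNIGHT, and so on.
constexpr Piece operator~(Piece pc) { return Piece(pc ^ 8); }

// All real pieces, so callers can write `for (Piece pc : Pieces)`.
constexpr Piece Pieces[] = { W_PAWN, W_KNIGHT, W_BISHOP, W_ROOK, W_QUEEN, W_KING,
                             B_PAWN, B_KNIGHT, B_BISHOP, B_ROOK, B_QUEEN, B_KING };

// Sums per-piece counts; the NO_PIECE slot and unused indices are skipped.
int total_pieces(const int (&count)[PIECE_NB]) {
    int sum = 0;
    for (Piece pc : Pieces)
        sum += count[pc];
    return sum;
}
```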

No functional change.
2016-09-04 15:33:17 +02:00
syzygy ca6c9f85a5 Change from [Color][PieceType] to [Piece]
Speed up of almost 1% in both normal and
pgo builds.
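
Indexing by Piece instead of [Color][PieceType] can be sketched as below. It assumes Piece = (colour << 3) + type, and the Counts struct is hypothetical:

```cpp
enum Color : int { WHITE, BLACK };
enum PieceType : int { NO_PIECE_TYPE, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };
enum Piece : int { NO_PIECE, PIECE_NB = 16 };

// Assumed encoding: colour in bit 3, type in the low three bits.
constexpr Piece make_piece(Color c, PieceType pt) { return Piece((c << 3) + pt); }

// One flat array indexed by Piece replaces a [COLOR_NB][PIECE_TYPE_NB] table:
// callers that already hold a Piece index it directly, without splitting it.
struct Counts {
    int byPiece[PIECE_NB] = {};
    void add(Piece pc) { ++byPiece[pc]; }
    int get(Color c, PieceType pt) const { return byPiece[make_piece(c, pt)]; }
};
```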

No functional change.
2016-09-04 09:22:09 +02:00
Marco Costalba c5828c4eba Fix syzygy with partial TB
If an incomplete set of 6-men tables is installed and
there is a 6-piece position on the board with no corresponding
tablebase, the engine was not using any syzygy tables at all.

Reported by Jouni Uski, fix by Peter Österlund,
confirmed as a bug by Ronald de Man.

bench: 7591630
2016-09-03 08:21:05 +02:00
Stéphane Nicolet d37dfe9ae4 Space bonus in presence of open files
If the opponent has a cramped position, opening a file often
helps him/her to exchange pieces, so it makes sense to reduce
the space bonus if there are open files.
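
The open-file count that the idea relies on can be sketched with file masks on a 64-bit bitboard. Two things are assumptions here: a file counts as open only when it has no pawns of either colour, and the bit layout has bit 0 = a1:

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;   // assumed layout: bit 0 = a1, bit 63 = h8
constexpr Bitboard FileABB = 0x0101010101010101ULL;

// Counts files that contain no pawn of either colour.
int open_files(Bitboard whitePawns, Bitboard blackPawns) {
    const Bitboard pawns = whitePawns | blackPawns;
    int open = 0;
    for (int f = 0; f < 8; ++f)
        if ((pawns & (FileABB << f)) == 0)
            ++open;
    return open;
}
```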

Credits: Leonardo Ljubičić for the strategic idea, Alain Savard for the
implementation of the open files calculation, "CrunchyNYC" for the
compensation of the numerator.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 49112 W: 9239 L: 8900 D: 30973

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 89415 W: 12014 L: 11601 D: 65800

Bench: 7591630
2016-09-03 00:04:20 +02:00
lucasart 13b4444d9e Change exclusion key setup
The exclusion key should depend on which move is excluded. This
allows us to remove the dedicated Position::exclusion_key().
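
Here is a sketch of a move-dependent key. It assumes 64-bit keys and 16-bit moves, with 0 meaning "no excluded move". The shift and the function name are hypothetical:

```cpp
#include <cstdint>

using Key = std::uint64_t;
using Move = std::uint16_t;   // assumed: 0 means "no excluded move"

// Hypothetical: mix the excluded move into the position key, so that each
// verification search (one per excluded move) gets its own TT entries,
// while a normal search (excludedMove == 0) keeps the plain position key.
constexpr Key search_key(Key posKey, Move excludedMove) {
    return posKey ^ (Key(excludedMove) << 16);
}
```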

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 59814 W: 11136 L: 11083 D: 37595

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 31023 W: 4187 L: 4080 D: 22756

bench: 7553379
2016-09-02 08:37:01 +02:00
Stefano80 7f2eb10e93 Retire linear imbalance
Retire linear imbalance and compensate
in piece values enumeration.

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 43596 W: 8105 L: 8023 D: 27468

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 24482 W: 3352 L: 3237 D: 17893

Bench: 7777707
2016-09-02 08:25:17 +02:00
ajithcj 5cffc032da Optimize order of a few conditions in search
Also fix size of KingDanger array to reduce memory footprint.

Small speed up of around 0.5%

No functional change.
2016-08-31 13:47:45 +02:00
VoyagerOne 2731bbaf6b Remove condition on killers in history pruning
Now allows the main killer to be history pruned.

STC:
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 15852 W: 2910 L: 2781 D: 10161

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 56428 W: 7610 L: 7537 D: 41281

Bench: 8032058
2016-08-30 09:09:55 +02:00
Stefan Geschwentner 6aa9308f08 Tweak probcut threshold
Use better threshold for capture move generation.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 23265 W: 4415 L: 4188 D: 14662

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 36618 W: 5083 L: 4836 D: 26699

bench: 7030088
2016-08-29 22:15:03 +02:00
Andrew Grant b4f6728e61 Removed an extra space
No functional change.
2016-08-28 09:47:30 +02:00
Alain SAVARD 2b57b61cb1 Move king tropism to evaluate_king
No functional change.
2016-08-28 08:49:40 +02:00
Marco Costalba 1ee2838214 Retire CheckInfo
Move its content directly under StateInfo.

Verified for no speed regression.

No functional change.
2016-08-28 08:08:13 +02:00
Marco Costalba 0b944c7186 Silence some warnings with MSVC 2013
No functional change.
2016-08-27 12:16:13 +02:00
Stéphane Nicolet 805afcbf3d Move CheckInfo under StateInfo
This greatly simplifies usage because it hides the
implementation-specific CheckInfo from the search.

This is based on the work done by Marco in pull request #716,
implementing on top of it the ideas in the discussion: caching
the calls to slider_blockers() in the CheckInfo structure,
and simplifying the slider_blockers() function by removing its
first parameter.

Compared to master, bench is identical but the number of calls
to slider_blockers() during bench goes down from 22461515 to 18853422,
hopefully making it a little bit faster overall.

archlinux, gcc-6
make profile-build ARCH=x86-64-bmi2
50 runs each

bench:
base = 2356320 +/- 981
test = 2403811 +/- 981
diff = 47490 +/- 1828

speedup = 0.0202
P(speedup > 0) = 1.0000

perft 6:
base = 175498484 +/- 429925
test = 183997959 +/- 429925
diff = 8499474 +/- 469401

speedup = 0.0484
P(speedup > 0) = 1.0000

perft 7 (but only 10 runs):
base = 185403228 +/- 468705
test = 188777591 +/- 468705
diff = 3374363 +/- 476687

speedup = 0.0182
P(speedup > 0) = 1.0000

$ ./pyshbench ../Stockfish/master ../Stockfish/test 20
run base     test     diff
...

base = 2501728 +/- 182034
test = 2532997 +/- 182034
diff = 31268 +/- 5116

speedup = 0.0125
P(speedup > 0) = 1.0000

No functional change.
2016-08-27 09:53:26 +02:00
Marco Costalba 4c5cbb1b14 Make engine ONE_PLY value independent
This non-functional patch is an in-depth rework that makes SF independent
of the actual value of ONE_PLY (currently set to 1). I have verified SF is
now independent for ONE_PLY values 1, 2, 4, 8, 16, 32 and 256.

This patch gives consistency to search code and enables future work, opening
the door to safely tweaking the ONE_PLY value for any reason.
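
Here is a sketch of what ONE_PLY independence means in practice. Depth is stored in fractional-ply units, so every array index or depth threshold has to go through OnePly. The margins are placeholders and the helpers are hypothetical:

```cpp
// Placeholder margins indexed by whole plies (values are illustrative only).
constexpr int RazorMargin[4] = { 100, 200, 300, 400 };

// Depth is measured in units where OnePly units make one ply.
template<int OnePly>
int razor_margin(int depth) {
    // RazorMargin[depth] would only be correct when OnePly == 1.
    return RazorMargin[depth / OnePly];
}

template<int OnePly>
constexpr bool razoring_allowed(int depth) {
    return depth < 4 * OnePly;   // "fewer than 4 plies", whatever OnePly is
}
```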

Verified for no speed regression at STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 95643 W: 17728 L: 17737 D: 60178

No functional change.
2016-08-27 09:12:25 +02:00
gamander 133808851d Fixed wrong definition of WhiteCamp and BlackCamp
No functional change.
2016-08-27 08:48:07 +02:00