As crazyjon said, you can’t beat the price, so I downloaded it, and I’ve been analyzing games in Houdini alongside analyses by Deep Fritz 8. One thing I’m noticing about Houdini is that it’s much less likely to concede an advantage to one player or the other: what Fritz calls an advantage or a slight advantage, Houdini will often call equal. Has anyone else noticed this? Is this a function of Houdini being more materialistic and less positional than Fritz? Would people consider this a drawback, a benefit, or just an artifact?
One other thing I’m noticing about Houdini: It’s fast. Houdini routinely scours 19 plies in the same time it takes Fritz to explore 13 or 14.
Most of the high-end chess engines use aggressive pruning, so it’s expected that Houdini would show more plies than Deep Fritz 8.
I suspect Deep Fritz 12 would also outpace Deep Fritz 8 on number of plies.
It’s not that any engine can search faster than the hardware allows; rather, aggressive pruning means that for many lines, the newer high-end engines discard sub-optimal continuations much earlier than the older engines do, and the time saved goes into searching the remaining lines more deeply.
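To make the idea concrete, here’s a toy negamax-style alpha-beta sketch in Python. This is generic textbook code, not Houdini’s or Fritz’s actual search, which layer techniques like null-move pruning and late-move reductions on top of this basic cutoff:

```python
def alphabeta(pos, depth, alpha, beta, evaluate, moves, make_move):
    """Return the best score for the side to move (negamax convention).

    evaluate(pos)     -> static score from the mover's point of view
    moves(pos)        -> legal moves; better ordering means more cutoffs,
                         which is where the extra plies come from
    make_move(pos, m) -> the resulting position
    """
    if depth == 0 or not moves(pos):
        return evaluate(pos)
    best = float("-inf")
    for m in moves(pos):
        score = -alphabeta(make_move(pos, m), depth - 1,
                           -beta, -alpha, evaluate, moves, make_move)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # cutoff: the rest of this subtree cannot
            break          # affect the result, so it is never searched
    return best
```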
That’s partly why many people prefer to have different chess engines and opening books to play with.
The aggressive pruning means the engines play really well but, unlike humans, are a lot less creative in their decision making. So using different opening books, or different engines altogether, allows for a greater range of lines.
Aggressive pruning is also why most engines play tournaments with a smaller book designed just for engine-to-engine matches and tournaments. They don’t care so much about which opening moves they play, as long as they’re strong ones. They let the engines try to find a middlegame or an endgame advantage.
Houdini’s evals are generally smaller (closer to 0) than those of other engines. You could either say that it thinks the advantage is smaller or just see it as using different units. It’s not generally useful to compare one engine’s +0.4 to another engine’s +0.8.
Similarly, different engines have different methods for reporting plies and nodes. Rybka routinely has quite small numbers for both in comparison to other engines. So I’m not sure how useful this comparison is either.
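One way to put different engines’ numbers on some common footing is to convert each eval to an expected game score. Here’s a rough Python sketch using a logistic mapping that’s sometimes cited for this; the divisor of 4 is a folk calibration, an assumption on my part, not something either engine documents:

```python
def expected_score(eval_pawns):
    """Map a raw eval (in 'pawns') to an expected game score in [0, 1].

    The logistic 1 / (1 + 10**(-eval/4)) is one commonly cited heuristic;
    the divisor 4 is an assumed calibration, not an engine standard.
    """
    return 1.0 / (1.0 + 10.0 ** (-eval_pawns / 4.0))

print(expected_score(0.4))  # ~0.56 -- one engine's "+0.4"
print(expected_score(0.8))  # ~0.61 -- another engine's "+0.8"
```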
“Some pawns are more equal than others.” - George Orwell, chess player. Strategic pawn sacrifices can throw evaluation functions off a bit, as some of them are more materialistic than others. GM David Bronstein used to sac pawns and confuse the programs, which thought they stood better than they actually did. This also works with human players who, while “knowing” a gambit is supposed to be bad, still feel uncomfortable playing against it. Another way to look at it is that a pawn sacrificed in the Benko Gambit has a different value than one sacrificed in the Latvian Gambit or the Blackmar-Diemer, or in the middlegame to open lines for attack.
Yes, as Tom asks cogently, which sort of pawn equals one point? In the original position do we remove a rook pawn, a knight pawn or a center pawn, to get a one point difference? Since open lines and pawn islands are also considerations here, it’s rather complicated. Later on, there’s control of important squares at stake too.
It might be a little easier to ask whether a bishop or knight is worth three points, or a rook worth five, or a queen worth nine. Their values vary too, but usually not as much.
If the various programs give similar values for major and minor pieces, then maybe it’s that they think a piece is worth a different number of pawns. It seems plausible to me that there’s a fairly wide range of defensible calibrations for how many pawns a bishop is usually worth, anywhere from 2 1/2 to 4 maybe (in a “typical” position, not varying across different positions), consistent with good play. And differences in this calibration can be compensated for by other differences between programs.
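As a toy illustration, here are two hypothetical calibrations of a bare material counter in Python. The piece values are made-up assumptions, not any engine’s real weights; the same bishop-for-three-pawns imbalance gets two different verdicts:

```python
# Two hypothetical calibrations of the same material counter.
ENGINE_A = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}
ENGINE_B = {"P": 1.0, "N": 3.2, "B": 3.5, "R": 5.1, "Q": 9.8}

def material(counts, values):
    """counts: pieces for one side, e.g. {'P': 5, 'B': 2, ...}."""
    return sum(values[p] * n for p, n in counts.items())

# White is up a bishop for three pawns:
white = {"P": 5, "N": 2, "B": 2, "R": 2, "Q": 1}
black = {"P": 8, "N": 2, "B": 1, "R": 2, "Q": 1}
for name, vals in (("A", ENGINE_A), ("B", ENGINE_B)):
    diff = material(white, vals) - material(black, vals)
    print(f"Engine {name}: {diff:+.1f}")  # A: +0.0, B: +0.5
```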
Artichoke, in the 1980s, when Hitech and Deep Thought were on the cutting edge of computer chess, I had a discussion with a computer programmer on the question of the values of the pieces. He mentioned the standard bishop-or-knight-equals-three-pawns evaluation. I asked him if he had ever read “Lasker’s Manual of Chess.” In that book, the values of the pieces were different from those in other books. In addition, Lasker said that there were values to moves, to squares, and to particular placements of pawns. Moreover, in each new position, the values were constantly changing. It was more important to know which squares to cover than to know how many squares each piece could go to. He was frustrated by the difficulty of quantifying such a changing, amorphous system of evaluation. I told him that players rarely quantify things. Value was more a sensation of the ebbs and flows than an exercise in accounting. He could probably get away with the normal table of piece values in most cases. He wanted to be able to do more. And cheaper, as the cost of chips and other hardware was a concern.
Not anymore… The notion of a 1.0 eval being equal to one pawn of material is no longer true. I’ve noticed that the version of Stockfish I’ve got on my machine very consistently amplifies evaluations compared to Fritz 12. What is a +1.2 eval in Fritz is a +3.4 eval in Stockfish, and a fractional eval in Fritz is over 1.0 in Stockfish. I’ve analyzed enough positions in both engines to believe this is consistent between the programs.
Off to get Houdini.
Adding in:
For example, I spun up a midgame from Nakamura-Shulman in the 2010 championship quads after move 14, where Black is one pawn up. (After 1. e4 e6 2. d4 d5 3. Nc3 Bb4 4. e5 c5 5. a3 Bxc3+ 6. bxc3 Qa5 7. Bd2 Qa4 8. Nf3 Nc6 9. h4 cxd4 10. cxd4 Nge7 11. h5 Nxd4 12. Bd3 h6 13. Kf1 Nxf3 14. Qxf3 b6. The game continued: 15. Qg3 Ba6 16. Qxg7 Bxd3+ 17. cxd3 Rg8 18. Qxh6 Qd4 19. Re1 Qxd3+ 20. Kg1 Rc8 21. Bg5 Qf5 22. f4 Rc2 23. Rh2 Qd3 24. Qf6 Rxg5 25. Qxg5 Qd4+ 26. Kh1 Qe3 0-1)
After reaching depth 25, Stockfish was the odd one out, placing 15. Qg3 as its third choice with an eval of -0.80; its top choice evaluated at -0.60. Houdini evaluated the overall position at -0.09, and Fritz 12 at -0.13.
I skipped ahead to the losing move 23. Rh2 (where material is now even) and let the engines run to depth 19. Fritz had -10.94, Houdini -10.82, Stockfish -13.01.
It will be interesting to see if Houdini continues to parallel Fritz. Anyway, while evaluation routines in a program must be internally consistent (2.0 is better than 1.0), there is no requirement that a pawn advantage must equal one point of evaluation… It can be any scale that is desired, so long as it is consistent and bears some relation to the best objective analysis of the position.
It can be problematic, when you’re used to one program’s relative evaluation scores, to look at another’s and recalibrate to those.
I’ve wondered if evaluations should be on a logarithmic scale (such that a one-pawn advantage is a 100 eval, a three-pawn/minor-piece advantage is 1000, a rook advantage is 100000, etc.). Just a random thought that pops into my head now and again.
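For fun, here’s a toy Python version of that rescaling. The base and anchor are made-up calibration choices (a single log base that hits 100 at one pawn and 1000 at three pawns gives 10000, not 100000, at five), and the main point is that any strictly increasing rescaling leaves the engine’s choice of move unchanged:

```python
import math

def log_display(eval_pawns, base=math.sqrt(10), anchor=100.0):
    """Toy exponential display scale: +1 pawn -> 100, +3 -> 1000, +5 -> 10000.

    base and anchor are illustrative assumptions, not a real engine's scale.
    """
    if eval_pawns == 0:
        return 0.0
    sign = 1.0 if eval_pawns > 0 else -1.0
    return sign * anchor * base ** (abs(eval_pawns) - 1)

# Strictly increasing rescaling, so the top candidate is the same whether
# you rank by the raw eval or the log-scaled one (hypothetical evals):
evals = {"move_a": -0.80, "move_b": -0.60}
assert max(evals, key=evals.get) == max(evals, key=lambda m: log_display(evals[m]))
```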
A pawn is not a unit of evaluation (although of course having more material is generally a good thing because it tends to increase your chance of eventually checkmating your opponent).
Trying to evaluate a chess position in terms of “pawns” is like trying to evaluate your happiness in terms of your salary.
One annoying thing about Houdini: on my laptop (Windows 7, 64-bit), Houdini running under CB 10 often hangs. After I close ChessBase, multiple copies of Houdini are still running, and I have to go to Task Manager to manually close them.
A small price to pay, but I’m wondering if others have this problem.
But here, what you want to measure is more like salary, rather than happiness. Happiness is hard to describe, and if you specify how you will define your “position-happiness”, you’ve mapped it to something that is observable on a chessboard, or “salary” in your analogy. Really it’s not as general as happiness in the large sense, because here it’s necessarily focused on the goal of checkmate. I may like positions with tripled rook pawns because of a personal fetish, but that doesn’t mean that feature should get a bump in evaluation unless maybe the program is great at playing them.
You need a way to order positions, so that you can compare the positions that are the outcomes of variations via minimax, using that ordering, and choose the best next move. A one-dimensional numerical rating is appropriate. Anything with more than one dimension (say, a multi-faceted evaluation listing one or more aspects of pawn structure, material advantage, king safety, etc.) needs to be reduced to one dimension for ordering. At least this reflects the standard approach to chess-playing algorithms.
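In practice that reduction is often just a weighted sum. A minimal Python sketch, with feature names and weights that are purely illustrative assumptions, not any engine’s real terms:

```python
# Illustrative features and weights only, not any engine's actual terms.
WEIGHTS = {"material": 1.00, "king_safety": 0.35,
           "pawn_structure": 0.20, "mobility": 0.10}

def scalar_eval(features):
    """Collapse a dict of per-feature scores into one comparable number."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())

# Minimax can now order candidate positions on a single axis:
pos_a = {"material": 1.0, "king_safety": -0.8, "pawn_structure": 0.2}
pos_b = {"material": 0.0, "king_safety": 0.9, "mobility": 0.5}
better = max((pos_a, pos_b), key=scalar_eval)  # pos_a here: 0.76 vs 0.365
```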
If you’re going to rate things, what do you reference it to? Something observable in chess (like a one-pawn advantage, even though that is terribly ambiguous) or something that is unrelated to chess? I think it makes things simpler to use something observable in chess. Anyway, the mythical “one-pawn advantage” is so ambiguous it could be considered an arbitrary, though fixed, unit of measure.
I think we are mostly in agreement, Artichoke. I’m just pointing out that 1) the ultimate object of the game is to checkmate the opposing king, not amass more material, and 2) while you can try to calibrate your weights so that a middlegame position that is relatively equal except for one player having an additional pawn tends to have an evaluation of about 1.0, pawns are still not the fundamental unit of evaluation.