BEHIND THE BETS
Russell Clarke
Mine was the generation that missed out on Computer Studies or IT being introduced into the school curriculum. However, when the World Chess Champion of the time, Garry Kasparov, was defeated by IBM’s Deep Blue, I started to rethink my Luddite stance towards horse racing analysis. For years, Grandmasters routinely beat any and every computer. The received wisdom of the time was that, although chess involved an almost infinite set of combinations, which computers were ideal to crunch through, it was also a game of intuition at the highest level and so the best players would always beat the machines. I thought much the same about horse racing. When Kasparov was beaten, it became clear to me that a powerful program would be more accurate than you or I (or anyone else) when it comes to predicting the result of horse races.
In addition, there is a wealth of scientific evidence that humans err predictably and often. Humans are simply not good information processors. Nobel Laureate Herbert Simon studied this phenomenon for decades and concluded “the capacity of the human mind for formulating and solving complex problems is very small compared with the size of the problems whose solution is required.” David Faust reached a similar conclusion in his book ‘The Limits of Scientific Reasoning’. He studied a wide range of professionals and found that human judges were consistently outperformed by simple actuarial models.
In relation to horse racing, a study in the US was undertaken with experienced racegoers. They were given between 5 and 40 pieces of information that they considered important for winner finding. As the number of pieces of information increased, the confidence of the punters also increased, but the number of winners found did not. My favourite quote about experts springs to mind (remember it when you see the news and a contributor is introduced as an expert): “often wrong, but never in doubt.”
Why Models Beat Humans
Models beat human forecasts because they consistently apply the same criteria. They never vary, they never get overwhelmed by the workload and they never get bored. They don’t listen to rumour or favour the “interesting angle”. Models use base rate data; humans favour descriptive data. For example, take a horse blinkered for the first time: the model knows the base strike-rate for this factor is 5%. The racing crowd may see “a horse blinkered for the first time that may improve and is rumoured to be working well.” The human prefers the story and ignores the 5% base rate.
Humans favour a case-by-case scenario rather than the base rate. We favour the complex over the simple, despite the fact we are not good processors of information. Humans also time-weight information by placing more emphasis on the very latest piece of news. Humans prefer personal experience, so we favour, for example, horses we have seen. It isn’t logical, but it is the human way.
Computers in Horse Racing
With the sheer volume of racing taking place, it is obvious that even the most committed individual can only scratch the surface of any serious analysis. Computers, on the other hand, can analyse any amount of data they are given in minutes. For most, this impressive ability to process data is enough.
However, we can use the machines to do much more. For example, a horse’s fitness is traditionally measured by the number of days since its last run, or by the trainer’s strike-rate first time out etc. But what if we discovered a more accurate method? Perhaps a method that utilises the standard of the previous performance, or the standard of the previous performance in combination with how many days ago it was? Or those two parameters and the stable’s comparative strike rates first, second, third time out? Of course, even though these methods may be more accurate than a simple “days since last ran”, it becomes difficult or impossible for an individual to calculate a fitness number based on these factors because of the time and calculations involved. But a computer does it in seconds, for every horse in every race. And that is just one example of analysing racing data (in this case fitness) in a more accurate way. Multiply this by more accurate measures of collateral form, race times, going, distance suitability, pace, draw, progression etc and you have a powerful way of accurately assessing chances and compiling a useful odds line.
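To make the idea of a "fitness number" concrete, here is a minimal Python sketch. It is not the formula used by the model described in this article, which is not given. The function name, the bell-shaped recency curve, the 21-day peak, the 60-day layoff threshold and the way the stable's strike rates are applied are all assumptions chosen purely for illustration.

```python
import math

def fitness_score(days_since_run, last_rating, best_rating,
                  stable_fresh_sr, stable_overall_sr,
                  peak_days=21.0, spread=0.9, layoff_days=60):
    """Hypothetical fitness number between 0 and 1 (illustrative only)."""
    days = max(days_since_run, 1)  # guard against log(0)
    # Recency: a bell curve on log(days). It equals 1.0 at peak_days and
    # falls away both for very quick returns and for long absences.
    recency = math.exp(-(math.log(days / peak_days) ** 2) / (2 * spread ** 2))
    # Standard of the previous run relative to the horse's best, capped at 1.
    standard = min(last_rating / best_rating, 1.0)
    score = recency * standard
    # After a layoff, scale by how the stable's fresh runners perform
    # compared with its overall strike rate (an assumed adjustment).
    if days >= layoff_days:
        score *= stable_fresh_sr / stable_overall_sr
    return min(score, 1.0)

# Same horse, same last run, returning after 120 days for two different stables.
print(fitness_score(120, 90, 100, stable_fresh_sr=0.20, stable_overall_sr=0.10))
print(fitness_score(120, 90, 100, stable_fresh_sr=0.05, stable_overall_sr=0.10))
```

The point of the sketch is only that several inputs are combined into one number on a sliding scale, for every horse, with no manual effort. Any real version would have its curve shapes and weights fitted to past results.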
The Factors
Let me briefly run through the factors the model utilises, with a few clues as to how it examines each factor in a way that is different to the traditional pen and paper methods:
Collateral Form
The idea that an intuitive private handicapper can do this job better than a computer is frankly laughable. I have a collection of Dick Whitford books from the 1970s and I was a great fan of his ratings in the Sporting Life, but he and every other private handicapper is a relic from a bygone age.
The computer can get the result of a race and immediately find a “best fit” to rate that race. By “best fit” I mean that, given the prior knowledge of the horses in that race and the new knowledge of this result, the computer can calculate the probabilities of which horses ran to which figures. It can even do this by taking into account the going and distance of today’s race. This ability alone makes it more accurate than a human can ever be. The “old” idea of finding a key horse that “ran to its form” and then rating the race in line with that assumption now seems ridiculously simplistic.
But there is more than that. Once the computer has rated a new race, it has another piece of evidence, another piece of the jigsaw, and can then go back through every race it has ever rated and re-assess the ratings. We mere mortals are reduced to statements like “the race has worked out well as 3 winners have come from the race”. Vague at best. The computer puts numbers on this.
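As a rough illustration of re-assessing every past race when a new result arrives, the Python sketch below fits all ratings together so that they agree as closely as possible with the beaten margins in every race. It is a simple least-squares style fit that produces single figures, not the probability-based "best fit" described above. The function name, the input layout (pounds beaten by the winner) and the use of one anchor horse to fix the scale are assumptions for the example.

```python
from collections import defaultdict

def fit_ratings(races, anchor, iterations=200):
    """races: list of {horse: pounds beaten by the winner (0 for the winner)}.
    anchor: (horse, rating) pair that pins the otherwise relative scale."""
    ratings = {h: 0.0 for race in races for h in race}
    for _ in range(iterations):
        # Level of each race: average of (current rating + pounds beaten).
        levels = [sum(ratings[h] + b for h, b in race.items()) / len(race)
                  for race in races]
        # Each horse's rating: average of the figures it ran to in all its races.
        runs = defaultdict(list)
        for race, level in zip(races, levels):
            for h, beaten in race.items():
                runs[h].append(level - beaten)
        ratings = {h: sum(v) / len(v) for h, v in runs.items()}
        # Slide the whole scale so the anchor horse keeps its known rating.
        shift = anchor[1] - ratings[anchor[0]]
        ratings = {h: r + shift for h, r in ratings.items()}
    return ratings

# Made-up results: A beats B by 4lb, then B beats C by 6lb.
races = [{"A": 0, "B": 4}, {"B": 0, "C": 6}]
print(fit_ratings(races, anchor=("A", 100)))   # close to A=100, B=96, C=90

# A new race: A beats C by only 8lb. Every horse linked to it is re-rated.
races.append({"A": 0, "C": 8})
print(fit_ratings(races, anchor=("A", 100)))   # B and C both move up
```

Note that B's figure changes in the second fit even though B did not run in the new race. That is the "going back through every race" effect, in miniature.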
Time Figures
They can be useful in races where the overall ability of the runners remains uncertain. They can identify ability in some horses before collateral form. They are also very useful for demonstrating when a race has been a true test of stamina at the distance. They remain a partial picture in UK racing, however, as we do not have sectional times to show us the pace distribution of a race.
Progression
Understanding the amount of progression that can occur and factoring this into a model is crucial. Identifying the clues allows the computer to do this. A number of potential factors are at play: age of the horse, number of runs to date, number of runs on the ground, number of runs at the distance, breeding, trainer modus operandi, running style, future entries and information received. This last factor may seem at variance with the quantitative approach previously described, but I make an allowance for such information as I think it is often vital when measuring a factor which, by definition, has yet to occur.
Fitness
I dealt with this earlier. It is a fascinating area and should be treated very much in terms of degrees rather than the traditional “fit” or “in need of the run”. Fitness can be measured far more accurately than by days since last ran. Included in my definition of fitness is being “over raced”. A jaded horse will run just as badly as an unfit horse.
In Running Comments
These can be used by the model to identify a number of things: horses with a turn of foot, horses that are one paced, horses that did not have sufficient stamina, horses that were unfit, horses that were suited by the pace of the race etc. This remains a work in progress but has much potential.
Pace of the Race
An underrated factor because it appears almost invisible to punters. Predicting the shape of a race, the likely pace and the effect of the draw is something a computer model can do with ease but is a long-winded operation for an individual. Jockeys and trainers (with a few notable exceptions) seem almost oblivious to the strength of this factor.
Ability to act on the ground/distance
A previous win on the ground/at the trip was the traditional method. However, this is crude and of limited use. The merit of each run is much more important and, with handicappers in particular, it is easy for the computer to assess exactly how likely the horse is to be suited by the conditions, rather than give a simplistic yes/no.
With lightly raced horses, breeding gives us another clue. Occasionally this clue will be strong and other times weak. Again the model can put a numerical assessment on this. There are readily available stats for Sire and Dam Sire. Clearly this gives only a partial picture and the dam should also be taken into account, although this is difficult with often limited data surrounding the dam.
Trainer/Jockey
This is an area that, in all fairness, most punters assess quite well. There are many stats covering trainer methods and strike-rates for trainer/jockey combinations etc. The model accounts for these, but perhaps has a crucial advantage over traditional methods. Strike-rates are a crude measurement; the model assesses the merit of the runs and can therefore often unearth a significant factor that is hidden by a misleading strike rate.
Draw
The model assesses this factor accurately (unlike the more traditional methods). Firstly, it measures strike rates versus expected strike rates. For example, a low draw has a better strike rate at Chester than a higher draw, but this also reflects the fact that a low stall will always be represented, even in a small field, whereas stall 10 will always be competing in a field of at least 10 runners. Basic but often overlooked.
Secondly, the model allows for ground differences, which cause distortions in draw bias. It also assesses the draw based on the potential advantage it confers upon a runner, i.e. it measures the extent of the advantage, rather than the binary “good draw/bad draw.”
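The first point, strike rate versus expected strike rate, can be sketched in a few lines of Python. Here the "expected" figure assumes that every runner in a field of n has a 1-in-n chance if the draw made no difference; that baseline, the function name and the input layout are assumptions for the example, and the results used are invented to show the effect.

```python
from collections import defaultdict

def draw_actual_vs_expected(results):
    """results: iterable of (stall, field_size, won) tuples, one per runner.
    Returns, per stall, actual wins divided by expected wins."""
    wins = defaultdict(float)
    expected = defaultdict(float)
    for stall, field_size, won in results:
        wins[stall] += 1.0 if won else 0.0
        # Chance of winning if the draw conferred no advantage at all.
        expected[stall] += 1.0 / field_size
    return {stall: wins[stall] / expected[stall] for stall in expected}

# Invented data. Stall 1 runs in small and large fields and wins 1 of 4 (25%).
# Stall 10 only ever appears in 10-runner fields and wins 1 of 5 (20%).
results = [
    (1, 4, True), (1, 4, False), (1, 10, False), (1, 10, False),
    (10, 10, True), (10, 10, False), (10, 10, False),
    (10, 10, False), (10, 10, False),
]
print(draw_actual_vs_expected(results))
```

In this invented sample stall 1 has the higher raw strike rate, yet stall 10 has won twice as often as field sizes alone would predict, while stall 1 has won about 1.4 times as often. A ratio above 1 suggests an advantage; its size gives the extent of that advantage rather than a binary good or bad.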
Odds Line
Once these factors (and a few more) are assessed, an odds line is produced. This is done by producing ratings and rating ranges for each horse. These are converted into a % chance of winning and thus into an odds line. From the odds line it can be seen where the potential value bets are. Value is defined as the difference between the odds determined by the model and those offered by the bookmakers/exchanges.
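The chain from ratings to percentages to odds to value can be sketched as follows. The article does not say how ratings are converted into chances, so the exponential weighting used here, the `scale` parameter, the use of decimal odds and both function names are assumptions; rating ranges are ignored to keep the example short.

```python
import math

def odds_line(ratings, scale=0.1):
    """ratings: {horse: rating}. Returns {horse: (win_chance, fair_decimal_odds)}.
    The exponential conversion is an assumed stand-in, not the model's method."""
    top = max(ratings.values())
    weights = {h: math.exp(scale * (r - top)) for h, r in ratings.items()}
    total = sum(weights.values())
    line = {}
    for horse, w in weights.items():
        chance = w / total                  # chances sum to 1 (100%)
        line[horse] = (chance, 1.0 / chance)  # fair decimal odds
    return line

def value_bets(line, market_odds):
    """Horses whose market price exceeds the model's fair price,
    with the difference between the two sets of odds."""
    return {h: market_odds[h] - fair
            for h, (chance, fair) in line.items()
            if h in market_odds and market_odds[h] > fair}

ratings = {"A": 100, "B": 100, "C": 90}   # invented ratings
line = odds_line(ratings)
for horse, (chance, fair) in line.items():
    print(horse, round(chance * 100, 1), round(fair, 2))

market = {"A": 3.5, "B": 2.2, "C": 5.0}   # invented bookmaker prices
print(value_bets(line, market))
```

Decimal odds are used because fair odds are then simply 1 divided by the chance of winning; converting to traditional fractional prices is a presentation step.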
That is the simplistic explanation. There are subtle nuances along the way, which I will deal with in later articles.