FAQ About Backtesting

Does Backtesting Really Work?

Nailgun Analogy

Do nailguns really work? Perhaps you know someone who tried using a nailgun and said it did not work very well, because the structure they built with it was not strong. A nailgun is an invaluable tool for a carpenter and makes them more efficient and effective. However, it is not sufficient on its own for building great structures; it must be combined with other tools to be most effective. Backtesting is like a nailgun. It is a tool that helps you be more efficient and effective in your trading and investment research. It does not ensure you make money, and if used the wrong way, it can be dangerous. Below are some of the most common issues people encounter.

Past Performance

It is common wisdom that buying and holding is a good default strategy for many investors. What evidence is typically cited first? The past performance of that strategy. In fact, past performance is probably the primary piece of data most investors use to judge a fund or strategy. So why do people sometimes run into problems when interpreting the past performance that backtesting produces?

Look-ahead Bias

One common pitfall is called look-ahead bias. You might decide to backtest a strategy that buys Apple stock and see amazing backtested performance. Then you put this strategy into action and the results are not as good. Why not? The problem here is almost subconscious. The reason you probably chose to backtest buying Apple stock is that it is widely talked about, heavily traded and owned, and has had amazing products and performance. In a sense, you are cheating: you are using knowledge of good performance that you would not have had at the start of the test period, even if only subconsciously, to decide what to test for good performance. Proper backtesting requires you to stay fully aware of any such bias you may have.
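To make the bias concrete, here is a minimal Python sketch that assumes simulated random-walk prices rather than real market data (the helpers `simulate_prices` and `total_return` are hypothetical, written only for this illustration). None of the simulated stocks has any edge, yet the one picked with hindsight always shows the best backtest, simply because it was chosen for its already-known outcome.

```python
import random


def simulate_prices(n_stocks, n_days, daily_vol=0.02, seed=0):
    """Generate hypothetical random-walk price paths with zero expected return.

    None of these 'stocks' has a real edge; any difference in their
    backtested performance is pure luck.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_stocks):
        price = 100.0
        path = [price]
        for _ in range(n_days):
            # Multiplicative daily shock with a mean return of zero.
            price *= 1.0 + rng.gauss(0.0, daily_vol)
            path.append(price)
        paths.append(path)
    return paths


def total_return(path):
    """Buy-and-hold return from the first price to the last."""
    return path[-1] / path[0] - 1.0


paths = simulate_prices(n_stocks=50, n_days=750)
returns = [total_return(p) for p in paths]

# Biased choice: pick the stock BECAUSE we already know it did best,
# then "backtest" buying it over that same period.
hindsight_pick = max(range(len(paths)), key=lambda i: returns[i])

print(f"hindsight pick return: {returns[hindsight_pick]:.1%}")
print(f"average stock return:  {sum(returns) / len(returns):.1%}")
```

Changing `seed` changes which stock "wins", but the hindsight pick's return is the maximum by construction, so it says nothing about skill.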

Data Mining

Another common pitfall is called "data mining." If you look through enough data, rare statistical events become likely. If you were looking through records of coin flips and came across 10 heads in a row, is that statistically meaningful? The answer depends on the total number of coin flips surveyed. Each flip has a 50% chance of landing heads, so any particular run of 10 flips has a 0.5 to the power of 10 chance of being all heads, which is 1/1024, or roughly 0.1%. So if the data covered only 100 flips, such a run would be an unusual event and could suggest the coin was damaged or heavier on one side, or perhaps that there were errors in the data. If it happened once in 1,000 flips, that suggests everything is normal. If it happened only once in 10,000 flips, what is unusual is not that it happened but that it did not happen more often, and maybe something was going on with the other side of the coin. For proper backtesting, when looking at very high-performing strategies, evaluate that performance in the context of the number of trials. You can also try to be symmetrical in your analysis: look at both the best- and worst-performing strategies, and make sure the outperformance of your best strategies is larger in magnitude than the underperformance of your worst ones.
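The coin-flip reasoning can be checked with a short Python sketch, assuming a fair coin and independent flips (`prob_run_of_heads` is a hypothetical helper written for this illustration). It tracks the length of the current streak of heads with a small dynamic program and returns the probability that a run of 10 heads appears somewhere in the data. The 1/1024 figure applies to one specific window of 10 flips; across many flips, the chance of such a run appearing somewhere grows quickly, which is why the number of trials matters.

```python
def prob_run_of_heads(n_flips, run_length=10, p_heads=0.5):
    """Probability of at least one run of `run_length` consecutive heads
    somewhere in `n_flips` independent coin flips."""
    # state[j] = probability that no full run has occurred yet AND the
    # current streak of trailing heads has length j (0 <= j < run_length).
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_flips):
        new_state = [0.0] * run_length
        for j, prob in enumerate(state):
            # Tails resets the streak to zero.
            new_state[0] += prob * (1 - p_heads)
            # Heads extends the streak; reaching run_length means the run
            # occurred, so that probability mass leaves the "not yet" states.
            if j + 1 < run_length:
                new_state[j + 1] += prob * p_heads
        state = new_state
    return 1.0 - sum(state)


print(f"single 10-flip window: {0.5 ** 10:.4%}")  # 1/1024, about 0.0977%
for n in (100, 1_000, 10_000):
    print(f"{n:>6} flips: {prob_run_of_heads(n):.1%} chance of a 10-head run")
```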

Out of Sample Testing

One solution to both of these pitfalls is called "out of sample" testing. Your true goal is probably not to find the strategy with the best past performance, but rather the one with the best future performance. You can approximate this using only historical data, with a statistical technique called "out of sample" testing. Rather than backtesting all the way to today's date, you hold back the most recent portion of your data from the test. For example, you might first backtest from 2000 to 2010, then take your best strategy from that period, backtest it out of sample (on data it has never seen) from 2010 to 2013, and check that its performance remains strong. One "cost" of this technique is that your in-sample test does not include the freshest data, so you need to strike a balance in how much data to hold back. Many people employ this powerful technique without even knowing it: after identifying a good strategy, they paper trade it, tracking its trades on paper in real time before committing money, which is also very effective.
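A minimal Python sketch of the in-sample / out-of-sample split, assuming each candidate strategy is already summarized as a time-ordered list of daily returns. The function `split_in_out_of_sample` and the simulated, edge-free strategies are hypothetical stand-ins for a real strategy search. Because the winner is chosen using only roughly the first ten years, its out-of-sample figure is an honest check; with pure-noise strategies like these, it will typically fall back toward zero.

```python
import random
from statistics import mean


def split_in_out_of_sample(series, in_sample_fraction=0.75):
    """Split a time-ordered series: the earlier part is used to search for a
    strategy, the later part is held back for out-of-sample validation."""
    cut = int(len(series) * in_sample_fraction)
    return series[:cut], series[cut:]


# Hypothetical daily returns for 100 candidate strategies with no real edge
# (pure noise). 13 years of ~252 trading days stands in for 2000 to 2013.
rng = random.Random(42)
n_days = 13 * 252
strategies = [[rng.gauss(0.0, 0.01) for _ in range(n_days)] for _ in range(100)]

# About 10/13 of the history is in-sample (2000-2010), the rest out-of-sample.
splits = [split_in_out_of_sample(s, in_sample_fraction=10 / 13) for s in strategies]

# Pick the winner using ONLY the in-sample period...
best = max(range(len(splits)), key=lambda i: mean(splits[i][0]))
in_sample_perf = mean(splits[best][0])
# ...then see whether its edge survives on data it never saw.
out_of_sample_perf = mean(splits[best][1])

print(f"in-sample mean daily return:     {in_sample_perf:.4%}")
print(f"out-of-sample mean daily return: {out_of_sample_perf:.4%}")
```

Holding back a larger fraction gives a more reliable out-of-sample check but leaves less recent data for the search, which is the balance described above.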

Why Theories?

Another common pitfall is focusing too much on the statistics and not enough on the "why." Theories that explain why something is happening are another leg supporting the stool: they give another reason why a pattern should persist into the future. Let's say I have data on two neighboring houses. House A mows its lawn on average every 7 days and House B mows its lawn on average every 10 days. Does House A have the better yard? The data may provide a clue, but bringing other information into the story makes the answer more solid. Perhaps House B has more trees and therefore less sun, so its grass grows more slowly. Perhaps House A uses a landscaping service whose business is less profitable at lower mowing frequencies. Maybe House B was just sold and a temporary gap in mowing is distorting the average, even though the new owner has been mowing every 6 days. Using data as evidence to support a theory is much stronger than using the data alone. So for your backtesting, try coupling your theory of market behavior with the numerical backtesting evidence.
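A small Python sketch of the House B scenario, using a made-up mowing log (the day numbers and the `mowing_intervals` helper are hypothetical, invented for this illustration). The overall average interval comes out to 10 days, while the median and the intervals since the gap both point to mowing every 6 days. A theory about the sale is what prompts you to look past the average.

```python
from statistics import mean, median


def mowing_intervals(mow_days):
    """Number of days between consecutive mowings."""
    return [later - earlier for earlier, later in zip(mow_days, mow_days[1:])]


# Hypothetical log of the days House B was mowed, with a 30-day gap around
# the sale; after the gap, the new owner mows every 6 days.
house_b = [0, 6, 12, 42, 48, 54, 60]
intervals = mowing_intervals(house_b)

print(intervals)             # [6, 6, 30, 6, 6, 6]
print(mean(intervals))       # 10  (distorted by the one long gap)
print(median(intervals))     # 6.0
print(mean(intervals[-3:]))  # 6   (the new owner's schedule)
```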