Friday 29 January 2016

Computers vs Humans - considering the median


Or why you aren't, and will never be, John Paulson


The systematic versus discretionary trading argument is alive and well; or if you prefer, computers versus humans*. In this post I pose the question: who is better, the average systematic trader or the average discretionary human?

* Though even fully discretionary traders will be relying on a computer at some point; fully manual trading and back office settlement systems aren't available in modern markets.

My new hedge fund will completely eschew computers... and casual dress (source: http://www.officemuseum.com)


I've thought, and spoken, about this quite a lot. Upfront I should say that there is no simple answer to this question. There are some extraordinary human traders; celebrities like Soros and Paulson. At the same time there are some incredibly successful systematic funds: high frequency firms like IMC and Virtu; also the likes of Renaissance and DE Shaw; and of course the systematic CTA business that I cut my teeth in of which Winton and AHL are just two examples.

But reciting this diverse group of household names is of no real help. It's the median that fascinates me.

Firstly as an individual trader reading this and trying to decide which route to go down, it's unlikely you are the next Soros or Paulson; or for that matter someone with the resources of Jim Simons. As a journalist, you don't know if this group is really representative of all the systematic and non systematic traders and fund managers that are out there. As an asset allocator you may struggle to identify the stars of the future (who may not be the stars of the past); you'll have to allocate to at least some ex-post average managers.

To put it in geeky terms: if we're trying to work out whether the mean or median of one distribution is larger than another's, just looking at the outliers isn't going to help and may even be misleading.

We need to think a little more deeply.


What are computers good at?

Source: cartoonstock.com

  •  speed. Duh.
  •  sticking to a plan, and not being freaked out by losses or annoying clients
  •  repeatability - doing the same thing given the same inputs
  •  getting their position scaling correct
  •  managing large portfolios
  •  not being hit by the proverbial bus, or leaving the firm
  • teaching other computers how to do exactly what they do

Researchers using computers can also be good at:
  •  identifying persistent patterns
  •  identifying unusual, non intuitive, patterns 


What are humans good at?



It might be more accurate to say "What are humans who are good at trading good at?". Not everyone can do these things.
  • "deep dive" analysis on a small number of assets
  •  processing complex, novel, information
  •  interpreting non quantifiable information
  • adapting to novel, changing, environments
  •  genuine forecasting (rather than extrapolating the past, or assuming it will repeat)
The last point is worth elaborating on. We could never write a computer system that would put a CDS short on in 2006, based on the sort of fundamental analysis that Paulson and the rest of the cast of "Big Short" did*. Asset bubbles, backed by novel securitisation techniques, do not happen every month.

* I suppose it's plausible that you'd put on a credit short based on a simple technical model which assumed that CDS prices mean revert from extremes. Nevertheless there would probably be insufficient data to fit such a model, given the relatively short history of CDS as an asset.


If Paulson had been in the Big Short film, Kevin Spacey would have played him. (source digitaltrends.com) (....and why wasn't he? Answers on a postcard, or via twitter)


Which trading arenas are computers likely to be good at?


  •  High Frequency Trading (Duh, again)
  •  Scalping (sure humans can do this, but you'd die of boredom, wouldn't you?)
  •  Systematic technical analysis (persistent patterns)
  •  Equity neutral - where the portfolio is too large for humans but requires only quantifiable factors
  •  Stat arb (finding the weird non intuitive relationships)
  •  Passive index tracking, and 'smart' beta
  •  "Vanilla" arbitrage or near arbitrage eg index vs constituents, on the run vs off the run treasuries
  •  any other strategy that boils down to smart or alternative beta, where a simple set of rules does the trick



Which trading arenas are humans likely to be better at?


  •  Fundamental economic analysis, eg global macro type bets (Soros and Paulson again)
  •  Subjective technical analysis (which, frankly, I am personally very skeptical of)
  •  Weird, one-off, pure or near pure arbitrage trades (eg cash-CDS in 2009, French Gold linked bond in Bonfire of the vanities...)
  •  Special situations / event driven, eg merger arbitrage
  •  Activist investing (computers can't harass the board into doing what you want)
  •  'Deep dive' stock analysis, on the long or short side, where you go right into the nitty gritty of the business
  •  Anything where every trade is different and or relies on analysing a lot of non quantifiable information 
  • Illiquid, and "real" assets, or anything for which data is sparse, unreliable or unavailable


Note that the human edge in the latter arenas will be reduced or depleted if they don't have the discipline and the knowledge to set up their position sizing and management correctly and stick to it. Imagine if Paulson had put on too large a position in 2005*, and then had to close it in late 2006 due to investor pressure (which was starting to heat up). All his analysis would have been for naught.

* Yes I know he pretty much couldn't have put on a bigger position, and had to scratch around for ways to increase it by doing stuff like this, but please don't spoil my example with the facts.


Average Computer versus Average Human


So, I think we have a small number of instances where humans can't compete with computers:

  •  High Frequency Trading
  •  Scalping
  •  Equity neutral
  •  Stat arb
We then have some situations where humans could compete, but it would be pointless and labour intensive:

  •  Systematic technical analysis
  •  Passive index tracking, and 'smart' beta
  •  "Vanilla" arbitrage or near arbitrage (not high frequency)
  •  Alternative beta

Then there's a group of strategies that computers will really struggle with:
  •  Fundamental economic analysis
  •  One-off, pure or near pure arbitrage trades
  •  Special situations / event driven / activist investing
  •  'Deep dive' stock analysis
  • Illiquid, and "real" assets


We can split the investing and trading population into four groups depending on whether they have the skills to do the kind of 'human only' strategies in the last group, plus the discipline to implement and stick to the correct risk and position management.

  • Super Traders with both the skills and the discipline
  • Committed and useless; with the discipline but not the skill
  • Clever and Chaotic, skills but no discipline
  • Useless and Chaotic, no skills or discipline.
 In the interests of full disclosure I probably fall into either the second or fourth category; with very occasional visits to the third category.

Clearly only those in the first category should contemplate discretionary trading. The rest of us should trade using a system.

Let's assume that a quarter of the investing population are in each category; even though this is a gross exaggeration*. We'll also assume that, generally speaking, people don't realise which category they are in.

* In fact I would say the proportions are more like 0.01%, 10%, 10% and 79.99%

Firstly, in terms of deciding whether to trade with a system or not, you only have a 25% chance of being in the first group, for which discretionary trading is the way to go. These aren't great odds. Unless you're an idiot, a gambler, or incredibly over confident you should trade systematically.

Secondly, to answer the original question, we know that the world is split between people managing money with computers, and people not doing so. From the above analysis only 25% of those in the discretionary group should be doing it. They'll be making great returns, especially if they focus on the stuff computers can't do.

The rest will be making a terrible job of it, either because they lack the skill or the discipline. The median will sit in the middle of this group. The mean will be a little higher.

For those that are a bit slow, yes, that's why I chose the median right at the start of this post!

Meanwhile in the computer group there will be much less dispersion; a much narrower distribution. It's true that in many fields of investing and trading the very best computers will be not quite as good as the very best humans.

The average (both mean and median) of this group will be lower than that of the top 25% of the human group, but higher than the overall human average.
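
If you like, here's a toy simulation of that argument. The numbers are entirely made up, purely to illustrate the shape of the two distributions:

import numpy as np

np.random.seed(42)

n = 10000  ## traders per group

## Hypothetical Sharpe ratios for the four human groups (made up numbers):
## super traders; committed/useless; clever/chaotic; useless/chaotic
human_group_means = [1.5, 0.0, 0.2, -0.5]
humans = np.concatenate([np.random.normal(mu, 0.5, n) for mu in human_group_means])

## Systematic traders: a much narrower distribution with a modest positive mean
computers = np.random.normal(0.4, 0.3, 4 * n)

print("Humans:    mean %.2f, median %.2f" % (humans.mean(), np.median(humans)))
print("Computers: mean %.2f, median %.2f" % (computers.mean(), np.median(computers)))
print("Top quarter of humans, mean: %.2f" % np.sort(humans)[-n:].mean())

The computer group beats the human mean and median, but not the top quarter - which is all the argument needs.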

I would argue that this simple model is pretty close to the truth, especially if you pull in all the amateur traders, many of whom are making a pretty terrible fist of discretionary trading, and fall mainly into the "useless/chaotic" bracket. In the professionalised segment of the business things may be slightly better; as even discretionary managers will be using some systematic rules (which I discuss more below).


The best of both worlds?


source: Dilbert.com


Is there some way we can get the best of both worlds? After all, airline pilots manage it. Autopilot does the boring stuff, but in an emergency humans are able to interpret the situation and act on it much better. I can think of two main routes:


Mechanical system - human override and scaling


Quite a few so-called systematic managers seem to operate on the basis of "Yeah, we have all these systems, but we decide when to turn them on and off depending on our judgement". Even firmly committed managers do occasionally tinker (meddle, risk manage, refit or improve - pick your verb).

Is there a back testable systematic way of switching models on and off that works? If so (and I doubt it) then you're firmly back in the systematic camp.

If not... well it strikes me as just plain stupid that people who know they can't predict market movements think they can predict trading system performance in the same markets.


Discretionary calls - human


This is a much nicer idea. Use your human judgement to decide that Apple is a good bet. Then use a system to decide how much of Apple you should buy, how you should adjust the position as market conditions change, and when you should close. Stick to the system, even when it hurts.

I liked this so much I even put it into my book, where it appears as the "Semi Automatic Trader" character.

It's also tailor made for the clever and chaotic. Sadly for the committed / useless and useless / chaotic there is nothing that can be done. Use a system, please, for your own sake.


Conclusion


Most humans should trade with systems. Some, who have the skill but not the discipline, can combine discretionary trading with a systematic framework. The average human will not be as good as the average computer. Unless you know, for sure, that you'll be well above average as a human trader, you should get out your keyboard and start coding.


Correlations, Weights, Multipliers.... (pysystemtrade)

This post serves three main purposes:

Firstly, I'm going to explain the main features I've just added to my python back-testing package pysystemtrade; namely the ability to estimate parameters that were fixed before: forecast and instrument weights; plus forecast and instrument diversification multipliers.

(See here for a full list of what's in version 0.2.1)

Secondly I'll be illustrating how we'd go about calibrating a trading system (such as the one in chapter 15 of my book); actually estimating some forecast weights and instrument weights in practice. I know that some readers have struggled with understanding this (which is of course entirely my fault).

Thirdly there are some useful bits of general advice that will interest everyone who cares about practical portfolio optimisation (including both non users of pysystemtrade, and non readers of the book alike). In particular I'll talk about how to deal with missing markets, the best way to estimate portfolio statistics, pooling information across markets, and generally continue my discussion about using different methods for optimising (see here, and also here).

If you want to, you can follow along with the code, here.


Key


This is python:

system.forecastScaleCap.get_scaled_forecast("EDOLLAR", "carry").plot()


This is python output:

hello world

This is an extract from a pysystemtrade YAML configuration file:

forecast_weight_estimate:
   date_method: expanding ## other options: in_sample, rolling
   rollyears: 20

   frequency: "W" ## other options: D, M, Y

Forecast weights


A quick recap



The story so far; we have some trading rules (three variations of the EWMAC trend following rule, and a carry rule); which we're running over six instruments (Eurodollar, US 10 year bond futures, Eurostoxx, MXP USD fx, Corn, and European equity vol; V2X).

We've scaled these (as discussed in my previous post), so both of these things are now on the same scale:

system.forecastScaleCap.get_scaled_forecast("EDOLLAR", "carry").plot()

Rolldown on STIR usually positive. Notice the interest rate cycle.

system.forecastScaleCap.get_scaled_forecast("V2X", "ewmac64_256").plot()

Notice how we moved from 'risk on' to 'risk off' in early 2015

Notice the massive difference in available data - I'll come back to this problem later.
 
However having multiple forecasts isn't much good; we need to combine them (chapter 8). So we need some forecast weights. This is a portfolio optimisation problem. To be precise we want the best portfolio built out of things like these:
Account curves for trading rule variations, US 10 year bond future. All pretty good....


There are some issues here then which we need to address.

An alternative which has been suggested to me is to optimise the moving average rules separately; and then as a second stage optimise the moving average group and the carry rule. This is similar in spirit to the handcrafted method I cover in my book. Whilst it's a valid approach it's not one I cover here, nor is it implemented in my code.


In or out of sample?


Personally I'm a big fan of expanding windows (see chapter 3, and also here); nevertheless feel free to try different options by changing the configuration file elements shown here.

forecast_weight_estimate:
   date_method: expanding ## other options: in_sample, rolling
   rollyears: 20

   frequency: "W" ## other options: D, M, Y
Also the default is to use weekly returns for optimisation. This has two advantages: firstly it's faster; secondly correlations of daily returns tend to be unrealistically low (because, for example, of different market closes when working across instruments).
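
To see why daily correlations come out too low, here's a rough sketch (not pysystemtrade code) of estimating a correlation on weekly rather than daily returns; returns_a and returns_b are assumed to be daily pandas Series with a DatetimeIndex:

import pandas as pd

def weekly_correlation(returns_a: pd.Series, returns_b: pd.Series) -> float:
    ## Sum daily returns within each calendar week before measuring the
    ## correlation; this washes out asynchronous market closes that make
    ## daily correlations look too low
    both = pd.concat([returns_a, returns_b], axis=1, keys=["a", "b"])
    weekly = both.resample("W").sum()
    return weekly["a"].corr(weekly["b"])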


Choose your weapon: Shrinkage, bootstrapping or one-shot?


In my last couple of posts on this subject I discussed which method one should use for optimisation (see here, and also here, and also chapter four).

I won't reiterate the discussion here in detail, but I'll explain how to configure each option.

Bootstrapping

This is my favourite weapon, but it's a little ..... slow.


forecast_weight_estimate:
   method: bootstrap
   monte_runs: 100
   bootstrap_length: 50
   equalise_means: True
   equalise_vols: True



We expect our trading rule p&l to have the same standard deviation of returns, so we shouldn't need to equalise vols; it's a moot point whether we do or not. Equalising means will generally make things more robust. With more bootstrap runs, and perhaps a longer length, you'll get more stable weights.
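
For those not using my code, this is roughly what bootstrapped weights involve - a simplified sketch, not the library's actual implementation. With means and vols equalised the optimiser only really cares about the correlation structure, so the stand-in single period optimiser below ignores means entirely:

import numpy as np
import pandas as pd

def one_shot_weights(returns: pd.DataFrame) -> np.ndarray:
    ## Stand-in single period optimiser: with means equalised the optimal
    ## weights only depend on the covariance matrix (proportional to the
    ## inverse covariance times a vector of ones); long-only and normalised
    sigma = returns.cov().values
    raw = np.linalg.pinv(sigma).sum(axis=1)
    raw = np.clip(raw, 0.0, None)
    if raw.sum() <= 0:
        return np.ones(len(raw)) / len(raw)  ## fall back to equal weights
    return raw / raw.sum()

def bootstrap_weights(weekly_returns: pd.DataFrame,
                      monte_runs: int = 100,
                      bootstrap_length: int = 50) -> np.ndarray:
    weights = []
    for _ in range(monte_runs):
        ## Sample rows with replacement to build one pseudo history
        idx = np.random.choice(len(weekly_returns), bootstrap_length, replace=True)
        weights.append(one_shot_weights(weekly_returns.iloc[idx]))
    ## Averaging across the runs is what makes the final weights robust to noise
    return np.mean(weights, axis=0)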

Shrinkage


I'm not massively keen on shrinkage (see here, and also here) but it is much quicker than bootstrapping. So a good work flow might be to play around with a model using shrinkage estimation, and then for your final run use bootstrapping. It's for this reason that the pre-baked system defaults to using shrinkage. As the defaults below show I recommend shrinking the mean much more than the correlation.


forecast_weight_estimate:
   method: shrinkage
   shrinkage_SR: 0.90
   shrinkage_corr: 0.50
   equalise_vols: True
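
Again, just to illustrate the idea (this isn't the package's code): shrinking means pulling the estimated correlation matrix towards an equal-correlation prior, and the estimated Sharpe ratios towards their average, by the shrinkage_corr and shrinkage_SR factors above.

import numpy as np

def shrink_estimates(corr: np.ndarray, sharpe: np.ndarray,
                     shrinkage_corr: float = 0.5, shrinkage_SR: float = 0.9):
    ## Prior for correlations: every off diagonal element equal to the average correlation
    n = corr.shape[0]
    avg_corr = (corr.sum() - n) / (n * (n - 1))
    prior_corr = np.full((n, n), avg_corr)
    np.fill_diagonal(prior_corr, 1.0)

    ## Prior for Sharpe ratios: everything equal to the average Sharpe
    prior_sr = np.full(sharpe.shape, sharpe.mean())

    shrunk_corr = shrinkage_corr * prior_corr + (1 - shrinkage_corr) * corr
    shrunk_sr = shrinkage_SR * prior_sr + (1 - shrinkage_SR) * sharpe
    return shrunk_corr, shrunk_sr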


Single period


Don't do it. If you must do it then I suggest equalising the means, so the result isn't completely crazy.

forecast_weight_estimate:
   method: one_period
   equalise_means: True
   equalise_vols: True




To pool or not to pool... that is a very good question



One question we should address is: do we need different forecast weights for different instruments, or can we pool our data and estimate them together? Or to put it another way, does Corn behave sufficiently like Eurodollar to justify giving them the same blend of trading rules, and hence the same forecast weights?

forecast_weight_estimate:
   pool_instruments: True ##

One very significant factor in making this decision is actually costs. However I haven't yet included the code to calculate their effect, so for the time being we'll ignore them; though they do make a real difference. Because of the choice of three slower EWMAC rule variations this omission isn't as serious as it would be with faster trading rules.

If you use a stupid method like one-shot then you probably will get quite different weights. However more sensible methods will account better for the noise in each instrument's estimate.

With only six instruments, and without costs, there isn't really enough information to determine whether pooling is a good thing or not. My strong prior is to assume that it is. Just for fun here are some estimates without pooling.

from matplotlib.pyplot import title, show  ## for the plots below

system.config.forecast_weight_estimate["pool_instruments"]=False
system.config.forecast_weight_estimate["method"]="bootstrap"
system.config.forecast_weight_estimate["equalise_means"]=False
system.config.forecast_weight_estimate["monte_runs"]=200
system.config.forecast_weight_estimate["bootstrap_length"]=104

system=futures_system(config=system.config)

system.combForecast.get_forecast_weights("CORN").plot()
title("CORN")
show()

Forecast weights for corn, no pooling

system.combForecast.get_forecast_weights("EDOLLAR").plot()
title("EDOLLAR")
show()



Forecast weights for eurodollar, no pooling

Note: Only instruments that share the same set of trading rule variations will see their results pooled.
 

Estimating statistics


There are also configuration options for the statistical estimates used in the optimisation; so for example, should we use exponentially weighted estimates? (This makes no sense for bootstrapping, but for other methods it's a reasonable thing to do.) Is there a minimum number of data points before we're happy with our estimate? Should we floor correlations at zero? (Short answer - yes.)


forecast_weight_estimate:
 

   correlation_estimate:
     func: syscore.correlations.correlation_single_period
     using_exponent: False
     ew_lookback: 500
     min_periods: 20     
     floor_at_zero: True

   mean_estimate:
     func: syscore.algos.mean_estimator
     using_exponent: False
     ew_lookback: 500
     min_periods: 20     

   vol_estimate:
     func: syscore.algos.vol_estimator
     using_exponent: False
     ew_lookback: 500
     min_periods: 20     


Checking my intuition


Here's what we get when we actually run everything with some sensible parameters:

system=futures_system()
system.config.forecast_weight_estimate["pool_instruments"]=True
system.config.forecast_weight_estimate["method"]="bootstrap" 
system.config.forecast_weight_estimate["equalise_means"]=False
system.config.forecast_weight_estimate["monte_runs"]=200
system.config.forecast_weight_estimate["bootstrap_length"]=104


system=futures_system(config=system.config)

system.combForecast.get_raw_forecast_weights("CORN").plot()
title("CORN")
show()

Raw forecast weights pooled across instruments. Bumpy ride.
Although I've plotted these for corn, they will be the same across all instruments. Almost half the weight goes on carry; which makes sense since this is relatively uncorrelated with the rest (half is what my simple optimisation method - handcrafting - would put in). Hardly any (about 10%) goes into the medium speed trend following rule; it is highly correlated with the other two rules. Out of the remaining variations the faster one gets a higher weight; this is the law of active management at play, I guess.

Smooth operator - how not to incur costs changing weights


Notice how jagged the lines above are. That's because I'm estimating weights annually. This is kind of silly; I don't really have tons more information after 12 months; the forecast weights are estimates - which is a posh way of saying they are guesses. There's no point incurring trading costs when we update these with another year of data.

The solution is to apply a smooth:

forecast_weight_estimate:
   ewma_span: 125
   cleaning: True
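
Under the hood the smooth is just an exponentially weighted moving average applied to each column of weights, followed by renormalising so they still add up to one. A sketch, assuming raw_weights is a daily DataFrame of the jagged weights above:

import pandas as pd

def smooth_weights(raw_weights: pd.DataFrame, ewma_span: int = 125) -> pd.DataFrame:
    ## Exponentially smooth each column of weights, then renormalise
    ## so that on every day they still add up to one
    smoothed = raw_weights.ewm(span=ewma_span).mean()
    return smoothed.div(smoothed.sum(axis=1), axis=0)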


Now if we plot forecast_weights, rather than the raw version, we get this:

system.combForecast.get_forecast_weights("CORN").plot()
title("CORN")
show()



Smoothed forecast weights (pooled across all instruments)
There's still some movement; but any turnover from changing these parameters will be swamped by the trading the rest of the system is doing.



Forecast diversification multiplier


Now we have some weights we need to estimate the forecast diversification multiplier; so that our portfolio of forecasts has the right scale (an average absolute value of 10 is my own preference).


Correlations


First we need to get some correlations. The more correlated the forecasts are, the lower the multiplier will be. As you can see from the config options we again have the option of pooling our correlation estimates.


forecast_correlation_estimate:
   pool_instruments: True 

   func: syscore.correlations.CorrelationEstimator ## function to use for estimation. This handles both pooled and non pooled data
   frequency: "W"   # frequency to downsample to before estimating correlations
   date_method: "expanding" # what kind of window to use in backtest
   using_exponent: True  # use an exponentially weighted correlation, or all the values equally
   ew_lookback: 250 ## lookback when using exponential weighting
   min_periods: 20  # min_periods, used for both exponential, and non exponential weighting





Smoothing, again


We estimate correlations, and weights, annually. Thus as with weightings it's prudent to apply a smooth to the multiplier. I also floor negative correlations to avoid getting very large values for the multiplier.


forecast_div_mult_estimate:
   ewma_span: 125   ## smooth to apply
   floor_at_zero: True ## floor negative correlations
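
If you want the formula: with a vector of forecast weights w and a correlation matrix of forecasts rho (floored at zero), the multiplier is 1 / sqrt(w' rho w); it's then smoothed with the ewma_span above so it doesn't jump each year. A sketch of the calculation at a single point in time:

import numpy as np

def diversification_multiplier(weights: np.ndarray, corr: np.ndarray,
                               floor_at_zero: bool = True) -> float:
    ## Floor negative correlations so the multiplier can't get silly large
    if floor_at_zero:
        corr = np.clip(corr, 0.0, None)
    ## A portfolio of forecasts has standard deviation sqrt(w' rho w) relative
    ## to a single forecast; dividing by that restores the target scale
    return 1.0 / np.sqrt(weights @ corr @ weights)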


system.combForecast.get_forecast_diversification_multiplier("EDOLLAR").plot()
show()




system.combForecast.get_forecast_diversification_multiplier("V2X").plot()
show()

Forecast Div. Multiplier for Eurodollar futures
Notice that when we don't have sufficient data to calculate correlations, or weights, the FDM comes out with a value of 1.0. I'll discuss this more below in "dealing with incomplete data".


From subsystem to system


We've now got a combined forecast for each instrument - the weighted sum of trading rule forecasts, multiplied by the FDM. It will look very much like this:

system.combForecast.get_combined_forecast("EUROSTX").plot()
show()

Combined forecast for Eurostoxx. Note the average absolute forecast is around 10. Clearly a choppy year for stocks.


Using chapters 9 and 10 we can now scale this into a subsystem position. A subsystem is my terminology for a system that trades just one instrument. Essentially we pretend we're using our entire capital for just this one thing.


Going pretty quickly through the calculations (since you're either familiar with them, or you just don't care):

system.positionSize.get_price_volatility("EUROSTX").plot()
show()

Eurostoxx price volatility (% per day). A bit less than 1% a day in 2014, a little more exciting recently.

system.positionSize.get_block_value("EUROSTX").plot()
show()


Block value (value of 1% change in price) for Eurostoxx.


system.positionSize.get_instrument_currency_vol("EUROSTX").plot()
show()




Eurostoxx instrument currency volatility: volatility in euros per day, per contract


system.positionSize.get_instrument_value_vol("EUROSTX").plot()
show()







Eurostoxx instrument value volatility: volatility in base currency ($) per day, per contract



system.positionSize.get_volatility_scalar("EUROSTX").plot()
show()




Eurostoxx vol scalar: Number of contracts we'd hold in a subsystem with a forecast of +10




system.positionSize.get_subsystem_position("EUROSTX").plot()
show()

Eurostoxx subsystem position
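
To make the chain of calculations concrete, here's a worked example with made-up numbers (the real system does all of this with time series; the figures below are purely illustrative):

## Purely illustrative numbers - not actual market data
price = 3200.0             ## Eurostoxx future price
point_value = 10.0         ## euros per full index point
fx_rate = 1.10             ## dollars per euro (base currency is USD)
daily_price_vol_pct = 1.0  ## daily volatility of the price, in percent

block_value = price * point_value * 0.01            ## value of a 1% price move, in euros
instr_ccy_vol = block_value * daily_price_vol_pct   ## daily vol per contract, in euros
instr_value_vol = instr_ccy_vol * fx_rate           ## daily vol per contract, in dollars

capital = 250000.0
vol_target = 0.25                                   ## 25% annualised
daily_cash_vol_target = capital * vol_target / 16   ## 16 is roughly sqrt(256 business days)

vol_scalar = daily_cash_vol_target / instr_value_vol  ## contracts held on an average (+10) forecast
forecast = 10.0
subsystem_position = vol_scalar * forecast / 10.0

print(block_value, instr_ccy_vol, instr_value_vol)         ## 320.0 320.0 352.0
print(round(vol_scalar, 2), round(subsystem_position, 2))  ## 11.1 11.1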



Instrument weights


We're not actually trading subsystems; instead we're trading a portfolio of them. So we need to split our capital - for this we need instrument weights. Oh yes, it's another optimisation problem, with the assets in our portfolio being subsystems, one per instrument.


import pandas as pd

instrument_codes=system.get_instrument_list()

pandl_subsystems=[system.accounts.pandl_for_subsystem(code, percentage=True)
        for code in instrument_codes]

pandl=pd.concat(pandl_subsystems, axis=1)
pandl.columns=instrument_codes

pandl.cumsum().plot()
show()

Account curves for instrument subsystems
Most of the issues we face are similar to those for forecast weights (except pooling. You don't have to worry about that anymore). But there are a couple more annoying wrinkles we need to consider.



Missing in action: dealing with incomplete data


As the previous plot illustrates we have a mismatch in available history for different instruments - loads for Eurodollar, Corn, US10; quite a lot for MXP, barely any for Eurostoxx and V2X.

This could also be a problem for forecasts, at least in theory, and the code will deal with it in the same way.

Remember when testing out of sample I usually recalculate weights annually. Thus on the first day of each new 12 month period I face having one or more of these beasts in my portfolio:
  1. Assets which weren't in my fitting period, and aren't used this year
  2. Assets which weren't in my fitting period, but are used this year
  3. Assets which are in some of my fitting period, and are used this year
  4. Assets which are in all of the fitting period, and are used this year
Option 1 is easy - we give them a zero weight.

Option 4 is also easy; we use the data in the fitting period to estimate the relevant statistics.

Option 2 is relatively easy - we give them a "downweighted average" weight. Let me explain. Let's say we have two assets already, each with 50% weight. If we were to add a further asset we'd allocate it an average weight of 33.3%, and split the rest between the existing assets. In practice I want to penalise new assets; so I only give them half their average weight. In this simple example I'd give the new asset half of 33.3%, or 16.66%.
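
Here's a little sketch of that rule - my own illustration rather than the package's code:

def add_new_asset(existing_weights: dict, new_asset: str) -> dict:
    ## A brand new asset would naively get the average weight 1/(N+1);
    ## penalise it by only giving it half of that
    n_total = len(existing_weights) + 1
    new_weight = 0.5 * (1.0 / n_total)

    ## Scale the existing weights down pro-rata so everything still sums to one
    scale = 1.0 - new_weight
    weights = {key: value * scale for key, value in existing_weights.items()}
    weights[new_asset] = new_weight
    return weights

print(add_new_asset({"EDOLLAR": 0.5, "US10": 0.5}, "V2X"))
## {'EDOLLAR': 0.4167, 'US10': 0.4167, 'V2X': 0.1667} (rounded)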

We can turn off this behaviour, which I call cleaning. If we do we'd get zero weights for assets without enough data.


instrument_weight_estimate:
   cleaning: False
 


Option 3 depends on the method we're using. If we're using shrinkage or one period, then as long as there's enough data to exceed the minimum number of periods (default 20 weeks) we'll have an estimate. If we haven't got enough data, then it will be treated as a missing weight; and we'd use downweighted average weights (if cleaning is on), or give the absent instruments a zero weight (with cleaning off).

For bootstrapping we check to see if the minimum period threshold is met on each bootstrap run. If it isn't then we use average weights when cleaning is on. The less data we have, the closer the weight will be to average. This has a nice Bayesian feel about it, don't you think? With cleaning off, less data will mean weights will be closer to zero. This is like an ultra conservative Bayesian.



If you don't get this joke, there's no point in me trying to explain it (Source: www.lancaster.ac.uk)


Let's plot them


We're now in a position to optimise, and plot the weights:

(By the way, because of all the code needed to deal properly with missing weights on each run, this is kind of slow. But you shouldn't be refitting your system that often...)

system.config.instrument_weight_estimate["method"]="bootstrap"
system.config.instrument_weight_estimate["equalise_means"]=False
system.config.instrument_weight_estimate["monte_runs"]=200
system.config.instrument_weight_estimate["bootstrap_length"]=104

system.portfolio.get_instrument_weights().plot()
show()


Optimised instrument weights
These weights are a bit different from equal weights, in particular the better performance of US 10 year and Eurodollar is being rewarded somewhat. If you were uncomfortable with this you could turn equalise means on.


Instrument diversification multiplier


Missing in action, take two


Missing instruments also affects estimates of correlations. You know, the correlations we need to estimate the diversification multiplier. So there's cleaning again:


instrument_correlation_estimate:
    cleaning: True


I replace missing correlation estimates* with the average correlation, but I don't downweight it. If I downweighted the average correlation the diversification multiplier would be biased upwards - i.e. I'd have too much risk on. Bad thing. I could of course use an upweighted average; but I'm already penalising instruments without enough data by giving them lower weights.

* where I need to, i.e. options two and three
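
A sketch of that replacement rule (illustrative only): any missing pairwise correlation gets the plain average of the correlations we could estimate.

import numpy as np

def clean_correlation(corr: np.ndarray) -> np.ndarray:
    ## Average of the off diagonal correlations we could actually estimate
    n = corr.shape[0]
    avg = np.nanmean(corr[~np.eye(n, dtype=bool)])

    ## Fill the gaps with that average - deliberately not downweighted, since a
    ## lower correlation would bias the diversification multiplier upwards
    cleaned = np.where(np.isnan(corr), avg, corr)
    np.fill_diagonal(cleaned, 1.0)
    return cleaned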

Let's plot it



system.portfolio.get_instrument_diversification_multiplier().plot()
show()


Instrument diversification multiplier


And finally...


We can now work out the notional positions - allowing for subsystem positions, weighted by instrument weight, and multiplied by instrument diversification multiplier.


system.portfolio.get_notional_position("EUROSTX").plot()
show()


Final position in Eurostoxx. The actual position will be a rounded version of this.


End of post


No quant post would be complete without an account curve and a Sharpe Ratio.

And an equation. Bugger, I forgot to put an equation in.... but you got a Bayesian cartoon - surely that's enough?
 

print(system.accounts.portfolio().stats())

system.accounts.portfolio().cumsum().plot()

show()



Overall performance. Sharpe ratio is 0.53. Annualised standard deviation is 27.7% (target 25%)

Stats: [[('min', '-0.3685'), ('max', '0.1475'), ('median', '0.0004598'), 
('mean', '0.0005741'), ('std', '0.01732'), ('skew', '-1.564'), 
('ann_daily_mean', '0.147'), ('ann_daily_std', '0.2771'), 
('sharpe', '0.5304'), ('sortino', '0.6241'), ('avg_drawdown', '-0.2445'), ('time_in_drawdown', '0.9626'), ('calmar', '0.2417'), 
('avg_return_to_drawdown', '0.6011'), ('avg_loss', '-0.011'), 
('avg_gain', '0.01102'), ('gaintolossratio', '1.002'), 
('profitfactor', '1.111'), ('hitrate', '0.5258')]

This is a better output than the version with fixed weights and diversification multiplier that I've posted before; mainly because a variable multiplier leads to a more stable volatility profile over time, and thus a higher Sharpe Ratio.


Monday 18 January 2016

pysystemtrade: Estimated forecast scalars

I've just added the ability to estimate forecast scalars to pysystemtrade. So rather than a fixed scalar specified in the config you can let the code estimate a time series of what the scalar should be.

If you don't understand what the heck a forecast scalar is, then you might want to read my book (chapter 7). If you haven't bought it, then you might as well know that the scalar is used to modify a trading rule's forecast so that it has the correct average absolute value, normally 10.

Here are some of my thoughts on the estimation of forecast scalars, with a quick demo of how it's done in the code. I'll be using this "pre-baked system" as our starting point:

from systems.provided.futures_chapter15.estimatedsystem import futures_system
system=futures_system()


The code I use for plots etc in this post is in this file.

Even if you're not using my code it's probably worth reading this post as it gives you some idea of the real "craft" of creating trading systems, and some of the issues you have to consider.


Targeting average absolute value


The basic idea is that we're going to take the natural, raw, forecast from a trading rule variation and look at its absolute value. We're then going to take an average of that value. Finally we work out the scalar that would give us the desired average (usually 10).

Notice that for non symmetric forecasts this will give a different answer to measuring the standard deviation, since the standard deviation is measured around the average forecast. Suppose you had a long biased forecast that varied between +0 and +20, averaging +10. The average absolute value will be about 10, but the standard deviation will probably be closer to 5.

Neither approach is "right" but you should be aware of the difference (or better still avoid using biased forecasts).
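
Here's a quick numeric illustration of the difference, using a deliberately long biased toy forecast:

import numpy as np

## A long biased toy forecast, uniformly distributed between 0 and +20
forecast = np.random.uniform(0, 20, 100000)

print("average absolute value: %.1f" % np.abs(forecast).mean())  ## roughly 10
print("standard deviation:     %.1f" % forecast.std())           ## roughly 5.8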


Use a median, not a mean


From above: "We're then going to take an average of that value..."

Now when I say average I could mean the mean. Or the median. (Or the mode but that's just silly). I prefer the median, because it's more robust to having occasional outlier values for forecasts (normally for a forecast normalised by standard deviation, when we get a really low figure for the normalisation).

The default function I've coded up uses a rolling median (syscore.algos.forecast_scalar_pooled). Feel free to write your own and use that instead:

system.config.forecast_scalar_estimate.func="syscore.algos.yesMumIwroteMyOwnFunction"


Pool data across instruments


In chapter 7 of my book I state that good forecasts should be consistent across multiple instruments, by using normalisation techniques so that a forecast of +20 means the same thing (strong buy) for both S&P500 and Eurodollar.

One implication of this is that the forecast scalar should also be identical for all instruments. And one implication of that is that we can pool data across multiple markets to measure the right scalar. This is what the code defaults to doing.

Of course to do that we should be confident that the forecast scalar ought to be the same for all markets. This should be true for a fast trend following rule like this:

## don't pool
system.config.forecast_scalar_estimate['pool_instruments']=False

results=[]
for instrument_code in system.get_instrument_list():
    results.append(round(float(system.forecastScaleCap.get_forecast_scalar(instrument_code, "ewmac2_8").tail(1).values),2))
print(results)


[13.13, 13.13, 13.29, 12.76, 12.23, 13.31]

Close enough to pool, I would say. For something like carry you might get a slightly different result even when the rule is properly scaled; it's a slower signal, so instruments with a short history will give less reliable estimates (plus some instruments just persistently have more carry than others - that's why the rule works).

results=[]
for instrument_code in system.get_instrument_list():

   results.append(round(float(system.forecastScaleCap.get_forecast_scalar(instrument_code, "carry").tail(1).values),2))
print(results)


[10.3, 58.52, 11.26, 23.91, 21.79, 18.81]

The odd ones out are V2X (with a very low scalar) and Eurostoxx (very high) - both have only a year and a half of data - not really enough to be sure of the scalar value.

One more important thing: the default function takes a cross sectional median of absolute values first, and then takes a time series average of that. The reason I do it that way round, rather than time series first, is that otherwise when new instruments move into the average they'll make the scalar estimate jump horribly.
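
Putting the last few sections together, here's a simplified sketch of what the default estimator does (the real code in syscore.algos differs in detail); the window, min_periods and backfill arguments mirror the config options discussed in this post:

import pandas as pd

def estimate_forecast_scalar(raw_forecasts: pd.DataFrame,
                             target_abs_forecast: float = 10.0,
                             window: int = 250000,
                             min_periods: int = 500,
                             backfill: bool = True) -> pd.Series:
    ## Cross sectional median of absolute forecasts first, so that new
    ## instruments joining the panel don't make the estimate jump
    cs_median = raw_forecasts.abs().median(axis=1)

    ## Time series average over an (effectively) expanding window; min_periods
    ## is the two year (500 business day) minimum discussed later in this post
    avg_abs = cs_median.rolling(window=window, min_periods=min_periods).mean()

    scalar = target_abs_forecast / avg_abs
    if backfill:
        ## Backfill the first valid value rather than burning the early data
        scalar = scalar.bfill()
    return scalar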

Finally if you're some kind of weirdo (who has stupidly designed an instrument specific trading rule), then this is how you'd estimate everything individually:

## don't pool
system.config.forecast_scalar_estimate.pool_instruments=False

## need a different function
system.config.forecast_scalar_estimate.func="syscore.algos.forecast_scalar"


Use an expanding window


As well as being consistent across instruments, good forecasts should be consistent over time. Sure it's likely that forecasts can remain low for several months, or even a couple of years if they're slower trading rules, but a forecast scalar shouldn't average +10 in the 1980's, +20 in the 1990's, and +5 in the 2000's.

For this reason I don't advocate using a moving window to average out my estimate of average forecast values; better to use all the data we have with an expanding window.

For example, here's my estimate of the scalar for a slow trend following rule, using a moving window of one year. Notice the natural peaks and troughs as we get periods with strong trends (like 2008, 2011 and 2015), and periods without them.
There's certainly no evidence that we should be using a varying scalar for different decades (though things are a little crazier in the early part of the data, perhaps because we have fewer markets contributing). The shorter estimate just adds noise, and will be a source of additional trading in our system.

If you insist on using a moving window, here's how:

## By default we use an expanding window by making this parameter *large* eg 1000 years of daily data
## Here's how I'd use a four year window (4 years * 250 business days)
system.config.forecast_scalar_estimate.window=1000



Goldilocks amount of minimum data - not too much, not too little


The average value of a forecast will vary over the "cycle" of the forecast. This also means that estimating the average absolute value over a short period may well give you the wrong answer.

For example suppose you're using a very slow trend following signal looking for 6 month trends, and you use just a month of data to find the initial estimate of your scalar. You might be in a period of a really strong trend, and get an unrealistically high value for the average absolute forecast, and thus a scalar that is biased downwards.

Check out this, the raw forecast for a very slow trend system on Eurodollars:
Notice how for the first year or so there is a very weak signal. If I'm now crazy enough to use a 50 day minimum, and fit without pooling, we get the following estimate for the forecast scalar:

Not nice. The weak signal has translated into a massive overestimate of the scalar.

On the other hand, using a very long minimum window means we'll either have to burn a lot of data, or effectively be fitting in sample for much of the time (depending on whether we backfill - see next section).

The default is two years, which feels about right to me, but you can easily change it, eg to one year:

## use a year (250 trading days)
system.config.forecast_scalar_estimate.min_periods=250



Cheat, a bit


So if we're using 2 years of minimum data, then what do we do if we have less than 2 years? It isn't so bad if we're pooling, since we can use another instrument's data before we get our own, but what if this is the first instrument we're trading? Do we really want to burn our first two years of precious data?

I think it's okay to cheat here, and backfill the first valid value of a forecast scalar. We're not optimising for maximum performance here, so this doesn't feel like a forward looking backtest.

Just be aware that if you're using a really long minimum window then you're effectively fitting in sample during the period that is backfilled.

Naturally if you disagree, you can always change the parameter:

system.config.forecast_scalar_estimate.backfill=False


Conclusion


Perhaps the most interesting thing about this post is how careful and thoughtful we have to be about something as mundane as estimating a forecast scalar.

And if you think this is bad, some of the issues I've discussed here were the subject of debates lasting for years when I worked for AHL!

But being a successful systematic trader is mostly about getting a series of mundane things right. It's not usually about discovering some secret trading rule that nobody else has thought of. If you can do that, great, but you'll also need to ensure all the boring stuff is right as well.