Neural network with two objective functions

Hi,
I am considering building a neural network with two similar but different objective functions. I have read about genetic optimization with more than one objective function. Is there similar functionality in Matlab for NNs?
In other words, is there a way to train the NN to reach some kind of "pareto" optimal solution for two objective functions?
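One way to make the "pareto" idea concrete, independent of any toolbox, is to train several candidate networks, score each on both objectives, and keep only the non-dominated ones. The sketch below is only an illustration under assumptions: the function name `pareto_front` and the score pairs are hypothetical, and both objectives are written so that smaller is better (the return objective is negated).

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is (objective_1, objective_2); both are minimized.
    A point is dominated if another point is no worse on both
    objectives and strictly better on at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (forecast_error, negative_top_group_return) scores
# for five trained candidate networks.
candidates = [(0.10, -0.08), (0.12, -0.11), (0.09, -0.05),
              (0.15, -0.10), (0.11, -0.11)]
front = pareto_front(candidates)
print(front)
```

This only filters finished candidates; it does not train a network toward the front, which would need a multi-objective optimizer.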
In case you are curious, the idea is that one function is the error between the forecast return (the NN's output) of stocks and the actual return, which I would like to minimize. The other function is the return (or its inverse) of the top-ranked stocks, say the top 15-25%, based on the NN's output. I need to optimize both functions because (1) what I really care about is the best stocks coming out on top and (2) I want to have a forecast return metric so I can combine this with other analysis I am doing. Obviously a more accurate return forecast will beget a more accurate stock ranking, but by using ranking the optimization will focus more on the accuracy of the best stocks... I think.
Thanks in advance for any help.
Best, Mike
  2 Comments
Greg Heath on 5 Aug 2014
It is not clear how your inputs and targets are being defined.
It is also not clear what the corresponding dimensions will be.
Michael on 6 Aug 2014
Greg,
First of all thanks for the response. Second, sorry not to be clearer. Let me give it a try.
Let’s say I have a 1000 stock universe. My goal is to always be long the best 50 stocks...or more specifically to determine a strategy that gives me 50 stocks that consistently outperform.
I have two general approaches in mind. The first, and simplest, is to train a NN with various inputs to forecast the return at every point in time for all 1000 stocks. I would use the mean-squared-error performance function with return being the output. Let's say, for argument's sake, that I use monthly returns. Then, arguably, all I have to do is order the stocks by forecast return every day/period and pick the 50 with the top forecast return.
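The ranking step of this first approach can be sketched as follows. This is a minimal illustration, assuming the forecasts for one period are already available as a ticker-to-return mapping; the tickers, the numbers, and the name `pick_top` are all made up.

```python
def pick_top(forecasts, k):
    """Return the k tickers with the highest forecast return, best first."""
    ranked = sorted(forecasts, key=forecasts.get, reverse=True)
    return ranked[:k]


# Hypothetical one-period forecast returns for a four-stock universe.
forecasts = {"AAA": 0.021, "BBB": -0.004, "CCC": 0.035, "DDD": 0.012}
top2 = pick_top(forecasts, 2)
print(top2)
```

With a 1000-stock universe the same call with k=50 would give the long list for that period.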
Here are the issues I see with this method. First, my NN may be much more accurate at forecasting some stocks than others. My idea is to end up with 50 stocks that don't just have a high forecast return, but where the NN has a reasonably low error for the forecast. So there is a tradeoff: I need both high return and low error.
Second, forecasting return per the above, the system is going to be just as worried about "bad" stocks as "good" stocks. The way this system is going to work is that a big input is stock quality. Quality won't change much over time. So I would, to a certain degree, rather have the system focus on accuracy for the high quality stocks and not the low quality stocks.
Here is an example. Suppose there are 5 horrible stocks that I never want to own. If the error on the forecast return is higher on these stocks so that it can be LOWER on the top 5 highest quality stocks, this is better for my system since I will more accurately pick the very best stocks.
My first response to the issues above was to optimize around return. In other words, I would actually calculate the average annualized return of the strategy (best 50 stocks equal weighted) over the training period of the data and this would be a custom performance function. It seemed like the perfect solution to my problem, as what should happen is the system should end up worrying about accuracy much more on the higher quality stocks than on the low ones (i.e., where I need it).
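As a rough sketch of what such a return-based performance measure might compute (not a drop-in toolbox performance function), the code below ranks the stocks each period by forecast, holds the top k equal weighted, compounds the realized returns, and annualizes. The name `strategy_annualized_return`, the data, and the default of 12 periods per year are assumptions for illustration.

```python
def strategy_annualized_return(forecasts, realized, k, periods_per_year=12):
    """Annualized return of holding the top-k forecast stocks, equal weighted.

    forecasts[t][i] and realized[t][i] are the forecast and realized
    return of stock i in period t.
    """
    growth = 1.0
    for f, r in zip(forecasts, realized):
        # Indices of the k stocks with the highest forecast this period.
        top = sorted(range(len(f)), key=lambda i: f[i], reverse=True)[:k]
        # Equal-weighted realized return of those k stocks.
        growth *= 1.0 + sum(r[i] for i in top) / k
    return growth ** (periods_per_year / len(forecasts)) - 1.0


# Hypothetical data: two periods, three stocks.
forecasts = [[0.03, 0.01, -0.02], [0.00, 0.02, 0.01]]
realized = [[0.02, 0.00, -0.01], [0.01, 0.03, -0.01]]
result = strategy_annualized_return(forecasts, realized, k=2, periods_per_year=2)
print(result)
```

One caveat worth keeping in mind: because of the sort, this quantity is piecewise constant in the forecasts, so gradient-based training cannot use it directly; it would fit a derivative-free search (such as a genetic algorithm) or serve as a model-selection score.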
But there is a catch with this solution: the “returns” that come out as output may not actually be good return estimates. They may end up optimally ranking the stocks, but the return may be “off.” This is a problem because my real idea here is to use more like 30-40 stocks per group, derive a return metric with a NN for each group, and then combine all the data in the ranking process (which may itself need a NN, as expected error should be an input, not just expected return).
Here is where I am on this issue. Why not use a performance function that is MSE divided by return? I think I would really need something like MSE/(2+return) because return can go negative. But the general idea is that this is an objective function that will seek to BOTH minimize the error of the forecast and maximize return... so it is a little bit of both approaches.
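The MSE/(2+return) idea can be written out as below. This is only a sketch of the formula as stated: the function names and the sample numbers are hypothetical, and the offset of 2 is kept as a parameter because it only keeps the denominator positive while return stays above -2 (that is, above -200%).

```python
def mse(forecast, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(forecast)


def combined_performance(forecast, actual, portfolio_return, offset=2.0):
    """MSE divided by (offset + portfolio return); smaller is better."""
    denominator = offset + portfolio_return
    if denominator <= 0:
        raise ValueError("offset + portfolio_return must be positive")
    return mse(forecast, actual) / denominator


# Hypothetical forecast and actual returns, and a portfolio return of 50%.
score = combined_performance([0.02, 0.00], [0.01, 0.02], portfolio_return=0.5)
print(score)
```

Note that the ratio fixes one particular tradeoff between the two goals: halving the MSE scores the same as doubling (2+return).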
I guess what my question really should have been is: what if I am worried about two performance functions and not just one?
Sorry if this isn’t making sense. I’m trying to think as far ahead as possible on these issues before I actually get to coding them.
THANKS so much for your help!
Best, Mike


Accepted Answer

Michael on 7 Aug 2014
Greg, thanks for the comment. I'm still focused on this idea that what needs to be optimized is the portfolio return and not the accuracy of any particular stock's return prediction. I think this because my feeling is that if I only really want to own 20% of the market over the duration, and my strategy is to own 25% of this 20% (5% of the market), then why worry a lot about the accuracy of the return prediction on the 80% I never own? The other 80% only needs to be accurate enough so that it sorts to the bottom of the list and not the top. But I totally understand it might not work.
Because the portfolio return is required to be high, the system should naturally glom on to what is BOTH predictable and high return, not necessarily just what is high return. What can't be predicted will end up sorting lower, because its lack of predictability will end up making it a less attractive portfolio position than a stock with lower expected return but where the model has been very accurate (at least, that is what I am hoping for).
I am really more of a fundamental investor than a quant guy (though I studied EE), so this is in line with the way I invest as a portfolio manager. I look mostly for the ~20% of stocks that have characteristics that would allow them to be owned, and then work to make accurate predictions on these, with the prediction accuracy increasing as a function of (1) the probability of it going into the portfolio at some time and (2) if it is in the portfolio, the size of the position.

More Answers (2)

Greg Heath on 7 Aug 2014
The only way I have ever designed a successful stock market predictor was to use fractional increases in price (or return?) as the target/output. The function sort will then rank them.
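For reference, a fractional increase in price between consecutive periods is (p[t+1] - p[t]) / p[t]. A minimal sketch, with a made-up price series and a hypothetical function name:

```python
def fractional_increases(prices):
    """Fractional price change between each pair of consecutive prices."""
    return [(later - earlier) / earlier
            for earlier, later in zip(prices, prices[1:])]


# Hypothetical closing prices for one stock over three periods.
targets = fractional_increases([100.0, 110.0, 99.0])
print(targets)
```

These values would serve as the network targets; sorting the corresponding outputs across stocks then gives the ranking.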
Hope this helps.
Thank you for formally accepting my answer
Greg
  1 Comment
Michael on 5 Jun 2015
Hi Greg, I hope all is well. Thanks again for your prior responses.
I am still working on my system but have evolved my understanding of machine learning. I have currently centered on gradient boosting as the likely best machine learning solution for the problem. I thought for a while that simple bagging might be better, but I have seen a lot of examples where boosting with a learning rate less than 1 and random selection of data and features has superior results. For neural nets, I concluded that most likely their advantages are in data that is convolutional/structured which I don't think my data really is. Deep neural networks sounded interesting for a while to me, but in stock market data you don’t have data translation like in an image.
However, I have 2 fundamental "problems" with my data. Both come from it being stock market data and therefore not IID.
First, because of the averaging windows some indicators use, some data points are highly correlated. For example, the 2Y trailing return of a stock is not very different from what it was when measured a month ago. My understanding is that this requires a sampling (for ensembles) where I choose data points that are "far away" in time. From what I can tell so far, Matlab does not have functionality to pick a random subspace with this criterion. When I was thinking of using simple bagging, I figured I would just build the trees myself from custom subspaces and aggregate them into an ensemble, but this won’t work if I want to do gradient boosting. Now, on this point I am not totally sure that it is so critical to have samples “far away.” My intuition is that it is better if they are, but even if they are not, perhaps right-sizing the percent of data sampled and having enough trees gives the same result. I would love any insight on that issue.
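To make the "far away in time" sampling concrete, here is one possible sketch: visit the time indices in random order and accept an index only if it is at least `min_gap` periods from every index already accepted. The name `sample_spaced` and the parameters are hypothetical, and this says nothing about whether the spacing actually improves the ensemble, which is the open question above.

```python
import random


def sample_spaced(n_points, n_samples, min_gap, seed=None):
    """Randomly pick up to n_samples time indices from range(n_points)
    such that any two picked indices are at least min_gap apart."""
    rng = random.Random(seed)
    order = list(range(n_points))
    rng.shuffle(order)
    chosen = []
    for idx in order:
        # Accept only if far enough from everything accepted so far.
        if all(abs(idx - c) >= min_gap for c in chosen):
            chosen.append(idx)
            if len(chosen) == n_samples:
                break
    return sorted(chosen)


# Hypothetical: 120 monthly observations, 5 samples, at least 12 months apart.
picks = sample_spaced(120, 5, min_gap=12, seed=0)
print(picks)
```

The greedy pass can return fewer than `n_samples` indices when the series is too short for the requested gap.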
The second fundamental problem is that data from a given stock is correlated/related to itself. I realized after thinking about it that this is of critical importance. Consider: it would likely be better, if there is enough data, to make a prediction for stock A from training data only or mostly from stock A than to use the entire market. Thus, I had been thinking of a “system” where I train on stock-specific data, stock-group data (where I use a special algorithm to group stocks), and the entire market, and then use a calculation (I can elaborate if interested) that determines which of these models is more likely to give the better result. If the input looks very different from the stock-specific training data, for example, then it will use the group or entire market. I am pretty convinced that some form of taking into account which stock the system is looking at is important to optimizing performance.
Now, on the second issue the question is what is the best way to organize this. Thinking naively, it would be great to simply feed categories to the predictor that indicate what stock it is looking at. However, my belief here from what I know about these algorithms is that this will have poor results on new data, because this predictor will assume that it has seen the full universe of potential outcomes for each stock, when many times this isn’t the case. (Say there is a stock with only a one year history with a big rally – the system will think the rally will continue regardless of how different the new data looks). So I feel like I have to do something like in the previous paragraph.
If you have any insights or ideas on these issues or my current approach, I would very much appreciate them. Thanks in advance.
Best, Mike



Greg Heath on 19 Feb 2015
In general you cannot simultaneously minimize two functions. The two most common alternatives are
1. Minimize a linear combination (e.g., Neural Network regularization)
2. Minimize one subject to an upper-bound constraint on the other.
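The two alternatives can be sketched in a few lines. Everything here is illustrative: the function names, the weight, the bound, and the candidate scores are assumptions, and each candidate is a pair (objective to minimize, objective to keep bounded).

```python
def weighted_objective(f1, f2, weight):
    """Alternative 1: a linear combination of the two objectives."""
    return f1 + weight * f2


def best_under_constraint(candidates, upper_bound):
    """Alternative 2: minimize the first objective among candidates whose
    second objective does not exceed upper_bound. Returns None if no
    candidate satisfies the constraint."""
    feasible = [c for c in candidates if c[1] <= upper_bound]
    return min(feasible, key=lambda c: c[0]) if feasible else None


# Hypothetical (objective_1, objective_2) scores for three candidate models.
candidates = [(0.30, 0.05), (0.20, 0.12), (0.25, 0.08)]
best = best_under_constraint(candidates, upper_bound=0.10)
print(best, weighted_objective(best[0], best[1], weight=0.5))
```

In the first alternative the weight sets the tradeoff; in the second the bound does.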
Personally I prefer to minimize the number of hidden nodes subject to the constraint that the mean-squared-error is less than one-hundredth of the mean target variance.
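That rule can be sketched as a search over candidate sizes, assuming the MSE for each number of hidden nodes has already been measured. The name `smallest_adequate_size` and the numbers are hypothetical, and using the population variance (`statistics.pvariance`) for the target variance is an assumption.

```python
from statistics import pvariance


def smallest_adequate_size(mse_by_hidden_nodes, targets, fraction=0.01):
    """Smallest number of hidden nodes whose MSE is below
    fraction * variance(targets), or None if no size qualifies."""
    threshold = fraction * pvariance(targets)
    for hidden_nodes in sorted(mse_by_hidden_nodes):
        if mse_by_hidden_nodes[hidden_nodes] < threshold:
            return hidden_nodes
    return None


# Hypothetical targets and measured MSE per number of hidden nodes.
targets = [0.0, 0.02, 0.04, 0.06]
mse_by_hidden_nodes = {1: 4e-5, 2: 9e-6, 3: 4e-6, 4: 3e-6}
size = smallest_adequate_size(mse_by_hidden_nodes, targets)
print(size)
```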
Hope this helps.
Greg
