How to use more than one training set for training a NARX neural network?

I have a question concerning the training of NARX neural networks using the Neural Network Toolbox. The task is to obtain a neural black box model of a time-dependent system and, as such, to predict time series.
Now I want to train my NARX network not with just one training set, i.e. one time series, but with several. In my case this matters because the network must capture the real system's behavior at the beginning of a time series. If I use only one training series, the network has only one opportunity to learn that behavior.
Again, to clarify: Can I somehow join several time series into one and use it for training, or can I do the training with several time series?
If an example might help, consider this: Say I want to train the NARX network such that it approximates a time series y(t)=t^2 + u(t). It doesn't make much sense to create only one training data set with a single input over time, say u(t)=t. Since the system itself is time-dependent, how can the network ever learn what would happen, e.g. at the beginning, if I then used u(t)=sqrt(t) and predicted the corresponding time series? Hence, I would like to create several training data sets with, say, u(t)=t; u(t)=sqrt(t); u(t)=t^2, so that the network might interpolate between them. The final question to be answered is: How can I train the network using all three training data sets at once?
I hope I have described my issue in sufficient detail.
Thanks a lot in advance! Robert

Accepted Answer

Greg Heath on 5 Jul 2014
1. The system is not time-dependent because none of the weights depend on time.
2. Polynomial functions, sinusoids and their products satisfy time-invariant difference equations. Therefore they can be represented by time-invariant nets.
3. Consider
y(t) = (a*t^2 + b*t + c)*( d*cos(t) + e*sin(t))
It is not difficult to show that y(t) can be determined knowing y(t-dt) and y(t-2*dt).
4. My approach
a. Subtract a polynomial fit.
b. Standardize to zero-mean/unit-variance and delete or modify outliers.
c. Find the statistically significant lags of the target auto-correlation and target-input cross-correlation functions.
d. Use the lags to fit Ntrials candidate NARX models for each number of hidden nodes in the range Hmin:dH:Hmax.
e. Choose the designs with the best validation set performance.
f. Obtain unbiased estimates of performance and confidence levels on unseen data from the test set performances.
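Steps b and c can be illustrated with a minimal sketch in plain Python. It is not toolbox code and not necessarily the exact procedure meant above. The threshold 1.96/sqrt(N) is an assumption: it is the commonly used approximate 95% bound for the sample autocorrelation of white noise. The names `standardize`, `autocorr` and `significant_lags` and the toy series are made up for illustration.

```python
import math
import random

def standardize(x):
    """Return x shifted and scaled to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]

def autocorr(z, max_lag):
    """Biased sample autocorrelation of a standardized series, lags 0..max_lag."""
    n = len(z)
    return [sum(z[t] * z[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]

def significant_lags(x, max_lag):
    """Lags >= 1 whose |autocorrelation| exceeds the assumed 95% white-noise bound."""
    z = standardize(x)
    bound = 1.96 / math.sqrt(len(z))  # assumption: white-noise approximation
    r = autocorr(z, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]

# Toy target: a noisy sinusoid with a period of 8 samples (illustrative only).
random.seed(0)
series = [math.sin(2 * math.pi * t / 8) + 0.1 * random.gauss(0, 1)
          for t in range(400)]
lags = significant_lags(series, 10)
```

For this toy series the strongly correlated lags include 4 (half a period, negative correlation) and 8 (a full period). A subset of the lags found this way would then serve as candidate feedback delays in step d.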
Hope this helps.
Thank you for formally accepting my answer
Greg
3 Comments
Robert on 5 Jul 2014
Dear Greg,
thank you for your answer and the clarification on why that system is not time-dependent.
Unfortunately, I don't see a connection between the rest of your answer and my actual question. The approach you provided seems to consist of training Ntrials different NARX models for each number of hidden units and then selecting the best one. While this approach is generally quite good, I don't see how it helps me with my issue. If I misunderstood you, I would be happy if you could clarify how your approach lets me train a NARX model with more than one training set.
Let me illustrate my problem with the example from the initial post: In the attached picture, you see three training data sets (u,y) that I might use for training. I would like the model to be accurate for all three sets (and maybe more). Hence, it does not make sense to train the model with just one of them and then hope for the best for the other u(t). How can I use all three training sets at once during training?
One naive approach would be to simply concatenate the three series (some care has to be taken at the joins between the series because of the NARX delays).
Is that a viable approach or can you provide a better one?
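One way to picture the care needed at the joins is to build the delayed training samples separately for each series and only then pool them, so that no delay window ever spans two series. The following is a minimal sketch in plain Python of that idea, not toolbox code. The function names `lagged_samples` and `pool_series`, the delays [1, 2] and the sampling step 0.1 are assumptions chosen for illustration.

```python
def lagged_samples(u, y, input_delays, feedback_delays):
    """Build open-loop NARX-style training pairs from ONE series.

    Each sample uses past inputs u[t-d] and past targets y[t-d] to predict y[t].
    Only time steps with a full delay history inside this series are used.
    """
    start = max(max(input_delays), max(feedback_delays))
    features, targets = [], []
    for t in range(start, len(y)):
        row = [u[t - d] for d in input_delays] + [y[t - d] for d in feedback_delays]
        features.append(row)
        targets.append(y[t])
    return features, targets

def pool_series(series_list, input_delays, feedback_delays):
    """Stack the samples of several (u, y) series without crossing their boundaries."""
    X, T = [], []
    for u, y in series_list:
        f, t = lagged_samples(u, y, input_delays, feedback_delays)
        X.extend(f)
        T.extend(t)
    return X, T

# The three experiments from the question: y(t) = t^2 + u(t).
ts = [0.1 * k for k in range(50)]
inputs = [[t for t in ts], [t ** 0.5 for t in ts], [t ** 2 for t in ts]]
experiments = [(u, [t ** 2 + ui for t, ui in zip(ts, u)]) for u in inputs]

X, T = pool_series(experiments, input_delays=[1, 2], feedback_delays=[1, 2])
```

Each 50-step series loses its first two steps to the delay history, so the pooled set holds 3 x 48 = 144 samples with four features each. Any static regressor could be fitted to `X` and `T`; the start of every series is represented once per experiment.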
Dominic on 30 Jul 2014
Dear Greg,
Your approach seems pretty straightforward, although I have to admit I have no idea how to do step c. I am working with a 6-dimensional input and an array of 200 matrices, each with a couple of thousand time steps.
When I use the crosscorr function as you mentioned in one of your other posts, I need to set the number of lags to 500 in order to see some kind of peak. This seems to be quite a high number.
So far I have used the basic configuration with 10 nodes and 2 delays. It seemed to work fine on the data I used for training, but the performance on data not used for training was really bad.
Just let me check whether I understood the rest of your approach.
a. Subtract a polynomial fit.
I guess the dimension has no connection with the network itself, does it?
b. Standardize to zero-mean/unit-variance and delete or modify outliers.
Standardize the output from a).
c. Find the statistically significant lags of the target auto-correlation and target-input cross-correlation functions.
Use the data from b with crosscorr, but what are my significant lags? And how do I get the threshold?
d. Use the lags to fit Ntrials candidate NARX models for each number of hidden nodes in the range Hmin:dH:Hmax.
I guess you are speaking of the number of lags, aren't you? And of using them as the delay configuration of the network.
e. Choose the designs with the best validation set performance.
f. Obtain unbiased estimates of performance and confidence levels on unseen data from the test set performances.
Thank you for your answers.
Kind regards
Dominic
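The question about significant lags and the threshold for the target-input cross-correlation can be illustrated with a small sketch in plain Python. It does not use the crosscorr function. The bound 1.96/sqrt(N) is an assumption (the usual approximate 95% level for uncorrelated series), and the toy system with a 3-step delay is invented purely for the demonstration.

```python
import math
import random

def zscore(x):
    """Standardize a series to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]

def crosscorr_lags(u, y, max_lag):
    """Sample cross-correlation between u[t-k] and y[t] for k = 0..max_lag."""
    n = len(y)
    return [sum(u[t - k] * y[t] for t in range(k, n)) / n
            for k in range(max_lag + 1)]

# Hypothetical system: the output follows the input with a delay of 3 steps.
random.seed(1)
n = 2000
u = [random.gauss(0, 1) for _ in range(n)]
y = [(0.8 * u[t - 3] if t >= 3 else 0.0) + 0.2 * random.gauss(0, 1)
     for t in range(n)]

max_lag = 10
r = crosscorr_lags(zscore(u), zscore(y), max_lag)
bound = 1.96 / math.sqrt(n)  # assumption: approximate 95% significance level
significant = [k for k in range(max_lag + 1) if abs(r[k]) > bound]
strongest = max(range(max_lag + 1), key=lambda k: abs(r[k]))
```

Here the strongest lag is 3, matching the delay built into the toy data. With real data, a handful of lags may exceed the bound by chance, so it is the consistently large ones that are worth trying as input delays.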


More Answers (1)

Greg Heath on 6 Jul 2014
You are not going to be able to do it that way.
1. The net has to recognize which waveform is the input.
a. How many waveforms do you have?
b. How long are they?
2. The weights, not to mention the significant delays, for each input will be different.
I have designed two stage classifiers where the first stage determined which of several second stage classifiers in parallel will complete the classification.
In your case, you could design a first stage classifier to determine which waveform is being input. Its output will then be directed to a second stage containing several parallel nets where each is designed for one type of input.
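The two-stage idea can be sketched in plain Python as follows. Everything here is a stand-in: the first stage is a simple nearest-template comparison rather than a trained classifier, and the second-stage "experts" are placeholder functions rather than trained nets. The names `classify_waveform`, `predict`, `templates` and `experts` are hypothetical.

```python
import math

def classify_waveform(u, templates):
    """Stage 1: return the name of the template closest to u (least squared error)."""
    def sq_err(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(templates, key=lambda name: sq_err(u, templates[name]))

def predict(u, templates, experts):
    """Stage 2: route the input to the model designed for the detected waveform."""
    name = classify_waveform(u, templates)
    return name, experts[name](u)

ts = [0.1 * k for k in range(50)]
templates = {
    "linear": [t for t in ts],
    "sqrt": [math.sqrt(t) for t in ts],
    "quadratic": [t ** 2 for t in ts],
}
# Placeholders for separately trained networks; each simply applies y = t^2 + u.
experts = {name: (lambda u: [t ** 2 + ui for t, ui in zip(ts, u)])
           for name in templates}

# A slightly shifted square-root input should be routed to the "sqrt" expert.
shifted = [math.sqrt(t) + 0.01 for t in ts]
name, y_hat = predict(shifted, templates, experts)
```

This layout only works when the set of waveform types is known in advance, since every possible input has to be assigned to one of the prepared second-stage models.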
Hope this helps.
Thank you for formally accepting my answer
Greg
1 Comment
Robert on 6 Jul 2014
If by waveform you mean the possible "shapes" of the input (i.e. constant, linear, quadratic?), I cannot say how many there will be. It depends on how many different training sets I want to create. This, in turn, depends on how well the NARX network does with relatively few waveforms, say about 3. Each one might have approx. 500 elements.
I am not sure how a two-stage classifier would work here. In fact, I do not have certain types of waveforms that I know a priori. An input to be evaluated may be arbitrary and not clearly of one particular waveform type.

