VECM model for high-frequency trading

Niccolò Ghionzoli on 31 Jan 2022
Answered: Vinayak on 9 Feb 2024
Good morning, I am a student and I have to analyse a dataset based on LOBSTER data, Microsoft level 5, 21st June 2012. I hope that someone in the community can help me because I don't know how to go on.
https://lobsterdata.com/info/DataSamples.php
I need to estimate a vecm model, but the function that I am using does not work at all.
https://it.mathworks.com/help/econ/modeling-the-united-states-economy.html
Any suggestions to improve my code? Please find attached the code I am developing.
I should also mention something that may be related to the problem.
Last Saturday evening I ran the code again after manipulating the input data to turn data_vector into a sort of time-varying vector. The H1 and H2 Johansen tests then gave similar results (rejection up to rank 7 for H2 and up to rank 8 for H1), and the likelihood ratio test worked. However, the vecm estimation still returned errors, so I changed something else. Now H1 rejects only up to rank 2, as it did before Saturday evening, and the LR test does not work at all.
The problem lies in this part of the code. Do not consider the section related to ADF and KPSS test. Please find below the faulty code.
data_vector = [log_book_1_ask log_book_1_bid lev_1_ask lev_2_ask lev_3_ask lev_1_bid lev_2_bid lev_3_bid BUY_t SELL_t];
P = 15 % number of lags
[h,pValue,stat,cValue,mleH2] = jcitest(data_vector, 'lags', P-1, 'Model', 'H2');
[h,pValue,stat,cValue,mleH1] = jcitest(data_vector, 'lags', P-1, 'Model', 'H1');
%the likelihood ratio test
r = 7; % Cointegrating rank
uLogL = mleH2.r7.rLL; % Loglikelihood of the unrestricted H2 model for r = 7
rLogL = mleH1.r7.rLL; % Loglikelihood of the restricted H1 model for r = 7
[h,pValue,stat,cValue] = lratiotest(uLogL, rLogL, r);
%higher value for uLogL
% %create the VEC model object
time_message = mess(:,1)
my_time = time_message(1:end)
%
[Mdl,se] = estimate(vecm(size(data_vector,2),r,P-1), data_vector, 'Model', 'H2');
toFit = vecm(Mdl.NumSeries, Mdl.Rank, Mdl.P - 1);
toFit.Constant(abs(Mdl.Constant ./ se.Constant) < 2) = 0;
toFit.ShortRun{1}(abs(Mdl.ShortRun{1} ./ se.ShortRun{1}) < 2) = 0;
toFit.Adjustment(abs(Mdl.Adjustment ./ se.Adjustment) < 2) = 0;
Fit = estimate(toFit, data_vector, 'Model', 'H2');
B = [Fit.Cointegration ; Fit.CointegrationConstant' ; Fit.CointegrationTrend'];
figure
plot(my_time, [data_vector ones(size(data_vector,1),1) (-(Fit.P - 1):(size(data_vector,1) - Fit.P))'] * B)
title('Cointegrating Relations')
  1 Comment
Niccolò Ghionzoli on 31 Jan 2022
Warning: Rank deficient, rank = 126, tol = 5.417920e-08.
> In vecm/estimate>johansen (line 1295)
In vecm/estimate (line 643)
Warning: Rank deficient, rank = 126, tol = 5.417920e-08.
> In vecm/estimate>johansen (line 1296)
In vecm/estimate (line 643)
Warning: Rank deficient, rank = 126, tol = 5.417920e-08.
> In vecm/estimate>johansen (line 1342)
In vecm/estimate (line 643)
Out of memory.
Error in varm/estimate (line 417)
D{t-P} = Z(:,solve).*W(t);
Error in vecm/estimate (line 886)
[MDL,sigmaVARX,logL,residuals,errorCovarianceBlocks] = estimate(VAR, dY, 'X', [Y1(P+1:end,:) X],
'MaxIterations', maxIterations);
These are the errors I got when running the code above.


Answers (1)

Vinayak on 9 Feb 2024
Hi Niccolò,
The code you shared analyzes intraday trading data for Microsoft stock (MSFT) obtained from LOBSTER (Limit Order Book System: The Efficient Reconstructor) data files. The rank deficiency warning suggests that some of the variables are not linearly independent, so you should focus on removing redundant variables and reducing dimensionality:
1. Variable selection: identify redundant series from the correlation matrix and drop one series from each highly correlated pair, keeping the ones that matter for your analysis.
correlation_matrix = corr(data_vector);
% Look only above the diagonal so that each pair is counted once. find returns
% row and column subscripts; the column is the later series of each pair.
[~, col_idx] = find(triu(abs(correlation_matrix) > 0.9, 1));
variables_to_remove = unique(col_idx);
data_vector(:, variables_to_remove) = [];
2. Principal component analysis: if a large number of variables remains, reduce the dimensionality further.
% pca centers the data; explained holds the percentage of variance per component
[~, score, ~, ~, explained] = pca(data_vector);
num_components_to_keep = find(cumsum(explained) >= 95, 1); % Keep 95% of variance
data_vector_pca = score(:, 1:num_components_to_keep); % Centered data projected onto the kept components
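Before removing anything, it can help to check whether the series are exactly linearly dependent rather than merely correlated. The sketch below is only an illustration: `data_demo` is synthetic data, not the LOBSTER series, and its third column is deliberately built as a combination of the first two. The arithmetic at the top uses the numbers from the thread (10 series, 14 lagged differences, rank 126 in the warning). A shortfall of 14 would be consistent with one exact dependency repeated at every lag, but that is an inference from the numbers, not something confirmed here.

```matlab
% Arithmetic from the thread: 10 series and P - 1 = 14 lagged differences
numSeries = 10;
numLags = 14;
numRegressors = numSeries * numLags;   % 140 lagged-difference regressors
rankShortfall = numRegressors - 126;   % 14, using the rank reported in the warning

% Synthetic stand-in for data_vector (hypothetical, not the MSFT data):
% column 3 is an exact linear combination of columns 1 and 2.
rng(0);
X = cumsum(randn(500, 2));
data_demo = [X, X(:,1) - X(:,2), cumsum(randn(500, 1))];

% A VEC model regresses on differences, so check the differenced data
dY = diff(data_demo);
numDependencies = size(dY, 2) - rank(dY);   % number of exact linear dependencies

% Each column of N is a combination of series that is numerically zero,
% so its nonzero entries show which series are tied together.
N = null(dY);
disp(numDependencies)
disp(N)
```

Running the same check on `diff(data_vector)` would show which of the ten series, if any, are tied together, and one series from each dependency could then be dropped.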
The out-of-memory error can occur with large datasets. It may go away once the number of variables is reduced, but you can also clear unused variables and process the data in batches to keep memory available. Note that each batch then yields its own fitted model.
% Define batch size
batch_size = 1000; % Adjust based on your memory constraints
P = 15; % Number of lags
r = 2; % Number of cointegrating relationships (example value)
% Determine the number of batches
num_batches = ceil(size(data_vector, 1) / batch_size);
fits = cell(num_batches, 1); % One fitted model per batch
% Perform VECM estimation in batches
for i = 1:num_batches
    % Select data for the current batch
    start_idx = (i - 1) * batch_size + 1;
    end_idx = min(i * batch_size, size(data_vector, 1));
    data_batch = data_vector(start_idx:end_idx, :);
    % Perform VECM estimation on the current batch
    toFit = vecm(size(data_batch, 2), r, P - 1);
    fits{i} = estimate(toFit, data_batch, 'Model', 'H2');
    % Additional processing or analysis on the current batch if needed
    % Display progress
    fprintf('Processed batch %d/%d\n', i, num_batches);
end
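Separately, on the likelihood ratio test in the question: `lratiotest` takes the unrestricted log-likelihood as its first argument and the restricted one as its second. Among the Johansen forms, H2 (no intercepts or trends) is nested inside H1, so for a given rank the H1 fit is the less restricted of the two. The question passes the H2 value as the unrestricted one, which may be why the test fails. The sketch below uses made-up log-likelihood values, not results from the MSFT data, and `dof` is a placeholder that must be set to the actual number of restrictions separating the two models.

```matlab
% Hypothetical log-likelihoods at the same cointegrating rank
logL_H1 = -100;   % less restricted model (H1)
logL_H2 = -105;   % more restricted model (H2), nested in H1

dof = 2;          % placeholder: number of restrictions, to be set for the real models

% Unrestricted value first, restricted value second
[h, pValue, stat] = lratiotest(logL_H1, logL_H2, dof);

% The statistic is 2*(unrestricted - restricted)
fprintf('stat = %g, p = %.4f, reject = %d\n', stat, pValue, h);
```

With the real models, the two inputs would be `mleH1.r7.rLL` and `mleH2.r7.rLL` from the `jcitest` outputs in the question, in that order.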
For more information on VECM estimation, you may refer to the documentation:
