is Matlab code a lot faster than Excel formulas?
I have a very large Excel spreadsheet that I use to do an extensive calculation with many iterations. The calculation is done by spreadsheet formulas, and the iteration is done by a VBA macro.
The spreadsheet is 1000 columns X 10,000 rows, and one iteration takes about 30 seconds. A typical calculation "run" has 30,000 iterations, so total calculation time is ~10 days.
I'd like to speed this up by coding the spreadsheet in Matlab. I have read that it is up to 100X faster than Excel, although a 10X speedup would be enough to make it worth it to me.
Given the above, is a 10 X speed increase a reasonable expectation when converting Excel formulas to Matlab code?
I'm a new Matlab user, and very grateful for your input. :-)
6 Comments
dpb
on 10 Jan 2023
It will depend greatly on how the code is written, but in general one would expect significantly faster execution from computational code. If the model can be extracted from the spreadsheet, using just the data as input and writing its output when computed, I'd venture you will quite probably see a significant improvement.
This will, of course, require that you not be relying on Excel's built-in solver or other similar toolsets.
John D'Errico
on 10 Jan 2023
Any code can be fast or slow. It all depends on how well the code was written, and on how well the programmer understands the numerical methods involved. And sometimes there are better ways to do any computation.
Can we know what you did in the spreadsheet, and why it is so slow? OF COURSE NOT! Sadly, the odds are good that if your knowledge of MATLAB is poor and your skills at numerical computing are poor, then the MATLAB code you would write might just mimic exactly what you did in Excel. And then, it will still probably be slow. Sorry. But this is only a guess. The iterative solver used by you in Excel may just be a poor way of solving the problem. There may be better tools in MATLAB to solve the problem, or possibly not.
For example, I can give the example of a coupled system of nonlinear PDEs I solved long ago in Excel. It took a seriously long time to converge (at least hours, I recall.) But that was me, lacking any real expertise in solving those problems. That is not to say I am an expert in the subject now, but I can be positive that my solution now would be far better, done in MATLAB. Of this I am quite confident.
Anyway, what you don't want to do is to just code the spreadsheet in MATLAB. You won't be happy, and then you will give up, thinking MATLAB is no better than Excel. Solve the problem, using the tools in MATLAB. Don't just execute code that mimics a spreadsheet.
And of course, we don't even know what problem you are trying to solve. So all of this is just wild speculation.
Jim Riggs
on 10 Jan 2023
If you have code that takes ~10 days to run in Excel and you need to repeat this or update the result many times, then you really should be using compiled code. I have performed benchmarks in the past comparing Fortran with Matlab, and the identical implementation runs 500 to 1000 times faster in compiled Fortran vs. Matlab. (And Python is another factor of 6 slower than Matlab.)
Matlab has a compiler which creates standalone executable code - this would be one solution to getting that kind of dramatic improvement in speed. Or you might consider coding it in C/C++.
A 500x improvement would get that 10 days down to under 1/2 hour. But if Excel macros are more like Python, you might get 3000x to 6000x improvement using compiled code. That would get you to 2.5 to 5 minutes. It's definitely worth exploring.
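For reference, the compiler workflow Jim mentions is short; a minimal sketch, assuming a hypothetical entry-point function run_model.m (not a file from this thread) and an installed MATLAB Compiler license:

```matlab
% Build a standalone executable from a MATLAB entry point.
% NOTE: 'run_model.m' is an invented example name. Also note that mcc
% packages code to run against the MATLAB Runtime -- it deploys the code
% but does not by itself make MATLAB code run faster; the large speedups
% Jim cites come from rewriting in compiled Fortran/C, not from mcc.
mcc -m run_model.m
```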
John D'Errico
on 10 Jan 2023
While Jim is correct, in that code can be compiled to improve it, I'd point out that it may be missing an important fact. For example, suppose the excel solver is simply solving a large linear system of equations, using iterative methods. Great, except that is something MATLAB will do VERY well. And the code is already compiled. So you gain nothing by use of compiled code there. What really matters is the algorithmic improvement, of using a better method to solve the problem.
The issue is, we know nothing about what is being solved. Is this a problem that is amenable to the use of a better tool to solve an easily solved problem? Until then, there is no way to offer good advice.
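As a concrete illustration of John's point: solving a large linear system in plain MATLAB already runs through compiled LAPACK/BLAS, so there is nothing left for a compiler to gain. A minimal sketch (the size and matrix are made up for illustration):

```matlab
% Direct solve of a large dense linear system with the backslash
% operator, which dispatches to optimized, compiled LAPACK/BLAS code.
n = 2000;                  % illustrative size, not from the thread
A = rand(n) + n*eye(n);    % diagonally dominant, so well-conditioned
b = rand(n,1);
tic; x = A\b; toc          % fast despite n^3 work: it's compiled code
r = norm(A*x - b);         % residual should be near machine precision
```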
Berliner JFK
on 11 Jan 2023
Thank you for your answers. Very helpful and thought-provoking!
The spreadsheet is simple, just a lot of conditional equations and some basic math. It is so slow because of the number of formulas calculated per iteration: 1000 X 10,000 = 10 million formulas.
Based on your responses, it sounds like what I want to do is "code" the spreadsheet so that it does exactly what it's doing now, just a lot faster.
So ... given a relatively uncomplicated but huge Excel spreadsheet, could coding it in Matlab result in a 10 X speed increase?
Accepted Answer
John D'Errico
on 11 Jan 2023
Edited: John D'Errico
on 11 Jan 2023
A problem is that you have said it is a lot of conditional expressions. The thing is, MATLAB will be fast, IF you can use the capabilities of linear algebra to your advantage. It will be fast, if you can use vector and array operations. Then MATLAB can use tools like the BLAS to execute code blazingly fast. So if you were solving a huge linear system of equations, even with many thousands of unknowns, expect MATLAB to go like blazes. For something like that, I would expect to see a speed boost of thousands to 1, especially since you are using iterative methods to solve your problem.
The problem arises with branches, tests, etc. That forces MATLAB to be far less efficient. It prevents the efficient operations that make MATLAB fast. If your code is something in the middle, the time will be something in the middle too.
But what you are telling us is this is just going to be hand coded operations on thousands of cells, with many branches and tests. That will be laborious to code, as well as offering no real gain in speed.
Another place to gain would come from the use of automatic multi-threading. MATLAB can do this on many simple problems, so that all of the cores on your CPU can be used at once. However, that can never happen when you have branches. So if you are just adding each element between a pair of very long vectors together, this can be farmed out efficiently (and automatically) to multiple cores. Conditional operations prevent that from happening, so one core is all you will ever be able to use on highly conditional code.
Seriously, we still don't know enough about what you are doing to know if it will be faster to use MATLAB. I might expect some speed boost, but how much of a boost will depend on how much you can use vector and array operations. It would strongly depend on how well you can write that code using the capabilities of MATLAB. But if you just recode the computation from each spreadsheet cell into one line of MATLAB, you will probably be sorry. This is a mistake many people make when they migrate to MATLAB from a spreadsheet, in trying to treat MATLAB as if it is just a spreadsheet tool.
(I even considered trying to build a large iterative spreadsheet, just to compare the speed of the same code in MATLAB, but this would seem almost impossible to come up with a representative spreadsheet that MIGHT be comparable to what you would be doing. And then of course, I'd need to avoid using any tricks I know as a skilled user of MATLAB, that you might not know. Hmm. I might be able to do something though...)
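To illustrate the contrast the answer draws, conditional spreadsheet logic can often be vectorized with logical indexing rather than a per-cell IF; a sketch with invented data and an invented rule (threshold t, scale factors a and b):

```matlab
% Compare a cell-by-cell loop (a 1:1 spreadsheet translation) against
% a vectorized formulation of the same conditional rule.
X = rand(1000,100);            % stand-in for the spreadsheet data
t = 0.5; a = 2; b = -1;        % hypothetical rule parameters

% Loop version: one branch per "cell", defeating multithreading
Y1 = zeros(size(X));
for k = 1:numel(X)
    if X(k) > t
        Y1(k) = a*X(k);
    else
        Y1(k) = b*X(k);
    end
end

% Vectorized version: the condition becomes a logical mask, and the
% whole grid is computed in a couple of array operations that MATLAB
% can hand to the BLAS and multiple cores.
mask = X > t;
Y2 = mask.*(a*X) + ~mask.*(b*X);

isequal(Y1, Y2)                % identical results, very different speed
```

Not every spreadsheet rule collapses this neatly, but rules that apply the same test to every cell usually do.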
26 Comments
Berliner JFK
on 11 Jan 2023
Edited: Berliner JFK
on 18 Jan 2023
Thank you very much, John. It is an interesting problem!
In fact, what you say is exactly what I had planned: translate the Excel spreadsheet into Matlab code 1:1. And the spreadsheet does in fact consist, 90% of it, of huge arrays of if-then formulas and very little actual math (here is a link to a very small, simplified version of the spreadsheet: EXCEL FILE).
The purpose of the spreadsheet is to input time-based data, and using iteration, identify patterns which can be used to make decisions.
note: the linked sample spreadsheet would probably finish all iterations in less than 5 minutes. The spreadsheet in question (10 days calculation time) is much larger and more complex, but conceptually identical.
Am I on the wrong path trying to convert this particular spreadsheet into Matlab code for a 10 X performance gain?
Thanks!
dpb
on 11 Jan 2023
Edited: dpb
on 11 Jan 2023
Yes, I believe you are on the wrong path in trying to convert the spreadsheet to Matlab directly; as John points out, unless you are able to bring the power of linear algebra to bear or otherwise vectorize the algorithm, mimicking the 10M spreadsheet cells would likely gain little, plus be incredibly painful to code/debug.
OTOH, it is quite likely that a natively-written MATLAB implementation of THE ALGORITHM being solved could/would be significantly faster. This presumes the algorithm is one in which the cells contain the same tests, just being duplicated for a thousands of cells, not that every cell contains some different computation.
So, give us a reference to the algorithm and it's probable somebody can provide hints on implementing it.
Berliner JFK
on 12 Jan 2023
First of all, thank you @dpb, @John D'Errico, @Cameron, @Jim Riggs and @Bruno Luong for the input. Very grateful, very helpful!
I spent most of the last two days reading your posts, studying my spreadsheet and thinking.
Here are my conclusions:
- it's worth it to code the spreadsheet - although this use case may not be typical, there will be others in the future, it's worth the investment
- it will likely increase iteration speed - it seems that the concept of coding a spreadsheet to improve speed is sound (with caveats)
- caveat: the code will need to be well-written - a 1:1 translation of the spreadsheet will probably not produce the desired result
For now, I'll start with the basic tutorials on this website. I'm not a programmer, but have coded before.
If you have any tips for me starting out, I'm open. Very eager to learn, and again grateful for your help!
dpb
on 12 Jan 2023
My only "tip" if you want pointers specific to the problem at hand remains the one given above -- show us the algorithm you're trying to implement; I doubt anybody here is going to go and try to ferret out the problem statement by reading Excel functions in a spreadsheet.
Berliner JFK
on 14 Jan 2023
Edited: Berliner JFK
on 15 Jan 2023
The first attempts importing from Excel were disappointing. The importing was very slow if I let Matlab determine the input range. In that case it would take 30+ minutes. If I specified the exact range it was much faster, but I may not have that luxury. I also noticed that Matlab only used one CPU core.
Regarding only using one core, I know other languages do that as well (C++ for instance) but I also know Excel uses multiple cores, which made me doubt whether Matlab with one could be faster than a spreadsheet with 8 (in my case).
So just my first dose of reality: very long import times of Excel files, and also only using a single core which seems slow v. Excel which natively uses all.
One final thought. As I've considered my situation more, it's increasingly about raw speed. 10X is probably not enough anymore; 100X would be better, and as @Jim Riggs said, that may require coding in Fortran or C++. We'll see.
Just a few thoughts and some updates, once again with my deepest gratitude for all the excellent answers and guidance you've given me!
Walter Roberson
on 14 Jan 2023
Note that xlsx files are actually zip'd directories containing (mostly) XML text files. So if you had a cell that contained (for example) 18.3387 then the uncompressed files will at some point contain the literal text '18.3387' in a context that indicates which cell it is relevant for. Then if the next cell to the right is (say) 17.1161 then the file would contain '17.1161' together with the full overhead indicating which cell it is.
For example part of the internal file for those two values is
<sheetData><row r="1"><c r="A1" s="0"><v>18.338699999999999</v></c><c r="B1" s="0"><v>17.116099999999999</v></c></row></sheetData>
This is not something that can easily be imported by multiple cores. To import it by multiple cores you would need to run a pass that identified the boundaries of each <row and ask different cores to parse the text for different rows and return row and column and value and type information that then got merged together. Possible, perhaps, but it would take a fair bit of work to make it practical. And that parsing cannot start until the entire file is decompressed...
Bruno Luong
on 14 Jan 2023
Import the data from Excel once, then save it in MATLAB .mat format and forget about importing forever.
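Bruno's suggestion in code form (the file and variable names are illustrative, not from the thread):

```matlab
% One-time import from Excel, then cache the result as a .mat file.
DATA = readmatrix('big_sheet.xlsx');   % slow Excel import, done once
save('DATA.mat','DATA');               % binary MATLAB cache

% Every later run loads the cache instead of re-reading Excel;
% .mat loading is typically orders of magnitude faster.
load('DATA.mat','DATA');
```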
Berliner JFK
on 14 Jan 2023
Hello @Walter Roberson, thank you very much for that explanation. I suppose that's the price to be paid for the ability to import directly from Excel. I again expected too much, a serious flaw of mine.
OTOH I'm thinking of coding the spreadsheet in VBA as a precursor to Matlab or C++. VBA has direct access to the data, which simplifies things, and VBA is simple enough that it can be easily ported over later.
One of the things that I do to overcome the fact the VBA only uses one core is to start multiple Excel instances and split the data over those instances.
@Bruno Luong thank you again for your excellent suggestions, really appreciated.
The nature of my work means that datasets are always changing and new ones always need to be imported. Your point is good, though, because many times the data set does not change, or only parts of it, so perhaps it's not as big of a deal as I thought. Good thoughts and again, thank you very much!
dpb
on 15 Jan 2023
Edited: dpb
on 15 Jan 2023
I've not used such large Excel files so haven't ever noticed any real speed issues with MATLAB reading them with readtable or readmatrix but as @Walter Roberson notes, the common .xlsx file format isn't built for efficiency.
Try saving one as .xlsb instead and see what happens -- although it may still be somewhat tedious.
Then again, if it's going to take 7-10 days for Excel to run, then even an hour of input time is a small price to pay if the calculations themselves are swift...
However, I just tried
>> C=rand(100,10000);
>> tic;writematrix(C,'BigExcel.xlsx'),toc
Elapsed time is 10.762080 seconds.
clear C;tic;C=readmatrix('BigExcel.xlsx');toc
Elapsed time is 12.234982 seconds.
so, that size workbook doesn't seem to be all that bad if it contains just data...I didn't load it up with formulas...
However, somewhat to my surprise, the same experiment with .xlsb took longer than I cared to wait so I Ctrl-C'ed out of it...so, that may not be any help, after all, with pure high-level MATLAB.
However, you can also get much more clever with MATLAB and use COM judiciously...
>> cwd=cd;
fn=fullfile(cwd,'BigExcel.xlsx');
tic;
excel=actxGetRunningServer('Excel.Application');
Workbook = excel.Workbooks.Open(fn);
wbk=excel.ActiveWorkbook;
wksht=wbk.ActiveSheet;
usedRange=wksht.UsedRange;
C=wksht.UsedRange.Value;
toc
Elapsed time is 2.148298 seconds.
>>
>> excel.ActiveWorkbook.Close(false)
>> excel.Quit
>> delete(excel)
>>
So, there's a lot of extraneous overhead in the higher-level routines that one doesn't necessarily need...
Of course, also note that there are lots of ways to make VBA code much more efficient -- including the above syntax to load an array instead of what you normally see of loading in a FOR...NEXT loop element-by-element.
Also, cell formulas are notoriously slow to call from VBA -- one would think that would be the fast way to do things and calling one would be highly optimized by the compiler, but experience has shown that is not the case.
Bruno Luong
on 15 Jan 2023
Edited: Bruno Luong
on 15 Jan 2023
Again, it all depends on what kind of processing you'll do. MATLAB lately is very fast for for-loops with pure simple arithmetic and conditional statements, many vectorized functions are multi-threaded, and if you have a good Nvidia graphics card and the Parallel Computing Toolbox you can use GPU calculation and parallel processing to speed things up further. I used to do a lot of mex C programming to speed up MATLAB, but not anymore for the last few years.
dpb
on 15 Jan 2023
Edited: dpb
on 15 Jan 2023
I filled the above "BigExcel.xlsx" file with the formula =RAND() in every cell and read it the same way -- the timing was essentially the same as above returning either .Value or .Formula, with the formula actually being somewhat faster, as it appears Excel re-evaluated the function each time when extracting the values to ensure it got the latest results. One would have to set the recalculation mode from Automatic to Manual to avoid this (you would want to do that anyway in such a scenario, to avoid recomputing the entire sheet when only updating a given cell from VBA).
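That recalculation switch can itself be flipped over COM; a sketch assuming an open Excel COM handle excel, as in the earlier snippets (the numeric values are Excel's xlCalculationManual/xlCalculationAutomatic enumeration constants):

```matlab
% Switch Excel to manual recalculation before bulk reads/writes,
% so extracting .Value does not trigger a full sheet recalc each time.
excel.Calculation = -4135;    % xlCalculationManual

% ... bulk UsedRange.Value reads or cell writes here ...

excel.Calculation = -4105;    % xlCalculationAutomatic: restore default
```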
The venerable xlsread might outperform the newer functions, as it doesn't have quite as much built-in machinery attempting to decipher the file content as the new routines do (user convenience comes at a price in that regard). The issue with all of them is that, unless you do all the work in memory and only read/write the whole spreadsheet (or significant fractions of it) per call, these routines open/close the file -- including creating/destroying the COM engine connection -- on every call, since they are designed as standalone functions that can be called independently, without any knowledge of other uses of the same file. This is fine when reading/writing full files, or even a few cells at a time in small spreadsheets, but it does not scale well at all and will be painfully slow, if it doesn't actually crash, for larger spreadsheets read/written in a tight loop. I ran into this on much smaller files than these (300 x 40 or so), and the writecell routine (which calls writetable to do the actual work) actually crashed.
The solution is that one must revert to COM in some form or the other; the problem with writing to the same sheet has been recognized "since forever" and there are several FEX contributions that work around the problem by creating and keeping open the COM connection first, then doing all the operations on the file before closing it. Other than writing it all in COM directly, the best of these I have found and that I use to solve the above problem is at <xlsWriteEx>.
I've made some additional modifications to the above in exposing the handle to the current Excel COM server to the caller as well; this allows one to intermix additional COM commands besides just the higher-level interface to write data -- for example, to format cells or to directly set the .Formula instead of the .Value object that isn't possible through the original higher-level calling syntax/function.
Berliner JFK
on 15 Jan 2023
Hello @dpb, thanks again for your help. :-)
To start, here are my results (identical to yours):
C=rand(100,10000);
tic;writematrix(C,'BigExcel.xlsx'),toc
Elapsed time is 12.610278 seconds.
clear C;tic;C=readmatrix('BigExcel.xlsx');toc
Elapsed time is 11.839560 seconds.
FWIW, I do use xlsb binary files. I remember being forced to do so because the xlsx files were crashing, or because I used macros, I don't remember.
With your encouragement, I gave it another try today and here are my results:
tic
DASHBOARD=readmatrix('MATLAB.xlsb','Sheet','DASHBOARD','DataRange','D10:K253');
toc
Elapsed time is 28.617214 seconds.
tic
TEMPLATE=readmatrix('MATLAB.xlsb','Sheet','DASHBOARD','DataRange','L10:S253');
toc
Elapsed time is 30.069370 seconds.
tic
SORT=readmatrix('MATLAB.xlsb','Sheet','SORT','DataRange','D2:CS3608');
toc
Elapsed time is 31.568575 seconds.
tic
CLOSE=readtable('MATLAB.xlsb','Sheet','CALC','DataRange','A6:C3612');
toc
Elapsed time is 30.357699 seconds.
tic
DATA=readmatrix('MATLAB.xlsb','Sheet','CALC','DataRange','BYE6:CBT3612');
toc
Elapsed time is 32.258974 seconds.
This particular spreadsheet is not very large, but still that's only 2.5 minutes, which is great progress. I need to try again with a larger file.
The only change I made was to use fixed ranges instead of letting MATLAB "find" the data, i.e.:
before ...'DataRange','BYE6');
after ...'DataRange','BYE6:CBT3612');
The only thing that's interesting is that the matrices all loaded in about 30 seconds although each is very different in size. I guess, like you wrote, that's because the Excel file closes and reopens after each read?
Berliner JFK
on 15 Jan 2023
@Bruno Luong thank you for this information:
"MATLAB lately is very fast for for-loop with pure simple arithmetics and conditional statements...if you have ...parallel computing toolbox you can speed-up MATLAB."
I do have the parallel computing toolbox, so very encouraging!
C++ MEX functions will be a future option.
Thank you again. :-)
dpb
on 15 Jan 2023
"...the matrices all loaded in about 30 seconds although each is very different in size. ... because the Excel file closes and reopens after each read?"
That's some of it, but notice the COM engine took only 2+ sec for the same size file or 10-15X quicker.
I didn't see how much was opening the file and how much was actual data transfer; I'd guess a good chunk is in the setup and opening, yes, but let's see...
>> tic;
excel=actxGetRunningServer('Excel.Application');
Workbook = excel.Workbooks.Open(fn);
wbk=excel.ActiveWorkbook;
wksht=wbk.ActiveSheet;
usedRange=wksht.UsedRange;toc
tic;C=wksht.UsedRange.Value;toc
Elapsed time is 2.380182 seconds.
Elapsed time is 0.813473 seconds.
>> tic;C=wksht.UsedRange.Value;toc
Elapsed time is 0.806535 seconds.
>> excel.ActiveWorkbook.Close(false)
>> excel.Quit
>> delete(excel)
>>
So of the 3 sec the combined took this time, 75% was in the setup and open; you can break that down into the other pieces if interested, but the actual data transfer isn't the bottleneck.
The new functions are not m files so one can't go spelunking to see how they're built, but I'm pretty sure a lot of the overhead is in the exploratory code built into the methods to try to ascertain the data types and so on to make them more user friendly.
Just to see, I took the previous sheet and copied the first page and then reduced the number of columns in the second page by roughly half...
>> tic;
excel=actxGetRunningServer('Excel.Application');
Workbook = excel.Workbooks.Open(fn);
wbk=excel.ActiveWorkbook;
wksht=wbk.ActiveSheet;
usedRange=wksht.UsedRange;
toc
tic;
C=wksht.UsedRange.Value;
toc
excel.Worksheets.Item(2).Activate
usedRange=wbk.ActiveSheet.UsedRange;
tic;C=wksht.UsedRange.Value;toc
Elapsed time is 4.615550 seconds.
Elapsed time is 0.798654 seconds.
Elapsed time is 0.820399 seconds.
>> excel.ActiveWorkbook.Close(false)
>> excel.Quit
>> delete(excel)
>>
In a one-time test, the second, smaller sheet actually took a little longer to read than the first; I'd expect that to not be true over a large number of trials unless there's a dependency in Excel upon which sheet is being read for workbooks with more than one sheet. I have no information on that level of Excel internals...
I'm not sure why the opening time has seemed to jump up from the first few trials I ran...
What this exercise demonstrates, however, is that by using COM you can essentially eliminate the overhead of the subsequent sheets being read -- and note that it is using the internal Excel UsedRange property so that you don't have to manually figure out what ranges to try to read on each sheet. Beware, however, that if you go in and stick an extra cell in somewhere while debugging, that will be reflected in the range returned until that cell/row/column has been deleted. With such large worksheets, the likelihood of much manual editing of that type would seem minimal, however.
However, the previous comments still hold -- even if it takes 5-10 minutes to read, if you can speed up the calculations, that's probably still acceptable.
On .xlsb vs. .xlsx -- if, indeed, you do have macros, then you can't use the .xlsx file format; you will have to use .xlsb or .xlsm -- I forget just what the difference is, but .xlsx files can't store macros.
Berliner JFK
on 16 Jan 2023
Hello @dpb!
Thank you again. :-)
Here is the "readmatrix" method with the largest file size:
tic
DASHBOARD=readmatrix('MATLAB.xlsb','Sheet','DASHBOARD','DataRange','D10:K253');
toc
Elapsed time is 155.723040 seconds.
tic
TEMPLATE=readmatrix('MATLAB.xlsb','Sheet','DASHBOARD','DataRange','L10:S253');
toc
Elapsed time is 217.319510 seconds.
tic
SORT=readmatrix('MATLAB.xlsb','Sheet','SORT','DataRange','D2:CS14429');
toc
Elapsed time is 229.705674 seconds.
tic
CLOSE=readtable('MATLAB.xlsb','Sheet','CALC','DataRange','A6:C14433');
toc
Elapsed time is 233.959225 seconds.
tic
DATA=readmatrix('MATLAB.xlsb','Sheet','CALC','DataRange','BYE6:CBT14433');
toc
Elapsed time is 225.827494 seconds.
About 20 minutes, which like you said, would be acceptable.
Here is your COM method:
PATH = cd;
FILENAME = fullfile(PATH,'MATLAB.xlsb');
tic
EXCEL = actxserver('Excel.Application');
MATLAB = EXCEL.Workbooks.Open(FILENAME);
toc
Elapsed time is 128.306727 seconds.
EXCEL.Worksheets.Item(1).Activate;
MATLAB = EXCEL.ActiveWorkbook;
SHEET1 = MATLAB.ActiveSheet;
tic
DASHBOARD = SHEET1.Range('D10:K253').Value;
toc
Elapsed time is 0.363739 seconds.
tic
TEMPLATE = SHEET1.Range('L10:S253').Value;
toc
Elapsed time is 0.003894 seconds.
EXCEL.Worksheets.Item(3).Activate;
MATLAB = EXCEL.ActiveWorkbook;
SHEET3 = MATLAB.ActiveSheet;
tic
SORT=SHEET3.Range('D2:CS14429').Value;
toc
Elapsed time is 0.796004 seconds.
EXCEL.Worksheets.Item(4).Activate;
MATLAB = EXCEL.ActiveWorkbook;
SHEET4 = MATLAB.ActiveSheet;
tic
CLOSE=SHEET4.Range('A6:C14433').Value;
toc
Elapsed time is 0.040375 seconds.
tic
DATA=SHEET4.Range('BYE6:CBT14433').Value;
toc
Elapsed time is 0.930530 seconds.
tic
EXCEL.ActiveWorkbook.Close(false)
EXCEL.Quit
delete(EXCEL)
toc
Elapsed time is 15.704085 seconds.
That's less than 2.5 minutes total, very good!
So thanks to you, that problem is solved for the moment and I can move on to setting up some large logic matrices, which account for the bulk of the spreadsheet.
Have a good day, @dpb!
dpb
on 16 Jan 2023
When you get to such monstrosities as this, reverting to the lowest level is almost going to be mandatory no matter how you go at it; even in C/C++ you'd end up either finding a routine or writing one that encapsulated the above. MATLAB is a "rapid development" environment and tries to minimize development time by providing high-level interfaces to almost everything; the user convenience of just one or two lines of MATLAB code comes at a cost in performance that may make the simple solution impractical for some problems.
Out of curiosity, how long does it take for Excel to open the workbook interactively?
dpb
on 16 Jan 2023
Edited: dpb
on 16 Jan 2023
"I can move on to setting up some large logic matrices, which account for the bulk of the spreadsheet...."
I'd caution you to not fall into the trap of duplicating the logic by replication, as one must in Excel, but to replace those with simple data arrays being operated on by whatever is the (still unexplained) algorithm being applied.
How much actual independent, raw data is there among all those myriad cells, and what is the size of the final output? It would be humanly impossible to make any sense at all of the million intermediary cells, so the output has to be reduced to something manageable; I would think those two array sizes are as large as the MATLAB application should have to be -- and if the input data are not the full size of the spreadsheet, then there's really no need to read the whole spreadsheet at all -- only the input data from wherever it is obtained.
Again, do NOT just reimplement the spreadsheet in MATLAB; build a MATLAB application that solves the problem using MATLAB paradigms and coding idioms.
However, we're now stuck because you've as yet not shared what that algorithm is... that, of course, is your prerogative if you think you've got something new and clever that you don't want to release in an open forum for reasons of possible future competitive position or somesuch...
dpb
on 16 Jan 2023
..."and if the input data are not the full size of the spreadsheet, then there's really no need to read the whole spreadsheet at all -- only the input data from wherever it is obtained."
And, the corollary of this is that if it is so that the input data are only a smaller subset of the actual spreadsheet, then there's no point in wasting the overhead time of opening that huge file at all; simply create a file that has nothing but the input data in it from whatever source it is obtained.
Berliner JFK
on 16 Jan 2023
Hello @dpb, thanks again. :-)
To answer your question, the large Excel file takes about 135 secs to open directly.
Also appreciate the advice regarding how to approach converting the spreadsheet from Excel to Matlab. I hear you loud and clear and appreciate it!
If you give me some time, I'll do my best to provide more details about the inner workings of the spreadsheet. It's complicated and I assume it's not a typical Matlab problem, so I please ask for your patience.
Thank you!
dpb
on 17 Jan 2023
No problem; I've more than sufficient at hand to keep busy with...but it would/will(?) be interesting to see just what it is you're doing and how well it can be factored into MATLAB code.
As a suggestion, and looking at your timing code above, if it were me (and, of course, it isn't! :) ), I'd take those variables you read above and SAVE them as .mat files, and then do all my coding/testing reloading those when/if needed, avoiding the spreadsheet entirely.
Also, of course, do the algorithmic coding on small datasets so that you can visualize what's going on; once an algorithm is shown to work correctly, then increasing the dataset size is trivially accomplished just by passing in the larger array(s). The only problem you may run into would be one of memory, but if you can hold the spreadsheet, I'd think MATLAB should be ok...
Anyways, I'm about out of additional ideas at the moment...been kinda' fun to think about...
Berliner JFK
on 18 Jan 2023
Hello @dpb!
Thanks again for your help. :-)
The code for importing the data is mostly done, here it is:
EXCEL = actxserver('Excel.Application');
PATH = 'H:\Dropbox\MATLAB\';
FILENAME = fullfile(PATH,'MATLAB.xlsb');
EXCELFILE = EXCEL.Workbooks.Open(FILENAME);
SHEET = EXCELFILE.Sheets.Item('DASHBOARD');
DASHBOARD = SHEET.Range('D10:K253').Value;
TEMPLATE = SHEET.Range('L10:S253').Value;
SHEET = EXCELFILE.Sheets.Item('SORT');
SORT = SHEET.Range('D2:CS14429').Value;
SHEET = EXCELFILE.Sheets.Item('CALC');
CLOSE = SHEET.Range('A6:C14433').Value;
DATA = SHEET.Range('BYE6:CBT14433').Value;
EXCEL.Quit
delete(EXCEL)
A few questions for an expert:
- Most of my data is double precision, but COM imports it as a cell matrix. cell2mat() seems to work, but I know from my searching that there are other solutions (e.g. [C{:}], C++ MEX) which are faster or more elegant. How would you convert these cell matrices to double precision matrices?
- The size of my SORT, CLOSE and DATA matrices will vary depending on the size of the dataset. I found this method (which you had input on) for importing data sets of varying size. Is there another way to do this without using keystrokes?
Thank you!
PS send you a PM
dpb
on 18 Jan 2023
Edited: dpb
on 18 Jan 2023
- Yes, the .Value property is always returned as the VBA Variant type which is a cell array in MATLAB parlance. My recommendation would be to develop the code first using normal MATLAB functions and only worry about fine-tuning performance once you have a working algorithm and can profile it to find bottlenecks. So, I'd start with cell2mat and stick with it until it was proven to be a significant source of overhead. It's SO much easier to debug simple-to-read, "deadahead" code than to also have to fight through arcane manipulations and remember what was the purpose of such machinations. You might save a few msec here, but I'll be very surprised to learn it turns out to be a significant fraction of the overall run time. After all, this is only a one-time thing on startup, isn't it?
- Yes, there are ways, but not magic ones. You'll have to have some sort of breadcrumbs to lead through the forest; if your spreadsheet has header rows to identify where things are, then one ploy is to first read the header row(s) and locate columns by finding specific variable names columns. Failing that, it becomes more difficult and would have to have details of the spreadsheet content that could somehow be used as waypoints to have any further insight. I'll refloat the previous idea to move completely away from the huge single spreadsheet paradigm, however, and go back to the source of the original data from which the spreadsheet was/is created and read it from those sources. Then it should already be of the given size and content without having to break it back out again.
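A hypothetical sketch of the header-row "breadcrumbs" idea; the header range and the column name 'CLOSE_PRICE' are invented for illustration and would have to match the actual sheet layout.

```matlab
% Read the header row once, locate columns by name, and take the data
% extent from Excel's UsedRange instead of hard-coding the row count.
hdr  = SHEET.Range('A1:CS1').Value;            % header row as a 1-by-N cell array
col  = find(strcmp(hdr, 'CLOSE_PRICE'), 1);    % index of a named column
last = SHEET.UsedRange.Rows.Count;             % last used row on the sheet
```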
Berliner JFK
on 19 Jan 2023
Hello @dpb!
Thank you. :-)
1. You make an excellent point. Understood and cell2mat it is.
FWIW, when I learn a new programming language / platform, I try to learn every way possible to do something, to build up my toolbox. But like you said, the current solution is functional so time to move on.
2. Still thinking about point #2, as it has fundamental implications on how the code is written.
Thanks again!
dpb
on 19 Jan 2023
#2 is the key point every respondent to the original Q? has emphasized -- do NOT simply translate the spreadsheet to MATLAB; instead write a MATLAB application that implements the algorithm. Forget there is a spreadsheet solution and design the code from the basic requirements.
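To illustrate the point with an invented example: a column of spreadsheet IF formulas copied down thousands of rows typically collapses into a single vectorized MATLAB expression, and that is where most of the speedup over cell-by-cell recalculation lives.

```matlab
% Spreadsheet:  =IF(AND(A2>0, B2<100), A2*B2, 0)  copied down 14,000 rows.
% MATLAB: the whole column in one vectorized statement.
a = rand(14000, 1); b = 200*rand(14000, 1);   % stand-in data
c = (a > 0 & b < 100) .* (a .* b);            % elementwise AND, then the math
```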
Berliner JFK
on 19 Jan 2023
For now I will just do a 1:1 code, because I have to start somewhere. Consider it an experiment at my expense. The spreadsheet is huge. If that means that everybody loses interest, then I'll have to live with that.
BUT I believe when you see the code and what it does, you'll recognize that it's "dummy level" simple. Like I've said from the beginning, it's just a lot of if/thens and and/ors, with some basic math. I don't believe there's an algorithm in the sense everyone else means.
In fact, this is probably a problem better suited for C++, but Matlab is worth the investment so LET'S HAVE FUN! :-)
So...
The clear goal is to end up with a fast program that performs ONLY and EXACTLY the intended function with no bells, no whistles, no extra matrices, no extra data. I understand and want that too.
I throw myself at your mercy. :-)
Berliner JFK
on 13 Mar 2023
Edited: Berliner JFK
on 13 Mar 2023
First of all, thanks again to everyone in this thread. Very helpful information, very appreciated!
A quick update on the project, in case you're interested:
- your points about not making a 1:1 translation from Excel -> Matlab are fully understood. I think the confusion was my assumption that I would have to adapt the way a spreadsheet does things to the way they can be done in code, so I never planned a "literal 1:1" translation.
- although this was not mentioned before, there is existing code that provides data which the spreadsheet uses, and it is written in C++. I'm not an experienced C++ coder (or experienced in coding at all), but I did write the code and can work in that environment. From what you all have said, C++ is very likely faster than Matlab.
- I've been taking time to modify the spreadsheet in preparation for converting it to code. Much of what has accumulated over time has been removed, and much of what is helpful has been kept and even fine-tuned. That's what I'm working on mainly now.
- For the moment, I think the wisest course of action, if speed is most important, is to make the (deeper) plunge into C++. It's not what I wanted to do, because I have a Matlab license available to me and a wonderful community to help, but it seems wise.
I know Matlab offers tremendous capabilities, but I'm probably not ready for that yet. My work is predictive in nature, so I will get there eventually. I've heard that "the LSTM algorithm is the best for predicting data on a time series".
So thanks and hopefully bump into you all again soon!
More Answers (1)
Bruno Luong
on 11 Jan 2023
Edited: Bruno Luong
on 11 Jan 2023
Impossible to answer such a fuzzy question; the time depends on your hardware, your MATLAB version, your licenses for toolboxes such as Parallel Computing, your data, your algorithm, your MATLAB skill, ...
Here is the time of some arbitrary formula run on TMW server.
A=randn(1000,10000);
P = [1 2 3];
f = @(x) sin(polyval(P,x)).*(x>=0);
tic
f(A);
toc
Elapsed time is 0.137264 seconds.
2 Comments
Berliner JFK
on 11 Jan 2023
Thank you, Bruno.
If I understand you correctly, you're suggesting translating the above code into Excel formulas, and then comparing the speed in Excel with the speed in Matlab on the same machine?
Bruno Luong
on 11 Jan 2023
I suggest you test the same typical (but possibly simplified) formula in both Excel and MATLAB to get your own idea.
It could be 100x faster, it could be the same speed, it could be slower. So far there are not enough details in your question for it to be answered with certainty.
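For such a head-to-head test, timeit gives a more stable number than a single tic/toc, since it runs the expression several times and reports a median. A sketch reusing the formula from the answer above:

```matlab
A = randn(1000, 10000);
f = @(x) sin(polyval([1 2 3], x)).*(x >= 0);
t = timeit(@() f(A));    % median of several timed runs, in seconds
```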