Do fewer lines of code always make a better program? Do user-defined functions slow down MATLAB?
Daniel Bridges
on 28 Nov 2017
Commented: Walter Roberson
on 29 Nov 2017
Searching online, I found multiple discussions about good programming practices, e.g. Michael Robbins from MIT, Raviteja's MATLAB Answers question (Sept 2011), Brett Schoelson's blog post (Jan 2011), and Michael Robbins' MATLAB Newsgroup discussion (March 2000). Some things appear to be a matter of opinion, but I wanted to hear your thoughts about minimizing lines of code, something I didn't see discussed there. Does passing data to a user-defined function cost more speed than repeating built-in functions?
For example, consider the following function. At one point I change a string's value and then repeat the loop, i.e. I copy-paste code rather than creating another function within my script. Because the code is identical, should it go in its own function? That would require fewer lines of code, but it would also require passing more data to another function, presumably creating a second copy of it and temporarily requiring more RAM. Am I correct in thinking it will execute faster if I repeat the code within this function rather than create a second function called within it?
Or, even if it would execute faster, is it still better practice to use a function because then the code only needs to be edited once rather than in multiple locations?
function youtput = ExtractDataFromCellArray(maxsize,newvoldata,Organ)
% We must produce a table containing the dosed volumes and the dose
% intervals for each element in these vectors. This table should have
% variables Volumes, DistType, and DoseInterval.
youtput = table(zeros(maxsize,1),cell(maxsize,1),cell(maxsize,1),...
    'VariableNames',{'Volumes' 'DistType' 'DoseInterval'});
index = 0;
DistType = 'planned';
for loop = 1:numel(newvoldata)
    if ~isempty(newvoldata{loop})
        matchingindices = strcmpi(newvoldata{loop}.DistributionType,DistType)...
            & strcmpi(newvoldata{loop}.Organ,Organ);
        SectionLength = length(find(matchingindices));
        youtput(index+1:SectionLength+index,:) = table( ...
            newvoldata{loop}.Volume(matchingindices), ...
            newvoldata{loop}.DistributionType(matchingindices), ...
            newvoldata{loop}.DoseInterval(matchingindices));
        index = index + SectionLength;
    end
end
DistType = 'blurred'; clear matchingindices % just to be safe
for loop = 1:numel(newvoldata)
    if ~isempty(newvoldata{loop})
        matchingindices = strcmpi(newvoldata{loop}.DistributionType,DistType)...
            & strcmpi(newvoldata{loop}.Organ,Organ);
        SectionLength = length(find(matchingindices));
        youtput(index+1:SectionLength+index,:) = table( ...
            newvoldata{loop}.Volume(matchingindices), ...
            newvoldata{loop}.DistributionType(matchingindices), ...
            newvoldata{loop}.DoseInterval(matchingindices));
        index = index + SectionLength;
    end
end
youtput(youtput.Volumes==0,:)=[];
end
Accepted Answer
Walter Roberson
on 28 Nov 2017
Remember MATLAB is copy-on-write, so any read-only access to a variable will cost only as much memory as the symbol table entry for the name pointing off to the shared definition.
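A rough way to see the copy-on-write behavior for yourself is to time a function that only reads its argument against one that writes to it. This is only a sketch: cowDemo, readOnly, and writeFirst are hypothetical names chosen for illustration, the array size is arbitrary, and the timings will vary with machine and MATLAB release.

```matlab
function [tRead, tWrite] = cowDemo()
% cowDemo  Hypothetical sketch: read-only argument versus modified argument.
A = rand(3000);                      % 3000-by-3000 doubles, roughly 72 MB

% Read-only access: the callee shares A's data, so no copy is expected.
tRead = timeit(@() readOnly(A));

% Writing to the argument inside the callee makes MATLAB copy it first.
tWrite = timeit(@() writeFirst(A));

fprintf('read-only: %.3g s, modify: %.3g s\n', tRead, tWrite);
end

function s = readOnly(X)
s = X(1) + X(end);                   % only reads X
end

function s = writeFirst(X)
X(1) = 0;                            % the first write triggers the copy
s = X(1) + X(end);
end
```

If the read-only call comes out far cheaper than the modifying call, that is consistent with the argument being shared rather than duplicated on the way in.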
When code is broken up into functions, the smaller symbol table for name searches improves performance.
That said, each function call takes the work of putting the argument descriptors into the correct positions and of creating the new workspace context. A call to an anonymous function in particular is notably slower than a call to a non-anonymous function. If you have a tight enough section of code, then the overhead of the function call could end up being more than the work of the function itself.
To phrase that another way: Yes, it is possible that calling a function to do work could be less efficient than copy-and-paste.
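To get a feel for the size of that overhead, one could time the same trivial work done inline, through a local function, and through an anonymous function. The sketch below is illustrative only: callOverheadDemo, addOneLocal, and addOneAnon are made-up names, the loop body is deliberately tiny so that call overhead dominates, and the numbers depend heavily on the MATLAB release.

```matlab
function t = callOverheadDemo(n)
% callOverheadDemo  Hypothetical timing sketch; all names are illustrative.
if nargin < 1, n = 1e6; end
x = rand(1, n);
addOneAnon = @(v) v + 1;             % anonymous function doing the same work

t = struct();

tic; acc = 0;
for k = 1:n
    acc = acc + (x(k) + 1);          % work written inline
end
t.inline = toc; t.inlineSum = acc;

tic; acc = 0;
for k = 1:n
    acc = acc + addOneLocal(x(k));   % same work through a local function
end
t.local = toc; t.localSum = acc;

tic; acc = 0;
for k = 1:n
    acc = acc + addOneAnon(x(k));    % same work through an anonymous function
end
t.anon = toc; t.anonSum = acc;

fprintf('inline %.3g s, local %.3g s, anonymous %.3g s\n', ...
    t.inline, t.local, t.anon);
end

function y = addOneLocal(v)
y = v + 1;
end
```

All three loops compute the same sum, so any difference in elapsed time comes from how the work is invoked rather than from the work itself.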
"is it still better practice to use a function because then the code only needs to be edited once rather than in multiple locations?"
Most of the time, Yes!! Especially if you are passing in data structures, in which case you can potentially change the implementation without any other code having to know. There was a time when memory space was precious, and in some environments it still is, but these days programmer time is much more costly.
2 Comments
Stephen23
on 29 Nov 2017
Edited: Stephen23
on 29 Nov 2017
"Yes, it is possible that calling a function to do work could be less efficient than copy-and-paste."
Hmmm... only according to a very narrow definition of "efficient". Unless that code is being called trillions of times processing CERN data it is unlikely that the user is going to notice the difference between 0.0001 seconds and 0.01 second processing time.
It has been pointed out by programmers much better than I that code is read more times than it is written, and that the code's intent should be clear: why and how it works should be evident from how the code is constructed and laid out. Defining well-tested functions for repeated code is a perfectly reasonable example of this, and will ultimately save more time than almost any other so-called "optimizations".
Beginners severely underestimate the importance of things like testing, debugging, documentation, code clarity, code maintenance, compatibility, etc. All of these are much easier to achieve with functions than with copy-and-pasted code. Copy-and-paste just increases the risk of mistakes, makes finding them harder, and testing for them almost impossible.
"Because the code is identical, should it go in its own function?"
Yes, it should.
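As a sketch of what that could look like for the function in the question, the duplicated loop can be moved into a local helper and called once per distribution type. The helper name appendMatches and its argument list are assumptions made for illustration; the loop body is copied unchanged from the original, so this is only as correct as the original loop.

```matlab
function youtput = ExtractDataFromCellArray(maxsize, newvoldata, Organ)
% Build a table with variables Volumes, DistType, and DoseInterval.
youtput = table(zeros(maxsize,1), cell(maxsize,1), cell(maxsize,1), ...
    'VariableNames', {'Volumes' 'DistType' 'DoseInterval'});
index = 0;
% One pass per distribution type, in the same order as the original code.
for DistType = {'planned', 'blurred'}
    [youtput, index] = appendMatches(youtput, index, newvoldata, ...
        Organ, DistType{1});
end
youtput(youtput.Volumes==0,:) = [];
end

function [tbl, index] = appendMatches(tbl, index, newvoldata, Organ, DistType)
% appendMatches  Hypothetical helper holding the loop that was duplicated.
for loop = 1:numel(newvoldata)
    if ~isempty(newvoldata{loop})
        matchingindices = strcmpi(newvoldata{loop}.DistributionType, DistType) ...
            & strcmpi(newvoldata{loop}.Organ, Organ);
        SectionLength = length(find(matchingindices));
        tbl(index+1:SectionLength+index,:) = table( ...
            newvoldata{loop}.Volume(matchingindices), ...
            newvoldata{loop}.DistributionType(matchingindices), ...
            newvoldata{loop}.DoseInterval(matchingindices));
        index = index + SectionLength;
    end
end
end
```

Any later fix to the matching logic then happens in one place, and a third distribution type would only need one more entry in the cell array that drives the outer loop.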
Walter Roberson
on 29 Nov 2017
"Unless that code is being called trillions of times processing CERN data it is unlikely that the user is going to notice the difference between 0.0001 seconds and 0.01 second processing time."
In reasonable C and C++ compilers (and even the better Fortran compilers), there is the possibility of "inlining", which is the automatic copying of a function's body into the calling routine (with automatic renaming of variables as required). This increases code size and compile time, but removes the overhead of making a function call. Furthermore, because the body of the function gets exposed to the calling context, better compilers can do dead-code removal: that is, they can see from the calling context that some parts of the called routine would not be used, and will remove that code.
Imagine, for example, automatically removing many of the various size checks and type assertions in commonly called MATLAB routines because from the calling context it was possible to prove that the tests would not fail.
A change to use inlining can in some cases allow code to fit entirely inside the highest performance instruction cache with only a single conditional test (to loop), thus allowing much higher performance than might otherwise be possible.
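The effect can be imitated by hand in MATLAB to show what such a compiler would be doing automatically. In this sketch, inliningDemo and checkedSquare are hypothetical names: the first loop calls a helper that validates its input on every call, and the second loop has the helper's body pasted in with the checks dropped, because the calling context already guarantees they would pass.

```matlab
function [viaCall, byHand] = inliningDemo()
% inliningDemo  Hypothetical sketch of inlining plus dead-code removal by hand.
x = linspace(0, 1, 1000);            % caller knows x holds real double scalars

% Version 1: call a checked helper on every iteration.
viaCall = 0;
for k = 1:numel(x)
    viaCall = viaCall + checkedSquare(x(k));
end

% Version 2: the helper's body copied into the loop. The input checks are
% removed because they can never fail for the x defined above.
byHand = 0;
for k = 1:numel(x)
    byHand = byHand + x(k)^2;
end
end

function y = checkedSquare(v)
% checkedSquare  Hypothetical helper with the kind of guards described above.
if ~isnumeric(v) || ~isscalar(v)
    error('checkedSquare:badInput', 'Input must be a numeric scalar.');
end
y = v^2;
end
```

Both versions return the same value; the hand-inlined one simply skips the call and the guards, which is the trade an inlining compiler makes without the programmer having to duplicate code.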