Is it possible to avoid copy-on-write behavior in functions yet?

Question

Christopher on 3 Oct 2017

https://nl.mathworks.com/matlabcentral/answers/359410-is-it-possible-to-avoid-copy-on-write-behavior-in-functions-yet

Commented: Benjamin Kraus on 13 Jun 2025

As I understand it, MATLAB uses a system called 'copy-on-write' for function calls. So if you have a function of the form

 function [out] = myfunction(out,in1,in2)
 in1 = rand(1);
 in2 = rand(1);
 out = in1+in2;

MATLAB will create a new space in memory for a new copy of variables out, in1, and in2, perform the given operations on these arrays, and then copy modified arrays onto the old variable memory space if it is an output variable. This will also occur for the variable 'out', and will even occur for in1 and in2 if written as

 function [out,in1,in2] = myfunction(out,in1,in2)
 in1 = rand(1);
 in2 = rand(1);
 out = in1+in2;

Obviously, this behavior wastes time if you know that the old variable should be replaced by the new variable. I have long avoided using functions for this reason, resulting in messy code.

Is it possible to pass variables to functions by reference? If no, will this be possible in a future MATLAB?

EDIT:

A commenter noted that since the inputs in1 and in2 are defined in the function they do not need to be passed through the function. Perhaps the following better describes the problem:

 function [out,in1,in2] = myfunction(out,in1,in2)
 ind      = 5;
 in1(ind) = rand(1);
 in2(ind) = rand(1);
 out      = in1+in2;

so the function modifies one element of each of these arrays, although the entire variables are copied before being modified.

12 Comments

Stephen23 on 3 Oct 2017

Edited: Stephen23 on 3 Oct 2017

" I have long avoided using functions for this reason, resulting in messy code."

That was a really bad design decision. You are going to waste a lot more time writing/testing/fixing/... messy code, compared to if you had used testable, neatly encapsulated functions. Scripts are fun for playing around with, but any reliable, repeatable, testable, efficient code should be written using functions or classes. If you read this forum and the MATLAB help you will find plenty of discussions and reasons for using functions, including that they are faster than scripts. They are certainly less buggy, easier to test, and easier to maintain.

"MATLAB will create a new space in memory for a new copy of variables out, in1, and in2,..."

Quite unlikely. You never write to the input variables inside the functions, so no copying will occur. Simply passing an argument to a function does not copy that variable; the variable is copied only when it is written to. Although you never write to the inputs, you do allocate new variables which coincidentally have the same names and require their own memory, but this is unrelated to the topic at hand.

https://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/

This is why premature optimization is a bad idea, because it leads people to write messy, unclear, impossible-to-maintain code. It is also a good example of why code should be designed to be clear and readable, rather than designed based on some esoteric ideas of efficiency: good code practices will always be correct, no matter what MATLAB version, and can be identified by the JIT compiler and optimized internally.

https://www.mathworks.com/matlabcentral/answers/228557-experts-of-matlab-how-did-you-learn-any-advice-for-beginner-intermediate-users

Alec on 13 Jun 2025

What does "IF the original variables are not shared data copies of something else to begin with." mean?

Benjamin Kraus on 13 Jun 2025

@Alec: Consider this situation:

a = rand(100);
a = somefunction(a);

In the scenario above, because a is being overwritten by the output from somefunction, somefunction can reuse the memory allocated by a and (depending on the implementation of somefunction) may be able to avoid making a copy of a to pass into the function.

Now consider this:

a = rand(100);
b = a;
a = somefunction(a);

The second line of code creates a "shared data copy" of a and stores it in b. No new memory is allocated: a and b have the same value, and MATLAB knows this, so both a and b share the same memory. However, this means that somefunction can no longer overwrite that memory, because it is being shared by b. If somefunction changed the value of a, this shouldn't be reflected in the value of b.

Now consider this:

a = rand(100);
b = a;
b(1) = 1; % This triggers the "copy-on-write" mechanism.
a = somefunction(a);

The second line of code creates a "shared data copy" of a, but then the third line of code modifies b. This means that a and b can no longer share the same storage. This forces MATLAB to create a duplicate copy of a and then update it for storage in b. Once you've done this, a is no longer being shared with another variable, so somefunction can go back to reusing the storage for a.
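The guarantee described in these three scenarios can be checked with a small sketch. The function setFirstToZero is a hypothetical stand-in for somefunction, and magic(4) is used only to get a known value. The script checks observable values only; whether MATLAB actually reuses the memory of a is an internal detail that this code cannot show.

```matlab
% Hypothetical stand-in for somefunction: it writes to one element of its input.
a = magic(4);             % a(1) is 16
b = a;                    % shared data copy: no new memory is needed yet
a = setFirstToZero(a);    % a is overwritten by the output

disp(a(1))                % 0: the change is visible in a
disp(b(1))                % 16: b keeps the original value

function x = setFirstToZero(x)
    x(1) = 0;             % writing to shared data forces a separate copy
end
```

Whichever of the three scenarios applies, the values in b never change; only the amount of copying MATLAB has to do differs.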

Answer 1

Guillaume on 3 Oct 2017

https://nl.mathworks.com/matlabcentral/answers/359410-is-it-possible-to-avoid-copy-on-write-behavior-in-functions-yet#answer_284073

First, you've fallen into the trap of premature optimisation. You've decided not to use functions because they may slow your code, but you don't know for sure (in all likelihood it's the opposite: it's easier for MATLAB's JIT compiler to optimise functions), and instead ended up with messy code.

So really, the answer to your question is: stop worrying about the internal implementation of MATLAB until you've proven it is an issue by profiling your code. Bear in mind that the internal implementation is not fully documented and is subject to change from version to version.
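As a rough illustration of measuring before restructuring, here is a minimal sketch using timeit. The function updateOne and the array size are made up for this example and are not from the question.

```matlab
% Hypothetical micro-benchmark; updateOne and the array size are illustrative.
x = zeros(2000);                      % 2000x2000 doubles, about 32 MB
tFcn = timeit(@() updateOne(x));      % typical time per call, in seconds
fprintf('updateOne: %.3g s per call\n', tFcn);

% Function under test: modifies a single element of its input.
function x = updateOne(x)
    x(1) = x(1) + 1;
end
```

Note that x is captured by the anonymous function here, so the array most likely stays shared and the write inside updateOne has to copy it. The number therefore approximates the cost with a copy, not the in-place case. For a whole program, the profiler (profile on, run the code, profile viewer) gives a per-line breakdown instead.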

Secondly, you've misunderstood copy-on-write. In your example, copy-on-write is never triggered for any of the variables. Brand new variables are created, no copying occurs. Copy-on-write is triggered when you're modifying part of a variable but still have the original in another variable:

a = [1 2 3];
a(2) = 4;  %no copy-on-write
a = [1 2 3];
b = a;
b(2) = 4;  %copy-on-write triggered since original still in a
a = [1 2 3];
b = a;
b = [1 4 3];  %no copy-on-write since b is a different variable altogether (your example)

As for reusing the same memory when input and output are the same variable, I believe MATLAB's JIT compiler does that, but again, we're talking about implementation details that should not matter much and are subject to change.

1 Comment

Jan on 3 Oct 2017

+1: "trap of premature optimisation". Christopher, read this carefully. You got a lot of very valuable suggestions in this thread.

Answer 2

Jan on 3 Oct 2017

https://nl.mathworks.com/matlabcentral/answers/359410-is-it-possible-to-avoid-copy-on-write-behavior-in-functions-yet#answer_284107

Edited: Jan on 3 Oct 2017

See also Loren's very useful article: https://blogs.mathworks.com/loren/2007/03/22/in-place-operations-on-data/ .

You are right: When the algorithm is very efficient and processed on a multi-core machine, the memory copies can become the bottleneck. I had the same problem in an optimization tool written in C, which called a FORTRAN library for solving a huge matrix equation with a known pattern. The two deep data copies when entering and leaving the library took 40% of the total run time. Fortunately we had the FORTRAN source code and could modify it to process the matrices in-place.

But now imagine we had avoided using functions in the first place. As you wrote, the code would have been too messy to optimize.

You can avoid deep data copies sometimes:

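x = zeros(10000, 10000);   % 1e8 doubles, about 800 MB
n = 1e6;
% Variant 1: the subfunction modifies the array itself
tic;
for k = 1:n
   x = addInSubFcn(x);
end
toc
% Variant 2: the subfunction returns only the new value, the caller writes it
tic;
for k = 1:n
   [xx, index] = addInCaller(x);
   x(index)    = xx;
end
toc
% Local functions (in a script file each one must be closed with "end")
function x = addInSubFcn(x)
index    = randi(numel(x));
x(index) = x(index) + rand;
end
function [xx, index] = addInCaller(x)
index = randi(numel(x));
xx    = x(index) + rand;
end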

R2016b/64, Win7:

 Elapsed time is 2.583763 seconds.   % In subfunction
 Elapsed time is 1.884192 seconds.   % In caller

Keep this in mind when you create functions to modify arrays.

1 Comment

Tyler Warner on 24 May 2018

Excellent response. Very insightful!

Answer 3

Cedric on 3 Oct 2017

https://nl.mathworks.com/matlabcentral/answers/359410-is-it-possible-to-avoid-copy-on-write-behavior-in-functions-yet#answer_284195

Edited: Cedric on 5 Oct 2017

I agree with most of what is said in the comments/answers. Yet, if you really needed to avoid copies for good reasons in a context far more complex and/or specific than the example that you give, you could create a handle class and always work on a single copy of whatever you pass to functions/methods.

Again, there is no point in doing this for simple data structures unless you have proven that you cannot afford the copy-on-write, so don't jump on this solution if you don't fully understand what you are doing.

Yet, skimming the history of your questions, I think that you know what you are doing and that people reacted to your comment about "not using functions and getting messy code for avoiding copy-on-write" a bit too quickly .. but you have to admit that in most cases this is almost a heretical statement/approach ;-)

Anyhow, assuming that you need this for valid reasons, here is an example:

 classdef VeryVeryLargeArray < handle
    properties
        array
    end
    methods
        function obj = VeryVeryLargeArray( builder, varargin )
            obj.array = builder( varargin{:} ) ;
        end
        % Possibly some overload of e.g. SUBSREF/SUBSASGN/SIZE and operators.
    end    
 end

Using it for building e.g. a 5GB random array (so you can see something in the task manager):

 >> n = floor( sqrt( 5e9/8 )) ;
 >> vvla = VeryVeryLargeArray( @rand, n ) ;

you see a 5GB jump in the memory usage. Now if you call a function e.g. setRow :

 function setRow( vvla, rowId, value )
    vvla.array(rowId,:) = value ;
 end

after having set a breakpoint on its third line (the end):

setRow( vvla, 1, 0 ) ;

you won't see a second jump due to a copy-on-write and your array will have been updated (even in the base workspace, because handles work "a bit like pointers").
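The "a bit like pointers" behaviour can be seen on a small scale with the following sketch. It assumes the VeryVeryLargeArray classdef above is saved as VeryVeryLargeArray.m on the MATLAB path; the 3x3 size and the variable names v1 and v2 are chosen only for illustration.

```matlab
% Assumes the VeryVeryLargeArray classdef above is saved as
% VeryVeryLargeArray.m on the MATLAB path.
v1 = VeryVeryLargeArray(@zeros, 3);   % small 3x3 array, for illustration only
v2 = v1;                              % copies the handle, not the data

setRow(v2, 1, 7);                     % write through the second handle
disp(v1.array(1, :))                  % the first row is now all 7 via v1 too

% Same function as above, repeated so the script is complete.
function setRow(vvla, rowId, value)
    vvla.array(rowId, :) = value;
end
```

Both v1 and v2 refer to the same object, so setRow needs no output argument and the caller sees the update directly.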

EDIT 10/4 @ 12:41UTC: Here is a quick example of overloading SUBSREF, in case you want to forward block indexing of the object(s) to the internal array(s):

        function out = subsref( obj, S )
            if S(1).type(1) ~= '.'
                out = subsref( obj.array, S ) ;
            else
                out = builtin( 'subsref', obj, S ) ;
            end
        end

This method could be added after my comment in the methods block of the class definition. The same would have to be done for SUBSASGN and possibly SIZE. The advantage is that most functions could operate on the object the way they operate on any numeric array:

 >> vvla(2:4, 10:13)
 ans =
    0.5108    0.1707    0.3188    0.3955
    0.8176    0.2277    0.4242    0.3674
    0.7948    0.4357    0.5079    0.9880

This accesses vvla.array(2:4,10:13) and has the advantage of making the internal structure transparent to the user (at least for what is managed by SUBSREF).
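For the SUBSASGN counterpart mentioned above, a minimal sketch along the same lines might look as follows. It mirrors the SUBSREF overload and is meant to go in the same methods block. It is an assumption about how one could write it, not part of the original class, and whether the assignment stays in-place for very large arrays has not been verified here.

```matlab
        % Hypothetical SUBSASGN overload, mirroring the SUBSREF method above.
        function obj = subsasgn( obj, S, value )
            if S(1).type(1) ~= '.'
                % () or {} indexing: forward the assignment to the array.
                obj.array = subsasgn( obj.array, S, value ) ;
            else
                % Dot indexing: keep the default property assignment.
                obj = builtin( 'subsasgn', obj, S, value ) ;
            end
        end
```

With this method in place, an assignment such as vvla(1,:) = 0 would update vvla.array(1,:) without the caller naming the property.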

Note that testing S(1).type(1)~='.' (and not just S(1).type(1)=='(') forwards any () or {} indexing to the array property, so you can use builders of cell arrays:

vvlca = VeryVeryLargeArray( @cell, 4, 5 ) ;

BUT you cannot easily (or at all) properly manage comma-separated list (CSL) outputs, especially when you want to nest these objects, so there is a limit to what you can achieve by overloading indexing methods. [If you try, you will likely spend hours wondering why nargout is defined through a call to your overloaded NUMEL and not to the builtin, and trying to find workarounds.]

EDIT 10/5 @ 12:32UTC: As mentioned, you can overload specific operations or functions that are relevant to the use that you make of these arrays. If you want to be able to use DIFF transparently for example:

        function df = diff( obj, varargin )
            df = diff( obj.array, varargin{:} ) ;
        end
