MATLAB's inefficient copy-on-write implementation

Question

Kenneth Johnson on 31 Jan 2022

https://nl.mathworks.com/matlabcentral/answers/1640135-matlab-s-inefficient-copy-on-write-implementation

Commented: Paul on 2 Feb 2022

MATLAB's copy-on-write memory management seems to have a serious defect, which I think is the reason behind the abysmal performance of subsasgn overloading. (The same problem probably occurs with parenAssign in the new R2021b RedefinesParen class -- I haven't yet experimented with it.) Normally, an array assignment like b = a simply does a pointer copy; the array data is not copied until b is modified (e.g. b(1) = 1). Thereafter, subsequent modifications of b (e.g. b(2) = 1) do not copy the full array; they just modify it in place as long as the reference count is 1. For example,

clear, a = zeros(1e8,1);
memory % 2764 MB used by MATLAB
b = a;
memory % 2764 MB
tic, b(1) = 1; toc, memory % 0.329099 seconds, 3540 MB
tic, b(2) = 1; toc, memory % 0.000123 seconds, 3541 MB

However, the benefit of copy-on-write is lost when the variable is changed in a function, e.g.

% test.m
function x = test(x)
x(1) = 1;

In this case, the x reference count is apparently incremented in test before the assignment is made, so this will always result in a full array copy. For example,

clear, a = zeros(1e8,1);
tic, a = test(a); toc % 0.337475 seconds
tic, a = test(a); toc % 0.310373 seconds

To see what's happening with copy-on-write, test.m is modified as follows:

function x = test(x)
memory
x(1) = 1;
memory
return

The array modification inside the function forces a full array copy, even though the original array is immediately discarded:

clear, a = zeros(1e8,1);
memory % 2748 MB
a = test(a); % 2748 MB, 3503 MB
memory % 2740 MB

I would think this problem could be easily avoided by treating any variable that appears as both an input and output argument in a function (e.g. function x = test(x)) as a reference variable, i.e. its reference count is not incremented on entering the function and is not decremented upon exiting. If the function is called with different input and output arguments, e.g. y = test(x), then the interpreter would implement this as y = x; y = test(y).

Is there any particular reason why MATLAB does not or cannot do this? There are many applications, such as subsasgn overloading, that could see a big performance boost if this problem were fixed.

1 Comment

James Tursa on 31 Jan 2022

A slight point of confusion about the terms in your description. In the past, MATLAB has passed shared data copies of arguments to functions rather than bumping up reference counts. Do you have evidence, or know of documentation, showing that this behavior has changed and that a bumped-up reference count is now used for arguments? Why do you write that MATLAB uses this method?


Answer 1

James Tursa on 31 Jan 2022


Edited: James Tursa on 31 Jan 2022

See Loren's Blog on this topic. Basically, to write functions that can modify a variable "inplace" you need to call that function from within another function and follow some syntax rules. Then you can avoid the deep data copy.

https://blogs.mathworks.com/loren/2007/03/22/in-place-operations-on-data/?s_tid=srchtitle_inplace_1
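As a rough illustration of the calling pattern described above, the following sketch allocates the array inside a function and calls a second function that uses the same variable name for input and output. The file name inplace_demo.m and the helper setFirst are hypothetical names chosen for this example, and whether the deep copy is actually avoided depends on the MATLAB release and on the sharing status of the variable, so no timings are claimed here.

```matlab
% inplace_demo.m  (hypothetical file name)
function elapsed = inplace_demo(n)
    % Allocate the array inside a function, as the thread suggests is
    % required for the in-place optimization to be considered.
    if nargin < 1
        n = 1e8;
    end
    x = zeros(n, 1);
    elapsed = zeros(1, 2);
    for k = 1:2
        tic;
        x = setFirst(x);   % same variable name on both sides of the call
        elapsed(k) = toc;  % record the time for each call
    end
end

function x = setFirst(x)
    % Input and output share the name x, following the syntax rules
    % described in Loren's post.
    x(1) = 1;
end
```

Calling elapsed = inplace_demo() and comparing the two entries with the timings in the question is one way to check whether the copy is being avoided on a given installation.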

There is a subtle caveat to this. If the variable is already shared, then the function will be forced to make a deep copy regardless of how you call it or what syntax you use. And there are no official MATLAB functions that can tell you the sharing status of a variable ahead of time, so it can be hard to predict when a deep copy will be forced and when it will not be forced. E.g.,

X = rand(10); % X will not be shared with anything at this point
Y = 1:10; % Y will be shared with a background variable that is hidden from you

It is not obvious that the simple assignment for Y above should result in shared variables, but that is exactly what happens on later versions of MATLAB for certain sized variables (it will be a reference copy). In this case any attempt to modify Y inplace will result in a deep data copy first.

11 Comments

Walter Roberson on 2 Feb 2022

Note that with compound variables such as a struct or cell array, if you modify a value and the modification is not done in place, then a new data pointer needs to be created to hold the new value and a new data descriptor needs to be generated for it. Any object that pointed to the old data descriptor then needs to be updated, so a new data descriptor needs to be generated for that object as well, and so on. But the pointers for "cousins" can remain the same. For example, given:

Descriptor D1 : cell 1 x 2, pr points to block B1
Block B1: contains pointer to Descriptor D2 and Descriptor D3
Descriptor D2: double 1 x 3, pr points to block B2
Descriptor D3: char 1 x 5, pr points to block B3
Block B2: contains numeric [pi, -2, sqrt(5)]
Block B3: contains 'hello'

In a situation that modifies cell location {1,1}(1,1) and returns the entire cell, the result would look like

Descriptor D4 : cell 1 x 2, pr points to block B4
Block B4: contains pointer to Descriptor D5 and Descriptor D3
Descriptor D5: double 1 x 3, pr points to block B5
Descriptor D3: char 1 x 5, pr points to block B3
Block B3: contains 'hello'
Block B5: contains numeric [7, -2, sqrt(5)]
%and some or all of the below might have been reclaimed
Descriptor D1 : cell 1 x 2, pr points to block B1
Block B1: contains pointer to Descriptor D2 and Descriptor D3
Block B2: contains numeric [pi, -2, sqrt(5)]

This is not a "deep copy". A "deep copy" would require that a new storage area for the 'hello' be generated.
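The scenario above can be reproduced at the value level with the short sketch below. It only demonstrates the observable behavior (the original cell is unchanged and the untouched element compares equal); whether the 'hello' buffer is physically shared cannot be confirmed with documented MATLAB functions, so that part remains an assumption based on the description above.

```matlab
% Cell array matching the descriptors above: a 1x3 double and a 1x5 char.
c = {[pi, -2, sqrt(5)], 'hello'};

% Plain assignment: d refers to the same data as c until one is modified.
d = c;

% Modify cell location {1,1}(1,1) in d only.
d{1}(1) = 7;

% The original is untouched, and the "cousin" element is unchanged in both.
assert(c{1}(1) == pi);
assert(d{1}(1) == 7);
assert(isequal(c{2}, d{2}));
```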

James Lebak on 2 Feb 2022

@Paul, @James Tursa: I think your understanding for M-functions is correct. I accept the criticism that "pass by value" on the doc page is imprecise and I will pass that on to the appropriate people. Maybe "pass by value with lazy copy" would have been a better description, although as @Walter Roberson points out we do our best to keep the copies shallow for containers.

Paul on 2 Feb 2022

@James Lebak

Thanks for the response. Frankly, I don't see how "pass by value with lazy copy" adds any clarity to that portion of the doc page, which is specifically explaining how f1() works, where I don't see any kind of pass by value at all.

Regardless of what the specific wording should be, I appreciate your response and initiative to pass along the concern to the doc writers.


Answer 2

Matt J on 31 Jan 2022


Edited: Matt J on 31 Jan 2022


refwrap.m

(1) The variable must be allocated within a function.

A workaround to this rule is to wrap the data in a handle object:

a = 1:1e8;
tic,
 obj=refwrap(a); clear a
 testFn(obj); 
 a=obj.data;
toc %Elapsed time is 0.000460 seconds.
 
function testFn(obj)
obj.data(1) = 1;
end
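The attached refwrap.m is not reproduced in the thread. A minimal sketch of what such a wrapper might look like, assuming it is simply a handle class with a public data property (the property name data is taken from the usage above; everything else is an assumption):

```matlab
% refwrap.m  (sketch only; the attached file may differ)
classdef refwrap < handle
    properties
        data   % wrapped array, accessed as obj.data
    end
    methods
        function obj = refwrap(data)
            % Store the array in the handle object. Because refwrap is a
            % handle class, functions receive a reference to the object
            % and can modify obj.data without returning it.
            if nargin > 0
                obj.data = data;
            end
        end
    end
end
```

Because the object is passed by reference, testFn above can assign into obj.data without an output argument. The clear a in the usage example matters: it removes the other variable sharing the array, which per the earlier answer would otherwise force a deep copy on the first write.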
