Set and get structure field with invalid name

I'd like to get and set a structure with potentially invalid field names. Before digging into the mex myself I was wondering if there was already a tool that existed that performed these functions. I swear I've seen this around but my search skills are failing me. Finally, I assume this is possible with mex ....
Thanks, Jim

1 Comment

Clarification: I am trying to accomplish an ordered dictionary in Matlab that accepts and retrieves arbitrary field names. It turns out that the retrieval is relatively easy, you just need to use dynamic indexing. The setting is the difficult part, especially if it is for a structure that already exists. I am currently working on an implementation based off of: http://www.mathworks.com/matlabcentral/fileexchange/28516-renamefield
I'll post the solution when I'm done ...

 Accepted Answer

I knew I had seen this somewhere before. Amro posted this link: http://pastebin.com/j69kQEur
Key point summaries:
  1. Retrieval is easy for invalid field names
  2. Setting can be done using mex
  3. In-place modification of a struct is currently not documented in the mex API, so some duplication needs to be performed.
  4. An initial version of my code can be found at: https://gist.github.com/JimHokanson/84141d0955a6a0eaed68516e3f69487a
  5. I personally have wrapped this in an object to make the call to setField transparent and only done when an in place call is not possible (i.e. when the field name is invalid)

4 Comments

I will look at it in more detail when I get some time to do so. At first glance I note that you are using shared data copies for the field element copies. This is not how MATLAB does this at the m-file level. MATLAB uses shared reference copies for this, not shared data copies. E.g., consider this mex routine that prints out the structure and data addresses:
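// cell_prpi.c
// Prints the mxArray header address and the real/imaginary data pointers
// of the input, or of every cell element when the input is a cell array.
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize i = 0, n;
    const mxArray *mx;  /* const, because prhs[0] must not be modified */
    if( nrhs ) {
        if( mxIsCell(prhs[0]) ) {
            n = mxGetNumberOfElements(prhs[0]);
            for( i=0; i<n; i++ ) {
                mx = mxGetCell(prhs[0],i);
                /* cast i: mwSize is not guaranteed to match %d */
                mexPrintf("Structure address %d = %p\n",(int)i,(const void *)mx);
                if( mx ) {  /* unassigned cells are NULL */
                    mexPrintf(" pr address = %p\n",mxGetData(mx));
                    mexPrintf(" pi address = %p\n",mxGetImagData(mx));
                }
            }
        } else {
            mx = prhs[0];
            mexPrintf("Structure address %d = %p\n",(int)i,(const void *)mx);
            mexPrintf(" pr address = %p\n",mxGetData(mx));
            mexPrintf(" pi address = %p\n",mxGetImagData(mx));
        }
    }
}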
And then run this at the command line:
>> mex cell_prpi.c
>> x = 1:5
x =
1 2 3 4 5
>> c = {x}
c =
[1x5 double]
>> c(2) = c(1)
c =
[1x5 double] [1x5 double]
>> cell_prpi(x)
Structure address 0 = 07E68440
pr address = 22EC4230
pi address = 00000000
>> cell_prpi(c)
Structure address 0 = 07E6D290
pr address = 22EC4230
pi address = 00000000
Structure address 1 = 07E6D290
pr address = 22EC4230
pi address = 00000000
You can see that the mxArray structure address of x, 07E68440, is different from the mxArray structure address of c{1}, 07E6D290, yet they both have the same pr data pointer value, 22EC4230. That tells you that c{1} is a shared data copy of x. But the mxArray structure address of c{2} is exactly the same as that of c{1}, 07E6D290. That tells you that c{2} is a shared reference copy of c{1}. This happened because of the way we created c{2} ... a direct assignment from another cell element. There is a field of the mxArray structure that keeps track of how many reference copies are in existence. It is very memory efficient since no extra mxArray structure is needed for this second cell element (deep copies will only be created when necessary downstream). There are no official mex API functions that will allow you to do this ... you have to use the undocumented function mxCreateReference.
Now, the way you are currently doing it looks fine at first glance. So I am not advising you to change anything at this point. But I thought I would point out this difference between how you are doing things and how MATLAB would natively do things. Using shared data copies for all of the field element copies is more work and can cause a bit of memory bloat if the array size is large.
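To make the distinction concrete, here is a toy model in plain C. It is only a sketch: ToyArray and the three toy_ functions are made-up names, and the layout has nothing to do with the real (undocumented) mxArray header. It just shows what "new header, same data" versus "same header, one more owner" means.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an array header. NOT MATLAB's real mxArray layout. */
typedef struct {
    double *data;     /* plays the role of the pr data pointer */
    size_t  n;        /* number of elements */
    int     refcount; /* how many owners point at this header */
} ToyArray;

/* Allocate a fresh header with its own zero-filled data block. */
ToyArray *toy_create(size_t n)
{
    ToyArray *a = malloc(sizeof *a);
    if (!a) return NULL;
    a->data = calloc(n ? n : 1, sizeof *a->data);
    if (!a->data) { free(a); return NULL; }
    a->n = n;
    a->refcount = 1;
    return a;
}

/* Shared data copy: a NEW header that points at the SAME data block. */
ToyArray *toy_shared_data_copy(const ToyArray *src)
{
    ToyArray *a;
    assert(src != NULL);
    a = malloc(sizeof *a);
    if (!a) return NULL;
    *a = *src;        /* copies the data pointer, not the data */
    a->refcount = 1;  /* the new header starts with a single owner */
    return a;
}

/* Shared reference copy: no new header, just one more owner of the old one. */
ToyArray *toy_reference_copy(ToyArray *src)
{
    assert(src != NULL);
    src->refcount++;
    return src;
}
```

In terms of the session above: x = toy_create(5), c{1} = toy_shared_data_copy(x) and c{2} = toy_reference_copy(c{1}) would give c{1} a different header address than x but the same data pointer, and c{2} the very same header address as c{1}. The toy deliberately leaves out cleanup and whatever bookkeeping MATLAB uses to decide when shared data can be freed.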
Some additional observations from just looking at the code (haven't compiled or used it yet):
---------------------------------------------------------------
1) It appears that this will leak memory in the else case (this is a real leak since the return address of mxArrayToString is not on the garbage collection list ... an oversight by TMW IMO):
if (field_exists){
mxFree(field_to_set);
}else{
field_list[n_existing_fields] = field_to_set;
}
I would instead just do this:
if (!field_exists){
field_list[n_existing_fields] = field_to_set;
}
and then move this down to your cleanup:
mxFree(field_to_set);
(Although this will still leak memory if any of your downstream allocations fail. Could be fixed by writing your own custom mxArrayToString function that used mxMalloc if you really wanted to go through the effort, or just live with the potential small leak)
---------------------------------------------------------------
2) For these comments:
//TODO: This should be checked, we aren't supporting
//anything greater than 1
I think you can easily expand the capability by wrapping your existing for-loops with another for-loop based on the result of mxGetNumberOfElements(S).
---------------------------------------------------------------
3) I am not sure what mxCreateSharedDataCopy does with a NULL input and if it is always safe for all MATLAB versions. You might consider replacing this:
mxSetFieldByNumber(plhs[0], 0, iField,
COPY_ARRAY(mxGetFieldByNumber(S, 0, iField)));
with something like this:
mxArray *mx;
:
if( (mx = mxGetFieldByNumber(S, 0, iField)) != NULL ) {
    mxSetFieldByNumber(plhs[0], 0, iField, COPY_ARRAY(mx));
}
---------------------------------------------------------------
4) You can easily input the index (e.g., could make it an optional input) to replace the hard-coded 0 below with an index from the user:
mxSetFieldByNumber(plhs[0], 0, field_number_of_input, value_to_set);
And of course, if the index is greater than the number returned by mxGetNumberOfElements(S), you can account for that when you first create plhs[0].
---------------------------------------------------------------
5) Per my comment above, for field element copies you might consider making reference copies instead of shared data copies. E.g. change this:
if( (mx = mxGetFieldByNumber(S, 0, iField)) != NULL ) {
    mxSetFieldByNumber(plhs[0], 0, iField, COPY_ARRAY(mx));
}
to this:
mxArray *mxCreateReference(const mxArray *); // Prototype (undocumented function)
:
if( (mx = mxGetFieldByNumber(S, 0, iField)) != NULL ) {
    mxSetFieldByNumber(plhs[0], 0, iField, mxCreateReference(mx));
}
---------------------------------------------------------------
Another way to make reference copies of all the field elements (works because all of the source mxArray's are already of type SUB_ELEMENT and removed from the garbage collection list):
mwSize n_existing_fields = mxGetNumberOfFields(S);
mwSize n_number_of_elements = mxGetNumberOfElements(S);
mwSize i, n = n_existing_fields * n_number_of_elements;
mxArray **mxr = (mxArray **) mxGetData(S);
:
mxArray **mxl = (mxArray **) mxGetData(plhs[0]);
:
for (i = 0; i < n; i++) {
    if( *mxr ) {  /* unassigned field elements are NULL, skip them */
        *mxl = mxCreateReference(*mxr);
    }
    mxl++; mxr++;
}
Then for the one field element you want to replace, just mxDestroyArray the target location first (if necessary) and then set it using mxSetField or mxSetFieldByNumber. You need to use one of these two API functions rather than setting the address directly in order for MATLAB to properly change the mxArray type to SUB_ELEMENT and remove it from the garbage collection list.
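The overall pattern (reference-copy every non-NULL slot, but not the one slot that is about to be replaced) can be sketched outside of mex in plain C. This is a toy model under loud assumptions: ToyElem, toy_reference and toy_copy_slots are hypothetical names, toy_reference only imitates what mxCreateReference is described as doing above, and a real mex routine would still need mxSetFieldByNumber for the replaced slot as just explained.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy element: only the reference count matters for this sketch. */
typedef struct { int refcount; } ToyElem;

/* Hypothetical stand-in for mxCreateReference:
   bump the count and hand back the same pointer. */
ToyElem *toy_reference(ToyElem *e)
{
    e->refcount++;
    return e;
}

/* Build a new slot array of n pointers from src.
   - NULL slots stay NULL (unassigned field elements).
   - Slot `replace` receives new_value and the old occupant is NOT
     reference-copied, since it is being overwritten anyway.
   - Every other slot becomes a reference copy of the source slot.
   Pass replace >= n to copy everything. Caller frees the returned array. */
ToyElem **toy_copy_slots(ToyElem *const *src, size_t n,
                         size_t replace, ToyElem *new_value)
{
    ToyElem **dst;
    size_t i;
    assert(src != NULL || n == 0);
    dst = calloc(n ? n : 1, sizeof *dst);  /* calloc leaves slots NULL */
    if (!dst) return NULL;
    for (i = 0; i < n; i++) {
        if (i == replace) {
            dst[i] = new_value;
        } else if (src[i]) {
            dst[i] = toy_reference(src[i]);
        }
    }
    return dst;
}
```

With three source slots {a, NULL, b} and replace = 2, the result is {a, NULL, new_value}: a ends up with a reference count of 2, the NULL slot is left alone, and b is never touched.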
Hi James,
Thanks for the feedback!
For point 3, what does a NULL input mean? Presumably that would not be possible when calling this from Matlab? Or does a structure array have NULL values for unassigned elements (whereas presumably a single structure element wouldn't have extra fields)?
Since I've got something that sort of works right now I'm moving on to the next part of my project, but I plan on implementing your recommendations at a later point.
Thanks again, Jim
Point 3) When you first create a struct with mxCreateStructMatrix or mxCreateStructArray, all of the field elements will contain NULL values (unless the struct is empty and there are no field elements). I.e., every result of a mxGetField or mxGetFieldByNumber call will return a NULL value since that is what is physically in the data area of the struct ... nothing has been set yet. Same thing for adding a field with mxAddField ... those new extra field elements will physically have NULL in those spots.
You can also get this from the MATLAB m-code level. Anything that creates a struct without assigning field elements will physically have NULL values in those spots. E.g., if S doesn't exist, and you did this at the m-code level:
S(3).stuff = 5;
You didn't assign anything to S(1).stuff or S(2).stuff, so those spots are physically NULL in the mxArray data area. Now, if you access those spots at the m-code level you get an empty double result, i.e. using S(1).stuff is legal and it will be an empty double, but in fact this is something that MATLAB creates on the fly in those situations ... there is no empty double mxArray in those unassigned spots. You can easily check this by passing it to a mex routine and examining those spots.
The above NULL comments apply to cell arrays also, btw.

More Answers (2)

James Tursa
James Tursa on 27 Jul 2016
Edited: James Tursa on 27 Jul 2016
I would start with this if you are trying to get stuff from a mat file:
Although I haven't looked at it in a while to see if it still works with later versions of MATLAB.
If you are doing something else let me know and maybe we can customize it for your needs.
UPDATE:
I just did a quick check and the included savebadnames.c function does not work anymore. Apparently the matPutVariable API function now checks for invalid names. While this is probably a good thing, it also means I can't generate a bad mat file to use for checking the functionality of the main routine loadfixnames.c. I may have to write some custom C code to create the mat file from scratch without using the API functions. But that is not trivial, so it will not happen anytime soon ...
Hopefully the matGetNextVariable API function does not do this check. If it does then loadfixnames.c is toast ...

5 Comments

Hi James,
Thanks for the answer. I'm not sure where the mat file loading comes in, other than you already have (had?) code for handling it. I literally want to be able to do: s.('my awesome variable!') = 3 and have it work (or s = mexSetField(s,'my awesome variable!',3)). For a couple of reasons I'd like this to be a struct. Once I realized the main issue is modifying a struct in mex, the closest thing I could find is Jan's http://www.mathworks.com/matlabcentral/fileexchange/28516-renamefield
I can work off of that. The structure copying that it does is rather sad, but presumably necessary :/
Thanks, Jim
Funny ... I thought you had a struct already with bad field names and needed to fix them ... not the other way around! That's where the mat file stuff comes in btw, because that is typically where these structs with bad field names come from. Good luck with Jan's code.
My apologies. I want to be able to retrieve and set these fields, without fixing the invalid name. I'm trying to make some code that will be as compatible across languages as possible, which means allowing (nearly) arbitrary field names, the only restriction being the length limit :/. I also want to keep things a struct (which has order, and also plays a bit nicer with some other code), rather than use containers.Map.
Some comments:
It sounds like you want to do the equivalent of the following with a mex routine:
Create a struct with arbitrary field names
Add arbitrary field names to an existing struct
Set an arbitrary field element to something, e.g.,
mystruct.('Bad Field Name$') = something
1) Creating a struct with arbitrary field names is easy in a mex routine, since the mxCreateStructArray API function apparently does not check for invalid field names, at least on the R2011b version I just spot checked. Not sure about later versions. If later versions do check for invalid field names, this becomes much more difficult and would probably require hacking into the mxArray. That in itself is very tricky because the method of storing field names in a mxArray is different depending on which version of MATLAB you are running.
2) Modifying an existing struct in a mex routine, either by adding a new field or by setting a field, has the problem of efficiency. The only way to take an input struct and modify it with official API functions is to first create a deep copy of it with mxDuplicateArray, and then do the modifications. If your field elements are large, then you get memory bloat and speed problems. The only way to avoid this is to use unofficial API functions to do what MATLAB does at the m-file level ... modify only the field element you want and make everything else a reference copy. This is memory efficient and fast, but you have no way to do this with official API functions. For a first cut at this, I would advise using the mxDuplicateArray approach just to get things working. Don't worry about the unofficial efficient approach unless you really really need it.
3) I haven't looked at Jan's code so I don't know what it does or how it does it. So I have no advice on using it. But writing the basic functionality described above in a mex routine would only take some 20-30 lines of code (plus argument checking). Let me know if you want me to post it or if you need more help.
Usage code is at the top of the linked file.
There's one change I'm planning to make: splitting the work into two for loops so that I avoid creating a shared copy of the value I am about to overwrite (when the field exists). I should also probably check the length of the input name against the max length.
My intention is to wrap all this in a class with some subsref work and a try/catch for when the name is valid vs invalid (i.e. try to do the assignment and, on failure, try the mex).
I'd be curious as to your feedback.
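As an alternative to try/catch, the valid-versus-invalid decision could be made up front by checking the name itself. Below is a rough sketch in plain C of such a check. The function name is_plain_field_name is made up, the rule (a letter first, then letters, digits or underscores) is the usual MATLAB identifier rule, and 63 is the value that namelengthmax returns. It does not check for MATLAB keywords, and treating over-long names as "not plain" is just an assumption about what a wrapper would want.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NAME_LENGTH_MAX 63  /* what namelengthmax returns in MATLAB */

/* ASCII-only helpers, so the result does not depend on the C locale. */
static int is_ascii_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_ascii_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Returns 1 if name could be assigned with ordinary s.(name) = value
   syntax, 0 if it would need the mex path. Keywords are NOT checked. */
int is_plain_field_name(const char *name)
{
    size_t i, len;
    if (name == NULL) return 0;
    len = strlen(name);
    if (len == 0 || len > TOY_NAME_LENGTH_MAX) return 0;
    if (!is_ascii_letter(name[0])) return 0;   /* must start with a letter */
    for (i = 1; i < len; i++) {
        if (!is_ascii_letter(name[i]) && !is_ascii_digit(name[i])
            && name[i] != '_') {
            return 0;
        }
    }
    return 1;
}
```

For example, "stuff" and "a_1" pass, while "my awesome variable!" and "123 for all!" do not and would be routed to the mex setter.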

Trying to fiddle around with invalid data is probably not going to be robust (or possible, if we've done our jobs well.)
If you're on Windows, why not try using a .NET OrderedDictionary from within MATLAB?
Q = System.Collections.Specialized.OrderedDictionary();
Q.Add('123 for all!', '2734')
Q.Item('does not exist') % returns []
Q.Item('123 for all!') % returns System.String('2734')
char(Q.Item('123 for all!')) % returns the char array '2734'
I imagine there's probably something similar you could do in Java if you need your code to run on a non-Windows platform.

1 Comment

Thanks for the suggestions. The problem with Java (and .NET) is all the extra work that goes on to convert the memory format from these languages into Matlab native entities. Also, I'm using a lot of code that is expecting a structure, not a .NET or Java class. That means I would constantly need to add additional checks into other people's code, or just convert to a structure.
I think it has been well accepted for a while now that writing invalid fields to Matlab structures is possible using mex. I've seen a fair bit of code that relies on this functionality, rather than checking the field names. After all, as I reminded myself while writing code related to this question, Matlab supports retrieval of fields with invalid names, just not writing them. Thus someone that writes invalid names in mex doesn't need to worry about retrieving the data in Matlab.
Given the use of dynamic field referencing in Matlab for over a decade, I'd think it would make more sense to support any string as a field name. Instead, what I've seen way too often is ugly name-mangling code to create a valid name for a field. I think the code and usage of code associated with name changes is way more harmful and confusing than just telling people they need to use parentheses with invalid field names.
Perhaps I'm missing something?
