Why does dir command find files not following pattern in Windows?
I have a series of files that I am trying to pick up with a dir command. I'm able to get the files, but I'm also picking up one extra, only while running the command in Windows, and I'm curious what syntax change I need to make to exclude the extra file.
flist = dir([parentdirectory,'\outputfiles.dat.*']);
Using the above command on a Linux box I get only 'outputfiles.dat.###' files. Using the command on Windows I get 'outputfiles.dat.###' and 'outputfiles.dat'. I would like to not get the 'outputfiles.dat' file.
Working in 2016b
4 Comments
Stephen23
on 7 Apr 2021
Ignoring the second trailing period character seems like a bug to me, especially if the behavior is different on different OSs.
Create a bug report: https://www.mathworks.com/support/bug_reports/faq.html
Walter Roberson
on 7 Apr 2021
Edited: Walter Roberson
on 7 Apr 2021
Ignoring an explicit dot was also the case for MacOS in that release; it was not changed until at least R2020b (I would need to check the release notes for the exact version).
In the historical 8.3 file system, the dot was not stored; the name and extension were stored in fixed-length fields. It was not possible to tell the difference between a file name that ended with a period and no extension and the same base name with no period or extension. This made it necessary for MATLAB to treat abc.* (period expected) as matching abc (no period, empty extension). MATLAB does not probe the file system characteristics to determine whether dots have to be explicit or not. And remember MATLAB supports 8.3 names, so ABC (no dot, no extension) is potentially the 8.3 name for ABC. (dot, no extension).
Should MATLAB react differently for cases where there is more than one dot in the file name? If so, then if the user has T=23.dat and T=23.5.dat you would handle .* differently for the two, even though the user probably does not intend them to be treated differently.
Hmmm, question: on NTFS and matlab would
*3.*
match T=23.dat but not T=23.5.dat ? Does .* match multiple extensions? I suspect that it does.
Stephen23
on 7 Apr 2021
Edited: Stephen23
on 7 Apr 2021
"on NTFS and matlab would 3. match T=23.dat but not T=23.5.dat ? "
>> dir 3.
'3.' not found.
>> dir 23.
'23.' not found.
>> dir 23*
23.5.dat 23.dat
>> dir 23.*
23.5.dat 23.dat
"Does .* match multiple extensions? I suspect that it does."
I see no reason why it shouldn't, the DIR documentation does not place any restriction on what the wildcard can match:
>> dir A.*
A.B A.B.001 A.B.002 A.B.003
This example made me laugh:
>> dir *3.
A.B.003
>> dir *3.*
23.5.dat 23.dat
There it is in a nutshell, ladies and gentlemen.
Walter Roberson
on 7 Apr 2021
Edited: Walter Roberson
on 7 Apr 2021
Darn editor converted my wildcard to bold :(
But it is clear from what you posted that * does match multiple extensions
Accepted Answer
dpb
on 7 Apr 2021
That's the behavior of the OS dir command under Windows; '*' matches anything including nothing.
Unfortunately, in typical MS style, using "???" doesn't work either, it appears. If I try something like
file.dat.001
file.dat.002
...
dir file.dat.?
or
dir file.dat.??
return nothing, because they exclude the files with three characters after the .dat, but
dir file.dat.???
behaves the same as the * does.
Under CMD.EXE, I think you'll be forced to change the naming convention so that there is some character after the second dot that you can be sure will match -- for the above file pattern, either
dir file.dat.0*
dir file.dat.0??
work as desired. Of course, with three places, the leading 0 limits you to 100 files maximum so for more than that you would need another placeholder.
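The same workaround can be tried from MATLAB's own dir. This is only a minimal sketch: the scratch folder and the names file.dat, file.dat.001 and file.dat.002 are assumptions made for illustration, not files from the question.

```matlab
% Build a scratch folder with hypothetical file names for illustration.
tmp = tempname;
mkdir(tmp);
names = {'file.dat', 'file.dat.001', 'file.dat.002'};
for k = 1:numel(names)
    fclose(fopen(fullfile(tmp, names{k}), 'w'));   % create an empty file
end

% Requiring a literal '0' after the second dot leaves nothing for the
% wildcard to collapse, so 'file.dat' should be excluded on either OS.
S = dir(fullfile(tmp, 'file.dat.0*'));
found = sort({S.name});
disp(found)

rmdir(tmp, 's');   % remove the scratch folder and its contents
```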
The JPSoftware CMD replacement TCC (TakeCommand Console) that I use has many extended facilities above and beyond CMD; one is the ability to write
dir file.dat.[0-9]*
which requires a match of a digit in the first position after the second dot which solves the problem neatly. Of course, you have to have the JPSoft command replacement in order to take advantage of such features (as would anybody else who tried to use any code taking advantage of the feature, of course). Hence, while it's neat it's probably not a solution you care about.
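I think you'll simply have to check the pieces returned from
[d,f,e]=fileparts(filename);
and keep only the files whose name part f still ends in '.dat'. Note that e is not empty for the unwanted file: for 'outputfiles.dat', f is 'outputfiles' and e is '.dat'.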
NB: fileparts isn't terribly robust; it will return '.002' as the extension from the above, but will return 'file.dat' as the name; it only looks for the last dot/period found in the string.
As a stylistic point and for some help in coding, I'd suggest using fullfile to build the fully qualified name instead of explicit string catenation--it has the nicety of not requiring you to insert the dividers between filename sections and also automagically uses the system-dependent character.
flist = dir(fullfile(parentdirectory,'outputfiles.dat.*'));
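The post-filtering step could look roughly like the sketch below. It assumes the base name 'outputfiles.dat' from the question and uses a hard-coded, hypothetical list of names in place of a real dir result. Because fileparts splits at the last dot, the unwanted 'outputfiles.dat' comes back with name 'outputfiles' and extension '.dat', so the test compares the name part against the base rather than checking for an empty extension.

```matlab
% Hypothetical listing, as Windows might return it for 'outputfiles.dat.*'.
names = {'outputfiles.dat', 'outputfiles.dat.001', 'outputfiles.dat.002'};
base  = 'outputfiles.dat';   % assumed base name, taken from the question

keep = false(size(names));
for k = 1:numel(names)
    % fileparts splits at the LAST dot: f is everything before it, e the rest.
    [~, f, e] = fileparts(names{k});
    % Keep only names of the form <base>.<something non-empty>.
    keep(k) = strcmp(f, base) && numel(e) > 1;
end
kept = names(keep);
disp(kept)
```

With a real listing, names would be {flist.name} and the same logical index can be applied to the struct array, as in flist = flist(keep).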
25 Comments
Stephen23
on 7 Apr 2021
Edited: Stephen23
on 7 Apr 2021
"That's the behavior of the OS dir command under Windows; '*' matches anything including nothing."
Sure, but that does not explain why the period character is ignored (which is ultimately what the question is about).
The DIR documentation makes no mention that it will ignore trailing period characters of its own volition, so it is very reasonable to expect that they will be interpreted literally. Seems like a bug to me.
It seems like on Windows trailing period characters are silently ignored:
>> dir('A.B.*.') % buggy: expect no files (trailing period ignored)
A.B A.B.001 A.B.002 A.B.003
>> dir('A.B.*') % buggy: why is A.B returned? (trailing period ignored)
A.B A.B.001 A.B.002 A.B.003
>> dir('A.B*') % okay
A.B A.B.001 A.B.002 A.B.003
>> dir('A.B.') % buggy: why is A.B returned? (trailing period ignored)
A.B
>> dir('A.B') % okay
A.B
>> dir('AB') % okay (internal period is NOT ignored)
'AB' not found.
But period characters within the format string are matched literally. I do not see this inconsistent period matching behavior documented anywhere.
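If literal period matching is wanted regardless of platform, one option is to list the folder broadly and apply the wildcard in MATLAB instead. The sketch below assumes a hard-coded, hypothetical list of names standing in for {S.name}; regexptranslate with the 'wildcard' option converts the dir-style pattern into a regular expression with the periods escaped.

```matlab
% Hypothetical names; in practice use S = dir(folder); names = {S.name};
names = {'A.B', 'A.B.001', 'A.B.002', 'AB'};

% regexptranslate turns '*' into '.*', '?' into '.', and escapes each '.'.
% Anchoring with ^ and $ forces the whole name to match the pattern.
rx = ['^' regexptranslate('wildcard', 'A.B.*') '$'];

isMatch = ~cellfun(@isempty, regexp(names, rx, 'once'));
matched = names(isMatch);
disp(matched)
```

Here 'A.B' is rejected because the second period in the pattern has to be present in the name.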
dpb
on 7 Apr 2021
What do you expect should be different between
dir A.B.*.
and
dir A.B.*
with the existing files you show? There are no files with three dots, the * matches anything whether it is/isn't a dot and the OS won't create a file with a trailing dot and no characters following it.
Stephen23
on 7 Apr 2021
Edited: Stephen23
on 7 Apr 2021
"What do you expect should be different between dir A.B.*. and dir A.B.* with the existing files you show?"
I would expect the format A.B.*. to NOT return any files, because exactly as you wrote "There are no files with three dots", which is also exactly why I created that example. And yet DIR does return filenames which do NOT contain three period characters (hence supports my claim that this behavior is either undocumented or buggy).
As my examples show, DIR handles period characters inconsistently.
"the OS won't create a file with a trailing dot and no characters following it"
That is a restriction that might apply to some OSs, but it has zero influence on how I expect DIR formats to be interpreted, which is based on the DIR documentation. Even if the OS changed its rules overnight, I would still expect DIR to work in exactly the same way (i.e. literally match characters that are not described in the documentation as having a special meaning).
I do not expect a tool to decide what I really intended to ask was something different...
What would you expect those two examples to return? (specifically noting what you wrote, that "There are no files with three dots" ) ?
Walter Roberson
on 7 Apr 2021
"The MATLAB dir function is consistent with the Microsoft® Windows® operating system dir command in that both support short file names generated by DOS."
... but short names cannot tell the difference between A (no dot no extension) and A. (dot, no extension)
dpb
on 7 Apr 2021
I expected what happened -- but perhaps that's simply because I was already aware of the way MS CMD.EXE DIR works with the asterisk wildcard and that you can't generate a filename with just "A." where the trailing dot is significant.
Whether it is ideal behavior is something else again...
I miss VAX VMS with the trailing semi-colon and the cycle number support...
One additional extension to my note regarding fileparts earlier -- it is NOT an OS API but Mathworks code and so if the filename character string passed to it ends in a dot such as 'A.B.dat.', then it will return the trailing dot as the extension; as noted it only looks for the last dot in the string and uses that position with no other error/condition checking.
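A quick illustration of that splitting rule, using made-up file names. It only exercises fileparts as string handling, so no files need to exist.

```matlab
% fileparts is plain string handling: it splits at the last dot it finds.
[~, f1, e1] = fileparts('A.B.dat.');       % trailing dot
[~, f2, e2] = fileparts('file.dat.002');   % numbered extension
[~, f3, e3] = fileparts('file.dat');       % no numbered extension

fprintf('name=%s  ext=%s\n', f1, e1);
fprintf('name=%s  ext=%s\n', f2, e2);
fprintf('name=%s  ext=%s\n', f3, e3);
```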
Bob Thompson
on 7 Apr 2021
Thanks for the feedback, I definitely can pretend like I understand the technical aspects a lot better now.
Is there any way to force MATLAB to look for an additional character as the extension? Ultimately, my problem boils down to whether the numeric file extension exists, but dir and the * wildcard seem to indicate, 'all files that have something, or nothing, in place of the *.'
I'm asking mostly as a bandaid, as I won't be able to upgrade my version of MATLAB even if the maybe bug gets fixed later.
Also, I'm weirdly giddy about getting to submit a bug report, probably because for once it might not actually be user error.
dpb
on 7 Apr 2021
Edited: dpb
on 7 Apr 2021
"...I expect DIR formats to be interpreted, ... based on the DIR documentation."
I have/had no such expectations that MathWorks will have done anything beyond what the underlying OS API does, quirks and all for such OS-dependent and OS-specific interface functions.
To me, it would seem even more confusing if
dir A.B.*.
and
!dir A.B.*.
returned different results as they would with your expectation.
It is why, in fact, I have used the aforementioned JPSoftware CMD replacement toolset for thirty-some years in lieu of the default MS-supplied CLI; it has the features to deal with the warts Bill and company created. It is, of course, not a generally available solution for most, so I don't rely on it for the forum other than to occasionally make folks aware of it, as it is of inestimable value if one uses command line tools to any extent at all with Windows.
I don't know when TMW finally started supporting it; it used to be that the "?" placeholder wildcard was interpreted literally and was unusable with dir(). It wasn't until this exercise that I discovered it is now supported.
I don't have any older copies of MATLAB with license activation at the moment so can't go backwards and see how far/long ago that may have been changed.
Bob Thompson
on 7 Apr 2021
Using ? is possible in 2016b, but attempting to couple ? and * to allow for different size numeric extensions runs into the same limitation: a file with no extension is still accepted.
dir(fullfile(directory,'output.dat.?*'));
dpb
on 7 Apr 2021
Edited: dpb
on 7 Apr 2021
Yes, I pointed out that fly in the ointment regarding the behavior of CMD.EXE DIR above.
That is clearly at least a quality of implementation gaffe, if not considered a bug by MS design documents, but it has been the behavior under MS OSes "since forever" so it isn't likely to change.
I still see only two generic solutions --
- change the naming scheme so that a recognizable wildcard pattern works on both OSes given their limitations/behavior, or
- keep as is and search for the unwanted entry in the returned collection and eliminate it.
On the earlier comment, I would not hold out any hope whatsoever that MathWorks will consider the behavior a bug or even a "quality of implementation" shortcoming. The MATLAB DIR() function will eventually call the underlying OS API and return whatever it does for the given input search pattern. If the OS has "issues", they (The MathWorks) will not consider it their job to clean those up. I would venture their position would also match mine above where they would not want the same command passed to the system via the bang operator to return a different result than would the builtin dir() command/function. The latter opinion is, of course, just my opinion. :)
As noted in my earlier answer, if you cannot change the naming scheme to one that the (limited) Windows wildcard matching handles as desired, the workaround looks to be to follow the call to dir(), when running under Windows, by parsing the returned file names via fileparts() and ensuring the returned extension does match.
Stephen23
on 7 Apr 2021
"I have/had no such expectations that MathWorks will have done anything beyond what the underlying OS API does,"
The "Compatibility Considerations" section indicates that MATLAB does its own post-processing of whatever the OS delivers (otherwise this would be dependent on a particular Linux version, whereas it is dependent on a particular MATLAB version).
"To me, it would seem even more confusing if... returned different results as they would with your expectation."
Disagree: if it is not in the documentation, it is a bug.
In any case, it appears that the solution is to switch to a more reputable OS:
fclose(fopen('test.txt.','wt')); % works on Linux!
fclose(fopen('test.txt','wt'));
dir('test.txt.*')
test.txt.
dir('test.txt*')
test.txt test.txt.
dpb
on 7 Apr 2021
"Disagree: if it is not in the documentation, it is a bug."
Or the documentation is just incomplete or not fully compliant with the implementation.
I'd be not just surprised but shocked if TMW were to consider this to be an implementation bug on their side.
Microsoft themselves say
Naming Conventions
The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:
Use a period to separate the base file name from the extension in the name of a directory or file.
Use a backslash (\) to separate the components of a path. The backslash divides the file name from the path to it, and one directory name from another directory name in a path. You cannot use a backslash in the name for the actual file or directory because it is a reserved character that separates the names into components.
...
Do not end a file or directory name with a space or a period. Although the underlying file system may support such names, the Windows shell and user interface does not. However, it is acceptable to specify a period as the first character of a name. For example, ".temp".
dpb
on 7 Apr 2021
Edited: dpb
on 7 Apr 2021
I had noticed the behavior change on Linux platforms earlier -- I haven't had/used one in 20+ years so what knowledge I had at one time is now almost fully gone; I presume the change actually is one to make dir() more in line with what the underlying OS does than it was before?
NB: this also creates another incompatibility between the behavior of the two OSes under MATLAB; it would seem quite possible (likely?) that the prior behavior on Linux was introduced deliberately by TMW in order to make the two more similar.
dpb
on 7 Apr 2021
Whenever you get into OS-specific behavior, "there be dragons!"
One can try to mask them and maybe have consistent behavior in most cases, but almost always one will find something of this sort if one strays from the very straight and narrow most common cases.
Walter Roberson
on 8 Apr 2021
"Disagree: if it is not in the documentation, it is a bug."
But that Compatibility Considerations indicates explicitly that on Windows .* will match files with no extension. So it is in the documentation.
Stephen23
on 8 Apr 2021
"But that Compatibility Considerations indicates explicitly that on Windows .* will match files with no extension. "
In the current online documentation I do not see any explicit statement about the format ".*" on Windows:

The statement "This change of behavior does not apply to Microsoft Windows platforms." is ambiguous, because it relies on knowing what the existing behavior on Windows is, which is not actually stated (it might be implied, but that does not match your comment that it "indicates explicitly").
dpb
on 8 Apr 2021
Edited: dpb
on 8 Apr 2021
BobT, perhaps StephenC will write you a regexp pattern that you can apply to the result of dir() to exclude any w/o an extension in the returned .name field of the directory struct returned. I'm not enough of a regexp maven to want to give it a go; I always end up off in the weeds every time I try one...how's that for volunteering somebody else? :)
That way, however, you'd have one extra step that would be essentially a do-nothing on the one side but cleanup the other side a little more simply than my earlier suggestion as far as high-level code.
Perhaps there's an enhancement request to TMW for dir() to have it do such filtering in its packaging of results returned from the OS.
As my earlier comment alluded to, I have always figured there really isn't anything going on internally other than packing the results returned from the OS API into the returned MATLAB struct array, but I haven't done enough digging to know whether that is or is not so.
Stephen23
on 8 Apr 2021
Perhaps something like this (it checks that the filename ends with ".B.#", for any #):
>> S = dir('A.B.*');
>> {S.name}
ans =
'A.B' 'A.B.001' 'A.B.002' 'A.B.003'
>> X = cellfun(@isempty,regexp({S.name},'\.B\..+$','once'));
>> S(X) = [];
>> {S.name}
ans =
'A.B.001' 'A.B.002' 'A.B.003'
dpb
on 8 Apr 2021
Interesting, Walter. Hadn't noticed that yet having just downloaded R2020b a couple of days ago. But the example in the description turns out to have real meaning here
str = ["String was introduced in R2016b."
"Pattern was added in R2020b."];
pat = "R" + digitsPattern(4) + ("a"|"b"); % assumed definition; the post omitted it
extract(str,pat)
ans =
2x1 string array
"R2016b"
"R2020b"
since Bob T noted in his original posting he's using R2016b. Will be handy going forward, though, indeed.
Walter Roberson
on 8 Apr 2021
Edited: Walter Roberson
on 8 Apr 2021
In R2016b or later (endsWith is not available in earlier releases)
S = dir('A.B.*');
S = S(endsWith({S.name}, num2cell('0':'9')));
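For releases that predate endsWith, a comparable test on the last character can be sketched with isstrprop. The list of names below is hypothetical and stands in for {S.name}; like the endsWith version, it assumes the wanted files end in a digit.

```matlab
% Hypothetical listing standing in for {S.name}.
names = {'A.B', 'A.B.001', 'A.B.002', 'A.B.txt'};

% Test only the final character; the isempty guard avoids indexing an
% empty name.
endsInDigit = cellfun(@(n) ~isempty(n) && isstrprop(n(end), 'digit'), names);
kept = names(endsInDigit);
disp(kept)
```

The same logical vector can index the struct array directly, as in S = S(endsInDigit).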
Stephen23
on 9 Apr 2021
Question: why are you both assuming that the OP's trailing characters are digits?
I just searched this thread, read every comment... I don't see that stated anywhere.
dpb
on 9 Apr 2021
Edited: dpb
on 9 Apr 2021
I used a sequential number for the final extension just for illustration in my first answer because it was easy and because the original post started off with "I have series of files..." so the choice while not necessarily the one the OP used, matched the problem description.
Later, one comment included "Ultimately, my problem boils down to whether the numeric file extension exists,..." and a second comment included "Using ? is possible in 2016b, but attempting to couple ? and * to allow for different size numeric extensions..." both which seem to confirm the assumption.
I would presume if he were using some other convention that by now it would have been illustrated or commented on what was/is different in his pattern-matching attempts.
But, anything more specific hasn't been supplied, agreed; it is still possible some other sequencing convention could be in play.
Stephen23
on 9 Apr 2021
Edited: Stephen23
on 9 Apr 2021
@dpb: thank you for the explanation! I got the feeling that I could not see something obvious that everyone else could see: temporary forum-blindness, perhaps.
This is useful information, it could be applied to make the regular expression more targeted.
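A more targeted expression might look like the sketch below. It assumes the base name 'outputfiles.dat' from the question and a purely numeric trailing extension of any length; the list of names is hypothetical and stands in for {S.name}.

```matlab
% Hypothetical listing standing in for {S.name}.
names = {'outputfiles.dat', 'outputfiles.dat.001', ...
         'outputfiles.dat.12', 'outputfiles.dat.bak'};

% Require the literal base name, a literal period, then one or more digits.
rx = '^outputfiles\.dat\.\d+$';

isNumbered = ~cellfun(@isempty, regexp(names, rx, 'once'));
kept = names(isNumbered);
disp(kept)
```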
dpb
on 9 Apr 2021
.." temporary forum-blindness, ..."
Easy enough -- the interaction was pretty prolonged, albeit interesting and with some contrasting viewpoints. :)
Wonder what OP finally ended up with for a solution...
More Answers (0)