Finding similar folders (size, number of files, and file names)

A specified folder contains several subfolders, for example named a, b, c, d, e, f, ... . Each of these folders contains some .xls files. Sometimes several of the folders contain the same .xls files; for example, the .xls files inside 'a' may also be inside 'd' (exactly the same names, sizes, and number of .xls files).
I need to find the folders whose contents are the same, then rename one of them to '(number of .xls files)' and delete the other duplicates. For example, if 'a', 'f', 'e', 'w', 'c' all contain the same 16 .xls files, then 'f', 'e', 'w', 'c' need to be deleted and 'a' renamed to '(16)'.
These similar folders could also be identified from their total size, and I think this is the simplest way (but it is not accurate, because two folders may have the same size but different contents).
  2 Comments
Jan on 19 Sep 2011
What exactly is the size of a folder? Are you looking for similar or equal files? Would a checksum of the files, or of all files inside a folder (and its subfolders?), help?
mohammad on 19 Sep 2011
I am looking for equal folders, meaning that all files inside one folder are the same as (equal to) all files inside another folder.

Answers (1)

Jan on 19 Sep 2011
Usually files are compared using checksums, e.g. FEX: CalcMD5.
[EDITED]: You can use FEX: DataHash for the struct returned by DIR:
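aDir = dir(FolderName);               % list the contents of the folder
isFile = not([aDir.isdir]);           % drop subfolders, including '.' and '..'
fileSize = [aDir(isFile).bytes];      % DIR stores the file size in the field "bytes"
Hash = DataHash(sort(fileSize));      % sorting makes the hash independent of the file order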
Now the number of files and the sizes of the individual files are taken into account. It would be more accurate to use the names and dates as well:
aDir = dir(FolderName);
isFile = not([aDir.isdir]);
Hash = DataHash(aDir(isFile));
Now you can store the Hash values that have already occurred in a cell string HashList. If any(strcmp(Hash, HashList)) is TRUE, delete the folder; if it is FALSE, rename the folder and append its Hash to the list.
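The bookkeeping described above could look roughly like the following sketch. It is not the code from the answer: to stay runnable without the File Exchange function, it swaps DataHash for a plain text key built from the sorted .xls names and their sizes, so it assumes that equal names and sizes mean equal files and it never compares file contents. The variables rootDir, KeyList, keepList and dupList, as well as the demo folders it creates in a temporary directory, are made up for illustration. Dates are left out of the key on purpose, because copies of a file may carry different time stamps, and in newer MATLAB releases the struct returned by DIR also has a folder field that differs for every folder. Deleting and renaming are only shown as comments, so the sketch is a dry run.

```matlab
% Demo setup (hypothetical data): 'a' and 'c' hold the same files, 'b' differs
rootDir = tempname;                          % fresh folder inside the temp directory
mkdir(rootDir);
names   = {'a', 'b', 'c'};
content = {{'x.xls', 'y.xls'}, {'x.xls'}, {'x.xls', 'y.xls'}};
for k = 1:numel(names)
    mkdir(fullfile(rootDir, names{k}));
    for j = 1:numel(content{k})
        fid = fopen(fullfile(rootDir, names{k}, content{k}{j}), 'w');
        fwrite(fid, repmat('0', 1, 10 * j)); % file j gets 10*j bytes
        fclose(fid);
    end
end

% List the subfolders of rootDir, without '.' and '..'
sub = dir(rootDir);
sub = sub([sub.isdir] & ~ismember({sub.name}, {'.', '..'}));

KeyList  = {};   % keys of the folders seen so far
keepList = {};   % first folder found for each key
dupList  = {};   % folders whose key has occurred before

for k = 1:numel(sub)
    folder = fullfile(rootDir, sub(k).name);
    xls    = dir(fullfile(folder, '*.xls'));
    if isempty(xls)
        continue;                            % ignore folders without .xls files
    end
    % Key: 'name|bytes;' for every file, in sorted name order
    [sortedNames, idx] = sort({xls.name});
    bytes = [xls.bytes];
    parts = [sortedNames; num2cell(bytes(idx))];
    Key   = sprintf('%s|%d;', parts{:});

    if any(strcmp(Key, KeyList))
        dupList{end + 1} = folder;
        % rmdir(folder, 's');                % uncomment to delete the duplicate
    else
        KeyList{end + 1}  = Key;
        keepList{end + 1} = folder;
        % movefile(folder, fullfile(rootDir, sprintf('(%d)', numel(xls))));
    end
end

fprintf('Would delete: %s\n', dupList{:});
```

Two things to keep in mind before enabling the commented lines: rmdir with 's' removes the folder and everything in it, and two different groups that happen to contain the same number of .xls files would both be renamed to the same '(n)', so the new name may need an extra suffix.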
  4 Comments
mohammad on 19 Sep 2011
Thanks. No, I am not sure ;)
I mentioned it because the speed of comparing these folders is very important to me; that is why I suggested it.
Thanks a lot.
Let me check your answer.
Jan on 19 Sep 2011
In general, validity is more important than speed. Creating a wrong result at high speed will lead to trouble; creating a correct result slowly will increase the consumption of coffee.
