4:51 - Thursday, 24 April 2014

Linux: Compare Directory Structure Without Comparing Files

#Topics: differences in directories without file compare,compare directory structure linux

What is the best and simplest way to compare two directory structures without actually comparing the data in the files? This works fine:

diff -qr dir1 dir2

But it’s really slow because it’s comparing files too. Is there a switch for diff or another simple cli tool to do this?

The following (if you substitute the first directory for directory1 and the second for directory2) should do what you’re looking for and swiftly:

find directory1 -type d -printf "%P\n" | sort > file1
find directory2 -type d -printf "%P\n" | sort | diff - file1

The fundamental principle is that it prints every directory, including subdirectories, as a path relative to its base directory (directory1 or directory2), so the two sorted lists can be compared line by line.

This could fall down (produce weird output) if some of the directory names contain newline characters.
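The same idea can be wrapped in a small function that cleans up its temporary file. This is only a sketch, assuming GNU find (needed for -printf); the function names list_dirs and compare_dir_structure are made up for illustration:

```shell
# Hypothetical helper: list every directory under $1 as a path relative to $1.
# LC_ALL=C gives a byte-wise sort order that does not depend on the locale.
list_dirs() {
    find "$1" -type d -printf '%P\n' | LC_ALL=C sort
}

# Hypothetical helper: diff the directory skeletons of $1 and $2.
# Returns diff's exit status: 0 if the structures match, 1 if they differ.
compare_dir_structure() {
    tmp=$(mktemp) || return 2
    list_dirs "$1" > "$tmp"
    # "<" lines exist only under $1, ">" lines exist only under $2.
    list_dirs "$2" | diff "$tmp" -
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Calling compare_dir_structure directory1 directory2 prints nothing when the two trees have the same directories. Note that the argument order to diff is the reverse of the one-liner above, so "<" here marks directories found only in the first argument.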

vimdiff <(cd dir1; find . | sort) <(cd dir2; find . | sort)

will give you a nice side-by-side display of the two directory hierarchies with any common sections folded.

This is the optimum solution:

diff --brief -r dir1 dir2

The --brief switch (the long form of -q) reports only whether the files differ, not the details of the difference.

I usually use rsync for this task:

rsync -nav --delete DIR1/ DIR2

BE VERY CAREFUL to always use the -n, aka --dry-run, option, or it will synchronize (change the contents of) the directories.

This compares files by modification time and size… I think that’s what you really want, or at least you won’t mind if it does that? I got the sense that you just want the comparison to run faster, not that you need it to ignore differences in file contents. If you don’t want it to list files that exist in both directories but differ, I think adding the --ignore-existing option will do that.

Also be aware that not putting a / at the end of DIR1 will cause it to compare the directory DIR1 with the contents of DIR2.

The output ends up being a bit verbose, but it will show you which files/directories differ. Files/directories present in DIR2 and not in DIR1 will be prefaced with the word deleting.

For some situations, @slartibartfast’s answer may be more appropriate, though you’ll need to remove the -type d option to enable the listing of non-directory files. rsync will be faster if you’ve got a significant number of files/directories to compare.
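To make the two warnings above harder to get wrong, the command can be wrapped so that the dry run and the trailing slashes are always applied. A minimal sketch, assuming rsync is installed; the function name rsync_compare is made up for illustration:

```shell
# Hypothetical wrapper: always a dry run, so neither directory is modified.
rsync_compare() {
    # ${1%/}/ strips one trailing slash (if present) and adds exactly one back,
    # so the contents of the two directories are compared with each other.
    rsync --dry-run --archive --verbose --delete "${1%/}/" "${2%/}/"
}
```

Usage would be rsync_compare DIR1 DIR2. Entries that exist only in DIR2 show up prefixed with "deleting", as described above, and nothing is actually deleted.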

ls > dir1.txt
ls > dir2.txt

Then just diff the two lists.

I was just looking for a solution to this problem. The one I liked the most was:

comm <(ls DIR1) <(ls DIR2)

It gives you 3 columns: 1 – files only in DIR1, 2 – files only in DIR2, 3 – files in both DIR1 and DIR2.
For more details look at this blog post.
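If only one of the three columns is wanted, comm can suppress the others with -1, -2 and -3. The following is a sketch rather than a definitive recipe; the function name compare_listing is made up for illustration, and it uses temporary files instead of <( ) so that it also runs in a plain POSIX shell:

```shell
# compare_listing FLAGS DIR1 DIR2   (hypothetical helper, top level only)
#   FLAGS is passed to comm: -23 = only in DIR1, -13 = only in DIR2, -12 = in both.
compare_listing() {
    flags=$1
    left=$(mktemp) || return 2
    right=$(mktemp) || { rm -f "$left"; return 2; }
    # Use the same locale for ls and comm so comm sees the sort order it expects.
    LC_ALL=C ls "$2" > "$left"
    LC_ALL=C ls "$3" > "$right"
    LC_ALL=C comm "$flags" "$left" "$right"
    status=$?
    rm -f "$left" "$right"
    return "$status"
}
```

For example, compare_listing -23 DIR1 DIR2 lists the names that exist only in DIR1. Like the ls version above, it does not descend into subdirectories.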

Use “diff -qr” to list the differences, then use grep to filter out the lines about files that differ, so that only the names that exist in just one of the directories remain.

diff -qr dir1 dir2 | grep -v "Files.*differ" 
