
Copying large directories between computers

[Image: twin windows, a perfect duplication]

Since I'm moving to a new server, I need to copy hundreds of gigabytes of data from my old computer to the new one. This is mainly three folders:

/home
/mnt/cvs
/var/www

There are two main problems here:

1. the files have various permissions and ownership which I do not want to lose, especially for the websites and the CVS repositories (I still have a CVS, but that folder also includes SVN and Git repositories)

2. the files on the source computer require various permissions to be read; namely, I have to be root to make sure I can read all those files...

In many cases, to copy files across computers one uses a tool such as `scp`. In our case, though, we need something that works better. That is, a solution which (1) allows root to do the transfer, (2) does not lose ownership info, and (3) can transfer the data without having to make an intermediate copy on my old computer (because I most likely would not have enough room for a full copy of my data since my drives are quite full already).

So scp is no good. It works recursively, but it totally ignores ownership, and it can't run as root (unless you let your root user log in over ssh and then copy the files under your root account; I guess that's okay on a set of local computers).
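For comparison, this is roughly the best scp can do (the destination path here is just an example):

# -r copies recursively and -p keeps modes and timestamps,
# but every file ends up owned by whatever user you log in as
scp -rp /var/www/ destination:/tmp/www-copy/

Everything arrives, but the ownership information is gone by the time the files land on the other side.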

A solution is to use the good old tar tool. You can run tar across an SSH tunnel and save the output on the other side, or even extract it immediately over there. I have one problem, though: I'm 99.9% sure that I don't want to extract immediately, because I'm pretty sure some users and groups on my old server do not yet exist on the new one. So I want to make sure the extraction works right, and for that I need to receive the whole file and pull out those names. Then I'll be able to verify that they all exist on my new computer.
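Once the archive has landed on the new computer, a quick way to see which owners and groups it expects is to list it verbosely and pull out the second column. A small sketch, assuming the archive is called www-data.tar.bz2 as in the command further down, and with a made up user name for the check:

# column 2 of the verbose listing is owner/group; show the distinct pairs
tar tvjf www-data.tar.bz2 | awk '{print $2}' | sort -u

# then verify that a given user and group exist on the new computer
getent passwd alexis
getent group www-data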

So far, my home folder is some 400Gb and my www folder is over 15Gb... (still going.) So, as you can imagine, it's taking time to transfer everything.

Here is the command line I used:

sudo tar cjf - /var/www/ | ssh destination 'cat > www-data.tar.bz2'

As we can see, I use sudo to run tar. Since there is a pipe in between, ssh is not affected by sudo; only tar runs as root.

tar is asked to write its output to a file (f), and that file is stdout (-). That's what gets piped. It also compresses everything using bzip2 (the j option) so the transfer is a bit faster, although that keeps one of the processors on my source computer fully busy... (I have only 4 here!)
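If the bzip2 compression uses too much CPU for your taste, the usual alternative is gzip (the z option), which is lighter on the processor at the cost of a somewhat larger stream:

sudo tar czf - /var/www/ | ssh destination 'cat > www-data.tar.gz'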

The ssh command creates the tunnel and then executes the command shown between the quotes. Here I want to save the data to a tar file, so I just use cat to redirect its standard input (the piped tar stream) to a file. On my new computer, I have a ton of space (i.e. I'm going from 2Tb to 22Tb, so I think I'll be fine for a little while, but I'll be doing videos, so it will fill up fast anyway.)
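The same pattern works for the other two folders from the list above (the archive names are simply my own choice):

sudo tar cjf - /home/ | ssh destination 'cat > home.tar.bz2'
sudo tar cjf - /mnt/cvs/ | ssh destination 'cat > cvs.tar.bz2'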

Note that you can check the file on the destination through another ssh session, or directly on the destination, with a simple

ls -l www-data.tar.bz2

to see it growing.
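If you would rather not retype that command, watch can rerun it for you every few seconds:

# refresh the listing every 10 seconds, with human readable sizes
watch -n 10 ls -lh www-data.tar.bz2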

If you'd like to extract immediately (i.e. you know that it is safe to do so), then use the tar tool on the other side too:

... | ssh destination 'tar xjf -'

You may want to look at additional options such as -C <dir> (to extract in a specific directory) and the preserve flags, such as -p (--preserve-permissions) and --same-owner, to make sure permissions and ownership are kept as expected.
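Putting those options together, an extract-on-the-fly transfer could look like the following sketch. It assumes the remote side can run tar as root (here via a root login over ssh, which you may not want to allow), otherwise the ownership cannot be restored:

# -p keeps permissions, --same-owner restores ownership (the default when root),
# -C / extracts relative to the destination's root so /var/www ends up in place
sudo tar cjf - /var/www/ | ssh root@destination 'tar xjpf - --same-owner -C /'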

Note that if you need to be root to create the tar, you will also need to be root to extract it, otherwise the ownership cannot be restored. That may be complicated to do in one go. So you may need enough extra space on the destination computer to receive the compressed archive first and then extract it to the right folder.
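In that two-step scenario, the extraction on the destination could look like this once the archive is complete:

# run on the destination, as root, after the transfer has finished
sudo tar xjpf www-data.tar.bz2 --same-owner -C /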