Backup using rsync – part 1

Problem to solve

I have access to a friend’s server and back up some of his files; he has access to my server and backs up some of mine. We have a VPN between the two servers and use samba to share specific files and folders. Our connections are good by UK standards, but nowhere near gigabit fibre speed. We also get several disconnections a day for various reasons: short ISP outages, us playing with the servers (a reboot or whatever) and anything else you can imagine. These are not production servers. Additionally, my friend’s broadband connection has a ‘fair usage policy’ that depends on the time of day.

In short, we have relatively poor connectivity and need a simple, yet reliable, way to exchange files between servers.

rsync is supposed to be the obvious tool for that, yet its flexibility makes it complex to use at times, so here is what we did.

The solution – we thought

rsync command

rsync -avP --stop-at=23:00 --delete --log-file=/home/username/log/backup_rsync.log --bwlimit=1000 -b --backup-dir=/media/username/5TB/somename-backupdir/ /mnt/somename-bla/ /media/username/5TB/rsync-backup-dir
  • -avP
    • a: archive – recurses into directories and preserves file attributes such as timestamps, permissions and ownership
    • v: verbose – because I like seeing things
    • P: combines --partial and --progress. rsync keeps a partially transferred file so that an interrupted download can resume where it stopped instead of restarting from scratch, and it shows the progress of each file download.
  • --stop-at: tells rsync to run until the specified time of day. Check the manual if you want to use this parameter in different ways.
  • --delete: if a file is deleted on the remote server, delete it locally. However, see -b below: we are going to move those deleted files to a backup directory instead of just deleting them.
  • --log-file: where rsync writes its logs.
  • --bwlimit: limits the bandwidth rsync uses. Without a suffix the value is in units of 1024 bytes per second, so 1000 is roughly 1 MB/s.
  • -b: backup – to use in combination with --backup-dir. If a file changes or is deleted on the remote server, rsync does not discard the local copy but moves it to the backup-dir. For my usage this is an improvement over plain --delete, as I’d like to keep a backup of the files deleted on the remote server. I will need to clean up the backup-dir manually, or script something.
  • --backup-dir: the local directory where rsync moves files that changed or were deleted on the remote server.
  • /mnt/somename-bla/: the directory to back up. In my case that’s a ‘local’ directory, as it is a mounted samba share.
  • /media/username/5TB/rsync-backup-dir: the local directory where rsync copies the files from the remote server.
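
Since --delete and -b move files around, it can be worth previewing a run before letting it loose. The sketch below is only an illustration, not something we ran as part of this setup: it builds the same command with rsync’s --dry-run option and prints it for review. The variable names are made up for readability, and the paths are the placeholders used throughout this post.

```shell
#!/bin/sh
# Placeholder paths from this post; the variable names are illustrative only.
SRC=/mnt/somename-bla/
DEST=/media/username/5TB/rsync-backup-dir
BACKUP_DIR=/media/username/5TB/somename-backupdir/

# Same transfer options as the command above, minus --stop-at, --log-file and
# --bwlimit, which are not needed for a preview.
set -- -avP --delete -b --backup-dir="$BACKUP_DIR" "$SRC" "$DEST"

# --dry-run (-n) makes rsync report what it would copy, delete or move to the
# backup directory without changing anything on disk.
preview_cmd="rsync --dry-run $*"

# Print the command for review; paste it into a terminal to run the preview.
echo "$preview_cmd"
```

Note the trailing slash on the source path: with it, rsync copies the contents of the directory rather than the directory itself.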

flock

Running the rsync command once is not a solution. We need to automate it, so we will add it to crontab and run it every 5 minutes, at least for the first few days until the first mirror is complete. Why? rsync might die for any unexpected reason, and when that happens we want it to restart automatically without anyone having to check manually.

However, we don’t want two instances of rsync running concurrently each time a cron job starts, and that’s where flock comes to the rescue, as it does the job beautifully. The new command line is:

flock --verbose -n /tmp/a_name-rsync -c "rsync -avP --stop-at=23:00 --delete --log-file=/home/username/log/backup_rsync.log --bwlimit=1000 -b --backup-dir=/media/username/5TB/somename-backupdir/ /mnt/somename-bla/ /media/username/5TB/rsync-backup-dir"

Where /tmp/a_name-rsync is the path to a lock file of your choice.
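
To see what -n does, here is a small sketch that is not part of our setup: a background sleep stands in for a long rsync run, and a second flock call on the same lock file is refused straight away. The temporary lock file and the sleep durations are arbitrary choices for the demonstration, and it assumes flock from util-linux is installed.

```shell
#!/bin/sh
# Throwaway lock file for the demonstration (hypothetical, not the real one).
LOCK=$(mktemp)

# First instance: takes the lock and holds it for 3 seconds,
# standing in for a long-running rsync.
flock -n "$LOCK" -c "sleep 3" &
sleep 1   # give the background job time to take the lock

# Second instance: -n means "fail immediately instead of waiting for the lock".
if flock -n "$LOCK" -c "echo second instance ran"; then
    second=ran
else
    second=skipped   # flock exits with a non-zero status when the lock is busy
fi
echo "second instance: $second"

# Wait for the first instance to finish, then clean up.
wait
rm -f "$LOCK"
```

This is the behaviour the cron job relies on: while one rsync is still running, every later invocation gives up at once instead of piling up.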

Wrapping up

Because the remote server has a fair usage policy from 07:00 to 23:00 and truly unlimited usage at night, I run two variations of the command, one from 07:00 to 23:00 and one from 23:00 to 07:00. To achieve that we simply need to add two entries in crontab:

*/5 7-22 * * * /usr/bin/flock --verbose -n /tmp/a_name-rsync -c "/usr/bin/rsync -avP --stop-at=23:00 --delete --log-file=/home/username/log/backup_rsync.log --bwlimit=1000 -b --backup-dir=/media/username/5TB/somename-backupdir/ /mnt/somename-bla/ /media/username/5TB/rsync-backup-dir" >>/home/username/log/mylogfilename_rsync.log 2>&1
*/5 23,0-6 * * * /usr/bin/flock --verbose -n /tmp/a_name-rsync -c "/usr/bin/rsync -avP --stop-at=07:00 --delete --log-file=/home/username/log/backup_rsync.log -b --backup-dir=/media/username/5TB/somename-backupdir/ /mnt/somename-bla/ /media/username/5TB/rsync-backup-dir" >>/home/username/log/mylogfilename_rsync.log 2>&1

The first line runs the command from 07:00 to 22:55 with a bandwidth limit of 1000 KiB/s (--bwlimit=1000). The second one runs the command from 23:00 till 06:55 with no bandwidth limit.
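
The two crontab entries differ only in --stop-at and --bwlimit, so the choice could also be made in a small wrapper script. The following is a sketch of that idea under our time windows, not what we actually deployed; the function name rsync_window is made up for the example.

```shell
#!/bin/sh
# rsync_window HOUR -> prints the time-dependent rsync options for that hour.
# HOUR is 0-23 and may carry a leading zero, as printed by `date +%H`.
rsync_window() {
    # Strip a leading zero ("07" -> "7") so the value is a plain integer.
    case $1 in
        0?) hour=${1#0} ;;
        *)  hour=$1 ;;
    esac

    if [ "$hour" -ge 7 ] && [ "$hour" -le 22 ]; then
        # Daytime: the fair usage policy applies, so throttle and stop at 23:00.
        echo "--stop-at=23:00 --bwlimit=1000"
    else
        # Night: no bandwidth limit, stop when the daytime window starts.
        echo "--stop-at=07:00"
    fi
}

# Example: print the options that would apply right now.
echo "options for this hour: $(rsync_window "$(date +%H)")"
```

A wrapper could then pass the unquoted output of rsync_window to rsync together with the fixed options, leaving a single */5 entry in crontab.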

Lessons learned and problems

  1. If the rsync client runs on macOS, use brew to install a recent version of rsync, as the one shipped with macOS is very old and lacks many options (2.6.9 in the default install vs 3.1.3 with brew at the time of writing).
  2. If you mount a samba share, you may want to add the option iocharset=utf8, as mount seems to use a legacy charset by default.
  3. We know the solution does not work that well: over 15% of the files ended up in error and were re-downloaded by rsync (you can find those files in the --backup-dir folder).
  4. More importantly, we ran a checksum on the downloaded files and quite a few did not match the originals. That renders the solution useless. Having 15% of files re-downloaded is bad enough, but having files that rsync considers OK when in fact they are not rules this out as a working solution. We need to be 100% sure that the files rsync ‘thinks’ are OK actually are OK.
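
By default rsync decides whether a file is up to date from its size and modification time, not its content, which is why a separate checksum pass is useful. The sketch below shows one way such a check could be done; it is not the exact procedure we used. The function names are made up, and it assumes sha256sum from GNU coreutils is available (on macOS, shasum -a 256 would be the rough equivalent).

```shell
#!/bin/sh
# tree_checksums DIR -> prints "hash  ./relative/path" for every regular file
# under DIR, sorted by path so two listings can be compared line by line.
tree_checksums() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# compare_trees SRC DEST -> prints the lines that differ between the two
# trees; returns 0 if every file matches and 1 otherwise.
compare_trees() {
    src_list=$(mktemp)
    dest_list=$(mktemp)
    tree_checksums "$1" > "$src_list"
    tree_checksums "$2" > "$dest_list"
    if diff "$src_list" "$dest_list"; then
        status=0
    else
        status=1
    fi
    rm -f "$src_list" "$dest_list"
    return "$status"
}

# Usage with the placeholder paths from this post:
#   compare_trees /mnt/somename-bla /media/username/5TB/rsync-backup-dir
```

Hashing the mounted share reads every file over the network again, so in practice it is probably better to run tree_checksums on each server locally and exchange only the resulting lists.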

Conclusion

Despite its initial appeal, this is not a working solution. We suspect the samba layer is to blame, as it prevents rsync from working at its full potential. Parts 2 and 3 will use rsync-to-rsync solutions.
