This series is written by a representative of the latter group, which consists mostly of what might be called "productivity users" (perhaps "tinkerly productivity users"?). Though my lack of training precludes me from writing code or improving anyone else's, I can nonetheless try to figure out creative ways of using open source programs. And again, because of my lack of expertise, though I may be capable of deploying open source programs in creative ways, my modest technical acumen hinders me from using those programs in what may be the best ways. The open-source character of this series, then, consists in my presenting to the community of open source users and programmers my own crude and halting attempts at accomplishing computing tasks, in the hope that those who are more knowledgeable than I am can offer advice, alternatives, and corrections. The desired end result is the discovery, through a communal process, of optimal or alternate ways of accomplishing the sorts of tasks that I and other open source productivity users need to perform.

Thursday, January 10, 2013

10th installment: resume an scp file transfer

NOTE: as a knowledgeable commenter later pointed out, "[c]urrently (and since around 2004) the default transfer protocol in rsync *IS* ssh. There is no need for the '-e ssh' unless you directly connect to a remote rsync daemon." I have tested this claim and it is, indeed, true that file transfer resumption using rsync does not require the -e ssh bit I stipulated in the instructions below. I did not manage to test an alternate port in the same way. Note, though, that rsync's own -p switch means "preserve permissions" (--perms), not "port," so -p 1234 cannot simply be handed to rsync: the port still has to reach ssh, either through -e 'ssh -p 1234' or through an entry in ~/.ssh/config.

I recently went on vacation and, since my mythtv set-up was, for some crazy reason, not allowing me to do a direct download of recorded programming through the mythweb interface, I needed to find an alternate way of snagging those files. I have an ssh server running on my home LAN, so using scp for this seemed like it should work, though I knew it would take a bit of tinkering. Read on to see what sort of tinkering I did and, just as importantly, a way I discovered of resuming the disrupted download.

I first investigated the possibility of setting up an ssh tunnel, since the computer on my LAN that contains the video files is not the one running the ssh server. But doing that looked a tad beyond my skill level. So I decided I'd just copy those files over to the computer running the ssh server manually, then scp them to the remote computer from there.

These were very large files--i.e., > 2 GB--and, given the fairly limited rate at which I could transfer them, I expected there might be some disruption or disconnection during the download. Prior to beginning the downloads, then, I searched Google for "scp" and "resume," and I immediately came across results that showed how to use the rsync utility to resume disrupted downloads. This encouraged me to go ahead and try the scp download method.

As wikipedia informs us, "rsync is a software application and network protocol for Unix-like systems . . . that synchronizes files and directories from one location to another while minimizing data transfer." Though I had, when previously considering differing ways to back up certain directories on my computers, looked at some documentation on rsync, I had no prior experience with actually using it. Nonetheless, that's the solution I ended up employing--though I needed to do a slight adaptation for my circumstances. It seemed the slight variation I stumbled upon might warrant an entry on this blog.

Before describing in greater detail what I did, I should first at least mention a couple of other results I found that used differing utilities. One candidate used curl and sftp instead of rsync, while the other used the dd command. Since I did not attempt to implement either of those solutions, I will, after simply making note of the fact that those utilities apparently can be used for this, move on.

Getting back to rsync, the bulk of instructions I found for resuming scp transfers using it would not work "out of the box" for me because I run ssh on a non-standard port. For purposes of this blog entry, let's say that's port 1234. The question for me, then, was how to adapt the directions I'd found to the scenario involving the non-standard ssh port my LAN uses.

The resolution turned out to be fairly simple. I finally ran across an incantation very close to what I needed here. A simplified sample entry follows (a slightly more complex rendition can be seen in the description for setting up an alias below):

rsync -P -e 'ssh -p 1234' remote:path-to/remotefile localfile.mpg

Essentially, the command tells rsync to use ssh as the remote shell (the -e switch), while the -P switch, which is shorthand for --partial --progress, tells it two things: that it should display the progress of the transfer, and that it should keep a partially transferred file rather than deleting it, so that re-running the same command picks up from the interrupted transfer instead of starting over. What falls between the inverted commas is the ssh command, options included, that rsync runs--in this case with -p 1234 stipulating the port to connect to on the remote end.
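Since resuming amounts to re-running the same command, the retrying can be wrapped in a small loop. What follows is only a sketch and not something I ran during the trip: the function name fetch_with_resume, the default of five attempts, and the ten-second pause are all made up for illustration, while the remote:path-to/remotefile placeholder and port 1234 are the ones used throughout this entry.

```shell
#!/bin/sh
# Sketch: pull a file with rsync and retry on failure, resuming each time.
# Hypothetical helper; the retry count and pause are arbitrary choices.

fetch_with_resume() {
    src=$1                 # e.g. remote:path-to/remotefile
    dest=$2                # e.g. localfile.mpg
    max_tries=${3:-5}      # give up after this many attempts
    try=1
    # --partial --progress is the long spelling of -P
    until rsync --partial --progress -e 'ssh -p 1234' "$src" "$dest"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts" >&2
            return 1
        fi
        try=$((try + 1))
        # wait a little before reconnecting (RETRY_PAUSE overrides the default)
        sleep "${RETRY_PAUSE:-10}"
    done
}
```

It would be called as fetch_with_resume remote:path-to/remotefile localfile.mpg. Because --partial leaves the incomplete file in place, each new attempt builds on what has already arrived.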

To simplify this resuming process yet further, an alias could, theoretically, be created as suggested here. That would not work in my case, however, since aliases appear not to allow the passing of special options to ssh: entering alias scpresume='rsync -Pazhv -e ssh -p 1234' at the command line caused the port specification to be received as an option by rsync--an option it was unable to interpret. Thus, the more permanent solution of adding that line to .bashrc would not work for me either.
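It may be that the trouble lies in the quoting rather than in aliases as such: without inner quotes, rsync sees -e ssh and then -p 1234 as separate arguments of its own. A sketch of two ways around that, neither of which I tried at the time, follows. The name scpresume_fn is made up here so that it does not clash with the alias.

```shell
# Option 1: outer double quotes, so the inner single quotes survive
# and 'ssh -p 1234' reaches rsync as one argument to -e.
alias scpresume="rsync -Pazhv -e 'ssh -p 1234'"

# Option 2: a shell function (hypothetical name), which sidesteps
# nested quoting altogether; "$@" passes the source and destination.
scpresume_fn() {
    rsync -Pazhv -e 'ssh -p 1234' "$@"
}
```

Either one would then be used as scpresume remote:path-to/remotefile localfile.mpg (or scpresume_fn with the same arguments).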

To make the alias work for me, I had to set up an ~/.ssh/config file with the following content (as discussed here):
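An entry along the following lines is one way to do it. This is a sketch and not necessarily the exact file: the host alias remote and port 1234 are the ones used in this entry, while the HostName and User values are invented placeholders. To stay on the safe side, the snippet writes the example to a scratch file (ssh_config.example, also a made-up name) rather than touching a real ~/.ssh/config.

```shell
# Sketch: write an example Host block to a scratch file.
example_file=${EXAMPLE_SSH_CONFIG:-./ssh_config.example}

cat > "$example_file" <<'EOF'
# "remote" is the short name used on the rsync command line
Host remote
# hypothetical address of the machine running the ssh server
    HostName ssh.example.org
# hypothetical account name on that machine
    User myuser
# the non-standard ssh port used in this entry
    Port 1234
EOF
```

The block can be tried out with ssh -F ./ssh_config.example remote before it is copied into ~/.ssh/config.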

After doing that, I was able to create the alias as alias scpresume='rsync -Pazhv -e ssh' (note that the alternate port now does not need to be specified since it's entered into your ~/.ssh/config file). It can now simply be run as scpresume remote:path-to/remotefile localfile. Once your ~/.ssh/config file is set up, the alias scpresume='rsync -Pazhv -e ssh' line is what needs to be entered into your .bashrc to make scpresume a permanent part of your command-line environment.

Feel free to offer any improvements you may have or other suggestions for using alternate command-line utilities for resuming downloads. This method did the trick for me, and I was able to resume transfer of files that had petered out at varying points during the download process, but of course there could be other methods that are in some way superior.

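ADDENDUM: In light of the comment of a knowledgeable reader, the correct full, non-redundant command to use for file transfer resumption using rsync, once the port has been entered into ~/.ssh/config, would be the one below. Note that the port cannot be given to rsync itself, since rsync's own -p switch means --perms (preserve permissions) and not "port"; without the ~/.ssh/config entry, -e 'ssh -p 1234' is still needed.
rsync -Pazhv remote:path-to/remotefile localfile.mpg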

1 comment:

  1. Currently (and since around 2004) the default transfer protocol in rsync *IS* ssh. There is no need for the '-e ssh' unless you directly connect to a remote rsync daemon. In that case rsync will use plain tcp. As seen by your code above you are not doing that though, so this is a non-issue for your code and you can drop the '-e ssh' as it is default. Leave the '.ssh/config' though, as that would still be needed to instruct ssh how to connect to that host.

    As per 'man rsync':
    There are two different ways for rsync to contact a remote system: using a remote-shell program as the transport (such as
    ssh or rsh) or contacting an rsync daemon directly via TCP. The remote-shell transport is used whenever the source or
    destination path contains a single colon (:) separator after a host specification. Contacting an rsync daemon directly
    happens when the source or destination path contains a double colon (::) separator after a host specification, OR when an
    rsync:// URL is specified (see also the “USING RSYNC-DAEMON FEATURES VIA A REMOTE-SHELL CONNECTION” section for an
    exception to this latter rule).