This series is written by a representative of the latter group, made up mostly of what might be called "productivity users" (perhaps "tinkerly productivity users"?). Though my lack of training precludes me from writing code or improving anyone else's, I can nonetheless try to figure out creative ways of utilizing open source programs. At the same time, that same lack of expertise means that, even when I manage to deploy open source programs creatively, I may not be using them in the most optimal ways. The open-source character of this series, then, consists in my presenting to the community of open source users and programmers my own crude and halting attempts at accomplishing computing tasks, in the hope that those more knowledgeable than I can offer advice, alternatives, and corrections. The desired end result is the discovery, through a communal process, of optimal and/or alternate ways of accomplishing the sorts of tasks that I and other open source productivity users need to perform.

Sunday, March 27, 2016

Miscellaneous Monday quickies: manipulating and excising pages from pdfs

So you've gotten, via inter-library loan, the pdf's you requested to aid you in researching the article you're writing. For purposes of putting them on your e-reading device, the form they're in is probably perfectly suitable. But what if you'd like to do other things with one or more of them, such as printing them out? There is quite a range of pdf manipulation tools that can help you put them in a form more congenial to such aims.

One that I recently discovered, for example, allows you to "n-up" your pdf document, i.e., to place more than one page on each sheet of paper. Should you wish to print the document, this can help you lessen paper waste. The utility is called pdfnup, and a command such as pdfnup --nup 2x2 myfile1.pdf myfile2.pdf lays out four pages per sheet (two columns by two rows). Using 2x1 in place of 2x2 yields two pages per sheet instead of four.

This utility gives results similar to psnup, a utility I have used in the past (and previously written about in this blog) for making booklets with multiple pages per sheet of paper, though pdfnup likely lacks the advanced collating options of psnup. On the other hand, psnup involves greater complexity in that it operates on postscript files, which usually need to be converted to or from some other format.

Getting back to the task at hand, should you wish to print out any of your pdf's with the aim of minimizing paper waste, you may well wish to eliminate extraneous pages from your document. In my experience, for example, inter-library loan pdf documents routinely include one or more copyright notice pages. Before printing such documents, I almost always try to exclude those pages--simple enough if you send them directly to the printer from a pdf reader. But what if you're taking the additional step of n-upping multiple pages per sheet?

As it turns out, pdfnup is actually part of a larger pdf-manipulation suite called pdfjam. And that suite enables you not only to n-up your pdf document, but also to eliminate extraneous pages as part of the same process. To give an example, if you have a fifteen-page document wherein the first and last pages are copyright notices you wish to exclude from your 2-upped version, you'd use the command

pdfjam MyDoc.pdf '2-14' --nup 2x1 --landscape --outfile MyDoc_2-up.pdf

The meaning of the various command switches should be fairly clear: '2-14' selects the page range to keep, --nup 2x1 places two pages side by side on each sheet, --landscape orients the sheet accordingly, and --outfile names the resulting file.

This is just a thin slice of the capabilities offered by one suite of pdf-manipulation tools available under GNU/Linux. I have used other tools, such as pdfedit, pdftk, and flpsed, to good effect as well.
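For the page-excision step on its own, pdftk (to pick one of those) can do the job without any n-upping; a minimal example, reusing the example file name from above:

pdftk MyDoc.pdf cat 2-14 output MyDoc_trimmed.pdf

The cat operation simply copies the listed page range into a new file, leaving the original untouched.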

LATER EDIT: I just discovered another interesting capability of pdfjam; it can convert pdf's from A4 to letter format (and vice versa). The relevant command is pdfjam --paper letter --outfile out.pdf in.pdf

Monday, March 21, 2016

Addendum to the seventh installment: imagemagick as a resource for the budget-constrained researcher

In this installment, I'll cover concatenating multiple image files into a multi-page pdf--a very handy trick the imagemagick utility convert makes possible. But first, a bit of grousing on the subject of academia, budget-constrained researching, and academic publishing.

Pricing for on-line academic resources tends, not surprisingly, to be linked to the budgets of large academic institutions: what institutions can afford to pay for electronic access to some journal or other, for example, will influence the fee charged to anyone else wishing to gain such access. If one is affiliated with such an institution--whether in an ongoing way, as a student, staff, or faculty member, or in a more ephemeral way, such as by physically visiting one's local academic library--one typically need pay nothing at all for such access: the institution pays an annual fee that enables these users to utilize the electronic resource.

But for someone who has no long-term affiliation with such an institution and who may find it difficult to be physically present in its library, some sort of payment may be required. To give an example, while doing some research recently for an article I'm writing, I found several related articles I need to read. I should mention that, since I am still in the early stages of writing my article, there are bound to be several additional articles to which I will need ongoing access. I will address two journal articles in particular, both of which were published between twenty and thirty years ago.

I discovered that both articles were available through an on-line digital library, which offered me the option of downloading them at between twenty and forty dollars apiece. At that rate--since one of my article's main topics has received fairly scant attention in modern times and I might need to review only another twenty or so articles--it could cost me well over six hundred dollars just to research and provide references for this topic. The time I spend actually writing and revising the article--a less tangible cost, and one that will substantially exceed the time spent researching--is an additional "expense" of producing the material.

There are, of course, ways to reduce the more tangible costs. Inter-library loan often proves a valuable resource in this regard, since even non-academic libraries that lack subscriptions to academic publishers or digital libraries can nonetheless request journals or books containing relevant articles or, better yet, obtain such articles for their patrons in electronic format--typically pdf files, often created by scanning paper-and-ink journals or books.

Some digital libraries even offer free--though quite limited--access to their materials. In researching my project I found three articles available from such a site. On registering at the site, I was offered free on-line viewing, as low-resolution scans, of just a couple of articles at a time, each of which remains on one's virtual shelf for a few days; only once those few days have elapsed can another article be viewed. For the budget-constrained researcher this is a promising development, but not an entirely practicable one.

Being able to view an article on a computer screen is a lot better than having no electronic access to it at all. But it is of no help in circumstances where one may be without an internet connection. Having the ability to save the article to an e-reader would be preferable and far more flexible than reading it, one page at a time, in a browser window. But the service seems calculated to preclude that option without payment of the per-article fee of twenty to forty dollars. It turns out, however, that ways around such restrictions can sometimes be discovered. And that, finally, is where the tools mentioned in the first paragraph of this article enter in. Thus, without further ado, on to the technical details.

Some digital libraries actually display, on each page of the article as it appears in the web browser window, a low-resolution image of the scanned page. As I discovered, one can right-click on that image and choose to save it to the local drive. The file name may consist, instead of a humanly-comprehensible name, of just a long series of characters and/or numbers. It may lack any file extension as well. But I found that the page images could, in my case, be saved as png files. Those png files, appropriately named so as to preserve their proper order, can then be concatenated into a multi-page pdf using imagemagick tools. That multi-page pdf can, in turn, be transferred to the reading device of choice. I found that, although the image quality is quite poor, it is nonetheless sufficient to allow deciphering of even the smaller fonts one typically finds in footnotes. Though it involves a bit of additional time and labor, this tactic can further reduce the budget-constrained researcher's more tangible costs.

For reasons that will become obvious, the image files should be saved to a directory empty of other png files. How the images should be named is essentially a numerical question, dependent on the total number of pages in the article. If the total number of pages is in the single digits, it is a simple matter of naming them, for example, 1.png, 2.png, 3.png, and so forth. If the number of pages reaches double digits--from ten through ninety-nine--leading zeros must be introduced so that all file names begin with a pair of digits: for example 01.png, 02.png, 03.png, and so on up through 10.png, 11.png, and beyond; otherwise 10.png would sort before 2.png. The same principle would hold for--God forbid, since the task would become quite tedious--articles whose page count reaches triple digits.

Provided imagemagick is already installed, once the saving is done, the very simple formula convert *.png full-article.pdf can be used to produce the pdf of concatenated image files. Since the file names are zero-padded numbers, the shell expands *.png in the proper order, and the pages end up concatenated accordingly.
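Incidentally, if the pages were initially saved with unpadded single-digit names, a quick shell loop can fix them up before running convert; a small sketch, assuming a bash-like shell and a directory containing only these page images:

for f in [0-9].png; do
    [ -e "$f" ] && mv "$f" "0$f"    # 1.png becomes 01.png, and so on
done
convert *.png full-article.pdf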

In the next installment I will be covering manipulation of pdf's provided through inter-library loan services--focusing on removal of extraneous pages (e.g., copyright-notice pages) routinely included by those services.

Thursday, February 4, 2016

Miscellaneous Thursday quickies: what's your bi-directional syncing utility?

So I've been pursuing a research project for the last year or so and have been locating and saving material related to it, as well as doing some of my own writing in the area. I keep that material in a particular folder. That's all fine and good. The problem is that I want the ability to work on the project while I'm at any of 3 different computers--computers that are often located in 3 different locales, some of which are even remote from my LAN. So, how to host the same folder on all three machines, and keep it current with the most recent changes made on any of the 3 computers?

I intend for this to be a manual process, i.e., one that will involve me manually running some program or script on each of the three machines in order to update the folder. I should also mention that I have access to a shell account where I can run a number of utilities that can facilitate this--so a 4th computer, technically speaking, is involved as well. I envision the shell account functioning as a sort of central hub for keeping said folders in sync: a sort of master copy of the folder can be stored there, and each of the three machines can synchronize with that folder as the need arises.

I'm still trying to puzzle out how to pull all this together and am looking at the sorts of software/utilities that can accomplish the task. I've only tested out one option thus far--bsync. I opted for that in an initial foray for its simplicity: it's just a python script that enhances the functionality of rsync (a great sync utility, but one that does not do bi-directional synchronization). So all I needed to do was download the script and make it executable.

Using the utility, I was able to put the most current copy of the folder at my shell account by just running bsync MyFolder me@my.shellacct.com:MyFolder (the MyFolder directory must already exist at the remote address). So I've at least made a beginning.
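Spelled out, the hub workflow I have in mind is just that same invocation run from whichever machine I happen to be working on (bsync, being bi-directional, works out which side holds the newer changes):

# on machine A, after a work session there:
bsync MyFolder me@my.shellacct.com:MyFolder
# later, on machine B, before picking up the work:
bsync MyFolder me@my.shellacct.com:MyFolder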

That said, I'm still in the early stages of investigating approaches to the sort of bi-directional synchronization I'm after. Tests with bsync have gone well so far but, if I'm understanding correctly, this utility does not deal well with sub-folders--which could be an issue in my use scenario: it seems bsync works best on a directory containing only files, while my directory has a few sub-directories under it.
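If that limitation really does hold, one crude workaround might be to run bsync once per sub-directory--an untested sketch, assuming a bash-like shell and that each corresponding sub-directory has already been created on the remote side:

for d in MyFolder/*/ ; do
    bsync "$d" "me@my.shellacct.com:MyFolder/$(basename "$d")"
done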

Other possible options I've found are csync (which uses smb or sftp), osync, bitpocket, and FreeFileSync. The first 3 of these are most attractive to me since they are command-line utilities. FreeFileSync is a graphical utility, though it does appear that it can be run from the command line as well. I should also mention unison, which I've looked at but not pursued--the reason being that it apparently requires that the same version be installed on all concerned machines, which is something that will be unrealistic in my case (Arch runs on 2 machines, an older Ubuntu on another, and BSD on the fourth).

So, what is your bi-directional synchronization software preference? Any further tips or pointers to add on accomplishing this task?

Tuesday, January 12, 2016

Addendum to 11th installment: Lynx; scraping credentialed web pages

Sort of a dramatized headline for what I've accomplished using the command-line Lynx browser, but not too far from the mark. I've described in previous entries how I've used lynx to accomplish similar goals of extracting target information from web pages, so this entry is a continuation along those same lines.

I recently signed up for a prepaid cellular plan touted as being free, though it is limited to a certain (unreasonably low, for most) number of minutes per month. The plan has thus far worked well for me. The only real issue I've come across is that I hadn't discovered any easy way to check how many minutes I've used and how many are left. The company providing the service is, of course, not very forthcoming with that sort of information: they have a vested interest in getting you to use up your free minutes, hoping you'll then conclude you should buy a paid plan from them, one that includes more minutes. The only way I'd found for checking current usage was to log in to their web site and click around till I reached a page showing that data.

Of course I am generally aware of the phenomenon of web-page scraping and have also heard of python and/or perl scripts that can perform more or less automated interactions with web pages (youtube-dl being one example). So I initially thought my task would require something along those lines--quite the tall order for someone such as myself, knowing next to nothing about programming in either python or perl. But then I ran across promising information that led me to believe I might well be able to accomplish the task using the tried and true lynx browser, and some experimentation proved that this would, indeed, allow me to realize my goal.

The information I discovered came from this page. It describes how to record, to a log file, all keystrokes entered during a particular lynx browsing session--something reminiscent of the way I used to create macros under Microsoft Word when I was using that software years ago. The generated log file can then, in turn, be fed to a subsequent lynx session, effectively automating certain browsing tasks, such as logging into a site, navigating to a page, then printing it (to a file, in my case). Add a few other utilities like cron, sed, and mail, and I have a good recipe for getting the cellular information I need into an e-mail delivered to my inbox on a regular basis.

The initial step was to create the log file. An example of the command issued is as follows:

lynx -cmd_log=/tmp/mysite.txt http://www.mysite.com

That, of course, opens the specified URL in lynx. The next step is to enter such keystrokes as are necessary to get to the target page. In my case, I needed to press the down arrow key a few times to reach the login and password entry blanks. I then typed in the credentials, hit the down arrow again, then the "enter" key to submit them. On the next page I needed to hit the "end" key, which took me all the way to the bottom of that page, then the up arrow key a couple of times to get to the link leading to the target page. Once on the target page, I pressed the "p" key (for print), then the "enter" key (for print to file), at which point I was prompted for a file name. Once I'd entered the desired file name and pressed the "enter" key again, I hit the "q" key to exit lynx. In this way, I produced the log file I could then use for a future automated session at that same site. Subsequent testing using the command

lynx -cmd_script=mysite.txt http://www.mysite.com

confirmed that I had, in fact, a working log file that could be used for retrieving the desired content from the target page.

The additional steps for my scenario were to turn this into a cron job (no systemd silliness here!), to use sed to strip out extraneous content from the beginning and end of the page I'd printed/retrieved, and to get the resulting material into the body of an e-mail sent to myself at given intervals. The sed/mail part of this goes something like the following:*

sed -n 24,32p filename | mail -s prepaid-status me@mymail.com

* I can't go into particulars of the mail program here, but suffice it to say that you need a properly edited configuration file for your mail-sending utility (I use msmtp) for this to work.
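For what it's worth, the cron side amounts to a small wrapper script plus a crontab entry. The following is only a rough sketch: the file names are placeholders (the sed input is whatever name was given at lynx's print-to-file prompt), the line numbers and schedule are illustrative, and the TERM setting is a guess at what lynx may want when run without an interactive terminal:

#!/bin/sh
# fetch-prepaid.sh -- replay the recorded lynx session, then mail the relevant lines
export TERM=vt100
lynx -cmd_script=/tmp/mysite.txt http://www.mysite.com
sed -n 24,32p /home/me/prepaid-page.txt | mail -s prepaid-status me@mymail.com

And the crontab line to run it every Monday morning:

0 7 * * 1  /home/me/bin/fetch-prepaid.sh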

Friday, April 17, 2015

Addendum to 12th installment: watermarks with copyright notice using LaTeX

So, it's been a while. And there's been plenty I could have blogged about on the tech front. Like when I copied my Arch install to another hard drive, making it bootable. But I didn't. And now I've forgotten important details of how I did it. Oh well.

I can blog about this next item, though, which is still fresh in memory. I've got to write up some articles and am sending them out for proofreading. So I wanted to mark them as drafts, something I already know how to do and have blogged about previously.

I decided to modify things a bit for the current task, though. This time I'm using only one utility to do the watermarking--LaTeX--and I'm tweaking things a bit further.

The challenge this time is making a watermark with a line break, as well as one that contains text with differing font sizes in the two lines. I want a really large font for the first line, which marks the document as a draft--as in my previous installment--but I want a really small font for the second line this time. That second line is where a copyright notice will be located.

Without further ado, here's the MWE (TeX-speak for minimum working example) I've come up with for accomplishing this:
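(Sketched here with the draftwatermark package; the particular font sizes, gray level, and copyright wording are placeholders to adjust to taste.)

\documentclass{article}
\usepackage{draftwatermark}
\SetWatermarkAngle{45}      % run the mark diagonally across each page
\SetWatermarkLightness{0.9} % light gray, so the underlying text stays readable
\SetWatermarkScale{1}
\SetWatermarkText{\shortstack{%
  {\fontsize{60}{70}\selectfont DRAFT}\\
  {\fontsize{9}{11}\selectfont Copyright \textcopyright\ \the\year\ by the author. All rights reserved.}}}
\begin{document}
Body text goes here.
\end{document}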



This will place, diagonally across each page of the document, a notice in light gray and in very large letters containing the word DRAFT. Underneath that, in a much smaller font, will be text asserting the document author's copyright claim over the material. It's also got a nice little feature that auto-inserts the year, so it can be reused in various documents over a period of time, relieving the composer of having to fiddle with minor details like dates.

So, that's about it for this installment!

LATE ADDITION: Just today I ran across another means of watermarking, one that can be applied to already-existing .pdf files. It involves the program pdftk and is quite simple. You create a .pdf that is empty except for your desired watermark, then use pdftk to add that watermark to the existing .pdf. Something like the following:

pdftk in.pdf background back.pdf output out.pdf

(I ran across that here). I used LibreOffice Draw to create such a background watermark and easily added that to an existing .pdf. It worked great, though it should be noted that the watermark won't cover graphics; I assume there must be a way to make it do so, however.
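One candidate for covering graphics, if I'm reading the pdftk documentation correctly, is its stamp operation, which overlays the mark on top of each page rather than placing it underneath:

pdftk in.pdf stamp stamp.pdf output out.pdf

The stamp .pdf would presumably need a transparent (or no) background so as not to obscure the page beneath it.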

Tuesday, November 18, 2014

Miscellaneous Tuesday quickies: creating and using ssh tunnels/proxies

This entry will concern tunneling so as to get around port-blocking restrictions. It's something I learned about some years ago but had a difficult time wrapping my head around. While I can't say I understand it a whole lot better now, I can at least say that I've been able to get it working.

In my case it was needed because I've been working in a library whose wifi network is set up to block a variety of non-standard ports. That's a problem for me since I run my (command-line) e-mail client on a home computer, and I connect to that computer via ssh--which, in turn, runs on a non-standard port. So, when I work in this library, I am unable to connect to my home machine to do e-mailing. There are also occasional problems with sites being blocked on this network (and, no, I'm not trying to view porn).

For this to work, one must have access to some machine outside the wifi network that runs ssh on port 443. I happen to have a shell account with a service that has just such a set-up.

In my case, then, I can set up the tunnel as follows:

ssh -L localhost:1234:my.dyndns.url:12345 -p 443 my-user@my.shellacct.net

I am asked for my password, then logged into my shell account, and the tunnel is thus opened.

Then, to connect to the ssh server running on my home machine, I simply issue, from another terminal,

ssh -p 1234 localhost
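As an aside, the same tunnel can be baked into ~/.ssh/config so the long command needn't be retyped; a sketch, with made-up host aliases:

# ~/.ssh/config
Host hub
    HostName my.shellacct.net
    User my-user
    Port 443
    LocalForward 1234 my.dyndns.url:12345

Host home-via-hub
    HostName localhost
    Port 1234

With that in place, ssh hub opens the tunnel, and ssh home-via-hub (in another terminal) connects to the home machine through it.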

To get around the occasional page blocking I've run into, I first downloaded a browser I would dedicate to this task--namely, qupzilla. Then I needed to set up a SOCKS proxy, which is done via ssh, like so:

ssh -D 8080 my-user@my.shellacct.net -p 443

After that, it's a matter of configuring qupzilla (or your preferred browser) to route web traffic over the SOCKS proxy you've just created. That's done by going to Edit > Preferences > Proxy Configuration, ticking the Manual configuration radio button, selecting socks5 from the drop-down menu, then entering localhost into the field next to that and 8080 in the Port field. Click Apply and OK, and qupzilla will be set to route its traffic over your proxy, thus avoiding the blocks instituted by the wifi network.

With this basic information, it should be clear how other sorts of ssh tunnels and/or proxies could be set up.

Friday, July 18, 2014

Miscellaneous Friday quickies: The Plop boot manager; what is it and why would you need it?

Prefatory remark: I am uncertain of the licensing status of the project discussed in the posting below, but I suspect it may not--unlike most of the other utilities I discuss in this blog--be open-source.

Unless you, like me, are stubbornly trying to repurpose aging hardware, this tool might not be of much interest to you. But it allowed me to get an older machine booting from USB when BIOS limitations were interfering, so I decided to document here the fairly simple procedure I followed, in case it might be of benefit to others.

How old was said machine? Well, old enough not only to have problems booting from USB flash drives (BIOS USB boot options were limited to USB floppies or ZIP disks), but to have a floppy drive in it as well! A single-core machine, as you might guess, although the motherboard did at least have SATA headers--which made it a good candidate for the project I had in mind.

I learned, through some on-line research, about the Plop boot manager--touted for enabling systems to boot from USB even where BIOS settings limited it--and that floppy disk images of the boot manager are included in the download. So I dug up and dusted off a floppy, downloaded the image, and wrote it to the floppy the GNU/Linux way--using dd:

dd if=/path/to/plpbt.img of=/dev/fd0

And that disk did, in fact, allow me to boot sanely from a USB thumb drive I'd plugged into the system. On boot, a starfield simulation reminiscent of the old Star Trek intro (ok, I'm dating myself here) appeared on the screen, in the foreground of which was a boot menu from which I could select the medium I wished to boot. And, sure enough, USB was one of the items.

That wasn't quite all I needed for my own application, however; you see, my hope was to have this machine run headless. So, how to make the boot manager default to booting from the USB drive after a certain number of seconds?

For that, it turns out, I needed another program included in the download, called plpcfgbt. That program is what allows one to modify the binary file plpbt.bin. And plpbt.bin needs to be accessed somehow as well in order to modify it--accomplished in my case by mounting plpbt.img as a loop device.

So I ran mount -o loop /path/to/plpbt.img /mnt/loop. Once the image had been thus mounted, I cd'd to where I'd downloaded plpcfgbt and ran plpcfgbt cnt=on cntval=4 dbt=usb /mnt/loop/plpbt.bin: that gave the boot menu a four-second countdown, after which the computer would automatically boot from USB. I then rewrote that image to the floppy, again using dd. So, mission accomplished.
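To recap the sequence (paths are illustrative; note the unmount before rewriting, so the change is flushed back into the image file first):

mount -o loop /path/to/plpbt.img /mnt/loop
plpcfgbt cnt=on cntval=4 dbt=usb /mnt/loop/plpbt.bin
umount /mnt/loop
dd if=/path/to/plpbt.img of=/dev/fd0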

Except some other aspects of that machine's operation proved not very suitable to the application I was hoping to deploy it for, so I'm not sure it will finally be put into service. But that's another story . . .