Field Notes of an Audacious Amateur: 2016

This series is written by a representative of the latter group, which is comprised mostly of what might be called "productivity users" (perhaps "tinkerly productivity users?"). Though my lack of training precludes me from writing code or improving anyone else's, I can, nonetheless, try and figure out creative ways of utilizing open source programs. And again, because of my lack of expertise, though I may be capable of deploying open source programs in creative ways, my modest technical acumen hinders me from utilizing those programs in what may be the most optimal ways. The open-source character, then, of this series, consists in my presentation to the community of open source users and programmers of my own crude and halting attempts at accomplishing computing tasks, in the hope that those who are more knowledgeable than me can offer advice, alternatives, and corrections. The desired end result is the discovery, through a communal process, of optimal and/or alternate ways of accomplishing the sorts of tasks that I and other open source productivity users need to perform.

Thursday, October 20, 2016

Discussion topic 1: vim or emacs for personal wiki, etc?

Instead of my more typical how-to, this posting will aim to solicit input from readers. I realize I may be inviting some sort of flame war, but rest assured that my intentions are sincere: I really am largely ignorant of the respective virtues and flaws of the two utilities on which I want to solicit input, having barely dabbled in either. My hope is that I might get input here on which will be the more worthwhile one to put further effort into learning.

First, a bit about my aims. I set up for myself some time ago a personal wiki--a vehicle for keeping track of inspiring ideas and of tasks on which I am now working or will need to work at some point in the future, and a receptacle for various tech tips I have employed and may need again, but which I have difficulty remembering. I wanted the wiki to be accessible to me, not just at home, but from the internet as well. Much as I wanted to keep the wiki's set-up and maintenance simple, at the time I deemed that deploying it under a web-serving scenario would be required. To that end, I implemented the MoinMoin wiki on a machine I administered.

That scenario has worked out acceptably well over the last few years. But it is now time to take that machine out of service. So I will be needing to reconstitute my wiki and so am revisiting the matter of how I will set up and administer it.

Having a preference for simple, resource-frugal utilities, I am hoping I might migrate my wiki to some command-line interface. The overhead and complexity of the web server most wikis involve is not really justified for my use case: in fact, I might be engaging in a bit of hyperbole in claiming that I use what I have as a real wiki--it's used more like just an organizer.

Under my best-case envisioned scenario, I could either ssh into my machine to consult and/or modify my wiki, or perhaps even host it at a shell account to which I have access. It's an appealing thought and one I hope I will soon be able to implement.

So far as I can tell, the two candidate command-line tools I might use for this are vimwiki and emacs in org-mode. And I must admit that my experience with both has been very slight. In fact, I've tried to avoid using either vim or emacs, typically gravitating to nano for the sorts of needs either of those utilities might otherwise fulfill. Perhaps emacs will be slightly preferable, since development on the vimwiki plugin seems to have ceased a little over 4 years ago, while emacs org-mode seems to have a quite active and extensive user and development base.

Both utilities, with their arcane interfaces and keystroke options, have left me baffled and even trapped on more than one occasion. Having a few years of command-line interaction under my belt, I did recently manage a bit of experimentation with emacs org-mode--at least enough to convince me that it could be a suitable new vehicle for my wiki.

I had pretty much written off vim as a possible vehicle since, in past attempts to utilize it, I have found it even more obtuse and intractable than emacs. But that situation recently changed somewhat when I realized that one of the best tools for doing some routine maintenance on one of my Arch systems employs vimdiff. Having used that a few times, I can now say that I've recently managed, under the guise of vimdiff, to use vim successfully for some system maintenance tasks.

And just today I learned that emacs has its own diff implementation--ediff--as well. So emacs might also be serviceable in the system-maintenance capacity, should I decide that it will be more worthwhile to try and better learn emacs org-mode.

Bottom line here is that it looks as though I am going to be using one or other of these utilities routinely, so it is time I started learning it better. And I can, at the same time, use whichever I will be learning better, as the new vehicle for my wiki.

So I am looking for guidance and recommendations on which is likely better to suit my needs and disposition--or whether I might even have overlooked some other command-line utility for creating and maintaining a personal wiki. I should state that I am unlikely ever to do any sort of programming, so whatever may be the relative advantages of either with respect to coding will be largely irrelevant for me. Rather, I would be using them for perhaps some simple editing functions, and mostly for some routine maintenance tasks (comparing updated config files with files already on my system) and for managing my wiki.

Let the discussion begin.

Afterthought: perhaps even creating a markdown file containing my wiki's material, then converting that to html for viewing with elinks/lynx could even work? In other words, a sort of homebrew solution?
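A minimal sketch of that homebrew idea, assuming pandoc as the markdown-to-html converter (the post names no converter, so pandoc is a guess at one suitable tool) and a hypothetical file named wiki.md:

```bash
#!/usr/bin/env bash
# Sketch only: convert one markdown file to a standalone html page.
# pandoc and the file names are assumptions, not part of the original post.

build_wiki() {
    local src=$1
    local out=${1%.md}.html        # wiki.md -> wiki.html, same directory
    # --standalone writes a complete html document rather than a fragment.
    pandoc --standalone --output "$out" "$src" && printf '%s\n' "$out"
}
```

The function prints the name of the html file it wrote, so viewing the result over ssh could be as short as lynx "$(build_wiki wiki.md)".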

Saturday, June 18, 2016

Another addendum to the seventh installment: imagemagick as a resource for the budget-constrained researcher continued

Continuing on the theme of the last couple of entries, the budget-constrained researcher may own or wish to acquire a piece of hardware that can aid him in obtaining needed materials from research libraries. For example, he may need but a single article or perhaps a chapter from a book. Or maybe a bibliography. Obtaining limited segments of larger works such as those mentioned may be more difficult through inter-library loan channels, especially in the case of works that may contain more than one item of potential interest. It can happen that the researcher will need to go in person to inspect the work to decide which part is actually required.

Suppose the researcher is already on the premises of the local academic library and has located target material. Should he not wish to check out a book, he is left with the option of himself scanning the material. Of course these libraries often have scanners that they make available to patrons, so that is one possible option. Yet another option is for the researcher to use his own scanner, and this is where highly portable hardware such as the Magic Wand portable scanner comes in.

I invested in one of these a few years ago and it has proved quite useful. One of the problems with using it, though, is that, for the bulk of books and journals (i.e., those of more standard size) it seems to work best to scan pages sideways--horizontally, rather than vertically. In other words, it works best to start from the spine and to scan toward page edges. This, obviously, entails that roughly every other page will have been scanned in a different orientation from the page preceding.

Once all pages are scanned, they can be easily rotated in bulk to the desired orientation--by 90 or 270 degrees, as the case may be--using imagemagick's mogrify command, like so: mogrify -rotate 90 *.JPG (a command like convert -rotate 270 PTDC0001.JPG PTDC0001-270rotate.jpg would perform much the same function while preserving the original file). In my case, it seemed best to first copy all odd, then all even, image files to separate directories prior to rotating them.
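The copy-then-rotate step might be sketched as follows. This assumes the scanner names its files like PTDC0001.JPG, with the digits giving the scan sequence, and that odd- and even-numbered scans each share one orientation; the function names and the odd/ and even/ directories are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: copy scans into odd/ and even/ subdirectories according to the
# number in each file name, then rotate each set separately.

split_by_parity() {
    local src=$1 f base num
    mkdir -p "$src/odd" "$src/even"
    for f in "$src"/*.JPG; do
        [ -e "$f" ] || continue        # glob matched nothing
        base=$(basename "$f" .JPG)
        num=${base//[!0-9]/}           # keep only the digits, e.g. 0001
        [ -n "$num" ] || continue      # skip names without a number
        if (( 10#$num % 2 )); then     # 10# stops 0008 being read as octal
            cp -- "$f" "$src/odd/"
        else
            cp -- "$f" "$src/even/"
        fi
    done
}

# mogrify overwrites files in place, which is why the function above
# works on copies rather than on the original scans.
rotate_dir() {
    local dir=$1 degrees=$2
    mogrify -rotate "$degrees" "$dir"/*.JPG
}
```

With the scans in a directory called scans, the sequence would be split_by_parity scans, then rotate_dir scans/odd 90 and rotate_dir scans/even 270 (or the reverse, depending on which way the wand was drawn across the page).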

At this point, I needed to name all files with either odd or even numbers. My bash scripting skills being modest at best, I began scouring the internet for a solution that would aid me in doing this sort of bulk renaming. I found such a script at http://www.linuxquestions.org/questions/programming-9/bash-script-for-renaming-files-all-odd-617011/ and a bit of testing proved it to be a solution that would work for me.

I modified the script into 2 variants and named them rename-all_odd.sh and rename-all_even.sh. The scripts look as follows:

and

It was then a simple matter of copying all the renamed files into a separate directory and concatenating them into a pdf, as was covered in a previous installment.
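For readers who want to try the renaming step, here is a minimal sketch of one way to give a directory of scans every second number. It is not the script from the linuxquestions.org thread; the function name and the page-NNNN.jpg naming pattern are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: rename the JPGs in one directory to every second number,
# starting at 1 (odd pages) or at 2 (even pages).

renumber_by_two() {
    local dir=$1 n=$2 f
    # The glob is expanded once, in sorted order, so scan order is kept
    # and the freshly renamed .jpg files are never picked up again.
    for f in "$dir"/*.JPG; do
        [ -e "$f" ] || continue
        # -n refuses to overwrite an existing file.
        mv -n -- "$f" "$dir/$(printf 'page-%04d.jpg' "$n")"
        n=$((n + 2))
    done
}
```

Running renumber_by_two odd 1 and renumber_by_two even 2 would leave two sets of files whose names interleave correctly once they are copied into a single directory.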

Sunday, March 27, 2016

Miscellaneous Monday quickies: manipulating and excising pages from pdfs

So you've gotten, via inter-library loan, the pdf's you requested to aid you in researching the article you're writing. For purposes of putting them on your e-reading device, the form they're in is probably perfectly suitable. But what if you'd like to do other things with one or more of them, such as printing them out? There is quite a range of pdf manipulation tools that can help you put them in a form more congenial to such aims.

One that I recently discovered, for example, allows you to "n-up" your pdf document, i.e., to put more than one page per sheet of paper. Should you wish to print the document, this can help you lessen paper waste. The utility is called pdfnup, and the relevant command for accomplishing this end is pdfnup --nup 2x2 myfile1.pdf myfile2.pdf. That puts four pages on each sheet, in a two-by-two grid; presumably one could use 2x1 in place of 2x2 to get two pages per sheet, or 4x4 to get sixteen.

This utility gives results similar to psnup, a utility I have used in the past (and previously written about in this blog) for making booklets comprised of multiple pages per sheet of paper, though pdfnup likely lacks the advanced collating options of psnup. But psnup involves greater complexity in that it operates on postscript files, which usually need to be converted to or from some other format.

Getting back to the task at hand, should you wish to print out any of your pdf's with the aim of minimizing paper waste, you may well wish to eliminate extraneous pages from your document. In my experience, for example, inter-library loan pdf documents routinely include one or more copyright notice pages. Before printing such documents, I almost always try to exclude those pages--simple enough if you send them directly to the printer from a pdf reader. But what if you're taking the additional step of n-upping multiple pages per sheet?

As it turns out, pdfnup is actually part of a larger pdf-manipulation suite called pdfjam. And that suite enables you to not only n-up your pdf document, but to eliminate extraneous pages as part of the same process. To give an example, if you have a fifteen-page document wherein the first and last pages are copyright notices that you wish to exclude from your 2-upp'ed version, you'd use the command

pdfjam MyDoc.pdf '2-14' --nup 2x1 --landscape --outfile MyDoc_2-up.pdf

The meaning of the various command switches will, I think, be obvious.
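When the page count varies from one document to the next, the range could be worked out rather than typed by hand. The sketch below assumes pdfinfo (from poppler-utils, not mentioned in the post) is available to report the page count; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: build the "drop first and last page" range for any page count,
# then hand it to the pdfjam command shown above.

# Print the range covering everything but the first and last page.
inner_pages() {
    local total=$1
    if [ "$total" -lt 3 ]; then
        echo "need at least 3 pages" >&2
        return 1
    fi
    printf '%d-%d\n' 2 "$((total - 1))"
}

two_up_without_covers() {
    local in=$1 out=$2 total
    # pdfinfo prints a line such as "Pages:          15".
    total=$(pdfinfo "$in" | awk '/^Pages:/ {print $2}')
    pdfjam "$in" "$(inner_pages "$total")" --nup 2x1 --landscape --outfile "$out"
}
```

For the fifteen-page example, inner_pages 15 prints 2-14, so two_up_without_covers MyDoc.pdf MyDoc_2-up.pdf would run the same pdfjam command as above.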

This is just a thin slice of the capabilities offered by just one suite of pdf manipulating tools available under GNU/Linux. I have used other tools, such as pdfedit, pdftk, and flpsed, to good effect as well.

LATER EDIT: I just discovered a new and interesting functionality of pdfjam; it can convert pdf's from a4 to letter format (and vice versa). The relevant command is pdfjam --paper letter --outfile out.pdf in.pdf

Monday, March 21, 2016

Addendum to the seventh installment: imagemagick as a resource for the budget-constrained researcher

In this installment, I'll cover concatenating multiple image files into a multi-page pdf--a very handy trick the imagemagick utility convert makes possible. But first, a bit of grousing on the subject of academia, budget-constrained researching, and academic publishing.

Pricing for on-line academic resources tends, not surprisingly, to be linked to budgetary allowances of large academic institutions: what institutions can afford to pay for electronic access to some journal or other, for example, will influence the fee that will be charged to anyone wishing to gain such access. If one is affiliated with such an institution--whether in an ongoing way such as by being a student, staff, or faculty member, or in a more ephemeral way, such as by physically paying a visit to one's local academic library--one typically need pay nothing at all for such access: the institution pays some annual fee that enables these users to utilize the electronic resource.

But for someone who has no long-term affiliation with such an institution and who may find it difficult to be physically present in its library, some sort of payment may be required. To give an example, while doing some research recently for an article I'm writing, I found several related articles I need to read. I should mention that, since I am still in the early stages of writing my article, there are bound to be several additional articles to which I will need ongoing access. I will address two journal articles in particular, both of which were published between twenty and thirty years ago.

I discovered that both articles were available through an on-line digital library. I was offered the option of downloading the articles at between twenty and forty dollars apiece. At that rate--since one of my article's main topics has received fairly scant attention in modern times and I might need to review only another twenty or so articles--it could cost me well over six hundred dollars just to research and provide references for this topic. The time I spend actually writing and revising the article--the less tangible cost, which will exceed by a substantial amount the time spent researching--is an additional "expense" for producing the material.

There are, of course, ways to reduce the more tangible costs. Inter-library-loan often proves a valuable resource in this regard since even non-academic libraries who may lack subscriptions to academic publishers or digital libraries can nonetheless request journals or books containing relevant articles or, even better yet, obtain for their patrons such articles in electronic format--typically pdf files--these latter often having been created by scanning from paper-and-ink journals or books.

Some digital libraries even offer free--though quite limited--access to their materials. In researching my project I found three articles available from such a site. On registration at their site, they offered free on-line viewing, in a low-resolution scan, of just a couple of articles--those being made available for viewing for only a few days. Once the limited number of articles was reached, only at the end of those few days could another article be viewed. For purposes of the budget-constrained researcher, while this is a promising development, it's not an entirely practicable one.

Being able to view an article on a computer screen is a lot better than having no electronic access to it at all. But it also is of no help in those circumstances where one may be without an internet connection. Having the ability to save the article to an e-reader would be preferable and far more flexible than reading it, one page at a time, in a browser window. But the service seems calculated to preclude that option without payment of the twenty- to forty-dollar per article fee. It turns out, however, that sometimes ways around such restrictions can be discovered. And that, finally, is where the tools mentioned in the first paragraph of this article enter in. Thus, without further ado, on to the technical details.

Some digital libraries actually display, on each page of the article that appears as you go about reading it in a web browser window, a low-resolution image of the scanned page. As I discovered, one can right-click on that image and select to save it to the local drive. The file name may have, instead of a humanly-comprehensible name, just a long series of characters and/or numbers. And it may, as well, lack any file extension. But I discovered that the page images could, in my case, be saved as png files. Those png files, then, appropriately named so as to cause them to retain their proper order, could then, using imagemagick tools, be concatenated into a multi-page pdf. That multi-page pdf can then be transferred to the reading device of choice. I found that, although the image quality is quite poor, it is nonetheless sufficient to allow deciphering of even such smaller fonts as one typically finds in footnotes. Although involving a bit of additional time and labor, using this tactic can yet further defray the budget-constrained researcher's more tangible costs.

For reasons that will become obvious, the image files should be saved to a directory empty of other png files. How the images are saved is essentially a numerical question and is dependent on the total number of pages in the article. If the total number of pages is in the single digits, it would be a simple matter of naming them, for example, 1.png, 2.png, 3.png, and so forth. If the number of pages reaches double digits--from ten through ninety-nine--zeros must be introduced so that all file names begin with pairs of numbers; for example 00.png, 01.png, 02.png, and so forth. The same formula would hold for--God forbid, since the task would become quite tedious--articles with total pages reaching triple digits.

Provided imagemagick is already installed, once the saving is done, the very simple formula convert *.png full-article.pdf can be used to produce the pdf of concatenated image files. Since the files have numerical prefixes, the program will automatically concatenate them in the proper order.
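The zero-padding matters because the shell expands *.png in character order, so 10.png would sort ahead of 2.png. If the pages were already saved as 1.png, 2.png, and so on, a small helper could pad the names after the fact. This is a sketch only; pad_page_names is a hypothetical name, and it assumes the file names consist of nothing but the page number.

```bash
#!/usr/bin/env bash
# Sketch: pad purely numeric png names to a fixed width so that the
# shell glob lists them in page order (01.png, 02.png, ... 10.png).

pad_page_names() {
    local dir=$1 width=$2 f base new
    for f in "$dir"/*.png; do
        [ -e "$f" ] || continue
        base=$(basename "$f" .png)
        case $base in
            ''|*[!0-9]*) continue ;;   # skip names that are not all digits
        esac
        # 10# forces base ten, so a name like 08 is not read as octal.
        new=$(printf "%0${width}d.png" "$((10#$base))")
        [ "$f" = "$dir/$new" ] || mv -n -- "$f" "$dir/$new"
    done
}
```

After pad_page_names . 2 in the directory of saved pages, the convert command above should pick the files up in the right order.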

In the next installment I will be covering manipulation of pdf's provided through inter-library loan services--focusing on removal of extraneous pages (e.g., copyright-notice pages) routinely included by those services.

Thursday, February 4, 2016

Miscellaneous Thursday quickies: what's your bi-directional syncing utility?

So I've been pursuing a research project for the last year or so and have been locating and saving material related to it, as well as doing some of my own writing in the area. I keep that material in a particular folder. That's all fine and good. The problem is that I want the ability to work on the project while I'm at any of 3 different computers--computers that are often located in 3 different locales, some of which are even remote from my LAN. So, how to host the same folder on all three machines, and keep it current with the most recent changes made on any of the 3 computers?

I intend for this to be a manual process, i.e., one that will involve me manually running some program or script on each of the three machines, in order to update the folder. I should also mention that I have access to a shell account where I can run a number of utilities that can facilitate this--so a 4th computer, technically speaking, is involved as well. I envision the shell account functioning as a sort of central hub for keeping said folders in sync: a sort of master copy of the folder can be stored there and each of the three machines can synchronize with that folder as the need arises.

I'm still trying to puzzle out how to pull all this together and am looking at the sorts of software/utilities that can accomplish the task. I've only tested out one option thus far--bsync. I opted for that in an initial foray for its simplicity: it's just a python script that enhances the functionality of rsync (a great sync utility, but one that does not do bi-directional synchronization). So all I needed to do was download the script and make it executable.

Using the utility, I was able to put the most current copy of the folder at my shell account by just running bsync MyFolder me@my.shellacct.com:MyFolder (the MyFolder directory must already exist at the remote address). So I've at least made a beginning.
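Since the plan is to run the sync by hand on each machine, the command could be wrapped in a small function kept in each machine's shell startup file. The sketch below only wraps the bsync invocation shown above; the HUB and DRY_RUN variables and the function name are hypothetical conveniences, not bsync features.

```bash
#!/usr/bin/env bash
# Sketch: one command to sync a folder against the shell-account hub.
# Set DRY_RUN=1 to print the command instead of running it.

HUB=${HUB:-me@my.shellacct.com}

sync_with_hub() {
    local folder=$1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'bsync %s %s:%s\n' "$folder" "$HUB" "$folder"
    else
        bsync "$folder" "$HUB:$folder"
    fi
}
```

The idea would be to run sync_with_hub MyFolder before starting work on a given machine and again when finished, so the hub copy always carries the latest changes.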

That said, I'm still in the early stages of investigating approaches to do the sort of bi-directional synchronization I'm after. Tests with bsync have gone well so far but, if I'm understanding correctly, this utility does not deal well with sub-folders--which could be an issue in my use scenario; it seems bsync will work best on a folder or directory that contains only files, while my directory has a few sub-directories under it.

Other possible options I've found are csync (which uses smb or sftp), osync, bitpocket, and FreeFileSync. The first 3 of these are most attractive to me since they are command-line utilities. FreeFileSync is a graphical utility, though it does appear that it can be run from the command line as well. I should also mention unison, which I've looked at but not pursued--the reason being that it apparently requires that the same version be installed on all concerned machines, which is something that will be unrealistic in my case (Arch runs on 2 machines, an older Ubuntu on another, and BSD on the fourth).

So, what is your bi-directional synchronization software preference? Any further tips or pointers to add on accomplishing this task?

Wednesday, January 13, 2016

Addendum to 11th installment: Lynx; scraping credentialed web pages

Sort of a dramatized headline for what I've accomplished using the command-line Lynx browser, but not too far from the mark. I've described in previous entries how I've used lynx to accomplish similar goals of extracting target information from web pages, so this entry is a continuation along those same lines.

I recently signed up for a prepaid cellular plan touted as being free, though it is one limited to a certain (unreasonably low, for most) number of minutes per month. The plan has thus far worked well for me. The only real issue I have come across is that I had not yet discovered any way easily to check how many minutes I've used and how many are left. The company providing the service is, of course, not very forthcoming with that sort of information: they have a vested interest in getting you to use up your free minutes, hoping thereby that you'll realize you should buy a paid plan from them, one that includes more minutes. The only way I'd found for checking current usage status is to log in to their web site and click around til you reach a page showing that data.

Of course I am generally aware of the phenomenon of web-page scraping and also have heard of python and/or perl scripts that can perform more or less automated interactions with web pages (youtube-dl being one example). So I initially thought my task would require something along these lines--quite the tall order for someone such as myself, knowing next to nothing about programming in either python or perl. But then I ran across promising information that led me to believe I might well be able to accomplish this task using the tried and true lynx browser, and some experimentation proved that this would, indeed, allow me to realize my goal.

The information I discovered came from this page. There is found a description of how it is possible to record to a log file all keystrokes entered into a particular lynx browsing session--something reminiscent of the way I used to create macros under Microsoft Word when I was using that software years ago. The generated log file can then, in turn, be fed to a subsequent lynx session, effectively automating certain browsing tasks, such as logging into a site, navigating to, then printing (to a file, in my case) a page. Add a few other utilities like cron, sed, and mail, and I have a good recipe for getting the cellular information I need into an e-mail that gets delivered to my inbox on a regular basis.

The initial step was to create the log file. An example of the command issued is as follows:
lynx -cmd_log=/tmp/mysite.txt http://www.mysite.com

That, of course, opens the URL specified in lynx. The next step is to enter such keystrokes as are necessary to get to the target page. In my case, I needed to press the down arrow key a few times to reach the login and password entry blanks. I then typed in the credentials, hit the down arrow again, then the "enter" key to submit the credentials. I then needed to hit the "end" key on the next page, which took me all the way to the bottom of that page, then the up arrow key a couple of times to get to the link leading to the target page. Once I got to the target page, I pressed the "p" key (for print), then the "enter" key (for print to file), at which point I was prompted for a file name. Once I'd entered the desired file name and pressed the "enter" key again, I hit the "q" key to exit lynx. In this way, I produced the log file I could then use for a future automated session at that same site. Subsequent testing using the command
lynx -cmd_script=/tmp/mysite.txt http://www.mysite.com

confirmed that I had, in fact, a working log file that could be used for retrieving the desired content from the target page.

The additional steps for my scenario were to turn this into a cron job (no systemd silliness here!), use sed to strip out extraneous content from the beginning and end of the page I'd printed/retrieved, and to get the resulting material into the body of an e-mail that I would have sent to myself at given intervals. The sed/mail part of this goes something like this:*
sed -n 24,32p filename | mail -s prepaid-status me@mymail.com

* I can't go into particulars of the mail program here, but suffice to say at least that you need a properly edited configuration file for your mail sending utility (I use msmtp) for this to work.
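Pulled together, the whole job might look something like the sketch below. The path /tmp/status.txt stands in for whatever file name was typed at the print-to-file prompt during recording (the post does not give it), the function names are hypothetical, and the line range 24 to 32 is simply the one from the sed example above. Whether lynx needs a TERM value set when run from cron is something to test on the system at hand.

```bash
#!/usr/bin/env bash
# Sketch: replay the recorded lynx session, keep only the lines of
# interest from the saved page, and mail them.

# Print lines first..last of a saved page (24 and 32 reproduce the
# sed -n 24,32p example).
extract_status() {
    local file=$1 first=$2 last=$3
    sed -n "${first},${last}p" "$file"
}

check_minutes() {
    # The keystroke log already contains the print-to-file name entered
    # during recording; /tmp/status.txt is an assumed placeholder for it.
    lynx -cmd_script=/tmp/mysite.txt http://www.mysite.com >/dev/null
    extract_status /tmp/status.txt 24 32 | mail -s prepaid-status me@mymail.com
}
```

Saved as a script that ends by calling check_minutes, this could be scheduled with a crontab entry along the lines of 0 8 * * 1 /home/me/bin/check-minutes.sh, the path and schedule being placeholders.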
