I'm not a registered Twitter user and have never managed to think of a compelling reason to be one. In fact, the only time I ever really have or want anything to do with Twitter is when some Twitter feed comes up in an internet search. And all I do in those cases is read any relevant text and move on. I suppose I'm not much of a socialite and accordingly have little interest in social media phenomena such as this.
Recently, however, I became interested in joining a service that sends out invitations periodically on Twitter. Not having an account and not being interested in much of anything else Twitter represents or offers, I'm at a distinct disadvantage in this case: what am I supposed to do, start checking it every day for possibly months on end in hopes of stumbling upon the desired invitation? Not for me, obviously.
But I soon began to realize, based on other web-scraping and scheduling jobs I'd set up recently, that I would likely be able to automate the task of checking this Twitter feed for invitations. I had tools like text-mode browsers that seemed to render Twitter pages pretty well, as well as commands like grep for finding target text. And of course cron could play a key role in automating things as well. Accomplishing the task actually turned out to be quite simple.
I had already set up a way to check a Twitter feed using keystrokes and rendering the text in a terminal on my desktop: elinks -dump -no-numbering -no-references https://twitter.com/TargetTwitt | tail -n +21 | head -n -8 | less seemed to do the job just fine.* The problem with that approach, with regard to the task at hand, is that I would need to remember to use the key combination to check for invitations daily.
The next step, then, could be to recruit grep to search for target text--a keyword like "invit"--which, if found in the text essentially scraped from the Twitter feed, would trigger my machine to send me an e-mail. Since I already regularly use mailx to auto-send myself various sorts of e-mails, most of that aspect of this task was already in place as well.
The command I tested, and that seemed to bring these various elements together, is as follows: body="$(elinks -dump -no-numbering -no-references https://twitter.com/TargetTwitt | grep -A 1 -B 1 the)" && echo "$body" | mailx -s Twit-invite me@my-em.ail.** That line uses, for testing purposes, a common word (the article "the") as the search string, to prove that the whole pipeline works together as expected.
The command first dumps text from the Twitter feed to stdout, then pipes it to grep, which looks for the target text-string. If the string is found, it is included--along with a couple of adjacent lines--in the body of an e-mail that mailx sends to me (the scheme assumes that a valid SMTP transport mechanism has been set up for mailx--a topic beyond the scope of this brief post). If the term is not found--something I also tested by changing the search term to one I was sure would not be included in the Twitter feed--nothing further is done: the scraped text simply gets discarded and no e-mail is sent.*** The test passed with flying colors, so the only remaining thing to implement was a daily cron job.
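Once the real search term is swapped in, the daily crontab entry amounts to something like the following sketch (the hour is arbitrary, and the feed, search term, and address are just the placeholders used above):

# check the feed once a day at 7:30 A.M. and mail myself any lines matching "invit"
30 7 * * * body="$(elinks -dump -no-numbering -no-references https://twitter.com/TargetTwitt | grep -A 1 -B 1 invit)" && echo "$body" | mailx -s Twit-invite me@my-em.ail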
Though this configuration seems to work well and looks as though it will serve my purposes just fine, it likely could be improved upon. Should any readers of this blog have suggestions for improvements, feel free to post them in the comments section below.
* lynx, cURL, wget, and other tools could likely replace elinks, and might even be more effective or efficient. Since I know elinks fairly well and use it for other, similar tasks, I did not investigate any of those.
** Command found and copied in large part from https://unix.stackexchange.com/questions/259538/grep-to-search-error-log-and-email-only-when-results-found.
*** More precisely, I think what happens is that when a string searched for with grep is not found, grep returns exit code 1, which, in this series of commands, means the part after && never runs (&& means something like "proceed to the next command only on successful completion of the previous command").
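A quick way to see that behavior at a prompt (the file name and strings here are just stand-ins):

grep -q invit scraped.txt && echo "found: mail would be sent"
grep -q no-such-term scraped.txt && echo "this never prints"

Assuming scraped.txt contains the string "invit" but not "no-such-term", grep exits with status 0 in the first case and the echo runs; in the second it exits with status 1, so the part after && is skipped.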
The Field notes of an Audacious Amateur series is offered in the spirit of the open source movement. While the concept of open source is typically associated with computer programmers, there is a growing body of those who don't know jack about programming, but who nevertheless use the creations of open source programmers. . . .
This series is written by a representative of the latter group, which is composed mostly of what might be called "productivity users" (perhaps "tinkerly productivity users"?). Though my lack of training precludes me from writing code or improving anyone else's, I can nonetheless try to figure out creative ways of using open source programs. At the same time, that same lack of expertise means that, even when I manage to deploy open source programs creatively, I may not be using them in the best ways. The open-source character of this series, then, consists in presenting to the community of open source users and programmers my own crude and halting attempts at accomplishing computing tasks, in the hope that those more knowledgeable than I can offer advice, alternatives, and corrections. The desired end result is the discovery, through a communal process, of better or alternate ways of accomplishing the sorts of tasks that I and other open source productivity users need to perform.
Wednesday, January 13, 2016
Addendum to 11th installment: Lynx; scraping credentialed web pages
Sort of a dramatized headline for what I've accomplished using the command-line Lynx browser, but not too far from the mark. I've described in previous entries how I've used lynx to accomplish similar goals of extracting target information from web pages, so this entry is a continuation along those same lines.
I recently signed up for a prepaid cellular plan touted as being free, though it is limited to a certain (unreasonably low, for most) number of minutes per month. The plan has thus far worked well for me. The only real issue I have come across is that I had not yet discovered any easy way to check how many minutes I've used and how many are left. The company providing the service is, of course, not very forthcoming with that sort of information: they have a vested interest in getting you to use up your free minutes, hoping you'll thereby realize you should buy a paid plan from them, one that includes more minutes. The only way I'd found for checking current usage status is to log in to their web site and click around until you reach a page showing that data.
Of course I am generally aware of the phenomenon of web-page scraping and have also heard of python and/or perl scripts that can perform more or less automated interactions with web pages (youtube-dl being one example). So I initially thought my task would require something along those lines--quite the tall order for someone such as myself, knowing next to nothing about programming in either python or perl. But then I ran across promising information that led me to believe I might well be able to accomplish this task using the tried and true lynx browser, and some experimentation proved that this would, indeed, allow me to realize my goal.
The information I discovered came from this page. It describes how to record, to a log file, all keystrokes entered during a particular lynx browsing session--something reminiscent of the way I used to create macros under Microsoft Word when I was using that software years ago. The generated log file can then, in turn, be fed to a subsequent lynx session, effectively automating certain browsing tasks, such as logging into a site, navigating to, then printing (to a file, in my case) a page. Add a few other utilities like cron, sed, and mail, and I have a good recipe for getting the cellular information I need into an e-mail that gets delivered to my inbox on a regular basis.
The initial step was to create the log file. An example of the command issued is as follows:
lynx -cmd_log=/tmp/mysite.txt http://www.mysite.com
That, of course, opens the specified URL in lynx. The next step is to enter such keystrokes as are necessary to get to the target page. In my case, I needed to press the down arrow key a few times to reach the login and password entry blanks. I then typed in the credentials, hit the down arrow again, then the "enter" key to submit the credentials. On the next page I needed to hit the "end" key, which took me all the way to the bottom of that page, then the up arrow key a couple of times to get to the link leading to the target page. Once I got to the target page, I pressed the "p" key (for print), then the "enter" key (for print to file), at which point I was prompted for a file name. Once I'd entered the desired file name and pressed the "enter" key again, I hit the "q" key to exit lynx. In this way, I produced the log file I could then use for a future automated session at that same site. Subsequent testing using the command
lynx -cmd_script=mysite.txt http://www.mysite.com
confirmed that I had, in fact, a working log file that could be used for retrieving the desired content from the target page.
The additional steps for my scenario were to turn this into a cron job (no systemd silliness here!), use sed to strip out extraneous content from the beginning and end of the page I'd printed/retrieved, and to get the resulting material into the body of an e-mail that I would have sent to myself at given intervals. The sed/mail part of this goes something like
sed -n 24,32p filename | mail -s prepaid-status me@mymail.com*
* I can't go into particulars of the mail program here, but suffice it to say that you need a properly edited configuration file for your mail-sending utility (I use msmtp) for this to work.
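For what it's worth, the pieces pull together into the sort of small script that cron can call daily. What follows is only a sketch: the paths, the sed line range, and the address are placeholders, and it assumes the recorded session ends by printing the target page to /tmp/usage.txt.

#!/bin/sh
# cron provides no terminal, so give lynx a terminal type to work with
TERM=vt100
export TERM
# replay the recorded keystrokes; the recording ends by printing the usage page to /tmp/usage.txt
lynx -cmd_script=/home/me/mysite.txt http://www.mysite.com
# keep only the lines containing the minutes figures and mail them to myself
sed -n 24,32p /tmp/usage.txt | mail -s prepaid-status me@mymail.com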
Wednesday, November 27, 2013
14th installment: home-brewed WOD by e-mail daily
I used to subscribe to an on-line dictionary's word-of-the-day (WOD) program. That entailed signing up, using a valid e-mail address, on their web site so that they would, each day, send a different WOD along with its definition to that address. The service proved to be a bit flaky, however, and the e-mails would sometimes get caught up in my spam filter. So, somewhere along the line--perhaps owing to an e-mail address change--I stopped receiving those educational e-mails.
I'd had in the back of my mind going back to using that service but hadn't signed up again--all the while having a nagging suspicion that it must be possible, using open source tools, to cobble together some way of doing this sort of thing from my own computer, thereby obviating the need to sign up for some service. But could I, with my modest technical acumen, actually pull this off? Read on to find out the result.
Meantime, as I've continued learning my way around GNU/Linux and computing in general, I made some headway in learning how to use the program remind to help me keep track of scheduled appointments and to-do items, even progressing so far as puzzling out how to get my computer to e-mail me my schedule on a regular basis. Perhaps I'll write more about that accomplishment--of which I'm quite proud--in a future entry.
The relevance of that observation to the present post is that I learned how to use the program mail, along with the small msmtp utility, to send myself automated reminder e-mails from my system when triggered by cron. So some major ingredients were actually already in place that would allow me finally to implement my own, home-brewed WOD-by-e-mail solution.
This was perhaps the final piece of the puzzle for me, although another crucial piece had been under my nose recently as well, something I ran across while investigating bash functions (I wrote about that a few installments earlier, as you can see here). By adapting one of the bash functions I'd found, I was first able to see the WOD from the command line by simply issuing wod from a command prompt. But I soon began forgetting to do that, which spurred me to consider once again having the WOD somehow e-mailed to me.
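The function, as adapted for my ~/.bashrc, looked roughly like the following--a reconstruction rather than the exact code I found, using the same site and search phrase that appear below:

# print the word of the day in the terminal
wod() {
    lynx -dump -nonumbers "http://www.wordthink.com/" | grep -A 10 -m 1 "Today's Word of the Day"
}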
Finally, putting two and two together, I realized I could adapt the thrust of that function to my needs by having its output placed into the body of an e-mail that would be automatically sent to me each day at 6 A.M. Following is a description of how I did that.
A key ingredient I have not yet mentioned is the text-mode browser lynx, which produces the text rendering of the web page that gets parsed for material to insert into the e-mail body; I didn't mention it because lynx and I go back a long, long way--clear back to the close of the twentieth century, to be precise. The line, swiped straight from the bash function I found on the web, is as follows: lynx -dump http://www.wordthink.com/. That simply "dumps the formatted output of the default document or those specified on the command line to standard output," as the man page tells us--obviously not enough to get a WOD into an e-mail body, but fairly close.
What's needed, then, is to do as the bash function does: pipe that output through grep, searching for a certain pattern and extracting the relevant lines that belong in the body of the e-mail. Those results then get piped to mail, which inserts the lines into the body of an e-mail. Below is the full line that I inserted into my crontab file, minus the bit that tells the line to be executed at 6 A.M. daily:
lynx -dump -nonumbers "http://www.wordthink.com/" | grep -A 10 -m 1 "Today's Word of the Day" | mail -s WOD my-addy@my-mail.com
This cron entry tells lynx to dump the page found at the specified link to standard output (whatever that means), then to pipe that through grep, searching for the phrase "Today's Word of the Day." Once that phrase is found, grep stops searching (the -m 1 switch--it's to look for only one instance) and "prints" the matching line plus the ten lines following it (the -A 10 switch), which then get piped to mail and become the body of the message. The -s switch specifies what the subject line of the e-mail should be. The -nonumbers switch just tells lynx not to preface the links it finds in the page with numerals between square brackets, which it would otherwise do.
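With the schedule portion included, the complete crontab entry reads like this (6 A.M. daily):

0 6 * * * lynx -dump -nonumbers "http://www.wordthink.com/" | grep -A 10 -m 1 "Today's Word of the Day" | mail -s WOD my-addy@my-mail.com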
That's about it for this entry. I really do need to write up a remind entry, since I had looked long and hard to find some scheduling utility that would be no-frills, yet powerful enough to issue me reminders on a regular basis. So that may be next on the agenda for this blog.
Some afterthoughts: piping the output of lynx -dump through grep to extract target text is not ideal, since--if I've read its man page correctly--you are limited to extracting text by line. A problem arises here because the number of lines in the target WOD entry can vary day by day. As a result, it is likely that either extraneous line(s) will be included on many days, or that some target line(s) will get cut off on others. Perhaps piping the lynx -dump output through sed or awk--which, as I understand it, are both far more flexible when it comes to identifying target text--might be a better solution. But because I am not well-versed in either of those utilities, and because extracting the WOD from web sites whose layout may change at any time is a moving target, I am not presently attempting to improve on the method I've described here. I do, on the other hand, welcome suggestions for improving this WOD-by-e-mail solution from any reader who may know those--grep included--or other utilities better than I.
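For the curious, the sort of sed range expression I have in mind would look something like this--untested against the site's actual layout, so treat it only as a sketch that grabs everything from the heading down to the next blank line, however many lines that turns out to be:

lynx -dump -nonumbers "http://www.wordthink.com/" | sed -n "/Today's Word of the Day/,/^$/p" | mail -s WOD my-addy@my-mail.com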
Wednesday, December 19, 2012
Eighth Installment: compress and encrypt/decrypt a directory
I recently visited a relative who is studying in the natural sciences and who, surprisingly, is even less capable in certain technical aspects of computing than I am. He was trying to create, on his Mac, a script that would run as a cron job, and asked me for some pointers. Though I know the basics about cron and was willing to pitch in, I wasn't so sure about the script: you see, calling my bash skills rudimentary would be high praise. Nonetheless I decided that, with some web searching, I might be able to assist with that, too. Sure enough, I was able to find just the sort of information that would help us create a script that would tar and compress, then encrypt, a target directory. Details--shamelessly lifted from various locales on the web--are included below.
Over the years that I've been using Linux I have, of course, read more than a few articles that describe methods of encrypting files or partitions. Most recently, for example, there appeared on Lxer an article that described a clever way of encrypting a local directory that then gets backed up to some cloud storage service like dropbox. I've bumped up against the issue of encryption when doing fresh installations as well, as it has been the case for some time now that an option is given on installation for many Linux distros of encrypting, for example, the /home directory.
Despite reading at least some of those articles with interest, I did not feel the need to implement such encryption on my own systems. So it was not until someone else asked my assistance in doing something like this that I actually tried it myself. As you will see, it was actually fairly simple to implement. But first, a few caveats.
I'll skip any details in the following description regarding the cron aspect of this project--not that I could provide a whole lot of enlightenment anyway--other than to say that it's a handy way to make programs or processes run on a set schedule on computers that run *nix. One way I've used it is to cause a weather map, which I've set up as the desktop background on one of my computers, to update every 10 minutes--look for a future entry in this blog on how I managed that.
I'll also not speak in any depth about another ingredient in this recipe--tar--other than to say that it is an abbreviation for "tape archive." I myself do not understand its workings terribly well, though I've used it on several occasions. I will mention on a related note, however, that, in my research, I ran across articles that used another, similar utility--dd (commonly glossed as "disk dump")--to create compressed and encrypted archives. But I did not follow up on the dd option and so cannot post any further information about how that was done.
Finally, I can't speak in any depth about the program I used for doing the encryption--openssl--or about another program with which I experimented and which also does encryption--gpg. But I promise, despite those rather glaring deficits, that I will describe something I managed to accomplish and which you, too, should be able to accomplish by following the steps outlined.
Perhaps in some future entry for this blog I'll be able to further explore tar, dd, and/or cron. But for now I'm going to focus my attention mainly on the option we ended up using, which involved mainly tar and openssl.
The relative in question, as I mentioned, works in the natural sciences. He has a directory of his ongoing work that he wants to back up regularly, but to which he does not want anyone else to have access. His choice for backing up beyond his own PC is to use dropbox. So the task was, as mentioned, to compress and encrypt the target directory: moving it to the location on the local machine where dropbox would find it so as to back it up will also not be covered in this write-up, though that step did end up being part of his final resolution.
So, what's left? It was quite easy to find directions on the web for doing all this. I pretty much went with the first workable solution I found, which came from the linuxquestions forum (the relevant thread can be found here).
The incantation we used was as follows:
tar -cj target-dir | openssl enc -aes128 -salt -out target-dir.tar.bz2.enc -e -a -k password
What that line does may be evident to most, but I will offer a bit of review nonetheless. The target directory is first tar'red and compressed with bzip2 (the c option stands for "create" and the j option specifies that the created archive should be compressed with bzip2), then piped to openssl for encryption. The word "password" is, obviously, to be replaced by whatever password the user chooses.
One possible drawback to this method, as pointed out in the thread from which I lifted it, is that the encryption password gets entered, in plain text, right on the command line (which is slightly less of an issue with a cron script such as we were creating). Thus, anyone who gains access to the machine can, by consulting the command-line history, see what the encryption password was. Since someone gaining access to his computer and viewing the command-line history was not a concern for the fellow I was helping, this is the solution we implemented. But that potential concern can easily be remedied by simply leaving off the -k password switch at the end, which has the effect of prompting the user for a password that does not get echoed to the command line.
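Another option for a cron script--just a sketch, with the file path as an example only--is to keep the password in a file readable only by its owner and point openssl at it with the -pass option:

chmod 600 /home/user/.backup-pass
tar -cj target-dir | openssl enc -aes128 -salt -e -a -pass file:/home/user/.backup-pass -out target-dir.tar.bz2.enc

The matching decryption command would then take the same -pass file: argument in place of prompting for the password.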
To decrypt the file, the following command--which prompts for the password--is used:
openssl enc -aes128 -in target-dir.tar.bz2.enc -out target-dir.tar.bz2 -d -a
The file can then be uncompressed and untar'red. This part of the process could likely be reduced from two steps (decryption, then uncompression/untar'ing) to one by using a pipe, but since it was presumed, for purposes of this project, that the file would act simply as insurance against loss--the need for ever actually recovering the content being very unlikely--I did not pursue streamlining that aspect.
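For the record, the single-piped version would presumably look something like this (untested on my end):

openssl enc -aes128 -d -a -in target-dir.tar.bz2.enc | tar -xjf -

Here openssl prompts for the password, writes the decrypted archive to standard output, and tar reads it from standard input (-f -) and extracts it.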
I did manage to find and test a couple of other variants which I will offer here as well. The second variant was found here, and follows:
tar -cj target-dir | openssl enc -e -a -salt -bf -out target-dir.blowfish
It is much the same as the first variant, though it uses a different encryption method called blowfish. I am uncertain which of these two encryption schemes is considered better, though AES is the more modern of the two. To decrypt the compressed directory, the following command is used:
openssl enc -d -a -bf -in target-dir.blowfish -out target-dir-decrypt.tar.bz2
Finally, I discovered yet another variant, details about which can be found here. A sample of how to use this one is as follows:
tar -cjf target-dir.tar.bz2 target-dir/ && gpg -r user -e target-dir.tar.bz2
As will be noted, this variant uses gpg to encrypt the directory. Of course user must be replaced by the name of someone who has a valid gpg key on the system, usually the primary user of said machine or account.
An interesting feature I discovered about this method is that a time-sensitive gpg key can be created, i.e., one that expires after a certain interval. If I understand correctly how this works, once the key expires, the directory can no longer be decrypted.* This feature should, obviously, be used with care.
Decrypting the directory can be done in the following way:
gpg --output target-dir.tar.bz2 --decrypt target-dir.tar.bz2.gpg
The same two-step process of decrypting, then untar'ing/uncompressing, applies to these two methods as well.
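Either of those could presumably also be collapsed into a single pipe; for the gpg-encrypted archive, for instance, something like:

gpg --decrypt target-dir.tar.bz2.gpg | tar -xjf -

since gpg writes the decrypted data to standard output when no --output file is given.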
This sums up what I have to offer in this entry. Now that winter is upon us northern-hemispherers, I may be able to post more frequent entries. There are a few things I've been wanting to document for some time now.
* Correction: an anonymous commenter writes of my claim that the key expiration makes decrypting the file no longer possible that "Unfortunately not. The expired key is no longer trusted but is still functional."