This series is written by a representative of the latter group, which is composed mostly of what might be called "productivity users" (perhaps "tinkerly productivity users"?). Though my lack of training precludes me from writing code or improving anyone else's, I can nonetheless try to figure out creative ways of using open source programs. And again, because of my lack of expertise, though I may be capable of deploying open source programs in creative ways, my modest technical acumen hinders me from using those programs in the best possible ways. The open-source character of this series, then, consists in my presenting to the community of open source users and programmers my own crude and halting attempts at accomplishing computing tasks, in the hope that those who are more knowledgeable than I am can offer advice, alternatives, and corrections. The desired end result is the discovery, through a communal process, of optimal or alternative ways of accomplishing the sorts of tasks that I and other open source productivity users need to perform.

Saturday, October 19, 2024

TeX/LaTeX: tex to txt

Since a .tex file is already plain text that any text editor can open, you may be wondering why anyone would need to do any sort of conversion to .txt format. I would have wondered the same thing--that is, until I ran into the need myself.

The issue was that I had a nicely formatted .tex document--an article, actually, that I had translated into a foreign language with the aid of my computer. I had one fairly competent translator check the machine translation over and make some corrections, and I thought I was ready to go. Then another translator had a look and offered to make further improvements, to which I gladly assented. This is where I ran into a problem with the nicely formatted file.

This translator, although quite well-qualified and fairly capable in technical matters, was nonetheless not at all familiar with TeX/LaTeX formatting. So I couldn't really give him the document for correcting in the format most convenient for me (.tex). At the same time, getting the corrected text back from him in a format like .pdf or .doc would further complicate my task of returning it to its nicely formatted .tex state. Thus, I decided that .txt would be the most neutral format to use for providing the translator with the computer-translated text for further revision. But how to do that?

Well, I had already created a PDF of the document, so I had that to work from. Using sed or awk to strip out all the TeX formatting codes would be an option for someone far better versed in those utilities than I am. But even that might prove a fairly involved and time-consuming task.
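For anyone curious what the markup-stripping route might look like, here is a minimal sketch. It uses Python regular expressions rather than sed or awk, and it is not what I actually did. The function name strip_tex and the short list of commands it drops outright are my own inventions for illustration. It assumes a simple article with no math, tables, or custom macros, and it is certainly not a real TeX parser.

```python
import re


def strip_tex(source):
    """Roughly remove common LaTeX markup from a string (hypothetical helper)."""
    # Drop comments: an unescaped % through the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Keep only the document body, if the markers are present.
    body = re.search(r"\\begin\{document\}(.*?)\\end\{document\}", text, re.S)
    if body:
        text = body.group(1)
    # Drop environment markers but keep what sits between them.
    text = re.sub(r"\\(?:begin|end)\{[^}]*\}", "", text)
    # Drop commands whose argument is not readable text (assumed short list).
    text = re.sub(
        r"\\(?:label|ref|cite|includegraphics)\*?(?:\[[^\]]*\])?\{[^{}]*\}", "", text
    )
    # Unwrap one-argument commands, e.g. \emph{word} -> word.
    # Repeating the substitution handles nesting from the inside out.
    one_arg = re.compile(r"\\[a-zA-Z]+\*?(?:\[[^\]]*\])?\{([^{}]*)\}")
    while one_arg.search(text):
        text = one_arg.sub(r"\1", text)
    # Remove remaining bare commands such as \maketitle or \noindent.
    text = re.sub(r"\\[a-zA-Z]+\*?", "", text)
    # Remove leftover grouping braces, then unescape \%, \&, \$, \#, \_, \{, \}.
    text = re.sub(r"(?<!\\)[{}]", "", text)
    text = re.sub(r"\\([%&$#_{}])", r"\1", text)
    # Tidy whitespace: trailing spaces and runs of blank lines.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Calling strip_tex on the contents of a .tex file would return the bare prose as a string. Anything beyond the simplest markup (math, tables, footnotes, user-defined macros) would likely come out mangled, which is part of why the PDF route below appealed to me.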

Some on-line searching revealed another possible solution: it involved using the utility pdftotext. It seemed worth a try.

Sure enough, running pdftotext file.pdf file.txt gave quite good results. There were a few anomalies I needed to clean up, but not many. I'd say the whole process took about 15 minutes, after which I had a .txt version of this 5k-word file that I could submit to the translator.
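I cleaned up the anomalies by hand, but some of the usual ones in text extracted from a PDF could probably be handled automatically. The sketch below is a guess at what such a cleanup might cover: page-break characters, typographic ligatures, and words hyphenated across line ends. The function name tidy_extracted_text and the choice of fixes are assumptions of mine, not a record of what actually turned up in my file.

```python
import re

# Ligature characters that sometimes survive text extraction (assumed list).
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def tidy_extracted_text(text):
    """Apply a few mechanical fixes to text extracted from a PDF (hypothetical helper)."""
    # Page breaks usually arrive as form-feed characters; turn them into newlines.
    text = text.replace("\f", "\n")
    # Replace single-character ligatures with their plain letters.
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    # Rejoin words split at a line end: "transla-\ntion" -> "translation".
    # Caution: this also joins genuine compounds such as "well-\nqualified".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of blank lines.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

One would read file.txt into a string, pass it through tidy_extracted_text, and write the result back out. Because the hyphen rule can wrongly merge real hyphenated compounds, the output would still deserve a read-through before being sent to anyone.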

So, in the unlikely event that you need to convert your .tex file to .txt, I can recommend the routine of first converting it to .pdf, then converting the resulting file to .txt. I should mention that my file didn't contain graphics, tables, or charts, so I can't vouch for how the method would work on files containing more complex elements such as those. Probably, the less complex the document is, the more successfully it will convert. There you have it: a method for converting .tex to .pdf to .txt.
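The two steps could be chained in a small script, sketched here under some assumptions. I am assuming pdflatex is the program that builds the PDF (this post doesn't depend on which one you use, and a document in another language might need a different engine such as xelatex), and that both pdflatex and pdftotext are installed and on the PATH. The function names build_commands and tex_to_txt are made up for this example.

```python
import subprocess
from pathlib import Path


def build_commands(tex_path):
    """Return the two command lines for .tex -> .pdf -> .txt (hypothetical helper)."""
    tex = Path(tex_path)
    pdf = tex.with_suffix(".pdf")
    txt = tex.with_suffix(".txt")
    return [
        # Step 1: build the PDF next to the source file (pdflatex is an assumption).
        [
            "pdflatex",
            "-interaction=nonstopmode",
            "-output-directory=" + str(tex.parent),
            str(tex),
        ],
        # Step 2: extract the text, asking for UTF-8 output.
        ["pdftotext", "-enc", "UTF-8", str(pdf), str(txt)],
    ]


def tex_to_txt(tex_path):
    """Run both steps in order and return the path of the .txt file."""
    for command in build_commands(tex_path):
        # check=True stops the chain if either program reports an error.
        subprocess.run(command, check=True)
    return Path(tex_path).with_suffix(".txt")
```

Calling tex_to_txt("article.tex") would leave article.pdf and article.txt beside the source file. A document with cross-references may need the first step run twice before the PDF is final, which this sketch does not attempt.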
