[Tshwanelex-l] Example of TLex → XeLaTeX workflow

James Crippen jcrippen at gmail.com
Sun Jul 15 20:07:31 EDT 2012


The following link is a work-in-progress sample of a TLex → XeLaTeX workflow.

http://www.drangle.com/~james/dictionary/

I start with TLex’s unformatted XML output (“use hidden characters”
and “export empty attributes” are turned on). The TLex styles are
ignored. The embedded output order information is ignored too, and
instead I rely solely on the order of elements in the database. To
keep things sane, it’s essential to ensure that there aren’t any
significant differences between the order of elements in the DTD and
the output order. This has caused me some frustration a few times,
but I’ve gotten in the habit of making changes to both at the same
time. (It would be nice to have a ‘Reset Output Order’ function
implemented somehow...)

I use Saxon 9.4 HE to do all the XML processing, though any XML
library that can handle XSLT 2.0 and XPath 2.0 should suffice. The
XSLT stylesheet does most of the heavy lifting to convert the XML to
TeXML. The XSLT code I’ve written is pretty crude and repetitive, and
there’s lots of room for both algorithmic and efficiency improvements.
I’m an obsessive commenter, however, so I think it’s fairly easy to
follow despite the code verbosity.

TeXML is basically an intermediate language between XML and TeX, being
little more than an XML schema for (La)TeX commands and environments.
The whitespace output imposed by the TeXML processor is sometimes
inconvenient, but this can be fixed downstream in LaTeX.
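
To give a rough idea of what TeXML looks like, here is a minimal sketch
in Python. It is not part of the workflow described here, which builds
the TeXML with XSLT. It assumes the basic TeXML elements "cmd" (with a
"name" attribute) and "parm" inside a "TeXML" root, and leaves out any
namespace declaration. The command name "entry" and the sample text are
hypothetical.

```python
import xml.etree.ElementTree as ET


def entry_to_texml(lemma, gloss):
    # Build a TeXML <cmd> element. A TeXML processor would turn this
    # into a LaTeX call with two brace arguments: \entry{lemma}{gloss}.
    # "entry" is a hypothetical command name used only for illustration.
    cmd = ET.Element("cmd", {"name": "entry"})
    for text in (lemma, gloss):
        parm = ET.SubElement(cmd, "parm")  # one <parm> per {...} argument
        parm.text = text
    return cmd


root = ET.Element("TeXML")
root.append(entry_to_texml("example", "a sample gloss"))

# Serialize to a string; encoding="unicode" returns str without an
# XML declaration.
texml = ET.tostring(root, encoding="unicode")
print(texml)
```

The point of the intermediate format is that escaping of TeX special
characters is left to the TeXML processor rather than the stylesheet.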

The data are converted to actual LaTeX and then included by the main
document file. Some intervening sed scripts do some minor
reformatting. I found it easier to use basic Unix commands than to
figure out how to get XSLT’s regular expression support to work
properly.
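
The kind of cleanup meant here can be sketched as follows. This is a
Python stand-in for the sed step, not the actual scripts, and the three
substitution rules are assumptions chosen for illustration (the real
rules are not listed in this message). The "\entry" command in the
sample is hypothetical.

```python
import re


def tidy(latex):
    # Collapse runs of spaces or tabs into a single space.
    latex = re.sub(r"[ \t]+", " ", latex)
    # Drop a stray space left just before a closing brace.
    latex = re.sub(r" \}", "}", latex)
    # Strip trailing spaces at the end of every line.
    latex = re.sub(r" +$", "", latex, flags=re.MULTILINE)
    return latex


sample = "\\entry{example }{a  sample gloss}  \n"
print(tidy(sample), end="")
```

Each substitution corresponds roughly to one "s/.../.../g" line in a
sed script, which is why sed is a natural fit for this step.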

The main LaTeX document is designed with the Memoir class, and depends
a lot on its advanced layout functionality. There are a bunch of
accessory files loaded in the preamble to keep the main document
fairly short. Just before the actual document body the various
dictionary commands are implemented. I’ve mostly used the LaTeX3
xparse package for commands, which should be available in any TeX
installation based on TeX Live 2011 or 2012.

I used XeTeX as the TeX engine simply because that’s what I’m most
comfortable with. I suppose that it could be converted to use LuaTeX
without much work. The sources are Unicode, so non-Unicode TeX engines
(e.g. pdfTeX) won’t work without a lot of fuss.

None of this was designed with any kind of portability in mind because
if it works for me then I’m satisfied. It’s also very specific to my
particular dictionary structure; I don’t think that a general solution
for dictionary typesetting is really possible given the wildly varying
needs of different languages. But I do hope it will be of some use to
other TLex users interested in generating output with LaTeX.

An issue I hope to address in the future is separating the data file
into alphabetic chunks, then including each of them in a separate
chapter in the LaTeX document; this probably falls in the XSLT domain.
I haven’t figured out how to convince TLex to handle inline cross
references, but when I do I hope to make them into hyperlinks, as the
regular crossrefs already are. I also haven’t dealt with crossrefs
to anything other than lemmas. In addition, there are various things
in the database that I haven’t implemented in the LaTeX output yet.
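
The alphabetic chunking mentioned above could look something like the
following. This is only a sketch in Python of the grouping logic, not
an XSLT implementation, and it assumes the first character of the lemma
is a good enough sort key. An alphabet with digraphs or diacritics
would need a custom key function. The file names "entries-a" etc. and
the sample lemmas are hypothetical.

```python
from itertools import groupby


def first_letter(lemma):
    # Naive key: the uppercased first character of the lemma.
    return lemma[0].upper()


def chunk_by_letter(lemmas):
    # Sort case-insensitively, then group neighbours sharing a key.
    ordered = sorted(lemmas, key=str.casefold)
    return {letter: list(group)
            for letter, group in groupby(ordered, key=first_letter)}


def include_lines(chunks):
    # One chapter heading plus one \input line per alphabetic chunk.
    lines = []
    for letter in chunks:
        lines.append("\\chapter{%s}" % letter)
        lines.append("\\input{entries-%s}" % letter.lower())
    return lines


chunks = chunk_by_letter(["banana", "apple", "Avocado", "cherry"])
print("\n".join(include_lines(chunks)))
```

In XSLT 2.0 the same grouping would presumably be done with
xsl:for-each-group, writing each group to its own file.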

Comments, questions, and suggestions are welcome. This list is
low-traffic enough that I’ll post replies here.

Cheers,
James

