15 October 2009

Time, learning, teaching, markup

A rant about blog-writing technology.

This term I am not teaching, so that I can focus my energies on writing a thesis. Instead, I have found other creative ways of wasting time. I recently succumbed to peer pressure and started using Facebook and Twitter. I am also writing blog posts.

I realized recently that my knowledge of reducible flowgraphs is shallow. Normally I’d go and read about them in a textbook or some old articles, and perhaps even do a few exercises. This time I decided to apply a piece of advice I have often heard: “The best way to learn a subject is to teach it.” Since I’m not teaching, the next best thing is to write a blog post about it. (That’s not entirely true: even if I were teaching, a blog post would be the better option, because I’m usually either demonstrating, with little say over the content, or in charge of a course on a subject like Unix programming. Good luck fitting graph theory in there.)

The obvious question is: why is this post not about reducible flowgraphs, but instead a rant about a nonexistent post about reducible flowgraphs? OK, phrased like that, perhaps it isn’t quite ‘obvious’. Maybe this works better: get to the point!

Well, I’m frustrated with technology. The mechanics of writing and publishing should take an insignificant amount of time compared to producing the content. Alas, that’s not the case when you want to convey information using:

  1. text
  2. drawings
  3. code (in a programming language)
  4. (math) formulas

Let’s see what’s available. (La)TeX is very good for text and formulas. You just go and write your text. A new paragraph takes 2 presses of Enter, marking up a block of text with italics takes 5 extra characters, and inserting a link takes 9 extra characters. For formulas, it is superb: I can typeset $a^b$ with 5 keys, including the 2 delimiters. (If you think I’m crazy to count keys, wait until I tell you how many you need in the xML family of languages.) For drawings, (La)TeX is again very good if you use the TikZ package. Describing pictures in that language is so cool that I often fantasize about using it to introduce kids to programming. For code, the LaTeX-only package listings does a decent job: you have to bracket the piece of code with the right environment (34 keys), but sometimes you need to configure it to get nice results.

However, there is a big problem with LaTeX: browsers don’t know how to render it. Browsers know how to render xML: HTML, XML with stylesheets, MathML, SVG, and so on. Unfortunately, writing in these languages is a nightmare, because they were designed to be easily consumed and produced by programs, not by people. Of course, you may use an editor that hides the details, but there are two problems with that approach. First, I know of no good editor that handles all four types of content. Second, switching between keyboard and mouse takes time and disrupts focus, so WYSIWYG editors are out of the question for me. The corollary is that what I want is a language that lets me type the content easily. Once that is available, fancy editors can bring minor improvements.

But let me be more explicit about why using xML is horrible. For text it is fairly OK. A new paragraph takes a minimum of 3 keys (or 7 if you want properly bracketed tags), but you usually type the 2 extra Enters anyway so that the paragraphs are nicely laid out while editing. Marking up a block of text with italics takes 7 extra characters. Inserting a link takes 14 characters (5 more than in LaTeX, which is ironic considering that when people think HTML they think links). But it gets worse. The xML way to typeset a^b is to use MathML, and it takes… 89 characters! (Plus, to view it locally the file must have the .xml extension, so I’m not sure it would work on Blogger.) Call me crazy, but I’m not willing to trade 5 characters for 89, no matter how much better structured and more explicit the latter is. For drawings there’s SVG, which is similarly verbose, although perhaps not quite as bad when compared to TikZ (or MetaPost). For typesetting code the xML solution is actually quite good: you just put the code in the proper environment (31 keys) and let a script do its job. I have never seen it fail (although the implementation is quite hackish).
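Key counts like these are easy to get wrong, so here is a quick mechanical check of the ones I am sure about. It is a sketch under two assumptions: the MathML form carries the usual xmlns attribute, and the code “environment” is a pre element whose class (prettyprint here) is what the highlighting script looks for. Both are my reconstruction, not something stated above.

```python
# Markup overhead for the same content, LaTeX versus xML.
# Assumption: MathML includes the standard namespace attribute, and code
# is wrapped in a <pre> whose class triggers the highlighting script.
OVERHEAD = {
    "a^b, LaTeX": "$a^b$",
    "a^b, MathML": '<math xmlns="http://www.w3.org/1998/Math/MathML">'
                   "<msup><mi>a</mi><mi>b</mi></msup></math>",
    "link, LaTeX": "\\href{}{}",
    "code, LaTeX listings": "\\begin{lstlisting}\\end{lstlisting}",
    "code, HTML": '<pre class="prettyprint"></pre>',
    "italics, HTML": "<i></i>",
    "paragraph, HTML (bracketed)": "<p></p>",
}

# Print each construct with the number of characters its markup costs.
for name, markup in OVERHEAD.items():
    print(f"{name:30} {len(markup):3}")
```

The superscript alone is the whole story: 5 characters against 89.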

There is of course the option of writing LaTeX and converting it to xML. There are a few tools around that do just that. The best, in my opinion, is HeVeA. It handles text perfectly. For TikZ drawings you can add an environment (again, around 30 keys) and it will turn them into PNGs. Cool. For formulas it uses plain HTML (no MathML), so the result works on older browsers but looks uglier than it could on newer ones. (MathML support is experimental, and it doesn’t work on my files.) You also have the option of transforming selected formulas into images (just as you do with TikZ pictures). This makes sense if the formulas are complicated, in which case the HTML-only rendering is likely to suck. For code, HeVeA has some built-in special handling of the listings package, which produces results that I find quite ugly. HeVeA is almost ideal, except that it is hard to customize, even though it’s supposed to be easy. For example, I want to translate the lstlisting LaTeX environment into a pre block with a certain class and let the script I mentioned do the syntax highlighting. Why? It looks better, and it is backed by Google, meaning that it’s likely to be kept more up to date than the internals of HeVeA. (HeVeA has one great developer, but he is still just one.) This customization seems to require changing the OCaml code of HeVeA, and that is not easy. I also want to change some uses of dt and dd tags and some purple (yuck!) things it generates but, again, I would need to spend some time figuring out how: it’s not easy. Of course, one solution for code is to turn it into images: that would look OK (though without colors), but it prevents readers from copying and pasting.
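One way around the OCaml might be to preprocess the LaTeX before HeVeA ever sees it. A minimal sketch, assuming that HeVeA copies the body of a rawhtml environment to its output verbatim and that the highlighting script picks up pre elements with class prettyprint; the function name listings_to_prettify is hypothetical.

```python
import html
import re

# Matches \begin{lstlisting}[optional options] ... \end{lstlisting}.
LSTLISTING = re.compile(
    r"\\begin\{lstlisting\}(?:\[[^\]]*\])?\n?(.*?)\\end\{lstlisting\}",
    re.DOTALL,
)


def listings_to_prettify(tex: str) -> str:
    """Replace every lstlisting environment with a raw <pre> block.

    Assumption: the converter echoes a rawhtml environment unchanged, so
    the <pre> reaches the browser, where a script highlights it by class.
    """
    def repl(m: re.Match) -> str:
        code = html.escape(m.group(1))  # <, > and & must not leak into HTML
        return ("\\begin{rawhtml}\n<pre class=\"prettyprint\">"
                + code + "</pre>\n\\end{rawhtml}")
    return LSTLISTING.sub(repl, tex)
```

The preprocessed file would be used only on the HTML path; the PDF path keeps the original lstlisting source, with whatever listings configuration it already has.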

Another LaTeX-to-HTML converter is latex2wp. It handles text OK. It handles formulas by exploiting WordPress’s built-in LaTeX support, which makes it unusable on Blogger or other sites (unless you are willing to piss off WordPress by using their resources to drive traffic to a competitor). This brings up another point about HeVeA: if you are a WordPress user, it makes sense to exploit their LaTeX support. But good luck convincing HeVeA to translate $a^b$ into $latex a^b$. Finally, if I remember correctly, latex2wp has no support for drawings or code.
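The rewrite itself is small, which is what makes the missing hook annoying. A minimal sketch of it as a standalone pass over the LaTeX source, handling only inline $…$ math; display math ($$…$$) and escaped dollars (\$) are deliberately left alone. The function name is hypothetical.

```python
import re

# Inline math: a $ not preceded by \ or $, and not followed by $,
# up to the next $ that is not escaped.
INLINE_MATH = re.compile(r"(?<![\\$])\$(?!\$)(.+?)(?<!\\)\$")


def to_wordpress_latex(text: str) -> str:
    """Rewrite $...$ into the $latex ...$ form that WordPress renders."""
    return INLINE_MATH.sub(lambda m: "$latex " + m.group(1) + "$", text)
```

For example, to_wordpress_latex("the power $a^b$ grows") gives "the power $latex a^b$ grows".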

HeVeA, written by Luc Maranget, is the successor of a converter written by Xavier Leroy. (And if you don’t know who the latter is, then you are not a computer scientist.) Xavier remarked that an ideal solution for writing technical documentation would be a new language that is easily convertible both to LaTeX (to produce printable PDFs) and to xML (for easy viewing in browsers), but that in practice most people writing technical material prefer LaTeX and would be reluctant to learn a new language. That was before Wikipedia: its markup language is convertible to both HTML and PDF (through LaTeX?). Alas, I think their tools are not publicly available. (If they are, and are easy to use on Linux, I want to know!) Even supposing the tools are available, there is a bigger problem: while text, code, and math are supported, drawings have to be written in verbose SVG.

That was the motivation. Now comes the challenge.

Design a concise language for describing documents with text, drawings, code, and formulas. Make it feel familiar by borrowing syntax from existing languages. Implement translations to LaTeX and xML. Make sure the implementation is very easy to customize. (Keep in mind HeVeA’s failings in this area.) Using the tool should be as easy as typing tool -tex docname.
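To make the customization requirement concrete, here is a toy of the shape I have in mind, covering only text and code (no formulas or drawings). Everything in it is hypothetical: the syntax (blank lines separate paragraphs, *word* marks emphasis, a block indented by four spaces is code), the rule tables, and the names. Each backend is a plain table of rules, so customizing one means replacing an entry rather than patching the translator.

```python
import html
import re

# Hypothetical backend rule tables: construct name -> emitter function.
LATEX = {
    "para": lambda s: s + "\n",
    "emph": lambda s: "\\emph{" + s + "}",
    "code": lambda s: "\\begin{lstlisting}\n" + s + "\n\\end{lstlisting}\n",
    "escape": lambda s: s,  # assumption: the text is already valid LaTeX
}
HTML = {
    "para": lambda s: "<p>" + s + "</p>\n",
    "emph": lambda s: "<i>" + s + "</i>",
    "code": lambda s: '<pre class="prettyprint">' + html.escape(s) + "</pre>\n",
    "escape": html.escape,
}

EMPH = re.compile(r"\*([^*]+)\*")


def translate(doc: str, rules: dict) -> str:
    """Translate the toy syntax using one backend's rule table."""
    out = []
    for block in re.split(r"\n\s*\n", doc.strip("\n")):
        lines = block.split("\n")
        if all(line.startswith("    ") for line in lines):
            # Every line indented by four spaces: a code block.
            out.append(rules["code"]("\n".join(line[4:] for line in lines)))
        else:
            # A paragraph: escape first, then turn *word* into emphasis.
            text = rules["escape"](" ".join(lines))
            out.append(rules["para"](EMPH.sub(lambda m: rules["emph"](m.group(1)), text)))
    return "\n".join(out)
```

Switching HTML emphasis from i to em, for instance, would be a one-liner: translate(doc, dict(HTML, emph=lambda s: "<em>" + s + "</em>")). No OCaml required.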

Now, why didn’t I think of using that as a project when I was teaching?

PS: In case you wonder, I wrote this in LaTeX and used HeVeA.
