XML-Lit - A Simple XML-Based Literate Programming System (v1.0)

Rafael Sevilla

Software Developer
Inter.Net Philippines

106 Esteban Street, Legaspi Village
Makati City
Philippines
<sevillar@team.ph.inter.net>

xml-lit is Copyright © 2001 Rafael R. Sevilla

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA


Table of Contents

Introduction
Motivation for this program
Guidelines for submitting patches
System requirements
Using xml-lit
Producing documents for use with xml-lit
xml-lit Usage
The main program
Hash Tables and Hash Functions
The tangler
The fragment data structure
The Parser's State
The fragment data structure in action
Utility functions for new-style parsing
The character data handler
Finding stuff in the tree
The start element handler
The end element handler
Old-Style Parsing
Handler setting
Tangle the tree
Weaving
Error Handling
Further Improvements and Bugs to Fix
References

 

Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

 
--Donald E. Knuth 

--

Introduction

I recently found a simple program called xmltangle by Jonathan Bartlett [BART01] that provides a simple literate programming system based on DocBook. It frustrated me somewhat, though. For one thing, it did not allow program code snippets to be enclosed within CDATA sections. CDATA sections make an inline program much easier to write, and easier to read on screen while you're editing it, especially with programming languages that are chock full of <'s, such as the typical C program or, worse yet, an XSL stylesheet, which is what I had planned to use Jonathan's program for. So I set out to do a complete rewrite of the program, using James Clark's expat XML parser [CLAR01].

So now I have come up with my own simple literate programming system, xml-lit, which takes a similar approach. Instead of enclosing code snippets within DocBook <programlisting/> tags, however, I define a new namespace, xml-lit, which contains all of the special elements used to support the system. The program is also backward-compatible with Jonathan's work, given a command line switch.

Motivation for this program

I heard about the concept of literate programming long ago, but never really tried it because I felt at the time that it wouldn't be very useful. Boy, was I wrong. I recently got hold of a copy of a small program called xmltangle and tried it. While it was a very nice program that worked okay, I had a number of small frustrations with it that gradually grew, as I have already mentioned. But attempting to use it to mark up a small program I wrote for the office was a highly enlightening experience, and I now understand what Donald Knuth meant when he said of literate programming: "In fact, my enthusiasm is so great that I must warn the reader to discount much of what I shall say as the ravings of a fanatic who thinks he has just seen a great light." [KNUT83] My brief work with xmltangle has convinced me that this is indeed a superior approach to programming, because now my job is not just to write instructions for a computer to follow, but to write those instructions in such a way that someone else can understand what I'm trying to do. Along the way, the approach has forced me to think more carefully about everything I do when writing a program.

I of course contacted Jonathan Bartlett about his program, but since he didn't respond for a long time, I set out to make my own version of it, just as an exercise. I was also about to do some XML programming work myself, and felt this would be a perfect vehicle for learning how to use the Expat library. At around the time I completed version 0.5, a nearly exact replica of Jonathan's program, I had also read Donald Knuth's original paper and a few other papers on literate programming, and had thought of many ideas for improving xmltangle. I conceived of creating a new namespace, and then creating elements within that namespace to support the literate programming concept. At around that time Jonathan Bartlett finally answered my email about his program, and I discussed this with him a little. He told me he had other ideas on how to add these features (using XML processing instructions), so I decided to strike out on my own. Now, here it is.

Guidelines for submitting patches

If you want to patch this program, I ask that you follow these simple rules:

  1. Make your patches to the file xml-lit.xml and no other.

  2. Document all of your changes.

  3. Clearly mark the explanations for your changes, using callouts or some other obvious marking (e.g. the <emphasis> tag), so that I can see them easily.

Long and complicated patches which do not follow these rules will not be accepted. Do not patch the generated source files, because they are NOT source code.