Weaving

The file weave.c contains another set of XML parser handlers that perform the weaving. Weaving has become somewhat more complicated than in version 0.5, because certain elements in the xml-lit namespace (such as the fragmap and fragment elements) must now be rendered into something visible in the woven output. I have thought this might be better done with XSLT stylesheets, and indeed it will be in the near future, but I also think it best to keep external dependencies down to the absolute minimum. Accordingly, a weaver is included in the source, though it is highly rudimentary. It does the following:

  1. Remove the namespace declaration for XML-Lit.

  2. Replace each code element with text that identifies the file.

  3. Record the name of each fragmap, assign it a number, and enclose the PCDATA within the fragmap in angle brackets.

  4. Replace each fragment element with the name of the corresponding fragment map and its number.

Conceptually, this process is much simpler than the contortions involved in tangling. Forward references from fragments to fragment maps are not allowed (they could be supported in a future version, but how they would be useful in a sane literate program completely escapes me), so no tree needs to be constructed. The data structure that supports this is one we have used before: a hash table. Its buckets hold the name of each fragmap together with the number assigned to it.
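
As a rough illustration of this bookkeeping, the following sketch maps fragmap names to sequence numbers with a tiny chained hash table. It is not the hash.h implementation used by xml-lit: the names name_table, nt_install, and nt_lookup are hypothetical, and the bucket count and hash function are arbitrary choices made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NT_BUCKETS 64   /* arbitrary size chosen for this sketch */

/* One entry per fragmap: its name and the number assigned to it. */
struct nt_entry {
  char *name;
  int number;
  struct nt_entry *next;
};

struct name_table {
  struct nt_entry *bucket[NT_BUCKETS];
  int counter;          /* last number handed out */
};

static unsigned
nt_hash(const char *s)
{
  unsigned h = 5381;

  while (*s)
    h = h * 33 + (unsigned char)*s++;
  return h % NT_BUCKETS;
}

/* Returns the number assigned to name, or 0 if name is unknown. */
static int
nt_lookup(const struct name_table *t, const char *name)
{
  const struct nt_entry *e;

  assert(t && name);
  for (e = t->bucket[nt_hash(name)]; e; e = e->next)
    if (strcmp(e->name, name) == 0)
      return e->number;
  return 0;
}

/* Assigns the next number to name.  Returns that number, or 0 if the
   name is already defined or memory runs out. */
static int
nt_install(struct name_table *t, const char *name)
{
  struct nt_entry *e;
  unsigned h;

  assert(t && name);
  if (nt_lookup(t, name))
    return 0;
  e = malloc(sizeof *e);
  if (!e)
    return 0;
  e->name = malloc(strlen(name) + 1);
  if (!e->name) {
    free(e);
    return 0;
  }
  strcpy(e->name, name);
  h = nt_hash(name);
  e->number = ++t->counter;
  e->next = t->bucket[h];
  t->bucket[h] = e;
  return e->number;
}
```

Because numbering starts at 1, a return value of 0 can double as "not found", which is the same convention the weave code relies on when it tests the result of its own lookup.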

There is only one thing that complicates this mix: namespaces. The document may use other namespaces as well, for example if it contains MathML or embedded SVG. To simplify handling this we perform the parse using a non-namespace-aware parser, so that namespace declarations look like attributes. We just look for the prefix for our target namespace, and perform the appropriate processing for each tag that carries it.

Warning

There is a limitation in this naive approach to namespace processing that should be obvious: do not attempt to define another prefix for the xml-lit namespace! This weave code is guaranteed to work only if the namespace has only one, unique prefix throughout the document.
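
To make the prefix search concrete, here is a minimal standalone sketch that scans an Expat-style attribute array for the declaration of a given namespace URI. The function name find_ns_prefix and the URI used in the usage note are assumptions for illustration only, and the comparison uses plain strcmp rather than the CHECKNS macro from xml-lit.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scans an attribute array laid out as name, value, name, value, ...,
   NULL for an "xmlns:PREFIX" declaration whose value equals uri.
   On success writes "PREFIX:" into out and returns 1.  Returns 0 if
   there is no such declaration or the prefix does not fit in out. */
static int
find_ns_prefix(const char **attr, const char *uri, char *out, size_t outsz)
{
  int i;

  assert(attr && uri && out);
  for (i = 0; attr[i]; i += 2) {
    if (strncmp(attr[i], "xmlns:", 6) == 0 && strcmp(attr[i+1], uri) == 0) {
      int n = snprintf(out, outsz, "%s:", attr[i] + 6);
      /* snprintf reports the length it wanted; reject truncation. */
      return n > 0 && (size_t)n < outsz;
    }
  }
  return 0;
}
```

Given the attributes xmlns:svg="http://www.w3.org/2000/svg" and xmlns:xlit="http://example.org/xml-lit" (the second URI is made up), a search for the second URI stores "xlit:" in the buffer. The search stops at the first match, which is exactly why a second prefix bound to the same namespace would go unnoticed.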

      
--Code fragment from file: weave.c--


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <expat.h>
#include "xml-lit.h"
#include "hash.h"
#include "elements.h"

char target_ns_prefix[80];

/* These are the output entities that produce the text for various
   kinds of weave output text. */
#define ASSIGN_STR "&#x2261;"
#define ASSIGN_ADD_STR "&#x2261;+"
#define OPEN_DEF_STR "&#x00AB;"
#define CLOSE_DEF_STR "&#x00BB;"

static hashtable ht;
static char *cur_fragmap_name = NULL;
static int fragmap_counter = 0;


      
    

The start element handler below outputs all start elements except those that belong to the xml-lit namespace. But first we need to determine the prefix used by this namespace. Since the parser is namespace-unaware, namespace declarations look just like attributes, so we look for an attribute whose name begins with 'xmlns:' and whose value is equal to the URI of xml-lit. The rest of the attribute name is the prefix used.

      
--Code fragment from file: weave.c--


static void
weave_elem_start(void *ud, const char *el, const char **attr)
{
  int i;
  FILE *fp = (FILE *)ud;

  for (i=0; attr[i]; i+=2) {
    if (strncmp(attr[i], "xmlns:", 6) == 0 &&
        CHECKNS(attr[i+1], NAMESPACE_URI, NAMESPACE_URI_LENGTH)) {
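      /* Leave room for the trailing ':' and the terminating NUL. */
      strncpy(target_ns_prefix, attr[i]+6, sizeof(target_ns_prefix) - 2);
      target_ns_prefix[sizeof(target_ns_prefix) - 2] = '\0';
      strcat(target_ns_prefix, ":");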
      break;
    }
  }


      
    

Then we output each element. We first check whether the element belongs to our target namespace by seeing whether it has our prefix. If so, it is processed and then discarded.

      
--Code fragment from file: weave.c--


  if (target_ns_prefix[0] != '\0' &&
      strncmp(el, target_ns_prefix, strlen(target_ns_prefix)) == 0) {
    char *localname;
    int tval;

    localname = strrchr(el, ':') + 1;
    tval = ELEM_UNKNOWN;
    for (i=0; elements[i].name; i++) {
      if (strcmp(localname, elements[i].name) == 0) {
        tval = elements[i].tval;
        break;
      }
    }
    if (tval == ELEM_UNKNOWN)
      return;
    switch (tval) {
    case ELEM_CODE:
      for (i=0; attr[i]; i+=2) {
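        const char *localname;

        /* An unprefixed attribute name contains no ':'. */
        localname = strrchr(attr[i], ':');
        localname = localname ? localname + 1 : attr[i];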
        if (strcmp(localname, ATTR_FILENAME) == 0)
          fprintf(fp, "\n&#x002D;&#x002D;Code fragment from file: %s&#x002D;&#x002D;\n", attr[i+1]);
      }
      break;
    case ELEM_FRAGMAP:
      for (i=0; attr[i]; i+=2) {
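        const char *localname;

        /* An unprefixed attribute name contains no ':'. */
        localname = strrchr(attr[i], ':');
        localname = localname ? localname + 1 : attr[i];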
        if (strcmp(localname, ATTR_NAME) == 0) {
          if (hash_lookup(ht, attr[i+1]))
            error_exit("fragment map id %s redefined\n", attr[i+1]);
          cur_fragmap_name = strdup(attr[i+1]);
          fragmap_counter++;
          if (hash_install(ht, cur_fragmap_name, (void *)fragmap_counter))
            error_exit("hash table space exhausted (use -T to increase)\n");
          fprintf(fp, "\n%s (%s) [%d]: ", OPEN_DEF_STR, attr[i+1],
                  fragmap_counter);
          break;
        }
      }
      break;
    case ELEM_FRAGMENT:
      for (i=0; attr[i]; i+=2) {
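        const char *localname;
        int counterval;

        /* An unprefixed attribute name contains no ':'. */
        localname = strrchr(attr[i], ':');
        localname = localname ? localname + 1 : attr[i];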
        if (strcmp(localname, ATTR_NAME) == 0) {
          if (!(counterval = (int)hash_lookup(ht, attr[i+1])))
            error_exit("fragment map id %s referenced but not found\n", attr[i+1]);
          fprintf(fp, "\n%s (%s) [%d] %s %s\n", OPEN_DEF_STR, attr[i+1],
                  counterval, CLOSE_DEF_STR, ASSIGN_ADD_STR);
          break;
        }
      }

    }
    return;
  } else {
    fprintf(fp, "<%s", el);
  }
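
One detail of a prefix test like this is worth spelling out: if no prefix has been recorded yet, a strncmp over zero characters matches every name. The sketch below isolates the test in a helper that treats an empty prefix as "no match". The name local_name_if_prefixed is hypothetical and is not part of weave.c.

```c
#include <assert.h>
#include <string.h>

/* If name starts with the non-empty prefix (for example "xlit:"),
   returns a pointer to the local part that follows it.  Returns NULL
   if the prefix is empty or name does not start with it. */
static const char *
local_name_if_prefixed(const char *name, const char *prefix)
{
  size_t n;

  assert(name && prefix);
  n = strlen(prefix);
  if (n == 0 || strncmp(name, prefix, n) != 0)
    return NULL;
  return name + n;
}
```

Since the stored prefix ends in ':', a name such as "xlitx:code" does not match the prefix "xlit:", and the returned pointer is always the local name without any further search for the colon.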


      
    

Now we output the attributes. This is straightforward: we loop through the attributes one at a time, printing each name followed by its value in quotes. After the last attribute we print a closing '>' and we are done. The only special case is an attribute that is actually the namespace declaration for the xml-lit namespace, which gets thrown away.

      
--Code fragment from file: weave.c--


  for (i=0; attr[i]; i+=2) {
    if ((strncmp(attr[i], "xmlns:", 6) == 0) &&
         CHECKNS(attr[i+1], NAMESPACE_URI, NAMESPACE_URI_LENGTH))
      continue;
    else
      fprintf(fp, " %s=\"%s\"", attr[i], attr[i+1]);
  }
  fprintf(fp, ">");

}
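
Expat hands attribute values to the handler with entity and character references already expanded, so a value containing '&', '<', or '"' would not survive being printed verbatim between double quotes. The following is a minimal sketch of an escaping step that could be applied first. The function escape_attr_value is hypothetical and does not appear in weave.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies value into out, replacing the characters that may not appear
   literally inside a double-quoted attribute value.  Returns 0 on
   success, or -1 if out (of size outsz) is too small. */
static int
escape_attr_value(const char *value, char *out, size_t outsz)
{
  size_t used = 0;

  assert(value && out);
  for (; *value; value++) {
    const char *rep;
    char single[2];
    size_t len;

    switch (*value) {
    case '&': rep = "&amp;"; break;
    case '<': rep = "&lt;"; break;
    case '"': rep = "&quot;"; break;
    default:
      single[0] = *value;
      single[1] = '\0';
      rep = single;
      break;
    }
    len = strlen(rep);
    if (used + len + 1 > outsz)   /* keep room for the final NUL */
      return -1;
    memcpy(out + used, rep, len);
    used += len;
  }
  if (used + 1 > outsz)
    return -1;
  out[used] = '\0';
  return 0;
}
```

A weaver that wanted this behaviour would escape each value into a buffer and print the buffer in place of the raw value. Whether the extra step is worth it depends on how often such characters occur in the attributes of the documents being woven.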

      
    

Similar work is done with closing tags. Again, we test whether the element belongs to the xml-lit namespace. If it does, we output nothing, except for the closing marker that ends a fragmap.

      
--Code fragment from file: weave.c--



static void
weave_elem_end(void *ud, const char *el)
{
  FILE *fp = (FILE *)ud;
  char *localname;
  int tval, i;

  if (target_ns_prefix[0] != '\0' &&
      strncmp(el, target_ns_prefix, strlen(target_ns_prefix)) == 0) {
    localname = strrchr(el, ':') + 1;
    tval = ELEM_UNKNOWN;
    for (i=0; elements[i].name; i++) {
      if (strcmp(localname, elements[i].name) == 0) {
        tval = elements[i].tval;
        break;
      }
    }
    if (tval == ELEM_FRAGMAP)
      fprintf(fp, "%s\n", CLOSE_DEF_STR);
    return;
  }
  

      
    

All other elements are simply output as closing tags:

      
--Code fragment from file: weave.c--


  fprintf(fp, "</%s>", el);
}

      
    

We also set a default handler that outputs everything else in the document (character data, comments, processing instructions, and so on) without change.

      
--Code fragment from file: weave.c--

static void
weave_defaulthandler(void *ud, const XML_Char *s, int len)
{
  FILE *fp = (FILE *)ud;
  fwrite(s, sizeof(XML_Char), len, fp);
}
      
    

Finally, setup_weave() installs these handlers on the XML parser passed to it, opens the output file, sets that file as the user data, and initializes the hash table.

      
--Code fragment from file: weave.c--


void
setup_weave(XML_Parser p, char *fn, int hashtablesize)
{
  char *outfn;
  FILE *fp;

  outfn = (char *)malloc(sizeof(char)*(strlen(fn) + 6));
  if (!outfn)
    error_exit("out of memory\n");
  strcpy(outfn, fn);
  strcat(outfn, ".out");
  fp = fopen(outfn, "w");
  if (!fp) {
    perror(progname);
    error_exit("unable to open output file %s\n", outfn);
  }

  XML_SetReturnNSTriplet(p, 1);
  XML_SetUserData(p, fp);
  XML_SetElementHandler(p, weave_elem_start, weave_elem_end);
  XML_SetDefaultHandler(p, weave_defaulthandler);
  ht = hash_init(hashtablesize);
  free(outfn);
}
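
The output file name is the input name with ".out" appended. As a small sketch of that step in isolation, the helper below builds the name in a buffer sized from the suffix itself and reports allocation failure to the caller. The name make_output_name is hypothetical and is not part of weave.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of fn with ".out" appended, or NULL
   if memory cannot be allocated.  The caller frees the result. */
static char *
make_output_name(const char *fn)
{
  static const char suffix[] = ".out";
  size_t len;
  char *outfn;

  assert(fn);
  len = strlen(fn);
  outfn = malloc(len + sizeof suffix);   /* sizeof counts the NUL */
  if (!outfn)
    return NULL;
  memcpy(outfn, fn, len);
  memcpy(outfn + len, suffix, sizeof suffix);
  return outfn;
}
```

Deriving the allocation size from the suffix array avoids a hand-counted constant, so changing the suffix cannot silently make the buffer too small.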
  
      
    

This weaver is a very simple affair, and produces ugly but understandable replacement text for the XML-Lit namespace elements. Improvements to this will have to wait for the next version.