The main program

The header file xml-lit.h contains prototypes for exported functions, declarations for exported global variables, and useful macro definitions. All modules which use this header file must include expat.h before including it because some of the prototypes use Expat data structures.

      
--Code fragment from file: xml-lit.h--



#ifndef _XML_LIT_H

#define _XML_LIT_H


      
    

The functions setup_tangle() and setup_weave() set up the default handlers for the tangle and weave modes of the program. They are defined in the section called “The tangler” and the section called “Weaving” respectively. The third function, treetangle(), writes the tangled files out.

      
--Code fragment from file: xml-lit.h--



extern void setup_tangle(XML_Parser p, int processmode, int hashsize);
extern void treetangle(void);
extern void setup_weave(XML_Parser p, char *fn, int hashsize);


      
    

The functions usage_exit() and warning() perform error reporting. usage_exit() prints a brief summary of the program's options and exits, while warning() reports a general error at a given level, fatal or nonfatal, and exits if the error is fatal. They are defined in the section called “Error Handling”. There is also error_exit(), which produces a level 5 warning for fatal errors.

      
--Code fragment from file: xml-lit.h--


extern void usage_exit(void);
extern void warning(int level, const char *fmt,...);
extern void error_exit(const char *fmt,...);
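
The real definitions live in the section called “Error Handling”. As a rough illustration of how a printf-style reporter with a severity level can be built on the stdarg facilities, consider the sketch below. The name format_report(), the FATAL_LEVEL constant, and the message layout are assumptions made for this example only; the one detail taken from the text is that level 5 is treated as fatal.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Assumption: level 5 marks a fatal error, as error_exit() suggests. */
#define FATAL_LEVEL 5

/* Hypothetical helper: format one diagnostic into buf (size must be > 0).
 * Returns 1 if the caller should terminate the program, 0 otherwise. */
static int format_report(char *buf, size_t size, const char *prog,
                         int level, const char *fmt, ...)
{
  va_list ap;
  size_t used;

  /* Prefix with the program name and a severity word. */
  snprintf(buf, size, "%s: %s: ", prog,
           (level >= FATAL_LEVEL) ? "error" : "warning");
  used = strlen(buf);

  /* Append the caller's printf-style message, truncating if needed. */
  va_start(ap, fmt);
  vsnprintf(buf + used, size - used, fmt, ap);
  va_end(ap);

  return level >= FATAL_LEVEL;
}
```

A caller would typically write the buffer to stderr and call exit() when the return value is nonzero.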

      
    

The following are global variables used to control the program's overall behavior. The progname variable gives the name under which the program was invoked. infilename gives the name of the file currently being processed. processmode defines whether we perform new-style xml-lit processing with <xml-lit:code/> elements (if true) or old-style xmltangle processing with <programlisting/> elements (if false); it is private to main.c and reaches the tangler as an argument to setup_tangle(), so it is not declared here. cmode controls whether C-style #line directives are generated when source is tangled. These last two variables are controlled by command line switches.

      
--Code fragment from file: xml-lit.h--


extern char *progname;
extern char *infilename;
extern int cmode;

      
    

These macros define constants used for new-style processing. NAMESPACE_URI defines the URI used for the xml-lit namespace, and NAMESPACE_URI_LENGTH is its length in characters.

      
--Code fragment from file: xml-lit.h--


#define NAMESPACE_URI "http://dido.engr.internet.org.ph/projects/xml-lit/"
#define NAMESPACE_URI_LENGTH 50

      
    

OS_ELEM and OS_ATTR define the element and attribute looked for by the program during old-style xmltangle-style processing.

      
--Code fragment from file: xml-lit.h--


#define OS_ELEM "programlisting"
#define OS_ATTR "role"

      
    

These constants control the parser. DELIMCHAR defines the delimiter the parser uses to separate the URI and the local name in namespace-aware processing. BUFSIZE controls the size of the buffer used by the program. DEFAULT_HASH_SIZE is the default hash table size setting, and VERSIONSTR is the version string printed by the --version option. PATHSEP is the system path separator. Change it to '\\' under MS-DOS or Windows.

      
--Code fragment from file: xml-lit.h--


#define DELIMCHAR '|'
#define BUFSIZE 8192
#define DEFAULT_HASH_SIZE 6
#define PATHSEP '/'
#define VERSIONSTR "1.0"

      
    

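The CHECKNS macro tests whether a particular name belongs to a given namespace by comparing the first nslength characters of the name with the namespace URI. Because it expands to a call to strncmp(), modules that use it must also include string.h.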

      
--Code fragment from file: xml-lit.h--



#define CHECKNS(name, ns, nslength) (strncmp((name), (ns), (nslength))==0)

#endif
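
To make the role of CHECKNS concrete: a namespace-aware Expat parser reports each element name as the namespace URI, followed by the separator character, followed by the local name. The sketch below shows the test on such a name. The macros are copied from xml-lit.h so the example stands alone; the helper local_name() is a hypothetical addition for illustration and is not part of the program.

```c
#include <string.h>

/* Copied from xml-lit.h so that the sketch is self-contained. */
#define NAMESPACE_URI "http://dido.engr.internet.org.ph/projects/xml-lit/"
#define NAMESPACE_URI_LENGTH 50
#define DELIMCHAR '|'
#define CHECKNS(name, ns, nslength) (strncmp((name), (ns), (nslength))==0)

/* Hypothetical helper: return the part of a qualified name that follows
 * the delimiter, or the whole name if it carries no namespace. */
static const char *local_name(const char *name)
{
  const char *sep = strchr(name, DELIMCHAR);
  return sep ? sep + 1 : name;
}
```

With these definitions, CHECKNS(name, NAMESPACE_URI, NAMESPACE_URI_LENGTH) is true for a name such as NAMESPACE_URI "|code" and false for an element from any other namespace, and local_name() then yields "code".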


      
    

Expat works in a manner similar to SAX: the document is parsed serially, and handlers are set for various "events" that occur within the document, such as a start tag or end tag appearing. The basic operation of xml-lit is built around this functionality. Handlers are set for start and end tags, the tags are tested to see if they are of a type recognized by the program, and the appropriate action is taken. Of course, these handler functions must first be registered with the Expat library, and the parser must be fed with data from the input file, which is almost the sole purpose of main(). We shall soon see how this works in main.c.

Expat requires a buffer where intermediate XML text is stored, which we declare here. The size of the buffer is controlled by BUFSIZE, defined above, and can be adjusted to improve efficiency if that is needed. We also define the variables infilename, progname, cmode, and processmode, whose purposes have already been explained, and finally weave, which determines whether we are in weave mode (nonzero) or tangle mode (zero).

      
--Code fragment from file: main.c--


#define _GNU_SOURCE   /* must precede every #include to take effect */
#include <stdio.h>
#include <stdlib.h>   /* atoi(), exit() */
#include <string.h>
#include <expat.h>
#include <getopt.h>
#include "xml-lit.h"

static char buffer[BUFSIZE];
char *infilename;
char *progname;
int cmode = 1;
static int processmode = 1;
static int weave = 0;


      
    

To read the options we use the GNU version of getopt, getopt_long(), which allows both short and long options. Future versions of this program will have a rudimentary getopt for systems that don't have GNU getopt available. The variables optarg and optind are set by getopt: optarg points to the argument of the current option, and optind is the index of the next element of argv to be processed. The long_options array contains the long option equivalents of the short options.

      
--Code fragment from file: main.c--


extern char *optarg;
extern int optind;

static struct option long_options[] = {
  { "c-mode", no_argument, NULL, 'c' },
  { "help", no_argument, NULL, 'h' },
  { "no-line", no_argument, NULL, 'n' },
  { "hash-table-size", required_argument, NULL, 'T' },
  { "tangle", no_argument, NULL, 't' },
  { "version", no_argument, NULL, 'v' },
  { "weave", no_argument, NULL, 'w' },
  { "xmltangle", no_argument, NULL, 'x'},
  { "Xml-lit", no_argument, NULL, 'X' },
  { NULL, 0, NULL, 0 }  /* getopt_long() requires an all-zero terminating entry */
};


      
    

The main program begins here. It is a relatively simple program, mainly a support structure for Expat, reading in data from the XML file specified on the command line and sending it to the parser. It will first check whether there are enough arguments on the command line:

      
--Code fragment from file: main.c--


int
main(int argc, char **argv)
{
  FILE *infile;
  XML_Parser p;
  int c, hash_size = DEFAULT_HASH_SIZE, oindex;

  progname = strrchr(argv[0], PATHSEP);
  progname = (progname) ? (progname + 1) : argv[0];
  if (argc < 2) {
    fprintf(stderr, "Too few arguments.\n");
    usage_exit();
  }
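
The progname computation can be read in isolation as a small "strip the directory part" helper. The following is a minimal sketch of that idea; the function name base_name() is hypothetical, and PATHSEP is copied from xml-lit.h.

```c
#include <string.h>

#define PATHSEP '/'   /* as in xml-lit.h; '\\' under MS-DOS or Windows */

/* Hypothetical helper: return the component after the last path
 * separator, or the whole string if it contains no separator. */
static const char *base_name(const char *path)
{
  const char *sep = strrchr(path, PATHSEP);
  return sep ? sep + 1 : path;
}
```

The returned pointer refers into the original string, so no memory is allocated and nothing needs to be freed.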

      
    

Next it parses the command line. The -n switch turns off 'C mode' and prevents the tangler from generating #line preprocessor directives, and -c does exactly the opposite. The -x switch causes the program to revert to old-style xmltangle-style behavior (looking for <programlisting/> elements with a role attribute), and -X selects new-style xml-lit processing. The -t and -w switches select tangle and weave mode respectively, -T sets the hash table size, -h prints the usage summary, and --version prints the program version. The loop runs until getopt_long() reports that no command line switches remain; optind then gives the index of the first argument that is not a switch.

      
--Code fragment from file: main.c--


  while ((c=getopt_long(argc, argv, "chntT:vwXx?", long_options, &oindex)) != -1) {
    switch(c) {
    case 'c':
      cmode = 1;
      break;
    case 'n':
      cmode = 0;
      break;
    case 't':
      weave = 0;
      break;
    case 'T':
      hash_size = atoi(optarg);
      break;
    case 'v':
      fprintf(stderr, "XML-Lit Version %s\n", VERSIONSTR);
      exit(0);
      break;
    case 'w':
      weave = 1;
      break;
    case 'x':
      processmode = 0;
      break;
    case 'X':
      processmode = 1;
      break;
    case '?':
    case 'h':
    default:
      usage_exit();
      break;
    }
  }
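
The same option-parsing pattern can be exercised in isolation. The sketch below is a reduced version that handles only a subset of the switches and stores the results in a structure instead of globals. The names struct settings, sketch_options, and parse_args() are hypothetical and exist only for this example.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the values the real program keeps in globals. */
struct settings {
  int cmode;
  int weave;
  int hash_size;
};

/* A subset of the long options; the all-zero entry ends the table. */
static const struct option sketch_options[] = {
  { "c-mode", no_argument, NULL, 'c' },
  { "no-line", no_argument, NULL, 'n' },
  { "hash-table-size", required_argument, NULL, 'T' },
  { "tangle", no_argument, NULL, 't' },
  { "weave", no_argument, NULL, 'w' },
  { NULL, 0, NULL, 0 }
};

/* Returns the index of the first non-option argument,
 * or -1 if an unrecognized switch was seen. */
static int parse_args(int argc, char **argv, struct settings *s)
{
  int c;

  optind = 1;  /* start scanning at the first argument */
  while ((c = getopt_long(argc, argv, "cntT:w", sketch_options, NULL)) != -1) {
    switch (c) {
    case 'c': s->cmode = 1; break;
    case 'n': s->cmode = 0; break;
    case 't': s->weave = 0; break;
    case 'w': s->weave = 1; break;
    case 'T': s->hash_size = atoi(optarg); break;
    default:  return -1;
    }
  }
  return optind;
}
```

Given the arguments -n --weave -T 8 doc.xml, this sketch would clear cmode, set weave, store 8 in hash_size, and return the index of doc.xml.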

      
    

Now it will attempt to open the file specified on the command line. If no file is given, it prints an error message and exits.

      
--Code fragment from file: main.c--


  if (optind >= argc) {
    fprintf(stderr, "No input files\n");
    usage_exit();
    exit(1);
  }
  infilename = argv[optind];
  infile = fopen(infilename, "r");
  if (!infile) {
    perror(progname);
    error_exit("file %s could not be opened\n", infilename);
  }

      
    

All the Expat-related work begins here. In weave mode we call setup_weave() to set up the default handlers the parser will use, while in tangle mode we call setup_tangle(). A namespace-aware parser is created only for new-style tangling. In old-style processing mode, namespace processing would be wasted effort. In weave mode it must be turned off, since we need to output namespace declarations as though they were ordinary attributes.

      
--Code fragment from file: main.c--


  p = (processmode && !weave) ? XML_ParserCreateNS(NULL, DELIMCHAR) : XML_ParserCreate(NULL);
  if (!p) {
    error_exit("Couldn't allocate memory for parser\n");
  }

  if (weave)
    setup_weave(p, infilename, hash_size);
  else
    setup_tangle(p, processmode, hash_size);


      
    

This is the main loop used by Expat. It is essentially the same as the main loop used in the examples of Expat's use in the source distribution [CLAR01]; nothing fancy is done here. It simply reads data into the buffer and passes it to the parser, which in turn calls the appropriate handlers set up above. It does this until no data remains in the input file, at which point it closes the file for a clean exit. In new-style tangle mode, treetangle() is then called to write the tangled files out.

      
--Code fragment from file: main.c--


  for (;;) {
    int done;
    int len;

    len = fread(buffer, sizeof(char), BUFSIZE, infile);
    if (ferror(infile)) {
      perror(progname);
      error_exit("Error reading file %s\n", infilename);
    }
    done = feof(infile);

    if (!XML_Parse(p, buffer, len, done)) {
      error_exit("Parse error at line %lu:\n%s\n",
	      (unsigned long) XML_GetCurrentLineNumber(p),
	      XML_ErrorString(XML_GetErrorCode(p)));
    }
    if (done)
      break;
  }
  fclose(infile);
  if (!weave && processmode)
    treetangle();
  return(0);
}
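
The shape of the read loop does not depend on Expat itself. The sketch below reproduces it with a callback standing in for XML_Parse(), so that it can be compiled and tried without the library. The names feed_fn, pump_file(), struct tally, count_bytes(), and the deliberately tiny CHUNK size are all assumptions made for this illustration.

```c
#include <stdio.h>

#define CHUNK 8   /* tiny on purpose, so that several passes are needed */

/* Hypothetical stand-in for XML_Parse(): returns nonzero on success. */
typedef int (*feed_fn)(void *ctx, const char *data, int len, int is_final);

/* Read `in` in CHUNK-sized pieces and hand each piece to `feed`.
 * Returns 0 on success, -1 on a read error or when `feed` fails. */
static int pump_file(FILE *in, feed_fn feed, void *ctx)
{
  char buf[CHUNK];

  for (;;) {
    size_t len = fread(buf, 1, sizeof buf, in);
    int done;

    if (ferror(in))
      return -1;
    done = feof(in);          /* true once the last piece has been read */
    if (!feed(ctx, buf, (int) len, done))
      return -1;
    if (done)
      return 0;
  }
}

/* Example consumer: count bytes, calls, and calls flagged as final. */
struct tally { long bytes; int calls; int finals; };

static int count_bytes(void *ctx, const char *data, int len, int is_final)
{
  struct tally *t = ctx;

  (void) data;                /* contents are not inspected here */
  t->bytes += len;
  t->calls++;
  t->finals += is_final ? 1 : 0;
  return 1;
}
```

The final flag is raised exactly once, on the call that carries the last piece of data. If the file length is an exact multiple of the chunk size, that last call carries zero bytes, which is why the consumer must accept an empty final piece.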