XmlPL By Example: Web Document Processor

This example illustrates using XmlPL to create a Web document formatter. More specifically, we will create an XmlPL program which reads a simple XML file and produces a web document in XHTML format with a hyper-linked table of contents and properly numbered sections and subsections. All in 74 lines of code.

Normally, this takes a fair amount of manual labor to do in XHTML alone. This entire website was created using a similar, albeit more complicated, XmlPL program.

Steps

  1. Create The Web Page
  2. Create an Input Document
  3. Create the Processor
  4. Create the Table of Contents
  5. Process sections
  6. Compile & Process

1. Create The Web Page

First we use the main function to create a basic Web page.

import xmlpl.xml;

node[] main(document in) {
  <html>
    <head>
      <title>value(in/webdoc/@title);</title>
    </head>
    <body>
      <h1>value(in/webdoc/@title);</h1>
    </body>
  </html>
}

You could also include a style sheet in the head.

2. Create an Input Document

Notice that we are already making some assumptions about the input file. Let's create an input file now so we have some data to process. We will use a little Lorem ipsum to fill the space.

<webdoc title="Lorem ipsum">

  <h2>Table of Contents</h2>
  <contents/>

  <section title="Lorem ipsum">
    <p>
      Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vestibulum
      tristique. Suspendisse non nisl. Ut viverra augue sed quam. Aliquam
      vel ante a risus mattis egestas. Mauris felis elit, placerat quis,
      pellentesque eget, dapibus vitae, nunc. Pellentesque convallis
      molestie enim. Aenean ornare orci ut dolor. Nulla facilisi. Nulla
      lorem. Cras imperdiet sem vel enim. Nullam vehicula vulputate
      massa. Ut a eros eu sapien dictum pulvinar.
      <a href="http://en.wikipedia.org/wiki/Vestibulum">Vestibulum</a>
      ut tellus.
    </p>

    <p>
      Sed et diam. Quisque pulvinar vestibulum arcu. Praesent at metus quis
      libero volutpat consectetuer. In id leo quis enim luctus
      eleifend. Vivamus lobortis sem vel risus. Pellentesque eleifend ipsum
      eget mi. Donec vel arcu. Praesent nonummy nisl eu massa. Etiam
      pede. <i>Nulla et quam non risus ullamcorper faucibus. Vivamus sit amet
      felis.</i> Praesent tempor. Proin sed felis non sem vehicula
      sodales. Maecenas sapien. Aenean consequat pede eu justo. Donec
      gravida, libero eu dictum ullamcorper, elit orci consectetuer lorem,
      eu pretium eros augue pellentesque leo. Sed mauris orci, viverra in,
      vestibulum non, cursus ac, odio. Sed a diam vel turpis pellentesque
      hendrerit.
    </p>

    <p>
      Praesent condimentum mauris eu est. Nulla vitae nisl a nunc luctus
      dignissim. In interdum, ligula sed tincidunt molestie, magna felis
      viverra ante, vitae ultrices est lectus vitae quam. Pellentesque
      laoreet justo ac pede. Suspendisse fringilla lectus ut justo. Morbi
      vitae libero. Suspendisse nisl enim, bibendum ac, rutrum non, feugiat
      vulputate, purus. Donec ultricies euismod nunc. Nam ullamcorper
      dignissim lectus. Cum sociis natoque penatibus et magnis dis
      parturient montes, nascetur ridiculus mus. Cum sociis natoque
      penatibus et magnis dis parturient montes, nascetur ridiculus
      mus. Praesent lorem. Nulla facilisi. Morbi vel sem vitae lorem laoreet
      lacinia.
    </p>
  </section>

  <section title="Pellentesque habitant">
    <p>
      Pellentesque habitant morbi tristique senectus et netus et malesuada
      fames ac turpis egestas. Pellentesque posuere urna eu nunc. Etiam vel
      massa. Cras at pede porttitor augue aliquam suscipit. Donec
      nulla. Quisque iaculis. Cras aliquam consequat magna. Aenean vitae sem
      in nisl elementum egestas. Etiam sed odio id diam condimentum
      fringilla. Aliquam gravida. Suspendisse fermentum, odio vel
      consectetuer molestie, dui orci iaculis nunc, ut varius lacus turpis
      ut quam. Nunc mattis, mauris ac accumsan tincidunt, nisl ligula
      nonummy urna, at vulputate ligula dui quis nisl. In laoreet
      scelerisque nulla. Sed eget lorem.
    </p>

    <section title="Morbi felis metus">
      <p>
        Morbi felis metus, laoreet vitae, vehicula at, rutrum quis,
        nibh. Praesent at magna. Proin dolor magna, hendrerit eget, sagittis
        aliquet, pellentesque eu, turpis. Nunc faucibus enim quis
        enim. Quisque at ipsum id elit gravida lobortis. Nunc euismod. Mauris
        nisi mauris, venenatis sed, luctus ac, eleifend id, neque. Cras sit
        amet leo. In sollicitudin metus vel sapien. Fusce semper erat in
        lorem. Fusce suscipit varius purus. Aenean in purus eget diam laoreet
        imperdiet. Duis sit amet magna. Fusce pharetra. Nunc
        sollicitudin. Donec ante mauris, placerat et, pulvinar quis, dictum
        at, mauris.
      </p>
    </section>

    <section title="Pellentesque placerat">
      <p>
        Pellentesque placerat, magna convallis sodales egestas, elit lorem
        rhoncus diam, non tempus turpis nunc eleifend purus. Nunc rhoncus
        massa vel neque. Integer arcu velit, tincidunt dignissim, fermentum
        sed, faucibus id, massa. Donec ipsum. Vivamus elementum urna eu
        massa. Nulla facilisi. Sed porta. Cum sociis natoque penatibus et
        magnis dis parturient montes, nascetur ridiculus mus. Etiam faucibus
        ligula eu purus. Nulla quis tortor vel risus scelerisque
        feugiat. Pellentesque vitae augue sed turpis molestie
        scelerisque. Fusce congue vehicula tellus. Maecenas in lorem ac lorem
        bibendum laoreet.
      </p>
    </section>
  </section>
</webdoc>

The only tags we will concern ourselves with are the webdoc, contents, and section tags. The other tags are standard XHTML and will be copied verbatim into the output document.

Build the program and process the input file using the following commands:

xmlplcc webdoc.xpl
./webdoc <input.xml >output.html

This produces the following webpage:

3. Create the Processor

You will notice that only the page title and header are shown. This is because we have not yet done anything with the rest of the input data. There is one more thing to take care of before we start producing interesting output: the program must scan through the input file and perform special actions when it sees contents or section tags. All other tags must be copied. The following code does this and is easily extensible to handle new types of tags if you choose to add them.

import xmlpl.xml;
import xmlpl.curl;

:: Forward Declaration
node[] evaluate(element e);

node[] doChildren(element e) {
  foreach (e/node())
    if (Element(.)) evaluate(Element(.));
    else .; :: If it's not an element just copy it.
}

node[] doContents(element e) {
}

node[] doSection(element e) {
}

node[] evaluate(element e) {
  switch (name(e)) {
  :: Process contents starting from the parent element
  case "contents": doContents(e/..); break;
  case "section": doSection(e); break;

  default:
    :: Copy the element
    <(name(e))>
      e/@*; :: Copy attributes

      doChildren(e);
    </>
  }
}

node[] main(document in) {
  <html>
    <head>
      <title>value(in/webdoc/@title);</title>
    </head>
    <body>
      <h1>value(in/webdoc/@title);</h1>

      foreach (in/webdoc/*)
        evaluate(.);

    </body>
  </html>
}
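
For readers who do not know XmlPL yet, the dispatch-and-copy logic above can be sketched in Python. This is only an illustration under assumptions, not part of the XmlPL program: it uses Python's standard xml.dom.minidom module, and the names do_children, do_contents, do_section, evaluate and process are hypothetical counterparts of the XmlPL functions.

```python
from xml.dom import minidom


def do_contents(parent, out_doc, out_parent):
    """Stub, like the empty doContents above: emits nothing yet."""


def do_section(section, out_doc, out_parent):
    """Stub, like the empty doSection above: emits nothing yet."""


def do_children(e, out_doc, out_parent):
    """Evaluate child elements; copy every other node (text, comments)."""
    for node in e.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            evaluate(node, out_doc, out_parent)
        else:
            out_parent.appendChild(out_doc.importNode(node, True))


def evaluate(e, out_doc, out_parent):
    """Dispatch on the tag name; copy any tag that is not special."""
    if e.tagName == "contents":
        # Process contents starting from the parent element.
        do_contents(e.parentNode, out_doc, out_parent)
    elif e.tagName == "section":
        do_section(e, out_doc, out_parent)
    else:
        copy = out_doc.createElement(e.tagName)
        for name, value in e.attributes.items():
            copy.setAttribute(name, value)  # copy attributes
        out_parent.appendChild(copy)
        do_children(e, out_doc, copy)


def process(xml_text):
    """Run every child element of the root through evaluate()."""
    src = minidom.parseString(xml_text)
    out_doc = minidom.Document()
    body = out_doc.createElement("body")
    out_doc.appendChild(body)
    for node in src.documentElement.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            evaluate(node, out_doc, body)
    return body.toxml()
```

Supporting a new kind of tag means adding one more branch to evaluate, which corresponds to adding one more case to the XmlPL switch statement.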

4. Create the Table of Contents

Let's now create the table of contents.

import xmlpl.xml;
import xmlpl.curl;

:: Forward Declaration
node[] evaluate(element e);

node[] doChildren(element e) {
  foreach (e/node())
    if (Element(.)) evaluate(Element(.));
    else .; :: If it's not an element just copy it.
}

node[] doContents(element e) {
  <ol>
    foreach (e/section) <li>
      <a href=("#" + url_escape(@title))>
        value(@title);
      </a>

      if (./section) doContents(.);
    </li>
  </ol>
}

node[] doSection(element e) {
}

node[] evaluate(element e) {
  switch (name(e)) {
  :: Process contents starting from the parent element
  case "contents": doContents(e/..); break;
  case "section": doSection(e); break;

  default:
    :: Copy the element
    <(name(e))>
      e/@*; :: Copy attributes

      doChildren(e);
    </>
  }
}

node[] main(document in) {
  <html>
    <head>
      <title>value(in/webdoc/@title);</title>
    </head>
    <body>
      <h1>value(in/webdoc/@title);</h1>

      foreach (in/webdoc/*)
        evaluate(.);

    </body>
  </html>
}

Notice that we include the curl library and use the url_escape function so that our anchors are URL safe, i.e. without spaces or other URL special characters.
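
The recursion in doContents may be easier to see in a small Python sketch. This is an illustration under assumptions, not the XmlPL implementation: the function name toc is hypothetical, and Python's urllib.parse.quote stands in for url_escape, so the exact set of escaped characters may differ from what XmlPL produces.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote


def toc(e):
    """Build a nested <ol> for the section children of e."""
    items = []
    for section in e.findall("section"):  # direct children only
        title = section.get("title")
        # Recurse only when the section has subsections of its own.
        sub = toc(section) if section.find("section") is not None else ""
        items.append(
            '<li><a href="#%s">%s</a>%s</li>'
            % (quote(title), escape(title), sub)
        )
    return "<ol>" + "".join(items) + "</ol>"


doc = ET.fromstring(
    '<webdoc><section title="A b"><section title="C"/></section>'
    '<section title="D"/></webdoc>'
)
print(toc(doc))
```

Here the title "A b" becomes the anchor "#A%20b", while the link text keeps the original title.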

If you compile this program and process the input it should produce the following output:

5. Process sections

Now we process the sections:

import xmlpl.xml;
import xmlpl.curl;

:: Forward Declaration
node[] evaluate(element e, integer depth, integer &index, string label);

node[] doChildren(element e, integer depth, integer &index, string label) {
  foreach (e/node())
    if (Element(.)) evaluate(Element(.), depth, index, label);
    else .; :: If it's not an element just copy it.
}

node[] doContents(element e) {
  <ol>
    foreach (e/section) <li>
      <a href=("#" + url_escape(@title))>
        value(@title);
      </a>

      if (./section) doContents(.);
    </li>
  </ol>
}

node[] doSection(element e, integer depth, integer &index, string label) {
  integer subindex = 1;
  string newlabel = label + index + ".";

  <a name=(url_escape(e/@title))>
    <("h" + (depth + 1))>
      newlabel;
      " ";
      value(e/@title);
    </>
  </a>

  doChildren(e, depth + 1, subindex, newlabel);

  index++;
}

node[] evaluate(element e, integer depth, integer &index, string label) {
  switch (name(e)) {
  :: Process contents starting from the parent element
  case "contents": doContents(e/..); break;
  case "section": doSection(e, depth, index, label); break;

  default:
    :: Copy the element
    <(name(e))>
      e/@*; :: Copy attributes

      doChildren(e, depth, index, label);
    </>
  }
}

node[] main(document in) {
  integer index = 1;

  <html>
    <head>
      <link rel="stylesheet" href="style.css" type="text/css"/>
      <title>value(in/webdoc/@title);</title>
    </head>
    <body>
      <h1>value(in/webdoc/@title);</h1>

      foreach (in/webdoc/*)
        evaluate(., 1, index, "");

    </body>
  </html>
}

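Three parameters have been added to keep track of section numbering: depth selects the heading level, index is the number of the current section at that level (passed by reference so that each section can increment it), and label holds the numbering prefix inherited from the enclosing sections.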
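
The numbering scheme can be traced with a short Python sketch. It is an illustration under assumptions rather than a translation of the program: the function name number_sections is hypothetical, a one-element list stands in for the by-reference index parameter, and the function only collects the heading tag, label and title instead of writing XHTML.

```python
import xml.etree.ElementTree as ET


def number_sections(e, depth, index, label, out):
    """Collect (heading tag, label, title) for every section under e.

    index is a one-element list so that callers see the increment.
    """
    for child in e:
        if child.tag == "section":
            new_label = label + str(index[0]) + "."
            out.append(("h%d" % (depth + 1), new_label, child.get("title")))
            # Subsections start counting from 1 under the new label.
            number_sections(child, depth + 1, [1], new_label, out)
            index[0] += 1
        else:
            # Other elements share the current counter and label.
            number_sections(child, depth, index, label, out)
    return out


doc = ET.fromstring(
    '<webdoc><contents/>'
    '<section title="Lorem ipsum"><p>text</p></section>'
    '<section title="Pellentesque habitant">'
    '<section title="Morbi felis metus"/>'
    '<section title="Pellentesque placerat"/>'
    '</section></webdoc>'
)
for tag, label, title in number_sections(doc, 1, [1], "", []):
    print(tag, label, title)
```

For the section layout of the sample input, this sketch labels the two top-level sections 1. and 2. as h2 headings and the two subsections 2.1. and 2.2. as h3 headings.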

This final version also adds the following style sheet to make the text a little easier on the eyes:

body {
  width: 700px;
  margin: auto;
  background-color: #dddddd;
}

h2,h3,h4 {
  border-bottom: 1px solid black;
}

6. Compile & Process

Again, compile the program and process the input with the following commands:

xmlplcc webdoc.xpl
./webdoc <input.xml >output.html

This should produce the following web page: